CEC-Corpus
The Chinese Emergency Corpus (CEC) was built by the Semantic Intelligence Lab at Shanghai University. It contains news reports on five types of emergencies: earthquakes, fires, traffic accidents, terrorist attacks, and food poisoning. The texts were preprocessed, analyzed, and annotated in XML using six tags (Event, Denoter, Time, Location, Participant, and Object) that together describe events and their elements.
Chinese Emergency Corpus (CEC) Overview
Dataset Construction
- Construction Institution: Semantic Intelligence Lab, Shanghai University
- Data Source: Internet news reports
- Event Categories: Earthquake, fire, traffic accident, terrorist attack, food poisoning (5 categories)
- Number of Texts: 332 articles
Data Processing
- Processing Steps: Text preprocessing, text analysis, event annotation, consistency checking
- Annotation Format: XML
- Annotation Tags: Event, Denoter, Time, Location, Participant, Object
- Attribute Definitions: Each tag has its own set of defined attributes
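As a rough illustration of how annotations in this format might be read, the sketch below parses a small XML snippet with Python's standard library and collects the elements of each event. Only the six tag names come from the corpus description. The sample document, its nesting, the `Body` and `Sentence` wrapper elements, the `eid` attribute, and the `extract_events` helper are all assumptions made for the example, so check them against the actual corpus files before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the wrapper elements, nesting, and "eid" attribute
# are assumptions. Only the six tag names come from the corpus description.
SAMPLE = """<Body>
  <Sentence>
    <Event eid="e1">
      <Time>Monday morning</Time>
      <Location>a downtown warehouse</Location>
      <Denoter>fire</Denoter>
      <Participant>firefighters</Participant>
      <Object>the building</Object>
    </Event>
  </Sentence>
</Body>"""

# The five element tags that describe an Event.
ELEMENT_TAGS = ("Denoter", "Time", "Location", "Participant", "Object")


def extract_events(xml_text):
    """Return one dict per Event with its attributes and element texts."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.iter("Event"):
        record = {"attributes": dict(event.attrib)}
        for tag in ELEMENT_TAGS:
            # itertext() joins text that is split across child nodes.
            record[tag] = [
                "".join(el.itertext()).strip() for el in event.iter(tag)
            ]
        events.append(record)
    return events


events = extract_events(SAMPLE)
for ev in events:
    print(ev["attributes"], ev["Denoter"], ev["Time"], ev["Location"])
```

Each tag maps to a list because an event may carry several participants or objects. If events are nested in the real files, `event.iter(tag)` will also pick up elements of inner events, so a stricter traversal may be needed.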
Research and Development Funding
- Funding Projects: National Natural Science Foundation projects “Key Issues in Event Reasoning based on Description Logic” (Grant No. 61305053) and “Event Ontology Model and Application Technology” (Grant No. 60975033)
Research Outcomes
- Research Papers: Multiple papers published in Journal of Chinese Information Processing, Pattern Recognition and Artificial Intelligence, etc.
- Doctoral Dissertations: Including studies on event‑oriented knowledge processing and event‑oriented text representation.
- Master’s Theses: Covering intentional event research, extraction and reasoning of temporal event elements, etc.
Corpus Characteristics
- Scale: Slightly smaller than ACE and TimeBank corpora
- Annotation Completeness: Compared with those corpora, provides more comprehensive annotation of events and event elements.