
CEC-Corpus

The Chinese Emergency Corpus (CEC) was built by the Semantic Intelligence Lab at Shanghai University. It contains news reports on five types of emergencies: earthquakes, fires, traffic accidents, terrorist attacks, and food poisoning. The texts were preprocessed, analyzed, and annotated in XML using six tags (Event, Denoter, Time, Location, Participant, and Object) that together describe each event and its elements.

Source: GitHub
Created: Jan 22, 2015
Updated: May 24, 2024

Chinese Emergency Corpus (CEC) Overview

Dataset Construction

  • Construction Institution: Semantic Intelligence Lab, Shanghai University
  • Data Source: Internet news reports
  • Event Categories: Earthquake, fire, traffic accident, terrorist attack, food poisoning (5 categories)
  • Number of Texts: 332 articles

Data Processing

  • Processing Steps: Text preprocessing, text analysis, event annotation, consistency checking
  • Annotation Format: XML
  • Annotation Tags: Event, Denoter, Time, Location, Participant, Object
  • Attribute Definitions: Each tag has its own set of defined attributes
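
As a rough illustration of how the six-tag XML annotation could be consumed, the sketch below parses a small hand-written sample with Python's standard library. The sample text, the wrapper elements (Body, Content, Sentence), and the attribute names (eid, tid, did, lid, sid, oid) are assumptions made for this example and are not taken from the corpus files; check them against the actual data before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: wrapper elements and attribute names are assumed,
# only the six tag names come from the corpus description.
SAMPLE = """<Body>
  <Content>
    <Sentence>
      <Event eid="e1">
        <Time tid="t1">At 3 p.m. on Monday</Time>, a
        <Denoter did="d1">fire</Denoter> broke out at
        <Location lid="l1">a warehouse</Location>, injuring
        <Participant sid="s1">two workers</Participant> and destroying
        <Object oid="o1">stored goods</Object>.
      </Event>
    </Sentence>
  </Content>
</Body>"""

# The five element tags that describe an Event.
ELEMENT_TAGS = ("Denoter", "Time", "Location", "Participant", "Object")


def extract_events(xml_text):
    """Return one dict per Event with its attributes and element texts."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.iter("Event"):
        record = {"attrs": dict(event.attrib)}
        for tag in ELEMENT_TAGS:
            # itertext() joins text across any nested markup inside the element.
            record[tag] = [
                "".join(el.itertext()).strip() for el in event.iter(tag)
            ]
        events.append(record)
    return events


events = extract_events(SAMPLE)
for ev in events:
    print(ev["attrs"], ev["Denoter"], ev["Time"], ev["Location"])
```

Note that `iter` walks all descendants, so if Event elements are nested in the real files, an outer event would also collect the elements of its inner events; a stricter traversal would be needed in that case.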

Research and Development Funding

  • Funding Projects: National Natural Science Foundation of China projects “Key Issues in Event Reasoning based on Description Logic” (Grant No. 61305053) and “Event Ontology Model and Application Technology” (Grant No. 60975033)

Research Outcomes

  • Research Papers: Multiple papers published in Journal of Chinese Information Processing, Pattern Recognition and Artificial Intelligence, and other journals
  • Doctoral Dissertations: Studies on event‑oriented knowledge processing and event‑oriented text representation
  • Master’s Theses: Topics include intentional event research and the extraction and reasoning of temporal event elements

Corpus Characteristics

  • Scale: Slightly smaller than the ACE and TimeBank corpora
  • Annotation Completeness: Compared with ACE and TimeBank, the most comprehensive annotation of events and event elements