Explore high-quality datasets for your AI and machine learning projects.
The CoNLL04 dataset is a benchmark for relation extraction tasks, containing 1,437 sentences, each with at least one relation. Sentences are annotated with entities (e.g., `Peop`, `Loc`, `Org`, `Other`) and relation types (e.g., `Located_In`, `Work_For`, `OrgBased_In`, `Live_In`, `Kill`). The dataset is in English and formatted as JSONL.
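As a rough illustration of working with a JSONL file of this kind, the sketch below parses one JSON object per line and tallies relation types. The field names (`tokens`, `entities`, `relations`, `type`, `head`, `tail`, `start`, `end`) and the two sample records are assumptions made up for this example, not the dataset's documented schema; check the actual files before relying on them.

```python
import json
from collections import Counter

# Hypothetical two-record sample. The keys used here are assumptions for
# illustration only; the real CoNLL04 JSONL files may use different names.
SAMPLE_JSONL = """\
{"tokens": ["John", "works", "for", "Acme", "."], "entities": [{"type": "Peop", "start": 0, "end": 1}, {"type": "Org", "start": 3, "end": 4}], "relations": [{"type": "Work_For", "head": 0, "tail": 1}]}
{"tokens": ["Acme", "is", "based", "in", "Paris", "."], "entities": [{"type": "Org", "start": 0, "end": 1}, {"type": "Loc", "start": 4, "end": 5}], "relations": [{"type": "OrgBased_In", "head": 0, "tail": 1}]}
"""


def read_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_relation_types(records):
    """Count how often each relation type occurs across all records."""
    counts = Counter()
    for record in records:
        for relation in record.get("relations", []):
            counts[relation["type"]] += 1
    return counts


records = read_jsonl(SAMPLE_JSONL)
relation_counts = count_relation_types(records)
```

For a file on disk, the same `read_jsonl` helper could be fed the result of reading the file as text.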
Re-DocRED-CF is a counterfactual dataset for document-level relation extraction, generated by entity replacement. It contains five counterfactual variants, each with training, development, and test splits, plus a mixed training set. Each example includes the document title, relation labels, entity vertex sets, tokenized sentences, and an original document ID that gives the document's index in the seed dataset.
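The fields listed above can be pictured with a small sketch that turns relation labels into readable triples. It assumes DocRED-style keys (`title`, `sents`, `vertexSet`, `labels` with `h`, `t`, `r`, `evidence`); the key `original_doc_id` and the example record itself are hypothetical, so the real files may differ.

```python
# A single hand-written example, assuming DocRED-style keys.
example = {
    "title": "Example Document",
    "original_doc_id": 0,  # hypothetical key name for the seed-dataset index
    "sents": [["Alice", "was", "born", "in", "Paris", "."]],
    # One inner list per entity; each entry is one mention of that entity.
    "vertexSet": [
        [{"name": "Alice", "sent_id": 0, "pos": [0, 1], "type": "PER"}],
        [{"name": "Paris", "sent_id": 0, "pos": [4, 5], "type": "LOC"}],
    ],
    # "h" and "t" index into vertexSet; "r" is the relation identifier.
    "labels": [{"h": 0, "t": 1, "r": "P19", "evidence": [0]}],
}


def mention_text(example, mention):
    """Recover a mention's surface text from the tokenized sentences."""
    start, end = mention["pos"]
    return " ".join(example["sents"][mention["sent_id"]][start:end])


def relation_triples(example):
    """Return (head, relation, tail) triples using each entity's first mention."""
    triples = []
    for label in example["labels"]:
        head = example["vertexSet"][label["h"]][0]
        tail = example["vertexSet"][label["t"]][0]
        triples.append(
            (mention_text(example, head), label["r"], mention_text(example, tail))
        )
    return triples


triples = relation_triples(example)
```

Using only the first mention of each entity keeps the sketch short; an entity replaced in a counterfactual variant would typically change every mention in its vertex set.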