The DWIE (Deutsche Welle Information Extraction) corpus is a dataset for document-level, multi-task information extraction (IE). It combines four main IE subtasks: named entity recognition, coreference resolution, relation extraction, and entity linking. Its annotations include typed entities, relations between those entities, and links from entities to Wikipedia. The corpus is in English and can also be used for feature extraction and text classification tasks.
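To illustrate how the four subtasks can share a single document-level annotation, here is a minimal sketch in Python. The schema (the `Mention`, `Concept`, `Relation`, and `Document` classes, their field names, the `"visited"` relation label, and the example sentence) is hypothetical and only loosely modeled on an entity-centric layout. It is not the official DWIE file format, so consult the dataset's own documentation before parsing real files. Mentions carry character offsets (NER spans), mentions sharing a concept id form a coreference cluster, relations connect concepts rather than individual mentions, and each concept may carry a Wikipedia link.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema for illustration only; NOT the official DWIE format.

@dataclass
class Mention:
    begin: int    # character offset where the mention starts
    end: int      # character offset just past the mention
    text: str     # surface form of the mention
    concept: int  # entity (cluster) id: shared ids encode coreference

@dataclass
class Concept:
    concept: int                # entity id referenced by mentions and relations
    tags: list[str]             # entity types (named entity recognition)
    link: Optional[str] = None  # Wikipedia title (entity linking), if any

@dataclass
class Relation:
    s: int  # subject entity id
    p: str  # relation label (hypothetical label set)
    o: int  # object entity id

@dataclass
class Document:
    content: str
    mentions: list[Mention] = field(default_factory=list)
    concepts: list[Concept] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

def check_offsets(doc: Document) -> bool:
    """Return True if every mention's offsets slice out its surface text."""
    return all(doc.content[m.begin:m.end] == m.text for m in doc.mentions)

def coreference_clusters(doc: Document) -> dict[int, list[str]]:
    """Group mention texts by entity id (coreference resolution view)."""
    clusters: dict[int, list[str]] = defaultdict(list)
    for m in doc.mentions:
        clusters[m.concept].append(m.text)
    return dict(clusters)

def linked_triples(doc: Document) -> list[tuple[Optional[str], str, Optional[str]]]:
    """Express each relation with Wikipedia links in place of entity ids."""
    links = {c.concept: c.link for c in doc.concepts}
    return [(links.get(r.s), r.p, links.get(r.o)) for r in doc.relations]

# Example document: two mentions of one person entity, one location entity.
doc = Document(
    content="Angela Merkel visited Paris. Merkel met the French president there.",
    mentions=[
        Mention(0, 13, "Angela Merkel", concept=0),
        Mention(22, 27, "Paris", concept=1),
        Mention(29, 35, "Merkel", concept=0),
    ],
    concepts=[
        Concept(0, tags=["per"], link="Angela_Merkel"),
        Concept(1, tags=["loc"], link="Paris"),
    ],
    relations=[Relation(s=0, p="visited", o=1)],
)

if __name__ == "__main__":
    print(check_offsets(doc))         # True
    print(coreference_clusters(doc))  # {0: ['Angela Merkel', 'Merkel'], 1: ['Paris']}
    print(linked_triples(doc))        # [('Angela_Merkel', 'visited', 'Paris')]
```

Attaching relations and links to entity ids rather than to individual mentions is what makes the annotation document-level: a relation stated once applies to every coreferent mention of its arguments.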