leondz/wnut_17
The WNUT 17 dataset is a named entity recognition (NER) dataset focusing on identifying novel and rare entities in noisy text. It includes training (3,394 samples), validation (1,009 samples), and test (1,287 samples) sets. Each sample contains an ID, token list, and IOB2‑formatted NER labels covering entities such as companies, creative works, groups, locations, persons, and products. The dataset was created to provide definitions for emerging and rare entities and to support detection of such entities.
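The split sizes quoted above can be checked after loading. The sketch below assumes the Hugging Face `datasets` package is installed and that the repository id from this card's title (`leondz/wnut_17`) resolves on the Hub. The helper names `split_size_mismatches` and `load_wnut17` are hypothetical and are not part of any library.

```python
# Split sizes as stated on this card.
EXPECTED_SPLIT_SIZES = {"train": 3394, "validation": 1009, "test": 1287}


def split_size_mismatches(actual_sizes):
    """Return {split: (expected, actual)} for every split whose size differs."""
    mismatches = {}
    for split, expected in EXPECTED_SPLIT_SIZES.items():
        actual = actual_sizes.get(split)  # None if the split is missing
        if actual != expected:
            mismatches[split] = (expected, actual)
    return mismatches


def load_wnut17(repo_id="leondz/wnut_17"):
    """Load the dataset from the Hugging Face Hub (needs network access).

    Assumption: the `datasets` package is installed and `repo_id` resolves.
    """
    # Imported lazily so the offline helper above works without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id)


# Example usage (requires network access, so it is left commented out):
# dataset = load_wnut17()
# sizes = {split: dataset[split].num_rows for split in dataset}
# print(split_size_mismatches(sizes))
```

An empty result from `split_size_mismatches` would mean the downloaded copy matches the counts listed on this card.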
Description
Dataset Overview
Dataset Name
- Name: WNUT 17
- Alias: wnut_17
Dataset Description
- Task: Emerging and rare entity recognition
- Language: English (en)
- License: CC‑BY‑4.0
- Source: Original data
- Data Type: Monolingual
- Scale: 1K < n < 10K
- Task Category: Token Classification
- Task ID: Named Entity Recognition
Dataset Structure
- Features:
  - id: string, example identifier
  - tokens: list of strings, example text tokens
  - ner_tags: list of labels, IOB2-formatted NER tags
- Splits:
  - train: 3,394 examples
  - validation: 1,009 examples
  - test: 1,287 examples
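Because the NER labels use the IOB2 scheme, downstream code usually needs to turn a tag sequence into entity spans. The following is a minimal sketch of that conversion, not code from the dataset's authors. It assumes the tags are already strings such as `B-location`. If the loaded data stores integer class ids instead, they would have to be mapped to strings first. The function name `iob2_to_spans` and the example tags are illustrative.

```python
def iob2_to_spans(tags):
    """Convert IOB2 tags to (entity_type, start, end) spans; end is exclusive."""
    spans = []
    start = None
    current_type = None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, entity_type = "O", None
        else:
            # Split on the first hyphen only, so "B-creative-work" keeps its type intact.
            prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if not continues:
            # Close any open span before deciding whether a new one starts here.
            if current_type is not None:
                spans.append((current_type, start, i))
            if prefix in ("B", "I"):
                # Leniency assumption: a stray I- tag with no matching B- starts a span.
                start, current_type = i, entity_type
            else:
                start, current_type = None, None
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hypothetical example: tokens 0-2 and 4-5 form two separate location entities.
example_tags = ["B-location", "I-location", "I-location", "O", "B-location", "I-location"]
example_spans = iob2_to_spans(example_tags)
```

Returning token offsets rather than joined strings keeps the function independent of how the tokens are later detokenized.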
Annotation
- Annotators: Crowd‑sourced
- Language Creators: Found
Usage Notes
- Citation:
@inproceedings{derczynski-etal-2017-results,
    title = "Results of the {WNUT}2017 Shared Task on Novel and Emerging Entity Recognition",
    author = "Derczynski, Leon and Nichols, Eric and van Erp, Marieke and Limsopatham, Nut",
    booktitle = "Proceedings of the 3rd Workshop on Noisy User-generated Text",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W17-4418",
    doi = "10.18653/v1/W17-4418",
    pages = "140--147",
    abstract = "This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. ..."
}
Source
Organization: hugging_face
Created: Unknown