DATASET
Open Source Community
tner/mit_movie_trivia
MIT Movie is a named entity recognition (NER) dataset for the movie domain, distributed as part of the T-NER project. It annotates 12 entity types: Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, and Quote. The data is split into training (6,816 instances), validation (1,000 instances), and test (1,953 instances) sets.
Updated 7/18/2022
hugging_face
Description
Dataset Overview
Basic Information
- Name: MIT Movie
- Domain: Movies
- Number of Entity Types: 12
- Language: English
- License: Other
- Multilinguality: Monolingual
- Size: 1K < n < 10K
- Task Category: Token Classification
- Task ID: Named Entity Recognition
Dataset Structure
Data Instances
- Example:
{ "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4], "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff", "called", "devil", "s", "tower", "and", "a", "spectacular", "mothership"] }
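To make the instance format concrete, the following sketch groups the parallel `tokens` and `tags` lists into entity spans. It assumes the tags follow a BIO scheme and uses a partial ID-to-label mapping (`ID2LABEL`) that covers only the IDs appearing in the example above. Both the mapping and the helper name `decode_spans` are illustrative assumptions, not part of the dataset; the authoritative mapping is the dataset's own label file.

```python
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: partial ID-to-label mapping, covering only the IDs used in the
# example instance. Check it against the dataset's label file before relying on it.
ID2LABEL: Dict[int, str] = {
    0: "O",
    3: "B-Plot",
    4: "I-Plot",
    13: "B-Director",
    14: "I-Director",
}

example = {
    "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4],
    "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff",
               "called", "devil", "s", "tower", "and", "a", "spectacular",
               "mothership"],
}


def decode_spans(tokens: List[str], tags: List[int],
                 id2label: Dict[int, str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        label = id2label[tag]
        if label == "O":
            prefix, ent = "O", None
        else:
            prefix, ent = label.split("-", 1)

        # A span starts on B-, or on an I- that does not continue the open span.
        starts_new = prefix == "B" or (prefix == "I" and ent != current_type)

        # Close the open span when we leave it or when a new one begins.
        if current_type is not None and (prefix == "O" or starts_new):
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

        if prefix == "O":
            continue
        if starts_new:
            current_type, current_tokens = ent, [token]
        else:
            current_tokens.append(token)

    # Flush a span that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


spans = decode_spans(example["tokens"], example["tags"], ID2LABEL)
for entity_type, text in spans:
    print(f"{entity_type}: {text}")
```

Under the assumed mapping, the example yields one Director span ("steven spielberg") and one long Plot span that runs to the end of the sentence.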
Label IDs
- Label Mapping: see here
Data Splits
| Name | Train | Validation | Test |
|---|---|---|---|
| mit_movie_trivia | 6816 | 1000 | 1953 |
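As a quick sanity check on the split table, the sketch below totals the three splits and computes each split's share of the data. The counts are copied from the table; the variable names are illustrative.

```python
# Split sizes copied from the table above.
SPLITS = {"train": 6816, "validation": 1000, "test": 1953}

total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

for name, count in SPLITS.items():
    print(f"{name:<10} {count:>5}  {shares[name]:.1%}")
print(f"{'total':<10} {total:>5}")

# The total should fall inside the size bucket listed under Basic Information.
assert 1_000 < total < 10_000
```

The splits sum to 9,769 instances, which is consistent with the listed size category (1K < n < 10K) and corresponds to roughly a 70/10/20 train/validation/test division.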
Entity Types
Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, Quote
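For token classification, each entity type is usually expanded into begin and inside labels. The sketch below enumerates the label names that result if the dataset uses a BIO scheme, which is an assumption here (the example instance is consistent with it, but this page does not state it). The order produced is not the dataset's ID assignment; use the dataset's label mapping for actual IDs.

```python
from typing import List

# Entity types as listed above.
ENTITY_TYPES: List[str] = [
    "Actor", "Plot", "Opinion", "Award", "Year", "Genre", "Origin",
    "Director", "Soundtrack", "Relationship", "Character_Name", "Quote",
]


def bio_labels(entity_types: List[str]) -> List[str]:
    """Return the BIO label names for the given entity types.

    ASSUMPTION: BIO tagging with a single outside label "O". The list order
    is illustrative only and does not reproduce the dataset's label IDs.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = bio_labels(ENTITY_TYPES)
print(len(labels))
```

With 12 entity types, a BIO scheme gives 2 × 12 + 1 = 25 labels, which is the size a classification head would need under this assumption.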
Topics
Natural Language Processing
Entity Recognition
Source
Organization: hugging_face
Created: Unknown