Dataset asset, Open Source Community, Natural Language Processing, Entity Recognition
tner/mit_movie_trivia
The MIT Movie NER dataset is part of the T‑NER project and is designed for named entity recognition in the movie domain. It covers 12 entity types: Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, and Quote. The dataset is split into training (6,816 instances), validation (1,000 instances), and test (1,953 instances) sets.
Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 18, 2022
Dataset Overview
Basic Information
- Name: MIT Movie
- Domain: Movies
- Number of Entity Types: 12
- Language: English
- License: Other
- Multilinguality: Monolingual
- Size: 1K < n < 10K
- Task Category: Token Classification
- Task ID: Named Entity Recognition
Dataset Structure
Data Instances
- Example:
{ "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4], "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff", "called", "devil", "s", "tower", "and", "a", "spectacular", "mothership"] }
Label IDs
- Label Mapping: see here
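To make the example instance easier to read, the sketch below groups tagged tokens into entity spans. It assumes the tags follow a BIO scheme and uses a partial, illustrative ID-to-label mapping (`ASSUMED_ID2LABEL`) that covers only the IDs appearing in the example. That mapping and the helper name `extract_spans` are assumptions, not part of the dataset, and should be checked against the official label mapping before use.

```python
from typing import Dict, List, Tuple

# Example instance copied from the Data Instances section.
example = {
    "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4],
    "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff",
               "called", "devil", "s", "tower", "and", "a", "spectacular",
               "mothership"],
}

# ASSUMPTION: partial BIO mapping for illustration only. Replace it with the
# full mapping from the dataset's label file.
ASSUMED_ID2LABEL: Dict[int, str] = {
    0: "O",
    3: "B-Plot",
    4: "I-Plot",
    13: "B-Director",
    14: "I-Director",
}


def extract_spans(tokens: List[str], tags: List[int],
                  id2label: Dict[int, str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        label = id2label[tag]
        prefix, _, entity_type = label.partition("-")

        # An I- tag of the same type extends the span that is currently open.
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue

        # Anything else closes the open span, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

        if label == "O":
            current_type, current_tokens = None, []
        else:
            # A B- tag (or a stray I- tag) starts a new span.
            current_type, current_tokens = entity_type, [token]

    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


spans = extract_spans(example["tokens"], example["tags"], ASSUMED_ID2LABEL)
for entity_type, text in spans:
    print(f"{entity_type}: {text}")
```

Under the assumed mapping, the example yields one Director span and one long Plot span, which shows that trivia-style Plot entities in this dataset can run across many tokens.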
Data Splits
| Name | Train | Validation | Test |
|---|---|---|---|
| mit_movie_trivia | 6816 | 1000 | 1953 |
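As a quick sanity check on the split table, the following sketch totals the reported split sizes and computes each split's share. The helper name `split_shares` is hypothetical, and the numbers are taken directly from the table above rather than from a loaded copy of the data.

```python
from typing import Dict

# Split sizes as reported in the Data Splits table.
REPORTED_SPLITS: Dict[str, int] = {
    "train": 6816,
    "validation": 1000,
    "test": 1953,
}


def split_shares(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's fraction of the total number of instances."""
    total = sum(sizes.values())
    if total == 0:
        raise ValueError("split sizes must not all be zero")
    return {name: count / total for name, count in sizes.items()}


total = sum(REPORTED_SPLITS.values())
shares = split_shares(REPORTED_SPLITS)

print(f"total instances: {total}")
for name, share in shares.items():
    print(f"{name}: {REPORTED_SPLITS[name]} ({share:.1%})")
```

The total comes to 9,769 instances, consistent with the listed size bucket of 1K < n < 10K, and the splits are roughly 70% train, 10% validation, and 20% test.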
Entity Types
Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, Quote