
tner/mit_movie_trivia

The MIT Movie NER dataset is distributed as part of the T-NER project and targets named entity recognition in the movie domain. It covers 12 entity types: Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, and Quote. The data is split into training (6,816 instances), validation (1,000 instances), and test (1,953 instances) sets.

Source

hugging_face

Created

Nov 28, 2025

Updated

Jul 18, 2022

Dataset Overview

Basic Information

Name: MIT Movie
Domain: Movies
Number of Entity Types: 12
Language: English
License: Other
Multilinguality: Monolingual
Size: 1K < n < 10K
Task Category: Token Classification
Task ID: Named Entity Recognition

Dataset Structure

Data Instances

Example:

{
    "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4],
    "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff", "called", "devil", "s", "tower", "and", "a", "spectacular", "mothership"]
}
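
The record above stores one integer tag per token. As a minimal sketch of how such a record can be read, the snippet below groups tagged tokens into entity spans. It assumes a BIO-style scheme, and the partial mapping `ASSUMED_ID2LABEL` is a guess inferred from this single example (IDs 13/14 covering the director name, 3/4 covering the plot phrase); it is not the dataset's published label mapping, which should be used instead. The helper name `decode_spans` is likewise hypothetical.

```python
# One record, copied from the example above.
example = {
    "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4],
    "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff",
               "called", "devil", "s", "tower", "and", "a", "spectacular",
               "mothership"],
}

# ASSUMPTION: partial ID-to-label mapping guessed from this one example.
# Replace it with the dataset's published label mapping before real use.
ASSUMED_ID2LABEL = {
    0: "O",
    3: "B-Plot",
    4: "I-Plot",
    13: "B-Director",
    14: "I-Director",
}


def decode_spans(tokens, tags, id2label):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type = None
    current_tokens = []
    for token, tag in zip(tokens, tags):
        label = id2label[tag]
        if label == "O":
            # Outside any entity: close the open span, if there is one.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
            continue
        prefix, entity = label.split("-", 1)
        if prefix == "B" or entity != current_type:
            # A new entity starts here: close the previous span first.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = entity, [token]
        else:
            # Continuation (I- tag) of the currently open entity.
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


spans = decode_spans(example["tokens"], example["tags"], ASSUMED_ID2LABEL)
for entity_type, text in spans:
    print(f"{entity_type}: {text}")
```

Under the assumed mapping, the example yields two spans: a Director span for "steven spielberg" and a Plot span for the phrase starting at "bluff".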

Label IDs

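Label Mapping: the integer tags map to label names through the label-to-ID file published with the dataset on Hugging Face.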

Data Splits

Name	Train	Validation	Test
mit_movie_trivia	6816	1000	1953
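
As a quick sanity check on the split sizes above, the following sketch totals the three splits and computes each split's share of the data. The variable names are illustrative only; the counts are taken directly from the table.

```python
# Split sizes as listed in the table above.
split_sizes = {"train": 6816, "validation": 1000, "test": 1953}

total = sum(split_sizes.values())
shares = {name: size / total for name, size in split_sizes.items()}

# The total should fall inside the stated size bucket "1K < n < 10K".
assert 1_000 < total < 10_000

for name, size in split_sizes.items():
    print(f"{name:<10} {size:>5}  {shares[name]:.1%}")
print(f"{'total':<10} {total:>5}")
```

The total comes to 9,769 instances, which is consistent with the "1K < n < 10K" size bucket listed under Basic Information.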

Entity Types

Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, Quote
