Dataset asset: Open Source Community, Natural Language Processing, Entity Recognition

tner/mit_movie_trivia

The MIT Movie NER dataset is part of the T-NER project and is designed for named entity recognition in the movie domain. It covers 12 entity types: Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, and Quote. The data is split into training (6,816 instances), validation (1,000 instances), and test (1,953 instances) sets.

Source: hugging_face
Created: Nov 28, 2025
Updated: Jul 18, 2022
Dataset Overview

Basic Information

  • Name: MIT Movie
  • Domain: Movies
  • Number of Entity Types: 12
  • Language: English
  • License: Other
  • Multilinguality: Monolingual
  • Size: 1K < n < 10K
  • Task Category: Token Classification
  • Task ID: Named Entity Recognition

Dataset Structure

Data Instances

  • Example:
    {
        "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4],
        "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff", "called", "devil", "s", "tower", "and", "a", "spectacular", "mothership"]
    }
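
The instance above pairs each token with an integer tag. A minimal sketch of turning such a pair back into entity spans follows. It assumes a BIO tagging scheme and uses a partial ID-to-label map inferred only from this one example (0 as outside, 13/14 as Director, 3/4 as Plot); the names `ID2LABEL` and `decode_entities` are hypothetical, and the published label mapping should be treated as the authoritative source.

```python
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: partial mapping inferred from the single example instance.
# The dataset's published label mapping covers all 12 entity types.
ID2LABEL: Dict[int, str] = {
    0: "O",
    3: "B-Plot",
    4: "I-Plot",
    13: "B-Director",
    14: "I-Director",
}


def decode_entities(
    tokens: List[str], tags: List[int], id2label: Dict[int, str]
) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, tag_id in zip(tokens, tags):
        label = id2label[tag_id]
        if label == "O":
            prefix, ent_type = "O", None
        else:
            prefix, ent_type = label.split("-", 1)

        # An I- tag of the same type extends the open span.
        if prefix == "I" and ent_type == current_type:
            current_tokens.append(token)
            continue

        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

        if prefix == "O":
            current_type, current_tokens = None, []
        else:
            # A B- tag (or a stray I- tag) starts a new span.
            current_type, current_tokens = ent_type, [token]

    # Flush a span that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


example = {
    "tags": [0, 13, 14, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 4, 4],
    "tokens": ["a", "steven", "spielberg", "film", "featuring", "a", "bluff",
               "called", "devil", "s", "tower", "and", "a", "spectacular",
               "mothership"],
}

entities = decode_entities(example["tokens"], example["tags"], ID2LABEL)
for ent_type, text in entities:
    print(f"{ent_type}: {text}")
```

Under these assumptions the example yields one Director span ("steven spielberg") and one Plot span covering the rest of the description from "bluff" onward.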
    

Label IDs

  • Label Mapping: see the label-to-ID mapping published with the dataset on Hugging Face

Data Splits

Name               Train    Validation    Test
mit_movie_trivia   6,816    1,000         1,953
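
As a quick check on the split sizes, the short sketch below computes the total number of instances and each split's share. The counts are the ones listed above; the variable names are illustrative only.

```python
# Split sizes as listed for mit_movie_trivia.
splits = {"train": 6816, "validation": 1000, "test": 1953}

total = sum(splits.values())

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(f"total instances: {total}")
for name, pct in shares.items():
    print(f"{name}: {splits[name]} ({pct}%)")
```

The total of 9,769 instances is consistent with the stated size bucket of 1K < n < 10K, and the split is roughly 70/10/20.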

Entity Types

  • Actor, Plot, Opinion, Award, Year, Genre, Origin, Director, Soundtrack, Relationship, Character_Name, Quote
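
For token classification, entity types are usually expanded into per-token labels. The sketch below assumes a BIO scheme (a `B-` and an `I-` label per type, plus `O`), which the example instance appears to follow; the helper name `bio_labels` is hypothetical, and the list order here is not the dataset's ID order, so integer IDs should still be taken from the published label mapping.

```python
from typing import List

# The 12 entity types listed for the dataset.
ENTITY_TYPES: List[str] = [
    "Actor", "Plot", "Opinion", "Award", "Year", "Genre", "Origin",
    "Director", "Soundtrack", "Relationship", "Character_Name", "Quote",
]


def bio_labels(entity_types: List[str]) -> List[str]:
    """Expand entity types into BIO labels (ASSUMPTION: BIO scheme)."""
    labels = ["O"]
    for ent_type in entity_types:
        labels.append(f"B-{ent_type}")
        labels.append(f"I-{ent_type}")
    return labels


labels = bio_labels(ENTITY_TYPES)
print(len(labels))  # 2 labels per type, plus "O"
```

Under this assumption a classifier head for the dataset would need 25 output classes.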