imdb-5000-movie-dataset

This dataset contains more than 5,000 randomly selected movie records from IMDB, each with 28 attributes.

Source
GitHub
Created
Dec 31, 2016
Updated
Jun 23, 2023

Dataset Overview

Dataset Name

  • Name: imdb-5000-movie-dataset
  • Source: Kaggle

Dataset Content

  • Record Count: Over 5,000
  • Attribute Count: 28
  • File Format: CSV
  • File Name: movie_metadata.csv
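
A minimal sketch of reading the CSV with Python's standard csv module. The column names come from the analysis notes on this page; the two sample rows and the load_movies helper are invented for illustration and are not taken from movie_metadata.csv.

```python
import csv
import io


def load_movies(handle):
    """Read movie rows from an open CSV handle.

    Returns (rows, columns): a list of dicts keyed by column name, and the
    header names in file order. All values are kept as strings.
    """
    reader = csv.DictReader(handle)
    rows = list(reader)
    return rows, reader.fieldnames


# Hypothetical sample standing in for movie_metadata.csv (placeholder values).
SAMPLE_CSV = """director_name,genres,title_year,imdb_score
Director A,Action|Adventure,2009,7.9
Director B,Drama,,6.5
"""

rows, columns = load_movies(io.StringIO(SAMPLE_CSV))
print(columns)
print(len(rows), "rows loaded")

# With the real file, the same helper would be called like this:
# with open("movie_metadata.csv", newline="", encoding="utf-8") as f:
#     rows, columns = load_movies(f)
```

Values stay as strings here, so blank cells such as the missing title_year in the second sample row remain visible and can be handled explicitly during cleaning.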

Data Processing

  • Cleaning: The dataset was cleaned for analysis and visualization purposes.
  • Analysis:
    • linechart.py: Cleaned and analyzed director_name, genres, title_year, and imdb_score, and counted the movies released between 1916 and 2016.
    • histogram.py: Cleaned and analyzed title_year, num_critic_for_reviews, num_user_for_reviews, and director_facebook_likes, and counted the frequency of review counts and director Facebook likes.
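
The counting step described for linechart.py might look roughly like the sketch below. It is not the original script: the per-year grouping, the count_movies_per_year helper, and the rule of skipping blank or non-numeric title_year values are all assumptions.

```python
from collections import Counter


def count_movies_per_year(rows, first_year=1916, last_year=2016):
    """Count rows per title_year, keeping only years in [first_year, last_year].

    Rows with a blank or non-numeric title_year are skipped (an assumed
    cleaning rule, not one confirmed by the source).
    """
    counts = Counter()
    for row in rows:
        raw = (row.get("title_year") or "").strip()
        if not raw:
            continue
        try:
            year = int(float(raw))  # tolerates values such as "2009.0"
        except (ValueError, OverflowError):
            continue
        if first_year <= year <= last_year:
            counts[year] += 1
    return dict(sorted(counts.items()))


# Hypothetical rows shaped like csv.DictReader output.
sample_rows = [
    {"title_year": "2009"},
    {"title_year": "2009.0"},
    {"title_year": ""},
    {"title_year": "1916"},
    {"title_year": "2017"},
    {"title_year": "n/a"},
]

per_year = count_movies_per_year(sample_rows)
print(per_year)  # {1916: 1, 2009: 2}
```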

Visualization

  • Tool: matplotlib.pyplot
  • Output Files:
    • linechart.py:
      • linechart.png
      • linechart1.png
      • linechart2.png
      • linechart3.png
      • linechart4.png
    • histogram.py:
      • histogram.png
      • histogram1.png
      • histogram2.png
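
As a rough illustration of how output files like these can be produced with matplotlib.pyplot, the sketch below saves one line chart and one histogram as PNG files. The helper names, figure sizes, axis labels, and sample numbers are assumptions for the example and do not reproduce the original scripts or their data.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def save_line_chart(counts_by_year, path):
    """Plot movie counts against year and save the figure to path."""
    years = sorted(counts_by_year)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(years, [counts_by_year[year] for year in years])
    ax.set_xlabel("title_year")
    ax.set_ylabel("Number of movies")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


def save_histogram(values, path, bins=20):
    """Plot the frequency distribution of values and save it to path."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.hist(values, bins=bins)
    ax.set_xlabel("num_critic_for_reviews")
    ax.set_ylabel("Frequency")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Placeholder data; real inputs would come from the cleaned CSV columns.
out_dir = tempfile.mkdtemp()
line_path = os.path.join(out_dir, "linechart.png")
hist_path = os.path.join(out_dir, "histogram.png")
save_line_chart({1916: 1, 1980: 14, 2009: 42}, line_path)
save_histogram([12, 40, 40, 95, 130, 210, 300], hist_path, bins=5)
print("Saved", line_path, "and", hist_path)
```

Closing each figure after saving keeps memory use flat when a script writes several charts in a row.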