Dataset asset | Open Source Community | Face Recognition | Image Dataset

IMDb-Face, Megaface

The IMDb‑Face dataset is used for face recognition and contains facial images gathered from IMDb. The Megaface dataset is a large‑scale face recognition benchmark comprising multiple subsets for various recognition tasks.

Source
github
Created
Nov 1, 2018
Updated
Jan 4, 2024
Overview

Dataset description and usage context

IMDb‑Face Dataset

  • Location: https://github.com/fwang91/IMDb-Face (IMDb-Face.csv)
  • Run Command:
    • Download the IMDb-Face.csv file.
    • Run python imdb_crawl.py, which supports multi‑process execution.
    • Parameters:
      • -c: whether to crop images.
      • -d: whether to delete the existing data directory.
    • When images are saved uncropped, their bounding boxes are also recorded in a bb.txt file.
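
The bounding-box step above can be illustrated with a short sketch. This is not the code in imdb_crawl.py: the column names (image, rect, url), the "x1 y1 x2 y2" rect layout, the one-line-per-image bb.txt format, and the helper names parse_rect and bb_lines are all assumptions made for illustration, and the sample row is made up.

```python
import csv
import io

# Hypothetical sample in the assumed column layout; not a real dataset row.
SAMPLE_CSV = """name,index,image,rect,height width,url
Example_Person,nm0000000,example_0001.jpg,10 20 110 140,300 400,http://example.com/a.jpg
"""


def parse_rect(rect):
    """Turn an 'x1 y1 x2 y2' string into a tuple of four ints."""
    x1, y1, x2, y2 = (int(v) for v in rect.split())
    return x1, y1, x2, y2


def bb_lines(csv_text):
    """Build one 'image x1 y1 x2 y2' line per CSV row (assumed bb.txt layout)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        box = parse_rect(row["rect"])
        lines.append(" ".join([row["image"], *map(str, box)]))
    return lines


# Show the lines that would be written when images are kept uncropped.
print("\n".join(bb_lines(SAMPLE_CSV)))
```

Keeping the boxes in a side file means the uncropped images can still be cropped or aligned later without going back to the CSV.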

Megaface Dataset

  • Download:

  • Structure:

    MEGAFACE -- distractors -- parent id -- ids -- images
             |                                  |- json file for each image
             |
             |- facescrub -- ids -- images, bb.txt
                          |- bb.txt

  • Pre‑processing:

    • Apply face detection/alignment models.
  • Generate bin Files:

    • Use gen_megaface.py with a trained face‑recognition model to generate bin files for distractor/facescrub images.
    • Parameters include paths for distractor and facescrub images, noise lists, output directories, checkpoint model, and file endings.
  • Run Megaface Devkit:

  • Notes:

    • The devkit binaries (bin/Identification, bin/FuseResults) require OpenCV 2.4 to run.
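
The bin files mentioned in the list above hold one feature vector per image. A minimal sketch of one layout often used for them: four int32 header values (rows, cols, stride, OpenCV type code) followed by the float32 features. Whether gen_megaface.py in this repository writes exactly this layout is an assumption, and write_bin and read_bin are illustrative names.

```python
import os
import struct
import tempfile

CV_32F = 5  # OpenCV type code for a single-channel 32-bit float matrix


def write_bin(path, feature):
    """Write a 1-D feature vector as a 16-byte header plus float32 values."""
    feature = [float(v) for v in feature]
    with open(path, "wb") as f:
        # Header: rows, cols, bytes per element, OpenCV type code.
        f.write(struct.pack("4i", len(feature), 1, 4, CV_32F))
        f.write(struct.pack("%df" % len(feature), *feature))


def read_bin(path):
    """Read a file written by write_bin; return (header, values)."""
    with open(path, "rb") as f:
        rows, cols, stride, cv_type = struct.unpack("4i", f.read(16))
        count = rows * cols
        values = struct.unpack("%df" % count, f.read(4 * count))
    return (rows, cols, stride, cv_type), list(values)


# Round-trip a toy 3-dimensional "embedding" through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    write_bin(demo_path, [0.5, -1.0, 2.0])
    print(read_bin(demo_path))
```

A reader function like this is handy for checking the generated files before handing them to the devkit binaries.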