Dataset asset, Open Source Community, Face Recognition, Image Dataset
IMDb-Face, Megaface
The IMDb‑Face dataset is used for face recognition and contains facial images gathered from IMDb. The Megaface dataset is a large‑scale face recognition benchmark comprising multiple subsets for various recognition tasks.
Source
GitHub
Created
Nov 1, 2018
Updated
Jan 4, 2024
Overview
Dataset description and usage context
IMDb‑Face Dataset
- Location: https://github.com/fwang91/IMDb-Face (IMDb-Face.csv)
- Run Command:
  - Download the IMDb-Face.csv file.
  - Execute python imdb_crawl.py, which supports multi-process handling.
  - Parameters:
    - -c: whether to crop images.
    - -d: whether to delete the existing data directory.
  - If uncropped images are saved, the corresponding bounding boxes are also recorded in a bb.txt file.
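The flags described above could be wired up roughly as follows. This is a minimal sketch, not the repository's imdb_crawl.py: treating -c and -d as boolean switches, the long option names, the helper names (prepare_data_dir, bbox_line, crawl), and the one-record-per-line bb.txt layout are all assumptions made for illustration.

```python
import argparse
import shutil
from multiprocessing import Pool
from pathlib import Path


def build_parser():
    # -c and -d mirror the flags described above; treating them as boolean
    # switches (and the long option names) is an assumption.
    parser = argparse.ArgumentParser(description="IMDb-Face crawl sketch")
    parser.add_argument("-c", "--crop", action="store_true",
                        help="crop each image to its face bounding box")
    parser.add_argument("-d", "--delete", action="store_true",
                        help="delete the existing data directory first")
    return parser


def prepare_data_dir(data_dir, delete_existing):
    # Remove any previous download when -d is given, then (re)create the folder.
    path = Path(data_dir)
    if delete_existing and path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)
    return path


def bbox_line(image_name, box):
    # Hypothetical bb.txt record: image name followed by x1 y1 x2 y2.
    x1, y1, x2, y2 = box
    return f"{image_name} {x1} {y1} {x2} {y2}"


def crawl(rows, worker, processes=4):
    # Fan the CSV rows out over worker processes; `worker` must be a
    # top-level function so that it can be pickled.
    with Pool(processes) as pool:
        return pool.map(worker, rows)
```

In this sketch, bbox_line would only be called when -c is absent, matching the note that bounding boxes are recorded for uncropped images.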
Megaface Dataset
- Download:
  - Distractor and probe datasets.
  - Access link: http://megaface.cs.washington.edu/participate/challenge.html
- Structure:
  MEGAFACE -- distractors -- parent id -- ids -- images
           |                                  |- json file for each image
           |
           |- facescrub -- ids -- images, bb.txt
           |- bb.txt
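To illustrate the layout above, the following sketch walks the distractors folder and pairs each image with its JSON file. It is an illustration under assumptions: the function name iter_distractors, the accepted image suffixes, and the sidecar naming rule (the image file name with ".json" appended) are not stated in the source.

```python
from pathlib import Path


def iter_distractors(root, image_suffixes=(".jpg", ".png")):
    # Yields (image_path, json_path_or_None) for every image found under
    # <root>/distractors/<parent id>/<id>/.
    # Assumption: the metadata file sits next to the image and is named
    # "<image file name>.json".
    for image in sorted(Path(root, "distractors").rglob("*")):
        if not image.is_file() or image.suffix.lower() not in image_suffixes:
            continue
        sidecar = image.with_name(image.name + ".json")
        yield image, (sidecar if sidecar.exists() else None)
```

Images that come back with None in the second position have no metadata file and may need to be skipped or re-downloaded before pre-processing.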
- Pre-processing:
  - Apply face detection/alignment models.
- Generate bin Files:
  - Use gen_megaface.py with a trained face-recognition model to generate bin files for the distractor and facescrub images.
  - Parameters include the paths to the distractor and facescrub images, the noise lists, the output directories, the model checkpoint, and the file ending.
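As a rough picture of what a feature bin file might contain, the sketch below writes one feature vector as an OpenCV-style float matrix: four 32-bit integers (rows, columns, bytes per element, type code 5 for CV_32F) followed by the values as 32-bit floats. This layout and the helper names write_bin and read_bin are assumptions, not taken from the source; check them against gen_megaface.py and the devkit before relying on them.

```python
import struct


def write_bin(path, feature):
    # Header: rows, cols, bytes per element, OpenCV type code (5 = CV_32F),
    # followed by the feature values as 32-bit floats.
    feature = [float(v) for v in feature]
    with open(path, "wb") as f:
        f.write(struct.pack("4i", len(feature), 1, 4, 5))
        f.write(struct.pack(f"{len(feature)}f", *feature))


def read_bin(path):
    # Inverse of write_bin: read the header, then rows * cols floats.
    with open(path, "rb") as f:
        rows, cols, _elem_size, _cv_type = struct.unpack("4i", f.read(16))
        count = rows * cols
        return list(struct.unpack(f"{count}f", f.read(4 * count)))
```

Reading a file back with read_bin is a cheap way to confirm that the embedding length matches what the model produces.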
- Run Megaface Devkit:
  - Execute python run_experiment.py in a terminal (requires at least 32 GB RAM).
  - The devkit must be downloaded from http://megaface.cs.washington.edu/participate/challenge.html.
  - Parameters include the paths to the distractor and probe feature files, the file ending, and size settings.
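Before launching the devkit, a quick pre-flight check along the following lines may save a failed run. This is only a sketch: the function names, the reading of 32 GB as 32 GiB, and the POSIX-only memory query are assumptions, and it does not invoke run_experiment.py itself.

```python
import os
from pathlib import Path

REQUIRED_GIB = 32  # minimum RAM stated above, read here as GiB


def total_ram_gib():
    # POSIX-only: physical pages times page size, converted to GiB.
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 2**30


def count_feature_files(root, file_ending):
    # Number of files under `root` whose names end with `file_ending`.
    return sum(1 for p in Path(root).rglob("*")
               if p.is_file() and p.name.endswith(file_ending))


def preflight(distractor_dir, probe_dir, file_ending, ram_gib):
    # Collect human-readable problems instead of failing on the first one.
    problems = []
    if ram_gib < REQUIRED_GIB:
        problems.append(f"only {ram_gib:.1f} GiB RAM, need {REQUIRED_GIB}")
    for label, root in (("distractor", distractor_dir), ("probe", probe_dir)):
        if count_feature_files(root, file_ending) == 0:
            problems.append(f"no {label} feature files ending in {file_ending!r}")
    return problems
```

A typical call would pass total_ram_gib() as ram_gib together with the same feature paths and file ending that will be given to the devkit; an empty list means no problem was detected.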
- Notes:
  - Binary files (bin/Identification, bin/FuseResults) can only be run with OpenCV 2.4.