Danbooru2018 Anime Character Recognition Dataset

This dataset is derived from Danbooru2018 and targets anime character recognition. After processing, it contains about 1 million head images (0.97M) across roughly 70,000 character labels. The label distribution is long-tailed, with an average of 13.85 images per label.

Updated 5/18/2024

Description

Danbooru 2018 Anime Character Recognition Dataset Overview

Dataset Description

Dataset Name: Danbooru 2018 Anime Character Recognition Dataset
Dataset Source: Based on the Danbooru 2018 dataset.
Dataset Content: Contains roughly 1,000,000 head images (0.97M after filtering) labeled with about 70,000 characters.
Dataset Purpose: Used for training and evaluating anime character recognition algorithms.

Data Processing Method

Label Filtering: Keep only character category labels.
Image Filtering: Retain images that contain only a single character label.
Head Detection: Run a head detection model on each image to extract head bounding boxes.
Multi-Head Filtering: Discard images in which more than one head bounding box is detected.
Final Data Volume: 0.97M images, 70k labels.
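
The filtering steps above can be sketched as follows. This is a minimal illustration, not the author's actual pipeline: the `Post` record, its field names, and the `"character"` category string are hypothetical, and the head detector's output is assumed to be already attached to each record. Dropping images with no detected head is also an assumption, since the description only mentions images with multiple heads.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical record for one source image; field names are illustrative."""
    filename: str
    tags: list                                      # (tag_name, tag_category) pairs
    head_boxes: list = field(default_factory=list)  # detector output, one box per head


def character_tags(post):
    # Label filtering: keep only tags in the character category.
    return [name for name, category in post.tags if category == "character"]


def build_samples(posts):
    """Return (filename, character, head_box) triples that survive every filter."""
    samples = []
    for post in posts:
        chars = character_tags(post)
        if len(chars) != 1:
            continue  # image filtering: exactly one character label
        if len(post.head_boxes) != 1:
            continue  # drop multi-head images (and, by assumption, zero-head ones)
        samples.append((post.filename, chars[0], post.head_boxes[0]))
    return samples


posts = [
    Post("a.jpg", [("hatsune_miku", "character"), ("solo", "general")], [(10, 10, 50, 50)]),
    Post("b.jpg", [("hatsune_miku", "character"), ("hakurei_reimu", "character")], [(0, 0, 40, 40)]),
    Post("c.jpg", [("hakurei_reimu", "character")], [(0, 0, 40, 40), (60, 0, 100, 40)]),
]
samples = build_samples(posts)
print(samples)
```

Only `a.jpg` survives here: `b.jpg` carries two character labels and `c.jpg` has two detected heads.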

Data Distribution and Visualization

Label-Image Count Distribution: A plot of the number of images per label, limited to the top 100 labels.
Top 20 Popular Labels: Include hatsune_miku, hakurei_reimu, etc.
Distribution Characteristics: Long‑tail distribution, average 13.85 images per label.
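
To make the long-tail claim concrete, the sketch below computes per-label image counts on a toy label list and compares the mean with the median. The toy labels `rare_a`, `rare_b`, and `rare_c` are invented for illustration. In a long-tailed distribution a few popular labels pull the mean well above the median. The last line redoes the average from the rounded figures quoted above (0.97M images, 70k labels); the exact counts are not given here, so it only approximates the reported 13.85.

```python
from collections import Counter
from statistics import mean, median


def label_stats(labels):
    """Summarize how many images each label has."""
    counts = Counter(labels)
    per_label = sorted(counts.values(), reverse=True)
    return {
        "num_labels": len(per_label),
        "mean": mean(per_label),
        "median": median(per_label),
        "top": counts.most_common(2),
    }


# Toy data: two popular characters and three rare ones.
labels = ["hatsune_miku"] * 6 + ["hakurei_reimu"] * 3 + ["rare_a", "rare_b", "rare_c"]
stats = label_stats(labels)
print(stats)

# Average images per label from the rounded totals in the text.
approx_average = round(970_000 / 70_000, 2)
print(approx_average)
```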

Dataset Usage

Core Data File: faces.tsv, containing filename, label ID, and head detection results.
Label Text File: tagIds.tsv, providing text for each label ID.
Face Image Download: The pre-processed face image archive can be downloaded via rsync.
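
A minimal sketch of joining the two files might look like this. The column order is an assumption based only on the descriptions above (filename, label ID, detection result for faces.tsv; label ID, label text for tagIds.tsv), and the detection column is kept as a raw string because its exact format is not described here. In-memory strings stand in for the real files so the example runs on its own.

```python
import csv
import io


def load_tag_names(tsv_file):
    """Map label ID to label text. Assumed columns: id, text."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {int(row[0]): row[1] for row in reader if row}


def load_faces(tsv_file, tag_names):
    """Read face rows. Assumed columns: filename, label ID, detection result."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    faces = []
    for row in reader:
        if not row:
            continue
        label_id = int(row[1])
        faces.append({
            "filename": row[0],
            "label_id": label_id,
            "label": tag_names.get(label_id),  # None if the ID is unknown
            "detection": row[2],               # left unparsed on purpose
        })
    return faces


# Stand-ins for tagIds.tsv and faces.tsv; the rows are made up.
tag_file = io.StringIO("0\thatsune_miku\n1\thakurei_reimu\n")
face_file = io.StringIO("0001.jpg\t0\t[[10, 10, 50, 50]]\n0002.jpg\t1\t[[0, 0, 40, 40]]\n")

tags = load_tag_names(tag_file)
faces = load_faces(face_file, tags)
print(faces[0])
```

With the real files, the `io.StringIO` objects would be replaced by `open("tagIds.tsv", newline="", encoding="utf-8")` and the equivalent for `faces.tsv`, after checking the actual column layout against the README.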

Citation Information

Dataset Author: Yan Wang
Release Date: July 2019
Citation Format: Please refer to the README for the BibTeX format.

Baseline Model

Model Description: ResNet18 combined with ArcFace loss, achieving 37.3% accuracy.
Data Split: Training, validation, and test split files are provided.
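
For readers unfamiliar with ArcFace, the sketch below shows the additive angular margin idea in plain Python. It is not the baseline's training code: the scale `s=64` and margin `m=0.5` are the values commonly used with ArcFace and are not confirmed for this baseline, the two-dimensional embedding and class weights are toy values, and the handling real implementations add for angles where `theta + m` exceeds pi is omitted.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, target, s=64.0, m=0.5):
    """Scaled cosine logits with an angular margin added on the target class."""
    e = l2_normalize(embedding)
    logits = []
    for idx, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if idx == target:
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    return logits


def cross_entropy(logits, target):
    # Numerically stable log-sum-exp minus the target logit.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


embedding = [0.8, 0.6]
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per character class

with_margin = arcface_logits(embedding, class_weights, target=0)
no_margin = arcface_logits(embedding, class_weights, target=0, m=0.0)
print(with_margin, cross_entropy(with_margin, 0))
print(no_margin, cross_entropy(no_margin, 0))
```

The margin lowers the target-class logit, so the same embedding incurs a larger loss than with plain scaled cosine logits. This pushes the network to separate classes by a wider angle, which is the usual motivation for using ArcFace on problems with many classes and few images per class.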

Open Issues

Test Set Validation: Test set requires manual verification.
Face Alignment: Further optimization of face alignment is needed.

Topics

Anime Character Recognition

Image Processing

Source

Organization: github

Created: 7/2/2019
