Danbooru2018 Anime Character Recognition Dataset

This anime character recognition dataset is derived from Danbooru2018. It contains about 1 million head images (0.97M after filtering) extracted from Danbooru2018 images, each labeled with one of 70,000 characters. The number of images per character label follows a long-tail distribution, averaging 13.85 images per label.

Updated 5/18/2024
github

Description

Danbooru 2018 Anime Character Recognition Dataset Overview

Dataset Description

  • Dataset Name: Danbooru 2018 Anime Character Recognition Dataset
  • Dataset Source: Based on the Danbooru 2018 dataset.
  • Dataset Content: About 1 million head images (0.97M after filtering), each labeled with one of 70,000 character labels.
  • Dataset Purpose: Used for training and evaluating anime character recognition algorithms.

Data Processing Method

  • Label Filtering: Keep only character category labels.
  • Image Filtering: Retain images that contain only a single character label.
  • Head Detection: Use a head detection model to extract head bounding boxes.
  • Multi-Head Filtering: Remove images in which more than one head bounding box is detected.
  • Final Data Volume: 0.97M images, 70k labels.
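The filtering steps above can be sketched in Python as follows. This is an illustrative reconstruction, not the author's script: the `Post` record, its field names, and the precomputed `head_boxes` list (standing in for the output of the head detection model) are all hypothetical. The sketch also drops images with no detected head, which is an assumption, since such images yield no head crop.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical per-image record; field names are illustrative, not Danbooru's schema."""
    filename: str
    tags: list[str]
    tag_categories: dict[str, str]  # tag -> category, e.g. "character" or "general"
    # Precomputed head detections as (x1, y1, x2, y2) pixel boxes (assumed format).
    head_boxes: list[tuple[int, int, int, int]] = field(default_factory=list)


def filter_posts(posts):
    """Yield (filename, character_tag, head_box) for images that pass every filter."""
    for post in posts:
        # Label filtering: keep only tags in the character category.
        characters = [t for t in post.tags if post.tag_categories.get(t) == "character"]
        # Image filtering: keep images with exactly one character label.
        if len(characters) != 1:
            continue
        # Head-count filtering: drop images with multiple detected heads.
        # Dropping images with zero heads too is an assumption (no crop is possible).
        if len(post.head_boxes) != 1:
            continue
        yield post.filename, characters[0], post.head_boxes[0]


posts = [
    Post("a.jpg", ["hatsune_miku", "1girl"],
         {"hatsune_miku": "character", "1girl": "general"}, [(0, 0, 10, 10)]),
    # Two character labels: removed by the image filter.
    Post("b.jpg", ["hatsune_miku", "hakurei_reimu"],
         {"hatsune_miku": "character", "hakurei_reimu": "character"}, [(0, 0, 5, 5)]),
    # Two detected heads: removed by the head-count filter.
    Post("c.jpg", ["hakurei_reimu"],
         {"hakurei_reimu": "character"}, [(0, 0, 5, 5), (6, 6, 9, 9)]),
]
print(list(filter_posts(posts)))  # [('a.jpg', 'hatsune_miku', (0, 0, 10, 10))]
```

Only `a.jpg` survives: `b.jpg` carries two character labels and `c.jpg` has two detected heads.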

Data Distribution and Visualization

  • Label-Image Count Distribution: The number of images per label is plotted for the top 100 labels only.
  • Top 20 Labels: The most frequent labels include hatsune_miku and hakurei_reimu.
  • Distribution Characteristics: Long‑tail distribution, average 13.85 images per label.
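As a sanity check on the reported average, roughly 0.97M images divided by roughly 70k labels gives about 13.9 images per label, consistent with 13.85 given that both counts are rounded. The sketch below computes the same kind of statistics from a list of per-image labels; the function name `label_stats` and the toy labels (including `rare_a` and `rare_b`) are hypothetical. In a long-tail distribution the median count per label sits below the mean, which the toy example reproduces.

```python
from collections import Counter
from statistics import median


def label_stats(labels, top_k=20):
    """Summarize one-label-per-image data: mean/median images per label, top-k labels."""
    counts = Counter(labels)                # label -> number of images
    return {
        "num_labels": len(counts),
        "mean": len(labels) / len(counts),  # average images per label
        "median": median(counts.values()),  # below the mean for long-tailed data
        "top": counts.most_common(top_k),   # most frequent labels first
    }


# Toy labels: two frequent characters and a tail of rare (hypothetical) ones.
labels = ["hatsune_miku"] * 5 + ["hakurei_reimu"] * 3 + ["rare_a", "rare_b"]
stats = label_stats(labels, top_k=2)
print(stats["mean"], stats["median"])  # 2.5 2.0
print(stats["top"])                    # [('hatsune_miku', 5), ('hakurei_reimu', 3)]
```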

Dataset Usage

  • Core Data File: faces.tsv, containing filename, label ID, and head detection results.
  • Label Text File: tagIds.tsv, mapping each label ID to its tag text.
  • Face Image Download: A pre-processed archive of the face images can be downloaded via rsync.
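The two TSV files can be joined to attach tag text to each face record. The summary above does not give the exact column layout, so this sketch assumes (hypothetically) that `tagIds.tsv` has two tab-separated columns (label ID, tag text) with no header, and that `faces.tsv` starts with the filename and label ID, followed by the head detection fields, which are kept as raw strings. The function names are illustrative; check the repository README for the actual format before relying on it.

```python
import csv
import io


def load_tag_names(lines):
    """Map label ID -> tag text. Assumes two tab-separated columns and no header."""
    return {row[0]: row[1] for row in csv.reader(lines, delimiter="\t") if row}


def load_faces(lines, tag_names):
    """Yield (filename, label_id, tag_text, detection_fields) per faces.tsv row.

    Assumed layout (hypothetical): filename, label ID, then the head detection
    result in the remaining column(s), which are returned as raw strings.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        filename, label_id, *detection = row
        yield filename, label_id, tag_names.get(label_id, "<unknown>"), detection


# In-memory stand-ins for tagIds.tsv and faces.tsv (made-up rows).
tag_file = io.StringIO("0\thatsune_miku\n1\thakurei_reimu\n")
face_file = io.StringIO("0001.jpg\t0\t10,20,110,120\n")
rows = list(load_faces(face_file, load_tag_names(tag_file)))
print(rows)  # [('0001.jpg', '0', 'hatsune_miku', ['10,20,110,120'])]
```

With the real files, pass `open("faces.tsv", newline="", encoding="utf-8")` (and likewise for `tagIds.tsv`) instead of the in-memory samples. Label IDs are kept as strings so the lookup works regardless of how the IDs are formatted.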

Citation Information

  • Dataset Author: Yan Wang
  • Release Date: July 2019
  • Citation Format: See the README for the BibTeX entry.

Baseline Model

  • Model Description: A ResNet18 trained with ArcFace loss, reaching 37.3% accuracy.
  • Data Split: Training, validation, and test split files are provided.
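ArcFace normalizes the embedding and each per-class weight vector, so every logit is the cosine of the angle θ between them; it then adds an angular margin m to the true-class angle and scales all logits by s before softmax cross-entropy. The plain-Python sketch below computes this loss for a single sample. It is not the baseline's code: the function names, the list-based inputs, and the defaults s=64, m=0.5 (the values used in the ArcFace paper) are illustrative assumptions, and it omits refinements found in common implementations, such as special handling when θ + m exceeds π. In the baseline, the embedding would come from the ResNet18 backbone.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_loss(embedding, class_weights, target, s=64.0, m=0.5):
    """ArcFace (additive angular margin) cross-entropy loss for one sample.

    Returns (loss, logits). Illustrative sketch only; hyperparameters are assumptions.
    """
    x = _normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        # Cosine similarity between the unit embedding and the unit class weight.
        cos_theta = sum(a * b for a, b in zip(x, _normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding error
        if j == target:
            # Add the angular margin m to the true-class angle only.
            logits.append(s * math.cos(math.acos(cos_theta) + m))
        else:
            logits.append(s * cos_theta)
    # Numerically stable softmax cross-entropy: logsumexp(logits) - logits[target].
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target], logits


# Toy example: 2-D embedding, two classes, true class 0, scale s=1 for readability.
plain, _ = arcface_loss([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], target=0, s=1.0, m=0.0)
margin, _ = arcface_loss([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], target=0, s=1.0, m=0.5)
print(plain)   # both cosines are equal, so this is log(2), about 0.693
print(margin)  # larger than plain, because the margin penalizes the true class
```

With m = 0 this reduces to ordinary softmax cross-entropy over scaled cosine similarities. A positive margin requires the true-class angle to be smaller before the loss drops, which is why the m = 0.5 loss is higher for the same input.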

Open Issues

  • Test Set Validation: The test set still needs manual verification.
  • Face Alignment: Further optimization of face alignment is needed.

Topics

Anime Character Recognition
Image Processing

Source

Organization: github

Created: 7/2/2019