Explore high-quality datasets for your AI and machine learning projects.
This dataset is the latest supplemental data from e621.net, containing the site's newest images and videos in gif, jpg, png, swf, and webm formats. Because of its content, it is not suitable for all audiences. It contains 346,765 records with IDs ranging from 117,744 to 5,083,602 and was last updated on 2024-10-01. Text is primarily in English and Japanese. The dataset is applicable to image classification, zero-shot image classification, and text-to-image generation. The content covers art and anime, and each image or video is accompanied by detailed descriptive tags.
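The record count and ID range quoted above imply that only a small share of the IDs in that range are present. The following is a minimal sketch of that arithmetic, assuming the ID range is inclusive and that each record has a unique ID (neither is stated in the description):

```python
# Figures taken from the dataset description above.
RECORD_COUNT = 346_765
MIN_ID = 117_744
MAX_ID = 5_083_602

# Assumption: the range is inclusive at both ends.
id_span = MAX_ID - MIN_ID + 1

# Share of IDs in the range that have a record in this supplement.
density = RECORD_COUNT / id_span

print(f"ID span: {id_span:,}")
print(f"Share of IDs present: {density:.2%}")  # roughly 7%
```

In practice this means IDs should be treated as sparse keys, not as a contiguous index.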
To promote Chinese AI character creation, this project continuously collects typical anime character dialogue data (ACCD) and stores it in a public repository. The data can be used to train AI characters or to study creative writing.
This is an integrated anime database combining data from subsplease, MyAnimeList, and Nyaa.si. Users can discover the most popular anime and those with reliable torrent magnet links. The database updates daily and includes 770 anime titles and a total of 11,137 episodes, each with detailed information such as ID, title, type, episode count, status, rating, Nyaa search link, magnet links, seed count, download count, and last update time.
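The per-entry fields listed above could be modeled roughly as follows. This is only a sketch: the field names, types, and the `well_seeded` helper are hypothetical and are not taken from the database itself, and the sample rows are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnimeEntry:
    # Hypothetical field names mirroring the attributes in the description;
    # the real column names and types may differ.
    id: int
    title: str
    type: str
    episode_count: int
    status: str
    rating: float
    nyaa_search_link: str
    magnet_links: List[str] = field(default_factory=list)
    seed_count: int = 0
    download_count: int = 0
    last_updated: str = ""


def well_seeded(entries, min_seeds=10):
    """Keep entries with a magnet link and enough seeders, highest rated first."""
    kept = [e for e in entries if e.magnet_links and e.seed_count >= min_seeds]
    return sorted(kept, key=lambda e: e.rating, reverse=True)


# Placeholder rows for illustration only.
sample = [
    AnimeEntry(1, "Title A", "TV", 12, "Finished", 7.9, "link-a", ["magnet-a"], 40),
    AnimeEntry(2, "Title B", "TV", 24, "Airing", 8.6, "link-b", ["magnet-b"], 15),
    AnimeEntry(3, "Title C", "OVA", 2, "Finished", 9.1, "link-c", [], 500),
    AnimeEntry(4, "Title D", "TV", 13, "Finished", 8.8, "link-d", ["magnet-d"], 3),
]

for entry in well_seeded(sample):
    print(entry.title, entry.rating, entry.seed_count)
```

Filtering on seed count before sorting by rating is one way to read "popular anime with reliable magnet links"; the threshold of 10 is arbitrary.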
japanese-anime-speech-v2 is an audio-text dataset designed for training automatic speech recognition (ASR) models. It contains 300,506 audio clips and their transcriptions, sourced from visual novels. The goal is to improve the ability of ASR models (e.g., OpenAI's Whisper) to transcribe dialogue from anime and similar Japanese media. Audio is in MP3 format, sampled at 16 kHz, with an average clip length of 5.5 seconds. This is the first release of the japanese-anime-speech-v2 series; compared with the previous version, audio quality has been adjusted and NSFW content is not filtered. Female voices predominate, and the vocabulary centers on love, relationships, and fantasy, so the dataset may not fully reflect real-world speech patterns. Future plans include separating safe and NSFW content, improving text formatting, and expanding data sources.
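The clip count and average length given above allow a rough estimate of the total amount of audio. The sketch below uses only the figures from the description; because 5.5 seconds is a reported average, the result is an approximation, and the function names are illustrative.

```python
# Figures taken from the dataset description above.
NUM_CLIPS = 300_506
AVG_CLIP_SECONDS = 5.5    # reported average, so totals are estimates
SAMPLE_RATE_HZ = 16_000


def estimated_total_hours(num_clips, avg_seconds):
    """Approximate total audio duration in hours."""
    return num_clips * avg_seconds / 3600


def samples_per_clip(avg_seconds, sample_rate_hz):
    """Number of samples in an average-length clip once decoded."""
    return int(avg_seconds * sample_rate_hz)


total_hours = estimated_total_hours(NUM_CLIPS, AVG_CLIP_SECONDS)
print(f"Estimated total audio: {total_hours:.1f} hours")  # about 459.1 hours
print(f"Samples in an average clip: {samples_per_clip(AVG_CLIP_SECONDS, SAMPLE_RATE_HZ):,}")
```

An average clip of about 88,000 samples is far shorter than the 30-second window Whisper operates on, so clips would typically be padded during fine-tuning.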