The Pixiv Niji Journey dataset contains 9,766 images and their associated metadata, collected from the online art platform Pixiv. It is distributed in two versions, raw and preprocessed. The raw version contains the data exactly as scraped from Pixiv. The preprocessed version adds several processing steps: conversion of images from RGB to RGBA, captioning with BLIP, Danbooru tags generated by the wd-v1-4-vit-tagger, and cleaning to remove low-quality or irrelevant images.

Images are provided in JPG and PNG formats. Metadata is supplied as JSON, and the preprocessed version also provides it as .txt and .caption files. The dataset is intended primarily for image classification and image generation tasks. Users should be aware of potential biases stemming from the content hosted on Pixiv and from the specific search terms used to collect the images.
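As an illustration of how the preprocessed sidecar files might be consumed, the following is a minimal sketch that pairs each image with its tag and caption files. The directory layout is an assumption, not something this description documents: it supposes that each image shares its file stem with a .txt file holding comma-separated Danbooru tags and a .caption file holding the BLIP caption (for example 0001.png, 0001.txt, 0001.caption). The function names collect_samples and read_optional are hypothetical helpers, and only the Python standard library is used.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Assumed layout (hypothetical, not documented for this dataset):
#   0001.png      image
#   0001.txt      comma-separated Danbooru tags
#   0001.caption  BLIP caption
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def read_optional(path: Path) -> Optional[str]:
    """Return the stripped text of `path`, or None if the file does not exist."""
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8").strip()


def collect_samples(root: Path) -> List[Dict[str, object]]:
    """Pair every JPG/PNG image under `root` with its sidecar tag and caption files."""
    samples: List[Dict[str, object]] = []
    for image_path in sorted(root.rglob("*")):
        # Skip directories and any non-image files (e.g. JSON metadata).
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        tags_text = read_optional(image_path.with_suffix(".txt"))
        # Assumption: tags are comma-separated; empty entries are dropped.
        tags = [t.strip() for t in tags_text.split(",") if t.strip()] if tags_text else []
        samples.append({
            "image": image_path,
            "tags": tags,
            # None when no .caption file accompanies the image.
            "caption": read_optional(image_path.with_suffix(".caption")),
        })
    return samples
```

A call such as collect_samples(Path("pixiv_niji_journey/preprocessed")) (the path is hypothetical) returns one record per image. Missing sidecar files yield an empty tag list or a None caption rather than an error, so the raw version, which lacks these files, can be scanned with the same function.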