Linaqruf/pixiv-niji-journey

The Pixiv Niji Journey dataset comprises 9,766 images and associated metadata collected from the online art platform Pixiv. It is provided in raw and preprocessed versions. The raw version contains the original data as scraped from Pixiv. The preprocessed version adds several processing steps: conversion of images from RGB to RGBA, captioning with BLIP, Danbooru tags generated by the wd‑v1‑4‑vit‑tagger, and cleaning to remove low‑quality or irrelevant images. Images are in JPG and PNG formats; metadata is supplied as JSON, and the preprocessed version also includes .txt and .caption files. The dataset is primarily intended for image classification and generation tasks, though users should be aware of potential biases stemming from Pixiv content and the single search term used for collection.

Updated 1/10/2023
hugging_face

Description

Dataset Overview

Dataset Name

Pixiv Niji Journey

Dataset Description

The Pixiv Niji Journey dataset contains 9,766 images and their metadata, which were collected from the online art platform Pixiv using the gallery-dl Python package with the search term "nijijourney" between 2022‑11‑06 and 2022‑12‑27.
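As a quick provenance check when working with the JSON metadata, the sketch below tests whether a post date falls inside the collection window stated above. It is hypothetical: the card does not document the JSON schema, so it assumes each record stores its post date under a `date` key as an ISO‑style string. `in_collection_window`, `WINDOW_START`, and `WINDOW_END` are illustrative names.

```python
from datetime import date, datetime

# Collection window as stated on the dataset card.
WINDOW_START = date(2022, 11, 6)
WINDOW_END = date(2022, 12, 27)


def in_collection_window(record, key="date"):
    """Return True if a metadata record's post date lies inside the window.

    Hypothetical schema: assumes `record[key]` is an ISO-style string such as
    "2022-11-20 08:15:00". Records without a date are treated as outside.
    """
    value = record.get(key)
    if not value:
        return False
    posted = datetime.fromisoformat(value).date()
    return WINDOW_START <= posted <= WINDOW_END
```

Note that the archive file names listed under Dataset Structure end in 20221222, so you may want to set `WINDOW_END` to match whichever end date you rely on.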

Dataset Variants

  • raw: Original dataset directly scraped from Pixiv.
  • preprocessed: Preprocessed dataset with images converted from RGB to RGBA, captions generated with BLIP, and Danbooru tags generated by the wd‑v1‑4‑vit‑tagger. The images were also cleaned and filtered to discard low‑quality or unrelated images.

File Formats

  • Image formats: JPG and PNG
  • Metadata format: JSON; preprocessed metadata formats: .txt and .caption
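Because each image is accompanied by sidecar metadata, a natural first step is to pair every image with its metadata files. The sketch below groups file paths by shared stem. The naming rule is an assumption, since the card does not say how sidecars are named: it accepts both `name.json` and names like `name.png.json` in case the JSON files keep the image extension. `group_by_stem` is an illustrative helper, not part of the dataset.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Extensions taken from the card (.jpeg added defensively).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
SIDECAR_EXTS = {".json", ".txt", ".caption"}


def group_by_stem(paths):
    """Group image files with the sidecar files that share their stem.

    Returns {stem: {"image": ..., "json": ..., "txt": ..., "caption": ...}},
    keeping only groups that contain an image.
    """
    records = defaultdict(dict)
    for p in paths:
        path = PurePosixPath(p)
        ext = path.suffix.lower()
        if ext in IMAGE_EXTS:
            records[str(path.with_suffix(""))]["image"] = p
        elif ext in SIDECAR_EXTS:
            base = path.with_suffix("")
            # Also accept names such as 123_p0.png.json, where the image
            # extension is kept in front of the sidecar extension.
            if base.suffix.lower() in IMAGE_EXTS:
                base = base.with_suffix("")
            records[str(base)][ext.lstrip(".")] = p
    # Drop groups without an image (e.g. archive-level meta_*.json files).
    return {key: files for key, files in records.items() if "image" in files}
```

Files such as meta_cap.json have no matching image and are therefore excluded from the result.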

Dataset Structure

  • raw: nijijourney_pixiv_2022110620221222_raw.zip, containing a nijijourney/ directory with images and JSON metadata files.
  • preprocessed: nijijourney_pixiv_2022110620221222_preprocessed.zip, containing a dataset/ directory with images, JSON metadata, .txt and .caption files, plus additional metadata files such as meta_cap.json, meta_dd.json, and meta_clean.json.
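To confirm that a downloaded archive matches the layout described above before extracting it, the following sketch counts files by extension under the expected top‑level directory, using only Python's standard `zipfile` module. The directory names come from the card; `summarize_archive` is an illustrative name.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(archive, root):
    """Count files by extension under `root` inside a dataset ZIP.

    `archive` may be a path or a binary file-like object. `root` is the
    top-level directory named on the card: "nijijourney/" for the raw
    archive, "dataset/" for the preprocessed one.
    """
    counts = Counter()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything outside the expected root.
            if info.is_dir() or not info.filename.startswith(root):
                continue
            ext = PurePosixPath(info.filename).suffix.lower() or "<none>"
            counts[ext] += 1
    return counts
```

For example, `summarize_archive("nijijourney_pixiv_2022110620221222_preprocessed.zip", "dataset/")` would report how many .png, .jpg, .json, .txt, and .caption files the preprocessed archive holds, assuming it was saved under that name in the working directory.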

Intended Uses

The dataset is intended mainly for machine‑learning tasks such as image classification and caption generation, and is also suitable for training image‑generation models such as Stable Diffusion.
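One possible preparation step for Stable Diffusion fine‑tuning is to merge each image's BLIP caption and Danbooru tags into a single text prompt. The sketch below shows one way to do this. It assumes, without confirmation from the card, that each .caption file holds a BLIP sentence and each .txt file holds comma‑separated wd‑v1‑4 tags; `build_training_caption` and its `drop_tags` parameter are illustrative.

```python
def build_training_caption(blip_caption, tag_line, drop_tags=()):
    """Merge a BLIP caption and a comma-separated tag line into one prompt.

    Assumption: `blip_caption` is the text of a .caption file and `tag_line`
    is the text of a .txt file containing wd-v1-4 Danbooru tags.
    """
    drop = set(drop_tags)
    seen = set()
    kept = []
    for raw in tag_line.split(","):
        tag = raw.strip()
        # Skip empty entries, explicitly dropped tags, and duplicates,
        # preserving the original tag order otherwise.
        if not tag or tag in drop or tag in seen:
            continue
        seen.add(tag)
        kept.append(tag)
    caption = blip_caption.strip()
    parts = ([caption] if caption else []) + kept
    return ", ".join(parts)
```

Putting the sentence first and the tags after it is only one convention; some training setups prefer tags alone or a shuffled tag order.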

Dataset Limitations

  • Platform bias: The dataset may reflect biases of the Pixiv platform and its contributors.
  • Search term bias: All images were retrieved with the single search term "nijijourney", so the content reflects only what that term matched on Pixiv.
  • Limited scope: The dataset only includes images scraped from Pixiv and may not represent a broader range of images or artistic styles.
  • Metadata errors: Metadata may contain inaccuracies or inconsistencies.
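One way to gauge the platform and search‑term biases listed above is to measure how dominant individual tags are across the dataset. The sketch below reports tags that appear on at least a given fraction of images. It assumes the preprocessed .txt files contain comma‑separated Danbooru tags; the function names and the 0.5 default threshold are arbitrary choices for illustration.

```python
from collections import Counter


def tag_frequencies(tag_lines):
    """Count how many images carry each tag (each tag counted once per image)."""
    counts = Counter()
    for line in tag_lines:
        tags = {t.strip() for t in line.split(",") if t.strip()}
        counts.update(tags)
    return counts


def dominant_tags(tag_lines, threshold=0.5):
    """Return (tag, fraction) pairs for tags on at least `threshold` of images.

    Results are ordered from most to least common.
    """
    lines = list(tag_lines)
    if not lines:
        return []
    counts = tag_frequencies(lines)
    n = len(lines)
    return [(tag, c / n) for tag, c in counts.most_common() if c / n >= threshold]
```

A tag that appears on nearly every image is a sign that the collection leans heavily toward one subject or style.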

License

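The dataset is released under the AGPL‑3.0 license, which permits free use, modification, and distribution, provided that derivative works are also released under AGPL‑3.0. The license additionally requires that the corresponding source of a modified version be made available to users who interact with it over a network.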

Citation

@misc{pixiv_niji_journey,
  author = {Linaqruf},
  title = {Pixiv Niji Journey},
  year = {2022},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/Linaqruf/pixiv-niji-journey},
}

Topics

Image Analysis
Artworks

Source

Organization: hugging_face

Created: Unknown
