Multi-Mask Inpainting Dataset

The dataset is intended for multi‑mask image inpainting. It contains images downloaded through the WikiArt API, together with global and object‑level annotations generated by the Kosmos‑2 and LLaVA models. It is built in three stages: image download, mask generation, and construction of an entity dataset.

Source: GitHub
Created: Dec 2, 2024
Updated: Dec 2, 2024

I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text‑Guided Multi‑Mask Inpainting (WACV 2025)

Dataset Overview

Dataset Preparation

Image Download

  • Use the WikiArt API to download images with the following command:
    python -m inpainting.data.downloader download-and-save-images-wikiart-v2 -o data/mm_inp_dataset/images
    
  • After completion, the image count should be 116,475.
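
One way to verify the count once the download finishes is to count the image files in the output directory. The sketch below assumes the images sit under data/mm_inp_dataset/images (directly or in subfolders) and use common image extensions. The extension set and the helper name count_images are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Assumed extensions; adjust to whatever the downloader actually writes.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
EXPECTED_COUNT = 116_475  # figure stated in the preparation notes


def count_images(root: str) -> int:
    """Count files under `root` (recursively) whose suffix looks like an image."""
    root_path = Path(root)
    if not root_path.is_dir():
        return 0
    return sum(
        1
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    found = count_images("data/mm_inp_dataset/images")
    status = "OK" if found == EXPECTED_COUNT else "MISMATCH"
    print(f"{status}: found {found} images, expected {EXPECTED_COUNT}")
```

A mismatch usually just means some downloads failed or were skipped, so rerunning the download command is a reasonable first step.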

Dataset Construction

  • The dataset includes global image annotations and object‑level annotations.
  • Construction steps:
    1. Generate masks from annotations (≈10 min).
    2. Build the entity dataset (≈10 min).
    3. Extract noun‑phrase roots using SpaCy (≈2 min).
    4. Generate mask descriptions with LLaVA‑1.6‑Vicuna‑13B (optional, time‑consuming).
    5. Move LLaVA annotations to the entity directory (≈5 s).
    6. Clean and save LLaVA annotations (≈10 s).
    7. Split the dataset (skip if already split).
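
The exact procedure and ratios used in step 7 are not stated here. As a rough illustration only, the sketch below splits at the image level, so that every mask of an image lands in the same partition. The 80/10/10 ratios, the fixed seed, and the function name split_by_image are assumptions.

```python
import random


def split_by_image(image_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle unique image ids deterministically and cut them into train/val/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ids = sorted(set(image_ids))      # de-duplicate and fix the starting order
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to test
    }
```

Splitting on image ids rather than on individual masks avoids leaking crops of the same painting across partitions.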

Dataset Structure

  • Each image is associated with multiple masks; each mask corresponds to an object crop and a LLaVA‑generated object‑level description.
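
The image-to-masks relationship described above could be modeled roughly as follows. This is a minimal sketch: the class names, field names, and example paths are hypothetical and do not reflect the repository's actual on-disk schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MaskEntry:
    mask_path: str    # hypothetical path to the binary mask
    crop_path: str    # hypothetical path to the object crop
    description: str  # LLaVA-generated object-level description


@dataclass
class ImageRecord:
    image_path: str
    masks: List[MaskEntry] = field(default_factory=list)

    def prompts(self) -> List[str]:
        """Return one text prompt per mask, in mask order."""
        return [m.description for m in self.masks]


# Illustrative record with two masks (all paths are made up).
record = ImageRecord(
    image_path="images/example.jpg",
    masks=[
        MaskEntry("masks/example_0.png", "crops/example_0.jpg", "a red boat"),
        MaskEntry("masks/example_1.png", "crops/example_1.jpg", "a stormy sky"),
    ],
)
```

Keeping one description per mask makes it straightforward to hand a list of (mask, prompt) pairs to a multi-mask inpainting model.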

Model Training and Testing

Model Download

  • Retrieve model weights from Google Drive:
    • LLaVA‑MultiMask: Extract and place in models/llava.
    • SD‑2‑Inp‑RCA‑FineTuned: Extract and place in models/sd.
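
Before launching training or testing, it can help to confirm that both weight directories are in place. The check below is a small sketch that only looks for non-empty models/llava and models/sd folders. It does not validate the files inside them, and the helper name missing_model_dirs is an assumption.

```python
from pathlib import Path

# Target directories named in the download instructions.
EXPECTED_MODEL_DIRS = ("models/llava", "models/sd")


def missing_model_dirs(base=".", expected=EXPECTED_MODEL_DIRS):
    """Return the expected model directories that are absent or empty under `base`."""
    missing = []
    for rel in expected:
        path = Path(base) / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Model directories look populated.")
```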

Experimental Results

  • The repository provides commands for training and testing several model variants, including LLaVA‑Prompt, LLaVA‑1Mask, and LLaVA‑MultiMask.
  • Multi‑mask inpainting results include metrics such as FID, LPIPS, PSNR, CLIP‑IQA, CLIPSim‑I2I, and CLIPSim‑T2I.
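
Of the metrics listed, PSNR has a simple closed form, 10 · log10(MAX² / MSE), and serves as a quick sanity check on restored images. The sketch below works on flat sequences of pixel values using only the standard library and assumes 8-bit images (MAX = 255). It illustrates the standard definition and is not necessarily how the repository computes the metric.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the restored image is closer to the reference pixel by pixel. The perceptual and CLIP-based metrics in the list capture qualities that PSNR does not.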

Citation

  • If you use this dataset, please cite the associated paper:
    @inproceedings{fanelli2025idream,
      title     = {I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text‑Guided Multi‑Mask Inpainting},
      author    = {Nicola Fanelli and Gennaro Vessio and Giovanna Castellano},
      year      = {2025},
      booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision}
    }
    