CoMix

CoMix is a comics dataset framework for comic understanding, comprising multiple comic datasets such as DCM, comics, eBDtheque, and PopManga. The framework allows users to employ validation set annotations and download images from the original sources without violating licenses.

Updated 11/9/2024

Description

CoMix: Comics Dataset Framework for Comics Understanding

Introduction

This project aims to reproduce (on the validation set) the following benchmarks:

The main limitation is that the images themselves cannot be redistributed. To address this, we created this framework, which lets users work with our (validation) annotations and download the images from the original sources without violating licenses.

CoMix uses the following datasets:

  • DCM
  • comics
  • eBDtheque
  • PopManga
  • Manga109

Installation

The project is written in Python 3.8. Create a conda environment:

conda create --name myenv python=3.8
conda activate myenv

Install dependencies:

pip install -e .

Workflow

The project is divided into the following steps:

  • Manually obtain and place images and annotations into the correct folder (e.g., data/)
  • Convert images to a unified naming scheme and folder structure – comix/process
  • Run models (pretrained or custom) to generate predictions – benchmarks
  • Evaluate model performance against the provided Ground Truth – comix/evaluators

Model Performance and Evaluation

In the benchmarks folder, multiple scripts are provided for benchmarking the dataset on various tasks. Detection scripts generate COCO‑format JSON files, which can be evaluated with the comix/evaluators/detection.py script. Captioning scripts generate multiple .txt files, which can be post‑processed into captions.csv and objects.csv and evaluated with the comix/evaluators/captioning.py script.
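
As a rough illustration of the detection output mentioned above, the sketch below builds a prediction list in the standard COCO results layout (`image_id`, `category_id`, `bbox` as [x, y, width, height], `score`) and computes the intersection-over-union (IoU) between one prediction and one ground-truth box. The sample values and the `xywh_iou` helper are hypothetical and are not taken from comix/evaluators/detection.py; they only show the kind of data such a script would consume.

```python
import json


def xywh_iou(box_a, box_b):
    """IoU of two COCO-style boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Hypothetical prediction in COCO results format (values are made up).
predictions = [
    {"image_id": 1, "category_id": 1,
     "bbox": [10.0, 20.0, 100.0, 50.0], "score": 0.92},
]
# Hypothetical ground-truth box for the same image and category.
ground_truth_box = [10.0, 20.0, 100.0, 100.0]

# Round-trip through JSON, as a detection script would write to disk.
results_json = json.dumps(predictions)
loaded = json.loads(results_json)

iou = xywh_iou(loaded[0]["bbox"], ground_truth_box)
print(f"IoU = {iou:.2f}")
```

A full COCO-style evaluation aggregates such matches over all images and several IoU thresholds; the single-pair comparison here is only the basic building block.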

Documentation

Documentation is located in the /docs folder.

Main documents:

Topics

Comic Understanding
Dataset Framework

Source

Organization: github

Created: 11/9/2024
