OmniGen, proposed by the Beijing Academy of Artificial Intelligence (BAAI), is a diffusion model for unified image generation. The X2I dataset was built to train this model and is described by its authors as the first large-scale unified image-generation dataset: it consolidates diverse tasks into a single format. It comprises roughly 100 million images covering text-to-image, multimodal-to-image, subject-driven generation, and computer-vision tasks. Because every task shares one format, a single model can be trained on all of them at once, which is intended to improve generalisation and multi-task performance.
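To make the idea of a single format concrete, here is a minimal sketch of how samples from different tasks could be expressed as one record type. The class `UnifiedSample`, the helper `to_unified`, the field names, the task labels, and the file names are all hypothetical and chosen for illustration; they are not taken from the X2I release.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UnifiedSample:
    """Hypothetical unified record: every task maps to the same four fields."""
    task_type: str                 # illustrative label, e.g. "text_to_image"
    instruction: str               # text prompt describing what to generate
    output_image: str              # path of the target image
    input_images: List[str] = field(default_factory=list)  # empty for pure text-to-image


def to_unified(task_type: str, instruction: str, output_image: str,
               input_images: Optional[List[str]] = None) -> UnifiedSample:
    """Build a record, normalising a missing input list to an empty one."""
    return UnifiedSample(task_type, instruction, output_image, list(input_images or []))


# Three different tasks, one shape (all values are made-up examples).
samples = [
    to_unified("text_to_image", "A white cat resting on a picnic table.", "cat.png"),
    to_unified("subject_driven", "The dog from the reference photo running on a beach.",
               "dog_beach.png", ["dog.png"]),
    to_unified("edge_detection", "Detect the edges of the input image.",
               "edges.png", ["photo.png"]),
]

# A single loader can iterate over all tasks without task-specific branches.
for s in samples:
    print(s.task_type, len(s.input_images), s.output_image)
```

Under this assumed layout, the only thing that varies between tasks is how many input images accompany the instruction, which is what lets one training loop consume every task.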