High-Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.



Multi30k Dataset

The Multi30k dataset is a multilingual image description dataset. It is built around English-German description pairs and also supports French and Czech. It is split into training, validation, and test sets. The dataset reports detailed statistics for each split, such as the number of sentences, the total word count, and the average number of words per sentence. It also provides download links for precomputed visual features and the original images.
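The sketch below shows how the statistics listed above relate to one another. It counts sentences and words in a caption file and derives the average words per sentence as total words divided by sentence count. It is a minimal sketch that assumes one caption per line and plain whitespace tokenization. The helper names `corpus_stats` and `file_stats` are hypothetical. The official Multi30k figures may use a different tokenizer, so these counts need not match the published numbers exactly.

```python
import gzip
from typing import Iterable


def corpus_stats(sentences: Iterable[str]) -> dict:
    """Count sentences and whitespace-separated words.

    Assumption: one sentence (caption) per line; blank lines are ignored.
    """
    n_sentences = 0
    n_words = 0
    for line in sentences:
        line = line.strip()
        if not line:
            continue  # skip empty lines so they don't inflate the sentence count
        n_sentences += 1
        n_words += len(line.split())  # naive whitespace tokenization
    # Average words per sentence = total words / number of sentences.
    avg = n_words / n_sentences if n_sentences else 0.0
    return {"sentences": n_sentences, "words": n_words, "avg_words_per_sentence": avg}


def file_stats(path: str) -> dict:
    """Hypothetical helper: read a plain or gzip-compressed text file, one caption per line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return corpus_stats(f)


if __name__ == "__main__":
    # Made-up captions for illustration, not taken from Multi30k.
    sample = ["A dog runs on the grass .", "Two children play outside ."]
    print(corpus_stats(sample))
    # prints {'sentences': 2, 'words': 12, 'avg_words_per_sentence': 6.0}
```

To get per-split figures, call `file_stats` on each split's caption file for each language. Any file path you pass is your own; the sketch assumes no particular repository layout.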

Source: GitHub
