JUHE API Marketplace
High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

pubmed-en-quality-annotations-7

Document Classification
Quality Assessment

This dataset includes several features such as id, French translation, educational score, domain, and document type. 'Domain' and 'document type' are categorical variables with three and four categories respectively. The dataset is split into a training set and a validation set, containing 358,199 and 39,800 samples respectively. The total download size is 245,314,153 bytes, and the overall size is 438,787,962 bytes.

huggingface

wmt/wmt20_mlqe_task1

Machine Translation
Quality Assessment

This dataset is part of the WMT20 Multilingual Quality Estimation (MLQE) task, which evaluates the quality of neural machine translation (NMT) outputs without reference translations. It includes translation pairs for several language directions (e.g., en‑de, en‑zh) sourced from Wikipedia and Reddit. Each sentence is annotated by professional translators with Direct Assessment (DA) scores ranging from 0 to 100. Each configuration is split into training, validation, and test sets (7k, 1k, and 1k sentences respectively), and the dataset is intended for research on automatic quality estimation of NMT systems.
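As a rough illustration of how DA scores like these are typically used, the sketch below compares a quality estimation model's predicted scores against human scores using the Pearson correlation, a metric commonly reported for sentence-level quality estimation. The score values and the `pearson` helper are hypothetical and are not taken from the dataset or from any official scoring script.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only; not taken from the dataset.
human_da = [82.0, 45.5, 67.0, 30.0, 91.5]    # human DA scores (0 to 100)
predicted = [78.0, 50.0, 60.0, 35.0, 88.0]   # scores from an assumed QE model

r = pearson(human_da, predicted)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would suggest that the model ranks and scales translations much as the human annotators did; a value near 0 would suggest little agreement.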

huggingface

Qilin Watermelon Dataset

Watermelon Research
Quality Assessment

The Qilin Watermelon dataset is a unique collection exploring the relationship between watermelon appearance, knock sound, and sweetness. It aims to promote research in non‑destructive watermelon quality assessment. The dataset consists of two parts: (1) wav files capturing the sound produced when watermelons are tapped, reflecting acoustic characteristics that may indicate internal structure and maturity; (2) jpg files showing external appearance, including color, texture, and shape. Additionally, sugar content measured with a refractometer is provided, allowing correlation with acoustic and visual features.

github