Dataset Catalog

Browse trusted datasets for evaluation, enrichment, and production use.

Showing 3 of 3 datasets
Category: Quality Assessment

pubmed-en-quality-annotations-7

Document Classification, Quality Assessment

The dataset's features include id, French translation, educational score, domain, and document type. Domain and document type are categorical variables with three and four categories respectively. The data is split into a training set of 358,199 samples and a validation set of 39,800 samples (397,999 in total, roughly a 90/10 split). The total download size is 245,314,153 bytes (about 245 MB), and the overall size is 438,787,962 bytes (about 439 MB).
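As a quick sanity check on the figures in this entry, the sketch below recomputes the split proportions and converts the byte counts to decimal megabytes. The constant and function names are illustrative only; the numbers are copied from the description.

```python
# Figures copied from the catalog entry; names are illustrative.
TRAIN_SAMPLES = 358_199
VALIDATION_SAMPLES = 39_800
DOWNLOAD_BYTES = 245_314_153
OVERALL_BYTES = 438_787_962


def to_megabytes(num_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000


total_samples = TRAIN_SAMPLES + VALIDATION_SAMPLES
train_fraction = TRAIN_SAMPLES / total_samples
validation_fraction = VALIDATION_SAMPLES / total_samples

print(f"total samples: {total_samples:,}")
print(f"train / validation: {train_fraction:.1%} / {validation_fraction:.1%}")
print(f"download size: {to_megabytes(DOWNLOAD_BYTES):.1f} MB")
print(f"overall size: {to_megabytes(OVERALL_BYTES):.1f} MB")
```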

Source: Hugging Face | Updated: Dec 12, 2024 | Views: 81 | Linked

wmt/wmt20_mlqe_task1

Machine Translation, Quality Assessment

This dataset is part of the WMT20 Multilingual Quality Estimation (MLQE) task, which evaluates the quality of neural machine translation (NMT) output without reference translations. It includes translation pairs for several language directions (e.g., en‑de, en‑zh) sourced from Wikipedia and Reddit. Each sentence is annotated by professional translators with a Direct Assessment (DA) score from 0 to 100. Each configuration is split into training, validation, and test sets (7,000 training, 1,000 validation, and 1,000 test sentences), and the dataset is intended for research on automatic quality estimation of NMT systems.

Source: Hugging Face | Updated: Apr 4, 2024 | Views: 192 | Linked

Qilin Watermelon Dataset

Watermelon Research, Quality Assessment

The Qilin Watermelon dataset explores the relationship between a watermelon's appearance, knock sound, and sweetness. It aims to promote research in non‑destructive watermelon quality assessment. The dataset consists of two parts: (1) WAV files capturing the sound produced when watermelons are tapped, reflecting acoustic characteristics that may indicate internal structure and maturity; (2) JPG files showing external appearance, including color, texture, and shape. Sugar content measured with a refractometer is also provided, so it can be correlated with the acoustic and visual features.
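The entry mentions correlating sugar content with acoustic and visual features. A minimal sketch of one way to do that is a Pearson correlation between a single extracted feature and the measured sugar content. The feature name `knock_peak_hz` and every value below are hypothetical placeholders, not taken from the dataset, and the catalog does not say which statistic the dataset authors use.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, one entry per watermelon.
knock_peak_hz = [182.0, 175.5, 168.0, 160.2, 151.7]  # assumed acoustic feature
sugar_content = [9.1, 9.8, 10.4, 11.0, 11.9]         # assumed refractometer readings

r = pearson(knock_peak_hz, sugar_content)
print(f"Pearson r = {r:.3f}")
```

With real data, the acoustic feature would first have to be extracted from the WAV files; that step is outside the scope of this sketch.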

Source: GitHub | Updated: Jul 11, 2024 | Views: 454 | Linked