Dataset asset: Open Source Community, Machine Translation, Quality Assessment

wmt/wmt20_mlqe_task1

This dataset belongs to the WMT20 Multilingual Quality Estimation (MLQE) task, which evaluates the quality of neural machine translation (NMT) output without reference translations. It covers several language directions (e.g., en‑de, en‑zh), with source sentences drawn from Wikipedia and Reddit. Professional translators annotate each sentence with Direct Assessment (DA) scores from 0 to 100. Each configuration is split into training (7 k), validation (1 k), and test (1 k) sets, and the dataset is intended for research on automatic quality estimation of NMT systems.

Source: hugging_face
Created: Nov 28, 2025
Updated: Apr 4, 2024

Dataset Overview

Dataset Name

  • Name: WMT20 – MultiLingual Quality Estimation (MLQE) Task 1
  • Alias: MLQE‑Task1

Summary

  • Purpose: Evaluate neural‑machine‑translation output quality without reference translations.
  • Content: Multilingual translation data primarily from Wikipedia, with some from Reddit.
  • Languages: German, English, Estonian, Nepali, Romanian, Russian, Sinhalese, Chinese.

Structure

  • Configurations: en‑de, en‑zh, et‑en, ne‑en, ro‑en, ru‑en, si‑en
  • Features: segid, translation, scores, mean, z_scores, z_mean, model_score, doc_id, nmt_output, word_probas.
  • Splits: train (7 k), validation (1 k), test (1 k) per configuration.
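The configurations and splits listed above can be enumerated programmatically. The following is a minimal sketch, not an official loader: the split sizes are the rounded figures from this page, `expected_rows` and `load_config` are hypothetical helper names, and the loading step assumes the Hugging Face `datasets` package is installed and that the repository id `wmt/wmt20_mlqe_task1` with ASCII-hyphenated configuration names is reachable.

```python
from itertools import product

# Configuration names and rounded split sizes as listed on this page.
CONFIGS = ["en-de", "en-zh", "et-en", "ne-en", "ro-en", "ru-en", "si-en"]
SPLIT_SIZES = {"train": 7000, "validation": 1000, "test": 1000}


def expected_rows():
    """Map every (config, split) pair to its approximate row count."""
    return {(c, s): SPLIT_SIZES[s] for c, s in product(CONFIGS, SPLIT_SIZES)}


def load_config(config, split="train"):
    """Load one configuration (assumes `datasets` is installed and network access)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    # Imported lazily so the helpers above work without the package.
    from datasets import load_dataset
    return load_dataset("wmt/wmt20_mlqe_task1", config, split=split)
```

For example, `load_config("en-de", "validation")` would request the validation split of the en‑de configuration, while `expected_rows()` runs offline and can serve as a sanity check on row counts after loading.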

Creation

  • Source: Wikipedia and Reddit; translated using fairseq NMT models; scored by professional translators using Direct Assessment.
  • Scoring: Each sentence receives at least three DA scores (0‑100).
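The scoring rule above (at least three DA scores per sentence, each between 0 and 100) can be checked before averaging. This is a small sketch under those stated constraints; `summarize_da` is a hypothetical helper, not part of the dataset tooling.

```python
from statistics import fmean


def summarize_da(scores, min_annotations=3):
    """Validate one sentence's raw DA scores and return their mean."""
    if len(scores) < min_annotations:
        raise ValueError(f"expected at least {min_annotations} scores, got {len(scores)}")
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("DA scores must lie in the range 0 to 100")
    return fmean(scores)
```

Called on `[60, 70, 80]` this returns `70.0`, which corresponds to the `mean` feature for that sentence.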

Usage Considerations

  • License: Unknown.
  • Metrics: Pearson correlation between predicted scores and human DA.
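The metric above can be computed with the standard library alone. The sketch below implements the textbook Pearson correlation coefficient; the function name and the choice to raise on constant input are illustrative assumptions, and in practice a library routine such as `scipy.stats.pearsonr` would typically be used instead.

```python
import math


def pearson(predicted, human):
    """Pearson correlation between predicted scores and human DA scores."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_h = sum(human) / len(human)
    # Covariance and variances (unnormalized; the factors cancel in the ratio).
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_h)
```

A quality estimation system would pass its predictions as `predicted` and the human scores (for instance the `z_mean` or `mean` column) as `human`; values close to 1 indicate that predictions rank and scale sentences much like the annotators did.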

Additional Information

  • Contributors: Thanks to @VictorSanh for adding the dataset.

Detailed File Information

  • File Sizes (bytes): en‑de: 4,539,012; en‑zh: 4,269,820; etc.
  • Download Sizes (bytes): en‑de: 3,293,699; en‑zh: 3,325,683; etc.

Feature Definitions

  • segid: int32 – segment identifier
  • translation: pair of strings keyed by language code – source and target text
  • scores: list of float32 – raw DA scores, one per annotator
  • mean: float32 – average of the raw DA scores
  • z_scores: list of float32 – z‑standardized DA scores
  • z_mean: float32 – mean of the z‑scores
  • model_score: float32 – sentence‑level score from the NMT model
  • doc_id: string – document identifier
  • nmt_output: string – NMT system output
  • word_probas: list of float32 – word‑level log‑probabilities from the NMT model
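To illustrate how `z_scores` and `z_mean` relate to raw scores, here is a sketch of per-annotator z‑standardization. It rests on assumptions not stated on this page: that each annotator's scores are standardized with that annotator's own mean and standard deviation, and that the population standard deviation is used. The annotator data and function names are hypothetical.

```python
from statistics import fmean, pstdev


def z_standardize(raw_scores):
    """Standardize one annotator's raw scores by that annotator's mean and std."""
    mu = fmean(raw_scores)
    sigma = pstdev(raw_scores)  # population std; an assumption of this sketch
    if sigma == 0:
        raise ValueError("cannot standardize constant scores")
    return [(s - mu) / sigma for s in raw_scores]


def segment_z_means(scores_by_annotator):
    """Average z-scores per segment; every annotator scores the same segments."""
    z_by_annotator = [z_standardize(s) for s in scores_by_annotator.values()]
    return [fmean(segment) for segment in zip(*z_by_annotator)]


# Hypothetical annotators: "a" is lenient, "b" is strict, same ranking.
example = {"a": [50, 70, 90], "b": [20, 40, 60]}
z_means = segment_z_means(example)
```

In this toy example both annotators produce identical z‑scores despite different raw ranges, which is the motivation for reporting `z_mean` alongside `mean`.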