Dataset asset | Open Source Community | Machine Translation
haoranxu/WMT22-Test
The dataset provides ten configurations, one per language pair. Each pairs English with Czech (cs), German (de), Icelandic (is), Russian (ru), or Chinese (zh), in both directions: cs‑en, de‑en, en‑cs, en‑de, en‑is, en‑ru, en‑zh, is‑en, ru‑en, and zh‑en. Every configuration has two string features, one per language and named by its language code, and a single test split; the listing below gives each split's size in bytes and number of examples. Because only a test split is provided, the dataset is intended for evaluating machine translation systems.
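A minimal sketch of reading one configuration as (source, target) pairs in the direction named by the configuration. It assumes that configuration names on the Hub use an ASCII hyphen (this page renders them with a non-breaking hyphen) and that each row holds the two language columns shown in the listing, either flat or nested under a column named after the pair; both layouts are assumptions, so inspect `ds.features` before relying on either. `split_config`, `to_pairs`, and `load_pairs` are hypothetical helpers; only `datasets.load_dataset` is a real library call, and `load_pairs` needs the `datasets` package and network access.

```python
"""Sketch: turn rows of a WMT22-Test configuration into (source, target) pairs.

Assumptions (not confirmed by the dataset page):
- configuration names use an ASCII hyphen, e.g. "cs-en";
- a row is either flat, {"cs": ..., "en": ...}, or nested under the
  configuration name, {"cs-en": {"cs": ..., "en": ...}}.
"""
from collections.abc import Iterable, Iterator, Mapping

NB_HYPHEN = "\u2011"  # non-breaking hyphen used in the page's configuration labels


def split_config(config: str) -> tuple[str, str]:
    """Return (source_lang, target_lang) for a configuration such as "cs-en"."""
    normalized = config.replace(NB_HYPHEN, "-")
    src, sep, tgt = normalized.partition("-")
    if not sep or not src or not tgt or "-" in tgt:
        raise ValueError(f"unexpected configuration name: {config!r}")
    return src, tgt


def to_pairs(config: str, rows: Iterable[Mapping]) -> Iterator[tuple[str, str]]:
    """Yield (source_text, target_text) in the direction named by the configuration."""
    src, tgt = split_config(config)
    key = f"{src}-{tgt}"
    for row in rows:
        # Unwrap the hypothetical nested layout if present; otherwise treat the row as flat.
        record = row[key] if key in row else row
        yield record[src], record[tgt]


def load_pairs(config: str) -> list[tuple[str, str]]:
    """Download the test split with Hugging Face `datasets` (requires network access)."""
    from datasets import load_dataset  # optional dependency, imported lazily

    src, tgt = split_config(config)
    ds = load_dataset("haoranxu/WMT22-Test", f"{src}-{tgt}", split="test")
    return list(to_pairs(config, ds))
```

Direction comes only from the configuration name: en‑cs and cs‑en both expose `cs` and `en` columns, so `to_pairs("en-cs", rows)` yields English sources with Czech targets, while `to_pairs("cs-en", rows)` yields the reverse.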
Source: hugging_face
Created: Nov 28, 2025
Updated: Jan 17, 2024
Signals: 105 views
Availability: Linked source ready
Dataset Overview
Configurations
cs‑en
- Features: cs (string), en (string)
- Split: test (325,040 bytes, 1,448 samples)
- Download size: 224,193 bytes
- Dataset size: 325,040 bytes
de‑en
- Features: de (string), en (string)
- Split: test (403,424 bytes, 1,984 samples)
- Download size: 267,107 bytes
- Dataset size: 403,424 bytes
en‑cs
- Features: cs (string), en (string)
- Split: test (422,875 bytes, 2,037 samples)
- Download size: 281,086 bytes
- Dataset size: 422,875 bytes
en‑de
- Features: de (string), en (string)
- Split: test (442,576 bytes, 2,037 samples)
- Download size: 280,415 bytes
- Dataset size: 442,576 bytes
en‑is
- Features: en (string), is (string)
- Split: test (310,807 bytes, 1,000 samples)
- Download size: 197,437 bytes
- Dataset size: 310,807 bytes
en‑ru
- Features: en (string), ru (string)
- Split: test (598,414 bytes, 2,037 samples)
- Download size: 333,784 bytes
- Dataset size: 598,414 bytes
en‑zh
- Features: en (string), zh (string)
- Split: test (383,751 bytes, 2,037 samples)
- Download size: 257,805 bytes
- Dataset size: 383,751 bytes
is‑en
- Features: en (string), is (string)
- Split: test (248,029 bytes, 1,000 samples)
- Download size: 152,885 bytes
- Dataset size: 248,029 bytes
ru‑en
- Features: en (string), ru (string)
- Split: test (579,656 bytes, 2,016 samples)
- Download size: 340,830 bytes
- Dataset size: 579,656 bytes
zh‑en
- Features: en (string), zh (string)
- Split: test (526,074 bytes, 1,875 samples)
- Download size: 333,078 bytes
- Dataset size: 526,074 bytes
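The per-configuration figures above can be cross-checked with a little arithmetic. The sketch below transcribes the listed numbers into a hypothetical `STATS` table and derives the total number of test examples, the average bytes per example, and the download-to-dataset size ratio; these derived values are plain arithmetic on this listing, not figures published with the dataset.

```python
"""Sketch: derive totals from the per-configuration figures listed on this page."""
from typing import NamedTuple


class SplitStats(NamedTuple):
    test_bytes: int      # size of the test split (equal to "Dataset size" here)
    download_bytes: int  # "Download size"
    samples: int         # number of test examples


# Figures transcribed from the listing; keys use ASCII hyphens.
STATS: dict[str, SplitStats] = {
    "cs-en": SplitStats(325_040, 224_193, 1_448),
    "de-en": SplitStats(403_424, 267_107, 1_984),
    "en-cs": SplitStats(422_875, 281_086, 2_037),
    "en-de": SplitStats(442_576, 280_415, 2_037),
    "en-is": SplitStats(310_807, 197_437, 1_000),
    "en-ru": SplitStats(598_414, 333_784, 2_037),
    "en-zh": SplitStats(383_751, 257_805, 2_037),
    "is-en": SplitStats(248_029, 152_885, 1_000),
    "ru-en": SplitStats(579_656, 340_830, 2_016),
    "zh-en": SplitStats(526_074, 333_078, 1_875),
}


def total_samples(stats: dict[str, SplitStats]) -> int:
    """Sum the test examples across all configurations."""
    return sum(s.samples for s in stats.values())


def bytes_per_sample(s: SplitStats) -> float:
    """Average size of one test example in bytes."""
    return s.test_bytes / s.samples


if __name__ == "__main__":
    print(f"total test examples: {total_samples(STATS)}")
    for name, s in STATS.items():
        ratio = s.download_bytes / s.test_bytes
        print(f"{name}: {bytes_per_sample(s):.1f} bytes/sample, download/size = {ratio:.2f}")
```

Because every configuration has only a test split, each "Dataset size" equals its test-split size, so a single `test_bytes` field covers both.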