Dataset asset, Open Source Community, Machine Translation

haoranxu/WMT22-Test

The dataset provides ten language‑pair configurations: cs‑en (Czech‑English), de‑en (German‑English), en‑cs, en‑de, en‑is (English‑Icelandic), en‑ru (English‑Russian), en‑zh (English‑Chinese), is‑en, ru‑en, and zh‑en. Each configuration has two string columns, one per language, and a single test split whose byte size and number of examples are listed per configuration. The dataset is intended for machine translation tasks.
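As a rough illustration of how one of these configurations might be loaded, here is a minimal sketch using the Hugging Face `datasets` library. It assumes the hosted configuration names use plain ASCII hyphens (for example `en-cs`) and that the first code in a name is the source language; neither is stated on this page. `CONFIGS`, `direction`, and `load_pair` are hypothetical helper names introduced only for this example.

```python
# Configuration names as listed on this page, written with ASCII hyphens.
# Assumption: the hosted dataset uses plain "-" in its config names.
CONFIGS = [
    "cs-en", "de-en", "en-cs", "en-de", "en-is",
    "en-ru", "en-zh", "is-en", "ru-en", "zh-en",
]


def direction(config):
    """Split a config name such as "en-cs" into (source, target) codes.

    Assumption: the first code is the source language, the second the target.
    The column listing alone does not show direction, since columns appear
    in alphabetical order (en-cs lists cs before en).
    """
    src, tgt = config.split("-")
    return src, tgt


def load_pair(config):
    """Load the test split of one configuration from the Hugging Face Hub."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config}")
    # Imported here so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("haoranxu/WMT22-Test", config, split="test")
```

Calling `load_pair("en-cs")` needs network access and the `datasets` package. Printing the returned object's `features` before indexing into rows is a sensible first step, since the exact column layout on the Hub may differ from the flat listing shown here.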

Source
Hugging Face
Created
Nov 28, 2025
Updated
Jan 17, 2024

Dataset Overview

Configurations

cs‑en

  • Features:
    • cs: string
    • en: string
  • Split:
    • test: 325,040 bytes, 1,448 samples
  • Download size: 224,193 bytes
  • Dataset size: 325,040 bytes

de‑en

  • Features:
    • de: string
    • en: string
  • Split:
    • test: 403,424 bytes, 1,984 samples
  • Download size: 267,107 bytes
  • Dataset size: 403,424 bytes

en‑cs

  • Features:
    • cs: string
    • en: string
  • Split:
    • test: 422,875 bytes, 2,037 samples
  • Download size: 281,086 bytes
  • Dataset size: 422,875 bytes

en‑de

  • Features:
    • de: string
    • en: string
  • Split:
    • test: 442,576 bytes, 2,037 samples
  • Download size: 280,415 bytes
  • Dataset size: 442,576 bytes

en‑is

  • Features:
    • en: string
    • is: string
  • Split:
    • test: 310,807 bytes, 1,000 samples
  • Download size: 197,437 bytes
  • Dataset size: 310,807 bytes

en‑ru

  • Features:
    • en: string
    • ru: string
  • Split:
    • test: 598,414 bytes, 2,037 samples
  • Download size: 333,784 bytes
  • Dataset size: 598,414 bytes

en‑zh

  • Features:
    • en: string
    • zh: string
  • Split:
    • test: 383,751 bytes, 2,037 samples
  • Download size: 257,805 bytes
  • Dataset size: 383,751 bytes

is‑en

  • Features:
    • en: string
    • is: string
  • Split:
    • test: 248,029 bytes, 1,000 samples
  • Download size: 152,885 bytes
  • Dataset size: 248,029 bytes

ru‑en

  • Features:
    • en: string
    • ru: string
  • Split:
    • test: 579,656 bytes, 2,016 samples
  • Download size: 340,830 bytes
  • Dataset size: 579,656 bytes

zh‑en

  • Features:
    • en: string
    • zh: string
  • Split:
    • test: 526,074 bytes, 1,875 samples
  • Download size: 333,078 bytes
  • Dataset size: 526,074 bytes
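
The per-configuration figures above can be collected into a small lookup table, which gives a quick sanity check after downloading. The sketch below copies the listed numbers verbatim; `STATS` and the helper functions are hypothetical names used only for this example, and the ASCII-hyphen config names are an assumption.

```python
# (test split bytes, examples, download bytes) per configuration,
# copied from the listing above. Config names use ASCII hyphens (assumption).
STATS = {
    "cs-en": (325_040, 1_448, 224_193),
    "de-en": (403_424, 1_984, 267_107),
    "en-cs": (422_875, 2_037, 281_086),
    "en-de": (442_576, 2_037, 280_415),
    "en-is": (310_807, 1_000, 197_437),
    "en-ru": (598_414, 2_037, 333_784),
    "en-zh": (383_751, 2_037, 257_805),
    "is-en": (248_029, 1_000, 152_885),
    "ru-en": (579_656, 2_016, 340_830),
    "zh-en": (526_074, 1_875, 333_078),
}


def total_examples(stats=STATS):
    """Sum the example counts over all configurations."""
    return sum(examples for _, examples, _ in stats.values())


def mean_bytes_per_example(config, stats=STATS):
    """Average size in bytes of one example in the given configuration."""
    size, examples, _ = stats[config]
    return size / examples


def check_row_count(config, actual_rows, stats=STATS):
    """Return True if a loaded split has the row count listed on this page."""
    return stats[config][1] == actual_rows


print(total_examples())  # 17471
```

After loading a split, passing its row count to `check_row_count` shows whether the download matches the listing; a mismatch may simply mean the hosted dataset has changed since this page was generated.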