haoranxu/WMT22-Test

The dataset provides configurations for ten language pairs: cs-en (Czech-English), de-en (German-English), en-cs, en-de, en-is (English-Icelandic), en-ru (English-Russian), en-zh (English-Chinese), is-en, ru-en, and zh-en. Each configuration has two string columns, one per language, and a single test split whose byte size and number of examples are listed under Configurations. The dataset is intended for machine translation tasks.
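As a rough illustration of how one configuration might be read, here is a minimal sketch that assumes the Hugging Face `datasets` package, network access, and config names spelled with plain ASCII hyphens. The helper names (`direction`, `extract_pair`, `load_pairs`) are hypothetical, and reading the first code in a config name as the source language is an assumption based on the usual WMT naming, not something this page states.

```python
# Config names as listed on this page, written with ASCII hyphens (assumption).
CONFIGS = ["cs-en", "de-en", "en-cs", "en-de", "en-is",
           "en-ru", "en-zh", "is-en", "ru-en", "zh-en"]


def direction(config):
    """Split a config name such as 'cs-en' into (source, target) codes.

    Treating the first code as the source language is an assumption.
    """
    src, tgt = config.split("-")
    return src, tgt


def extract_pair(row, config):
    """Return (source_text, target_text) from one row.

    This page lists flat string columns (e.g. 'cs' and 'en'). Some Hub
    datasets nest the pair under a column named after the config; that
    nested layout is an assumption handled here only as a fallback.
    """
    src, tgt = direction(config)
    if config in row and isinstance(row[config], dict):
        row = row[config]
    return row[src], row[tgt]


def load_pairs(config):
    """Download one config's test split and return its sentence pairs."""
    # Imported lazily so the helpers above work without the package installed.
    from datasets import load_dataset

    test = load_dataset("haoranxu/WMT22-Test", config, split="test")
    return [extract_pair(row, config) for row in test]
```

For example, `load_pairs("cs-en")` would be expected to return a list of Czech and English sentence pairs, provided the column layout matches one of the two cases handled above.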

Updated 1/17/2024
hugging_face

Description

Dataset Overview

Configurations

cs-en

  • Features:
    • cs: string
    • en: string
  • Split:
    • test: 325,040 bytes, 1,448 samples
  • Download size: 224,193 bytes
  • Dataset size: 325,040 bytes

de-en

  • Features:
    • de: string
    • en: string
  • Split:
    • test: 403,424 bytes, 1,984 samples
  • Download size: 267,107 bytes
  • Dataset size: 403,424 bytes

en-cs

  • Features:
    • cs: string
    • en: string
  • Split:
    • test: 422,875 bytes, 2,037 samples
  • Download size: 281,086 bytes
  • Dataset size: 422,875 bytes

en-de

  • Features:
    • de: string
    • en: string
  • Split:
    • test: 442,576 bytes, 2,037 samples
  • Download size: 280,415 bytes
  • Dataset size: 442,576 bytes

en-is

  • Features:
    • en: string
    • is: string
  • Split:
    • test: 310,807 bytes, 1,000 samples
  • Download size: 197,437 bytes
  • Dataset size: 310,807 bytes

en-ru

  • Features:
    • en: string
    • ru: string
  • Split:
    • test: 598,414 bytes, 2,037 samples
  • Download size: 333,784 bytes
  • Dataset size: 598,414 bytes

en-zh

  • Features:
    • en: string
    • zh: string
  • Split:
    • test: 383,751 bytes, 2,037 samples
  • Download size: 257,805 bytes
  • Dataset size: 383,751 bytes

is-en

  • Features:
    • en: string
    • is: string
  • Split:
    • test: 248,029 bytes, 1,000 samples
  • Download size: 152,885 bytes
  • Dataset size: 248,029 bytes

ru-en

  • Features:
    • en: string
    • ru: string
  • Split:
    • test: 579,656 bytes, 2,016 samples
  • Download size: 340,830 bytes
  • Dataset size: 579,656 bytes

zh-en

  • Features:
    • en: string
    • zh: string
  • Split:
    • test: 526,074 bytes, 1,875 samples
  • Download size: 333,078 bytes
  • Dataset size: 526,074 bytes
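The per-configuration figures above can be combined into totals and a rough bytes-per-sample measure. The sketch below copies the numbers from the listing; the names `STATS`, `summarize`, and `bytes_per_sample` are hypothetical, and the config keys use ASCII hyphens as an assumption.

```python
# (test-split bytes, samples, download bytes), copied from the listing above.
STATS = {
    "cs-en": (325_040, 1_448, 224_193),
    "de-en": (403_424, 1_984, 267_107),
    "en-cs": (422_875, 2_037, 281_086),
    "en-de": (442_576, 2_037, 280_415),
    "en-is": (310_807, 1_000, 197_437),
    "en-ru": (598_414, 2_037, 333_784),
    "en-zh": (383_751, 2_037, 257_805),
    "is-en": (248_029, 1_000, 152_885),
    "ru-en": (579_656, 2_016, 340_830),
    "zh-en": (526_074, 1_875, 333_078),
}


def summarize(stats):
    """Return (total samples, total dataset bytes, total download bytes)."""
    total_samples = sum(samples for _, samples, _ in stats.values())
    total_bytes = sum(size for size, _, _ in stats.values())
    total_download = sum(download for _, _, download in stats.values())
    return total_samples, total_bytes, total_download


def bytes_per_sample(stats):
    """Average test-split bytes per sample for each configuration."""
    return {name: size / samples for name, (size, samples, _) in stats.items()}


if __name__ == "__main__":
    samples, size, download = summarize(STATS)
    print(f"samples={samples} dataset_bytes={size} download_bytes={download}")
    for name, avg in sorted(bytes_per_sample(STATS).items()):
        print(f"{name}: {avg:.1f} bytes/sample")
```

Bytes per sample is only a coarse proxy for segment length, since it counts both languages together and UTF-8 byte widths differ across scripts.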

Topics

Machine Translation

Source

Organization: hugging_face

Created: Unknown