haoranxu/WMT22-Test
The dataset provides one configuration per translation direction: cs‑en (Czech‑English), de‑en (German‑English), en‑cs, en‑de, en‑is (English‑Icelandic), en‑ru (English‑Russian), en‑zh (English‑Chinese), is‑en, ru‑en, and zh‑en. Each configuration has two string features, one per language, and a single test split; the byte size and number of examples for each split are listed under Configurations. The dataset is intended for machine translation tasks.
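As a minimal sketch of how one configuration might be loaded, the snippet below uses the Hugging Face `datasets` library. The helper names (`config_name`, `load_pair`, `get_pair`) are hypothetical and exist only for this illustration. Two assumptions are made: that the hosted configuration names use a plain ASCII hyphen, and that the two language strings sit either directly on each row or nested under a single column, since the listing here does not settle the exact layout.

```python
# Configuration names as listed on this card, written with ASCII hyphens
# (assumption: the hosted dataset uses "-" rather than the non-breaking
# hyphen that appears in the page text).
CONFIGS = [
    "cs-en", "de-en", "en-cs", "en-de", "en-is",
    "en-ru", "en-zh", "is-en", "ru-en", "zh-en",
]


def config_name(src: str, tgt: str) -> str:
    """Build a configuration name such as "de-en" and check that it is listed."""
    name = f"{src}-{tgt}"
    if name not in CONFIGS:
        raise ValueError(f"unknown language pair: {name}")
    return name


def load_pair(src: str, tgt: str):
    """Load the test split for one direction (needs network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("haoranxu/WMT22-Test", config_name(src, tgt), split="test")


def get_pair(row: dict, src: str, tgt: str) -> tuple:
    """Return (source_text, target_text) from one row.

    Handles flat rows such as {"de": ..., "en": ...} and, as a hedge, rows
    where both strings are nested under a single column.
    """
    if src in row and tgt in row:
        return row[src], row[tgt]
    for value in row.values():
        if isinstance(value, dict) and src in value and tgt in value:
            return value[src], value[tgt]
    raise KeyError(f"row has no fields {src!r} and {tgt!r}")
```

A typical call would be `rows = load_pair("de", "en")` followed by `get_pair(rows[0], "de", "en")`. Checking `rows.column_names` first is a cheap way to confirm which of the two layouts applies.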
Updated 1/17/2024
hugging_face
Description
Dataset Overview
Configurations
cs‑en
- Features:
cs: string
en: string
- Split:
test: 325,040 bytes, 1,448 samples
- Download size: 224,193 bytes
- Dataset size: 325,040 bytes
de‑en
- Features:
de: string
en: string
- Split:
test: 403,424 bytes, 1,984 samples
- Download size: 267,107 bytes
- Dataset size: 403,424 bytes
en‑cs
- Features:
cs: string
en: string
- Split:
test: 422,875 bytes, 2,037 samples
- Download size: 281,086 bytes
- Dataset size: 422,875 bytes
en‑de
- Features:
de: string
en: string
- Split:
test: 442,576 bytes, 2,037 samples
- Download size: 280,415 bytes
- Dataset size: 442,576 bytes
en‑is
- Features:
en: string
is: string
- Split:
test: 310,807 bytes, 1,000 samples
- Download size: 197,437 bytes
- Dataset size: 310,807 bytes
en‑ru
- Features:
en: string
ru: string
- Split:
test: 598,414 bytes, 2,037 samples
- Download size: 333,784 bytes
- Dataset size: 598,414 bytes
en‑zh
- Features:
en: string
zh: string
- Split:
test: 383,751 bytes, 2,037 samples
- Download size: 257,805 bytes
- Dataset size: 383,751 bytes
is‑en
- Features:
en: string
is: string
- Split:
test: 248,029 bytes, 1,000 samples
- Download size: 152,885 bytes
- Dataset size: 248,029 bytes
ru‑en
- Features:
en: string
ru: string
- Split:
test: 579,656 bytes, 2,016 samples
- Download size: 340,830 bytes
- Dataset size: 579,656 bytes
zh‑en
- Features:
en: string
zh: string
- Split:
test: 526,074 bytes, 1,875 samples
- Download size: 333,078 bytes
- Dataset size: 526,074 bytes
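The per-configuration figures above can be combined into a few summary numbers. The sketch below copies the test-split byte sizes and sample counts from the listing and computes totals and average bytes per sample; the names `STATS`, `totals`, and `bytes_per_sample` are hypothetical and used only for this illustration.

```python
# (configuration, test-split bytes, samples), copied from the listing above.
# Configuration names are written with ASCII hyphens here.
STATS = [
    ("cs-en", 325_040, 1_448),
    ("de-en", 403_424, 1_984),
    ("en-cs", 422_875, 2_037),
    ("en-de", 442_576, 2_037),
    ("en-is", 310_807, 1_000),
    ("en-ru", 598_414, 2_037),
    ("en-zh", 383_751, 2_037),
    ("is-en", 248_029, 1_000),
    ("ru-en", 579_656, 2_016),
    ("zh-en", 526_074, 1_875),
]


def totals(stats):
    """Return (total_bytes, total_samples) across all configurations."""
    total_bytes = sum(size for _, size, _ in stats)
    total_samples = sum(count for _, _, count in stats)
    return total_bytes, total_samples


def bytes_per_sample(stats):
    """Map each configuration to its average number of bytes per sample."""
    return {name: size / count for name, size, count in stats}


if __name__ == "__main__":
    total_bytes, total_samples = totals(STATS)
    print(f"{total_samples} samples, {total_bytes} bytes in total")
    for name, avg in bytes_per_sample(STATS).items():
        print(f"{name}: {avg:.1f} bytes per sample")
```

Across the ten configurations this comes to 17,471 test samples and 4,240,646 bytes. Bytes per sample is only a rough proxy for segment length, because byte counts depend on the script: Cyrillic and Chinese text takes more bytes per character in UTF-8 than Latin text.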
Topics
Machine Translation
Source
Organization: hugging_face
Created: Unknown