JUHE API Marketplace

Dataset Catalog

Browse trusted datasets for evaluation, enrichment, and production use.

Category index
Showing 1 of 1 datasets
Category: Japanese‑English Corpus

JESC

Machine Translation · Japanese‑English Corpus

The JESC dataset is a Japanese‑English subtitle corpus created by Stanford University, Google Brain, and Rakuten Institute of Technology. Sourced from movie and TV subtitles on the web, it is one of the largest free EN‑JA corpora, focusing on conversational language. It contains 2.8 million sentence pairs covering everyday language, slang, instructions, and narratives. Licensed under CC‑BY‑4.0, it includes pre‑processed data with tokenized train/dev/test splits, primarily intended for translation tasks.
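As a rough illustration of working with the pre-tokenized splits described above, the sketch below reads sentence pairs from tab-separated text. It assumes each line holds an English sentence, a tab, and then the Japanese sentence; that layout, the function name `parse_pairs`, and the sample lines are assumptions made for illustration and are not taken from the corpus, so check the actual files before relying on it.

```python
from typing import Iterable, Iterator, Tuple


def parse_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (english, japanese) pairs from tab-separated lines.

    Assumed layout (not confirmed by the listing): "<english>\t<japanese>".
    Blank lines and lines without exactly one tab are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # ignore malformed lines rather than guessing
        en, ja = parts[0].strip(), parts[1].strip()
        if en and ja:
            yield en, ja


# Invented sample lines, used only to exercise the parser.
sample = [
    "thank you .\tありがとう 。\n",
    "\n",
    "malformed line without a tab\n",
    "let 's go .\t行こ う 。\n",
]

pairs = list(parse_pairs(sample))
for en, ja in pairs:
    print(f"{en} ||| {ja}")
```

To read a real split, pass an open file handle instead of the sample list, for example `parse_pairs(open(path, encoding="utf-8"))`, where `path` points at whichever split file was downloaded.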

Source: huggingface · Updated Aug 27, 2024 · 282 views · Linked
Copyright © 2026 JUHEDATA HK LIMITED - All rights reserved