This dataset contains Japanese-English translation pairs extracted from the Multitarget TED Talks (MTTT) dataset. The underlying data comes from WIT³ (Web Inventory of Transcribed and Translated Talks), the collection of TED talk transcripts and translations used in the IWSLT machine translation evaluation campaigns. It has a single training split of 158,535 examples, each consisting of an English sentence and its Japanese counterpart. The dataset is released under the CC BY-NC-ND 4.0 license, which requires acknowledging TED's contribution.
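As a rough illustration of the record structure described above, the following sketch models one example as an English/Japanese pair and drops records where either side is missing. The field names `en` and `ja`, the `TranslationPair` class, and the `parse_pairs` helper are assumptions made for this example, not the dataset's documented schema, and the sample sentences are invented rather than taken from the data.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class TranslationPair:
    """One example: an English sentence and a Japanese sentence.

    The field names are hypothetical; check the actual column names
    before relying on them.
    """
    en: str
    ja: str


def parse_pairs(records: Iterable[dict]) -> Iterator[TranslationPair]:
    """Yield pairs, skipping records with a missing or empty side."""
    for record in records:
        en = (record.get("en") or "").strip()
        ja = (record.get("ja") or "").strip()
        if en and ja:
            yield TranslationPair(en=en, ja=ja)


# Invented sample records, used only to exercise the helper.
sample = [
    {"en": "Thank you very much.", "ja": "どうもありがとうございます。"},
    {"en": "", "ja": "こんにちは。"},  # empty English side: skipped
    {"en": "Hello.", "ja": None},      # missing Japanese side: skipped
]

pairs = list(parse_pairs(sample))
for pair in pairs:
    print(f"{pair.en}\t{pair.ja}")
```

A filter like this is a common first step before training, since empty or one-sided pairs carry no translation signal.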