mteb/stsb_multi_mt
Multilingual ProcessingSemantic Similarity Evaluation
STSb Multi MT is a multilingual semantic textual similarity benchmark containing sentence pairs and similarity scores for German, English, Spanish, French, Italian, Dutch, Polish, Portuguese, Russian, and Chinese. Built from the STS‑benchmark dataset and translated via deepl.com, it can be used to train sentence‑embedding models such as T‑Systems‑onsite/cross‑en‑de‑roberta‑sentence‑transformer. The collection includes a training set (5,749 pairs), development set (1,500 pairs), and test set (1,379 pairs).
Source hugging_faceUpdated May 4, 2025169 viewsLinked
Inspect dataset