Explore high-quality datasets for your AI and machine learning projects.
Toxicity-200 is a multilingual vocabulary list for detecting toxic content in 200 languages. It covers common profanity, insults, hate speech, pornographic terms, and body-part terms related to sexual activity. Languages are identified by ISO 639-3 codes; supported languages include ind (Indonesian), ace (Acehnese), bjn (Banjar), bug (Buginese), and jav (Javanese).
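A typical use of such a wordlist is flagging text whose tokens appear in the per-language vocabulary. Below is a minimal sketch in Python, assuming a plain one-word-per-line file; the file path, format, and function names are illustrative assumptions, not the dataset's documented layout:

```python
import re

def load_toxic_words(path):
    """Load a one-word-per-line toxicity wordlist into a set.

    Assumes a UTF-8 plain-text file; the actual Toxicity-200
    release may use a different per-language file layout.
    """
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def find_toxic_tokens(text, toxic_words):
    """Return the tokens in `text` that appear in the wordlist."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t in toxic_words]
```

Note that simple token matching misses multi-word expressions and inflected forms; in practice each language's wordlist would be paired with language-appropriate tokenization.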