The SpamAssassin public email corpus is a collection of email messages assembled by members of the SpamAssassin project for testing spam-filtering systems. Each message is categorized as spam or ham (legitimate mail) and assigned to a sub-group such as easy_ham, easy_ham_2, hard_ham, spam, or spam_2. Records include fields such as label, group, text, and raw. Only a training split is provided.
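As a rough illustration of the record layout described above, the following Python sketch tallies messages by label and by group and selects one sub-group. The sample records are invented for illustration and are not taken from the corpus. The string values used for label and the helper names count_by and filter_group are assumptions, so the actual dataset may encode these fields differently.

```python
from collections import Counter

# Hypothetical records that mirror the described fields (label, group, text, raw).
# The contents are made up for illustration; they are not real corpus messages.
SAMPLE_RECORDS = [
    {"label": "ham", "group": "easy_ham",
     "text": "Meeting moved to 10am.",
     "raw": "Subject: Meeting\n\nMeeting moved to 10am."},
    {"label": "ham", "group": "hard_ham",
     "text": "Limited offer on your newsletter subscription renewal.",
     "raw": "Subject: Renewal\n\nLimited offer on your newsletter subscription renewal."},
    {"label": "spam", "group": "spam",
     "text": "You have won a prize, click here.",
     "raw": "Subject: Winner\n\nYou have won a prize, click here."},
    {"label": "spam", "group": "spam_2",
     "text": "Cheap loans approved instantly.",
     "raw": "Subject: Loans\n\nCheap loans approved instantly."},
]


def count_by(records, field):
    """Count how many records share each value of the given field."""
    return Counter(record[field] for record in records)


def filter_group(records, group):
    """Return only the records that belong to the given sub-group."""
    return [record for record in records if record["group"] == group]


label_counts = count_by(SAMPLE_RECORDS, "label")
group_counts = count_by(SAMPLE_RECORDS, "group")
hard_ham = filter_group(SAMPLE_RECORDS, "hard_ham")

print(dict(label_counts))
print(dict(group_counts))
print(len(hard_ham))
```

Because only a training split is provided, anyone evaluating a filter would need to hold out part of the data themselves. Splitting per group rather than only per label keeps harder sub-groups such as hard_ham represented in both portions.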