High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

bvk/ENRON-spam

After the Enron scandal in the United States, the Federal Energy Regulatory Commission released a dataset of 600,000 emails from 158 employees. The dataset was later purchased and processed by MIT, with some attachments removed. Different versions remain available from the Library of Congress and various other websites. A commonly used subset was created by researchers at the Institute of Informatics and Telecommunications in Greece to analyze and test spam filters, including several Naïve Bayes variants. The current CSV file contains this subset: 33,716 emails, of which 17,171 are spam. Each row has a concatenated subject-and-body field and a separate column holding the original filename.
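The counts in the description imply a nearly balanced dataset, which is worth confirming before training a spam filter on it. The following is a minimal Python sketch of that check. The column names `text`, `label`, and `filename` and the sample rows are assumptions made for illustration, since the description does not give the actual CSV header.

```python
import csv
import io

# Counts taken from the dataset description above.
TOTAL_EMAILS = 33_716
SPAM_EMAILS = 17_171


def class_balance(total, spam):
    """Return (ham_count, spam_fraction) for the given totals."""
    return total - spam, spam / total


def count_labels(csv_file, label_column="label"):
    """Count rows per label value in an open CSV file.

    The column name "label" is an assumption; check the real header first.
    """
    counts = {}
    for row in csv.DictReader(csv_file):
        value = row[label_column]
        counts[value] = counts.get(value, 0) + 1
    return counts


ham, spam_fraction = class_balance(TOTAL_EMAILS, SPAM_EMAILS)
print(f"ham: {ham}, spam fraction: {spam_fraction:.3f}")

# Made-up rows that only mimic the described layout (text, label, filename).
sample = io.StringIO(
    "text,label,filename\n"
    "Subject: meeting notes Please find attached,ham,example-1.txt\n"
    "Subject: win big Click here now,spam,example-2.txt\n"
    "Subject: lunch See you at noon,ham,example-3.txt\n"
)
sample_counts = count_labels(sample)
print(sample_counts)
```

To run the same count on the real file, one could pass `open(path, newline="", encoding="utf-8")` to `count_labels`, where `path` and the label column name depend on the downloaded CSV.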

hugging_face
