Datasets | JuheAPI

bvk/ENRON-spam

Spam Detection

Naive Bayes

After the Enron scandal in the United States, the Federal Energy Regulatory Commission released a dataset of 600,000 emails from 158 Enron employees. The dataset was later purchased and processed by MIT, with some attachments removed. Different versions of it remain available from the Library of Congress and on various websites. A commonly used subset was created by researchers at the Institute of Informatics and Telecommunications in Greece to analyze and test spam filters, including several variants of Naïve Bayes. This CSV file contains that subset: 33,716 emails, of which 17,171 are spam. Each record has a single field with the subject and body concatenated, plus a separate column holding the original filename.
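Because this subset was built to evaluate Naïve Bayes spam filters, a small sketch of one common variant may help: a multinomial Naïve Bayes classifier with add-alpha (Laplace) smoothing, written from scratch. It is an illustration under stated assumptions, not the exact filters the Greek researchers evaluated. The texts below are made-up strings in a "Subject: … body" style that mimics the concatenated field; the real CSV's column names and label encoding are not given here, so the names `texts`, `labels`, and the `"spam"`/`"ham"` labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters/digits/apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        label_counts = Counter(labels)
        self.classes = sorted(label_counts)
        n = len(labels)
        # Log prior: log P(class) estimated from class frequencies.
        self.log_prior = {c: math.log(label_counts[c] / n) for c in self.classes}
        # Per-class word frequencies over the training texts.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # log P(word | class), smoothed over the training vocabulary.
        num = self.word_counts[c][word] + self.alpha
        den = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, text):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are ignored
                    score += self.log_likelihood(word, c)
            scores[c] = score
        # Pick the class with the highest posterior log-score.
        return max(scores, key=scores.get)


# Hypothetical toy data standing in for the concatenated subject-and-body field.
texts = [
    "Subject: win cash now claim your free prize",
    "Subject: cheap meds free offer click now",
    "Subject: meeting agenda for tomorrow please review",
    "Subject: gas trading report attached for review",
]
labels = ["spam", "spam", "ham", "ham"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("free cash prize"))                   # spam
print(model.predict("please review the trading report"))  # ham
```

Working in log space avoids floating-point underflow when an email contains many words. On the full subset, the spam prior estimated this way would be 17,171 / 33,716, or about 0.51, so the two classes are close to balanced.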

Source: Hugging Face


CSDMC2010 SPAM corpus