
bvk/ENRON-spam

After the Enron scandal in the United States, the Federal Energy Regulatory Commission released a dataset of 600,000 emails from 158 employees. The dataset was later purchased and processed by MIT, with some attachments removed. Different versions of the dataset remain available at the Library of Congress and on other websites. A commonly used subset was created by researchers at the Institute of Informatics and Telecommunications of Greece to analyze and test various spam filters, including several Naïve Bayes variants. The CSV file in this dataset contains that subset: 33,716 emails, of which 17,171 are spam. The file includes a concatenated subject‑and‑body field and a separate column for the original filename.

Updated 7/16/2024
hugging_face

Description

Enron Email Dataset

Overview

  • Source: The dataset originates from 600,000 emails from 158 employees, released by the U.S. Federal Energy Regulatory Commission. It was later purchased and processed by MIT, with some attachments deleted or edited.
  • Versions: Versions of the dataset are available at the Library of Congress and https://www.cs.cmu.edu/~./enron/.

Subset

  • Subset Source: Multiple subsets of the dataset can be found online, including on GitHub, HuggingFace, and Kaggle.
  • Specific Subset: Researchers from the Institute of Informatics and Telecommunications of Greece described a commonly used subset in their paper [Metsis]. The subset is built around six Enron employees with large email volumes and contains 33,716 emails, of which 17,171 are spam.
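
The two counts above imply the class balance of the subset. The short sketch below works it out, assuming every email not counted as spam is legitimate mail (ham); the variable names are illustrative only.

```python
TOTAL_EMAILS = 33_716   # total stated in the dataset description
SPAM_EMAILS = 17_171    # spam count stated in the dataset description

# Assumption: everything that is not spam counts as ham.
ham_emails = TOTAL_EMAILS - SPAM_EMAILS
spam_share = SPAM_EMAILS / TOTAL_EMAILS

print(f"ham: {ham_emails}, spam share: {spam_share:.1%}")
# prints "ham: 16545, spam share: 50.9%"
```

Because the classes are nearly balanced, a filter that always answers "spam" would already be right about half the time, which is a useful baseline when reading accuracy figures.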

Data Content

  • File Format: CSV file.
  • Fields: Includes a concatenated subject‑and‑body field and a separate original filename column.
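
As a rough illustration of working with a file of this shape, the sketch below reads a CSV with Python's standard `csv` module. The column names `text`, `filename`, and `label` are assumptions made for the example (the description names the fields but not their headers, and does not say how the spam label is stored), and the two sample rows are invented stand-ins, not dataset content.

```python
import csv
import io

# Hypothetical column names; check the real CSV header before relying on them.
TEXT_COL = "text"          # concatenated subject and body
FILE_COL = "filename"      # original filename
LABEL_COL = "label"        # assumed: 1 = spam, 0 = ham

def read_emails(fileobj):
    """Yield (text, filename, label) tuples from a CSV file object."""
    for row in csv.DictReader(fileobj):
        yield row[TEXT_COL], row[FILE_COL], int(row[LABEL_COL])

# Tiny in-memory stand-in for the real file (made-up rows).
SAMPLE = io.StringIO(
    "text,filename,label\n"
    '"Subject: meeting\nsee you at 10",0001.ham.txt,0\n'
    '"Subject: win cash\nclick now",0002.spam.txt,1\n'
)

rows = list(read_emails(SAMPLE))
spam_count = sum(label for _, _, label in rows)
print(len(rows), "rows,", spam_count, "spam")
```

For the real file, the same function could be given a handle opened with `open(path, newline="", encoding="utf-8")`; passing `newline=""` lets the `csv` module handle line breaks inside quoted email bodies itself.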

Research Purpose

  • Research Direction: Used to analyze and test various spam filters, including multiple Naïve Bayes versions.
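
To make the idea of a Naïve Bayes spam filter concrete, here is a minimal sketch of one common variant, multinomial Naïve Bayes with add-one (Laplace) smoothing, in plain Python. It is not a reproduction of the setup in [Metsis]; the whitespace tokenizer, the function names, and the four toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplistic whitespace tokenizer, chosen only to keep the sketch short.
    return text.lower().split()

def train(docs):
    """docs: iterable of (text, label) pairs, with label 1 = spam, 0 = ham."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the higher log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(doc_counts[label] / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training messages, not taken from the dataset.
toy = [
    ("win cash now", 1),
    ("cheap pills win big", 1),
    ("meeting agenda attached", 0),
    ("see you at the meeting", 0),
]
model = train(toy)
print(predict(model, "win cash"), predict(model, "meeting agenda"))
```

Working in log space avoids numeric underflow on long emails, and the add-one term keeps a word seen in only one class from forcing the other class's probability to zero.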

References

  • [Metsis] Metsis, V., Androutsopoulos, I., & Paliouras, G. "Spam Filtering with Naive Bayes - Which Naive Bayes?" Proceedings of the 3rd Conference on Email and Anti‑Spam (CEAS 2006), Mountain View, CA, USA, 2006.


Topics

Spam Detection
Naive Bayes

Source

Organization: hugging_face

Created: Unknown
