
CSIC 2010 Dataset

The project uses the CSIC 2010 Dataset, a comprehensive collection of HTTP request logs that includes both normal and malicious traffic. It is designed for network intrusion detection research and contains various attack types such as SQL injection, buffer overflow, and directory traversal.

Source: GitHub
Created: Sep 4, 2024
Updated: Sep 4, 2024

Web Application Attack Detection Using Machine Learning Models

Dataset Overview

Dataset Details

Total records: 61,065
Columns: 17
- Method: HTTP request method (e.g., GET, POST).
- User-Agent: Identification string of the client software.
- Pragma & Cache-Control: Caching directives.
- Accept, Accept-Encoding, Accept-Charset: Accepted content types, encodings, and character sets.
- Language: Preferred languages for the response.
- Host: Server hostname.
- Cookie: Cookies sent with the request.
- Content-Type: Media type of the request body.
- Connection: Whether the connection should be kept open (e.g., keep-alive or close).
- Length & Content: Length and content of the request body.
- Classification: Label indicating whether the request is normal or anomalous (the prediction target).
- URL: Requested URL.
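The records can be inspected with a short loader. This is a minimal sketch that assumes the data has been exported as a CSV file whose header row uses the column names listed above and whose `Classification` column holds labels such as `Normal` and `Anomalous`; both the header spelling and the label encoding are assumptions to check against your copy. `SAMPLE_CSV`, `load_records`, `is_anomalous`, `label_counts`, and the file name in the comment are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO

# Hypothetical two-row excerpt covering only a subset of the columns;
# the full dataset has 17 columns and 61,065 records.
SAMPLE_CSV = (
    "Method,Host,Content,Classification,URL\n"
    "GET,localhost:8080,,Normal,http://localhost:8080/tienda1/index.jsp\n"
    "POST,localhost:8080,id=2' OR '1'='1,Anomalous,"
    "http://localhost:8080/tienda1/publico/anadir.jsp\n"
)


def load_records(f: TextIO) -> list[dict[str, str]]:
    """Read every row into a dict keyed by the header names."""
    return list(csv.DictReader(f))


def is_anomalous(label: str) -> bool:
    """Map a label to a binary target.

    Accepts a text label ('Anomalous') or a numeric one ('1'), because the
    encoding used by a given copy of the dataset is assumed, not known.
    """
    return label.strip().lower() in {"anomalous", "1"}


def label_counts(records: Iterable[dict[str, str]],
                 column: str = "Classification") -> Counter:
    """Count how many rows carry each distinct label."""
    return Counter(row[column].strip() for row in records)


if __name__ == "__main__":
    rows = load_records(io.StringIO(SAMPLE_CSV))
    print(label_counts(rows))
    print([is_anomalous(r["Classification"]) for r in rows])
    # With a local export instead (hypothetical file name):
    # with open("csic_database.csv", newline="", encoding="utf-8") as f:
    #     rows = load_records(f)
```

Checking the class balance this way before training shows whether normal and anomalous requests need to be reweighted or resampled.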

Data Pre-processing

The raw request fields, especially the URL, are unstructured strings. They must be parsed (for example, into path segments and query parameters) and tokenized before they can be used as features for model training.
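A minimal sketch of the URL parsing and tokenization step, using only Python's standard library. The splitting rule (runs of word characters become one token, every other non-space character becomes its own token), the function names `tokenize`, `parse_url`, and `url_tokens`, and the example URL are illustrative assumptions, not the project's actual pipeline.

```python
import re
from urllib.parse import parse_qsl, unquote, urlsplit

# Assumed tokenization rule: word-like runs form one token; every other
# non-space character (quotes, '=', '<', '.', '/', ...) is its own token,
# since such symbols are typical of injection and traversal payloads.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s]")


def tokenize(text: str) -> list[str]:
    """Split an already-decoded string into word and symbol tokens."""
    return TOKEN_RE.findall(text)


def parse_url(url: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Return percent-decoded path segments and query (name, value) pairs."""
    parts = urlsplit(url)
    segments = [unquote(s) for s in parts.path.split("/") if s]
    # parse_qsl decodes %XX escapes and turns '+' into a space.
    params = parse_qsl(parts.query, keep_blank_values=True)
    return segments, params


def url_tokens(url: str) -> list[str]:
    """Flatten path segments and query parameters into one token list."""
    segments, params = parse_url(url)
    tokens: list[str] = []
    for seg in segments:
        tokens.extend(tokenize(seg))
    for name, value in params:
        tokens.extend(tokenize(name))
        tokens.extend(tokenize(value))
    return tokens


if __name__ == "__main__":
    # Hypothetical request URL carrying a classic SQL-injection payload.
    example = ("http://localhost:8080/tienda1/publico/anadir.jsp"
               "?id=2%27+OR+%271%27%3D%271&cantidad=1")
    print(url_tokens(example))
```

Keeping symbols such as `'` and `=` as separate tokens, rather than gluing them onto neighboring words, lets a downstream bag-of-tokens or n-gram model pick up the structure of a payload. Lowercasing or field prefixes (e.g., marking path versus parameter tokens) are further options this sketch leaves out.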
