CSIC 2010 Dataset
The project uses the CSIC 2010 Dataset, a comprehensive collection of HTTP request logs that includes both normal and malicious traffic. It is designed for network intrusion detection research and contains various attack types such as SQL injection, buffer overflow, and directory traversal.
Description
Web Application Attack Detection Using Machine Learning Models
Dataset Overview
Source
The data comes from the CSIC 2010 dataset: HTTP requests labeled as normal or anomalous, with attack types including SQL injection, buffer overflow, and directory traversal.
Dataset Details
- Total records: 61,065
- Columns: 17
  - Method: HTTP request method (e.g., GET, POST).
  - User-Agent: Client details.
  - Pragma & Cache-Control: Caching directives.
  - Accept, Accept-Encoding, Accept-Charset: Accepted content types, encodings, and charsets.
  - Language: Language preferences.
  - Host: Server hostname.
  - Cookie: Cookies sent with the request.
  - Content-Type: Media type of the request body.
  - Connection: Indicates whether the connection should remain open.
  - Length & Content: Length and content of the request/response body.
  - Classification: Indicates whether the request is normal or anomalous.
  - URL: Requested URL.
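As a rough illustration of working with records in this layout, the sketch below parses a small in-memory CSV sample with Python's standard csv module and counts the labels. The two sample rows, the reduced set of columns, the header spellings, and the label values "Normal" and "Anomalous" are all assumptions made for the example; check them against the actual file before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical two-row sample with a subset of the columns listed above.
# Header spellings and label values in the real file may differ.
SAMPLE_CSV = """Method,Host,Content-Type,Length,Content,Classification,URL
GET,localhost:8080,,,,Normal,http://localhost:8080/shop/index.jsp
POST,localhost:8080,application/x-www-form-urlencoded,25,id=2%27+OR+%271%27%3D%271,Anomalous,http://localhost:8080/shop/add.jsp
"""


def load_records(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def label_counts(records, label_column="Classification"):
    """Count how many records carry each label value."""
    return Counter(record[label_column] for record in records)


records = load_records(SAMPLE_CSV)
counts = label_counts(records)
```

For the full dataset, the same functions could be fed the file's text instead of the sample string; comparing the label counts is a quick way to see how balanced the two classes are.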
Data Preprocessing
Given the nature of the dataset, especially the URL field, extensive preprocessing (parsing and tokenization) is required to extract features suitable for model training.
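One possible way to parse and tokenize the URL field is sketched below using Python's standard urllib.parse and re modules. The function name tokenize_url, the sample URL, and the tokenization rule (split decoded parameter values on non-word characters and keep the punctuation as tokens) are assumptions for illustration, not the project's actual pipeline.

```python
import re
from urllib.parse import urlsplit, parse_qsl


def tokenize_url(url):
    """Split a URL into path segments and query-string tokens.

    Query values are percent-decoded by parse_qsl, then split on
    non-word characters. Punctuation such as quotes and '=' is kept
    as separate tokens; whitespace is dropped.
    """
    parts = urlsplit(url)
    # Path segments, ignoring the empty strings around slashes.
    tokens = [segment for segment in parts.path.split("/") if segment]
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        tokens.append(key)
        # The capturing group makes re.split return the separators too.
        tokens.extend(t for t in re.split(r"(\W)", value) if t.strip())
    return tokens


# Hypothetical request URL carrying a SQL-injection-style payload.
sample_url = "http://localhost:8080/shop/item.jsp?id=2%27+OR+%271%27%3D%271"
sample_tokens = tokenize_url(sample_url)
```

Keeping punctuation as tokens is a deliberate choice in this sketch: characters such as quotes and equals signs inside parameter values are often what separates an injected payload from an ordinary value, so discarding them would remove potentially useful signal. The resulting token lists could then be passed to a bag-of-words or n-gram vectorizer.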
Source
Organization: github
Created: 9/4/2024