Falco-Alerts-Dataset-with-APT-attacks
We have constructed a relatively large Falco alert dataset for Kubernetes, containing both normal and APT attack alerts, to facilitate the training of attack prediction models and support future research. Attack alerts were generated with CALDERA, an adversary simulation platform developed by MITRE, which was used to simulate attacks in a Kubernetes cluster following MITRE ATT&CK tactic sequences. Normal alerts are Falco's routine alerts, generated in the absence of attacks. All alerts are labeled as 'attack' or 'normal'. The dataset comprises 231,000 alerts: 2,314 attack alerts and 228,686 normal alerts.
Description
Dataset Overview
Dataset Purpose
This dataset aims to provide a relatively large Falco alert collection for Kubernetes, containing both normal and APT attack data to facilitate the learning of attack‑prediction models and support future research.
Dataset Content
- Total alerts: 231,000
- Attack alerts: 2,314
- Normal alerts: 228,686
- Attack simulation: Conducted using the CALDERA platform, a MITRE‑developed adversary simulation tool, to emulate attacks in a Kubernetes cluster according to MITRE ATT&CK tactic sequences.
- Normal alert collection: Generated by Falco during routine operation, with no attack in progress.
- Labeling: Alerts are labeled as "attack" or "normal".
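The counts above imply a heavy class imbalance. A quick arithmetic check, using only the figures listed in this section, makes the ratio explicit:

```python
# Counts taken from the "Dataset Content" list.
attack_alerts = 2_314
normal_alerts = 228_686

total_alerts = attack_alerts + normal_alerts
attack_share = attack_alerts / total_alerts        # fraction of alerts labeled "attack"
normal_per_attack = normal_alerts / attack_alerts  # normal alerts per attack alert

print(total_alerts)                # 231000
print(f"{attack_share:.2%}")       # 1.00%
print(f"{normal_per_attack:.1f}")  # 98.8
```

Roughly one alert in a hundred is an attack alert, which is why the dataset is balanced before use.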
Dataset Processing
- Balancing: Normal alerts were down‑sampled and attack alerts up‑sampled to achieve a balanced dataset.
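The page does not state which sampling method or target size was used for balancing. As a minimal sketch, assuming plain random sampling (without replacement when down-sampling, with replacement when up-sampling) and a hypothetical `target_per_class` parameter, the step could look like this:

```python
import random
from collections import Counter

def balance(alerts, target_per_class, seed=0):
    """Return a shuffled list with `target_per_class` alerts per label.

    `alerts` is a list of dicts with a "label" key ("attack" or "normal").
    Larger classes are down-sampled without replacement; smaller classes
    are up-sampled with replacement. Both choices are assumptions, not
    the dataset authors' documented procedure.
    """
    rng = random.Random(seed)
    by_label = {}
    for alert in alerts:
        by_label.setdefault(alert["label"], []).append(alert)

    balanced = []
    for group in by_label.values():
        if len(group) >= target_per_class:
            balanced.extend(rng.sample(group, target_per_class))     # down-sample
        else:
            balanced.extend(rng.choices(group, k=target_per_class))  # up-sample
    rng.shuffle(balanced)
    return balanced

# Toy data only; these are not real dataset records.
toy = [{"label": "normal", "id": i} for i in range(100)]
toy += [{"label": "attack", "id": i} for i in range(5)]

balanced = balance(toy, target_per_class=20)
counts = Counter(alert["label"] for alert in balanced)
print(counts["normal"], counts["attack"])  # 20 20
```

Up-sampling with replacement duplicates attack alerts, so any train/test split should be made before balancing to avoid leaking copies across the split.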
Dataset Structure
- Attack Falco alert configuration files: Sample Falco alerts for three simulated attacks.
- Collected alerts:
  - Raw alerts: Collected from 11 Pods in a test environment.
  - Processed and labeled alerts: After cleaning and labeling.
  - Final balanced labeled Falco alert files: Post‑balancing dataset.
- Simulated attack Falco alerts: Falco alerts recorded during eight simulated attacks across the 11 Pods.
- MITRE tactic sequences: Stored as {container ID : MITRE ATT&CK tactics}, extracted from alerts collected per Pod.
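The {container ID : MITRE ATT&CK tactics} mapping can be illustrated with a short sketch. It assumes alerts in Falco's JSON output form, with the container ID under `output_fields["container.id"]` and tactics carried as rule tags prefixed with `mitre_`. Whether the dataset files follow exactly this layout is an assumption, and the sample alerts below are invented for illustration.

```python
from collections import defaultdict

TACTIC_PREFIX = "mitre_"  # assumed tag convention, e.g. "mitre_execution"

def tactic_sequences(alerts):
    """Map each container ID to its time-ordered list of MITRE tactics."""
    sequences = defaultdict(list)
    # ISO-8601 timestamps in a single format sort correctly as strings.
    for alert in sorted(alerts, key=lambda a: a["time"]):
        container_id = alert.get("output_fields", {}).get("container.id")
        if container_id is None:
            continue  # skip alerts that are not tied to a container
        for tag in alert.get("tags", []):
            if tag.startswith(TACTIC_PREFIX):
                sequences[container_id].append(tag[len(TACTIC_PREFIX):])
    return dict(sequences)

# Invented sample alerts, deliberately out of time order.
sample_alerts = [
    {"time": "2023-02-20T10:00:02Z", "tags": ["mitre_persistence"],
     "output_fields": {"container.id": "abc123"}},
    {"time": "2023-02-20T10:00:01Z", "tags": ["container", "mitre_execution"],
     "output_fields": {"container.id": "abc123"}},
    {"time": "2023-02-20T10:00:03Z", "tags": ["mitre_discovery"],
     "output_fields": {"container.id": "def456"}},
]

sequences = tactic_sequences(sample_alerts)
print(sequences)
# {'abc123': ['execution', 'persistence'], 'def456': ['discovery']}
```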
Dataset Challenges
- Alert aggregation: Requires merging alerts from all cluster resources to reconstruct attack steps.
- Data imbalance: Normal alerts vastly outnumber attack alerts, necessitating sampling techniques for balancing.
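The aggregation challenge amounts to merging per-Pod alert streams into one cluster-wide timeline. A minimal sketch, assuming each Pod's alerts are already sorted by a `time` field with a shared timestamp format; the Pod names and rule names are hypothetical:

```python
import heapq

def merge_alert_streams(streams):
    """Merge per-Pod alert lists into a single time-ordered timeline.

    `streams` maps a Pod name to a list of alert dicts, each list sorted
    by "time". Every merged alert gains a "pod" key so its origin is kept.
    """
    tagged = (
        [dict(alert, pod=pod) for alert in alerts]
        for pod, alerts in streams.items()
    )
    # heapq.merge lazily interleaves already-sorted inputs.
    return list(heapq.merge(*tagged, key=lambda a: a["time"]))

# Invented sample streams for two Pods.
streams = {
    "pod-a": [
        {"time": "2023-02-20T10:00:01Z", "rule": "r1"},
        {"time": "2023-02-20T10:00:04Z", "rule": "r3"},
    ],
    "pod-b": [
        {"time": "2023-02-20T10:00:02Z", "rule": "r2"},
    ],
}

timeline = merge_alert_streams(streams)
print([(a["pod"], a["rule"]) for a in timeline])
# [('pod-a', 'r1'), ('pod-b', 'r2'), ('pod-a', 'r3')]
```

Merging on timestamps alone presumes synchronized clocks across nodes; with clock skew, the reconstructed order of attack steps may be wrong.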
Test Environment
- Kubernetes cluster: Deployed across 11 VMs (1 master, 10 workers).
- Hardware: Server equipped with 2× Intel Xeon Gold 5120 CPUs and 128 GB DDR4‑2933 RAM.
Research Applications
- ML prediction models: Predict attacks based on MITRE ATT&CK tactics; future work may extend to technique‑level analysis using NLP methods.
Source
Organization: github
Created: 2/20/2023