
Falco-Alerts-Dataset-with-APT-attacks

We built a Falco alert dataset for Kubernetes that contains both normal and APT attack alerts, to support the training of attack-prediction models and future research. Attack alerts were generated with CALDERA, an adversary simulation platform developed by MITRE, which simulated attacks in a Kubernetes cluster following MITRE ATT&CK tactic sequences. Normal alerts are the routine alerts Falco raised while no attack was running. Every alert is labeled 'attack' or 'normal'. The dataset comprises 231,000 alerts: 2,314 attack alerts and 228,686 normal alerts.

Source

GitHub

Created

Feb 20, 2023

Updated

Apr 4, 2024

Dataset Overview

Dataset Purpose

This dataset provides a collection of Falco alerts for Kubernetes that contains both normal and APT attack alerts, intended for training attack-prediction models and supporting future research.

Dataset Content

Total alerts: 231,000
- Attack alerts: 2,314
- Normal alerts: 228,686
Attack simulation: Conducted using the CALDERA platform, a MITRE‑developed adversary simulation tool, to emulate attacks in a Kubernetes cluster according to MITRE ATT&CK tactic sequences.
Normal alerts: Generated by Falco during routine operation without any attacks.
Labeling: Alerts are labeled as "attack" or "normal".
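
The counts above imply a heavily skewed label distribution. The short calculation below makes the skew explicit; it uses only the figures stated in this description, and the variable names are illustrative.

```python
# Counts as stated in the dataset description.
ATTACK_ALERTS = 2_314
NORMAL_ALERTS = 228_686

total = ATTACK_ALERTS + NORMAL_ALERTS
attack_share = ATTACK_ALERTS / total               # fraction of alerts labeled "attack"
normal_per_attack = NORMAL_ALERTS / ATTACK_ALERTS  # normal alerts per attack alert

print(f"total alerts:      {total:,}")                # 231,000
print(f"attack share:      {attack_share:.2%}")       # 1.00%
print(f"normal per attack: {normal_per_attack:.1f}")  # 98.8
```

Roughly one alert in a hundred is an attack alert, so a model that always answers "normal" would already score about 99% accuracy on the unbalanced data.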

Dataset Processing

Balancing: Normal alerts were down‑sampled and attack alerts up‑sampled to achieve a balanced dataset.
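
The description does not say which sampling method or target class size was used. As a minimal sketch, assuming plain random resampling, the idea could look like this; the `balance` function, the `target_per_class` parameter, and the toy data are hypothetical and not taken from the dataset.

```python
import random


def balance(alerts, target_per_class, seed=0):
    """Randomly down-sample larger classes and up-sample smaller ones.

    `alerts` is a list of (alert_text, label) pairs. `target_per_class`
    is a hypothetical size, not a value taken from the dataset.
    """
    rng = random.Random(seed)
    by_label = {}
    for alert in alerts:
        by_label.setdefault(alert[1], []).append(alert)

    balanced = []
    for group in by_label.values():
        if len(group) >= target_per_class:
            # Down-sample: draw without replacement.
            balanced.extend(rng.sample(group, target_per_class))
        else:
            # Up-sample: keep every original, then draw extras with replacement.
            extras = rng.choices(group, k=target_per_class - len(group))
            balanced.extend(group + extras)
    rng.shuffle(balanced)
    return balanced


# Toy data with a similar kind of skew: 100 normal alerts, 5 attack alerts.
toy = [(f"normal-{i}", "normal") for i in range(100)]
toy += [(f"attack-{i}", "attack") for i in range(5)]

balanced = balance(toy, target_per_class=20)
counts = {
    label: sum(1 for _, lbl in balanced if lbl == label)
    for label in ("normal", "attack")
}
print(counts)  # {'normal': 20, 'attack': 20}
```

Up-sampling by duplication repeats the same attack alerts, so if you resample yourself, split into training and test sets first and resample only the training portion.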

Dataset Structure

Attack Falco alert configuration files: Sample Falco alerts for three simulated attacks.
Collected alerts:
- Raw alerts: Collected from 11 Pods in a test environment.
- Processed and labeled alerts: After cleaning and labeling.
- Final balanced labeled Falco alert files: Post‑balancing dataset.
Simulated attack Falco alerts: Falco alerts recorded during eight simulated attacks across the 11 Pods.
MITRE tactic sequences: Stored as {container ID : MITRE ATT&CK tactics}, extracted from alerts collected per Pod.
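
The {container ID : MITRE ATT&CK tactics} mapping can be illustrated with a small sketch. It assumes Falco's JSON alert layout with `time`, `tags`, and `output_fields["container.id"]`, and assumes tactics appear as rule tags prefixed with `mitre_`. The exact fields in this dataset's files may differ, and the sample alerts are made up.

```python
from collections import defaultdict

MITRE_PREFIX = "mitre_"  # assumed tag convention, e.g. "mitre_execution"


def tactic_sequences(alerts):
    """Map each container ID to its time-ordered list of MITRE tactics."""
    sequences = defaultdict(list)
    # ISO 8601 timestamps in one format and time zone sort correctly as strings.
    for alert in sorted(alerts, key=lambda a: a["time"]):
        container_id = alert["output_fields"].get("container.id")
        if container_id is None:
            continue  # alert not tied to a container
        for tag in alert.get("tags", []):
            if tag.startswith(MITRE_PREFIX):
                sequences[container_id].append(tag[len(MITRE_PREFIX):])
    return dict(sequences)


# Made-up alerts for illustration only; not records from the dataset.
sample_alerts = [
    {"time": "2023-01-01T10:00:05Z", "rule": "rule B",
     "output_fields": {"container.id": "c1"}, "tags": ["mitre_persistence"]},
    {"time": "2023-01-01T10:00:01Z", "rule": "rule A",
     "output_fields": {"container.id": "c1"}, "tags": ["container", "mitre_execution"]},
    {"time": "2023-01-01T10:00:03Z", "rule": "rule C",
     "output_fields": {"container.id": "c2"}, "tags": ["mitre_discovery"]},
]

print(tactic_sequences(sample_alerts))
# {'c1': ['execution', 'persistence'], 'c2': ['discovery']}
```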

Dataset Challenges

Alert aggregation: Requires merging alerts from all cluster resources to reconstruct attack steps.
Data imbalance: Normal alerts vastly outnumber attack alerts, necessitating sampling techniques for balancing.
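
For the aggregation challenge, one possible approach is to merge the per-Pod alert lists into a single cluster-wide timeline. The sketch below assumes each Pod's alerts are already sorted by an ISO 8601 `time` field; the Pod names, rule names, and the `merge_pod_alerts` helper are hypothetical.

```python
import heapq


def merge_pod_alerts(alerts_by_pod):
    """Merge per-Pod alert lists into one cluster-wide, time-ordered list.

    Each per-Pod list must already be sorted by its "time" field.
    """
    # Copy each alert and record which Pod it came from.
    tagged = (
        [dict(alert, pod=pod) for alert in alerts]
        for pod, alerts in alerts_by_pod.items()
    )
    return list(heapq.merge(*tagged, key=lambda a: a["time"]))


# Made-up alerts for illustration only.
alerts_by_pod = {
    "pod-a": [
        {"time": "2023-01-01T10:00:01Z", "rule": "r1"},
        {"time": "2023-01-01T10:00:07Z", "rule": "r3"},
    ],
    "pod-b": [
        {"time": "2023-01-01T10:00:04Z", "rule": "r2"},
    ],
}

timeline = merge_pod_alerts(alerts_by_pod)
print([(a["pod"], a["rule"]) for a in timeline])
# [('pod-a', 'r1'), ('pod-b', 'r2'), ('pod-a', 'r3')]
```

`heapq.merge` streams the already-sorted inputs rather than re-sorting everything, which suits many long per-Pod logs.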

Test Environment

Kubernetes cluster: Deployed across 11 VMs (1 master, 10 workers).
Hardware: Server equipped with 2× Intel Xeon Gold 5120 CPUs and 128 GB DDR4‑2933 RAM.

Research Applications

ML prediction models: Predict attacks based on MITRE ATT&CK tactics; future work may extend to technique‑level analysis using NLP methods.
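
The description does not specify a model. As a deliberately simple baseline, and not the authors' method, a first-order transition-count model can guess the next tactic from the current one. The training sequences below are made up, and `fit_transitions` and `predict_next` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count how often each tactic is immediately followed by each other tactic."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequent follower of `current`, or None if unseen."""
    followers = counts.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up tactic sequences for illustration only.
training = [
    ["initial-access", "execution", "persistence"],
    ["initial-access", "execution", "discovery"],
    ["execution", "persistence", "exfiltration"],
]

model = fit_transitions(training)
print(predict_next(model, "execution"))     # persistence
print(predict_next(model, "exfiltration"))  # None
```

A baseline like this gives a reference point against which a learned sequence model can be compared.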
