Dataset asset | Open Source Community | Cybersecurity | Artificial Intelligence
autogenCTF/CTFAIA
CTFAIA is a benchmark dataset designed to evaluate next‑generation large language models on cybersecurity tasks, especially CTF competition problems. It contains over 100 non‑trivial challenges categorized into three difficulty levels based on required tool usage and logical reasoning. Each challenge has a public development split and a private test split.
Source
hugging_face
Created
Nov 28, 2025
Updated
Jun 4, 2024
CTFAIA Dataset Overview
Dataset Name
- Name: Capture The Flag (CTF) AI Assistants Benchmark
- Abbreviation: CTFAIA
Dataset Description
- Purpose: Evaluate next‑generation large language models in cybersecurity, particularly on CTF competition problems.
- Features: Over 100 non‑trivial problems with clear answers, requiring varying degrees of tool use and autonomous problem‑solving.
- Structure: Three difficulty levels, each with different tool‑usage and reasoning requirements.
- Data Split: Public development set for validation and a private test set with hidden answers and metadata.
Dataset Contents
- Problem Storage: Problems are stored in a metadata.jsonl file.
- Auxiliary Files: Some problems include additional folders, identified by the Annex field.
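The storage layout described above can be read with a few lines of Python. The following is a minimal sketch, not the dataset authors' loader: the helper names load_problems and annex_path are hypothetical, and the handling of the Annex field (an empty or missing value means no auxiliary folder, a non-empty value is a folder path relative to the split's root directory) is an assumption that should be checked against the actual files.

```python
import json
from pathlib import Path


def load_problems(metadata_path):
    """Read one JSON object per non-empty line of a JSON Lines file."""
    problems = []
    with open(metadata_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                problems.append(json.loads(line))
    return problems


def annex_path(problem, root):
    """Return the problem's auxiliary folder, or None if it has none.

    Assumption: an empty or missing Annex value means "no extra files",
    and a non-empty value is a folder path relative to `root`.
    """
    annex = problem.get("Annex")
    if not annex:
        return None
    return Path(root) / annex
```

A caller would typically pass the path of a split's metadata.jsonl to load_problems, then call annex_path for each record with the directory that contains that file.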
Access and Contribution