Dataset asset | Open Source Community | Cybersecurity | Artificial Intelligence

autogenCTF/CTFAIA

CTFAIA is a benchmark dataset for evaluating next‑generation large language models on cybersecurity tasks, especially Capture The Flag (CTF) competition problems. It contains over 100 non‑trivial challenges, grouped into three difficulty levels according to how much tool use and logical reasoning they require. The challenges are divided into a public development split and a private test split.

Source

Hugging Face

Created

Nov 28, 2025

Updated

Jun 4, 2024



CTFAIA Dataset Overview

Dataset Name

Name: Capture The Flag (CTF) AI Assistants Benchmark
Abbreviation: CTFAIA

Dataset Description

Purpose: Evaluate next‑generation large language models in cybersecurity, particularly on CTF competition problems.
Features: Over 100 non‑trivial problems with clear answers, requiring varying degrees of tool use and autonomous problem‑solving.
Structure: Three difficulty levels, each with different tool‑usage and reasoning requirements.
Data Split: A public development set for validation and a private test set whose answers and metadata are withheld.
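Because every problem has a clear answer, results on the public development set can be checked automatically. The sketch below assumes that scoring is an exact match after trimming surrounding whitespace and that answers are keyed by a task identifier. Both are assumptions rather than the official CTFAIA scoring rule. Case is deliberately preserved, since CTF flags are often case-sensitive.

```python
def normalize(answer: str) -> str:
    # Assumed normalization: trim surrounding whitespace only (flags stay case-sensitive).
    return answer.strip()


def accuracy(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of gold tasks whose predicted answer matches after normalization.

    Task IDs missing from `predictions` count as wrong. The dict-by-task-ID
    layout is a hypothetical convention, not part of the dataset card.
    """
    if not gold:
        return 0.0
    correct = sum(
        1
        for task_id, answer in gold.items()
        if normalize(predictions.get(task_id, "")) == normalize(answer)
    )
    return correct / len(gold)


gold = {"t1": "flag{abc}", "t2": "flag{xyz}"}
preds = {"t1": " flag{abc}\n", "t2": "flag{nope}"}
print(accuracy(preds, gold))  # 0.5
```

Only the development split can be scored this way locally, because the test split's answers are kept private.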

Dataset Contents

Problem Storage: Problems are stored in a metadata.jsonl file.
Auxiliary Files: Some problems come with an additional folder of files, which is identified by the problem's Annex field.
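Because metadata.jsonl is in JSON Lines format (one JSON object per line), it can be read with the standard library alone. The following is a minimal sketch that parses the records and selects the problems that have auxiliary files. It assumes that the Annex field is empty or absent when a problem has no extra folder. The `task_id` key used in the example is hypothetical.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts, skipping blank lines."""
    problems = []
    for line in lines:
        line = line.strip()
        if line:
            problems.append(json.loads(line))
    return problems


def load_problems(path):
    """Read every problem record from a metadata.jsonl file."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)


def problems_with_annex(problems):
    # Assumption: a non-empty Annex value means the problem ships an auxiliary folder.
    return [p for p in problems if p.get("Annex")]


sample = [
    '{"task_id": "a", "Annex": "a_files"}\n',  # hypothetical record with an annex folder
    "\n",
    '{"task_id": "b", "Annex": ""}\n',
    '{"task_id": "c"}',
]
problems = parse_jsonl(sample)
print(len(problems), [p["task_id"] for p in problems_with_annex(problems)])  # 3 ['a']
```

Accepting any iterable of lines keeps the parser easy to test. `load_problems` passes an open file handle to it, which yields the file one line at a time.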

Access and Contribution

Leaderboard: See the CTFAIA leaderboard.
Dataset Link: The dataset is hosted on Hugging Face as autogenCTF/CTFAIA.
