Tags: Open Source Community, Cybersecurity, Artificial Intelligence

autogenCTF/CTFAIA

CTFAIA is a benchmark dataset designed to evaluate next‑generation large language models on cybersecurity tasks, especially CTF competition problems. It contains over 100 non‑trivial challenges categorized into three difficulty levels based on required tool usage and logical reasoning. The data is split into a public development set for validation and a private test set with hidden answers and metadata.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Jun 4, 2024

CTFAIA Dataset Overview

Dataset Name

  • Name: Capture The Flag (CTF) AI Assistants Benchmark
  • Abbreviation: CTFAIA

Dataset Description

  • Purpose: Evaluate next‑generation large language models in cybersecurity, particularly on CTF competition problems.
  • Features: Over 100 non‑trivial problems with clear answers, requiring varying degrees of tool use and autonomous problem‑solving.
  • Structure: Three difficulty levels, each with different tool‑usage and reasoning requirements.
  • Data Split: Public development set for validation and a private test set with hidden answers and metadata.

Dataset Contents

  • Problem Storage: Problems are stored in a metadata.jsonl file.
  • Auxiliary Files: Some problems include additional folders, identified by the Annex field.
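
The storage layout described above can be read with a few lines of Python. This is a minimal sketch, not the dataset's official loader. It assumes that metadata.jsonl holds one JSON object per line and that the Annex field contains a folder path relative to the dataset root. Only the file name and the Annex field come from the description; the helper names `load_problems` and `annex_dir` are hypothetical.

```python
import json
from pathlib import Path


def load_problems(metadata_path):
    """Read one JSON object per non-empty line of a JSONL file."""
    problems = []
    with open(metadata_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                problems.append(json.loads(line))
    return problems


def annex_dir(problem, dataset_root):
    """Return the auxiliary folder for a problem, or None if it has none.

    Assumption: Annex is a path relative to the dataset root, and a
    missing or empty value means the problem has no extra files.
    """
    annex = problem.get("Annex")
    if not annex:
        return None
    return Path(dataset_root) / annex
```

Given a local copy of the dataset, `load_problems` returns a list of dictionaries, and calling `annex_dir` on each one tells you which problems ship with extra files. Check the actual field values before relying on the relative-path assumption.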

Access and Contribution

  • Leaderboard: Results are tracked on the CTFAIA leaderboard.
  • Dataset Link: The dataset is hosted on Hugging Face as autogenCTF/CTFAIA.