autogenCTF/CTFAIA
CTFAIA is a benchmark dataset designed to evaluate next‑generation large language models on cybersecurity tasks, especially CTF competition problems. It contains over 100 non‑trivial challenges categorized into three difficulty levels based on required tool usage and logical reasoning. Each challenge has a public development split and a private test split.
Updated 6/4/2024
Description
CTFAIA Dataset Overview
Dataset Name
- Name: Capture The Flag (CTF) AI Assistants Benchmark
- Abbreviation: CTFAIA
Dataset Description
- Purpose: Evaluate next‑generation large language models in cybersecurity, particularly on CTF competition problems.
- Features: Over 100 non‑trivial problems with clear answers, requiring varying degrees of tool use and autonomous problem‑solving.
- Structure: Three difficulty levels, each with different tool‑usage and reasoning requirements.
- Data Split: Public development set for validation and a private test set with hidden answers and metadata.
Dataset Contents
- Problem Storage: Problems are stored in a metadata.jsonl file.
- Auxiliary Files: Some problems include additional folders, identified by the Annex field.
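The storage layout described above can be read with a few lines of Python. This is a minimal sketch, not the dataset's official loader: it assumes metadata.jsonl follows the usual JSON Lines convention (one JSON object per line) and that the Annex field holds a folder path relative to the dataset root. The helper names load_problems and annex_dir are hypothetical, and no fields other than Annex are assumed.

```python
import json
from pathlib import Path


def load_problems(metadata_path):
    """Read one JSON object per non-empty line of a JSON Lines file."""
    problems = []
    with open(metadata_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                problems.append(json.loads(line))
    return problems


def annex_dir(problem, dataset_root):
    """Return the auxiliary folder for a problem, or None if it has none.

    Assumption: Annex is a path relative to the dataset root; an empty or
    missing value means the problem ships without auxiliary files.
    """
    annex = problem.get("Annex")
    if not annex:
        return None
    return Path(dataset_root) / annex
```

Calling load_problems on the path to metadata.jsonl and then annex_dir on each record gives the folder to inspect for that problem's attachments, if any. The actual schema should be checked against the downloaded files before relying on this.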
Access and Contribution
Topics
Cybersecurity
Artificial Intelligence
Source
Organization: hugging_face
Created: Unknown