autogenCTF/CTFAIA

CTFAIA is a benchmark dataset for evaluating next‑generation large language models on cybersecurity tasks, especially CTF competition problems. It contains over 100 non‑trivial challenges grouped into three difficulty levels according to the tool usage and logical reasoning they require. The data is divided into a public development split and a private test split.

Updated 6/4/2024
hugging_face

Description

CTFAIA Dataset Overview

Dataset Name

  • Name: Capture The Flag (CTF) AI Assistants Benchmark
  • Abbreviation: CTFAIA

Dataset Description

  • Purpose: Evaluate next‑generation large language models in cybersecurity, particularly on CTF competition problems.
  • Features: Over 100 non‑trivial problems with clear answers, requiring varying degrees of tool use and autonomous problem‑solving.
  • Structure: Three difficulty levels, each with different tool‑usage and reasoning requirements.
  • Data Split: Public development set for validation and a private test set with hidden answers and metadata.
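
Because the problems have clear answers and are grouped by difficulty, validation on the public development set can be sketched as exact-match accuracy per level. This is only an illustrative sketch, not the benchmark's official scorer: the record keys "id", "level", and "answer" are hypothetical names chosen for the example, and exact string matching is an assumed comparison rule.

```python
from collections import defaultdict


def accuracy_by_level(problems, predictions):
    """Exact-match accuracy per difficulty level.

    problems: list of dicts with hypothetical keys "id", "level", "answer".
    predictions: dict mapping a problem id to the model's answer string.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for problem in problems:
        level = problem["level"]
        total[level] += 1
        # A missing prediction counts as a wrong answer.
        predicted = predictions.get(problem["id"], "")
        if predicted.strip() == problem["answer"].strip():
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}
```

Reporting one score per level, rather than a single aggregate, keeps the harder problems from being hidden behind results on the easier ones.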

Dataset Contents

  • Problem Storage: Problems are stored in a metadata.jsonl file.
  • Auxiliary Files: Some problems include additional folders, identified by the Annex field.
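
A minimal sketch of reading this layout, assuming metadata.jsonl holds one JSON object per line and that the Annex field contains a folder path relative to the dataset directory (the listing does not state the field's exact format, so treat that as an assumption). The function names are hypothetical helpers, not part of any published loader.

```python
import json
from pathlib import Path


def load_problems(dataset_dir):
    """Read metadata.jsonl (one JSON object per line) into a list of dicts."""
    problems = []
    metadata_path = Path(dataset_dir) / "metadata.jsonl"
    with metadata_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                problems.append(json.loads(line))
    return problems


def annex_dir(dataset_dir, problem):
    """Return the auxiliary folder for a problem, or None if it has none.

    Assumption: "Annex" is a folder path relative to dataset_dir.
    """
    annex = problem.get("Annex")
    if not annex:
        return None
    return Path(dataset_dir) / annex
```

A caller would iterate over `load_problems(...)` and pass each record to `annex_dir(...)` to find any extra files the problem needs.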

Access and Contribution

  • Leaderboard: View the CTFAIA leaderboard here.
  • Dataset Link: Access the dataset here.

Topics

Cybersecurity
Artificial Intelligence

Source

Organization: hugging_face

Created: Unknown
