AI4Sec/cti-bench

CTIBench is a benchmark suite and dataset for evaluating large language models (LLMs) on cyber‑threat intelligence (CTI) tasks. Its tasks include multiple‑choice questions (CTI‑MCQ), vulnerability classification (CTI‑RCM), vulnerability scoring (CTI‑VSP), and threat‑report analysis (CTI‑TAA). Each task is provided as a TSV file of prompts, and most files also include the ground‑truth answer. The data were curated by Md Tanvirul Alam and Dipkamal Bhusal, drawing on authoritative sources such as NIST, MITRE, and GDPR.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Aug 17, 2024

Dataset Card: CTIBench

Dataset Overview

CTIBench is a suite of benchmark tasks and datasets for assessing LLMs on cyber‑threat intelligence (CTI) tasks.

Dataset Details

Dataset Description

CTIBench is a comprehensive benchmark suite designed to evaluate LLM performance in the CTI domain.

Components:

  • CTI‑MCQ: A knowledge‑assessment dataset of multiple‑choice questions evaluating LLM understanding of CTI standards, threats, detection strategies, mitigation plans, and best practices. Built from authoritative sources including NIST, MITRE, and GDPR.
  • CTI‑RCM: A practical task mapping Common Vulnerabilities and Exposures (CVE) descriptions to Common Weakness Enumeration (CWE) categories, testing LLM capability to understand and classify cyber threats.
  • CTI‑VSP: A task requiring the calculation of Common Vulnerability Scoring System (CVSS) scores, assessing LLM ability to evaluate vulnerability severity.
  • CTI‑TAA: A task that requires analyzing public threat reports and attributing them to specific threat actors or malware families, testing how well an LLM understands historical cyber‑threat behavior and identifies meaningful correlations.
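
To make the CTI‑VSP task concrete, the following is a minimal sketch of how a CVSS base score can be computed from a vector string. It assumes the CVSS v3.1 base-score equations and metric weights published by FIRST; the dataset card does not state which CVSS version the task uses, and the function names `roundup` and `base_score` are illustrative, not part of CTIBench.

```python
# Metric weights from the CVSS v3.1 specification (assumed version).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    parts = dict(item.split(":") for item in vector.split("/"))
    scope_changed = parts["S"] == "C"
    pr = (PR_CHANGED if scope_changed else PR_UNCHANGED)[parts["PR"]]

    # Impact sub-score.
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = 8.22 * AV[parts["AV"]] * AC[parts["AC"]] * pr * UI[parts["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N"))  # 6.1
```

A model's answer on this task could be checked by running its predicted vector through such a function and comparing the result with the score derived from the ground truth; the repository's own evaluation scripts remain the reference for how CTIBench actually scores responses.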

Dataset Source

Repository: https://github.com/xashru/cti-bench

Dataset Structure

The dataset consists of 5 TSV files covering the tasks described above. Each TSV includes a "Prompt" column containing the question posed to the LLM. Every file except "cti-taa.tsv" also contains a "GT" column with the ground‑truth answer. Evaluation scripts for each task are available in the associated GitHub repository.
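
As a rough illustration of this layout, the sketch below reads a tab-separated file with "Prompt" and "GT" columns and computes a simple exact-match accuracy. The two sample rows, the helper names `load_task` and `exact_match_accuracy`, and the case-insensitive comparison are all assumptions made for the example; the evaluation scripts in the GitHub repository define the actual scoring.

```python
import csv
import io


def load_task(handle):
    """Read a task TSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def exact_match_accuracy(rows, predictions):
    """Fraction of rows whose GT matches the prediction, ignoring case and
    surrounding whitespace. Rows without a GT value are skipped; returns
    None when no row can be scored (e.g. a file with no GT column)."""
    scored = [(row["GT"], pred) for row, pred in zip(rows, predictions) if row.get("GT")]
    if not scored:
        return None
    hits = sum(gt.strip().lower() == pred.strip().lower() for gt, pred in scored)
    return hits / len(scored)


# Hypothetical two-row sample standing in for a real task file.
SAMPLE_TSV = "Prompt\tGT\nHypothetical question one\tA\nHypothetical question two\tC\n"

rows = load_task(io.StringIO(SAMPLE_TSV))
print(exact_match_accuracy(rows, ["a", "B"]))  # 0.5
```

For a real file, `load_task` would be called on a handle opened with `open(path, newline="", encoding="utf-8")`. Prompts that span multiple lines or contain quote characters may need different `csv` quoting settings, which this sketch does not attempt to guess.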

Dataset Creation

Rationale

The dataset was created to benchmark LLMs' ability to understand and analyze various aspects of open‑source CTI.

Source Data

URLs indicating the origins of the collected data are included in the dataset.

Personal and Sensitive Information

The dataset contains no personal or sensitive information.

Citation

Paper link: https://arxiv.org/abs/2406.07599

BibTeX:

@misc{alam2024ctibench,
      title={CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence}, 
      author={Md Tanvirul Alam and Dipkamal Bhusal and Le Nguyen and Nidhi Rastogi},
      year={2024},
      eprint={2406.07599},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

Contact

Md Tanvirul Alam (ma8235@rit.edu)
