AI4Sec/cti-bench
CTIBench is a benchmark suite and dataset designed to evaluate large language models (LLMs) on cyber‑threat intelligence (CTI) tasks. The dataset includes multiple‑choice questions (CTI‑MCQ), vulnerability classification (CTI‑RCM), vulnerability scoring (CTI‑VSP), and threat‑report analysis (CTI‑TAA). Each task is provided as a TSV file containing prompts and, for most tasks, the ground‑truth answer. The data were curated by Md Tanvirul Alam and Dipkamal Bhusal from authoritative sources such as NIST, MITRE, and GDPR.
Description
Dataset Card: CTIBench
Dataset Overview
CTIBench is a suite of benchmark tasks and datasets for assessing LLMs on cyber‑threat intelligence (CTI) tasks.
Dataset Details
Dataset Description
CTIBench is a comprehensive benchmark suite designed to evaluate LLM performance in the CTI domain.
Components:
- CTI‑MCQ: A knowledge‑assessment dataset of multiple‑choice questions evaluating LLM understanding of CTI standards, threats, detection strategies, mitigation plans, and best practices. Built from authoritative sources including NIST, MITRE, and GDPR.
- CTI‑RCM: A practical task mapping Common Vulnerabilities and Exposures (CVE) descriptions to Common Weakness Enumeration (CWE) categories, testing LLM capability to understand and classify cyber threats.
- CTI‑VSP: A task requiring the calculation of Common Vulnerability Scoring System (CVSS) scores, assessing LLM ability to evaluate vulnerability severity.
- CTI‑TAA: A task involving analysis of public threat reports and their attribution to specific threat actors or malware families, testing LLM ability to understand historical cyber‑threat behavior and identify meaningful correlations.
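To make the CTI‑VSP task more concrete, the sketch below computes a CVSS base score from a vector string. It assumes CVSS v3.1 and uses the metric weights and rounding rule from the public CVSS v3.1 specification; the card itself does not state which CVSS version or output format the benchmark expects, so treat this as an illustration of the scoring arithmetic rather than the benchmark's own evaluation code. The function names `roundup` and `base_score` are chosen here for illustration.

```python
import math

# Metric weights from the public CVSS v3.1 specification (assumed version).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, following the v3.1 integer-based rule."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a vector string."""
    # Drop the optional "CVSS:3.1" prefix and map metric names to values.
    metrics = dict(
        part.split(":") for part in vector.split("/") if not part.startswith("CVSS:")
    )
    scope_changed = metrics["S"] == "C"
    pr_table = PR_CHANGED if scope_changed else PR_UNCHANGED

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0

    # Exploitability sub-score.
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * pr_table[metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N"))  # 6.1
```

The rounding helper works on scaled integers because plain floating-point rounding can push a score across a one-decimal boundary.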
Dataset Source
Repository: https://github.com/xashru/cti-bench
Dataset Structure
The dataset consists of 5 TSV files, each corresponding to a different task. Each TSV includes a "Prompt" column containing the question posed to the LLM. All files except "cti-taa.tsv" also contain a "GT" column with the ground‑truth answer. Evaluation scripts for each task are available in the associated GitHub repository.
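As a minimal sketch of how a task file with this layout could be read, the snippet below parses tab-separated text with Python's standard `csv` module and separates the "Prompt" and "GT" columns. The sample rows are made-up placeholders, not records from the dataset, and the helper names `read_task_rows` and `split_prompts_and_answers` are hypothetical. It also assumes the files follow default CSV quoting conventions, which should be checked against the actual data.

```python
import csv
import io


def read_task_rows(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def split_prompts_and_answers(rows):
    """Return (prompts, answers); answers is None when there is no GT column."""
    prompts = [row["Prompt"] for row in rows]
    has_gt = bool(rows) and "GT" in rows[0]
    answers = [row["GT"] for row in rows] if has_gt else None
    return prompts, answers


# Placeholder content for illustration only; not taken from CTIBench.
SAMPLE_WITH_GT = "Prompt\tGT\nExample question one?\tA\nExample question two?\tC\n"
SAMPLE_WITHOUT_GT = "Prompt\nExample threat report text.\n"

prompts, answers = split_prompts_and_answers(read_task_rows(SAMPLE_WITH_GT))
print(len(prompts), answers)

prompts_only, no_answers = split_prompts_and_answers(read_task_rows(SAMPLE_WITHOUT_GT))
print(len(prompts_only), no_answers)
```

For a downloaded file, the same helpers could be applied to the file's contents, for example `read_task_rows(open("cti-taa.tsv", encoding="utf-8").read())`. Returning `None` for the answers makes the missing "GT" column in "cti-taa.tsv" explicit to the caller.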
Dataset Creation
Rationale
The dataset was created to benchmark LLMs' ability to understand and analyze various aspects of open‑source CTI.
Source Data
URLs indicating the origins of the collected data are included in the dataset.
Personal and Sensitive Information
The dataset contains no personal or sensitive information.
Citation
Paper link: https://arxiv.org/abs/2406.07599
BibTeX:
@misc{alam2024ctibench,
title={CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence},
author={Md Tanvirul Alam and Dipkamal Bhusal and Le Nguyen and Nidhi Rastogi},
year={2024},
eprint={2406.07599},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
Contact
Md Tanvirul Alam (ma8235@rit.edu)
Source
Organization: hugging_face