AI4Sec/cti-bench

CTIBench is a benchmark suite and dataset for evaluating large language models (LLMs) on cyber‑threat intelligence (CTI) tasks. It includes multiple tasks, among them multiple‑choice questions (CTI‑MCQ), vulnerability classification (CTI‑RCM), vulnerability scoring (CTI‑VSP), and threat‑report analysis (CTI‑TAA). Each task is provided as a TSV file containing prompts and, for most tasks, the ground‑truth answer. The data were curated by Md Tanvirul Alam and Dipkamal Bhusal from authoritative sources such as NIST, MITRE, and GDPR.

Updated 8/17/2024
hugging_face

Description

Dataset Card: CTIBench

Dataset Overview

CTIBench is a suite of benchmark tasks and datasets for assessing LLMs on cyber‑threat intelligence (CTI) tasks.

Dataset Details

Dataset Description

CTIBench is a comprehensive benchmark suite designed to evaluate LLM performance in the CTI domain.

Components:

  • CTI‑MCQ: A knowledge‑assessment dataset of multiple‑choice questions evaluating LLM understanding of CTI standards, threats, detection strategies, mitigation plans, and best practices. Built from authoritative sources including NIST, MITRE, and GDPR.
  • CTI‑RCM: A practical task mapping Common Vulnerabilities and Exposures (CVE) descriptions to Common Weakness Enumeration (CWE) categories, testing LLM capability to understand and classify cyber threats.
  • CTI‑VSP: A task requiring the calculation of Common Vulnerability Scoring System (CVSS) scores, assessing LLM ability to evaluate vulnerability severity.
  • CTI‑TAA: A task involving analysis of public threat reports and their attribution to specific threat actors or malware families, testing an LLM's ability to understand historical cyber‑threat behavior and identify meaningful correlations.
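
To make the CTI‑VSP task concrete, the following is a minimal sketch of how a CVSS base score is derived from a vector string. It assumes CVSS v3.1 and uses the metric weights and rounding rule published in the FIRST specification; the dataset card does not state which CVSS version the benchmark uses, so treat the version as an assumption. The function names `roundup` and `base_score` are illustrative and are not taken from the CTIBench repository.

```python
import math

# CVSS v3.1 base-metric weights (FIRST specification).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed (True) or not (False).
PR_WEIGHTS = {
    False: {"N": 0.85, "L": 0.62, "H": 0.27},
    True: {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value):
    """Round up to one decimal place using the v3.1 integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    # Skip the 'CVSS:3.1' prefix and map each metric to its value.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    changed = metrics["S"] == "C"

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[changed][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))
```

For example, `base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")` evaluates to 9.8. A model answering a CTI‑VSP prompt could be checked by comparing the score derived from its predicted vector against the score derived from the ground truth, although the official evaluation procedure is the one in the GitHub repository.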

Dataset Source

Repository: https://github.com/xashru/cti-bench

Dataset Structure

The dataset consists of 5 TSV files, each corresponding to a different task. Every file includes a "Prompt" column that poses the question to the LLM. All files except "cti‑taa.tsv" also contain a "GT" column with the ground‑truth answer. Evaluation scripts for each task are available in the associated GitHub repository.
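
The file layout described above can be read with Python's standard `csv` module. The sketch below is illustrative only: the "Prompt" and "GT" column names come from this card, but the sample rows are made up, and the helper names `read_task` and `exact_match_accuracy` are hypothetical. It is not the repository's evaluation script, and real tasks may need answer extraction from free-form model output before comparison.

```python
import csv
import io


def read_task(handle):
    """Read a tab-separated task file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def exact_match_accuracy(rows, predictions):
    """Fraction of predictions equal to the 'GT' value, ignoring surrounding whitespace.

    Rows without a 'GT' value (for example, files with no ground-truth column)
    are skipped; returns None when nothing can be scored.
    """
    scored = [
        (row["GT"].strip(), prediction.strip())
        for row, prediction in zip(rows, predictions)
        if row.get("GT")
    ]
    if not scored:
        return None
    return sum(truth == guess for truth, guess in scored) / len(scored)


# Made-up sample in the documented layout; not real benchmark data.
SAMPLE = (
    "Prompt\tGT\n"
    "Map this CVE description to a CWE: script injection in a web form.\tCWE-79\n"
    "Map this CVE description to a CWE: unsanitized input in a SQL query.\tCWE-89\n"
)

rows = read_task(io.StringIO(SAMPLE))
accuracy = exact_match_accuracy(rows, ["CWE-79", "CWE-20"])
```

With a downloaded file, the same helper would be called as `read_task(open(path, newline="", encoding="utf-8"))`, where `path` points to one of the task TSV files.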

Dataset Creation

Rationale

The dataset was created to benchmark LLMs' ability to understand and analyze various aspects of open‑source CTI.

Source Data

URLs indicating the origins of the collected data are included in the dataset.

Personal and Sensitive Information

The dataset contains no personal or sensitive information.

Citation

Paper link: https://arxiv.org/abs/2406.07599

BibTeX:

@misc{alam2024ctibench,
      title={CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence}, 
      author={Md Tanvirul Alam and Dipkamal Bhusal and Le Nguyen and Nidhi Rastogi},
      year={2024},
      eprint={2406.07599},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

Contact

Md Tanvirul Alam (ma8235@rit.edu)

Topics

Cyber Threat Intelligence
Large Language Model Evaluation

Source

Organization: hugging_face

Created: Unknown
