pAILabs/base-security-qa
This foundational dataset is a collection of question‑answer (Q‑A) pairs focused on the cybersecurity domain, primarily threat hunting, threat intelligence, and malware content. It is about 10% the size of the main dataset, and its answers are more concise, roughly half the length of those in the main dataset. The Q‑A pairs were generated from 2023–2024 data and selected semi‑randomly. The main dataset (not yet released) is expected to contain about 75,000–80,000 Q‑A pairs on launch day, covering data from 2020 to the present, with approximately 500 new pairs added weekly; its answers are more detailed than those in the foundational dataset.
Description
Dataset Overview
Basic Information
- License: Apache-2.0
- Task Type: Question Answering (question-answering)
- Language: English (en)
- Tags: Infosec, Security, Cybersecurity
- Size Category: 1K<n<10K
Dataset Content
- Topic: Focused on cybersecurity, especially threat hunting, threat intelligence, and malware content.
- Foundational Dataset Characteristics:
  - Answers are about half the length of those in the main dataset.
  - Size is about 10% of the main dataset.
  - Q‑A pairs are generated from 2023 and 2024 data.
  - Selection process is semi‑random.
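As a rough consistency check, the 10% size ratio can be applied to the main dataset's stated launch-day figure of 75,000–80,000 Q‑A pairs. This is an illustrative sketch only: the helper names `implied_foundational_size` and `in_size_category`, and the choice of the launch-day range as the reference size, are assumptions and not part of any tooling shipped with the dataset.

```python
# Illustrative sketch (hypothetical helpers): applies the stated ~10% size ratio
# to the main dataset's expected launch-day range of 75,000-80,000 Q-A pairs.

def implied_foundational_size(main_low: int = 75_000,
                              main_high: int = 80_000,
                              percent: int = 10) -> tuple[int, int]:
    """Return the (low, high) size implied by `percent` of the main dataset."""
    # Integer arithmetic keeps the result exact (no floating-point rounding).
    return main_low * percent // 100, main_high * percent // 100


def in_size_category(n: int, lower: int = 1_000, upper: int = 10_000) -> bool:
    """Check the size category 1K < n < 10K listed under Basic Information."""
    return lower < n < upper


if __name__ == "__main__":
    low, high = implied_foundational_size()
    print(f"Implied foundational size: {low:,}-{high:,} Q-A pairs")
    print("Within 1K<n<10K:", in_size_category(low) and in_size_category(high))
```

Running it gives an implied range of 7,500–8,000 pairs, which sits inside the listed 1K<n<10K size category.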
Main Dataset (Unreleased)
- Scale: Approximately 75,000 to 80,000 Q‑A pairs on launch day.
- Temporal Span: From 2020 to present (4 years).
- Update Frequency: About 500 new Q‑A pairs added weekly.
- Answer Length: Answers are twice as long as those in the foundational dataset.
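Because the main dataset is expected to grow by about 500 Q‑A pairs per week, its size after a given number of weeks can be estimated linearly. This is a back-of-the-envelope sketch that assumes a constant weekly rate starting from the 75,000–80,000 launch-day range; the helper `projected_main_size` is hypothetical, and actual counts will vary.

```python
# Hypothetical linear projection of the main dataset's size, assuming a
# constant ~500 new Q-A pairs per week after a 75,000-80,000 pair launch.

def projected_main_size(weeks: int,
                        launch_low: int = 75_000,
                        launch_high: int = 80_000,
                        weekly: int = 500) -> tuple[int, int]:
    """Return the (low, high) estimated number of Q-A pairs after `weeks` weeks."""
    if weeks < 0:
        raise ValueError("weeks must be non-negative")
    added = weekly * weeks
    return launch_low + added, launch_high + added


if __name__ == "__main__":
    for weeks in (0, 26, 52):
        low, high = projected_main_size(weeks)
        print(f"After {weeks:>2} weeks: {low:,}-{high:,} Q-A pairs")
```

Under this assumption, one year (52 weeks) after launch the main dataset would hold roughly 101,000–106,000 Q‑A pairs.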
Source
Organization: hugging_face
Created: Unknown