JailbreakV-28K/JailBreakV-28k
JailBreakV_28K is a benchmark dataset for evaluating the robustness of multimodal large language models (MLLMs) against jailbreak attacks. It contains 28,000 jailbreak text‑image pairs, including 20,000 text‑based LLM transfer jailbreak attacks and 8,000 image‑based MLLM jailbreak attacks, covering 16 security policies and 5 different jailbreak methods. Additionally, the RedTeam_2K dataset provides 2,000 harmful queries to identify alignment vulnerabilities in LLMs and MLLMs, encompassing 16 security policies and 8 data sources.
Description
Dataset Overview
Dataset Name
- JailBreakV_28K
License
- MIT
Task Categories
- Text Generation
- Question Answering
Dataset Size
- 10K < n < 100K
Configurations
- config_name: JailBreakV_28K
  - split: JailBreakV_28K, path: JailBreakV_28K/JailBreakV_28K.csv
  - split: mini_JailBreakV_28K, path: JailBreakV_28K/mini_JailBreakV_28K.csv
- config_name: RedTeam_2K
  - split: RedTeam_2K, path: JailBreakV_28K/RedTeam_2K.csv
Dataset Details
JailBreakV_28K and mini_JailBreakV_28K Columns
- id: Unique identifier
- jailbreak_query: Jailbreak query generated by various jailbreak attacks
- redteam_query: Harmful query from RedTeam_2K
- format: Jailbreak attack method, including template, persuasion, logic, figstep, query‑related
- policy: Security policy violated by the red‑team query
- image_path: Image file path
- from: Data source
- selected_mini: "True" if the data is in the mini_JailBreakV_28K set, otherwise "False"
- transfer_from_llm: "True" if the jailbreak query is transferred from an LLM jailbreak attack, otherwise "False"
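As a minimal sketch of how the selected_mini and transfer_from_llm flag columns can be used, the snippet below filters rows parsed from a CSV with the documented schema. The sample rows are hypothetical placeholders, not real dataset entries:

```python
import csv
import io

# Hypothetical sample rows mimicking the JailBreakV_28K column schema (not real data)
sample_csv = """id,jailbreak_query,redteam_query,format,policy,image_path,from,selected_mini,transfer_from_llm
0,<query a>,<seed a>,template,Illegal Activity,images/0.png,source_a,True,True
1,<query b>,<seed b>,figstep,Hate Speech,images/1.png,source_b,False,False
2,<query c>,<seed c>,persuasion,Malware,images/2.png,source_a,True,True
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# The flags are string-valued ("True"/"False"), so compare against the string
mini_rows = [r for r in rows if r["selected_mini"] == "True"]
transfer_rows = [r for r in rows if r["transfer_from_llm"] == "True"]

print(len(mini_rows), len(transfer_rows))  # 2 2
```

The same string comparison applies when working with the real CSV splits, since the flags are stored as the literal strings "True" and "False" rather than booleans.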
RedTeam_2K Columns
- id: Unique identifier
- question: Harmful query
- policy: Security policy violated by the red‑team query
- from: Data source
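To inspect how queries distribute over the 16 security policies, the policy column can be tallied with a Counter. The rows below are again illustrative placeholders following the RedTeam_2K schema:

```python
from collections import Counter

# Hypothetical RedTeam_2K-style rows (id, question, policy, from); not real data
rows = [
    {"id": 0, "question": "q0", "policy": "Illegal Activity", "from": "source_a"},
    {"id": 1, "question": "q1", "policy": "Malware", "from": "source_b"},
    {"id": 2, "question": "q2", "policy": "Illegal Activity", "from": "source_a"},
]

# Count queries per violated policy
per_policy = Counter(r["policy"] for r in rows)
print(per_policy["Illegal Activity"])  # 2
```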
Data Composition
RedTeam_2K
- Description: Curated dataset of 2,000 harmful queries designed to identify alignment vulnerabilities in LLMs and MLLMs. Covers 16 security policies and includes queries from 8 different sources.
JailBreakV_28K
- Description: Dataset containing 28,000 jailbreak text‑image pairs, including 20,000 text‑based LLM transfer jailbreak attacks and 8,000 image‑based MLLM jailbreak attacks. Covers 16 security policies and 5 different jailbreak methods.
Dataset Overview
- RedTeam_2K: 2,000 harmful queries covering 16 security policies from 8 sources.
- JailBreakV_28K: 28,000 jailbreak text‑image pairs covering 16 security policies and 5 jailbreak methods.
Usage Example
from datasets import load_dataset
# mini_JailBreakV_28K is a sampled split within the JailBreakV_28K config
mini_JailBreakV_28K = load_dataset("JailbreakV-28K/JailBreakV-28k", "JailBreakV_28K")["mini_JailBreakV_28K"]
# Full 28K split, and the RedTeam_2K config with its single split
JailBreakV_28K = load_dataset("JailbreakV-28K/JailBreakV-28k", "JailBreakV_28K")["JailBreakV_28K"]
RedTeam_2K = load_dataset("JailbreakV-28K/JailBreakV-28k", "RedTeam_2K")["RedTeam_2K"]
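Once loaded, each split behaves like a sequence of row dicts with the columns listed above. Selecting rows by attack method (here, figstep) can be sketched with a plain list, used below as a stand-in for a loaded split so the example is self-contained; the row values are hypothetical:

```python
# Stand-in for a loaded split: a list of dicts with the documented columns
# (values are hypothetical; a real split comes from load_dataset as shown above)
split = [
    {"id": 0, "format": "template", "policy": "Fraud"},
    {"id": 1, "format": "figstep", "policy": "Malware"},
    {"id": 2, "format": "figstep", "policy": "Fraud"},
]

figstep = [row for row in split if row["format"] == "figstep"]
print([row["id"] for row in figstep])  # [1, 2]
```

On a real Hugging Face Dataset object, the equivalent selection can be done with dataset.filter(lambda row: row["format"] == "figstep").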
Source
Organization: hugging_face