Dataset asset · Open Source Community · Multimodal Language Models · Security Evaluation

JailbreakV-28K/JailBreakV-28k

JailBreakV_28K is a benchmark dataset for evaluating the robustness of multimodal large language models (MLLMs) against jailbreak attacks. It contains 28,000 jailbreak text‑image pairs, including 20,000 text‑based LLM transfer jailbreak attacks and 8,000 image‑based MLLM jailbreak attacks, covering 16 security policies and 5 different jailbreak methods. Additionally, the RedTeam_2K dataset provides 2,000 harmful queries to identify alignment vulnerabilities in LLMs and MLLMs, encompassing 16 security policies and 8 data sources.

Source
Hugging Face
Created
Nov 28, 2025
Updated
Jul 10, 2024
Overview

Dataset description and usage context

Dataset Overview

Dataset Name

  • JailBreakV_28K

License

  • MIT

Task Categories

  • Text Generation
  • Question Answering

Dataset Size

  • 10K < n < 100K

Configurations

  • config_name: JailBreakV_28K
    • split: mini_JailBreakV_28K
      • path: JailBreakV_28K/mini_JailBreakV_28K.csv
    • split: JailBreakV_28K
      • path: JailBreakV_28K/JailBreakV_28K.csv
  • config_name: RedTeam_2K
    • split: RedTeam_2K
      • path: JailBreakV_28K/RedTeam_2K.csv
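On a Hugging Face dataset card, the configurations above would typically be declared in the README's YAML metadata. The fragment below is a sketch of that standard layout reconstructed from the list above, not copied from the repository:

```yaml
configs:
  - config_name: JailBreakV_28K
    data_files:
      - split: mini_JailBreakV_28K
        path: JailBreakV_28K/mini_JailBreakV_28K.csv
      - split: JailBreakV_28K
        path: JailBreakV_28K/JailBreakV_28K.csv
  - config_name: RedTeam_2K
    data_files:
      - split: RedTeam_2K
        path: JailBreakV_28K/RedTeam_2K.csv
```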

Dataset Details

JailBreakV_28K and mini_JailBreakV_28K Columns

  • id: Unique identifier
  • jailbreak_query: Jailbreak query generated by various jailbreak attacks
  • redteam_query: Harmful query from RedTeam_2K
  • format: Jailbreak attack method, including template, persuasion, logic, figstep, query‑related
  • policy: Security policy violated by the red‑team query
  • image_path: Image file path
  • from: Data source
  • selected_mini: "True" if the data is in the mini_JailBreakV_28K set, otherwise "False"
  • transfer_from_llm: "True" if the jailbreak query is transferred from an LLM jailbreak attack, otherwise "False"
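As a sketch of how these columns might be used for filtering, the snippet below runs on invented sample rows that mirror the schema above (the id and format values are placeholders, not real dataset entries). Note that selected_mini and transfer_from_llm are stored as the strings "True"/"False", not booleans:

```python
# Invented sample rows mirroring the JailBreakV_28K columns above;
# real rows come from JailBreakV_28K/JailBreakV_28K.csv.
rows = [
    {"id": "0", "format": "template", "selected_mini": "True", "transfer_from_llm": "True"},
    {"id": "1", "format": "figstep", "selected_mini": "False", "transfer_from_llm": "False"},
    {"id": "2", "format": "logic", "selected_mini": "True", "transfer_from_llm": "True"},
]

# The flags are string-valued, so compare against "True"/"False".
mini_rows = [r for r in rows if r["selected_mini"] == "True"]
image_based = [r for r in rows if r["transfer_from_llm"] == "False"]

print(len(mini_rows))                      # 2
print([r["format"] for r in image_based])  # ['figstep']
```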

RedTeam_2K Columns

  • id: Unique identifier
  • question: Harmful query
  • policy: Security policy violated by the red‑team query
  • from: Data source

Data Composition

RedTeam_2K

  • Description: Curated dataset of 2,000 harmful queries designed to identify alignment vulnerabilities in LLMs and MLLMs. Covers 16 security policies and includes queries from 8 different sources.

JailBreakV_28K

  • Description: Dataset containing 28,000 jailbreak text‑image pairs, including 20,000 text‑based LLM transfer jailbreak attacks and 8,000 image‑based MLLM jailbreak attacks. Covers 16 security policies and 5 different jailbreak methods.

Dataset Overview

  • RedTeam_2K: 2,000 harmful queries covering 16 security policies from 8 sources.
  • JailBreakV_28K: 28,000 jailbreak text‑image pairs covering 16 security policies and 5 jailbreak methods.
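To make the composition concrete, one might tally queries per security policy. The snippet below is a sketch over invented rows that follow the RedTeam_2K column layout; the policy and source names are placeholders, not the dataset's actual categories:

```python
from collections import Counter

# Placeholder rows following the RedTeam_2K columns (id, question, policy, from)
redteam = [
    {"id": "0", "question": "...", "policy": "Policy A", "from": "Source 1"},
    {"id": "1", "question": "...", "policy": "Policy B", "from": "Source 2"},
    {"id": "2", "question": "...", "policy": "Policy A", "from": "Source 3"},
]

# Count how many harmful queries fall under each security policy
per_policy = Counter(r["policy"] for r in redteam)
print(per_policy.most_common())  # [('Policy A', 2), ('Policy B', 1)]
```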

Usage Example

from datasets import load_dataset

# The second argument selects a config; the key selects a split,
# matching the configurations listed above.
mini_JailBreakV_28K = load_dataset("JailbreakV-28K/JailBreakV-28k", "JailBreakV_28K")["mini_JailBreakV_28K"]
JailBreakV_28K = load_dataset("JailbreakV-28K/JailBreakV-28k", "JailBreakV_28K")["JailBreakV_28K"]
RedTeam_2K = load_dataset("JailbreakV-28K/JailBreakV-28k", "RedTeam_2K")["RedTeam_2K"]
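A typical robustness evaluation iterates over jailbreak_query/image_path pairs, queries the model, and scores refusals. The sketch below is illustrative only: model_respond is a stub standing in for a real MLLM call, and the refusal markers are assumed keywords, not part of the dataset or its official evaluation protocol:

```python
def model_respond(query, image_path=None):
    # Stub in place of a real multimodal model call (assumption)
    return "I'm sorry, I can't help with that request."

# Assumed keyword list for detecting refusals; not defined by the dataset
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't")

def is_refusal(response):
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

# One JailBreakV_28K-style record (field values invented for illustration)
record = {"jailbreak_query": "...", "image_path": "figstep/example.png"}
reply = model_respond(record["jailbreak_query"], record["image_path"])
print(is_refusal(reply))  # True
```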