
JailbreakV-28K/JailBreakV-28k

JailBreakV_28K is a benchmark dataset for evaluating the robustness of multimodal large language models (MLLMs) against jailbreak attacks. It contains 28,000 jailbreak text‑image pairs, including 20,000 text‑based LLM transfer jailbreak attacks and 8,000 image‑based MLLM jailbreak attacks, covering 16 security policies and 5 different jailbreak methods. Additionally, the RedTeam_2K dataset provides 2,000 harmful queries to identify alignment vulnerabilities in LLMs and MLLMs, encompassing 16 security policies and 8 data sources.

Updated 7/10/2024

Description

Dataset Overview

Dataset Name

  • JailBreakV_28K

License

  • MIT

Task Categories

  • Text Generation
  • Question Answering

Dataset Size

  • 10K < n < 100K

Configurations

  • config_name: JailBreakV_28K
    • split: mini_JailBreakV_28K
      • path: JailBreakV_28K/mini_JailBreakV_28K.csv
    • split: JailBreakV_28K
      • path: JailBreakV_28K/JailBreakV_28K.csv
  • config_name: RedTeam_2K
    • split: RedTeam_2K
      • path: JailBreakV_28K/RedTeam_2K.csv

Dataset Details

JailBreakV_28K and mini_JailBreakV_28K Columns

  • id: Unique identifier
  • jailbreak_query: Jailbreak query generated by various jailbreak attacks
  • redteam_query: Harmful query from RedTeam_2K
  • format: Jailbreak attack method, including template, persuasion, logic, figstep, query‑related
  • policy: Security policy violated by the red‑team query
  • image_path: Image file path
  • from: Data source
  • selected_mini: "True" if the data is in the mini_JailBreakV_28K set, otherwise "False"
  • transfer_from_llm: "True" if the jailbreak query is transferred from an LLM jailbreak attack, otherwise "False"
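To illustrate how the selected_mini and transfer_from_llm flags can be used, here is a minimal plain-Python sketch. The sample rows below are hypothetical stand-ins that only mirror the column layout described above; note that in the real dataset both flags are stored as the strings "True"/"False", not booleans.

```python
# Hypothetical sample rows mirroring the JailBreakV_28K column layout;
# the flag columns hold the strings "True"/"False", not booleans.
rows = [
    {"id": 0, "format": "template",      "selected_mini": "True",  "transfer_from_llm": "True"},
    {"id": 1, "format": "figstep",       "selected_mini": "False", "transfer_from_llm": "False"},
    {"id": 2, "format": "query-related", "selected_mini": "True",  "transfer_from_llm": "False"},
]

# Recover the mini subset: rows flagged selected_mini == "True".
mini = [r for r in rows if r["selected_mini"] == "True"]

# Split the full set into text-based LLM transfer attacks and
# image-based MLLM attacks using transfer_from_llm.
text_based = [r for r in rows if r["transfer_from_llm"] == "True"]
image_based = [r for r in rows if r["transfer_from_llm"] == "False"]
```

On the real dataset, the same string comparisons recover the 20,000 text-based and 8,000 image-based attacks.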

RedTeam_2K Columns

  • id: Unique identifier
  • question: Harmful query
  • policy: Security policy violated by the red‑team query
  • from: Data source
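A common first step with RedTeam_2K is tallying queries per security policy. A minimal sketch using the schema above; the rows and policy names here are hypothetical placeholders, not values from the dataset:

```python
from collections import Counter

# Hypothetical sample rows mirroring the RedTeam_2K schema;
# "policy_1"/"policy_2" are placeholder policy names.
rows = [
    {"id": 0, "question": "example harmful query", "policy": "policy_1", "from": "source_a"},
    {"id": 1, "question": "example harmful query", "policy": "policy_2", "from": "source_b"},
    {"id": 2, "question": "example harmful query", "policy": "policy_1", "from": "source_a"},
]

# Count how many harmful queries fall under each security policy.
per_policy = Counter(r["policy"] for r in rows)
```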

Data Composition

RedTeam_2K

  • Description: Curated dataset of 2,000 harmful queries designed to identify alignment vulnerabilities in LLMs and MLLMs. Covers 16 security policies and includes queries from 8 different sources.

JailBreakV_28K

  • Description: Dataset containing 28,000 jailbreak text‑image pairs, including 20,000 text‑based LLM transfer jailbreak attacks and 8,000 image‑based MLLM jailbreak attacks. Covers 16 security policies and 5 different jailbreak methods.


Usage Example

from datasets import load_dataset

mini_JailBreakV_28K = load_dataset("JailbreakV-28K/JailBreakV-28k", "JailBreakV_28K")["mini_JailBreakV_28K"]
JailBreakV_28K = load_dataset("JailbreakV-28K/JailBreakV-28k", "JailBreakV_28K")["JailBreakV_28K"]
RedTeam_2K = load_dataset("JailbreakV-28K/JailBreakV-28k", "RedTeam_2K")["RedTeam_2K"]
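Because each JailBreakV_28K row records the redteam_query it was generated from, the 28,000 attacks can be grouped back to their 2,000 source queries. A minimal offline sketch of that grouping; the sample rows are hypothetical stand-ins for rows of the loaded split:

```python
from collections import defaultdict

# Hypothetical rows mirroring the JailBreakV_28K columns used here.
rows = [
    {"redteam_query": "harmful query A", "format": "template"},
    {"redteam_query": "harmful query A", "format": "figstep"},
    {"redteam_query": "harmful query B", "format": "persuasion"},
]

# Group jailbreak variants under the red-team query they derive from.
variants = defaultdict(list)
for r in rows:
    variants[r["redteam_query"]].append(r["format"])
```

Applied to the full split, this yields, for each source query, the list of attack methods used to jailbreak it.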



Topics

Multimodal Language Models
Security Evaluation

Source

Organization: hugging_face

Created: Unknown
