Dataset asset · Open Source Community · AI Safety · Content Moderation

PKU-Alignment/BeaverTails

BeaverTails is a collection of AI‑safety‑focused datasets containing human‑annotated question‑answer pairs, each labeled with the harm categories it falls under. The dataset covers 14 harm categories, including animal abuse, child abuse, discrimination, and hate speech. It is intended for research, especially for building safer, less‑harmful AI systems. The dataset ships in four splits: 330k_train, 330k_test, 30k_train, and 30k_test.

Source
hugging_face
Created
Nov 28, 2025
Updated
Oct 17, 2023
Signals
417 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Name: BeaverTails

Purpose: AI safety research aimed at developing safer, less‑harmful AI systems.

Content: Human‑labeled question‑answer (QA) pairs, each associated with one or more harm categories.

Harm Categories:

  1. Animal abuse
  2. Child abuse
  3. Controversial topics, politics
  4. Discrimination, stereotypes, unfairness
  5. Drug abuse, weapons, prohibited substances
  6. Financial crime, property crime, theft
  7. Hate speech, offensive language
  8. Misinformation about ethics, law, and safety
  9. Non‑violent unethical behavior
  10. Privacy violation
  11. Self‑harm
  12. Sexual content, adult content
  13. Terrorism, organized crime
  14. Violence, assistance and incitement, provocation

Dataset Size: 100K < n < 1M

Data File Configuration:

  • Default configuration:
    • Training set:
      • 330k_train: round0/330k/train.jsonl.xz
      • 30k_train: round0/30k/train.jsonl.gz
    • Test set:
      • 330k_test: round0/330k/test.jsonl.xz
      • 30k_test: round0/30k/test.jsonl.gz
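The splits above can be loaded directly by name with the Hugging Face `datasets` library, e.g. `load_dataset("PKU-Alignment/BeaverTails", split="30k_train")`. A minimal sketch of working with the resulting records, assuming the schema described on the dataset card (a `prompt`, a `response`, an `is_safe` flag, and a `category` dict mapping each of the 14 harm categories to a boolean); the sample record below is illustrative, not real dataset content:

```python
# Sketch: inspecting BeaverTails-style QA records by harm category.
# Loading the real data (requires network and the `datasets` package):
#   from datasets import load_dataset
#   ds = load_dataset("PKU-Alignment/BeaverTails", split="30k_train")
# The record layout below follows the dataset card; field names are
# assumptions to the extent they differ from the published schema.

def flagged_categories(record):
    """Return the harm categories marked True for one QA record."""
    return [name for name, flagged in record["category"].items() if flagged]

# Illustrative record, not drawn from the actual dataset.
sample = {
    "prompt": "example question",
    "response": "example answer",
    "is_safe": False,
    "category": {
        "animal_abuse": False,
        "hate_speech,offensive_language": True,
        "privacy_violation": False,
    },
}

print(flagged_categories(sample))  # → ['hate_speech,offensive_language']
```

Because `category` is a multi-label dict rather than a single class, one QA pair can be flagged under several harm categories at once, which is why the description speaks of "one or more" categories per pair.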

License: CC BY‑NC 4.0

Language: English

Task Category: Text classification

Tags: safety, AI safety, moderation, rejection sampling, LLM, LM, human feedback

Usage Warning: The dataset contains potentially offensive or disturbing content; use it responsibly and in accordance with your own risk tolerance. It must not be used to train dialogue agents that could exhibit harmful behavior.

Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio