Dataset asset · Open Source Community · AI Safety · Content Moderation

PKU-Alignment/BeaverTails

BeaverTails is a collection of AI‑safety‑focused datasets containing human‑annotated question‑answer pairs, each labeled with the harm categories it falls under. The dataset covers 14 harm categories, including animal abuse, child abuse, discrimination, and hate speech. It is intended for research, especially for building safer, less‑harmful AI systems. The dataset ships in four splits: 330k_train, 330k_test, 30k_train, and 30k_test.

Source
hugging_face
Created
Nov 28, 2025
Updated
Oct 17, 2023
Signals
417 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Name: BeaverTails

Purpose: AI safety research aimed at developing safer, less‑harmful AI systems.

Content: Human‑labeled question‑answer (QA) pairs, each associated with one or more harm categories.

Harm Categories:

  1. Animal abuse
  2. Child abuse
  3. Controversial topics, politics
  4. Discrimination, stereotypes, unfairness
  5. Drug abuse, weapons, prohibited substances
  6. Financial crime, property crime, theft
  7. Hate speech, offensive language
  8. Misinformation about ethics, law, and safety
  9. Non‑violent unethical behavior
  10. Privacy violation
  11. Self‑harm
  12. Sexual content, adult content
  13. Terrorism, organized crime
  14. Violence, assistance and incitement, provocation

Dataset Size: 100K < n < 1M

Data File Configuration:

  • Default configuration:
    • Training set:
      • 330k_train: round0/330k/train.jsonl.xz
      • 30k_train: round0/30k/train.jsonl.gz
    • Test set:
      • 330k_test: round0/330k/test.jsonl.xz
      • 30k_test: round0/30k/test.jsonl.gz
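The splits above can be loaded directly by name with the Hugging Face `datasets` library, e.g. `load_dataset("PKU-Alignment/BeaverTails", split="30k_train")`. A minimal sketch of working with the resulting records, assuming the schema described on the dataset card (a `prompt`, a `response`, an `is_safe` flag, and a `category` dict mapping each of the 14 harm categories to a boolean); the sample record below is illustrative, not real dataset content:

```python
# Sketch: inspecting BeaverTails-style QA records by harm category.
# Loading the real data (requires network and the `datasets` package):
#   from datasets import load_dataset
#   ds = load_dataset("PKU-Alignment/BeaverTails", split="30k_train")
# The record layout below follows the dataset card; field names are
# assumptions to the extent they differ from the published schema.

def flagged_categories(record):
    """Return the harm categories marked True for one QA record."""
    return [name for name, flagged in record["category"].items() if flagged]

# Illustrative record, not drawn from the actual dataset.
sample = {
    "prompt": "example question",
    "response": "example answer",
    "is_safe": False,
    "category": {
        "animal_abuse": False,
        "hate_speech,offensive_language": True,
        "privacy_violation": False,
    },
}

print(flagged_categories(sample))  # → ['hate_speech,offensive_language']
```

Because `category` is a multi-label dict rather than a single class, one QA pair can be flagged under several harm categories at once, which is why the description speaks of "one or more" categories per pair.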

License: CC BY‑NC 4.0

Language: English

Task Category: Text classification

Tags: safety, AI safety, moderation, rejection sampling, LLM, LM, human feedback

Usage Warning: The dataset contains potentially offensive or disturbing content; use it responsibly and in accordance with your own risk tolerance. It must not be used to train dialogue agents that could exhibit harmful behavior.

Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio