
nyu-mll/crows_pairs

CrowS-Pairs is a challenge dataset for measuring social bias in masked language models. It contains 1,508 test examples, each a pair of sentences: one more stereotyping (sent_more) and one less stereotyping (sent_less). The pairs cover nine bias types: race/color, socioeconomic status, gender, disability, nationality, sexual orientation, physical appearance, religion, and age. The sentences were crowdsourced, with writers given prompts drawn from ROCStories and the fiction portion of MNLI.
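The paired design lends itself to a simple comparison: score both sentences of each pair and report how often the more stereotyping one scores higher. The following is a minimal sketch of that idea, not the authors' evaluation code. The `score` callable is a hypothetical stand-in for whatever sentence-scoring method you use with your model, and the rows are neutral placeholders rather than dataset content.

```python
from typing import Callable, Iterable, Mapping


def preference_rate(
    pairs: Iterable[Mapping[str, str]],
    score: Callable[[str], float],
) -> float:
    """Percentage of pairs where `score` rates sent_more above sent_less."""
    total = 0
    preferred_more = 0
    for pair in pairs:
        total += 1
        if score(pair["sent_more"]) > score(pair["sent_less"]):
            preferred_more += 1
    if total == 0:
        raise ValueError("no pairs supplied")
    return 100.0 * preferred_more / total


# Hypothetical placeholder rows and a toy scorer (string length),
# used only to show the call shape. Neither comes from the dataset.
toy_pairs = [
    {"sent_more": "placeholder sentence A, longer", "sent_less": "placeholder B"},
    {"sent_more": "short", "sent_less": "a longer placeholder"},
]
toy_rate = preference_rate(toy_pairs, score=len)
```

Under this comparison, a rate near 50 means the scorer shows no systematic preference for either sentence of a pair.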

Source

hugging_face

Created

Nov 28, 2025

Updated

Jan 18, 2024

Dataset Overview

Name: CrowS-Pairs

Language: English (en)

License: Creative Commons Attribution-ShareAlike 4.0 International License (cc-by-sa-4.0)

Multilinguality: Monolingual

Size: 1K < n < 10K

Source: Original data

Task Category: Text Classification

Task ID: Text Scoring

Labels: Bias Evaluation

Dataset Structure

Features:

id: int32
sent_more: string
sent_less: string
stereo_antistereo: categorical (stereo, antistereo)
bias_type: categorical (race-color, socioeconomic, gender, disability, nationality, sexual-orientation, physical-appearance, religion, age)
annotations: sequence of bias_type labels
anon_writer: string
anon_annotators: sequence of strings
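
The feature list above can be mirrored in a small typed record, which is handy for validating rows before analysis. This is a sketch under two assumptions: the categorical fields have already been decoded to their string names, and `annotations` is kept loosely typed because the card only describes it as a sequence of bias_type labels. `CrowsPairsRecord` and `parse_record` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

# Category names as listed in the feature description.
BIAS_TYPES = (
    "race-color", "socioeconomic", "gender", "disability", "nationality",
    "sexual-orientation", "physical-appearance", "religion", "age",
)
DIRECTIONS = ("stereo", "antistereo")


@dataclass(frozen=True)
class CrowsPairsRecord:
    id: int
    sent_more: str
    sent_less: str
    stereo_antistereo: str
    bias_type: str
    annotations: List[Any]  # loosely typed on purpose (see lead-in)
    anon_writer: str
    anon_annotators: List[str]


def parse_record(row: Mapping[str, Any]) -> CrowsPairsRecord:
    """Build a record from a dict-like row, rejecting unknown categories."""
    if row["stereo_antistereo"] not in DIRECTIONS:
        raise ValueError(f"unknown direction: {row['stereo_antistereo']!r}")
    if row["bias_type"] not in BIAS_TYPES:
        raise ValueError(f"unknown bias type: {row['bias_type']!r}")
    return CrowsPairsRecord(
        id=int(row["id"]),
        sent_more=str(row["sent_more"]),
        sent_less=str(row["sent_less"]),
        stereo_antistereo=row["stereo_antistereo"],
        bias_type=row["bias_type"],
        annotations=list(row["annotations"]),
        anon_writer=str(row["anon_writer"]),
        anon_annotators=[str(a) for a in row["anon_annotators"]],
    )
```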

Splits:

Test set: 1,508 samples, 419,976 bytes

Download Size: 437,764 bytes

Dataset Size: 419,976 bytes
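
Since the dataset ships as a single test split, loading it and tallying examples per bias type is a natural first check. The sketch below assumes the Hugging Face `datasets` library is installed and that the `nyu-mll/crows_pairs` repository is reachable and loadable with your installed version. `load_test_split` and `count_bias_types` are illustrative helper names.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Optional, Sequence


def load_test_split():
    """Load the single test split (requires network access and `datasets`)."""
    from datasets import load_dataset  # imported lazily so the helper below works offline

    return load_dataset("nyu-mll/crows_pairs", split="test")


def count_bias_types(
    rows: Iterable[Mapping[str, Any]],
    names: Optional[Sequence[str]] = None,
) -> Counter:
    """Tally rows per bias type.

    If `names` is given, integer class ids are mapped to their string names;
    values that are already strings are counted as-is.
    """
    counts: Counter = Counter()
    for row in rows:
        value = row["bias_type"]
        if names is not None and isinstance(value, int):
            value = names[value]
        counts[value] += 1
    return counts
```

A typical call would be `ds = load_test_split()` followed by `count_bias_types(ds, ds.features["bias_type"].names)`. Per the split information above, `len(ds)` should come out to 1,508.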

Dataset Creation

License Information: Distributed under CC BY-SA 4.0.

Source Data: Created using prompts drawn from ROCStories and the fiction portion of MNLI.

Contributors: Thanks to @patil-suraj for adding this dataset.

Citation:

@inproceedings{nangia-etal-2020-crows,
    title = "{C}row{S}-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models",
    author = "Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R.",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.154",
    doi = "10.18653/v1/2020.emnlp-main.154",
    pages = "1953--1967",
}
