Dataset asset, Open Source Community, Text Classification, Bias Evaluation

nyu-mll/crows_pairs

CrowS-Pairs is a challenge dataset for measuring social bias in masked language models. It contains 1,508 test examples, each a pair of sentences: one that is more stereotyping (sent_more) and one that is less stereotyping (sent_less). The pairs cover nine bias types, including race, gender, and religion. The sentences were written and annotated by crowdworkers, using prompts drawn from ROCStories and the fiction portion of MNLI.
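
As a rough illustration of how a paired benchmark like this is scored: the CrowS-Pairs paper reports the percentage of pairs for which a model gives the more stereotyping sentence a higher score, with 50 as the ideal value. The sketch below assumes per-sentence scores (for example, pseudo-log-likelihoods) have already been computed elsewhere. The function name and the treatment of ties as "not preferred" are illustrative choices for this sketch, not the authors' code.

```python
from typing import Iterable, Tuple


def stereotype_preference_rate(scores: Iterable[Tuple[float, float]]) -> float:
    """Return the percentage of pairs where sent_more outscores sent_less.

    Each item is (score_more, score_less); a higher score means the model
    considers that sentence more likely.
    """
    pairs = list(scores)
    if not pairs:
        raise ValueError("need at least one scored pair")
    # Ties are counted as "not preferred" (an assumption of this sketch).
    preferred = sum(1 for more, less in pairs if more > less)
    return 100.0 * preferred / len(pairs)


# Hypothetical scores for four pairs (made-up numbers, not model output).
example = [(-10.2, -11.0), (-8.5, -8.1), (-12.0, -12.0), (-9.9, -10.4)]
print(stereotype_preference_rate(example))  # 50.0
```

A value well above 50 would suggest the scored model tends to prefer the more stereotyping sentence of each pair.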

Source: hugging_face
Created: Nov 28, 2025
Updated: Jan 18, 2024

Dataset Overview

Name: CrowS-Pairs

Language: English (en)

License: Creative Commons Attribution‑ShareAlike 4.0 International License (cc-by-sa-4.0)

Multilinguality: Monolingual

Size: 1K < n < 10K

Source: Original data

Task Category: Text Classification

Task ID: Text Scoring

Tags: Bias Evaluation

Dataset Structure

Features:

  • id: int32
  • sent_more: string
  • sent_less: string
  • stereo_antistereo: categorical (stereo, antistereo)
  • bias_type: categorical (race-color, socioeconomic, gender, disability, nationality, sexual-orientation, physical-appearance, religion, age)
  • annotations: sequence of bias_type labels
  • anon_writer: string
  • anon_annotators: sequence of strings

Splits:

  • Test set: 1,508 samples, 419,976 bytes

Download Size: 437,764 bytes

Dataset Size: 419,976 bytes
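
To make the schema above concrete, here is a minimal sketch for tallying rows by bias type. It assumes the Hugging Face `datasets` library is installed and that the dataset loads under the ID `nyu-mll/crows_pairs`. It also assumes categorical fields may arrive as integer codes in the order listed above. The helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Category names in the order given in the feature list
# (assumed to match the integer codes used by the dataset).
BIAS_TYPES = [
    "race-color", "socioeconomic", "gender", "disability", "nationality",
    "sexual-orientation", "physical-appearance", "religion", "age",
]


def bias_type_name(value: Any) -> str:
    """Return the bias type as a string, given a name or an integer code."""
    if isinstance(value, int):
        return BIAS_TYPES[value]
    return str(value)


def count_by_bias_type(rows: Iterable[Dict[str, Any]]) -> Counter:
    """Count rows per bias type; each row needs a 'bias_type' key."""
    return Counter(bias_type_name(row["bias_type"]) for row in rows)


def load_test_split():
    """Load the single test split (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers work without it
    return load_dataset("nyu-mll/crows_pairs", split="test")
```

With the library available, `count_by_bias_type(load_test_split())` should give one count per bias type. Depending on the installed `datasets` version, loading may need extra arguments.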

Dataset Creation

License Information: Distributed under CC‑BY‑SA 4.0.

Source Data: Sentence pairs were written by crowdworkers, using prompts drawn from ROCStories and the fiction portion of MNLI.

Contributors: Thanks to @patil-suraj for adding this dataset.

Citation:

@inproceedings{nangia-etal-2020-crows,
    title = "{C}row{S}-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models",
    author = "Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R.",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.154",
    doi = "10.18653/v1/2020.emnlp-main.154",
    pages = "1953--1967",
}