nyu-mll/crows_pairs
CrowS-Pairs is a challenge dataset for measuring social bias in masked language models. It contains 1,508 test examples, each a pair of sentences: one more stereotyping (sent_more) and one less stereotyping (sent_less). The examples cover nine bias types: race-color, socioeconomic status, gender, disability, nationality, sexual orientation, physical appearance, religion, and age. The sentences were written and annotated by crowdworkers, using prompts taken from ROCStories and the fiction portion of MNLI.
Dataset description and usage context
Dataset Overview
Name: CrowS-Pairs
Language: English (en)
License: Creative Commons Attribution‑ShareAlike 4.0 International License (cc-by-sa-4.0)
Multilinguality: Monolingual
Size: 1K < n < 10K
Source: Original data
Task Category: Text Classification
Task ID: Text Scoring
Tags: Bias Evaluation
Dataset Structure
Features:
- id: int32
- sent_more: string
- sent_less: string
- stereo_antistereo: categorical (stereo, antistereo)
- bias_type: categorical (race-color, socioeconomic, gender, disability, nationality, sexual-orientation, physical-appearance, religion, age)
- annotations: sequence of bias_type labels
- anon_writer: string
- anon_annotators: sequence of strings
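The feature list above can be mirrored in code as a small record type. The following is a minimal sketch, not the dataset's own loader: the class name CrowsPair and the helper count_by_bias_type are hypothetical, the category names are assumed to be spelled with plain ASCII hyphens, and annotations is typed as a flat list of labels as the listing describes. Depending on how the data is loaded, categorical fields may arrive as integer class indices and would need mapping to names first.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List

# Category values copied from the feature list (ASCII hyphens assumed).
BIAS_TYPES = {
    "race-color", "socioeconomic", "gender", "disability", "nationality",
    "sexual-orientation", "physical-appearance", "religion", "age",
}
DIRECTIONS = {"stereo", "antistereo"}


@dataclass
class CrowsPair:
    """Hypothetical in-memory view of one CrowS-Pairs example."""
    id: int
    sent_more: str
    sent_less: str
    stereo_antistereo: str
    bias_type: str
    annotations: List[str] = field(default_factory=list)
    anon_writer: str = ""
    anon_annotators: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the two categorical vocabularies.
        if self.stereo_antistereo not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.stereo_antistereo!r}")
        if self.bias_type not in BIAS_TYPES:
            raise ValueError(f"unknown bias type: {self.bias_type!r}")


def count_by_bias_type(pairs: Iterable[CrowsPair]) -> Counter:
    """Count examples per bias_type."""
    return Counter(pair.bias_type for pair in pairs)
```

Validating at construction time keeps misspelled category names from silently creating a tenth bias type when per-category statistics are computed.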
Splits:
- Test set: 1,508 samples, 419,976 bytes
Download Size: 437,764 bytes
Dataset Size: 419,976 bytes
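Because every example pairs sent_more with sent_less, a simple way to use the test set is to check how often a model scores the first sentence higher than the second. The sketch below illustrates that idea under stated assumptions and is not presented as the scoring procedure from the paper: preference_rate is a hypothetical helper, and score stands in for whatever sentence-level score a model provides (for example, a pseudo-log-likelihood). Ties are counted as no preference.

```python
from typing import Callable, Iterable, Mapping


def preference_rate(
    pairs: Iterable[Mapping[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where score(sent_more) > score(sent_less)."""
    total = 0
    preferred = 0
    for pair in pairs:
        total += 1
        # A strictly higher score for sent_more counts as a preference.
        if score(pair["sent_more"]) > score(pair["sent_less"]):
            preferred += 1
    if total == 0:
        raise ValueError("no pairs to evaluate")
    return preferred / total
```

Under this sketch, a scorer with no systematic preference would land near 0.5, and values well above that indicate the scorer tends to favor the more stereotyping sentence. Grouping the pairs by bias_type before calling the function gives a per-category breakdown.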
Dataset Creation
License Information: Distributed under CC‑BY‑SA 4.0.
Source Data: Created using prompts taken from the ROCStories corpora and the fiction portion of MNLI.
Contributors: Thanks to @patil-suraj for adding this dataset.
Citation:
@inproceedings{nangia-etal-2020-crows,
title = "{C}row{S}-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models",
author = "Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R.",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.emnlp-main.154",
doi = "10.18653/v1/2020.emnlp-main.154",
pages = "1953--1967",
}