nyu-mll/crows_pairs
CrowS‑Pairs is a challenging dataset for evaluating social bias in masked language models. It contains 1,508 test samples, each comprising two sentences—one more biased and one less biased. The dataset covers various bias types such as race, gender, religion, etc. It was constructed from the fictional portions of ROCStories and MNLI and annotated via crowdsourcing.
Description
Dataset Overview
Name: CrowS-Pairs
Language: English (en)
License: Creative Commons Attribution‑ShareAlike 4.0 International License (cc-by-sa-4.0)
Multilinguality: Monolingual
Size: 1K < n < 10K
Source: Original data
Task Category: Text Classification
Task ID: Text Scoring
Labels: Bias Evaluation
Dataset Structure
Features:
- id: int32
- sent_more: string
- sent_less: string
- stereo_antistereo: categorical (stereo, antistereo)
- bias_type: categorical (race‑color, socioeconomic, gender, disability, nationality, sexual‑orientation, physical‑appearance, religion, age)
- annotations: sequence of bias_type labels
- anon_writer: string
- anon_annotators: sequence of strings
Splits:
- Test set: 1,508 samples, 419,976 bytes
Download Size: 437,764 bytes
Dataset Size: 419,976 bytes
Dataset Creation
License Information: Distributed under CC‑BY‑SA 4.0.
Source Data: Created using prompts derived from the fictional portions of ROCStories and MNLI.
Contributors: Thanks to @patil‑suraj for adding this dataset.
Citation:
@inproceedings{nangia-etal-2020-crows,
title = "{C}row{S}-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models",
author = "Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R.",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.emnlp-main.154",
doi = "10.18653/v1/2020.emnlp-main.154",
pages = "1953--1967",
}
AI studio
Generate PPTs instantly with Nano Banana Pro.
Generate PPT NowAccess Dataset
Please login to view download links and access full dataset details.
Topics
Source
Organization: hugging_face
Created: Unknown
Power Your Data Analysis with Premium AI Models
Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.
Enjoy a free trial and save 20%+ compared to official pricing.