nyu-mll/crows_pairs

CrowS-Pairs is a challenge dataset for evaluating social bias in masked language models. It contains 1,508 test examples, each a pair of sentences in which one is more biased and the other less biased. The dataset covers nine bias types, including race, gender, and religion. Its sentences were written from prompts drawn from ROCStories and the fiction portion of MNLI, and the examples were annotated via crowdsourcing.

Updated 1/18/2024
hugging_face

Description

Dataset Overview

Name: CrowS-Pairs

Language: English (en)

License: Creative Commons Attribution‑ShareAlike 4.0 International License (cc-by-sa-4.0)

Multilinguality: Monolingual

Size: 1K < n < 10K

Source: Original data

Task Category: Text Classification

Task ID: Text Scoring

Tags: Bias Evaluation

Dataset Structure

Features:

  • id: int32
  • sent_more: string
  • sent_less: string
  • stereo_antistereo: categorical (stereo, antistereo)
  • bias_type: categorical (race-color, socioeconomic, gender, disability, nationality, sexual-orientation, physical-appearance, religion, age)
  • annotations: sequence of bias_type labels
  • anon_writer: string
  • anon_annotators: sequence of strings
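
As a rough illustration of the feature list above, the record layout can be sketched as a Python TypedDict with a small validator and a per-category tally. This is a sketch under assumptions, not the dataset's official loader: the names CrowsPairsRecord, validate_record, count_by_bias_type, and make_placeholder are hypothetical, the category strings are assumed to use plain ASCII hyphens, and the sample rows are placeholders rather than real dataset entries.

```python
from collections import Counter
from typing import Dict, Iterable, List, TypedDict

# Category values copied from the feature list above.
# Assumption: the stored strings use plain ASCII hyphens.
BIAS_TYPES = (
    "race-color", "socioeconomic", "gender", "disability", "nationality",
    "sexual-orientation", "physical-appearance", "religion", "age",
)
DIRECTIONS = ("stereo", "antistereo")


class CrowsPairsRecord(TypedDict):
    """Hypothetical in-memory shape of one example, mirroring the feature list."""
    id: int
    sent_more: str
    sent_less: str
    stereo_antistereo: str       # one of DIRECTIONS
    bias_type: str               # one of BIAS_TYPES
    annotations: List[str]       # bias_type labels, as the list above describes
    anon_writer: str
    anon_annotators: List[str]


def validate_record(record: CrowsPairsRecord) -> None:
    """Raise ValueError if a categorical field holds an unknown value."""
    if record["stereo_antistereo"] not in DIRECTIONS:
        raise ValueError(f"unknown direction: {record['stereo_antistereo']!r}")
    if record["bias_type"] not in BIAS_TYPES:
        raise ValueError(f"unknown bias_type: {record['bias_type']!r}")
    for label in record["annotations"]:
        if label not in BIAS_TYPES:
            raise ValueError(f"unknown annotation label: {label!r}")


def count_by_bias_type(records: Iterable[CrowsPairsRecord]) -> Dict[str, int]:
    """Validate each record, then tally how many fall under each bias_type."""
    counts: Counter = Counter()
    for record in records:
        validate_record(record)
        counts[record["bias_type"]] += 1
    return dict(counts)


def make_placeholder(record_id: int, bias_type: str, direction: str) -> CrowsPairsRecord:
    """Build a placeholder record; the sentences are dummies, not dataset text."""
    return {
        "id": record_id,
        "sent_more": "<placeholder: more biased sentence>",
        "sent_less": "<placeholder: less biased sentence>",
        "stereo_antistereo": direction,
        "bias_type": bias_type,
        "annotations": [bias_type],
        "anon_writer": "w0",
        "anon_annotators": ["a0", "a1"],
    }


PLACEHOLDER_RECORDS = [
    make_placeholder(0, "gender", "stereo"),
    make_placeholder(1, "age", "antistereo"),
    make_placeholder(2, "gender", "stereo"),
]

if __name__ == "__main__":
    print(count_by_bias_type(PLACEHOLDER_RECORDS))
```

Run over the real test split, the same tally would give the per-category distribution of the 1,508 examples, provided each row is first converted to this dictionary shape.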

Splits:

  • Test set: 1,508 samples, 419,976 bytes

Download Size: 437,764 bytes

Dataset Size: 419,976 bytes

Dataset Creation

License Information: Distributed under CC‑BY‑SA 4.0.

Source Data: Sentences were written using prompts drawn from ROCStories and the fiction portion of MNLI.

Contributors: Thanks to @patil-suraj for adding this dataset.

Citation:

@inproceedings{nangia-etal-2020-crows,
    title = "{C}row{S}-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models",
    author = "Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R.",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.154",
    doi = "10.18653/v1/2020.emnlp-main.154",
    pages = "1953--1967",
}

Topics

Bias Evaluation
Text Classification

Source

Organization: hugging_face

Created: Unknown
