High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

LabHC/bias_in_bios

The Bias in Bios dataset was created by De-Arteaga et al. in 2019 to study bias in NLP models. It contains textual biographies used for occupation prediction, with binary gender as the sensitive attribute. Ravfogel et al. introduced a slightly smaller version in 2020 after 5,557 biographies were lost. The dataset is split into a training set (257,000 samples), a test set (99,000 samples), and a development set (40,000 samples). Classification labels cover 28 occupations, each with a numerical label and a proportion. Sensitive attribute labels are Male (label 0, 53.9%) and Female (label 1, 46.1%).
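To make the stated figures easier to reason about, here is a minimal Python sketch that turns the split sizes and gender proportions quoted above into split fractions and rough per-gender counts. The split sizes are the rounded numbers from the description, and the helper names (`split_fractions`, `expected_gender_counts`) are hypothetical. The sketch also assumes the overall gender proportions hold within each split, which the description does not confirm.

```python
# Rounded split sizes as quoted in the dataset description.
SPLIT_SIZES = {"train": 257_000, "test": 99_000, "dev": 40_000}

# Sensitive-attribute labels as quoted: label -> (name, overall share).
GENDER_SHARE = {0: ("Male", 0.539), 1: ("Female", 0.461)}


def split_fractions(sizes):
    """Return each split's share of the total number of samples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def expected_gender_counts(n_samples, shares):
    """Estimate per-gender counts for a split of n_samples.

    Assumption: the overall gender shares apply unchanged to the split.
    """
    return {name: round(n_samples * share) for name, share in shares.values()}


fractions = split_fractions(SPLIT_SIZES)
for name, frac in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]:,} samples ({frac:.1%})")

# Rough estimate for the training split only.
print(expected_gender_counts(SPLIT_SIZES["train"], GENDER_SHARE))
```

Under these assumptions the splits are roughly 65% train, 25% test, and 10% development. The per-gender counts are estimates, not figures reported by the dataset authors.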

hugging_face
