LabHC/bias_in_bios
The Bias in Bios dataset was created by De-Arteaga et al. in 2019 to study bias in NLP models. It contains textual biographies used for occupation prediction, with binary gender as the sensitive attribute. Ravfogel et al. introduced a slightly smaller version in 2020, in which 5,557 biographies were lost. The dataset is split into a training set (257,478 samples), a test set (99,069 samples) and a development set (39,642 samples). The classification labels cover 28 occupations; each is listed below with its numerical label and its proportion of the data. The sensitive attribute labels are Male (label 0, 53.9%) and Female (label 1, 46.1%).
Description
Bias in Bios Dataset Overview
Basic Information
- License: MIT
- Task Category: Text Classification
- Language: English
Dataset Features
- Feature List:
  - hard_text: string, the biography text
  - profession: 64-bit integer, occupation label
  - gender: 64-bit integer, gender label
Dataset Splits
- Training Set:
  - Bytes: 107487885
  - Samples: 257478
- Test Set:
  - Bytes: 41312256
  - Samples: 99069
- Development Set:
  - Bytes: 16504417
  - Samples: 39642
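A minimal sketch of loading the splits with the Hugging Face `datasets` library and checking them against the sample counts listed above. It assumes that the title `LabHC/bias_in_bios` is the Hub repository ID and that the splits are named `train`, `test` and `dev`. The card only says training, test and development, so the split names are an assumption. The helper `check_split_sizes` is hypothetical and not part of any library.

```python
from typing import Dict, Mapping, Sized, Tuple

# Sample counts as listed in the Dataset Splits section.
# The split names "train", "test" and "dev" are an assumption.
EXPECTED_SAMPLES: Dict[str, int] = {
    "train": 257478,
    "test": 99069,
    "dev": 39642,
}


def check_split_sizes(splits: Mapping[str, Sized]) -> Dict[str, Tuple[int, int]]:
    """Return {split: (expected, actual)} for every split whose size differs.

    A split that is missing from `splits` is reported with an actual size of 0.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for name, expected in EXPECTED_SAMPLES.items():
        actual = len(splits[name]) if name in splits else 0
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


def load_bias_in_bios():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the size check above works without the library.
    from datasets import load_dataset

    return load_dataset("LabHC/bias_in_bios")
```

If the assumptions hold, `check_split_sizes(load_bias_in_bios())` returns an empty dictionary; any entry it does return names a split whose size differs from this card.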
Dataset Size
- Download Size: 99808338 bytes
- Total Size: 165304558 bytes
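The figures above can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers copied from this card; the variable and function names are illustrative.

```python
# Figures copied from the Dataset Splits and Dataset Size sections.
SPLITS = {
    "training": {"bytes": 107487885, "samples": 257478},
    "test": {"bytes": 41312256, "samples": 99069},
    "development": {"bytes": 16504417, "samples": 39642},
}
TOTAL_SIZE_BYTES = 165304558

total_bytes = sum(s["bytes"] for s in SPLITS.values())
total_samples = sum(s["samples"] for s in SPLITS.values())


def split_shares() -> dict:
    """Share of all samples in each split, in percent, rounded to one decimal."""
    return {
        name: round(100 * s["samples"] / total_samples, 1)
        for name, s in SPLITS.items()
    }


# The per-split byte counts add up to the reported total size.
print(total_bytes == TOTAL_SIZE_BYTES)
print(total_samples)
print(split_shares())
```

The split sizes correspond to roughly a 65/25/10 partition of the 396,189 biographies.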
Classification Labels
| Occupation | Numerical Label | Proportion (%) |
|---|---|---|
| accountant | 0 | 1.42 |
| architect | 1 | 2.55 |
| attorney | 2 | 8.22 |
| chiropractor | 3 | 0.67 |
| comedian | 4 | 0.71 |
| composer | 5 | 1.41 |
| dentist | 6 | 3.68 |
| dietitian | 7 | 1.0 |
| dj | 8 | 0.38 |
| filmmaker | 9 | 1.77 |
| interior_designer | 10 | 0.37 |
| journalist | 11 | 5.03 |
| model | 12 | 1.89 |
| nurse | 13 | 4.78 |
| painter | 14 | 1.95 |
| paralegal | 15 | 0.45 |
| pastor | 16 | 0.64 |
| personal_trainer | 17 | 0.36 |
| photographer | 18 | 6.13 |
| physician | 19 | 10.35 |
| poet | 20 | 1.77 |
| professor | 21 | 29.8 |
| psychologist | 22 | 4.64 |
| rapper | 23 | 0.35 |
| software_engineer | 24 | 1.74 |
| surgeon | 25 | 3.43 |
| teacher | 26 | 4.09 |
| yoga_teacher | 27 | 0.42 |
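Because the `profession` feature stores only the numerical label, a lookup table is handy when inspecting samples or predictions. The sketch below transcribes the table above into a list whose index is the numerical label; the helper names `occupation_name` and `occupation_label` are illustrative.

```python
# Occupation names in the order of their numerical labels (index == label),
# transcribed from the Classification Labels table.
OCCUPATIONS = [
    "accountant", "architect", "attorney", "chiropractor", "comedian",
    "composer", "dentist", "dietitian", "dj", "filmmaker",
    "interior_designer", "journalist", "model", "nurse", "painter",
    "paralegal", "pastor", "personal_trainer", "photographer", "physician",
    "poet", "professor", "psychologist", "rapper", "software_engineer",
    "surgeon", "teacher", "yoga_teacher",
]


def occupation_name(label: int) -> str:
    """Map a numerical label (0-27) to its occupation name."""
    if not 0 <= label < len(OCCUPATIONS):
        raise ValueError(f"unknown occupation label: {label}")
    return OCCUPATIONS[label]


def occupation_label(name: str) -> int:
    """Map an occupation name to its numerical label."""
    try:
        return OCCUPATIONS.index(name)
    except ValueError:
        raise ValueError(f"unknown occupation name: {name!r}") from None


print(occupation_name(21))        # professor
print(occupation_label("nurse"))  # 13
```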
Sensitive Attribute
| Gender | Numerical Label | Proportion (%) |
|---|---|---|
| Male | 0 | 53.9 |
| Female | 1 | 46.1 |
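One common way to use the sensitive attribute is to compare a classifier's true positive rate (TPR) per occupation between the two gender groups. The sketch below is an illustration under that assumption; this card does not prescribe a particular metric. The function name `tpr_gap_by_occupation` and the toy inputs are hypothetical and are not drawn from the dataset.

```python
from collections import defaultdict
from typing import Dict, Sequence

MALE, FEMALE = 0, 1  # gender labels from the Sensitive Attribute table


def tpr_gap_by_occupation(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    gender: Sequence[int],
) -> Dict[int, float]:
    """Return TPR(female) - TPR(male) for each occupation label.

    Occupations with no true samples for one of the genders are skipped,
    because the TPR for that group is undefined.
    """
    hits = defaultdict(lambda: {MALE: 0, FEMALE: 0})
    totals = defaultdict(lambda: {MALE: 0, FEMALE: 0})
    for t, p, g in zip(y_true, y_pred, gender):
        totals[t][g] += 1
        if p == t:
            hits[t][g] += 1

    gaps: Dict[int, float] = {}
    for occ, counts in totals.items():
        if counts[MALE] == 0 or counts[FEMALE] == 0:
            continue
        tpr_female = hits[occ][FEMALE] / counts[FEMALE]
        tpr_male = hits[occ][MALE] / counts[MALE]
        gaps[occ] = tpr_female - tpr_male
    return gaps


# Toy example (made-up predictions): 13 = nurse, 25 = surgeon.
y_true = [13, 13, 13, 13, 25, 25, 25]
y_pred = [13, 13, 13, 21, 25, 25, 19]
gender = [1, 1, 0, 0, 0, 0, 1]
print(tpr_gap_by_occupation(y_true, y_pred, gender))
```

A positive gap means the classifier recovers that occupation more often for female biographies than for male ones, and a negative gap means the opposite.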
Source
Organization: hugging_face
Created: Unknown