Dataset asset, Open Source Community, Natural Language Processing, Gender Bias

GenderAlign

GenderAlign is a dataset co-developed by Southern University of Science and Technology and the PaZhou Laboratory for mitigating gender bias in large language models. It contains 8,000 single-turn dialogues, each paired with a 'chosen' and a 'rejected' response, so that unbiased and biased conversational patterns can be contrasted. To build it, researchers first collected seed texts that exhibit gender bias or describe gender differences from existing datasets and books, then generated dialogues automatically using GPT-3.5. GenderAlign's primary application is reducing gender bias in language models: it provides high-quality unbiased dialogue samples that help models better understand and generate fair text.

Source: arXiv
Created: Jun 20, 2024
Updated: Jun 20, 2024
Overview

Dataset description and usage context

GenderAlign: Alignment Dataset for Mitigating Gender Bias in Large Language Models

Dataset Description

The dataset is described in the paper "GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models".

If you find this dataset useful, please cite the paper.

The dataset format is simple: each entry contains a pair of texts, one 'chosen' and one 'rejected'.
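As a rough illustration of the chosen/rejected pairing, the sketch below reads preference pairs in Python. It assumes each entry is stored as one JSON object per line with the string fields `chosen` and `rejected`; the on-disk layout is not specified here, so that layout, the `PreferencePair` class, the `read_pairs` helper, and the placeholder records are all hypothetical.

```python
import io
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class PreferencePair:
    chosen: str    # response intended to be unbiased
    rejected: str  # response intended to exhibit bias


def read_pairs(lines: Iterable[str]) -> Iterator[PreferencePair]:
    """Yield one PreferencePair per non-empty JSON line (assumed layout)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        missing = {"chosen", "rejected"} - set(record)
        if missing:
            raise ValueError(f"line {line_number}: missing {sorted(missing)}")
        yield PreferencePair(record["chosen"], record["rejected"])


# Placeholder records standing in for a real file; not taken from the dataset.
SAMPLE = io.StringIO(
    '{"chosen": "placeholder preferred text", '
    '"rejected": "placeholder dispreferred text"}\n'
    "\n"
    '{"chosen": "another preferred text", '
    '"rejected": "another dispreferred text"}\n'
)

pairs: List[PreferencePair] = list(read_pairs(SAMPLE))
print(len(pairs))
```

With a real file, the same helper could be fed an open file handle in place of `SAMPLE`, provided the file actually follows the assumed one-object-per-line layout.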

Disclaimer

The dataset contains content that may be offensive or unsettling. Topics include, but are not limited to, gender bias, gender stereotypes, gender‑based violence, and other potentially disturbing subjects. Interact with the dataset according to your personal risk tolerance. The dataset is intended for research purposes, specifically for studies aiming to reduce gender bias in models. The views expressed in the data do not represent the authors' views.
