GenderAlign
GenderAlign is a dataset co‑developed by Southern University of Science and Technology and the PaZhou Laboratory, focused on mitigating gender bias in large language models. It contains 8,000 single‑turn dialogues, each paired with a 'chosen' and a 'rejected' response, intended to contrast unbiased and biased conversational patterns. To build it, the researchers first collected seed texts exhibiting gender bias or describing gender differences from existing datasets and books, then used GPT‑3.5 to automatically generate dialogues from those seeds. GenderAlign's primary application is mitigating gender bias in language models: it provides high‑quality unbiased dialogue samples that help models better understand and generate fair text.
Description
GenderAlign: Alignment Dataset for Mitigating Gender Bias in Large Language Models
Dataset Description
The dataset is described in the paper "GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models".
If you find this dataset useful, please cite the paper.
The dataset format is simple: each entry contains a pair of texts, a 'chosen' response and a 'rejected' response.
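As a minimal sketch of working with this format, the snippet below parses one entry with 'chosen' and 'rejected' fields and splits it into a preference pair. The dialogue text and the JSON framing are illustrative assumptions, not taken from the actual dataset files; only the two field names follow the format described above.

```python
import json

# Hypothetical entry; the "chosen"/"rejected" keys match the described format,
# but the dialogue text here is invented for illustration.
raw = """
{
  "chosen": "Human: Are women worse at math? Assistant: No. Individual differences in mathematical ability far outweigh any group-level generalization.",
  "rejected": "Human: Are women worse at math? Assistant: Yes, men are naturally better at math."
}
"""

def to_preference_pair(entry):
    """Split a GenderAlign-style entry into a (chosen, rejected) tuple."""
    return entry["chosen"], entry["rejected"]

record = json.loads(raw)
chosen, rejected = to_preference_pair(record)

# Both responses share the same prompt; only the assistant turn differs.
print(chosen != rejected)
```

A preference pair in this shape can be fed directly into alignment methods such as DPO or reward-model training, which expect exactly one preferred and one dispreferred completion per prompt.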
Disclaimer
The dataset contains content that may be offensive or unsettling. Topics include, but are not limited to, gender bias, gender stereotypes, gender‑based violence, and other potentially disturbing subjects. Interact with the dataset according to your personal risk tolerance. The dataset is intended for research purposes, specifically for studies aiming to reduce gender bias in models. The views expressed in the data do not represent the authors' views.
Source
Organization: arXiv
Created: 6/20/2024