GenderAlign

GenderAlign is a dataset co‑developed by the Southern University of Science and Technology and PaZhou Laboratory for mitigating gender bias in large language models. It contains 8,000 single‑turn dialogues, each paired with a 'chosen' and a 'rejected' response that contrast unbiased and biased conversational patterns. To build it, the researchers first collected seed texts that exhibit gender bias or describe gender differences from existing datasets and books, then generated dialogues automatically with GPT‑3.5. Its primary application is reducing gender bias in language models: the high‑quality unbiased dialogue samples help models better understand and generate fair text.

Updated 6/20/2024
arXiv

Description

GenderAlign: Alignment Dataset for Mitigating Gender Bias in Large Language Models

Dataset Description

The dataset is described in the paper "GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models".

If you find this dataset useful, please cite the paper.

The dataset format is simple: each entry contains a pair of texts, one labeled 'chosen' and one labeled 'rejected'.
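A minimal Python sketch of reading such chosen/rejected pairs follows. It assumes, for illustration only, that the entries are stored as JSON Lines with one object per line holding string fields `chosen` and `rejected`; the file layout, the `PreferencePair` type, and the `parse_pairs`/`load_pairs` helpers are hypothetical and should be checked against the actual download.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class PreferencePair:
    """One entry: a preferred ('chosen') and a dispreferred ('rejected') text."""
    chosen: str
    rejected: str


def parse_pairs(lines: Iterable[str]) -> Iterator[PreferencePair]:
    """Parse JSON Lines records that each hold 'chosen' and 'rejected' strings.

    Assumption: one JSON object per line. This layout is hypothetical,
    not a documented property of the GenderAlign download.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        try:
            chosen, rejected = record["chosen"], record["rejected"]
        except KeyError as exc:
            raise ValueError(f"line {lineno}: missing field {exc}") from None
        if not isinstance(chosen, str) or not isinstance(rejected, str):
            raise ValueError(f"line {lineno}: 'chosen' and 'rejected' must be strings")
        yield PreferencePair(chosen, rejected)


def load_pairs(path: str) -> List[PreferencePair]:
    """Read every pair from a local JSONL file (hypothetical path)."""
    with open(path, encoding="utf-8") as fh:
        return list(parse_pairs(fh))
```

For example, `list(parse_pairs(['{"chosen": "a", "rejected": "b"}']))` yields a single `PreferencePair` with `chosen="a"` and `rejected="b"`. Validating both fields up front means a malformed record fails loudly with its line number rather than silently producing an incomplete pair.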

Disclaimer

The dataset contains content that may be offensive or unsettling. Topics include, but are not limited to, gender bias, gender stereotypes, gender‑based violence, and other potentially disturbing subjects. Interact with the dataset according to your personal risk tolerance. The dataset is intended for research purposes, specifically for studies aiming to reduce gender bias in models. The views expressed in the data do not represent the authors' views.

Topics

Gender Bias
Natural Language Processing

Source

Organization: arXiv

Created: 6/20/2024