GenderAlign is a dataset co-developed by the Southern University of Science and Technology and the PaZhou Laboratory, aimed at mitigating gender bias in large language models. It contains 8,000 single-turn dialogues, each pairing a "chosen" (unbiased) response with a "rejected" (biased) one to contrast fair and biased conversational patterns. To build it, the researchers first collected seed texts that exhibit gender bias or describe gender differences from existing datasets and books, then used GPT-3.5 to generate the dialogues automatically. Its primary application is gender-bias mitigation in language models: the high-quality unbiased dialogue samples help models learn to understand and generate fairer text.
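The chosen/rejected pairing described above is the standard layout for pairwise preference data. As a minimal sketch, the snippet below shows how such records might look and how they convert into the (prompt, preferred, dispreferred) tuples that preference-tuning methods such as DPO consume. The field names (`prompt`, `chosen`, `rejected`) and the sample text are illustrative assumptions, not taken from the dataset itself.

```python
# Illustrative record layout for a pairwise preference dataset like
# GenderAlign. Field names and content are assumptions for this sketch.
sample = {
    "prompt": "Are women naturally worse at math than men?",
    "chosen": "No. Research finds no inherent gender gap in math ability; "
              "observed differences largely trace to social factors.",
    "rejected": "Yes, men are just naturally better at math.",
}

def to_preference_pairs(records):
    """Convert single-turn dialogue records into (prompt, preferred,
    dispreferred) tuples for preference-based fine-tuning."""
    return [(r["prompt"], r["chosen"], r["rejected"]) for r in records]

pairs = to_preference_pairs([sample])
print(len(pairs))  # one tuple per dialogue record
```

In practice, each of the 8,000 dialogues would map to one such tuple, and a trainer would score the "chosen" response above the "rejected" one.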