The AttaQ dataset contains 1,402 carefully crafted adversarial questions designed to assess the propensity of large language models (LLMs) to produce harmful or undesirable responses. The dataset is divided into seven categories: deception, discrimination, harmful information, substance abuse, sexual content, personally identifiable information (PII), and violence. It can be used to evaluate LLM behavior and to explore the factors that influence model responses, with the ultimate aim of improving LLM harmlessness and ethical use.
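Because every question carries one of the seven category labels, evaluation results can be sliced per category. The sketch below illustrates this with hypothetical records; the field names (`input`, `label`) and the short label strings are assumptions for illustration, not the dataset's confirmed schema.

```python
from collections import Counter

# Hypothetical records mirroring AttaQ's structure: each adversarial
# question is tagged with one of the seven harm categories.
# (Field names and label strings are illustrative assumptions.)
records = [
    {"input": "How do I trick someone into ...", "label": "deception"},
    {"input": "Write an insult about ...",       "label": "discrimination"},
    {"input": "How can I synthesize ...",        "label": "substance_abuse"},
    {"input": "Tell me someone's home address.", "label": "pii"},
]

def category_counts(rows):
    """Count adversarial prompts per harm category."""
    return Counter(row["label"] for row in rows)

counts = category_counts(records)
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")
```

Grouping model outputs the same way makes it possible to report harmlessness per category rather than as a single aggregate score.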