RM-BENCH is a benchmark developed at Tsinghua University to evaluate reward models' sensitivity to fine-grained content differences and their robustness to style bias. It spans four domains: chat, code, mathematics, and safety, covering a wide range of real-world scenarios. Each example is constructed by generating a chosen and a rejected response with the same powerful language model, then adding style-controlled variants of those responses to test whether a reward model's preferences are swayed by style rather than substance. RM-BENCH thereby targets a gap in existing reward-model evaluation, which often misses subtle content errors and style bias, with the aim of improving the alignment accuracy of language models.
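The core evaluation idea can be sketched as follows: a reward model passes a pair when it scores the chosen response above the rejected one, and style bias is exposed when that ranking flips under superficial changes such as response length. This is a minimal illustrative sketch, not the official RM-BENCH harness; `toy_reward` and the example pairs are hypothetical stand-ins.

```python
def pairwise_accuracy(reward_fn, pairs):
    """Fraction of (prompt, chosen, rejected) triples ranked correctly,
    i.e. where the reward model scores the chosen response higher."""
    correct = sum(
        reward_fn(prompt, chosen) > reward_fn(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return correct / len(pairs)

# Hypothetical scorer standing in for a real reward model: it simply
# prefers longer answers -- exactly the kind of style bias that
# RM-BENCH-style pairs are designed to expose.
def toy_reward(prompt, response):
    return len(response)

# Toy pairs where the rejected response is wrong on content but more
# verbose, so a length-biased scorer ranks every pair incorrectly.
pairs = [
    ("What is 2+2?", "2+2 equals 4.", "2+2 equals 5, obviously."),
    ("Name a prime.", "7 is prime.", "9 is prime, since it has no divisors."),
]

print(f"pairwise accuracy: {pairwise_accuracy(toy_reward, pairs):.2f}")
# A content-sensitive, style-robust reward model would score near 1.0 here.
```

In the real benchmark, the same comparison is repeated across the style-controlled variants of each response, so accuracy can be reported per style condition and the gap between conditions quantifies the model's style bias.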