The Stanford Human Preferences Dataset (SHP) comprises 385K Reddit user preference records across 18 distinct topical domains, intended for training RLHF reward models and natural language generation (NLG) evaluation models. Each example consists of a Reddit post containing a question or instruction and a pair of top-level comments, where one comment is preferred by the Reddit community. A preference is inferred from comment scores, with timestamps used to control for visibility: the preferred comment must have been written later yet received a higher score, so its advantage cannot be explained by longer exposure alone. The preferences are intended to reflect which response is more helpful, not which is less harmful. The dataset provides training, validation, and test splits for each of the 18 sub-forums, with each split stored as a JSONL file.
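As a rough sketch of what one JSONL record looks like and how the preference label relates to scores and timestamps, consider the following Python snippet. The record below is a hypothetical, hand-written example (not real dataset content); the field names (`history`, `human_ref_A`, `human_ref_B`, `score_A`, `score_B`, `created_at_utc_A`, `created_at_utc_B`, `labels`) follow the dataset's published schema, but verify them against the dataset card before relying on them.

```python
import json

# Hypothetical SHP-style record (illustrative values, not real data).
# history       = the Reddit post (question or instruction)
# human_ref_A/B = the two top-level comments being compared
# labels        = 1 if comment A is preferred, 0 if comment B is preferred
record = json.loads("""{
  "post_id": "abc123",
  "domain": "askscience_train",
  "history": "Why is the sky blue?",
  "human_ref_A": "Rayleigh scattering: shorter wavelengths scatter more.",
  "human_ref_B": "Because it reflects the ocean.",
  "score_A": 120,
  "score_B": 15,
  "created_at_utc_A": 1610000600,
  "created_at_utc_B": 1610000000,
  "labels": 1
}""")

# Resolve the community-preferred comment from the label.
preferred = record["human_ref_A"] if record["labels"] == 1 else record["human_ref_B"]

# The preference is only trusted because the higher-scored comment was
# posted *later* than the other one, so its higher score cannot be
# explained by extra time on the page.
later_and_higher = (
    record["created_at_utc_A"] > record["created_at_utc_B"]
    and record["score_A"] > record["score_B"]
)

print(preferred)
print(later_and_higher)
```

This check mirrors the dataset's construction rule: pairs where the later-written comment does not also have the higher score are excluded, which is what lets the label be read as a genuine community preference.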