High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

qwq-misguided-attention

The dataset contains responses from the QwQ‑32B‑Preview model to the MisguidedAttention prompt challenge. MisguidedAttention challenges are carefully designed prompts that test large language models' reasoning abilities when presented with misleading information. These prompts are modified versions of well‑known thought experiments, puzzles, and paradoxes, requiring step‑by‑step logical analysis rather than pattern matching. The dataset showcases the model's performance on these prompts, including both successful and failed cases.

huggingface

botp/Open-Platypus

Logical Reasoning

Large Language Models

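The OpenPlatypus dataset focuses on improving the logical reasoning capabilities of large language models (LLMs) and was used to train the Platypus2 models. It is composed of several sub-datasets, including PRM800K, ScienceQA, SciBench, ReClor, and TheoremQA, which were filtered using keyword search and Sentence Transformers to remove questions with more than 80% similarity. In addition, about 200 questions that appear in Hugging Face benchmark test sets were removed. The dataset's features are input, output, and instruction, all of string type. The training set contains 24,926 examples with a total size of 30,418,784 bytes.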

huggingface
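
As a rough illustration of the similarity filter described for Open-Platypus, the Python sketch below keeps a question only if its similarity to every question already kept is at most 80%. It is a stand-in under stated assumptions: token-count cosine similarity replaces Sentence Transformers embeddings so that the example runs with the standard library alone, and the names `embed`, `cosine`, and `filter_similar` are hypothetical, not taken from the dataset's actual pipeline.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase token counts (NOT Sentence Transformers)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_similar(questions, threshold=0.8):
    """Greedily keep questions whose similarity to all kept ones is <= threshold.

    The 0.8 default mirrors the 80% figure in the description; the greedy,
    first-come-first-kept order is an assumption of this sketch.
    """
    kept, kept_vecs = [], []
    for question in questions:
        vec = embed(question)
        if all(cosine(vec, other) <= threshold for other in kept_vecs):
            kept.append(question)
            kept_vecs.append(vec)
    return kept


if __name__ == "__main__":
    sample = [
        "What is the derivative of x squared?",
        "What is the derivative of x squared ?",  # near-duplicate of the first
        "Name the capital of France.",
    ]
    for question in filter_similar(sample):
        print(question)
```

With real sentence embeddings, only `embed` and `cosine` would change; the greedy thresholding loop stays the same.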

CBT-Bench

Cognitive Behavioral Therapy

Large Language Models

CBT‑Bench is a benchmark dataset for evaluating how well large language models (LLMs) can assist with cognitive behavioral therapy (CBT). The dataset has three levels, each covering a key aspect of CBT: basic knowledge recall, cognitive model understanding, and therapeutic response generation. Its goal is to assess how well LLMs can support the various stages of professional mental health care, particularly CBT. The dataset includes multiple tasks and data files, such as multiple‑choice questions, cognitive distortion classification, and therapeutic response generation exercises.

huggingface

ProcessTBench

Process Mining

Large Language Models

ProcessTBench is a synthetic dataset for evaluating the planning capabilities of large language models (LLMs) within a process mining framework. Built upon TaskBench, it contains 532 base queries, each paraphrased 5–6 times, with an average of 4.08 solution plans per query. The dataset involves action sequences using 40 distinct tools and provides corresponding ground‑truth plans in Petri‑net format. Creation involved selecting the most challenging subset from TaskBench, generating plans with LLMs, and processing them using an event‑log parser and a plan‑conformance checker. ProcessTBench aims to support research on LLM plan generation in complex and dynamic environments, especially regarding multilingual and paraphrased queries.

arXiv
