LogicGame is a benchmark for evaluating how well large language models (LLMs) understand, execute, and plan over logical rules. It includes a diverse set of games with predefined rules, specifically designed to assess logical reasoning independently of factual knowledge. The benchmark measures model performance across varying difficulty levels, aiming for a comprehensive evaluation of rule‑based reasoning and multi‑step execution and planning capabilities.
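To make the evaluation idea concrete, here is a minimal sketch of how a rule‑execution game can be scored deterministically. The rewrite rule and the game itself are invented for illustration and are not taken from the LogicGame benchmark; the point is only that, with predefined rules, the ground truth is computable and answers can be checked by exact match, independent of factual knowledge.

```python
def apply_rule(state: str) -> str:
    """One game step: replace the leftmost 'ab' with 'ba' (a toy rule)."""
    return state.replace("ab", "ba", 1)

def ground_truth(state: str, steps: int) -> str:
    """Execute the rule for a fixed number of steps to get the reference answer."""
    for _ in range(steps):
        state = apply_rule(state)
    return state

def score(model_answer: str, state: str, steps: int) -> bool:
    """Exact-match scoring against the deterministic rule execution."""
    return model_answer == ground_truth(state, steps)

print(ground_truth("aabb", 2))
```

Because every game state follows mechanically from the rules, harder instances can be produced simply by increasing the number of required steps, which is one natural way to grade difficulty levels.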
The dataset contains responses from the QwQ‑32B‑Preview model to the MisguidedAttention prompt challenge. MisguidedAttention challenges are carefully designed prompts that test large language models' reasoning abilities when presented with misleading information. These prompts are modified versions of well‑known thought experiments, puzzles, and paradoxes, requiring step‑by‑step logical analysis rather than pattern matching. The dataset showcases the model's performance on these prompts, including both successful and failed cases.
The OpenPlatypus dataset focuses on improving the logical reasoning abilities of large language models (LLMs) and was used to train the Platypus2 models. It is composed of several sub-datasets, including PRM800K, ScienceQA, SciBench, ReClor, and TheoremQA, which were filtered via keyword search and Sentence Transformers to remove questions with more than 80% similarity. In addition, roughly 200 questions that appear in Hugging Face benchmark test sets were removed. Each example has three string-typed features (input, output, and instruction); the training split contains 24,926 examples with a total size of 30,418,784 bytes.
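The >80%-similarity deduplication step can be sketched as follows. OpenPlatypus used Sentence Transformers embeddings for this; as a self-contained stand-in, the sketch below embeds questions as bag-of-words count vectors and drops any question whose cosine similarity to an already-kept question exceeds 0.8. The example questions are invented for illustration.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(questions: list[str], threshold: float = 0.8) -> list[str]:
    """Greedily keep a question only if it is not too similar to any kept one."""
    kept, vecs = [], []
    for q in questions:
        v = Counter(q.lower().split())
        if all(cosine(v, kept_v) <= threshold for kept_v in vecs):
            kept.append(q)
            vecs.append(v)
    return kept

print(dedupe([
    "What is the derivative of x^2?",
    "What is the derivative of x^2? Please answer.",  # near-duplicate, dropped
    "State the Pythagorean theorem.",
]))
```

With real sentence embeddings the same greedy loop applies; only the vectorizer and similarity computation change.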
The proofwriter dataset includes training, validation, and test splits, each with corresponding file paths and statistics. Each example provides the features facts, rules, question, answer, depth, length, used_facts, and used_rules.
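To show how the listed features fit together, here is a minimal sketch of one proofwriter-style record. The field values are invented for illustration; only the schema (the feature names above) is taken from the description. A simple consistency check is included: used facts and rules should come from the example's own theory.

```python
# One hypothetical record with the dataset's feature schema.
example = {
    "facts": ["Anne is kind.", "Bob is big."],
    "rules": ["If someone is kind then they are nice."],
    "question": "Anne is nice.",
    "answer": True,
    "depth": 1,   # number of rule applications in the proof
    "length": 2,  # total statements used in the proof
    "used_facts": ["Anne is kind."],
    "used_rules": ["If someone is kind then they are nice."],
}

def validate(ex: dict) -> bool:
    """Check that the proof metadata is consistent with the theory:
    used facts/rules must appear in the example's facts/rules, and a
    depth-0 proof means the question is itself a stated fact."""
    return (set(ex["used_facts"]) <= set(ex["facts"])
            and set(ex["used_rules"]) <= set(ex["rules"])
            and (ex["depth"] > 0 or ex["question"] in ex["facts"]))

print(validate(example))
```

A validation pass like this is a cheap way to sanity-check a split before training, since malformed used_facts or used_rules lists would silently corrupt proof-depth statistics.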