
Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.



LogicGame

Logical Reasoning
Large Language Model Evaluation

LogicGame is a benchmark for evaluating how well large language models (LLMs) understand, execute, and plan over logical rules. It comprises a diverse set of games with predefined rules, designed to assess logical reasoning independently of factual knowledge. The benchmark measures model performance across varying difficulty levels, providing a comprehensive evaluation of rule-based reasoning and multi-step execution and planning.

github
View Details

qwq-misguided-attention

Large Language Models
Logical Reasoning

The dataset contains responses from the QwQ‑32B‑Preview model to the MisguidedAttention prompt challenge. MisguidedAttention challenges are carefully designed prompts that test large language models' reasoning abilities when presented with misleading information. These prompts are modified versions of well‑known thought experiments, puzzles, and paradoxes, requiring step‑by‑step logical analysis rather than pattern matching. The dataset showcases the model's performance on these prompts, including both successful and failed cases.

huggingface
View Details

botp/Open-Platypus

Logical Reasoning
Large Language Models

The OpenPlatypus dataset focuses on improving the logical reasoning abilities of large language models (LLMs) and was used to train the Platypus2 models. It is compiled from several sub-datasets, including PRM800K, ScienceQA, SciBench, ReClor, and TheoremQA, which were filtered via keyword search and Sentence Transformers to remove questions with more than 80% similarity. In addition, roughly 200 questions that appear in Hugging Face benchmark test sets were removed. Each example has three string features: input, output, and instruction. The training split contains 24,926 examples with a total size of 30,418,784 bytes.

huggingface
View Details
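The similarity filtering described for Open-Platypus can be sketched as follows. This is a minimal illustration using toy bag-of-words vectors in place of Sentence Transformer embeddings; the function and variable names are invented for this sketch, and only the 0.80 similarity threshold comes from the description above.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words vector; the real pipeline uses Sentence Transformer embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(questions, threshold=0.80):
    """Keep a question only if it is below `threshold` similarity
    to every question already kept."""
    kept, vecs = [], []
    for q in questions:
        v = embed(q)
        if all(cosine(v, kv) < threshold for kv in vecs):
            kept.append(q)
            vecs.append(v)
    return kept

questions = [
    "What is the derivative of x squared?",
    "What is the derivative of x squared ?",  # near-duplicate
    "Prove that the square root of 2 is irrational.",
]
print(dedupe(questions))  # the near-duplicate is dropped
```

In a real pipeline the embedding step would be replaced by a Sentence Transformer model, and pairwise comparison would be batched rather than done one question at a time.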

wentingzhao/proofwriter

Logical Reasoning
Automated Theorem Generation

The proofwriter dataset provides training, validation, and test splits, each with its file paths and summary statistics. Every example includes the features facts, rules, question, answer, depth, length, used_facts, and used_rules.

huggingface
View Details
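The proofwriter schema above can be sketched with a toy record. The example row below is invented for illustration, not drawn from the actual dataset; only the feature names come from the description.

```python
# One record shaped like the proofwriter schema described above.
example = {
    "facts": ["Anne is cold.", "Bob is round."],
    "rules": ["If someone is cold then they are round."],
    "question": "Anne is round.",
    "answer": True,
    "depth": 1,        # number of inference steps in the proof
    "length": 3,
    "used_facts": ["Anne is cold."],
    "used_rules": ["If someone is cold then they are round."],
}

def by_depth(examples, max_depth):
    """Select shallow proofs, e.g. to build an easier evaluation slice."""
    return [ex for ex in examples if ex["depth"] <= max_depth]

split = [example, {**example, "depth": 3}]
print(len(by_depth(split, max_depth=1)))  # only the depth-1 example passes
```

In practice the splits would be loaded with the Hugging Face `datasets` library (e.g. `load_dataset("wentingzhao/proofwriter")`), assuming the dataset follows the standard hub layout.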