SimpleQA-Bench

SimpleQA‑Bench combines the SimpleQA and Chinese‑SimpleQA datasets and recasts them as multiple‑choice questions (MCQs). The original datasets contain a large amount of long‑tail and niche knowledge, so models answering the questions directly achieve low accuracy. To make factuality evaluation easier, GPT‑4o generated three plausible but incorrect options for each question, turning every QA pair into an MCQ. In total, 7,324 samples were converted. Each sample has the fields dataset, metadata, question, answer, messages, options, and answer_option (the ID of the correct option).

Updated 12/17/2024
huggingface

Description

Basic Information

  • Language: English (en)
  • License: MIT
  • Tags: factuality, EN, ZH, short-form-answer, human-label
  • Copyright: © 2024 alibaba‑pai

Data Sources

Dataset Description

  • Format: Multiple‑choice question (MCQ)
  • Processing: Merged SimpleQA and Chinese‑SimpleQA collections, then converted to MCQ. GPT‑4o generated three reasonable incorrect options per item.
  • Sample Count: 4,326 (SimpleQA) + 2,998 (Chinese‑SimpleQA) = 7,324 samples
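
The conversion described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `build_mcq`, the seeded shuffle, and the idea that option order is randomized are all assumptions, and the three distractors are passed in by hand rather than requested from GPT‑4o.

```python
import random

def build_mcq(question, answer, distractors, seed=0):
    """Combine one correct answer with three distractors into an MCQ record.

    Hypothetical helper: the shuffle and the seed are assumptions, since the
    source does not say how option order was chosen.
    """
    if len(distractors) != 3:
        raise ValueError("expected exactly three distractors")
    if answer in distractors:
        raise ValueError("a distractor duplicates the correct answer")
    options = [answer, *distractors]
    random.Random(seed).shuffle(options)  # deterministic for a given seed
    return {
        "question": question,
        "answer": answer,
        "options": options,
        # Option IDs are positional: index 0 -> "A", 1 -> "B", and so on.
        "answer_option": "ABCD"[options.index(answer)],
    }

record = build_mcq(
    "Who received the IEEE Frank Rosenblatt Award in 2010?",
    "Michio Sugeno",
    ["Lotfi Zadeh", "John McCarthy", "Stephen Grossberg"],
)
print(record["options"], record["answer_option"])
```

Whatever order the shuffle produces, `answer_option` always points back at the position of the human‑verified answer.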

Data Fields

| Field | Description | SimpleQA Example | Chinese‑SimpleQA Example |
| --- | --- | --- | --- |
| dataset (str) | Dataset name | openai/SimpleQA | OpenStellarTeam/Chinese-SimpleQA |
| metadata (str) | Metadata such as topics, source URLs | {"topic": "Science and technology", "answer_type": "Person", "urls": ["https://en.wikipedia.org/wiki/IEEE_Frank_Rosenblatt_Award", "https://ieeexplore.ieee.org/author/37271220500", "https://en.wikipedia.org/wiki/IEEE_Frank_Rosenblatt_Award", "https://www.nxtbook.com/nxtbooks/ieee/awards_2010/index.php?startid=21#/p/20"]} | {"id": "6fd2645ad3994c89a01acae98cf04f90", "primary_category": "Natural and Physical Sciences", "secondary_category": "Information Science", "urls": ["https://zh.wikipedia.org/wiki/%E8%92%99%E7%89%B9%E5%8D%A1%E6%B4%9B%E6%A0%91%E6%90%9C%E7%B4%A2"]} |
| question (str) | Question text | Who received the IEEE Frank Rosenblatt Award in 2010? | Which researcher first explored Monte‑Carlo tree search in their 1987 PhD thesis and first presented its key characteristics? |
| answer (str) | Human‑verified short answer | Michio Sugeno | Bruce Abramson |
| messages (List[Dict]) | Standard OpenAI messages used for answering the MCQ | [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "# Objective ... Answers: "}] | same |
| options (List[str]) | All answer options with IDs A/B/C/D | ["Lotfi Zadeh", "Michio Sugeno", "John McCarthy", "Stephen Grossberg"] | ["Bruce Abramson", "Lennart Batsch‑Fischer", "Chris Watkins", "Martin Hansen"] |
| answer_option (str) | Correct option ID (A/B/C/D) | B | A |
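
As a rough illustration of how these fields fit together, the sketch below rebuilds the SimpleQA example as a Python dict and checks that `answer_option` resolves to the text in `answer`. The helper names are hypothetical, and the `metadata` string is abbreviated here (the `urls` list is left out) to keep the example short.

```python
import json

# The SimpleQA example from the field table; metadata is a JSON-encoded string.
sample = {
    "dataset": "openai/SimpleQA",
    "metadata": '{"topic": "Science and technology", "answer_type": "Person"}',
    "question": "Who received the IEEE Frank Rosenblatt Award in 2010?",
    "answer": "Michio Sugeno",
    "options": ["Lotfi Zadeh", "Michio Sugeno", "John McCarthy", "Stephen Grossberg"],
    "answer_option": "B",
}

def correct_option_text(record):
    """Return the option text that answer_option (A/B/C/D) points to."""
    index = "ABCD".index(record["answer_option"])
    return record["options"][index]

def is_consistent(record):
    """Check that the record has four options and the ID matches the answer."""
    return (
        len(record["options"]) == 4
        and correct_option_text(record) == record["answer"]
    )

metadata = json.loads(sample["metadata"])  # decode the str field into a dict
print(metadata["topic"], correct_option_text(sample), is_consistent(sample))
```

Because `metadata` is stored as a string rather than a nested object, it needs a `json.loads` call before its keys can be read.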

Prompts

  • GEN_WA_RROMPT: Prompt used to generate MCQs, requesting three plausible incorrect answers.
  • ANSWER_MCQ_PROMPT: Prompt used to answer MCQs, directing the model to select the correct option.
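
The full prompt texts are not reproduced on this page, so the sketch below does not attempt to recreate them. It only shows, under assumed conventions, how a question and its four options might be laid out as lettered lines and how a model reply might be reduced to an option ID for scoring. The function names, the "A. text" layout, and the regex-based parsing are all illustrative assumptions.

```python
import re

def render_question(question, options):
    """Render a question followed by lettered options (hypothetical layout)."""
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip("ABCD", options)]
    return "\n".join(lines)

def parse_choice(reply):
    """Return the first standalone capital A-D in a reply, or None if absent."""
    match = re.search(r"\b([ABCD])\b", reply)
    return match.group(1) if match else None

def is_correct(reply, answer_option):
    """Score one reply against the record's answer_option field."""
    return parse_choice(reply) == answer_option

text = render_question(
    "Who received the IEEE Frank Rosenblatt Award in 2010?",
    ["Lotfi Zadeh", "Michio Sugeno", "John McCarthy", "Stephen Grossberg"],
)
print(text)
print(is_correct("The answer is B.", "B"))
```

A parser this simple will misread replies that open with the article "A", so a stricter output format in the prompt makes scoring more reliable.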

Performance Comparison

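Accuracy in %; counts in parentheses.

| LLM | SimpleQA (4,326) | SimpleQA‑MCQ | Chinese‑SimpleQA (2,998) | Chinese‑SimpleQA‑MCQ |
| --- | --- | --- | --- | --- |
| gpt-4o-mini-2024-07-18 | 9.5 | 41.2 (1,781/4,326) | 37.6 | 52.9 (1,586/2,997) |
| qwen-max | / | 52.5 (2,256/4,300) | 54.1 | 72.7 (2,177/2,996) |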
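
The MCQ percentages follow directly from the counts shown in parentheses. The short check below recomputes them; the dictionary layout and the function name `accuracy_pct` are illustrative choices, while the counts are copied from the comparison above.

```python
# (correct, total) counts for the MCQ columns, copied from the comparison.
mcq_counts = {
    ("gpt-4o-mini-2024-07-18", "SimpleQA-MCQ"): (1781, 4326),
    ("gpt-4o-mini-2024-07-18", "Chinese-SimpleQA-MCQ"): (1586, 2997),
    ("qwen-max", "SimpleQA-MCQ"): (2256, 4300),
    ("qwen-max", "Chinese-SimpleQA-MCQ"): (2177, 2996),
}

def accuracy_pct(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

accuracies = {key: accuracy_pct(*counts) for key, counts in mcq_counts.items()}
for (model, split), value in accuracies.items():
    print(f"{model:24s} {split:22s} {value}")
```

When reading these numbers, keep in mind that random guessing on a four‑option question already gives 25% expected accuracy, so MCQ scores are not directly comparable to the direct‑answer columns.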

Topics

Natural Language Processing
Question Answering Systems

Source

Organization: huggingface

Created: 12/6/2024