
bigbio/med_qa

We present MedQA, the first free-form multiple-choice open-domain QA dataset for medicine, collected from professional medical examinations. It covers three languages: English (12,723 questions), Simplified Chinese (34,251 questions), and Traditional Chinese as used in Taiwan (14,123 questions). In addition to the QA pairs, we release a large corpus of text extracted from medical textbooks, which reading-comprehension models can draw on to answer the questions.

Updated 4/6/2024
hugging_face

Description

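Dataset Overview

Basic Information

  • Name: MedQA
  • Languages: English, Simplified Chinese, Traditional Chinese (Taiwan)
  • License: Unknown
  • Multilingual: Yes
  • Task type: Question answering (QA)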

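Dataset Details

  • Homepage: MedQA
  • Public: Yes
  • Contains PubMed data: No
  • Dataset size:
    • English: 12,723 questions
    • Simplified Chinese: 34,251 questions
    • Traditional Chinese (Taiwan): 14,123 questions
  • Data source: Professional medical examinations
  • Additional resources: A large corpus drawn from medical textbooks, provided so that reading-comprehension models can use it to answer the questions.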
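
To put the per-language sizes in proportion, the following minimal sketch totals the three question counts listed above and reports each language's share. The counts are copied from this page; the names `QUESTION_COUNTS` and `summarize` are illustrative only and are not part of the dataset or any library.

```python
# Question counts per language, copied from the dataset details above.
QUESTION_COUNTS = {
    "English": 12_723,
    "Simplified Chinese": 34_251,
    "Traditional Chinese (Taiwan)": 14_123,
}


def summarize(counts):
    """Return the total count and each language's share as a percentage."""
    total = sum(counts.values())
    shares = {lang: round(100 * n / total, 1) for lang, n in counts.items()}
    return total, shares


total, shares = summarize(QUESTION_COUNTS)
print(f"Total questions: {total:,}")
for lang, pct in shares.items():
    print(f"  {lang}: {QUESTION_COUNTS[lang]:,} ({pct}%)")
```

By these figures the Simplified Chinese subset makes up a little over half of all questions, which may matter when comparing results across languages.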

Citation

@article{jin2021disease,
  title={What disease does this patient have? a large-scale open domain question answering dataset from medical exams},
  author={Jin, Di and Pan, Eileen and Oufattole, Nassim and Weng, Wei-Hung and Fang, Hanyi and Szolovits, Peter},
  journal={Applied Sciences},
  volume={11},
  number={14},
  pages={6421},
  year={2021},
  publisher={MDPI}
}

Topics

Medical QA
Multilingual Processing

Source

Organization: hugging_face

Created: Unknown
