bigbio/med_qa
We present MedQA, the first free-form multiple-choice open-domain QA dataset for medicine, derived from professional medical examinations. It covers three languages (English, Simplified Chinese, and Traditional Chinese as used in Taiwan) with 12,723, 34,251, and 14,123 questions, respectively. In addition to the QA pairs, we release a large corpus of text extracted from medical textbooks to support reading-comprehension models.
Description
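Dataset Overview
Basic Information
- Name: MedQA
- Languages: English, Simplified Chinese, Traditional Chinese (Taiwan)
- License: Unknown
- Multilingual: Yes
- Task type: Question answering (QA)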
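To make the multiple-choice QA task type concrete, here is a minimal sketch of how one question could be represented and how predictions could be scored. The class name, the field names (`question`, `options`, `answer_key`), and the sample question are illustrative assumptions, not the dataset's actual schema or content.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class MultipleChoiceQuestion:
    """Hypothetical record layout; the dataset's real field names may differ."""
    question: str
    options: Dict[str, str]  # option key (e.g. "A") -> option text
    answer_key: str          # key of the correct option


def accuracy(questions: Sequence[MultipleChoiceQuestion],
             predictions: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted key matches the answer key."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    if not questions:
        return 0.0
    correct = sum(q.answer_key == p for q, p in zip(questions, predictions))
    return correct / len(questions)


# Made-up example item for illustration only (not taken from MedQA).
sample = MultipleChoiceQuestion(
    question="Deficiency of which vitamin causes scurvy?",
    options={"A": "Vitamin A", "B": "Vitamin B12", "C": "Vitamin C", "D": "Vitamin D"},
    answer_key="C",
)
print(accuracy([sample], ["C"]))
```

Scoring by option key rather than by option text keeps the comparison independent of the language of the question.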
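Dataset Details
- Homepage: MedQA
- Public: Yes
- Contains PubMed data: No
- Dataset size:
  - English: 12,723 questions
  - Simplified Chinese: 34,251 questions
  - Traditional Chinese (Taiwan): 14,123 questions
- Data source: Professional medical examinations
- Additional resources: A large corpus drawn from medical textbooks, which reading-comprehension models can use to answer the questions.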
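The per-language question counts listed above can be combined into a total and into per-language shares, which is useful when planning balanced multilingual experiments. The sketch below uses only the counts stated on this card; the names `QUESTION_COUNTS` and `language_shares` are illustrative.

```python
# Question counts as stated on this dataset card.
QUESTION_COUNTS = {
    "English": 12_723,
    "Simplified Chinese": 34_251,
    "Traditional Chinese (Taiwan)": 14_123,
}


def language_shares(counts):
    """Return each language's fraction of the total question count."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


total_questions = sum(QUESTION_COUNTS.values())
shares = language_shares(QUESTION_COUNTS)

for lang, share in shares.items():
    print(f"{lang}: {QUESTION_COUNTS[lang]:,} questions ({share:.1%})")
print(f"Total: {total_questions:,} questions")
```

The Simplified Chinese subset is the largest of the three, holding more than half of the 61,097 questions.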
Citation
@article{jin2021disease,
  title={What disease does this patient have? a large-scale open domain question answering dataset from medical exams},
  author={Jin, Di and Pan, Eileen and Oufattole, Nassim and Weng, Wei-Hung and Fang, Hanyi and Szolovits, Peter},
  journal={Applied Sciences},
  volume={11},
  number={14},
  pages={6421},
  year={2021},
  publisher={MDPI}
}
Source
Organization: hugging_face
Created: Unknown