MLCE (Medical-LLMs-Chinese-Exam)

The MLCE dataset gathers Chinese medical examination and competition datasets to support evaluation of large language models on specialized medical abilities and to enable targeted training, aiming to promote development of comprehensive medical LLMs.

Updated 7/7/2024

Description

Medical Large‑Model Chinese Exam Evaluation

Dataset Introduction

MLCE (Medical‑LLMs‑Chinese‑Exam), the Medical Large‑Model Chinese Exam Evaluation dataset, aggregates questions from Chinese medical examinations and related competitions. It is intended to help assess the specialized medical abilities of large models and to support targeted training, with the goal of fostering comprehensive medical LLMs.

Dataset Progress

  • [2024/7/7] Added 2017‑2021 questions from the National Physician Qualification Exam, National Pharmacist Qualification Exam, and National Nurse Qualification Exam.
  • [2024/7/7] First open‑source release of the MLCE dataset.

Sample Data

Questions are stored in the following JSON format:

{
    "id": "",
    "question": "",
    "options": {},
    "answer": "",
    "question_type": ""
}

Example:

{
    "id": "2017-Unit1-1",
    "question": "Male, 40 years old, dizziness and headache for two weeks, three consecutive blood pressure readings 21",
    "options": {
        "A": "Acute hypertension",
        "B": "Chronic nephritis",
        "C": "Hyperthyroidism",
        "D": "Primary hypertension",
        "E": "SLE"
    },
    "answer": "D",
    "question_type": "Single‑choice"
}
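
As an illustration of working with this format, the following is a minimal Python sketch that checks a record against the fields listed above and renders it as a plain-text prompt. The function names validate_record and format_prompt are hypothetical and not part of the dataset; the answer check assumes that every character of "answer" is an option key, which holds for the single-choice example shown but has not been confirmed for other question types.

```python
import json

# Field names taken from the JSON format shown above.
REQUIRED_FIELDS = ("id", "question", "options", "answer", "question_type")


def validate_record(record):
    """Return a list of problems found in one question record (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    options = record["options"]
    answer = record["answer"]
    if not isinstance(options, dict) or not options:
        problems.append("options must be a non-empty object")
    # Assumption: each character of the answer is an option key (e.g. "D").
    elif not answer or any(letter not in options for letter in answer):
        problems.append("answer refers to an unknown option key")
    return problems


def format_prompt(record):
    """Render the question followed by one line per option, ordered by key."""
    lines = [record["question"]]
    lines += [f"{key}. {text}" for key, text in sorted(record["options"].items())]
    return "\n".join(lines)


# The example record from above, with the question text kept as it appears.
sample = json.loads("""
{
    "id": "2017-Unit1-1",
    "question": "Male, 40 years old, dizziness and headache for two weeks, three consecutive blood pressure readings 21",
    "options": {
        "A": "Acute hypertension",
        "B": "Chronic nephritis",
        "C": "Hyperthyroidism",
        "D": "Primary hypertension",
        "E": "SLE"
    },
    "answer": "D",
    "question_type": "Single-choice"
}
""")

print(validate_record(sample))
print(format_prompt(sample))
```

A model's reply could then be compared with record["answer"] to compute accuracy; how the reply is parsed into an option letter is left open here.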

Dataset Details

Dataset Name                Sample Count   Source                                    Data Origin
2017‑2021physician.json     3000           National Physician Qualification Exam     LLM‑Chinese‑NMLE
2017‑2021pharmacist.json    2400           National Pharmacist Qualification Exam    LLM‑Chinese‑NMLE
2017‑2021nurse.json         1200           National Nurse Qualification Exam         LLM‑Chinese‑NMLE
Total                       6600
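
The counts above can be used as a sanity check after downloading the data. The sketch below is a hedged example, not the project's own tooling: it assumes each file is a single JSON array of question records, and it copies the file names as they appear in the table (the repository may spell them with a plain ASCII hyphen). The helper count_mismatches is hypothetical.

```python
import json
from pathlib import Path

# Sample counts copied from the table above; file names as shown there.
EXPECTED_COUNTS = {
    "2017‑2021physician.json": 3000,
    "2017‑2021pharmacist.json": 2400,
    "2017‑2021nurse.json": 1200,
}


def count_mismatches(data_dir, expected=EXPECTED_COUNTS):
    """Map file name to (expected, actual) where they differ.

    actual is None when the file is not found in data_dir.
    Assumption: each file holds one JSON array of records.
    """
    mismatches = {}
    for name, want in expected.items():
        path = Path(data_dir) / name
        if not path.is_file():
            mismatches[name] = (want, None)
            continue
        with path.open(encoding="utf-8") as handle:
            records = json.load(handle)
        if len(records) != want:
            mismatches[name] = (want, len(records))
    return mismatches


total = sum(EXPECTED_COUNTS.values())
print(total)  # 3000 + 2400 + 1200 = 6600, matching the Total row
```

Calling count_mismatches("data") from the repository root would return an empty dictionary if every file is present with the listed number of records.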

Thanks to all open‑source contributors! Processed data are available under data/.

More data are being prepared.

Contact

If you are interested in this work, would like to contribute data, or have any questions, please contact: jingnant@163.com

Topics

Medical Domain
Model Training

Source

Organization: github

Created: 7/7/2024