Dataset asset, Open Source Community, Medical Domain, Model Training

MLCE (Medical-LLMs-Chinese-Exam)

The MLCE dataset gathers Chinese medical examination and competition datasets to support evaluation of large language models on specialized medical abilities and to enable targeted training, aiming to promote development of comprehensive medical LLMs.

Source: GitHub
Created: Jul 7, 2024
Updated: Jul 7, 2024
Overview

Dataset description and usage context

Medical Large‑Model Chinese Exam Evaluation

Dataset Introduction

MLCE (Medical‑LLMs‑Chinese‑Exam), the Medical Large‑Model Chinese Exam Evaluation dataset, aggregates questions from Chinese medical examinations and related competitions. It is intended to help assess the specialized medical abilities of large models and to support targeted training, with the goal of fostering the development of comprehensive medical LLMs.

Dataset Progress

  • [2024/7/7] Added questions from the 2017‑2021 National Physician Qualification Exam, National Pharmacist Qualification Exam, and National Nurse Qualification Exam.
  • [2024/7/7] First open‑source release of the MLCE dataset.

Sample Data

Questions are stored in the following JSON format:

{
    "id": "",
    "question": "",
    "options": {},
    "answer": "",
    "question_type": ""
}
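
The format above can be checked mechanically before a file is used. The following is a minimal sketch, assuming that `options` maps option letters to option text and that `answer` is a single one of those letters; the name `validate_record` is hypothetical and not part of the dataset.

```python
from typing import Any

# Field names come from the JSON format above; the types are assumptions.
REQUIRED_FIELDS = {
    "id": str,
    "question": str,
    "options": dict,
    "answer": str,
    "question_type": str,
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    # Assumption: the answer is a single key of the options mapping.
    if not problems and record["answer"] not in record["options"]:
        problems.append("answer is not one of the option keys")
    return problems
```

If the data contain question types with several correct letters, the last check would need to be adapted.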

Example:

{
    "id": "2017-Unit1-1",
    "question": "Male, 40 years old, dizziness and headache for two weeks, three consecutive blood pressure readings 21",
    "options": {
        "A": "Acute hypertension",
        "B": "Chronic nephritis",
        "C": "Hyperthyroidism",
        "D": "Primary hypertension",
        "E": "SLE"
    },
    "answer": "D",
    "question_type": "Single‑choice"
}
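
Since the dataset is meant for evaluating models, a record like the one above can be turned into a multiple-choice prompt and scored by comparing the predicted letter with `answer`. This is only a sketch under stated assumptions: the prompt layout, the function names `format_prompt` and `accuracy`, and the `predictions` mapping (record id to predicted letter) are hypothetical and are not prescribed by MLCE.

```python
def format_prompt(record: dict) -> str:
    """Render one record as a plain-text multiple-choice prompt."""
    lines = [record["question"]]
    # Options are listed in letter order (A, B, C, ...).
    for letter, text in sorted(record["options"].items()):
        lines.append(f"{letter}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(records: list[dict], predictions: dict[str, str]) -> float:
    """Fraction of records whose predicted letter equals `answer`.

    Records without a prediction count as wrong.
    """
    if not records:
        return 0.0
    correct = sum(
        1
        for r in records
        if predictions.get(r["id"], "").strip().upper() == r["answer"]
    )
    return correct / len(records)
```

Exact letter matching is the simplest scoring choice; extracting the letter from a model's free-text output is a separate step not covered here.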

Dataset Details

Dataset Name                Sample Count   Source                                   Data Origin
2017‑2021physician.json     3000           National Physician Qualification Exam    LLM‑Chinese‑NMLE
2017‑2021pharmacist.json    2400           National Pharmacist Qualification Exam   LLM‑Chinese‑NMLE
2017‑2021nurse.json         1200           National Nurse Qualification Exam        LLM‑Chinese‑NMLE
Total                       6600

Thanks to all open‑source contributors! Processed data are available under data/.
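
The sample counts can be re-checked after download. The sketch below assumes each file under data/ is a UTF‑8 JSON array of records in the format shown earlier; that layout is an assumption, and `load_records` and `count_samples` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_records(path: Path) -> list[dict]:
    """Load one dataset file, assumed to hold a JSON array of records."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def count_samples(data_dir: str) -> dict[str, int]:
    """Map each *.json file name in data_dir to its number of records."""
    return {
        p.name: len(load_records(p))
        for p in sorted(Path(data_dir).glob("*.json"))
    }
```

Calling `count_samples("data")` from the repository root would then give per-file counts to compare against the table.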

More data are being prepared.

Contact

If you are interested in this work, would like to contribute data, or have any questions, please contact: jingnant@163.com
