Datasets | JuheAPI

PediaBench

Language Model Evaluation

Pediatric Medicine

PediaBench is a Chinese-language dataset designed to evaluate large language models (LLMs) on pediatric question answering. Created by research teams at Guizhou University and East China Normal University, it contains 4,565 objective questions and 1,632 subjective questions (6,197 in total) covering 12 pediatric diseases. The questions are drawn from several reliable sources: the Chinese National Medical Licensing Examination, university final exams, and pediatric diagnosis and treatment standards. The benchmark applies comprehensive scoring criteria to assess LLMs on instruction following, knowledge understanding, and clinical case analysis. Existing medical QA datasets offer little pediatric coverage, and PediaBench fills that gap with a thorough benchmark for LLMs in the pediatric domain.

arXiv
