Explore high-quality datasets for your AI and machine learning projects.
The BiomixQA dataset is a biomedical question answering collection with two question types: multiple‑choice and true/false. It is used to evaluate knowledge‑graph‑enhanced retrieval‑augmented generation (KG‑RAG) frameworks across various large language models (LLMs). Its variety in both question formats and biomedical concepts makes it well suited to that assessment. The dataset also supports research and development in biomedical NLP, knowledge graph reasoning, and QA systems. Its sources are biomedical knowledge graphs and databases, including SPOKE, DisGeNET, MONDO, SemMedDB, Monarch Initiative, and ROBOKOP.
DAHL is a benchmark for evaluating hallucination in long‑form biomedical text generation, curated by Seoul National University. It comprises 8,573 questions across 29 categories, sourced from PubMed Central biomedical research papers. The questions were generated automatically and then filtered manually for quality and answerability. DAHL measures how much large language models hallucinate in the biomedical domain by decomposing each model response into atomic units and assessing the factual accuracy of each unit, which gives a finer‑grained evaluation than traditional multiple‑choice tasks. It is mainly applied in biomedical and clinical research to detect factual conflicts in generated text.
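The atomic‑unit idea behind DAHL can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual pipeline: the names `AtomicUnit`, `split_into_units`, and `factuality_score` are hypothetical, the period‑based splitter is a naive stand‑in for whatever decomposition step the benchmark uses, and scoring a response as the fraction of factual units is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class AtomicUnit:
    """One self-contained claim extracted from a model response."""
    text: str
    is_factual: bool  # label assumed to come from a separate fact checker


def split_into_units(response: str) -> list[str]:
    """Naive stand-in for decomposition: split on periods, drop empty pieces."""
    parts = [part.strip() for part in response.split(".")]
    return [part for part in parts if part]


def factuality_score(units: list[AtomicUnit]) -> float:
    """Fraction of atomic units labeled factual (assumed scoring rule)."""
    if not units:
        raise ValueError("response produced no atomic units")
    return sum(unit.is_factual for unit in units) / len(units)
```

A response whose units are labeled one by one can then be scored with `factuality_score`, so a single wrong claim lowers the score without discarding the rest of the answer.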
This dataset is primarily used for question answering and sentence similarity tasks in the biomedical domain. It includes two configurations, text‑corpus and question‑answer‑passages, each mapped to its own data file path. The dataset originates from the training set of BioASQ Task 11b, and its subsets were generated using the `generate.py` script.
The PQAref dataset is a reference question‑answering dataset for the biomedical domain, designed for fine‑tuning large language models. It comprises three components: an instruction (question), abstracts (relevant abstracts retrieved from PubMed, including PubMed ID, abstract title, and content), and an answer (expected answer with references in PubMed ID format). The dataset was created semi‑automatically, leveraging questions from the PubMedQA dataset.
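The three‑part record layout of PQAref can be modeled as a small data structure. The sketch below is hedged: the class and field names (`Abstract`, `PQARefRecord`, `pubmed_id`, and so on) are hypothetical, and the assumption that answers cite PubMed IDs as digits in square brackets, such as `[12345678]`, is made only for illustration, since the exact reference format is not specified here.

```python
import re
from dataclasses import dataclass


@dataclass
class Abstract:
    pubmed_id: str
    title: str
    content: str


@dataclass
class PQARefRecord:
    instruction: str           # the question
    abstracts: list[Abstract]  # relevant abstracts retrieved from PubMed
    answer: str                # expected answer with PubMed ID references


def cited_ids(answer: str) -> list[str]:
    """Extract cited IDs, assuming a hypothetical [digits] citation style."""
    return re.findall(r"\[(\d+)\]", answer)


def unsupported_citations(record: PQARefRecord) -> list[str]:
    """Return cited IDs that match none of the record's abstracts."""
    known = {abstract.pubmed_id for abstract in record.abstracts}
    return [pid for pid in cited_ids(record.answer) if pid not in known]
```

A check like `unsupported_citations` is one way to confirm that every reference in an answer points to an abstract actually supplied with the question.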
MedPix 2.0 is a comprehensive multimodal biomedical dataset for advanced AI applications. It pairs detailed clinical case information with medical images, covering CT and MRI scans.