Dataset asset · Open Source Community · Biomedical · Question Answering Systems

PQAref

The PQAref dataset is a reference‑based question‑answering dataset for the biomedical domain, designed for fine‑tuning large language models. Each example has three components: an instruction (the question), abstracts (relevant abstracts retrieved from PubMed, each with its PubMed ID, title, and content), and an answer (the expected answer, with references given as PubMed IDs). The dataset was created semi‑automatically from questions in the PubMedQA dataset.

Source: Hugging Face
Created: Jul 2, 2024
Updated: Jul 2, 2024

Dataset Overview

Dataset Name

PubMed Referenced Question Answering Dataset

Dataset Description

PQAref is a dataset for fine‑tuning large language models on reference‑based question answering in the biomedical domain.

Dataset Content

The dataset includes three parts:

  • Instruction: The question to be answered.
  • Abstracts: Ten relevant PubMed abstracts, each containing PubMed ID, abstract title, and abstract content.
  • Answer: The expected answer, containing references formatted as PubMed IDs.
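To make the three-part structure concrete, here is a minimal sketch of one record and two helpers a fine-tuning pipeline might use: one that joins the question and abstracts into a prompt, and one that flags PubMed IDs cited in the answer that are not among the supplied abstracts. The class and field names (`Record`, `Abstract`, `pmid`, and so on), the prompt template, and the citation markup (bracketed numeric IDs such as `[12345678]`) are all hypothetical assumptions; the actual column names and citation format in PQAref may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Abstract:
    """One retrieved PubMed abstract (field names are hypothetical)."""
    pmid: str
    title: str
    content: str


@dataclass
class Record:
    """One PQAref-style example: question, retrieved abstracts, reference answer."""
    instruction: str
    abstracts: list[Abstract]
    answer: str


# Assumption: the answer cites sources as bracketed numeric PubMed IDs, e.g. "[12345678]".
CITATION = re.compile(r"\[(\d{1,8})\]")


def build_prompt(record: Record) -> str:
    """Join the question and its abstracts into a single prompt string."""
    lines = [f"Question: {record.instruction}", "", "Abstracts:"]
    for a in record.abstracts:
        lines.append(f"PMID {a.pmid}: {a.title}")
        lines.append(a.content)
    lines += ["", "Answer the question, citing the PMIDs above."]
    return "\n".join(lines)


def unsupported_citations(record: Record) -> set[str]:
    """Return PMIDs cited in the answer that are not among the record's abstracts."""
    cited = set(CITATION.findall(record.answer))
    provided = {a.pmid for a in record.abstracts}
    return cited - provided


# Toy record for illustration only; not taken from the dataset.
rec = Record(
    instruction="Does treatment X lower mortality in condition Y?",
    abstracts=[Abstract("11111111", "Trial of X in Y", "Placeholder abstract text.")],
    answer="Mortality was lower with X [11111111], but see [22222222].",
)
print(build_prompt(rec))
print(unsupported_citations(rec))  # {'22222222'}
```

A check like `unsupported_citations` can be used to filter or audit generated answers, because a reference answer should only cite abstracts that were actually provided with the question.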

Dataset Creation Method

The dataset was created semi‑automatically, reusing questions from the PubMedQA dataset.

Dataset Features

  • Input: string type

Dataset Splits

  • Training set: 7,260 samples, 136,602,852 bytes (about 136.6 MB).
  • Validation set: 907 samples, 17,065,949 bytes (about 17.1 MB).
  • Test set: 908 samples, 17,084,764 bytes (about 17.1 MB).
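A quick arithmetic check on the split figures (byte counts rounded to whole bytes from the listed values): the splits work out to roughly 80/10/10, and all three have the same average size of about 18.8 KB per sample. That uniformity suggests the per-split byte counts in the source metadata were apportioned by sample count rather than measured separately, which would also explain their fractional values. This is an inference from the numbers, not something the source states.

```python
# Split statistics as listed on this page, with byte counts rounded to whole bytes.
SPLITS = {
    "train": (7_260, 136_602_852),
    "validation": (907, 17_065_949),
    "test": (908, 17_084_764),
}

total_samples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())
print(f"{total_samples:,} samples, {total_bytes:,} bytes")  # 9,075 samples, 170,753,565 bytes

for name, (n, b) in SPLITS.items():
    share = n / total_samples  # fraction of all samples in this split
    per_sample = b / n  # average bytes per sample
    print(f"{name:<10} {share:6.1%} {per_sample:8,.0f} B/sample {b / 1e6:6.1f} MB")
```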

Dataset Size

  • Download size: 82,888,007 bytes (about 82.9 MB)
  • Total size: 170,753,565 bytes (about 170.8 MB), the sum of the three splits
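The download is a little under half the total size. On Hugging Face, the download size usually refers to the stored files (typically compressed) and the total size to the prepared splits. That reading is an assumption here. The conversion to decimal megabytes and the ratio work out as follows:

```python
# Sizes as listed on this page (total rounded to whole bytes).
DOWNLOAD_BYTES = 82_888_007
TOTAL_BYTES = 170_753_565


def to_mb(n_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 10**6 bytes)."""
    return n_bytes / 1e6


ratio = DOWNLOAD_BYTES / TOTAL_BYTES
print(f"download {to_mb(DOWNLOAD_BYTES):.1f} MB, total {to_mb(TOTAL_BYTES):.1f} MB, ratio {ratio:.1%}")
# download 82.9 MB, total 170.8 MB, ratio 48.5%
```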

Task Categories

  • Text Generation
  • Question Answering
  • Summarization

Language

  • English

Tags

  • Biology
  • Biomedical

Scale

  • 10M < n < 100M (size-category tag from the source card; the three splits together contain 9,075 samples)
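The listed tag can be compared against the actual sample count. The sketch below assumes the tag follows Hugging Face's usual convention of bucketing datasets by number of examples. Under that assumption, the 9,075 samples across the three splits would fall in a much smaller bucket than the one shown. The bucket table is written out by hand here and stops at 1B.

```python
# Assumed Hugging Face size-category buckets, keyed by number of examples
# (only the buckets up to 1B are listed in this sketch).
BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
]


def size_category(n_examples: int) -> str:
    """Return the first bucket whose upper bound exceeds the example count."""
    for upper, label in BUCKETS:
        if n_examples < upper:
            return label
    raise ValueError("count is beyond the buckets covered by this sketch")


n = 7_260 + 907 + 908  # train + validation + test
print(n, size_category(n))  # 9075 1K<n<10K
```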