PQAref

The PQAref dataset is a referenced question‑answering dataset for the biomedical domain, designed for fine‑tuning large language models. Each sample has three components: an instruction (the question), abstracts (relevant PubMed abstracts, each with its PubMed ID, title, and content), and an answer (the expected answer, with references given as PubMed IDs). The dataset was created semi‑automatically, reusing questions from the PubMedQA dataset.

Updated 7/2/2024
huggingface

Description

Dataset Overview

Dataset Name

PubMed Referenced Question Answering Dataset

Dataset Description

PQAref is a dataset for fine‑tuning large language models on reference‑based question answering in the biomedical domain.

Dataset Content

The dataset includes three parts:

  • Instruction: The question to be answered.
  • Abstracts: Ten relevant PubMed abstracts, each containing PubMed ID, abstract title, and abstract content.
  • Answer: The expected answer, containing references formatted as PubMed IDs.
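
The three parts above can be sketched as a small record type, together with a check that every PubMed ID cited in the answer belongs to one of the supplied abstracts. This is a minimal illustration, not the dataset's actual schema: the class and field names, the assumption that a reference appears in the answer as a bare run of 6 to 9 digits, and the sample record (with placeholder IDs) are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Abstract:
    pmid: str      # PubMed ID of the abstract
    title: str     # abstract title
    content: str   # abstract text


@dataclass
class PQARefRecord:
    instruction: str            # the question to be answered
    abstracts: list[Abstract]   # the retrieved PubMed abstracts
    answer: str                 # expected answer, citing PubMed IDs


# Assumption: a reference is written as a bare run of 6 to 9 digits.
# The real citation format may differ; adjust the pattern accordingly.
PMID_PATTERN = re.compile(r"\b\d{6,9}\b")


def cited_pmids(answer: str) -> list[str]:
    """Return the PubMed IDs mentioned in the answer, in order of appearance."""
    return PMID_PATTERN.findall(answer)


def unsupported_citations(record: PQARefRecord) -> list[str]:
    """Return cited PubMed IDs that match none of the record's abstracts."""
    known = {abstract.pmid for abstract in record.abstracts}
    return [pmid for pmid in cited_pmids(record.answer) if pmid not in known]


# Invented sample with placeholder IDs, for illustration only.
example = PQARefRecord(
    instruction="Placeholder question?",
    abstracts=[
        Abstract(pmid="11111111", title="Placeholder title A", content="..."),
        Abstract(pmid="22222222", title="Placeholder title B", content="..."),
    ],
    answer="Placeholder answer (11111111), with one stray citation (33333333).",
)

print(unsupported_citations(example))
```

A check of this kind is useful when evaluating a fine‑tuned model, since it flags answers that cite an ID not present among the abstracts given for the question.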

Dataset Creation Method

The dataset was created semi‑automatically, reusing questions from the PubMedQA dataset.

Dataset Features

  • Input: string type

Dataset Splits

  • Training set: 7,260 samples, 136,602,852 bytes.
  • Validation set: 907 samples, 17,065,949 bytes.
  • Test set: 908 samples, 17,084,764 bytes.
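
The sample counts above correspond to roughly an 80/10/10 split. A short sketch of the arithmetic, using only the counts listed (the variable names are illustrative):

```python
# Sample counts as listed for each split.
split_samples = {"train": 7260, "validation": 907, "test": 908}

total_samples = sum(split_samples.values())

# Share of the whole dataset held by each split.
split_shares = {
    name: count / total_samples for name, count in split_samples.items()
}

for name, share in split_shares.items():
    print(f"{name}: {split_samples[name]} samples ({share:.1%})")
print(f"total: {total_samples} samples")
```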

Dataset Size

  • Download size: 82,888,007 bytes
  • Total size: 170,753,565 bytes
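
As a rough sanity check, the listed sizes can be related to each other: the per‑split sizes should add up to the total, and the download is a little under half the total size. The sketch below uses the listed figures rounded to whole bytes; the variable names are illustrative, and reading the ratio as the effect of compressed storage is an assumption rather than something the listing states.

```python
# Per-split sizes in bytes, rounded to whole bytes from the listed values.
split_bytes = {
    "train": 136_602_852,
    "validation": 17_065_949,
    "test": 17_084_764,
}
download_bytes = 82_888_007
total_samples = 7260 + 907 + 908

# The split sizes should sum to the listed total size.
total_bytes = sum(split_bytes.values())

# Average size of one sample (question, abstracts, and answer together).
avg_bytes_per_sample = total_bytes / total_samples

# Download size relative to total size.
download_ratio = download_bytes / total_bytes

print(f"total: {total_bytes:,} bytes")
print(f"average per sample: {avg_bytes_per_sample:,.0f} bytes")
print(f"download / total: {download_ratio:.2f}")
```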

Task Categories

  • Text Generation
  • Question Answering
  • Summarization

Language

  • English

Tags

  • Biology
  • Biomedical

Scale

  • 10M < n < 100M

Topics

Biomedical
Question Answering Systems

Source

Organization: huggingface

Created: 7/2/2024