
rag-datasets/rag-mini-bioasq

This dataset targets question answering and sentence similarity tasks in the biomedical domain. It has two configurations, text-corpus and question-answer-passages, each backed by its own data files. It is derived from the training set of BioASQ Task 11b; the subsets were generated with the `generate.py` script.

Source

Hugging Face

Created

Nov 28, 2025

Updated

Jun 17, 2024

Dataset Overview

License

This dataset is released under the CC-BY-2.5 license.

Task Categories

Question Answering (question-answering)
Sentence Similarity (sentence-similarity)

Language

English (en)

Configurations

Configuration Name: text-corpus
- Data File:
  - Split: passages
  - Path: "data/passages.parquet/*"
Configuration Name: question-answer-passages
- Data File:
  - Split: test
  - Path: "data/test.parquet/*"
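
The two configurations above could be loaded roughly as follows. This is a minimal sketch, assuming the Hugging Face `datasets` package is installed and the repository is reachable under the ID `rag-datasets/rag-mini-bioasq`; the names `CONFIGS`, `REPO_ID`, and `load_config` are hypothetical helpers introduced only for illustration, and the column layout of each configuration is not stated here, so the sketch does not assume one.

```python
# Configuration name -> (split, data file pattern), copied from the listing above.
CONFIGS = {
    "text-corpus": ("passages", "data/passages.parquet/*"),
    "question-answer-passages": ("test", "data/test.parquet/*"),
}

# Repository ID as shown in the page title (assumed to be the Hub ID).
REPO_ID = "rag-datasets/rag-mini-bioasq"


def load_config(name: str):
    """Load one configuration from the Hugging Face Hub (needs network access)."""
    if name not in CONFIGS:
        raise KeyError(f"unknown configuration: {name!r}")
    split, _pattern = CONFIGS[name]
    # Imported lazily so the mapping above can be inspected even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    # The configuration name selects the data files; the split name
    # selects the single split that configuration defines.
    return load_dataset(REPO_ID, name, split=split)
```

Calling `load_config("text-corpus")` would then return the passage collection, and `load_config("question-answer-passages")` the test questions. Printing either result shows its actual column names, which is the safest way to check the schema before building a retrieval pipeline on top of it.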
