Dataset asset: Open Source Community, Biomedical, Question Answering Systems

rag-datasets/rag-mini-bioasq

This dataset supports question answering and sentence similarity tasks in the biomedical domain. It has two configurations, text-corpus and question-answer-passages, each backed by its own data file path. The data is derived from the training set of BioASQ Task 11b, and the subsets were generated with the `generate.py` script.

Source: hugging_face
Created: Nov 28, 2025
Updated: Jun 17, 2024

Dataset Overview

License

  • This dataset is released under the CC-BY-2.5 license.

Task Categories

  • Question Answering (question-answering)
  • Sentence Similarity (sentence-similarity)

Language

  • English (en)

Tags

  • RAG
  • DPR
  • Information Retrieval (information-retrieval)
  • Question Answering (question-answering)
  • Biomedical (biomedical)

Configurations

  • Configuration Name: text-corpus
    • Data File:
      • Split: passages
      • Path: "data/passages.parquet/*"
  • Configuration Name: question-answer-passages
    • Data File:
      • Split: test
      • Path: "data/test.parquet/*"
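
The two configurations above can be loaded separately. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository ID matches the dataset name shown at the top of this page. The `CONFIGS` mapping and the `load_config` helper are hypothetical names introduced for illustration; they are not part of the dataset or of any library.

```python
# Configuration names, splits, and paths exactly as listed in the card above.
# CONFIGS and load_config are illustrative names, not part of the dataset.
CONFIGS = {
    "text-corpus": {
        "split": "passages",
        "path": "data/passages.parquet/*",
    },
    "question-answer-passages": {
        "split": "test",
        "path": "data/test.parquet/*",
    },
}


def load_config(name, repo_id="rag-datasets/rag-mini-bioasq"):
    """Load the single split of one configuration from the Hugging Face Hub.

    Assumes `pip install datasets` and network access; the default repo_id
    is taken from the dataset name on this page.
    """
    if name not in CONFIGS:
        raise KeyError(f"unknown configuration: {name!r}")
    # Imported lazily so the mapping above can be inspected without the library.
    from datasets import load_dataset

    return load_dataset(repo_id, name, split=CONFIGS[name]["split"])
```

Calling `load_config("text-corpus")` would fetch the passage collection, and `load_config("question-answer-passages")` the test questions. Column names are not listed on this page, so inspect the returned object's `column_names` before relying on specific fields.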