medical-qa-id-filtered-split

This medical question-answering dataset contains system prompts, question IDs, question texts, original answer texts, answer lengths, and two integer index columns (Unnamed: 0 and __index_level_0__). It is split into training, validation, and test sets of 89,101, 4,950, and 4,951 samples respectively. The download size is 42,351,649 bytes and the total size is 83,382,248 bytes. The dataset is derived from https://huggingface.co/datasets/lintangbs/medical-qa-id-llama; preprocessing removed empty lines and limited the maximum token count to 1,024.

Updated 11/30/2024
huggingface

Description

Dataset Overview

Dataset Information

  • Feature Fields:

    • Unnamed: 0: data type int64
    • system_prompt: data type string
    • qas_id: data type string
    • question_text: data type string
    • orig_answer_texts: data type string
    • answer_lengths: data type float64
    • __index_level_0__: data type int64
  • Dataset Split:

    • Training Set:
      • Sample count: 89,101
      • Bytes: 74,957,465
    • Validation Set:
      • Sample count: 4,950
      • Bytes: 4,202,516
    • Test Set:
      • Sample count: 4,951
      • Bytes: 4,222,267
  • Dataset Size:

    • Download size: 42,351,649 bytes
    • Total size: 83,382,248 bytes
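
As a quick consistency check on the figures above, the per-split byte counts can be summed and compared with the reported total size. The sketch below uses only the numbers listed in this card; the variable names are illustrative and not part of the dataset.

```python
# Byte counts per split, copied from the list above.
SPLIT_BYTES = {
    "train": 74_957_465,
    "validation": 4_202_516,
    "test": 4_222_267,
}
REPORTED_TOTAL_BYTES = 83_382_248
REPORTED_DOWNLOAD_BYTES = 42_351_649

# The per-split sizes should add up to the reported total size.
computed_total = sum(SPLIT_BYTES.values())
sizes_match = computed_total == REPORTED_TOTAL_BYTES

# Fraction of the total size that has to be downloaded
# (the download is smaller, presumably because the files are compressed).
download_fraction = REPORTED_DOWNLOAD_BYTES / REPORTED_TOTAL_BYTES

print(f"computed total: {computed_total:,} bytes, matches report: {sizes_match}")
print(f"download / total: {download_fraction:.3f}")
```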

Configuration Information

  • Configuration Name: default
    • Data File Paths:
      • Training: data/train-*
      • Validation: data/validation-*
      • Test: data/test-*
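
The data file paths above are glob patterns, one per split. A minimal sketch of how such patterns select files is shown below, using Python's standard `fnmatch` module. The shard file names are hypothetical (the card does not list the actual files), and `resolve_splits` is an illustrative helper, not part of any library.

```python
from fnmatch import fnmatchcase

# Split-to-pattern mapping from the "default" configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}

def resolve_splits(paths, patterns=DATA_FILES):
    """Group file paths by the split whose glob pattern they match."""
    resolved = {split: [] for split in patterns}
    for path in paths:
        for split, pattern in patterns.items():
            if fnmatchcase(path, pattern):
                resolved[split].append(path)
    return resolved

# Hypothetical repository listing; real shard names may differ.
example_paths = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

resolved = resolve_splits(example_paths)
for split, files in resolved.items():
    print(split, files)
```

Files that match no pattern, such as the README here, are simply left out of every split.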

Dataset Processing

  • Original Dataset: lintangbs/medical-qa-id-llama
  • Processing Details:
    • Removed empty lines
    • Limited maximum token count to 1,024 to fit smaller models
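
The two processing steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual preprocessing script: it assumes that records exceeding the limit were dropped rather than truncated (as the "filtered" in the dataset name suggests), it counts tokens by whitespace splitting as a stand-in for the real (unspecified) tokenizer, and it assumes the limit applies to the question and answer combined. The helper names are hypothetical.

```python
MAX_TOKENS = 1024  # limit stated in the processing details above
TEXT_FIELDS = ("question_text", "orig_answer_texts")

def remove_empty_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())

def count_tokens(text: str) -> int:
    """Assumption: whitespace tokens stand in for the real tokenizer."""
    return len(text.split())

def preprocess(records, max_tokens: int = MAX_TOKENS):
    """Clean the text fields and keep only records within the token limit."""
    kept = []
    for record in records:
        cleaned = dict(record)
        for field in TEXT_FIELDS:
            cleaned[field] = remove_empty_lines(record[field])
        total_tokens = sum(count_tokens(cleaned[f]) for f in TEXT_FIELDS)
        if total_tokens <= max_tokens:
            kept.append(cleaned)
    return kept

# Toy records for illustration only; not taken from the dataset.
sample_records = [
    {"question_text": "line one\n\n\nline two", "orig_answer_texts": "short answer"},
    {"question_text": "q", "orig_answer_texts": "word " * 2000},
]
processed = preprocess(sample_records)
print(len(processed), repr(processed[0]["question_text"]))
```

Swapping `count_tokens` for a call to the target model's tokenizer would make the limit match what the model actually sees.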

Dataset Split Ratios

  • Training: 90%
  • Validation: 5%
  • Test: 5%
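
The stated ratios can be checked against the sample counts listed earlier in this card. The short sketch below recomputes each split's share of the total; the variable names are illustrative.

```python
# Sample counts per split, copied from the dataset split list.
SPLIT_COUNTS = {"train": 89_101, "validation": 4_950, "test": 4_951}

total_samples = sum(SPLIT_COUNTS.values())

# Share of each split as a percentage of all samples.
split_percentages = {
    name: 100 * count / total_samples for name, count in SPLIT_COUNTS.items()
}

print(f"total samples: {total_samples:,}")
for name, pct in split_percentages.items():
    print(f"{name}: {pct:.2f}%")
```

The shares come out to roughly 90%, 5%, and 5%, consistent with the ratios above.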

Topics

Medical QA
Natural Language Processing

Source

Organization: huggingface

Created: 11/19/2024
