JUHE API Marketplace

legacy107/qa_wikipedia

The qa_wikipedia dataset is a question‑answering dataset containing documents extracted from Wikipedia along with associated questions. Each record includes a document ID, title, context passage, question, answer start position, answer text, and the full article. The dataset is split into training, test, and validation subsets for training and evaluation.

Updated 9/18/2023

Dataset Overview

Configuration

  • Default configuration (default)
    • Data file paths:
      • Training set (train): data/train-*
      • Test set (test): data/test-*
      • Validation set (validation): data/validation-*

Data Features

  • id: string
  • title: string
  • context: string
  • question: string
  • answer_start: 64‑bit integer
  • answer: string
  • article: string
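The field names above follow the common SQuAD‑style layout, where `answer_start` is a character offset into `context` and `answer` is the substring found there. That interpretation is an assumption based on the field names, not confirmed by the dataset card; a minimal sketch of one record under that assumption (the field values are illustrative, not taken from the dataset):

```python
# One illustrative qa_wikipedia-style record, assuming SQuAD-style
# semantics: `answer_start` is a character offset into `context`.
record = {
    "id": "example-0001",  # hypothetical ID
    "title": "Photosynthesis",
    "context": "Photosynthesis is a process used by plants to convert light into chemical energy.",
    "question": "What do plants convert light into?",
    "answer_start": 65,  # stored as a 64-bit integer in the schema
    "answer": "chemical energy",
    "article": "Photosynthesis is a process used by plants to convert light into chemical energy. ...",
}

def answer_span_is_consistent(rec: dict) -> bool:
    """Check that the answer text appears in the context at answer_start."""
    start = rec["answer_start"]
    return rec["context"][start:start + len(rec["answer"])] == rec["answer"]
```

Loading the real dataset would typically be done with `datasets.load_dataset("legacy107/qa_wikipedia")`, which returns the train/test/validation splits listed below.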

Data Splits

  • Training set (train)
    • Size: 7,477,859,892 bytes
    • Samples: 138,712
  • Test set (test)
    • Size: 898,641,134 bytes
    • Samples: 17,341
  • Validation set (validation)
    • Size: 926,495,549 bytes
    • Samples: 17,291

Dataset Size

  • Download size: 498,772,569 bytes
  • Total dataset size: 9,302,996,575 bytes
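The reported total dataset size is the sum of the three split sizes, which can be checked directly:

```python
# Split sizes in bytes, as listed in the Data Splits section.
split_sizes = {
    "train": 7_477_859_892,
    "test": 898_641_134,
    "validation": 926_495_549,
}

total = sum(split_sizes.values())
print(total)  # 9302996575, matching the reported total dataset size
```

The download size (498,772,569 bytes) is much smaller than this total because it reflects the compressed on-disk files rather than the in-memory dataset.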



Topics

Question Answering Systems
Wikipedia

Source

Organization: hugging_face

