legacy107/qa_wikipedia
qa_wikipedia is a question-answering dataset of documents extracted from Wikipedia, each paired with associated questions. Every record includes a document ID, title, context passage, question, answer start position, answer text, and the full article. The dataset is split into training, test, and validation subsets for the different modeling stages.
Updated 9/18/2023
Dataset Overview
Configuration
- Default configuration (default)
  - Data file paths:
    - Training set (train): data/train-*
    - Test set (test): data/test-*
    - Validation set (validation): data/validation-*
Data Features
- id: string
- title: string
- context: string
- question: string
- answer_start: 64-bit integer
- answer: string
- article: string
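The answer_start field appears to follow the common SQuAD-style extractive QA convention: a character offset into context at which the answer span begins. A minimal sketch of how the fields relate (the record below is invented for illustration, not taken from the dataset):

```python
# Hypothetical record in the qa_wikipedia schema (values are made up).
record = {
    "id": "example-0",
    "title": "Python (programming language)",
    "context": "Python was created by Guido van Rossum and first released in 1991.",
    "question": "Who created Python?",
    "answer_start": 22,  # character offset of the answer within context
    "answer": "Guido van Rossum",
}

# Recover the answer span from context using answer_start and the answer length.
start = record["answer_start"]
span = record["context"][start:start + len(record["answer"])]
print(span)  # Guido van Rossum
```

If the offsets in the real data use a different convention (e.g. token-based), this slicing would need to be adapted accordingly.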
Data Splits
- Training set (train)
  - Size: 7,477,859,892 bytes
  - Samples: 138,712
- Test set (test)
  - Size: 898,641,134 bytes
  - Samples: 17,341
- Validation set (validation)
  - Size: 926,495,549 bytes
  - Samples: 17,291
Dataset Size
- Download size: 498,772,569 bytes
- Total dataset size: 9,302,996,575 bytes
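As a quick consistency check, the three per-split sizes listed above sum exactly to the reported total dataset size; the much smaller download size reflects on-disk compression:

```python
# Per-split sizes in bytes, as reported on this card.
split_bytes = {
    "train": 7_477_859_892,
    "test": 898_641_134,
    "validation": 926_495_549,
}

# The splits should account for the full (uncompressed) dataset size.
total = sum(split_bytes.values())
print(total)  # 9302996575, matching the reported total dataset size
```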
Topics
Question Answering Systems
Wikipedia
Source
Organization: hugging_face
Created: Unknown