mandarjoshi/trivia_qa
TriviaQA is a reading‑comprehension dataset containing over 650,000 question‑answer‑evidence triples. It comprises 95,000 question‑answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high‑quality distant supervision for answering the questions. The dataset is monolingual (English) and is suitable for QA and text‑generation tasks.
Description
Dataset Overview
Basic Information
- Name: TriviaQA
- Language: English
- Multilinguality: Monolingual
- License: Unknown
- Annotation creators: Crowdsourcing
- Language creators: Machine‑generated
- Size categories:
  - 10K < n < 100K
  - 100K < n < 1M
Dataset Structure
Configuration Details
Configuration: rc
- Features:
  - question: string
  - question_id: string
  - question_source: string
  - entity_pages: sequence
    - document_source: string
    - filename: string
    - title: string
    - wiki_context: string
  - search_results: sequence ... (structure omitted for brevity)
- Splits:
  - train: 138,384 examples, 12,749,651,131 bytes
  - validation: 17,944 examples, 1,662,321,188 bytes
  - test: 17,210 examples, 1,577,710,503 bytes
- Download size: 8,998,808,983 bytes
- Dataset size: 15,989,682,822 bytes
... (additional configurations omitted for brevity)
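As a rough illustration of working with the rc configuration, the sketch below copies the split statistics from the listing and computes each split's share of the examples. It also includes a hypothetical helper, `load_rc_split`, which assumes the Hugging Face `datasets` library is installed and that the repository id `mandarjoshi/trivia_qa` from the page title resolves on the Hub. The helper names and the streaming default are illustrative choices, not part of the dataset card.

```python
# Split statistics for the "rc" configuration, copied from the listing above.
RC_SPLITS = {
    "train": {"examples": 138_384, "bytes": 12_749_651_131},
    "validation": {"examples": 17_944, "bytes": 1_662_321_188},
    "test": {"examples": 17_210, "bytes": 1_577_710_503},
}


def split_shares(splits):
    """Return each split's fraction of the total example count."""
    total = sum(s["examples"] for s in splits.values())
    return {name: s["examples"] / total for name, s in splits.items()}


def load_rc_split(split="validation", streaming=True):
    """Hypothetical helper: load one split of the "rc" configuration.

    Assumes the `datasets` package is installed and the repo id is valid.
    Streaming avoids downloading the full archive (about 9 GB) up front.
    """
    from datasets import load_dataset  # imported lazily; only needed here

    return load_dataset(
        "mandarjoshi/trivia_qa", "rc", split=split, streaming=streaming
    )


if __name__ == "__main__":
    for name, share in split_shares(RC_SPLITS).items():
        print(f"{name}: {share:.1%} of examples")
    total_bytes = sum(s["bytes"] for s in RC_SPLITS.values())
    print(f"total size: {total_bytes / 1e9:.2f} GB")
```

The per-split byte counts sum to the listed dataset size, which is a quick consistency check on the figures. With streaming enabled, `next(iter(load_rc_split()))` would yield a single record for inspecting the feature layout, assuming network access.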
Task Types
- Question Answering
- Text‑to‑Text Generation
Task IDs
- Open‑domain QA
- Open‑domain Abstractive QA
- Extractive QA
- Abstractive QA
Dataset Information
- Paper code ID: triviaqa
- Pretty name: TriviaQA
Source
Organization: hugging_face
Created: Unknown