Dataset asset, Open Source Community, Natural Language Processing, Reading Comprehension
mandarjoshi/trivia_qa
TriviaQA is a reading‑comprehension dataset containing over 650,000 question‑answer‑evidence triples. It comprises 95,000 question‑answer pairs authored by trivia enthusiasts, each paired with independently collected evidence documents (six per question on average) that provide high‑quality distant supervision for answering the questions. The dataset is English‑only and is suitable for question answering and text‑to‑text generation tasks.
Source: Hugging Face
Created: Nov 28, 2025
Updated: Jan 5, 2024
Dataset Overview
Basic Information
- Name: TriviaQA
- Language: English
- Multilinguality: Monolingual
- License: Unknown
- Annotation creators: Crowdsourcing
- Language creators: Machine‑generated
- Size categories:
  - 10K < n < 100K
  - 100K < n < 1M
Dataset Structure
Configuration Details
Configuration: rc
- Features:
  - question: string
  - question_id: string
  - question_source: string
  - entity_pages: sequence
    - document_source: string
    - filename: string
    - title: string
    - wiki_context: string
  - search_results: sequence ... (structure omitted for brevity)
- Splits:
  - train: 138,384 examples, 12,749,651,131 bytes
  - validation: 17,944 examples, 1,662,321,188 bytes
  - test: 17,210 examples, 1,577,710,503 bytes
- Download size: 8,998,808,983 bytes
- Dataset size: 15,989,682,822 bytes
... (additional configurations omitted for brevity)
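The split figures listed for the rc configuration can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `split_summary` and `load_rc_split` are hypothetical, and the loading function assumes the Hugging Face `datasets` library is installed and that the repository id and configuration name shown on this page resolve as written. Loading is not exercised here, since the first call may download up to the listed download size.

```python
# Split sizes copied from the rc configuration listing on this page.
RC_SPLITS = {
    "train": {"examples": 138_384, "bytes": 12_749_651_131},
    "validation": {"examples": 17_944, "bytes": 1_662_321_188},
    "test": {"examples": 17_210, "bytes": 1_577_710_503},
}


def split_summary(splits):
    """Return total examples, total bytes, and each split's share of examples."""
    total_examples = sum(s["examples"] for s in splits.values())
    total_bytes = sum(s["bytes"] for s in splits.values())
    shares = {name: s["examples"] / total_examples for name, s in splits.items()}
    return total_examples, total_bytes, shares


def load_rc_split(split="validation"):
    """Load one split of the rc configuration (assumption: `datasets` is installed)."""
    # Imported lazily so the arithmetic above runs without the library.
    from datasets import load_dataset

    return load_dataset("mandarjoshi/trivia_qa", "rc", split=split)


if __name__ == "__main__":
    total_examples, total_bytes, shares = split_summary(RC_SPLITS)
    print(f"examples: {total_examples:,}")
    print(f"size: {total_bytes / 1e9:.2f} GB")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

The three splits add up to 173,538 examples, and their byte counts sum to the listed dataset size of 15,989,682,822 bytes.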
Task Types
- Question Answering
- Text‑to‑Text Generation
Task IDs
- Open‑domain QA
- Open‑domain Abstractive QA
- Extractive QA
- Abstractive QA
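For the extractive QA setting, distant supervision typically means treating any occurrence of the known answer string in an evidence document as a candidate answer span, without manual span annotation. The following is a minimal sketch of that idea using normalized string matching. The function names and the sample text are hypothetical, and the matching rule is an assumption for illustration, not the procedure used by the dataset authors.

```python
import re


def normalize(text):
    """Lowercase and collapse runs of non-alphanumeric characters to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_answer_spans(answer, document):
    """Return (start, end) offsets of whole-token answer matches.

    Offsets refer to the normalized document, not the raw text.
    """
    doc = normalize(document)
    ans = normalize(answer)
    if not ans:
        return []
    # Lookarounds keep "paris" from matching inside "parisian".
    pattern = r"(?<![a-z0-9])" + re.escape(ans) + r"(?![a-z0-9])"
    return [m.span() for m in re.finditer(pattern, doc)]


def documents_with_answer(answer, documents):
    """Return indices of documents that contain the answer at least once."""
    return [i for i, doc in enumerate(documents) if find_answer_spans(answer, doc)]


if __name__ == "__main__":
    # Made-up evidence snippets, not taken from the dataset.
    docs = [
        "Paris is the capital of France. PARIS hosts many museums.",
        "Parisian fashion is known worldwide.",
    ]
    print(find_answer_spans("Paris", docs[0]))
    print(documents_with_answer("Paris", docs))
```

Because the labels come from string matching alone, some matched spans will mention the answer without actually supporting it, which is the usual trade-off of distant supervision.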
Dataset Information
- Papers with Code ID: triviaqa
- Pretty name: TriviaQA