community-datasets/yahoo_answers_topics
This is a Yahoo Answers topic-classification dataset for text-classification tasks. It contains 1,400,000 training examples and 60,000 test examples. Each example includes a question title, the question content, the best answer, and a topic label. The labels cover ten categories, such as Society & Culture, Science & Mathematics, and Health. All text is in English (monolingual).
Updated 6/24/2024
hugging_face
Description
Dataset Card for "Yahoo Answers Topics"
Dataset Description
Dataset Summary
- annotations_creators: found
- language_creators: found
- language: en
- license: unknown
- multilinguality: monolingual
- size_categories: 1M<n<10M
- source_datasets: extended|other-yahoo-answers-corpus
- task_categories: text-classification
- task_ids: topic-classification
- pretty_name: YahooAnswersTopics
Dataset Structure
Data Fields
- id: int32
- topic: class_label
  - names:
    - 0: Society & Culture
    - 1: Science & Mathematics
    - 2: Health
    - 3: Education & Reference
    - 4: Computers & Internet
    - 5: Sports
    - 6: Business & Finance
    - 7: Entertainment & Music
    - 8: Family & Relationships
    - 9: Politics & Government
- question_title: string
- question_content: string
- best_answer: string
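The `topic` field stores an integer whose meaning is given by the `names` list above. A minimal plain-Python sketch of the id-to-name lookup follows; `TOPIC_NAMES`, `TOPIC_IDS`, and `topic_name` are hypothetical helpers (not part of the dataset or any library), and the sample record is invented for illustration. When the data is loaded with the Hugging Face `datasets` library, the `ClassLabel` feature should provide the same mapping through its `int2str` and `str2int` methods.

```python
# Topic id -> name mapping, copied from the card's class_label names.
TOPIC_NAMES = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]

# Reverse lookup: name -> id (hypothetical helper).
TOPIC_IDS = {name: idx for idx, name in enumerate(TOPIC_NAMES)}


def topic_name(topic_id: int) -> str:
    """Return the category name for an integer topic label (0-9)."""
    if not 0 <= topic_id < len(TOPIC_NAMES):
        raise ValueError(f"topic id out of range: {topic_id}")
    return TOPIC_NAMES[topic_id]


# A hypothetical record shaped like the fields listed above (values invented).
example = {
    "id": 0,
    "topic": 1,
    "question_title": "Why is the sky blue?",
    "question_content": "",
    "best_answer": "Shorter wavelengths of sunlight scatter more.",
}

print(topic_name(example["topic"]))  # Science & Mathematics
```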
Data Splits
- train:
  - num_bytes: 760285695
  - num_examples: 1400000
- test:
  - num_bytes: 32653862
  - num_examples: 60000
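The split figures can be turned into quick proportions. This sketch uses only the numbers listed above; "bytes per example" here is simply `num_bytes / num_examples` and says nothing about the on-disk format.

```python
# Split statistics copied from the "Data Splits" section of the card.
splits = {
    "train": {"num_bytes": 760_285_695, "num_examples": 1_400_000},
    "test": {"num_bytes": 32_653_862, "num_examples": 60_000},
}

total_examples = sum(s["num_examples"] for s in splits.values())

for name, s in splits.items():
    share = s["num_examples"] / total_examples       # fraction of all examples
    avg_bytes = s["num_bytes"] / s["num_examples"]   # mean reported bytes per example
    print(f"{name}: {share:.1%} of examples, ~{avg_bytes:.0f} bytes/example")
# train: 95.9% of examples, ~543 bytes/example
# test: 4.1% of examples, ~544 bytes/example
```

The near-identical per-example size suggests the two splits have similar text lengths on average.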
Dataset Creation
Dataset Information
- config_name: yahoo_answers_topics
- download_size: 533429663
- dataset_size: 792939557
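The reported `dataset_size` appears to be the sum of the per-split `num_bytes` values, while `download_size` is the size of the files actually fetched. A small sketch that checks this against the card's own numbers (the interpretation of the two fields is an assumption based on Hugging Face metadata conventions):

```python
# Sizes in bytes, copied from the card.
download_size = 533_429_663
dataset_size = 792_939_557
split_bytes = {"train": 760_285_695, "test": 32_653_862}

# dataset_size matches the sum of the per-split num_bytes values.
assert sum(split_bytes.values()) == dataset_size

# Ratio of downloaded bytes to the reported prepared-dataset size.
ratio = download_size / dataset_size
print(f"download is {ratio:.1%} of dataset_size")  # download is 67.3% of dataset_size
```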
Configuration
- config_name: yahoo_answers_topics
- data_files:
  - split: train, path: yahoo_answers_topics/train-*
  - split: test, path: yahoo_answers_topics/test-*
- default: true
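The `data_files` entries are glob patterns that assign repository files to splits. The sketch below imitates that assignment with Python's `fnmatch`; the shard file names are hypothetical, and the Hub's own pattern resolution may differ in details. In practice, `datasets.load_dataset("community-datasets/yahoo_answers_topics")` should load this default configuration without any manual matching.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Glob patterns copied from the card's data_files configuration.
DATA_FILES = {
    "train": "yahoo_answers_topics/train-*",
    "test": "yahoo_answers_topics/test-*",
}


def assign_split(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical shard names, for illustration only.
files = [
    "yahoo_answers_topics/train-00000-of-00002.parquet",
    "yahoo_answers_topics/train-00001-of-00002.parquet",
    "yahoo_answers_topics/test-00000-of-00001.parquet",
    "README.md",
]
for f in files:
    print(f, "->", assign_split(f))
```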
Training‑Evaluation Index
- config: yahoo_answers_topics
- task: text-classification
- task_id: multi_class_classification
- splits:
  - train_split: train
  - eval_split: test
- col_mapping:
  - question_content: text
  - topic: target
- metrics:
  - type: accuracy, name: Accuracy
  - type: f1, name: F1 macro, args: {average: macro}
  - type: f1, name: F1 micro, args: {average: micro}
  - type: f1, name: F1 weighted, args: {average: weighted}
  - type: precision, name: Precision macro, args: {average: macro}
  - type: precision, name: Precision micro, args: {average: micro}
  - type: precision, name: Precision weighted, args: {average: weighted}
  - type: recall, name: Recall macro, args: {average: macro}
  - type: recall, name: Recall micro, args: {average: micro}
  - type: recall, name: Recall weighted, args: {average: weighted}
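The index maps `question_content` to `text` and `topic` to `target`, then scores predictions with accuracy and macro, micro, and weighted averages. A plain-Python sketch of the column mapping and the F1 variants follows; `apply_col_mapping` and `classification_scores` are hypothetical helpers, not the evaluator's actual code. Precision and recall use the same three averaging schemes.

```python
from collections import Counter
from typing import Dict, List

# Column mapping from the card: question_content -> text, topic -> target.
COL_MAPPING = {"question_content": "text", "topic": "target"}


def apply_col_mapping(record: Dict) -> Dict:
    """Keep only the mapped columns, renamed to the evaluator's names."""
    return {new: record[old] for old, new in COL_MAPPING.items()}


def classification_scores(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy plus macro, micro, and weighted F1 for single-label classes."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    f1 = {}
    tp_all = fp_all = fn_all = 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    # Micro averaging pools counts over all classes before computing P and R.
    micro_p = tp_all / (tp_all + fp_all)
    micro_r = tp_all / (tp_all + fn_all)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "f1_macro": sum(f1.values()) / len(labels),  # unweighted class mean
        "f1_micro": 2 * micro_p * micro_r / (micro_p + micro_r),
        "f1_weighted": sum(f1[c] * support[c] for c in labels) / len(y_true),
    }


row = {"id": 7, "topic": 5, "question_title": "Match result?",
       "question_content": "Who won the match?", "best_answer": "The home team."}
print(apply_col_mapping(row))  # {'text': 'Who won the match?', 'target': 5}
print(classification_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

For single-label multi-class data like this, micro F1 works out to the same value as accuracy (0.8 in the toy example), so the macro and weighted variants are the more informative of the F1 scores.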
Topics
Text Classification
Topic Classification
Source
Organization: hugging_face
Created: Unknown