community-datasets/yahoo_answers_topics

Yahoo Answers Topics is a topic-classification dataset for text-classification tasks. It contains 1,400,000 training examples and 60,000 test examples. Each example consists of a question title, the question content, the best answer, and a topic label. The labels cover ten categories, including Society & Culture, Science & Mathematics, and Health. The dataset is monolingual (English).

Updated 6/24/2024

Description

Dataset Card for "Yahoo Answers Topics"

Dataset Description

Dataset Summary

  • annotations_creators: found
  • language_creators: found
  • language: en
  • license: unknown
  • multilinguality: monolingual
  • size_categories: 1M<n<10M
  • source_datasets: extended|other-yahoo-answers-corpus
  • task_categories: text-classification
  • task_ids: topic-classification
  • pretty_name: YahooAnswersTopics

Dataset Structure

Data Fields

  • id: int32
  • topic: class_label
    • names:
      • 0: Society & Culture
      • 1: Science & Mathematics
      • 2: Health
      • 3: Education & Reference
      • 4: Computers & Internet
      • 5: Sports
      • 6: Business & Finance
      • 7: Entertainment & Music
      • 8: Family & Relationships
      • 9: Politics & Government
  • question_title: string
  • question_content: string
  • best_answer: string
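
The topic field is stored as an integer class id, so most code needs a lookup between ids and names. The following is a minimal sketch of that lookup using the ten names listed above. The record shown is invented for illustration and is not taken from the dataset, and joining the three text fields into one input is only one possible choice, not something the card prescribes.

```python
# Label names copied from the "names" list above, in id order (0-9).
TOPIC_NAMES = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]


def topic_name(label_id: int) -> str:
    """Return the topic name for an integer class id."""
    if not 0 <= label_id < len(TOPIC_NAMES):
        raise ValueError(f"topic id out of range: {label_id}")
    return TOPIC_NAMES[label_id]


def topic_id(name: str) -> int:
    """Return the integer class id for a topic name."""
    return TOPIC_NAMES.index(name)


def build_text(record: dict) -> str:
    """Join the three text fields into one string, skipping empty ones.

    Assumption: concatenation is an illustrative choice, not a requirement.
    """
    parts = [
        record["question_title"],
        record["question_content"],
        record["best_answer"],
    ]
    return "\n".join(p for p in parts if p)


# Hypothetical record with the five documented fields (invented values).
example = {
    "id": 0,
    "topic": 4,
    "question_title": "How do I free up disk space?",
    "question_content": "My laptop says the drive is almost full.",
    "best_answer": "Remove unused programs and clear temporary files.",
}

print(topic_name(example["topic"]))
print(build_text(example))
```

Real records would normally come from the Hugging Face `datasets` library, for example via `load_dataset("community-datasets/yahoo_answers_topics")`, assuming that library is installed and the repository is reachable.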

Data Splits

  • train:
    • num_bytes: 760285695
    • num_examples: 1400000
  • test:
    • num_bytes: 32653862
    • num_examples: 60000
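
The split figures above are enough to derive two quantities that help with planning: the average size of one example and the share of the data held out for testing. The sketch below only does arithmetic on the numbers listed in this section; the helper names are illustrative.

```python
# Figures copied from the Data Splits list above.
SPLITS = {
    "train": {"num_bytes": 760_285_695, "num_examples": 1_400_000},
    "test": {"num_bytes": 32_653_862, "num_examples": 60_000},
}


def avg_bytes_per_example(split: str) -> float:
    """Average stored size of one example in the given split."""
    info = SPLITS[split]
    return info["num_bytes"] / info["num_examples"]


def split_fraction(split: str) -> float:
    """Share of all examples that belong to the given split."""
    total = sum(s["num_examples"] for s in SPLITS.values())
    return SPLITS[split]["num_examples"] / total


for name in SPLITS:
    print(
        f"{name}: {avg_bytes_per_example(name):.1f} bytes/example, "
        f"{split_fraction(name):.1%} of all examples"
    )
```

Both splits average roughly 540 bytes per example, and the test split is a little over 4% of the 1,460,000 examples in total.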

Dataset Creation

Dataset Information

  • config_name: yahoo_answers_topics
  • download_size: 533429663
  • dataset_size: 792939557
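
As a quick consistency check, the dataset_size above should equal the sum of the per-split num_bytes values from Data Splits, and the ratio of download_size to dataset_size shows how much smaller the download is. The sketch below performs both checks; the variable names are illustrative.

```python
# Per-split byte counts from Data Splits, and the totals listed above.
SPLIT_BYTES = {"train": 760_285_695, "test": 32_653_862}
DATASET_SIZE = 792_939_557
DOWNLOAD_SIZE = 533_429_663

# The split sizes should add up to the reported dataset size.
total_split_bytes = sum(SPLIT_BYTES.values())
sizes_consistent = total_split_bytes == DATASET_SIZE

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"split sizes add up to dataset_size: {sizes_consistent}")
print(f"download_size / dataset_size = {download_ratio:.3f}")
```

The two split sizes add up exactly to dataset_size, and the download is about two thirds of that size, presumably because the hosted files are stored in a compressed form.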

Configuration

  • config_name: yahoo_answers_topics
    • data_files:
      • split: train
        • path: yahoo_answers_topics/train-*
      • split: test
        • path: yahoo_answers_topics/test-*
    • default: true
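
The data_files entries above are glob patterns: each split collects every file whose path matches its pattern. The sketch below illustrates that matching with Python's standard `fnmatch` module. The shard file names are hypothetical and only show the shape such paths might take, and the actual resolution logic used by the hosting library may differ.

```python
from fnmatch import fnmatchcase

# Patterns copied from the Configuration list above.
DATA_FILES = {
    "train": "yahoo_answers_topics/train-*",
    "test": "yahoo_answers_topics/test-*",
}

# Hypothetical repository listing (file names invented for illustration).
candidate_files = [
    "yahoo_answers_topics/train-00000-of-00002.parquet",
    "yahoo_answers_topics/train-00001-of-00002.parquet",
    "yahoo_answers_topics/test-00000-of-00001.parquet",
    "README.md",
]


def files_for_split(split: str, files: list[str]) -> list[str]:
    """Return the files whose path matches the split's glob pattern."""
    pattern = DATA_FILES[split]
    return sorted(f for f in files if fnmatchcase(f, pattern))


for split in DATA_FILES:
    print(split, files_for_split(split, candidate_files))
```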

Training‑Evaluation Index

  • config: yahoo_answers_topics
    • task: text-classification
    • task_id: multi_class_classification
    • splits:
      • train_split: train
      • eval_split: test
    • col_mapping:
      • question_content: text
      • topic: target
    • metrics:
      • type: accuracy
        • name: Accuracy
      • type: f1
        • name: F1 macro
        • args:
          • average: macro
      • type: f1
        • name: F1 micro
        • args:
          • average: micro
      • type: f1
        • name: F1 weighted
        • args:
          • average: weighted
      • type: precision
        • name: Precision macro
        • args:
          • average: macro
      • type: precision
        • name: Precision micro
        • args:
          • average: micro
      • type: precision
        • name: Precision weighted
        • args:
          • average: weighted
      • type: recall
        • name: Recall macro
        • args:
          • average: macro
      • type: recall
        • name: Recall micro
        • args:
          • average: micro
      • type: recall
        • name: Recall weighted
        • args:
          • average: weighted
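
The index above maps question_content to text and topic to target, then scores predictions with accuracy and with macro, micro, and weighted averages. The sketch below applies that column mapping to a record and computes accuracy and the three F1 averages from scratch, to make the difference between the averaging modes concrete. It is an illustrative implementation, not the code used by any evaluation service, and the toy labels are invented.

```python
from collections import Counter

# Column mapping copied from the index above: source column -> task column.
COL_MAPPING = {"question_content": "text", "topic": "target"}


def apply_col_mapping(record: dict, mapping: dict = COL_MAPPING) -> dict:
    """Keep only the mapped columns, renamed to the task's column names."""
    return {new: record[old] for old, new in mapping.items()}


def _safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_scores(y_true: list, y_pred: list) -> dict:
    """Return macro, micro, and weighted F1 for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class

    per_class = {}
    for c in labels:
        precision = _safe_div(tp[c], tp[c] + fp[c])
        recall = _safe_div(tp[c], tp[c] + fn[c])
        per_class[c] = _safe_div(2 * precision * recall, precision + recall)

    # Macro: unweighted mean of the per-class scores.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: per-class scores weighted by class support.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pool the counts over all classes, then compute one score.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = _safe_div(2 * tp_all, 2 * tp_all + fp_all + fn_all)
    return {"macro": macro, "micro": micro, "weighted": weighted}


# Toy labels, invented for illustration (three of the ten topic ids).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

scores = f1_scores(y_true, y_pred)
print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
for name, value in scores.items():
    print(f"F1 {name}: {value:.4f}")
```

In a single-label multi-class setting such as this one, micro-averaged F1 equals accuracy, because every wrong prediction counts as one false positive and one false negative. Macro averaging treats all classes equally, while weighted averaging follows the class frequencies.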

Topics

Text Classification
Topic Classification

Source

Organization: hugging_face

Created: Unknown
