
Yahoo_Answers_10_categories_for_NLP

The Yahoo Answers topic classification dataset is constructed using the 10 largest primary categories. Each category contains 140,000 training samples and 6,000 test samples, totaling 1,400,000 training samples and 60,000 test samples. The dataset files include classes.txt, train.csv, and test.csv, where each sample has four columns: category index, question title, question content, and best answer.

Updated 7/27/2024
huggingface

Description

Dataset Card

Dataset Overview

  • Dataset Name: Yahoo Answers 10 categories for NLP
  • Task Type: Text Classification
  • Tags: categories, text data, nlp, yelp, fine-grained, 10 classes, yahoo, answers
  • Language: English
  • Data Scale: 1M<n<10M
  • License: Apache 2.0

Dataset Description

  • Dataset Construction: Built using the 10 largest primary categories of Yahoo! Answers.
  • Data Content: Only the best answer content and primary category information are used.
  • File Description:
    • classes.txt: Contains the list of categories corresponding to each label.
    • train.csv and test.csv: Contain all training and test samples in CSV format. Each row has 4 columns: category index (1 to 10), question title, question content, and best answer. Text fields are enclosed in double quotes; a double quote inside a field is escaped by doubling it (""); a newline inside a field is written as a backslash followed by "n" (that is, the two characters \n).
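
The row format described above can be read with Python's standard csv module, which already handles the enclosing quotes and the doubled internal quotes. The following is a minimal sketch, not the dataset authors' loader: the function name parse_rows and the sample row are hypothetical, and the only step added on top of csv.reader is turning the two-character \n sequence back into a real newline.

```python
import csv
import io


def parse_rows(handle):
    """Yield (label, title, content, answer) from a train.csv/test.csv-style stream.

    parse_rows is a hypothetical helper name, not part of the dataset.
    """
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # skip rows that do not match the 4-column layout
        label, title, content, answer = row
        # csv.reader has already removed the enclosing quotes and collapsed
        # doubled quotes; only the literal backslash-n sequence remains.
        texts = tuple(field.replace("\\n", "\n") for field in (title, content, answer))
        yield (int(label),) + texts


# Made-up sample row that follows the documented escaping rules.
sample = '"5","why ""quoted""?","line one\\nline two","best answer"\n'
rows = list(parse_rows(io.StringIO(sample)))
```

To read the real files, something like open("train.csv", newline="", encoding="utf-8") could be passed to parse_rows; the UTF-8 encoding is an assumption, since the card does not state one.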

Dataset Structure

  • File List:
    • Readme.md
    • test.csv
    • train.csv
    • classes.txt
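
Because the category index in train.csv and test.csv runs from 1 to 10, a lookup from index to category name can be built from classes.txt. This sketch assumes that classes.txt holds one category name per line and that line i corresponds to label i; the card does not spell out either point. The name load_classes and the placeholder category names are hypothetical.

```python
import io


def load_classes(lines):
    """Map 1-based label indices to category names.

    Assumes one category name per line, in label order (an assumption,
    not something the dataset card confirms).
    """
    names = [line.strip() for line in lines if line.strip()]
    return {index: name for index, name in enumerate(names, start=1)}


# Placeholder names, not the real Yahoo! Answers categories.
classes = load_classes(io.StringIO("Category A\nCategory B\n"))
```

With the real file, open("classes.txt", encoding="utf-8") could be passed in the same way, again assuming UTF-8.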

Dataset Usage

  • Direct Use: Fine-grained text classification

Topics

Natural Language Processing
Text Classification

Source

Organization: huggingface

Created: 7/27/2024
