Yahoo_Answers_10_categories_for_NLP
The Yahoo Answers topic classification dataset is constructed using the 10 largest primary categories. Each category contains 140,000 training samples and 6,000 test samples, totaling 1,400,000 training samples and 60,000 test samples. The dataset files include classes.txt, train.csv, and test.csv, where each sample has four columns: category index, question title, question content, and best answer.
Updated 7/27/2024
huggingface
Description
Dataset Card
Dataset Overview
- Dataset Name: Yahoo Answers 10 categories for NLP
- Task Type: Text Classification
- Tags: categories, text data, nlp, yelp, fine-grained, 10 classes, yahoo, answers
- Language: English
- Data Scale: 1M<n<10M
- License: Apache 2.0
Dataset Description
- Dataset Construction: Built using the 10 largest primary categories of Yahoo! Answers.
- Data Content: Only the best answer content and primary category information are used.
- File Description:
  - classes.txt: Lists the category name corresponding to each label.
  - train.csv and test.csv: Contain all training and test samples in CSV format. Each row has 4 columns: category index (1 to 10), question title, question content, and best answer. Text fields are enclosed in double quotes; an internal double quote is escaped as two double quotes (""); newline characters are escaped as a backslash followed by "n" (that is, "\n").
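The quoting and escaping rules described for train.csv and test.csv can be handled with Python's standard csv module. The following is a minimal sketch, not an official loader. It assumes the files have no header row, which this card does not state; the column names in COLUMNS and the sample row are invented for illustration.

```python
import csv
import io

# Hypothetical column names; the card only describes the four fields in order.
COLUMNS = ("label", "title", "content", "best_answer")


def parse_rows(handle):
    """Yield one dict per CSV row, restoring escaped newlines in text fields."""
    # doublequote=True turns "" inside a quoted field back into a single ".
    reader = csv.reader(handle, quotechar='"', doublequote=True)
    for row in reader:
        label, *texts = row
        # The files store newlines as a literal backslash followed by "n".
        texts = [text.replace("\\n", "\n") for text in texts]
        yield dict(zip(COLUMNS, [int(label), *texts]))


# Made-up sample row (not taken from the dataset) exercising both escapes.
sample = '"5","Why ""hello""?","line one\\nline two","Because."\n'
rows = list(parse_rows(io.StringIO(sample)))
```

For a real file, the same function could be called on a handle opened with `open("train.csv", newline="", encoding="utf-8")`; the encoding is also an assumption.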
Dataset Source
- Kaggle Link: https://www.kaggle.com/datasets/yacharki/yahoo-answers-10-categories-for-nlp-csv
- DOI: 10.34740/KAGGLE/DSV/5339321
- Authors: Xiang Zhang and Acharki Yassir
- Year: 2023
Dataset Structure
- File List:
  - Readme.md
  - test.csv
  - train.csv
  - classes.txt
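Since the category index in the CSV files runs from 1 to 10 and classes.txt holds the matching category names, a lookup table can be built as sketched below. This assumes classes.txt contains one category name per line in label order, which the card does not spell out; the function name and the placeholder names are hypothetical.

```python
import io


def load_classes(handle):
    """Map 1-based category indices to names, one name per non-empty line."""
    names = [line.strip() for line in handle if line.strip()]
    return {index: name for index, name in enumerate(names, start=1)}


# Placeholder names for illustration only; the real file lists 10 categories.
demo = io.StringIO("Placeholder A\nPlaceholder B\nPlaceholder C\n")
label_to_name = load_classes(demo)
```

With the real file, `load_classes(open("classes.txt", encoding="utf-8"))` would be expected to return 10 entries keyed 1 through 10, under the same assumptions.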
Dataset Usage
- Direct Use: Fine-grained text classification
Topics
Natural Language Processing
Text Classification
Source
Organization: huggingface
Created: 7/27/2024