g-ronimo/oasst2_top4k_en
The dataset contains two primary features: messages, each comprising the sub‑features content and role. It is split into a training set with 4,000 samples and a test set with 400 samples. The data were selected from top‑ranked dialogues in OpenAssistant/oasst2, followed by deduplication and similarity filtering (long answers with similarity > 0.8 were excluded). The dataset includes only English content and was generated using a specific script.
Description
Dataset Overview
Dataset Information
- Features:
messages:content: data type is stringrole: data type is string
- Splits:
train:- Bytes: 7,744,472.411884111
- Samples: 4,000
test:- Bytes: 774,447.2411884111
- Samples: 400
- Download size: 4,492,003 bytes
- Dataset size: 8,518,919.653072523 bytes
Configuration
- Default configuration:
data_files:train: path isdata/train-*test: path isdata/test-*
AI studio
Generate PPTs instantly with Nano Banana Pro.
Generate PPT NowAccess Dataset
Please login to view download links and access full dataset details.
Topics
Source
Organization: hugging_face
Created: Unknown
Power Your Data Analysis with Premium AI Models
Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.
Enjoy a free trial and save 20%+ compared to official pricing.