g-ronimo/oasst2_top4k_en

The dataset has a single top-level feature, messages, in which each message has two string sub-features: content and role. It is split into a training set of 4,000 samples and a test set of 400 samples. The data were selected from the top-ranked dialogues in OpenAssistant/oasst2, then deduplicated and filtered by similarity (long answers with similarity > 0.8 were excluded). The dataset contains only English content and was generated by a script.
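
The similarity filtering described above can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: the source does not say which similarity measure was used or what counts as a "long" answer, so `difflib.SequenceMatcher` and the `MIN_LONG_ANSWER_CHARS` cutoff are assumptions chosen to keep the example self-contained.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8    # threshold quoted in the dataset description
MIN_LONG_ANSWER_CHARS = 200   # assumption: the cutoff for "long" is not stated


def is_near_duplicate(a: str, b: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Return True if the two texts have a similarity ratio above the threshold."""
    # autojunk=False: the default heuristic ignores frequent characters in
    # sequences of 200+ items, which distorts ratios for long texts.
    return SequenceMatcher(None, a, b, autojunk=False).ratio() > threshold


def filter_similar_answers(answers, min_chars: int = MIN_LONG_ANSWER_CHARS):
    """Keep answers in order, dropping long ones that resemble an earlier kept long answer."""
    kept = []
    for answer in answers:
        is_long = len(answer) >= min_chars
        if is_long and any(
            len(other) >= min_chars and is_near_duplicate(answer, other)
            for other in kept
        ):
            continue  # near-duplicate of an answer already kept
        kept.append(answer)
    return kept
```

Short answers pass through untouched, so only long, highly similar answers are removed. The pairwise comparison is quadratic in the number of answers, which is acceptable for a few thousand samples but would need an approximate method at larger scale.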

Updated 3/5/2024

Description

Dataset Overview

Dataset Information

  • Features:
    • messages:
      • content: data type is string
      • role: data type is string
  • Splits:
    • train:
      • Bytes: 7,744,472.411884111
      • Samples: 4,000
    • test:
      • Bytes: 774,447.2411884111
      • Samples: 400
  • Download size: 4,492,003 bytes
  • Dataset size: 8,518,919.653072523 bytes

Configuration

  • Default configuration:
    • data_files:
      • train: path is data/train-*
      • test: path is data/test-*
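
The default configuration maps each split to a glob pattern. The sketch below shows how a file path resolves to a split under those patterns, and how the dataset could be loaded with the Hugging Face `datasets` library. The Parquet file names in the usage note are hypothetical, and `split_for` and `load_splits` are illustrative helper names. `load_splits` needs network access and the third-party `datasets` package, so it is defined here but not called.

```python
from fnmatch import fnmatchcase

# Glob patterns from the default configuration listed above.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path: str):
    """Return the split whose pattern matches the path, or None if none matches."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def load_splits():
    """Load the splits from the Hugging Face Hub (requires `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily; third-party dependency

    return load_dataset("g-ronimo/oasst2_top4k_en")
```

For example, `split_for("data/train-00000-of-00001.parquet")` resolves to `train`, while a path outside `data/` resolves to `None`. Calling `load_splits()` should return a mapping with `train` and `test` entries, each record carrying a `messages` list.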

Topics

Dialogue Generation
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
