Dataset Catalog

Browse trusted datasets for evaluation, enrichment, and production use.

Category index
Showing 3 of 3 datasets
Category: Dialogue Generation

g-ronimo/oasst2_top4k_en

Dialogue GenerationNatural Language Processing

The dataset contains two primary features: messages, each comprising the sub‑features content and role. It is split into a training set with 4,000 samples and a test set with 400 samples. The data were selected from top‑ranked dialogues in OpenAssistant/oasst2, followed by deduplication and similarity filtering (long answers with similarity > 0.8 were excluded). The dataset includes only English content and was generated using a specific script.

Source hugging_faceUpdated Mar 5, 2024115 viewsLinked
Inspect dataset

Education Dialogue Dataset

Education DialogueDialogue Generation

The Education Dialogue dataset comprises dialogues generated by Gemini Ultra, occurring between teachers and students. Teachers are prompted to teach specific topics, while students are prompted with their learning preferences. The dataset includes 40,000 training examples and 7,234 test examples, each consisting of a complete teacher‑student conversation with metadata on the topic and teacher/student preferences.

Source githubUpdated Oct 29, 2024509 viewsLinked
Inspect dataset

noobmaster29/Verified-Camel-zh

Dialogue GenerationMultidisciplinary QA

This is a Chinese version of the Verified‑Camel dataset translated directly with GPT‑4. The dataset covers tasks such as dialogue, question answering, and text generation, in English and Chinese, with labels spanning physics, chemistry, mathematics, biology, culture, and logic. The dataset size is less than 1 K.

Source hugging_faceUpdated Dec 10, 202342 viewsLinked
Inspect dataset