High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

dim/lmsys_chatbot_arena_conversations

The dataset includes multiple features such as question ID, dialogue content from two models, winner label, judge, turn count, anonymity flag, language, timestamp, and OpenAI moderation results. Each dialogue records content and role, turn number, and anonymity status. Toxicity detection results from two large models are also captured, including binary flags and probabilities. The dataset is provided as a training split with 33,000 samples.

hugging_face

google-research-datasets/time_dial

Temporal Commonsense Reasoning

Dialogue Analysis

TimeDial is an English dataset for temporal commonsense reasoning in dialogue. It contains approximately 1.5k carefully curated multiple‑choice cloze tasks built on dialogues extracted from DailyDialog, each requiring a model to reason about temporal expressions in context. The test set has 1,104 dialogue instances. The dataset was annotated by English linguists.

hugging_face

OpenRL/daily_dialog

Dialogue Analysis

Emotion Recognition

---
dataset_info:
  features:
  - name: dialog
    sequence: string
  - name: act
    sequence:
      class_label:
        names:
          '0': __dummy__
          '1': inform
          '2': question
          '3': directive
          '4': commissive
  - name: emotion
    sequence:
      class_label:
        names:
          '0': no emotion
          '1': anger
          '2': disgust
          '3': fear
          '4': happiness
          '5': sadness
          '6': surprise
  splits:
  - name: train
    num_bytes: 7296683
    num_examples: 11118
  - name: validation
    num_bytes: 673927
    num_examples: 1000
  - name: test
    num_bytes: 655828
    num_examples: 1000
  download_size: 0
  dataset_size: 8626438
---
# Dataset Card for "daily_dialog"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
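The `act` and `emotion` features above are stored as integer class labels, so a record usually needs decoding before it is readable. The following is a minimal sketch of that decoding in plain Python. The label tables are copied from the metadata above; the helper `decode_turns` and the sample record are hypothetical, and the sketch assumes the three sequences are aligned one entry per utterance, which the card does not state explicitly.

```python
# Label tables copied from the dataset_info metadata (index = class id).
ACT_NAMES = ["__dummy__", "inform", "question", "directive", "commissive"]
EMOTION_NAMES = [
    "no emotion", "anger", "disgust", "fear",
    "happiness", "sadness", "surprise",
]


def decode_turns(dialog, act, emotion):
    """Pair each utterance with its act and emotion names.

    Assumption: dialog, act and emotion are parallel sequences with one
    entry per utterance.
    """
    if not (len(dialog) == len(act) == len(emotion)):
        raise ValueError("dialog, act and emotion must have the same length")
    return [
        {"text": text, "act": ACT_NAMES[a], "emotion": EMOTION_NAMES[e]}
        for text, a, e in zip(dialog, act, emotion)
    ]


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "dialog": ["Are you free tonight?", "Yes, let's get dinner!"],
    "act": [2, 1],
    "emotion": [0, 4],
}

decoded = decode_turns(example["dialog"], example["act"], example["emotion"])
for turn in decoded:
    print(turn)
```

Hard-coding the tables keeps the sketch free of network access and third-party dependencies. When the dataset is loaded through a library that exposes the class-label metadata, reading the names from that metadata is likely the safer choice.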

hugging_face
