dim/lmsys_chatbot_arena_conversations
The dataset includes multiple features such as question ID, dialogue content from two models, winner label, judge, turn count, anonymity flag, language, timestamp, and OpenAI moderation results. Each dialogue records content and role, turn number, and anonymity status. Toxicity detection results from two large models are also captured, including binary flags and probabilities. The dataset is provided as a training split with 33,000 samples.
Description
Dataset Overview
Dataset Information
- Feature List:
question_id: stringmodel_a: stringmodel_b: stringwinner: stringjudge: stringconversation_a: list of{content: string, role: string}conversation_b: list of{content: string, role: string}turn: 64‑bit integeranony: booleanlanguage: stringtstamp: 64‑bit floatopenai_moderation: struct withcategories(multiple booleans),category_scores(multiple 64‑bit floats), andflagged(boolean)toxic_chat_tag: struct withroberta-large(flaggedboolean,probabilityfloat) andt5-large(flaggedboolean,scorefloat)
Dataset Split
- Training Set:
- Size: 81,159,839 bytes
- Samples: 33,000
Dataset Size
- Download Size: 41,573,740 bytes
- Total Size: 81,159,839 bytes
Configuration
- Default Config:
- Data file pattern:
data/train-*
- Data file pattern:
AI studio
Generate PPTs instantly with Nano Banana Pro.
Generate PPT NowAccess Dataset
Please login to view download links and access full dataset details.
Topics
Source
Organization: hugging_face
Created: Unknown
Power Your Data Analysis with Premium AI Models
Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.
Enjoy a free trial and save 20%+ compared to official pricing.