JUHE API Marketplace
DATASET
Open Source Community

dim/lmsys_chatbot_arena_conversations

The dataset includes multiple features such as question ID, dialogue content from two models, winner label, judge, turn count, anonymity flag, language, timestamp, and OpenAI moderation results. Each dialogue records content and role, turn number, and anonymity status. Toxicity detection results from two large models are also captured, including binary flags and probabilities. The dataset is provided as a training split with 33,000 samples.

Updated 11/8/2023
hugging_face

Description

Dataset Overview

Dataset Information

  • Feature List:
    • question_id: string
    • model_a: string
    • model_b: string
    • winner: string
    • judge: string
    • conversation_a: list of {content: string, role: string}
    • conversation_b: list of {content: string, role: string}
    • turn: 64‑bit integer
    • anony: boolean
    • language: string
    • tstamp: 64‑bit float
    • openai_moderation: struct with categories (multiple booleans), category_scores (multiple 64‑bit floats), and flagged (boolean)
    • toxic_chat_tag: struct with roberta-large (flagged boolean, probability float) and t5-large (flagged boolean, score float)

Dataset Split

  • Training Set:
    • Size: 81,159,839 bytes
    • Samples: 33,000

Dataset Size

  • Download Size: 41,573,740 bytes
  • Total Size: 81,159,839 bytes

Configuration

  • Default Config:
    • Data file pattern: data/train-*

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Login to Access

Please login to view download links and access full dataset details.

Topics

Chatbot
Dialogue Analysis

Source

Organization: hugging_face

Created: Unknown

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.