Dataset asset | Open Source Community | Machine Learning | Natural Language Processing

llm-blender/mix-instruct

MixInstruct is a dataset released for the LLM‑Blender project. It contains responses from 11 popular instruction‑following LLMs: Stanford Alpaca, FastChat Vicuna, Dolly V2, StableLM, Open Assistant, Koala, Baize, Flan‑T5, ChatGLM, MOSS, and Mosaic MPT. Each response is scored with automatic metrics (BLEU, ROUGE, BERTScore, BARTScore), and ChatGPT provides pairwise comparisons on 4,771 test samples. The data is stored as JSON, with fields for the instruction, input, output, and candidate responses; each candidate carries its detailed scores.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jun 9, 2023
Overview


Basic Information

  • Dataset Name: MixInstruct
  • Project: LLM‑Blender
  • License: MIT
  • Task Category: Text Generation
  • Language: English
  • Dataset Size: 100K<n<1M

Data Content

  • Included Models: The dataset includes responses from 11 popular instruction‑following LLMs: Stanford Alpaca, FastChat Vicuna, Dolly V2, StableLM, Open Assistant, Koala, Baize, Flan‑T5, ChatGLM, MOSS, and Mosaic MPT.
  • Evaluation Metrics: Automatic metrics such as BLEU, ROUGE, BERTScore, BARTScore are provided, along with pairwise comparison results generated by ChatGPT.

Data Format

  • Structure: JSON. Each entry contains the fields id, instruction, input, output, and candidates.
  • Additional Fields: cmp_results records model‑to‑model comparison outcomes produced by ChatGPT.
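
As a rough illustration of the structure listed above, the following Python sketch parses one entry and checks that the listed fields are present. The record itself is made up, and the keys inside each candidate (model, text, scores) are assumptions for illustration only; this page does not spell them out, so check the actual files before relying on them.

```python
import json

# Hypothetical record. The top-level keys follow the field list above;
# the keys inside each candidate ("model", "text", "scores") are assumed.
raw = """
{
  "id": "example/0",
  "instruction": "Summarize the text.",
  "input": "The quick brown fox jumps over the lazy dog.",
  "output": "A fox jumps over a dog.",
  "candidates": [
    {"model": "model_a", "text": "A fox jumps over a dog.", "scores": {"bleu": 0.9}},
    {"model": "model_b", "text": "Dogs are lazy.", "scores": {"bleu": 0.2}}
  ]
}
"""

REQUIRED_FIELDS = ("id", "instruction", "input", "output", "candidates")


def parse_entry(text):
    """Parse one JSON entry and verify that the documented fields exist."""
    entry = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"entry is missing fields: {missing}")
    return entry


entry = parse_entry(raw)
for candidate in entry["candidates"]:
    # Print each candidate's (assumed) model name and score dictionary.
    print(candidate["model"], candidate["scores"])
```

Failing early on a missing field makes schema differences visible before any downstream analysis runs.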

Evaluation Results

  • Automatic Metrics: Detailed performance metrics for training, validation, and test splits are supplied for each model.
  • ChatGPT Comparison Results: Includes BERTScore, BARTScore, BLEURT, GPT‑Rank, and other scores for pairwise model comparisons.
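
One simple way to summarize pairwise comparison outcomes is to count wins per model. The sketch below does that on made-up records. The (model_a, model_b, verdict) layout and the verdict labels are assumptions, not the dataset's actual cmp_results encoding, and a plain win count is not necessarily how GPT‑Rank is computed.

```python
# Hypothetical comparison records: (first model, second model, verdict).
# A verdict of "a" means the first model won, "b" the second, "tie" neither.
comparisons = [
    ("model_a", "model_b", "a"),
    ("model_a", "model_c", "tie"),
    ("model_b", "model_c", "b"),
    ("model_a", "model_b", "a"),
]


def count_wins(records):
    """Return a dict mapping each model to its number of pairwise wins."""
    wins = {}
    for first, second, verdict in records:
        # Register both models so that models with zero wins still appear.
        wins.setdefault(first, 0)
        wins.setdefault(second, 0)
        if verdict == "a":
            wins[first] += 1
        elif verdict == "b":
            wins[second] += 1
    return wins


wins = count_wins(comparisons)
# Sort by wins (descending), breaking ties alphabetically by model name.
ranking = sorted(wins, key=lambda model: (-wins[model], model))
print(ranking)
```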

Best Model Performance

  • Top Model: Open Assistant achieves the best scores across multiple metrics.
  • Oracle Model: An oracle model's performance is provided for reference and comparison with the top model.
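
To make the top-model versus oracle comparison concrete, the sketch below assumes that "oracle" means picking, for each example, the candidate with the best score on a given metric. That reading is an assumption, since this page does not define the term, and the scores are invented for illustration.

```python
# Hypothetical per-example scores: one dict per example, mapping a model
# name to a metric value where higher is better.
scores = [
    {"model_a": 0.5, "model_b": 0.75},
    {"model_a": 1.0, "model_b": 0.25},
    {"model_a": 0.25, "model_b": 0.5},
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def best_single_model(per_example):
    """Return (model, average score) for the model with the highest mean."""
    models = per_example[0].keys()
    averages = {m: mean(ex[m] for ex in per_example) for m in models}
    top = max(averages, key=averages.get)
    return top, averages[top]


def oracle_score(per_example):
    """Average of the best candidate score for each example."""
    return mean(max(ex.values()) for ex in per_example)


top_model, top_average = best_single_model(scores)
print(top_model, top_average, oracle_score(scores))
```

Under this definition the oracle can never score below the best single model, so the gap between the two indicates how much per-example selection could gain.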