Dataset asset · Open Source Community · Machine Learning · Natural Language Processing

llm-blender/mix-instruct

MixInstruct is a dataset released for the LLM‑Blender project. It contains responses from 11 popular instruction‑following LLMs: Stanford Alpaca, FastChat Vicuna, Dolly V2, StableLM, Open Assistant, Koala, Baize, Flan‑T5, ChatGLM, MOSS, and Mosaic MPT. The responses are scored with automatic metrics (BLEU, ROUGE, BERTScore, and BARTScore), and ChatGPT performs pairwise comparisons on 4,771 test samples. The data is stored as JSON. Each entry has fields for the instruction, input, reference output, and candidate responses, and every candidate carries its own detailed scores.

Source

Hugging Face

Created

Nov 28, 2025

Updated

Jun 9, 2023



Dataset Overview

Basic Information

Dataset Name: MixInstruct
Project: LLM‑Blender
License: MIT
Task Category: Text Generation
Language: English
Dataset Size: 100K < n < 1M examples
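A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and that the repository `llm-blender/mix-instruct` exposes `train`, `validation`, and `test` splits (the split names are an assumption based on the per-split results described below). The helper `load_mixinstruct` and the constant `VALID_SPLITS` are hypothetical names used only for illustration.

```python
# Minimal sketch for loading MixInstruct from the Hugging Face Hub.
# Assumptions: the `datasets` package is installed, and the repository
# "llm-blender/mix-instruct" exposes "train", "validation", and "test" splits.
from typing import Any

# Hypothetical constant: split names are assumed, not confirmed by the card.
VALID_SPLITS = frozenset({"train", "validation", "test"})


def load_mixinstruct(split: str = "test") -> Any:
    """Return one split of MixInstruct (hypothetical helper)."""
    # Validate first so a typo fails fast without touching the network.
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(VALID_SPLITS)}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("llm-blender/mix-instruct", split=split)
```

For example, `load_mixinstruct("test")` should return the test split. Depending on your `datasets` version and how the repository is packaged, `load_dataset` may need additional arguments.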

Data Content

Included Models: The dataset includes responses from 11 popular instruction‑following LLMs, namely Stanford Alpaca, FastChat Vicuna, Dolly V2, StableLM, Open Assistant, Koala, Baize, Flan‑T5, ChatGLM, MOSS, and Mosaic MPT.
Evaluation Metrics: Automatic metrics (BLEU, ROUGE, BERTScore, and BARTScore) are provided, along with pairwise comparison results generated by ChatGPT.
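Because every candidate response carries its own metric scores, per-model averages can be computed by grouping candidates by model. This is a sketch under assumptions: the candidate sub-fields `model` and `scores`, and metric keys such as `bleu`, are assumed names, and the toy records use hypothetical model names with made-up numbers rather than real dataset values.

```python
# Sketch: average one automatic metric per model across candidate responses.
# Assumptions: each record has a "candidates" list whose items carry a "model"
# name and a "scores" dict keyed by metric name (e.g. "bleu", "bertscore").
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List


def average_metric_by_model(records: Iterable[Dict[str, Any]], metric: str) -> Dict[str, float]:
    per_model: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        for candidate in record["candidates"]:
            score = candidate.get("scores", {}).get(metric)
            if score is not None:  # skip candidates lacking this metric
                per_model[candidate["model"]].append(float(score))
    return {model: mean(values) for model, values in per_model.items()}


# Toy records with hypothetical model names and illustrative scores.
toy_records = [
    {"candidates": [
        {"model": "model_a", "scores": {"bleu": 10.0}},
        {"model": "model_b", "scores": {"bleu": 6.0}},
    ]},
    {"candidates": [
        {"model": "model_a", "scores": {"bleu": 4.0}},
        {"model": "model_b", "scores": {"bleu": 9.0}},
    ]},
]

print(average_metric_by_model(toy_records, "bleu"))
```

Here `model_a` averages 7.0 and `model_b` averages 7.5. A metric missing from every candidate simply yields an empty result instead of an error.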

Data Format

Structure: JSON; each entry contains the fields id, instruction, input, output, and candidates.
Additional Fields: cmp_results records the outcomes of ChatGPT's pairwise model‑to‑model comparisons.
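The record below is a toy entry used to illustrate the layout and to unpack `cmp_results` into (model A, model B, verdict) triples. The top-level field names come from the description above. The candidate sub-fields (`model`, `text`, `scores`), the `"modelA,modelB" -> verdict` layout of `cmp_results`, and verdict strings such as `"A is better"` are assumptions. The sketch also accepts `cmp_results` either as a JSON string or as an already-decoded dict, and treats a missing value as "no comparisons", since only the test samples are described as having ChatGPT comparisons.

```python
# Sketch: inspect a MixInstruct-style record and unpack its pairwise results.
# Assumptions: candidate sub-fields "model"/"text"/"scores" and the
# "modelA,modelB" -> verdict layout of cmp_results are illustrative guesses.
import json
from typing import Any, Dict, List, Tuple

SAMPLE_RECORD: Dict[str, Any] = {
    "id": "example/0",
    "instruction": "Summarize the text.",
    "input": "Large language models can follow instructions.",
    "output": "LLMs follow instructions.",  # reference response
    "candidates": [
        {"model": "model_a", "text": "LLMs can follow instructions.", "scores": {"bleu": 30.0}},
        {"model": "model_b", "text": "Models exist.", "scores": {"bleu": 5.0}},
    ],
    "cmp_results": json.dumps({"model_a,model_b": "A is better"}),
}


def parse_cmp_results(record: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    raw = record.get("cmp_results")
    if not raw:  # records without ChatGPT comparisons
        return []
    # Accept either a JSON-encoded string or an already-decoded dict.
    results = json.loads(raw) if isinstance(raw, str) else raw
    parsed = []
    for pair, verdict in results.items():
        model_a, model_b = pair.split(",", 1)
        parsed.append((model_a, model_b, verdict))
    return parsed


print(parse_cmp_results(SAMPLE_RECORD))
```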

Evaluation Results

Automatic Metrics: Detailed performance metrics for training, validation, and test splits are supplied for each model.
ChatGPT Comparison Results: Reported alongside BERTScore, BARTScore, BLEURT, and other scores; GPT‑Rank summarizes how each model ranks in ChatGPT's pairwise comparisons.
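One way to summarize pairwise verdicts is to count each model's wins per example, convert wins into a rank, and average the ranks over examples. This is a simple win-count ranking offered as a sketch, not necessarily the exact GPT‑Rank procedure. The verdict strings, the half-point treatment of ties, and the model names `x`, `y`, `z` are assumptions for illustration.

```python
# Sketch: turn pairwise verdicts into an average rank per model.
# Assumptions: verdicts "A is better" / "B is better"; any other verdict is a
# tie worth half a point to each side. Not necessarily the exact GPT-Rank method.
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Comparison = Tuple[str, str, str]  # (model_a, model_b, verdict)


def wins_for_example(comparisons: Iterable[Comparison]) -> Dict[str, float]:
    wins: Dict[str, float] = defaultdict(float)
    for model_a, model_b, verdict in comparisons:
        wins[model_a] += 0.0  # register both models even if they never win
        wins[model_b] += 0.0
        if verdict == "A is better":
            wins[model_a] += 1.0
        elif verdict == "B is better":
            wins[model_b] += 1.0
        else:
            wins[model_a] += 0.5
            wins[model_b] += 0.5
    return dict(wins)


def ranks_from_wins(wins: Dict[str, float]) -> Dict[str, int]:
    # Competition ranking: rank 1 has the most wins; tied models share a rank.
    return {m: 1 + sum(1 for other in wins.values() if other > w) for m, w in wins.items()}


def average_rank(examples: Iterable[List[Comparison]]) -> Dict[str, float]:
    collected: Dict[str, List[int]] = defaultdict(list)
    for comparisons in examples:
        for model, rank in ranks_from_wins(wins_for_example(comparisons)).items():
            collected[model].append(rank)
    # Lower average rank means the model was preferred more often.
    return {model: float(mean(ranks)) for model, ranks in collected.items()}


# Two toy examples with hypothetical model names.
toy_examples = [
    [("x", "y", "A is better"), ("x", "z", "A is better"), ("y", "z", "B is better")],
    [("x", "y", "B is better"), ("x", "z", "A is better"), ("y", "z", "A is better")],
]
print(average_rank(toy_examples))
```

In the toy data, `x` ranks first and then second, so its average rank is 1.5; `y` averages 2.0 and `z` averages 2.5.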

Best Model Performance

Top Model: Open Assistant achieves the best scores across multiple metrics.
Oracle Model: Oracle results, obtained by selecting the best candidate for each example, are provided as a reference point for comparison with the top model.
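Oracle selection can be sketched as picking, for every example, the candidate with the best score on a chosen metric and averaging those best scores. The function names, the `model`/`scores` candidate fields, and the toy data are hypothetical. The sketch assumes higher values are better unless `higher_is_better=False` is passed.

```python
# Sketch: oracle selection, i.e. picking the best-scoring candidate per example.
# Assumptions: candidates carry "model" and "scores" fields, and higher metric
# values are better unless higher_is_better=False is passed.
from statistics import mean
from typing import Any, Dict, Iterable


def oracle_pick(record: Dict[str, Any], metric: str, higher_is_better: bool = True) -> Dict[str, Any]:
    scored = [c for c in record["candidates"] if metric in c.get("scores", {})]
    if not scored:
        raise ValueError(f"no candidate has metric {metric!r}")
    chooser = max if higher_is_better else min
    return chooser(scored, key=lambda c: c["scores"][metric])


def oracle_score(records: Iterable[Dict[str, Any]], metric: str, higher_is_better: bool = True) -> float:
    # Average of the best candidate's score in each example.
    return float(mean(oracle_pick(r, metric, higher_is_better)["scores"][metric] for r in records))


def single_model_score(records: Iterable[Dict[str, Any]], model: str, metric: str) -> float:
    # Average score of one fixed model, for comparison with the oracle.
    return float(mean(
        c["scores"][metric]
        for r in records
        for c in r["candidates"]
        if c["model"] == model and metric in c.get("scores", {})
    ))


# Toy records with hypothetical model names and illustrative scores.
toy_records = [
    {"candidates": [{"model": "model_a", "scores": {"bleu": 10.0}},
                    {"model": "model_b", "scores": {"bleu": 6.0}}]},
    {"candidates": [{"model": "model_a", "scores": {"bleu": 4.0}},
                    {"model": "model_b", "scores": {"bleu": 9.0}}]},
]

print(oracle_score(toy_records, "bleu"), single_model_score(toy_records, "model_b", "bleu"))
```

In the toy data the oracle averages 9.5, above the best single model's 7.5, because it can switch models from one example to the next.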
