Dataset asset, Open Source Community, Natural Language Processing, Model Evaluation
open-llm-leaderboard-old/details_CultriX__MonaTrix-v4-7B-DPO
This dataset was automatically generated during the evaluation of the model CultriX/MonaTrix-v4-7B-DPO. It comprises 63 configurations, each corresponding to one evaluation task. Each run creates a split named after its timestamp; the `train` split always points to the latest results. An additional `results` configuration aggregates the outcomes of all runs and is used to compute metrics on the Open LLM Leaderboard.
Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 18, 2024
Dataset Overview
Dataset Introduction
The dataset was automatically created while evaluating the model CultriX/MonaTrix-v4-7B-DPO on the Open LLM Leaderboard.
Dataset Composition
- Contains 63 configurations, each representing an evaluation task.
- Generated from a single run; each configuration holds a split named with the run's timestamp.
- The `train` split always points to the most recent results.
- An extra `results` configuration stores aggregated outcomes for computing and displaying metrics on the Open LLM Leaderboard.
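The relationship between the timestamp-named splits and the "most recent" alias can be sketched as follows. This is a minimal illustration, not the leaderboard's own code: the helper `latest_timestamp_split` is hypothetical, the first split name in the sample list is invented for the example, and it assumes split names parse as ISO 8601 timestamps (the literal spelling used on the Hub may differ).

```python
from datetime import datetime


def latest_timestamp_split(split_names):
    """Return the most recent timestamp-named split, skipping aliases such as 'train'."""
    stamped = []
    for name in split_names:
        try:
            stamped.append((datetime.fromisoformat(name), name))
        except ValueError:
            # Not a timestamp (for example the 'train' alias), so ignore it.
            continue
    if not stamped:
        raise ValueError("no timestamp-named splits found")
    return max(stamped)[1]


# The first entry is a made-up earlier run; the second is the run reported on this page.
splits = ["2024-04-17T09:30:00.000000", "2024-04-18T20:01:35.901312", "train"]
print(latest_timestamp_split(splits))
```

With only one run, as in this dataset, the function simply returns that run's split.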
Data Loading Example
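from datasets import load_dataset

# Load the latest results for the 5-shot Winogrande task.
# The repository ID below matches the one in this page's title.
data = load_dataset(
    "open-llm-leaderboard-old/details_CultriX__MonaTrix-v4-7B-DPO",
    "harness_winogrande_5",
    split="train",
)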
Latest Results
The latest results from the run at 2024-04-18T20:01:35.901312 are:
{
    "all": {
        "acc": 0.6474083513393422,
        "acc_stderr": 0.03222227240976208,
        "acc_norm": 0.6467168780981731,
        "acc_norm_stderr": 0.03289740340162837,
        "mc1": 0.627906976744186,
        "mc1_stderr": 0.01692109011881403,
        "mc2": 0.7821613278665662,
        "mc2_stderr": 0.013679248945795038
    },
    "harness|arc:challenge|25": {
        "acc": 0.7081911262798635,
        "acc_stderr": 0.01328452529240351,
        "acc_norm": 0.734641638225256,
        "acc_norm_stderr": 0.01290255476231396
    }
    // ... (remaining results omitted for brevity)
}
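Each metric above is reported alongside a standard error. One common way to read such a pair is as an approximate 95% confidence interval under a normal approximation. The sketch below does this for the ARC Challenge `acc_norm` value; the helper `normal_ci` and the choice of z = 1.96 are assumptions made for illustration and are not part of the leaderboard's tooling.

```python
def normal_ci(value, stderr, z=1.96):
    """Approximate confidence interval as value +/- z * stderr (normal approximation)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# Values copied from the "harness|arc:challenge|25" entry above.
arc = {"acc_norm": 0.734641638225256, "acc_norm_stderr": 0.01290255476231396}

low, high = normal_ci(arc["acc_norm"], arc["acc_norm_stderr"])
print(f"acc_norm 95% CI: [{low:.4f}, {high:.4f}]")
```

The interval is only as good as the normal approximation, so treat it as a rough guide when comparing models whose scores are close.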