open-llm-leaderboard-old/details_CultriX__MonaTrix-v4-7B-DPO
This dataset was automatically generated during the evaluation of the model CultriX/MonaTrix-v4-7B-DPO. It comprises 63 configurations, each corresponding to one evaluation task. Each run creates a split named after its timestamp; the `train` split always points to the latest results. An additional `results` configuration aggregates the outcomes of all runs and is used to compute the metrics shown on the Open LLM Leaderboard.
Description
Dataset Overview
Dataset Introduction
The dataset was automatically created while evaluating the model CultriX/MonaTrix-v4-7B-DPO on the Open LLM Leaderboard.
Dataset Composition
- Contains 63 configurations, each representing an evaluation task.
- Generated from a single run; each configuration holds a split named with the run's timestamp.
- The `train` split always points to the most recent results.
- An extra `results` configuration stores aggregated outcomes for computing and displaying metrics on the Open LLM Leaderboard.
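The relationship between timestamp-named splits and the `train` alias can be sketched as follows. This is a minimal illustration, not the leaderboard's own logic: the helper `latest_run_split` is hypothetical, and it assumes split names are ISO-style timestamps like the one quoted on this page. The split names stored on the Hub may be encoded differently.

```python
from datetime import datetime


def latest_run_split(split_names):
    """Return the timestamp-named split with the most recent run time.

    Hypothetical helper: assumes run splits are named with ISO-style
    timestamps; any other name (such as the "train" alias) is skipped.
    """
    runs = {}
    for name in split_names:
        try:
            runs[name] = datetime.fromisoformat(name)
        except ValueError:
            continue  # not a timestamp, e.g. the "train" alias
    if not runs:
        raise ValueError("no timestamp-named splits found")
    return max(runs, key=runs.get)


# With a single run, the latest split is simply that run's split.
splits = ["2024-04-18T20:01:35.901312", "train"]
print(latest_run_split(splits))
```

With only one run recorded, `train` and the single timestamped split refer to the same results.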
Data Loading Example
from datasets import load_dataset
data = load_dataset(
    "open-llm-leaderboard/details_CultriX__MonaTrix-v4-7B-DPO",
    "harness_winogrande_5",
    split="train",
)
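The aggregated `results` configuration described above can presumably be loaded the same way. The sketch below is an assumption-laden variant of the example: `results_request` and `load_results` are hypothetical helper names, and the repository id is passed in by the caller because this page shows two different organization prefixes for it.

```python
def results_request(repo_id):
    """Build the load_dataset arguments for the aggregated `results` configuration."""
    # "results" and "train" are the configuration and split names described on this page.
    return {"path": repo_id, "name": "results", "split": "train"}


def load_results(repo_id):
    """Load the aggregated results; requires network access and the `datasets` library."""
    # Imported lazily so results_request() can be used without the library installed.
    from datasets import load_dataset

    return load_dataset(**results_request(repo_id))
```

Calling `load_results` with the repository id from the example above should return the latest aggregated results, provided the repository is still available under that name.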
Latest Results
The latest results from the run at 2024-04-18T20:01:35.901312 are:
{
    "all": {
        "acc": 0.6474083513393422,
        "acc_stderr": 0.03222227240976208,
        "acc_norm": 0.6467168780981731,
        "acc_norm_stderr": 0.03289740340162837,
        "mc1": 0.627906976744186,
        "mc1_stderr": 0.01692109011881403,
        "mc2": 0.7821613278665662,
        "mc2_stderr": 0.013679248945795038
    },
    "harness|arc:challenge|25": {
        "acc": 0.7081911262798635,
        "acc_stderr": 0.01328452529240351,
        "acc_norm": 0.734641638225256,
        "acc_norm_stderr": 0.01290255476231396
    }
    // ... (remaining results omitted for brevity)
}
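Each metric above is reported together with its standard error, which allows a rough uncertainty estimate. The sketch below computes an approximate 95% interval for the ARC-Challenge accuracy shown in the excerpt. It assumes a normal approximation (value ± 1.96 × stderr), which is an illustrative choice and not something the leaderboard itself reports; `confidence_interval` is a hypothetical helper.

```python
import json

# Excerpt copied from the results above (only an entry shown there).
excerpt = json.loads("""
{
  "harness|arc:challenge|25": {
    "acc": 0.7081911262798635,
    "acc_stderr": 0.01328452529240351,
    "acc_norm": 0.734641638225256,
    "acc_norm_stderr": 0.01290255476231396
  }
}
""")


def confidence_interval(value, stderr, z=1.96):
    """Normal-approximation interval: value +/- z * stderr (an assumption)."""
    margin = z * stderr
    return value - margin, value + margin


for task, metrics in excerpt.items():
    low, high = confidence_interval(metrics["acc"], metrics["acc_stderr"])
    print(f"{task}: acc = {metrics['acc']:.4f}, 95% CI ~ [{low:.4f}, {high:.4f}]")
```

Under this approximation the ARC-Challenge accuracy of about 0.708 carries a margin of roughly ±0.026.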
Source
Organization: hugging_face
Created: Unknown