open-llm-leaderboard-old/details_CultriX__MonaTrix-v4-7B-DPO

This dataset was automatically generated during the evaluation of model CultriX/MonaTrix-v4-7B-DPO. It comprises 63 configurations, each mapping to a specific evaluation task. Each run creates a split named after its timestamp; the `train` split always points to the latest results. An additional `results` configuration aggregates outcomes from all runs for metric computation on the Open LLM Leaderboard.

Updated 4/18/2024

Source: Hugging Face

Dataset Overview

Dataset Introduction

The dataset was automatically created while evaluating the model CultriX/MonaTrix-v4-7B-DPO on the Open LLM Leaderboard.

Dataset Composition

The dataset contains 63 configurations, each representing one evaluation task.
The data comes from a single run; each configuration holds a split named after that run's timestamp.
The `train` split always points to the most recent results.
An extra `results` configuration stores the aggregated outcomes used to compute and display metrics on the Open LLM Leaderboard.
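The "latest results" rule above can be illustrated with a minimal sketch. It assumes run timestamps are ISO-8601 strings like the one reported in this card; the earlier timestamp in the list is hypothetical and only there to show the selection, since this dataset has a single run.

```python
from datetime import datetime


def latest_run(timestamps):
    """Return the most recent run timestamp from a list of ISO-8601 strings."""
    # Parse each string for comparison, but return the original string.
    return max(timestamps, key=datetime.fromisoformat)


runs = [
    "2024-04-17T09:30:00.000000",  # hypothetical earlier run, for illustration only
    "2024-04-18T20:01:35.901312",  # the run reported in this card
]

latest = latest_run(runs)
print(latest)
```

With a single run, the timestamped split and the `train` split would refer to the same data.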

Data Loading Example

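from datasets import load_dataset

# Load the details for one task configuration (5-shot Winogrande).
# The "train" split points to the latest run.
# Repository ID as given in the card; the page title lists it under
# open-llm-leaderboard-old.
data = load_dataset(
    "open-llm-leaderboard/details_CultriX__MonaTrix-v4-7B-DPO",
    "harness_winogrande_5",
    split="train",
)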
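The repository ID and configuration name in the example above appear to follow a naming pattern: the slash in the model ID becomes a double underscore, and a results key such as `harness|arc:challenge|25` seems to map to a configuration name with underscores. The sketch below encodes that pattern as an assumption inferred from the names in this card, not as a documented rule; `details_repo_id` and `config_name` are hypothetical helper names.

```python
import re


def details_repo_id(model_id, org="open-llm-leaderboard"):
    """Build the details repository ID for a model (pattern inferred from this card)."""
    return f"{org}/details_{model_id.replace('/', '__')}"


def config_name(results_key):
    """Turn a results key into a configuration name (assumed convention)."""
    # Assumption: every character that is not a letter or digit becomes "_".
    return re.sub(r"[^0-9A-Za-z]", "_", results_key)


repo_id = details_repo_id("CultriX/MonaTrix-v4-7B-DPO")
config = config_name("harness|arc:challenge|25")
print(repo_id)
print(config)
```

The resulting strings could then be passed as the first two arguments of `load_dataset`, provided the configuration actually exists in the repository.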

Latest Results

The latest results from the run at 2024-04-18T20:01:35.901312 are:

{
    "all": {
        "acc": 0.6474083513393422,
        "acc_stderr": 0.03222227240976208,
        "acc_norm": 0.6467168780981731,
        "acc_norm_stderr": 0.03289740340162837,
        "mc1": 0.627906976744186,
        "mc1_stderr": 0.01692109011881403,
        "mc2": 0.7821613278665662,
        "mc2_stderr": 0.013679248945795038
    },
    "harness|arc:challenge|25": {
        "acc": 0.7081911262798635,
        "acc_stderr": 0.01328452529240351,
        "acc_norm": 0.734641638225256,
        "acc_norm_stderr": 0.01290255476231396
    }
    // ... (remaining results omitted for brevity)
}
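Each score above is paired with a standard error, which can be turned into an approximate confidence interval. The sketch below does this for the `acc_norm` value of `harness|arc:challenge|25`, assuming a normal approximation with z = 1.96 for a 95% interval; that choice is an assumption for illustration, not something the leaderboard specifies.

```python
def confidence_interval(value, stderr, z=1.96):
    """Return (low, high) for value +/- z * stderr (normal approximation)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# Values copied from the "harness|arc:challenge|25" entry above.
arc = {
    "acc_norm": 0.734641638225256,
    "acc_norm_stderr": 0.01290255476231396,
}

low, high = confidence_interval(arc["acc_norm"], arc["acc_norm_stderr"])
print(f"acc_norm 95% CI: {low:.4f} to {high:.4f}")
```

Under these assumptions the interval spans roughly 0.709 to 0.760, which gives a sense of how much the reported score could vary.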
