open-llm-leaderboard-old/details_CultriX__MonaTrix-v4-7B-DPO

This dataset was automatically generated during the evaluation of model CultriX/MonaTrix‑v4‑7B‑DPO. It comprises 63 configurations, each mapping to a specific evaluation task. Each run creates a split named after its timestamp; the `train` split always points to the latest results. An additional `results` configuration aggregates outcomes from all runs for metric computation on the Open LLM Leaderboard.

Updated 4/18/2024
hugging_face

Description

Dataset Overview

Dataset Introduction

The dataset was automatically created while evaluating the model CultriX/MonaTrix‑v4‑7B‑DPO on the Open LLM Leaderboard.

Dataset Composition

  • Contains 63 configurations, each representing an evaluation task.
  • Generated from a single run; each configuration holds a split named with the run's timestamp.
  • The train split always points to the most recent results.
  • An extra results configuration stores aggregated outcomes for computing and displaying metrics on the Open LLM Leaderboard.
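The split-naming scheme in the list above can be sketched as a small helper that picks the most recent run among timestamp-named splits. The exact split-name format is an assumption (the run timestamp with "-" and ":" replaced by "_"), and the function names are hypothetical, introduced only for illustration.

```python
from datetime import datetime

# Assumed split-name format (not stated on this page): the run timestamp with
# "-" and ":" replaced by "_", e.g. "2024_04_18T20_01_35.901312".
SPLIT_TIME_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def parse_run_split(name):
    """Return the datetime encoded in a split name, or None for aliases such as "train"."""
    try:
        return datetime.strptime(name, SPLIT_TIME_FORMAT)
    except ValueError:
        return None


def latest_run_split(split_names):
    """Pick the timestamp-named split that belongs to the most recent run."""
    dated = [(parse_run_split(n), n) for n in split_names]
    dated = [(ts, n) for ts, n in dated if ts is not None]
    if not dated:
        raise ValueError("no timestamp-named splits found")
    # Tuples compare by timestamp first, so max() yields the newest run.
    return max(dated)[1]


splits = ["2024_04_18T20_01_35.901312", "train"]
print(latest_run_split(splits))
```

With a single run, the helper simply returns that run's split, which is the same data the train split points to.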

Data Loading Example

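from datasets import load_dataset

# Load the per-sample details for one task configuration.
# Note: the page title lists this repository under "open-llm-leaderboard-old";
# if the ID below does not resolve, try that organization name instead.
data = load_dataset(
    "open-llm-leaderboard/details_CultriX__MonaTrix-v4-7B-DPO",
    "harness_winogrande_5",  # configuration: Winogrande, 5-shot
    split="train",  # always points to the latest results
)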
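Task keys in the aggregated results (such as "harness|arc:challenge|25") differ in form from configuration names (such as "harness_winogrande_5"). The sketch below assumes a configuration name is the task key with every character outside letters, digits, and "_" replaced by "_"; that mapping is inferred from the two names on this page rather than documented here, and both helper functions are hypothetical.

```python
import re


def config_name_from_task_key(task_key):
    """Map a results key such as "harness|arc:challenge|25" to a configuration name.

    Assumption: configuration names replace every character outside
    [A-Za-z0-9_] with "_", in the style of "harness_winogrande_5".
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", task_key)


def load_task_details(repo_id, task_key, split="train"):
    """Load the per-sample details for one task (requires network access)."""
    from datasets import load_dataset  # imported lazily; only needed here

    return load_dataset(repo_id, config_name_from_task_key(task_key), split=split)


print(config_name_from_task_key("harness|arc:challenge|25"))
```

Calling load_task_details with the repository ID from the example above and a task key would then fetch the matching configuration, provided the assumed naming holds.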

Latest Results

The latest results from the run at 2024‑04‑18T20:01:35.901312 are:

{
    "all": {
        "acc": 0.6474083513393422,
        "acc_stderr": 0.03222227240976208,
        "acc_norm": 0.6467168780981731,
        "acc_norm_stderr": 0.03289740340162837,
        "mc1": 0.627906976744186,
        "mc1_stderr": 0.01692109011881403,
        "mc2": 0.7821613278665662,
        "mc2_stderr": 0.013679248945795038
    },
    "harness|arc:challenge|25": {
        "acc": 0.7081911262798635,
        "acc_stderr": 0.01328452529240351,
        "acc_norm": 0.734641638225256,
        "acc_norm_stderr": 0.01290255476231396
    }
    // ... (remaining results omitted for brevity)
}
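To read the standard-error fields alongside the scores, one option is an approximate 95% confidence interval. The sketch below copies the ARC-Challenge values from the excerpt above into a Python dict and assumes a normal approximation (score ± 1.96 × stderr); that choice is an illustration, not something the leaderboard is stated to use, and the function name is hypothetical.

```python
# Values copied from the "harness|arc:challenge|25" entry above.
arc_challenge = {
    "acc": 0.7081911262798635,
    "acc_stderr": 0.01328452529240351,
    "acc_norm": 0.734641638225256,
    "acc_norm_stderr": 0.01290255476231396,
}


def confidence_interval(results, metric, z=1.96):
    """Return (low, high) for `metric`, using the normal approximation.

    Assumption: the matching standard error is stored under "<metric>_stderr".
    """
    score = results[metric]
    half_width = z * results[f"{metric}_stderr"]
    return score - half_width, score + half_width


low, high = confidence_interval(arc_challenge, "acc_norm")
print(f"acc_norm 95% CI: [{low:.3f}, {high:.3f}]")
```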

Topics

Model Evaluation
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown