Dataset asset | Open Source Community | Natural Language Processing | Model Evaluation

open-llm-leaderboard-old/details_yleo__EmertonOmniBeagle-7B-dpo

This dataset was created automatically during the evaluation run of the model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It comprises 63 configurations, each corresponding to one evaluated task and containing the results of a single run. In each configuration, the "train" split always points to the latest results. An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also provides a Python example for loading the dataset with the 🤗 datasets library and lists the results of the latest run.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Feb 14, 2024

Dataset Overview

Dataset Summary

This dataset was created automatically during the evaluation of the model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to one evaluated task.

Dataset Structure

  • Number of configurations: 63
  • Runs: The dataset was created from a single run. Each run is stored as its own split in every configuration, and that split is named after the run's timestamp.
  • "train" split: Always points to the results of the latest run.
  • Additional configuration: "results" stores aggregated results from all runs, used to compute and display aggregated metrics on the Open LLM Leaderboard.
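The split layout described above can be illustrated without downloading anything. The sketch below assumes, without confirmation from this page, that each run split is named after its timestamp with "-" and ":" replaced by "_" (for example 2024_02_14T10_17_01.661454), and that aliases such as "train" do not parse as timestamps. The helpers parse_run_split and latest_run_split are hypothetical and are not part of the 🤗 datasets library.

```python
from datetime import datetime
from typing import List, Optional

# Assumption: run splits are named "YYYY_MM_DDTHH_MM_SS.ffffff".
# Names that do not match (e.g. "train", "latest") are treated as aliases.
RUN_SPLIT_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def parse_run_split(name: str) -> Optional[datetime]:
    """Return the run timestamp encoded in a split name, or None for aliases."""
    try:
        return datetime.strptime(name, RUN_SPLIT_FORMAT)
    except ValueError:
        return None


def latest_run_split(split_names: List[str]) -> str:
    """Return the timestamped split that a "latest results" alias would mirror."""
    runs = [(ts, name) for name in split_names if (ts := parse_run_split(name)) is not None]
    if not runs:
        raise ValueError("no timestamped run splits found")
    # Tuples compare by timestamp first, so max() picks the most recent run.
    return max(runs)[1]


print(latest_run_split(["2024_02_14T10_17_01.661454", "train"]))
```

With network access, the actual split names of a configuration could be listed with datasets.get_dataset_split_names and passed to latest_run_split; since this dataset holds a single run, the result should simply be that run's split.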

Data Loading Example

from datasets import load_dataset
# Per-sample details for the 5-shot Winogrande task; "train" is the latest run.
data = load_dataset("open-llm-leaderboard-old/details_yleo__EmertonOmniBeagle-7B-dpo",
    "harness_winogrande_5",
    split="train")
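The configuration name in the example above appears to encode the task and the number of few-shot examples. Assuming, from "harness_winogrande_5" alone, that per-task configurations follow the pattern harness_<task>_<num_fewshot>, a hypothetical helper like the one below can split a name into its parts, for instance to select tasks among the 63 configurations. It is a sketch under that assumption, not part of the dataset or of the 🤗 datasets library.

```python
from typing import Optional, Tuple

# Assumption (inferred from "harness_winogrande_5"): per-task configs are named
# "harness_<task>_<num_fewshot>". Other configs, such as "results", yield None.
PREFIX = "harness_"


def parse_config_name(config: str) -> Optional[Tuple[str, int]]:
    """Split a per-task config name into (task, number of few-shot examples)."""
    if not config.startswith(PREFIX):
        return None
    # Split on the last underscore so task names may themselves contain "_".
    task, sep, shots = config[len(PREFIX):].rpartition("_")
    if not sep or not task or not shots.isdigit():
        return None
    return task, int(shots)


print(parse_config_name("harness_winogrande_5"))
print(parse_config_name("results"))
```

With network access, datasets.get_dataset_config_names can return the configuration names of the repository, which could then be filtered with parse_config_name; any name outside the assumed pattern is skipped rather than misparsed.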

Latest Results

The following are the latest results from the 2024-02-14T10:17:01.661454 run:

{
    "all": {
        "acc": 0.6503063779591207,
        "acc_stderr": 0.03221551316026954,
        "acc_norm": 0.6499041876908436,
        "acc_norm_stderr": 0.03288821519239377,
        "mc1": 0.6034271725826194,
        "mc1_stderr": 0.01712493094202351,
        "mc2": 0.7562392596040229,
        "mc2_stderr": 0.01399559383226538
    },
    ...
}
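Each metric in the "all" block comes with a matching _stderr field. One common way to read such a pair is a normal-approximation interval, value ± 1.96 × stderr. The sketch below applies this to the figures shown above only (the elided "..." entries are not included). It assumes the normal approximation is reasonable; note also that the "all" stderr values may be averages of per-task standard errors rather than the standard error of the aggregate score, so the intervals are a rough guide only. The name normal_interval is hypothetical.

```python
from typing import Dict, Tuple

# Values copied from the "all" block of the 2024-02-14T10:17:01.661454 run.
ALL_RESULTS: Dict[str, float] = {
    "acc": 0.6503063779591207,
    "acc_stderr": 0.03221551316026954,
    "acc_norm": 0.6499041876908436,
    "acc_norm_stderr": 0.03288821519239377,
    "mc1": 0.6034271725826194,
    "mc1_stderr": 0.01712493094202351,
    "mc2": 0.7562392596040229,
    "mc2_stderr": 0.01399559383226538,
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def normal_interval(results: Dict[str, float], metric: str, z: float = Z_95) -> Tuple[float, float]:
    """Return (low, high) = value +/- z * stderr, assuming approximate normality."""
    value = results[metric]
    stderr = results[f"{metric}_stderr"]
    return value - z * stderr, value + z * stderr


for metric in ("acc", "acc_norm", "mc1", "mc2"):
    low, high = normal_interval(ALL_RESULTS, metric)
    print(f"{metric}: {ALL_RESULTS[metric]:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

For acc, this gives an interval of roughly [0.5872, 0.7134].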