open-llm-leaderboard-old/details_yleo__EmertonOmniBeagle-7B-dpo

This dataset was automatically created during the evaluation run of model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It comprises 63 configurations, each corresponding to an evaluated task, containing results from a single run. The "train" split always points to the latest results. An additional configuration named "results" stores aggregated results from all runs, used to compute and display aggregated metrics on the Open LLM Leaderboard. The README also provides a Python example for loading the dataset using the 🤗 datasets library and includes the latest results for a specific run.

Updated 2/14/2024

Description

Dataset Overview

Dataset Summary

The dataset was automatically created during the evaluation of model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to an evaluation task.

Dataset Structure

Number of configurations: 63
Runs: Created from a single run. Each run is stored as its own split in every configuration, named after the run timestamp.
"train" split: Always points to the latest results.
Additional configuration: "results" stores aggregated results from all runs, used to compute and display aggregated metrics on the Open LLM Leaderboard.
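
As a rough illustration of the structure above, the sketch below builds the dataset identifier from the model name and loads the "results" configuration. The helper names `details_repo_id` and `load_aggregated_results` are hypothetical, the naming pattern is only inferred from the identifier shown in this README, and the code assumes the "results" configuration exposes a "train" split in the same way as the task configurations.

```python
def details_repo_id(model_id: str, org: str = "open-llm-leaderboard") -> str:
    """Build the details-dataset id for a model such as 'user/model'.

    Assumption: the pattern '<org>/details_<user>__<model>' is inferred
    from the single identifier shown in this README.
    """
    return f"{org}/details_{model_id.replace('/', '__')}"


def load_aggregated_results(model_id: str):
    """Load the "results" configuration (hypothetical helper).

    Assumption: "results" has a "train" split pointing to the latest run.
    Requires the `datasets` package and network access, so the import is
    deferred until the function is actually called.
    """
    from datasets import load_dataset

    return load_dataset(details_repo_id(model_id), "results", split="train")


repo_id = details_repo_id("yleo/EmertonOmniBeagle-7B-dpo")
print(repo_id)
```

Calling `load_aggregated_results("yleo/EmertonOmniBeagle-7B-dpo")` would then fetch the aggregated metrics, provided the assumptions above hold for this dataset.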

Data Loading Example

from datasets import load_dataset
# Load one task configuration; the "train" split points to the latest results.
data = load_dataset(
    "open-llm-leaderboard/details_yleo__EmertonOmniBeagle-7B-dpo",
    "harness_winogrande_5",
    split="train",
)

Latest Results

The following are the latest results from the 2024-02-14T10:17:01.661454 run:

{
    "all": {
        "acc": 0.6503063779591207,
        "acc_stderr": 0.03221551316026954,
        "acc_norm": 0.6499041876908436,
        "acc_norm_stderr": 0.03288821519239377,
        "mc1": 0.6034271725826194,
        "mc1_stderr": 0.01712493094202351,
        "mc2": 0.7562392596040229,
        "mc2_stderr": 0.01399559383226538
    },
    ...
}
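
To read the standard errors alongside the scores, one option is to turn each pair into an approximate interval. The sketch below is a minimal example, not part of the original README: it copies the "all" block above, and the `interval` helper and the normal approximation (z = 1.96 for roughly 95% coverage) are assumptions made for illustration.

```python
import json

# The "all" block from the latest results, copied verbatim.
latest = json.loads("""
{
    "all": {
        "acc": 0.6503063779591207,
        "acc_stderr": 0.03221551316026954,
        "acc_norm": 0.6499041876908436,
        "acc_norm_stderr": 0.03288821519239377,
        "mc1": 0.6034271725826194,
        "mc1_stderr": 0.01712493094202351,
        "mc2": 0.7562392596040229,
        "mc2_stderr": 0.01399559383226538
    }
}
""")


def interval(metrics: dict, name: str, z: float = 1.96) -> tuple:
    """Return (low, high) for a metric as value +/- z * stderr.

    Assumption: a normal approximation is adequate for these aggregates.
    """
    value = metrics[name]
    stderr = metrics[f"{name}_stderr"]
    return value - z * stderr, value + z * stderr


for name in ("acc", "acc_norm", "mc1", "mc2"):
    low, high = interval(latest["all"], name)
    print(f"{name}: {latest['all'][name]:.4f} (approx. 95% interval {low:.4f} to {high:.4f})")
```

Because the "all" entry aggregates many tasks, these intervals are only a rough guide to the uncertainty of the reported averages.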

Topics

Model Evaluation

Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
