open-llm-leaderboard-old/details_yleo__EmertonOmniBeagle-7B-dpo
This dataset was automatically created during the evaluation run of model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It comprises 63 configurations, each corresponding to an evaluated task, containing results from a single run. The "train" split always points to the latest results. An additional configuration named "results" stores aggregated results from all runs, used to compute and display aggregated metrics on the Open LLM Leaderboard. The README also provides a Python example for loading the dataset using the 🤗 datasets library and includes the latest results for a specific run.
Description
Dataset Overview
Dataset Summary
The dataset was automatically created during the evaluation of model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to an evaluation task.
Dataset Structure
- Number of configurations: 63
- Runs: Created from a single run. Each configuration stores that run as a split named after the run timestamp.
- "train" split: Always points to the latest results.
- Additional configuration: "results" stores aggregated results from all runs, used to compute and display aggregated metrics on the Open LLM Leaderboard.
Data Loading Example
from datasets import load_dataset

# Load the latest details for one evaluated task (here: Winogrande, 5-shot).
data = load_dataset(
    "open-llm-leaderboard/details_yleo__EmertonOmniBeagle-7B-dpo",
    "harness_winogrande_5",
    split="train",
)
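The same call can be pointed at the aggregated "results" configuration described under Dataset Structure. The following is a minimal sketch, not the leaderboard's own tooling: the helper name `load_aggregated_results` is hypothetical, the repository id is copied from the loading example above, and it is an assumption that the "results" configuration exposes a "train" split in the same way the per-task configurations do.

```python
# Repository id as written in the loading example above.
REPO_ID = "open-llm-leaderboard/details_yleo__EmertonOmniBeagle-7B-dpo"


def load_aggregated_results(repo_id: str = REPO_ID, split: str = "train"):
    """Load the aggregated "results" configuration (hypothetical helper).

    Assumption: "results" offers the same "train" split as the
    per-task configurations. Requires network access to the Hub.
    """
    # Imported lazily so that defining the helper does not need the library.
    from datasets import load_dataset

    return load_dataset(repo_id, "results", split=split)


if __name__ == "__main__":
    results = load_aggregated_results()
    print(results)
```

If the split name differs, pass it explicitly, for example a run timestamp, through the `split` argument.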
Latest Results
The following are the latest results from the 2024-02-14T10:17:01.661454 run:
{
    "all": {
        "acc": 0.6503063779591207,
        "acc_stderr": 0.03221551316026954,
        "acc_norm": 0.6499041876908436,
        "acc_norm_stderr": 0.03288821519239377,
        "mc1": 0.6034271725826194,
        "mc1_stderr": 0.01712493094202351,
        "mc2": 0.7562392596040229,
        "mc2_stderr": 0.01399559383226538
    },
    ...
}
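To read the "all" block, it can help to turn each metric and its `_stderr` companion into an approximate interval. The sketch below assumes that each `_stderr` field is the standard error of the metric with the same prefix and that a normal approximation is acceptable; the helper `confidence_interval` is hypothetical, and the intervals are illustrative rather than part of the reported results.

```python
import json

# The "all" block from the latest results above (the elided entries are omitted).
latest = json.loads("""
{
    "all": {
        "acc": 0.6503063779591207,
        "acc_stderr": 0.03221551316026954,
        "acc_norm": 0.6499041876908436,
        "acc_norm_stderr": 0.03288821519239377,
        "mc1": 0.6034271725826194,
        "mc1_stderr": 0.01712493094202351,
        "mc2": 0.7562392596040229,
        "mc2_stderr": 0.01399559383226538
    }
}
""")


def confidence_interval(value: float, stderr: float, z: float = 1.96):
    """Return (low, high) for value +/- z * stderr (normal approximation)."""
    return value - z * stderr, value + z * stderr


metrics = latest["all"]
for name in ("acc", "acc_norm", "mc1", "mc2"):
    low, high = confidence_interval(metrics[name], metrics[f"{name}_stderr"])
    print(f"{name}: {metrics[name]:.4f} (approx. 95% interval {low:.4f} to {high:.4f})")
```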
Source
Organization: hugging_face
Created: Unknown