open-llm-leaderboard-old/details_yleo__EmertonOmniBeagle-7B-dpo
This dataset was automatically created during the evaluation run of model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It comprises 63 configurations, each corresponding to an evaluated task, containing results from a single run. The "train" split always points to the latest results. An additional configuration named "results" stores aggregated results from all runs, used to compute and display aggregated metrics on the Open LLM Leaderboard. The README also provides a Python example for loading the dataset using the 🤗 datasets library and includes the latest results for a specific run.
Dataset Overview
Dataset Summary
The dataset was automatically created during the evaluation of model yleo/EmertonOmniBeagle-7B-dpo on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to an evaluation task.
Dataset Structure
- Number of configurations: 63
- Runs: The dataset was created from a single run. Within each configuration, a run is stored as its own split, named after the run's timestamp.
- "train" split: Not training data; an alias that always points to the latest results.
- Additional configuration: "results" stores aggregated results from all runs, used to compute and display aggregated metrics on the Open LLM Leaderboard.
Data Loading Example
from datasets import load_dataset

# Load the details for one task configuration; "train" points to the latest run.
data = load_dataset(
    "open-llm-leaderboard/details_yleo__EmertonOmniBeagle-7B-dpo",
    "harness_winogrande_5",
    split="train",
)
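The aggregated "results" configuration described under Dataset Structure can presumably be loaded the same way. The following is a minimal sketch, not taken from the README: the helper name `load_aggregated_results` is hypothetical, and the repository ID is copied from the loading example above. The import is deferred so that defining the helper does not require network access.

```python
# Repository ID as written in the loading example above.
REPO_ID = "open-llm-leaderboard/details_yleo__EmertonOmniBeagle-7B-dpo"


def load_aggregated_results(split="train"):
    """Load the "results" configuration (hypothetical helper).

    "train" is the split that always points to the latest results.
    """
    # Imported lazily so the helper can be defined without the library loaded.
    from datasets import load_dataset

    return load_dataset(REPO_ID, "results", split=split)
```

Calling `load_aggregated_results()` downloads the data from the Hub, so it needs network access and the `datasets` package installed.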
Latest Results
The following are the latest results from the 2024-02-14T10:17:01.661454 run:
{
    "all": {
        "acc": 0.6503063779591207,
        "acc_stderr": 0.03221551316026954,
        "acc_norm": 0.6499041876908436,
        "acc_norm_stderr": 0.03288821519239377,
        "mc1": 0.6034271725826194,
        "mc1_stderr": 0.01712493094202351,
        "mc2": 0.7562392596040229,
        "mc2_stderr": 0.01399559383226538
    },
    ...
}
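To read the aggregated block more easily, each metric can be paired with its `_stderr` companion and shown as a percentage. This is a small sketch under stated assumptions: the `summarize` helper is hypothetical, only the "all" entry shown above is included (the elided per-task entries are left out), and the `<name>_stderr` pairing is inferred from the key names rather than from any documented schema.

```python
import json

# The "all" block copied from the latest results above; per-task entries omitted.
latest = json.loads("""
{
    "all": {
        "acc": 0.6503063779591207,
        "acc_stderr": 0.03221551316026954,
        "acc_norm": 0.6499041876908436,
        "acc_norm_stderr": 0.03288821519239377,
        "mc1": 0.6034271725826194,
        "mc1_stderr": 0.01712493094202351,
        "mc2": 0.7562392596040229,
        "mc2_stderr": 0.01399559383226538
    }
}
""")


def summarize(metrics):
    """Map each metric name to (value, stderr), both as rounded percentages."""
    summary = {}
    for name, value in metrics.items():
        if name.endswith("_stderr"):
            continue  # stderr keys are attached to their metric below
        stderr = metrics.get(f"{name}_stderr")
        summary[name] = (
            round(value * 100, 2),
            None if stderr is None else round(stderr * 100, 2),
        )
    return summary


summary = summarize(latest["all"])
for name, (value, stderr) in summary.items():
    suffix = "" if stderr is None else f" ± {stderr:.2f}"
    print(f"{name}: {value:.2f}{suffix}")
```

The same helper should work on any per-task entry that follows the `<name>` / `<name>_stderr` key pattern.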