open-llm-leaderboard-old/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO
This dataset was automatically generated during the evaluation runs of the model Danielbrdz/Barcenas-Tiny-1.1b-DPO. It comprises 63 configurations, each representing a distinct evaluation task. For each run, a split named after the run’s timestamp is created; the "train" split always points to the latest results. An additional "results" configuration stores aggregated metrics for all runs, which are used to compute and display aggregate scores on the Open LLM Leaderboard.
Dataset Overview
Dataset Introduction
This dataset was created automatically while evaluating the model Danielbrdz/Barcenas-Tiny-1.1b-DPO for the Open LLM Leaderboard.
Structure
- Number of Configurations: 63, each corresponding to a specific evaluation task.
- Creation Source: Generated from a single run. Each run yields a split named with the run timestamp; the "train" split always points to the most recent results.
- Additional Configuration: The "results" config stores aggregated results of all runs for computing and displaying metrics on the Open LLM Leaderboard.
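The split convention described in this list can be illustrated with a small sketch. It assumes, for illustration only, that timestamp-named splits are ISO 8601 strings like the run timestamp reported on this page; the exact split names stored on the Hub may be formatted differently. The helper `latest_split` and the earlier timestamp in the sample list are hypothetical.

```python
from datetime import datetime


def latest_split(split_names):
    """Return the timestamp-named split with the most recent timestamp.

    Hypothetical helper: assumes every name other than "train" parses
    as an ISO 8601 timestamp.
    """
    stamped = [name for name in split_names if name != "train"]
    if not stamped:
        raise ValueError("no timestamp-named splits found")
    return max(stamped, key=datetime.fromisoformat)


# The first timestamp is invented for illustration; the second is the
# run reported on this page.
splits = ["2024-01-19T08:00:00.000000", "2024-01-20T20:17:56.012496", "train"]
print(latest_split(splits))
```

Under this convention, "train" would hold the same rows as the split this helper returns, so loading "train" is the simpler choice unless a specific earlier run is needed.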
Loading Example
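from datasets import load_dataset

# Load the details for one evaluation task (configuration).
# Note: the page title lists this repository under "open-llm-leaderboard-old";
# adjust the repository ID if the path below does not resolve.
data = load_dataset(
    "open-llm-leaderboard/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO",
    "harness_winogrande_5",  # configuration name (one per task)
    split="train",  # "train" always points to the latest run
)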
Latest Results
The latest run on 2024-01-20T20:17:56.012496 produced the following scores:
{
"all": {"acc": 0.2555, "acc_stderr": 0.0307, "acc_norm": 0.2564, "acc_norm_stderr": 0.0315, ...},
"harness|arc:challenge|25": {"acc": 0.3464, "acc_stderr": 0.0139, "acc_norm": 0.3626, "acc_norm_stderr": 0.0140},
"harness|hellaswag|10": {"acc": 0.4566, "acc_stderr": 0.0050, "acc_norm": 0.6120, "acc_norm_stderr": 0.0049},
...
}
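To make the reported standard errors easier to read, the sketch below turns each `acc_norm` and `acc_norm_stderr` pair into an approximate 95% interval. It assumes a normal approximation (value ± 1.96 × stderr), which is an illustrative choice and not something the leaderboard itself documents here. Only the two tasks shown in full above are included, and `confidence_interval` is a hypothetical helper.

```python
# Scores copied from the results above; elided entries are left out.
scores = {
    "harness|arc:challenge|25": {
        "acc": 0.3464, "acc_stderr": 0.0139,
        "acc_norm": 0.3626, "acc_norm_stderr": 0.0140,
    },
    "harness|hellaswag|10": {
        "acc": 0.4566, "acc_stderr": 0.0050,
        "acc_norm": 0.6120, "acc_norm_stderr": 0.0049,
    },
}


def confidence_interval(value, stderr, z=1.96):
    """Normal-approximation interval: (value - z*stderr, value + z*stderr)."""
    margin = z * stderr
    return value - margin, value + margin


for task, metrics in scores.items():
    low, high = confidence_interval(metrics["acc_norm"], metrics["acc_norm_stderr"])
    print(f"{task}: acc_norm={metrics['acc_norm']:.4f}, approx. 95% CI [{low:.4f}, {high:.4f}]")
```

The same helper can be applied to the `acc` and `acc_stderr` fields by changing the two keys in the loop.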