open-llm-leaderboard-old/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO
This dataset was automatically generated during the evaluation runs of the model Danielbrdz/Barcenas-Tiny-1.1b-DPO. It comprises 63 configurations, each representing a distinct evaluation task. For each run, a split named after the run’s timestamp is created; the "train" split always points to the latest results. An additional "results" configuration stores the aggregated metrics of all runs, which are used to compute and display aggregate scores on the Open LLM Leaderboard.
Description
Dataset Overview
Dataset Introduction
The dataset was automatically created for evaluating the model Danielbrdz/Barcenas-Tiny-1.1b-DPO on the Open LLM Leaderboard.
Structure
- Number of Configurations: 63, each corresponding to a specific evaluation task.
- Creation Source: Generated from a single run so far. Each run is stored as a split named after the run timestamp; the "train" split always points to the most recent results.
- Additional Configuration: The "results" config stores aggregated results of all runs for computing and displaying metrics on the Open LLM Leaderboard.
Loading Example
from datasets import load_dataset

data = load_dataset(
    "open-llm-leaderboard/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO",
    "harness_winogrande_5",  # configuration name: one evaluation task
    split="train",           # "train" always points to the latest run
)
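To pin one specific run instead of the latest one, the split argument would be the run's timestamp-based split name rather than "train". The sketch below assumes the split name is the run timestamp with "-" and ":" replaced by "_"; that convention is not stated on this page, and the helper name split_name_from_timestamp is hypothetical. Check the names against the dataset's actual split list before relying on them.

```python
def split_name_from_timestamp(timestamp: str) -> str:
    """Turn a run timestamp into a candidate split name.

    Assumption (not confirmed by this page): split names are the run
    timestamp with "-" and ":" replaced by "_".
    """
    return timestamp.replace("-", "_").replace(":", "_")


# Timestamp of the latest run reported on this page.
run_split = split_name_from_timestamp("2024-01-20T20:17:56.012496")
print(run_split)
```

If the assumption holds, passing split=run_split to load_dataset in place of split="train" would select that run explicitly.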
Latest Results
The latest run on 2024-01-20T20:17:56.012496 produced the following scores:
{
    "all": {"acc": 0.2555, "acc_stderr": 0.0307, "acc_norm": 0.2564, "acc_norm_stderr": 0.0315, ...},
    "harness|arc:challenge|25": {"acc": 0.3464, "acc_stderr": 0.0139, "acc_norm": 0.3626, "acc_norm_stderr": 0.0140},
    "harness|hellaswag|10": {"acc": 0.4566, "acc_stderr": 0.0050, "acc_norm": 0.6120, "acc_norm_stderr": 0.0049},
    ...
}
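Each score above comes with a standard error, so it can help to read the scores as ranges rather than point values. The following is a minimal sketch, assuming the "_stderr" fields are standard errors of the reported accuracies and that a normal approximation is acceptable. The multiplier 1.96 and the helper name confidence_interval are choices made for this illustration, not part of the dataset, and only the entries shown in the excerpt are included.

```python
# Scores copied from the excerpt above (elided entries are left out).
scores = {
    "all": {"acc": 0.2555, "acc_stderr": 0.0307},
    "harness|arc:challenge|25": {"acc": 0.3464, "acc_stderr": 0.0139},
    "harness|hellaswag|10": {"acc": 0.4566, "acc_stderr": 0.0050},
}

Z_95 = 1.96  # normal-approximation multiplier for a two-sided 95% interval


def confidence_interval(mean: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Return (low, high) bounds as mean -/+ z * stderr."""
    half_width = z * stderr
    return mean - half_width, mean + half_width


# Approximate 95% interval for the accuracy of each listed task.
intervals = {
    task: confidence_interval(metrics["acc"], metrics["acc_stderr"])
    for task, metrics in scores.items()
}

for task, (low, high) in intervals.items():
    print(f"{task}: acc in [{low:.4f}, {high:.4f}]")
```

Under these assumptions, the ARC accuracy of 0.3464 corresponds to a range of roughly 0.319 to 0.374, which gives a sense of how much two models need to differ before the gap is meaningful.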
Source
Organization: hugging_face
Created: Unknown