Dataset asset · Open Source Community · Natural Language Processing · Model Evaluation

open-llm-leaderboard-old/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO

This dataset was automatically generated during the evaluation runs of the model Danielbrdz/Barcenas-Tiny-1.1b-DPO. It comprises 63 configurations, each representing a distinct evaluation task. For each run, a split named after the run's timestamp is created; the "train" split always points to the latest results. An additional "results" configuration stores aggregated metrics for all runs, which are used to compute and display aggregate scores on the Open LLM Leaderboard.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jan 20, 2024
Dataset Overview

Dataset Introduction

The dataset was automatically created for evaluating the model Danielbrdz/Barcenas-Tiny-1.1b-DPO on the Open LLM Leaderboard.

Structure

  • Number of Configurations: 63, each corresponding to a specific evaluation task.
  • Creation Source: The dataset currently holds the results of a single run. Each run produces a split named after the run timestamp; the "train" split always points to the most recent results.
  • Additional Configuration: The "results" config stores aggregated results of all runs for computing and displaying metrics on the Open LLM Leaderboard.
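The relationship between task keys and configuration names, and the way the "results" configuration would be loaded, can be sketched as follows. This is a minimal sketch under stated assumptions. The helper names `task_key_to_config_name` and `load_aggregated_results` are hypothetical, and the rule that `|`, `:` and `-` become `_` is inferred from the names shown on this page (for example the `harness_winogrande_5` configuration) rather than taken from documented behavior. The default repository ID is the one used in the loading example.

```python
import re

# Repository ID as used in the loading example on this page.
REPO_ID = "open-llm-leaderboard/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO"


def task_key_to_config_name(task_key: str) -> str:
    """Map a results key such as 'harness|hellaswag|10' to a config name.

    Assumption: the separators '|', ':' and '-' are all replaced by '_'.
    """
    return re.sub(r"[|:\-]", "_", task_key)


def load_aggregated_results(repo_id: str = REPO_ID, split: str = "train"):
    """Load the "results" configuration (aggregated metrics of all runs).

    The import is deferred so the mapping helper above works without the
    `datasets` package; calling this function needs network access.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, "results", split=split)
```

Calling `load_aggregated_results()` would fetch the aggregated metrics, while `task_key_to_config_name("harness|hellaswag|10")` gives the configuration name to pass to `load_dataset` for per-task details.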

Loading Example

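from datasets import load_dataset

# Load the details for one evaluation task. The configuration name
# "harness_winogrande_5" selects Winogrande evaluated with 5 shots.
# Note: the page title lists this repository under the
# "open-llm-leaderboard-old" namespace; the ID is kept as given here.
data = load_dataset(
    "open-llm-leaderboard/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO",
    "harness_winogrande_5",
    split="train",  # "train" always points to the latest run
)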
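To load a specific run instead of the latest one, the run timestamp has to be turned into a split name. The sketch below assumes that split names are formed by replacing `-` and `:` in the timestamp with `_`. That convention is not stated on this page, so the result is worth checking against the actual split names (for instance with `datasets.get_dataset_split_names`). The helper name `timestamp_to_split_name` is hypothetical.

```python
def timestamp_to_split_name(run_timestamp: str) -> str:
    """Convert a run timestamp into the assumed split name.

    Assumption: '-' and ':' are replaced by '_'. Non-breaking hyphens
    (U+2011), which can appear in copied text, are normalized first.
    """
    normalized = run_timestamp.replace("\u2011", "-")
    return normalized.replace("-", "_").replace(":", "_")


split_name = timestamp_to_split_name("2024-01-20T20:17:56.012496")
print(split_name)
```

The resulting string could then be passed as `split=split_name` in place of `split="train"`.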

Latest Results

The latest run on 2024-01-20T20:17:56.012496 produced the following scores:

{
    "all": {"acc": 0.2555, "acc_stderr": 0.0307, "acc_norm": 0.2564, "acc_norm_stderr": 0.0315, ...},
    "harness|arc:challenge|25": {"acc": 0.3464, "acc_stderr": 0.0139, "acc_norm": 0.3626, "acc_norm_stderr": 0.0140},
    "harness|hellaswag|10": {"acc": 0.4566, "acc_stderr": 0.0050, "acc_norm": 0.6120, "acc_norm_stderr": 0.0049},
    ...
}
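
Each score is reported together with a standard error, which allows a rough uncertainty range to be derived. The sketch below computes an approximate 95% interval as value ± 1.96 × stderr for the two tasks listed in full in the excerpt above. The normal approximation and the helper name `confidence_interval` are assumptions made for illustration, not part of the leaderboard's own reporting.

```python
# Scores copied from the excerpt above (only the two fully listed tasks).
scores = {
    "harness|arc:challenge|25": {"acc_norm": 0.3626, "acc_norm_stderr": 0.0140},
    "harness|hellaswag|10": {"acc_norm": 0.6120, "acc_norm_stderr": 0.0049},
}


def confidence_interval(value: float, stderr: float, z: float = 1.96):
    """Return (low, high) for a normal-approximation interval."""
    margin = z * stderr
    return (value - margin, value + margin)


intervals = {
    task: confidence_interval(m["acc_norm"], m["acc_norm_stderr"])
    for task, m in scores.items()
}

for task, (low, high) in intervals.items():
    print(f"{task}: acc_norm between {low:.4f} and {high:.4f}")
```
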