open-llm-leaderboard-old/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO

This dataset was automatically generated during the evaluation runs of the model Danielbrdz/Barcenas‑Tiny‑1.1b‑DPO. It comprises 63 configurations, each representing a distinct evaluation task. For each run, a split named after the run’s timestamp is created; the "train" split always points to the latest results. An additional "results" configuration stores aggregated metrics for all runs, which are used to compute and display aggregate scores on the Open LLM Leaderboard.

Updated 1/20/2024

Description

Dataset Overview

Dataset Introduction

The dataset was automatically created for evaluating the model Danielbrdz/Barcenas‑Tiny‑1.1b‑DPO on the Open LLM Leaderboard.

Structure

  • Number of Configurations: 63, each corresponding to a specific evaluation task.
  • Creation Source: The dataset currently holds results from a single run. Every run produces a split named after that run's timestamp, and the "train" split always points to the most recent results.
  • Additional Configuration: The "results" config stores aggregated results of all runs for computing and displaying metrics on the Open LLM Leaderboard.
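
Because each run's split is named after its timestamp, finding the most recent run comes down to comparing timestamps. The following is a minimal sketch, assuming run timestamps are ISO 8601 strings in the form shown under Latest Results. `latest_run` is a hypothetical helper, not part of the `datasets` library, and the exact spelling of split names on the Hub may differ.

```python
from datetime import datetime


def latest_run(timestamps):
    """Return the most recent ISO 8601 timestamp string from an iterable.

    Hypothetical helper for illustration; raises ValueError on empty input.
    """
    timestamps = list(timestamps)
    if not timestamps:
        raise ValueError("no run timestamps given")
    # Parse before comparing so ordering follows real time, not string order.
    return max(timestamps, key=datetime.fromisoformat)


runs = [
    "2024-01-20T20:17:56.012496",  # the run reported in this dataset
    "2024-01-19T08:00:00.000000",  # made-up earlier run, for illustration only
]
print(latest_run(runs))
```

In practice the "train" split already resolves to the latest run, so a helper like this would only matter when comparing several timestamped splits by hand.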

Loading Example

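from datasets import load_dataset

# Load the per-sample details for one evaluation task (here, 5-shot Winogrande).
data = load_dataset(
    "open-llm-leaderboard/details_Danielbrdz__Barcenas-Tiny-1.1b-DPO",
    "harness_winogrande_5",
    split="train",  # "train" always points to the latest run
)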

Latest Results

The latest run on 2024‑01‑20T20:17:56.012496 produced the following scores:

{
    "all": {"acc": 0.2555, "acc_stderr": 0.0307, "acc_norm": 0.2564, "acc_norm_stderr": 0.0315, ...},
    "harness|arc:challenge|25": {"acc": 0.3464, "acc_stderr": 0.0139, "acc_norm": 0.3626, "acc_norm_stderr": 0.0140},
    "harness|hellaswag|10": {"acc": 0.4566, "acc_stderr": 0.0050, "acc_norm": 0.6120, "acc_norm_stderr": 0.0049},
    ...
}
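
Each score is reported with a standard error, which gives a rough sense of how much the score could shift on a different sample of questions. The following is a minimal sketch, assuming a normal approximation (score ± 1.96 × stderr for an interval of roughly 95%). `confidence_interval` is a hypothetical helper, and the leaderboard itself does not report these intervals.

```python
def confidence_interval(score, stderr, z=1.96):
    """Return (low, high) for score +/- z * stderr (normal approximation)."""
    margin = z * stderr
    return score - margin, score + margin


# Values copied from the latest results reported for this dataset.
results = {
    "harness|arc:challenge|25": {"acc_norm": 0.3626, "acc_norm_stderr": 0.0140},
    "harness|hellaswag|10": {"acc_norm": 0.6120, "acc_norm_stderr": 0.0049},
}

for task, metrics in results.items():
    low, high = confidence_interval(metrics["acc_norm"], metrics["acc_norm_stderr"])
    print(f"{task}: acc_norm = {metrics['acc_norm']:.4f} "
          f"(approx. 95% CI {low:.4f} to {high:.4f})")
```

The normal approximation is a simplifying assumption; it is reasonable for tasks with many questions but less reliable for very small subsets.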

Topics

Model Evaluation
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
