open-llm-leaderboard-old/details_CalderaAI__13B-Legerdemain-L2
This dataset was automatically created during the evaluation of the model CalderaAI/13B-Legerdemain-L2 on the Open LLM Leaderboard. It consists of 64 configurations, each corresponding to an evaluation task. The dataset was generated from two runs, and each run is stored as a separate split within each configuration. The "train" split always points to the latest results. An additional "results" configuration stores the aggregated results of all runs, which the Open LLM Leaderboard uses to compute and display aggregated metrics. The README also gives an example of loading run details with the `load_dataset` function from the `datasets` library. The latest run results are provided in JSON format and report metrics such as EM, F1, and accuracy for various tasks.
Description
Dataset Overview
Dataset Introduction
The dataset is automatically generated during the evaluation of the model CalderaAI/13B-Legerdemain-L2 on the Open LLM Leaderboard.
Dataset Structure
- The dataset contains 64 configurations, each corresponding to an evaluation task.
- It is created from two runs; each run appears as a specific split within each configuration, with split names using timestamps.
- The "train" split always points to the latest results.
- An extra configuration named "results" stores aggregated results from all runs for computing and displaying aggregated metrics on the Open LLM Leaderboard.
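The split naming described above can be sketched in a few lines. This is a minimal sketch that assumes split names are derived from the run timestamp by replacing `-` and `:` with `_`, which is consistent with the two split names listed under Configuration Details. The helper names `run_timestamp_to_split` and `load_run` are hypothetical and not part of the `datasets` library.

```python
from datetime import datetime


def run_timestamp_to_split(timestamp: str) -> str:
    """Turn a run timestamp such as '2023-10-12T20:33:10.328879' into a split name."""
    # Validate the input first; raises ValueError if it is not an ISO timestamp.
    datetime.fromisoformat(timestamp)
    # Assumed convention: '-' and ':' become '_'; the fractional seconds are kept.
    return timestamp.replace("-", "_").replace(":", "_")


def load_run(repo_id: str, config: str, timestamp: str):
    """Load one specific run of one configuration (needs network access)."""
    # Imported lazily so the pure helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, config, split=run_timestamp_to_split(timestamp))


print(run_timestamp_to_split("2023-10-12T20:33:10.328879"))
```

Passing split="train" instead of a timestamped split name should return the same rows as the most recent run, given that "train" always points to the latest results.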
Data Loading Example
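from datasets import load_dataset

# Load the per-example details for one task configuration.
data = load_dataset(
    "open-llm-leaderboard/details_CalderaAI__13B-Legerdemain-L2",
    "harness_winogrande_5",
    split="train",  # "train" always points to the latest run
)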
Latest Results
The most recent run (2023-10-12T20:33:10.328879) yields:
{
    "all": {
        "em": 0.002726510067114094,
        "em_stderr": 0.0005340111700415904,
        "f1": 0.06216547818791966,
        "f1_stderr": 0.0013785278979549318,
        "acc": 0.4412861505062612,
        "acc_stderr": 0.010705008172209724
    },
    "harness|drop|3": {
        "em": 0.002726510067114094,
        "em_stderr": 0.0005340111700415904,
        "f1": 0.06216547818791966,
        "f1_stderr": 0.0013785278979549318
    },
    "harness|gsm8k|5": {
        "acc": 0.13040181956027294,
        "acc_stderr": 0.0092756303245541
    },
    "harness|winogrande|5": {
        "acc": 0.7521704814522494,
        "acc_stderr": 0.01213438601986535
    }
}
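In the numbers above, each entry under "all" matches the unweighted mean of that metric over the tasks that report it. The sketch below checks this arithmetic for this run only. It is an observation about these values, not a statement of how the leaderboard computes its aggregates, and the function name `aggregate` is hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Per-task metrics copied from the latest run shown above.
latest = {
    "harness|drop|3": {
        "em": 0.002726510067114094,
        "em_stderr": 0.0005340111700415904,
        "f1": 0.06216547818791966,
        "f1_stderr": 0.0013785278979549318,
    },
    "harness|gsm8k|5": {
        "acc": 0.13040181956027294,
        "acc_stderr": 0.0092756303245541,
    },
    "harness|winogrande|5": {
        "acc": 0.7521704814522494,
        "acc_stderr": 0.01213438601986535,
    },
}


def aggregate(per_task: dict) -> dict:
    """Average every metric over the tasks that report it (unweighted)."""
    buckets = defaultdict(list)
    for metrics in per_task.values():
        for name, value in metrics.items():
            buckets[name].append(value)
    return {name: fmean(values) for name, values in buckets.items()}


all_metrics = aggregate(latest)
for name, value in sorted(all_metrics.items()):
    print(f"{name}: {value}")
```

Only DROP reports EM and F1, so those aggregates equal the DROP values, while "acc" is the mean of the GSM8K and Winogrande accuracies.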
Configuration Details
Examples of configuration entries:
- harness_arc_challenge_25
  - Split: 2023_08_09T11_34_37.986977
    - Path: **/details_harness|arc:challenge|25_2023-08-09T11:34:37.986977.parquet
  - Split: latest
    - Path: same as above
- harness_drop_3
  - Split: 2023_10_12T20_33_10.328879
    - Path: **/details_harness|drop|3_2023-10-12T20-33-10.328879.parquet
  - Split: latest
    - Path: same as above
(Additional configurations follow the same pattern.)
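The relationship between file paths, configuration names, and split names in the entries above can be expressed as a small parser. This sketch assumes the pattern seen in the two listed examples holds for the remaining configurations: `|` and `:` in the task key become `_` in the configuration name, and `-` and `:` in the timestamp become `_` in the split name. The function names and the regular expression are hypothetical, inferred only from those two entries.

```python
import re

# Matches "details_<task key>_<timestamp>.parquet" at the end of a path.
# The time part may use ':' or '-' as separator, as both appear above.
_DETAILS_RE = re.compile(
    r"details_(?P<task>.+)_(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.-]+)\.parquet$"
)


def task_key_to_config(task_key: str) -> str:
    """'harness|arc:challenge|25' -> 'harness_arc_challenge_25' (assumed rule)."""
    return re.sub(r"[|:]", "_", task_key)


def parse_details_path(path: str) -> tuple[str, str]:
    """Return (configuration name, split name) for a details parquet path."""
    match = _DETAILS_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognised details path: {path!r}")
    config = task_key_to_config(match.group("task"))
    split = match.group("ts").replace("-", "_").replace(":", "_")
    return config, split


print(parse_details_path(
    "**/details_harness|arc:challenge|25_2023-08-09T11:34:37.986977.parquet"
))
```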
Source
Organization: hugging_face
Created: Unknown