open-llm-leaderboard-old/details_OpenBuddy__openbuddy-qwen1.5-14b-v21.1-32k
Dataset Overview
Dataset Name
Evaluation run of OpenBuddy/openbuddy-qwen1.5-14b-v21.1-32k
Dataset Summary
This dataset was automatically created during the evaluation of the model OpenBuddy/openbuddy-qwen1.5-14b-v21.1-32k on the Open LLM Leaderboard.
Dataset Composition
- The dataset contains 63 configurations, each corresponding to an evaluation task.
- It is generated from a single run; within each configuration, each run is stored as a split named after the run's timestamp.
- The "train" split always points to the latest results.
- An additional configuration "results" stores aggregated results of all runs, used for computing and displaying aggregated metrics on the Open LLM Leaderboard.
Data Loading Example
from datasets import load_dataset

data = load_dataset(
    "open-llm-leaderboard/details_OpenBuddy__openbuddy-qwen1.5-14b-v21.1-32k",
    "harness_winogrande_5",
    split="train",
)
Latest Results
The latest results are from the 2024-04-09T06:57:17.996714 run:
{
"all": {
"acc": 0.6783573743130548,
"acc_stderr": 0.031630411639720406,
"acc_norm": 0.6843006798291303,
"acc_norm_stderr": 0.032244439733683676,
"mc1": 0.39657282741738065,
"mc1_stderr": 0.017124930942023518,
"mc2": 0.5584410548633238,
"mc2_stderr": 0.014920454151130717
},
"harness|arc:challenge|25": {
"acc": 0.5358361774744027,
"acc_stderr": 0.01457381366473572,
"acc_norm": 0.5793515358361775,
"acc_norm_stderr": 0.014426211252508403
},
"harness|hellaswag|10": {
"acc": 0.5926110336586338,
"acc_stderr": 0.004903441680003823,
"acc_norm": 0.788388767177853,
"acc_norm_stderr": 0.004076158744346766
},
"harness|hendrycksTest-abstract_algebra|5": {
"acc": 0.38,
"acc_stderr": 0.048783173121456316,
"acc_norm": 0.38,
"acc_norm_stderr": 0.048783173121456316
},
"harness|hendrycksTest-anatomy|5": {
"acc": 0.6222222222222222,
"acc_stderr": 0.04188307537595852,
"acc_norm": 0.6222222222222222,
"acc_norm_stderr": 0.04188307537595852
},
"harness|hendrycksTest-astronomy|5": {
"acc": 0.7763157894736842,
"acc_stderr": 0.033911609343436025,
"acc_norm": 0.7763157894736842,
"acc_norm_stderr": 0.033911609343436025
},
"harness|hendrycksTest-business_ethics|5": {
"acc": 0.75,
"acc_stderr": 0.04351941398892446,
"acc_norm": 0.75,
"acc_norm_stderr": 0.04351941398892446
},
"harness|hendrycksTest-clinical_knowledge|5": {
"acc": 0.7245283018867924,
"acc_stderr": 0.027495663683724057,
"acc_norm": 0.7245283018867924,
"acc_norm_stderr": 0.027495663683724057
},
"harness|hendrycksTest-college_biology|5": {
"acc": 0.7222222222222222,
"acc_stderr": 0.03745554791462457,
"acc_norm": 0.7222222222222222,
"acc_norm_stderr": 0.03745554791462457
},
"harness|hendrycksTest-college_chemistry|5": {
"acc": 0.55,
"acc_stderr": 0.05,
"acc_norm": 0.55,
"acc_norm_stderr": 0.05
},
"harness|hendrycksTest-college_computer_science|5": {
"acc": 0.6,
"acc_stderr": 0.04923659639173309,
"acc_norm": 0.6,
"acc_norm_stderr": 0.04923659639173309
},
"harness|hendrycksTest-college_mathematics|5": {
"acc": 0.48,
"acc_stderr": 0.05021167315686779,
"acc_norm": 0.48,
"acc_norm_stderr": 0.05021167315686779
},
"harness|hendrycksTest-college_medicine|5": {
"acc": 0.6994219653179191,
"acc_stderr": 0.0349610148119118,
"acc_norm": 0.6994219653179191,
"acc_norm_stderr": 0.0349610148119118
},
"harness|hendrycksTest-college_physics|5": {
"acc": 0.4215686274509804,
"acc_stderr": 0.049135952012744975,
"acc_norm": 0.4215686274509804,
"acc_norm_stderr": 0.049135952012744975
},
"harness|hendrycksTest-computer_security|5": {
"acc": 0.81,
"acc_stderr": 0.039427724440366234,
"acc_norm": 0.81,
"acc_norm_stderr": 0.039427724440366234
},
"harness|hendrycksTest-conceptual_physics|5": {
"acc": 0.6723404255319149,
"acc_stderr": 0.030683020843231004,
"acc_norm": 0.6723404255319149,
"acc_norm_stderr": 0.030683020843231004
},
"harness|hendrycksTest-econometrics|5": {
"acc": 0.5614035087719298,
"acc_stderr": 0.04668000738510455,
"acc_norm": 0.5614035087719298,
"acc_norm_stderr": 0.04668000738510455
},
"harness|hendrycksTest-electrical_engineering|5": {
"acc": 0.7103448275862069,
"acc_stderr": 0.03780019230438014,
"acc_norm": 0.7103448275862069,
"acc_norm_stderr": 0.03780019230438014
},
"harness|hendrycksTest-elementary_mathematics|5": {
"acc": 0.5555555555555556,
"acc_stderr": 0.02559185776138218,
"acc_norm": 0.5555555555555556,
"acc_norm_stderr": 0.02559185776138218
},
"harness|hendrycksTest-formal_logic|5": {
"acc": 0.5317460317460317,
"acc_stderr": 0.04463112720677172,
"acc_norm": 0.5317460317460317,
"acc_norm_stderr": 0.04463112720677172
},
"harness|hendrycksTest-global_facts|5": {
"acc": 0.44,
"acc_stderr": 0.04988876515698589,
"acc_norm": 0.44,
"acc_norm_stderr": 0.04988876515698589
},
"harness|hendrycksTest-high_school_biology|5": {
"acc": 0.8161290322580645,
"acc_stderr": 0.02203721734026782,
"acc_norm": 0.8161290322580645,
"acc_norm_stderr": 0.02203721734026782
},
"harness|hendrycksTest-high_school_chemistry|5": {
"acc": 0.5960591133004927,
"acc_stderr": 0.03452453903822032,
"acc_norm": 0.5960591133004927,
"acc_norm_stderr": 0.03452453903822032
},
"harness|hendrycksTest-high_school_computer_science|5": {
"acc": 0.75,
"acc_stderr": 0.04351941398892446,
"acc_norm": 0.75,
"acc_norm_stderr": 0.04351941398892446
},
"harness|hendrycksTest-high_school_european_history|5": {
"acc": 0.8363636363636363,
"acc_stderr": 0.02888787239548795,
"acc_norm": 0.8363636363636363,
"acc_norm_stderr": 0.02888787239548795
},
"harness|hendrycksTest-high_school_geography|5": {
"acc": 0.8737373737373737,
"acc_stderr": 0.023664359402880215,
"acc_norm": 0.8737373737373737,
"acc_norm_stderr": 0.023664359402880215
},
"harness|hendrycksTest-high_school_government_and_politics|5": {
"acc": 0.8911917098445595,
"acc_stderr": 0.02247325333276875,
"acc_norm": 0.8911917098445595,
"acc_norm_stderr": 0.02247325333276875
},
"harness|hendrycksTest-high_school_macroeconomics|5": {
"acc": 0.6974358974358974,
"acc_stderr": 0.023290888053772732,
"acc_norm": 0.6974358974358974,
"acc_norm_stderr": 0.023290888053772732
},
    "harness|hendrycksTest-high_school_mathematics|5": {
        "acc": 0.4074074074074074,
        ...
    },
    ...
}