
lukaemon/bbh

The BIG-Bench Hard (BBH) dataset comprises 27 sub-tasks. Each one is exposed as a separate configuration named after the task, such as boolean_expressions, causal_judgement, and date_understanding. Every sub-task has two string features, input and target, and a single test split. Most test splits contain 250 examples; the exceptions are causal_judgement (187), penguins_in_a_table (146), and snarks (178). The dataset is primarily used to evaluate and challenge natural language processing models on complex reasoning tasks.

Updated 2/2/2023
hugging_face
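The following minimal Python sketch shows how the configurations and input/target fields fit together. It loads one configuration's test split and scores a predictor by exact match. It assumes that the Hugging Face `datasets` library is installed and that the dataset is still published under the ID `lukaemon/bbh`. The helper names `load_bbh_test` and `exact_match_accuracy` are hypothetical. Exact-match scoring after whitespace stripping is an assumed metric that this page does not specify.

```python
from typing import Callable, Iterable, Mapping


def exact_match_accuracy(examples: Iterable[Mapping[str, str]],
                         predict: Callable[[str], str]) -> float:
    """Fraction of examples where predict(input) equals target.

    Assumption: predictions and targets are compared after stripping
    surrounding whitespace; no other normalization is applied.
    """
    total = 0
    correct = 0
    for ex in examples:
        total += 1
        if predict(ex["input"]).strip() == ex["target"].strip():
            correct += 1
    # Guard against an empty split instead of dividing by zero.
    return correct / total if total else 0.0


def load_bbh_test(config: str):
    """Load the test split of one BBH configuration (needs network access).

    Assumption: the dataset is hosted on the Hugging Face Hub as
    "lukaemon/bbh", with one configuration per sub-task name.
    The import is deferred so exact_match_accuracy works without
    the `datasets` package installed.
    """
    from datasets import load_dataset  # Hugging Face `datasets` library
    return load_dataset("lukaemon/bbh", config, split="test")
```

For example, `exact_match_accuracy(load_bbh_test("boolean_expressions"), predict)` would score a hypothetical `predict` function on the 250 boolean_expressions test examples. Iterating over the loaded split yields one dictionary per example with `input` and `target` keys.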

Description

BIG-Bench Hard Dataset Overview

Dataset List

1. boolean_expressions

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 11790
    • number of examples: 250

2. causal_judgement

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 198021
    • number of examples: 187

3. date_understanding

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 54666
    • number of examples: 250

4. disambiguation_qa

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 78620
    • number of examples: 250

5. dyck_languages

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 38432
    • number of examples: 250

6. formal_fallacies

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 138224
    • number of examples: 250

7. geometric_shapes

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 68560
    • number of examples: 250

8. hyperbaton

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 38574
    • number of examples: 250

9. logical_deduction_five_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 148595
    • number of examples: 250

10. logical_deduction_seven_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 191022
    • number of examples: 250

11. logical_deduction_three_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 105831
    • number of examples: 250

12. movie_recommendation

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 50985
    • number of examples: 250

13. multistep_arithmetic_two

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 12943
    • number of examples: 250

14. navigate

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 49031
    • number of examples: 250

15. object_counting

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 30508
    • number of examples: 250

16. penguins_in_a_table

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 70062
    • number of examples: 146

17. reasoning_about_colored_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 89579
    • number of examples: 250

18. ruin_names

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 46537
    • number of examples: 250

19. salient_translation_error_detection

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 277110
    • number of examples: 250

20. snarks

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 38223
    • number of examples: 178

21. sports_understanding

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 22723
    • number of examples: 250

22. temporal_sequences

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 139546
    • number of examples: 250

23. tracking_shuffled_objects_five_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 162590
    • number of examples: 250

24. tracking_shuffled_objects_seven_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 207274
    • number of examples: 250

25. tracking_shuffled_objects_three_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 122104
    • number of examples: 250

26. web_of_lies

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 47582
    • number of examples: 250

27. word_sorting

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 60918
    • number of examples: 250
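A short script can tally the per-configuration figures above. This sketch only transcribes the byte sizes and example counts listed on this page into a Python dictionary. It does not query Hugging Face. The names `BBH_TEST_SPLITS`, `summarize`, and `non_default_sizes` are hypothetical.

```python
from typing import Dict, Tuple

# Test-split sizes transcribed from the listing on this page:
# configuration name -> (size in bytes, number of examples).
BBH_TEST_SPLITS: Dict[str, Tuple[int, int]] = {
    "boolean_expressions": (11790, 250),
    "causal_judgement": (198021, 187),
    "date_understanding": (54666, 250),
    "disambiguation_qa": (78620, 250),
    "dyck_languages": (38432, 250),
    "formal_fallacies": (138224, 250),
    "geometric_shapes": (68560, 250),
    "hyperbaton": (38574, 250),
    "logical_deduction_five_objects": (148595, 250),
    "logical_deduction_seven_objects": (191022, 250),
    "logical_deduction_three_objects": (105831, 250),
    "movie_recommendation": (50985, 250),
    "multistep_arithmetic_two": (12943, 250),
    "navigate": (49031, 250),
    "object_counting": (30508, 250),
    "penguins_in_a_table": (70062, 146),
    "reasoning_about_colored_objects": (89579, 250),
    "ruin_names": (46537, 250),
    "salient_translation_error_detection": (277110, 250),
    "snarks": (38223, 178),
    "sports_understanding": (22723, 250),
    "temporal_sequences": (139546, 250),
    "tracking_shuffled_objects_five_objects": (162590, 250),
    "tracking_shuffled_objects_seven_objects": (207274, 250),
    "tracking_shuffled_objects_three_objects": (122104, 250),
    "web_of_lies": (47582, 250),
    "word_sorting": (60918, 250),
}


def summarize(splits: Dict[str, Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (number of configurations, total bytes, total examples)."""
    total_bytes = sum(size for size, _ in splits.values())
    total_examples = sum(count for _, count in splits.values())
    return len(splits), total_bytes, total_examples


def non_default_sizes(splits: Dict[str, Tuple[int, int]],
                      default: int = 250) -> Dict[str, int]:
    """Configurations whose test split does not have `default` examples."""
    return {name: count for name, (_, count) in splits.items() if count != default}


if __name__ == "__main__":
    print(summarize(BBH_TEST_SPLITS))
    print(non_default_sizes(BBH_TEST_SPLITS))
```

Summing the listed counts gives 6,511 test examples across the 27 configurations: 24 × 250, plus 187, 146, and 178.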


Topics

Machine Learning Benchmarking
Data Science

Source

Organization: hugging_face

Created: Unknown
