
lukaemon/bbh

The BIG-Bench Hard (BBH) dataset comprises 27 sub-tasks, each exposed as a configuration name such as boolean_expressions, causal_judgement, or date_understanding. Every sub-task has two string features, input and target, and a single test split. Most test splits contain 250 examples; the exceptions are causal_judgement (187), penguins_in_a_table (146), and snarks (178). The dataset is primarily used to benchmark natural language processing models on complex reasoning tasks.
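A minimal sketch of how one configuration might be loaded and scored, assuming the Hugging Face `datasets` library can load the `lukaemon/bbh` repository with your installed version, and assuming exact match on the `target` string (after trimming whitespace) is the metric you want. `load_test_split`, `normalize`, and `exact_match_accuracy` are hypothetical helper names, and `predict` stands in for whatever model call you use.

```python
from typing import Callable, Iterable, Mapping


def load_test_split(config_name: str):
    """Load one BBH configuration's test split from the Hugging Face Hub.

    Requires the `datasets` package and network access. The import is done
    lazily so the scoring helpers below can be used without it.
    """
    from datasets import load_dataset

    return load_dataset("lukaemon/bbh", config_name, split="test")


def normalize(text: str) -> str:
    # Assumption: only surrounding whitespace is stripped; targets such as
    # "True", "(A)" or "8" are otherwise compared verbatim.
    return text.strip()


def exact_match_accuracy(
    records: Iterable[Mapping[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Return the fraction of records whose prediction equals `target`."""
    total = 0
    correct = 0
    for record in records:
        total += 1
        if normalize(predict(record["input"])) == normalize(record["target"]):
            correct += 1
    # Avoid division by zero on an empty split.
    return correct / total if total else 0.0
```

For example, `exact_match_accuracy(load_test_split("boolean_expressions"), predict)` would return a value between 0 and 1. Looser normalization (case folding, or extracting an option letter such as `(A)` from free-form model output) can change the score noticeably, so it is worth stating alongside any reported number.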

Source: hugging_face
Created: Nov 28, 2025
Updated: Feb 2, 2023

BIG-Bench Hard Dataset Overview

Dataset List

1. boolean_expressions

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 11790
    • number of examples: 250

2. causal_judgement

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 198021
    • number of examples: 187

3. date_understanding

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 54666
    • number of examples: 250

4. disambiguation_qa

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 78620
    • number of examples: 250

5. dyck_languages

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 38432
    • number of examples: 250

6. formal_fallacies

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 138224
    • number of examples: 250

7. geometric_shapes

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 68560
    • number of examples: 250

8. hyperbaton

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 38574
    • number of examples: 250

9. logical_deduction_five_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 148595
    • number of examples: 250

10. logical_deduction_seven_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 191022
    • number of examples: 250

11. logical_deduction_three_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 105831
    • number of examples: 250

12. movie_recommendation

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 50985
    • number of examples: 250

13. multistep_arithmetic_two

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 12943
    • number of examples: 250

14. navigate

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 49031
    • number of examples: 250

15. object_counting

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 30508
    • number of examples: 250

16. penguins_in_a_table

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 70062
    • number of examples: 146

17. reasoning_about_colored_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 89579
    • number of examples: 250

18. ruin_names

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 46537
    • number of examples: 250

19. salient_translation_error_detection

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 277110
    • number of examples: 250

20. snarks

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 38223
    • number of examples: 178

21. sports_understanding

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 22723
    • number of examples: 250

22. temporal_sequences

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 139546
    • number of examples: 250

23. tracking_shuffled_objects_five_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 162590
    • number of examples: 250

24. tracking_shuffled_objects_seven_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 207274
    • number of examples: 250

25. tracking_shuffled_objects_three_objects

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 122104
    • number of examples: 250

26. web_of_lies

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 47582
    • number of examples: 250

27. word_sorting

  • Features:
    • input: string
    • target: string
  • Test Set:
    • bytes: 60918
    • number of examples: 250
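The per-configuration counts listed above can be collected into a plain mapping to check the overall test-set size and to spot the configurations that fall short of 250 examples. This is a hypothetical helper sketch: the numbers are copied from this page rather than read from the Hub, and `TEST_EXAMPLES`, `total_examples`, and `below_standard` are illustrative names.

```python
from typing import Dict, List

# Test-set example counts per configuration, copied from the list on this page.
TEST_EXAMPLES: Dict[str, int] = {
    "boolean_expressions": 250,
    "causal_judgement": 187,
    "date_understanding": 250,
    "disambiguation_qa": 250,
    "dyck_languages": 250,
    "formal_fallacies": 250,
    "geometric_shapes": 250,
    "hyperbaton": 250,
    "logical_deduction_five_objects": 250,
    "logical_deduction_seven_objects": 250,
    "logical_deduction_three_objects": 250,
    "movie_recommendation": 250,
    "multistep_arithmetic_two": 250,
    "navigate": 250,
    "object_counting": 250,
    "penguins_in_a_table": 146,
    "reasoning_about_colored_objects": 250,
    "ruin_names": 250,
    "salient_translation_error_detection": 250,
    "snarks": 178,
    "sports_understanding": 250,
    "temporal_sequences": 250,
    "tracking_shuffled_objects_five_objects": 250,
    "tracking_shuffled_objects_seven_objects": 250,
    "tracking_shuffled_objects_three_objects": 250,
    "web_of_lies": 250,
    "word_sorting": 250,
}


def total_examples(counts: Dict[str, int]) -> int:
    """Sum the test examples across all configurations."""
    return sum(counts.values())


def below_standard(counts: Dict[str, int], expected: int = 250) -> List[str]:
    """Return, sorted by name, configurations with fewer than `expected` examples."""
    return sorted(name for name, n in counts.items() if n < expected)


if __name__ == "__main__":
    print(len(TEST_EXAMPLES), "configurations,",
          total_examples(TEST_EXAMPLES), "test examples")
    for name in below_standard(TEST_EXAMPLES):
        print(f"{name}: {TEST_EXAMPLES[name]}")
```

Run as a script, it reports 27 configurations and 6,511 test examples in total, with causal_judgement, penguins_in_a_table, and snarks listed as the three smaller splits.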