
lukaemon/bbh

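The BIG-Bench Hard (BBH) dataset is a subset of BIG-Bench tasks selected because they are especially challenging for language models. This repository splits it into 27 sub-tasks. Each is exposed as a configuration named after the task, for example boolean_expressions, causal_judgement, and date_understanding. The logical_deduction and tracking_shuffled_objects tasks each appear in three-, five-, and seven-object variants. Every configuration has the same two string features, input and target, and a single test split of 250 examples. The exceptions are causal_judgement (187), penguins_in_a_table (146), and snarks (178). The dataset is used to evaluate how well natural language processing models, particularly large language models, handle these reasoning tasks.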
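
Because every configuration shares the same input/target schema and has only a test split, loading and scoring one sub-task looks the same for all 27. The following is a minimal sketch, not an official evaluation harness. It assumes the Hugging Face datasets library is installed, can load this repository, and has network access. It also assumes exact-match accuracy as the metric, with whitespace stripping as the only normalization. The helper names load_bbh_test_split and exact_match_accuracy are hypothetical.

```python
from typing import Iterable


def load_bbh_test_split(config_name: str):
    """Load the test split of one lukaemon/bbh configuration (hypothetical helper).

    Assumes the Hugging Face `datasets` library is installed, network access
    is available, and the installed version can load this repository.
    """
    # Imported inside the function so the scoring helper below works without it.
    from datasets import load_dataset

    return load_dataset("lukaemon/bbh", config_name, split="test")


def exact_match_accuracy(predictions: Iterable[str], targets: Iterable[str]) -> float:
    """Fraction of predictions equal to their target after stripping whitespace.

    Whitespace stripping is the only normalization applied here (an assumption);
    stricter or looser answer extraction is a choice left to the evaluator.
    """
    preds = list(predictions)
    golds = list(targets)
    if len(preds) != len(golds):
        raise ValueError(f"got {len(preds)} predictions for {len(golds)} targets")
    if not golds:
        return 0.0
    # Count positions where the model output matches the reference string.
    hits = sum(p.strip() == t.strip() for p, t in zip(preds, golds))
    return hits / len(golds)
```

For example, load_bbh_test_split("boolean_expressions") should return a dataset with input and target columns of strings. Passing a model's outputs for the input column, together with the target column, to exact_match_accuracy gives a per-configuration score. The lazy import keeps the scoring function usable on predictions saved to disk, without downloading anything.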

Source

Hugging Face

Created

Nov 28, 2025

Updated

Feb 2, 2023

BIG-Bench Hard Dataset Overview

Configurations (sub-tasks)

1. boolean_expressions

Features:
- input: string
- target: string
Test Set:
- bytes: 11790
- number of examples: 250

2. causal_judgement

Features:
- input: string
- target: string
Test Set:
- bytes: 198021
- number of examples: 187

3. date_understanding

Features:
- input: string
- target: string
Test Set:
- bytes: 54666
- number of examples: 250

4. disambiguation_qa

Features:
- input: string
- target: string
Test Set:
- bytes: 78620
- number of examples: 250

5. dyck_languages

Features:
- input: string
- target: string
Test Set:
- bytes: 38432
- number of examples: 250

6. formal_fallacies

Features:
- input: string
- target: string
Test Set:
- bytes: 138224
- number of examples: 250

7. geometric_shapes

Features:
- input: string
- target: string
Test Set:
- bytes: 68560
- number of examples: 250

8. hyperbaton

Features:
- input: string
- target: string
Test Set:
- bytes: 38574
- number of examples: 250

9. logical_deduction_five_objects

Features:
- input: string
- target: string
Test Set:
- bytes: 148595
- number of examples: 250

10. logical_deduction_seven_objects

Features:
- input: string
- target: string
Test Set:
- bytes: 191022
- number of examples: 250

11. logical_deduction_three_objects

Features:
- input: string
- target: string
Test Set:
- bytes: 105831
- number of examples: 250

12. movie_recommendation

Features:
- input: string
- target: string
Test Set:
- bytes: 50985
- number of examples: 250

13. multistep_arithmetic_two

Features:
- input: string
- target: string
Test Set:
- bytes: 12943
- number of examples: 250

14. navigate

Features:
- input: string
- target: string
Test Set:
- bytes: 49031
- number of examples: 250

15. object_counting

Features:
- input: string
- target: string
Test Set:
- bytes: 30508
- number of examples: 250

16. penguins_in_a_table

Features:
- input: string
- target: string
Test Set:
- bytes: 70062
- number of examples: 146

17. reasoning_about_colored_objects

Features:
- input: string
- target: string
Test Set:
- bytes: 89579
- number of examples: 250

18. ruin_names

Features:
- input: string
- target: string
Test Set:
- bytes: 46537
- number of examples: 250

19. salient_translation_error_detection

Features:
- input: string
- target: string
Test Set:
- bytes: 277110
- number of examples: 250

20. snarks

Features:
- input: string
- target: string
Test Set:
- bytes: 38223
- number of examples: 178

21. sports_understanding

Features:
- input: string
- target: string
Test Set:
- bytes: 22723
- number of examples: 250

22. temporal_sequences

Features:
- input: string
- target: string
Test Set:
- bytes: 139546
- number of examples: 250

23. tracking_shuffled_objects_five_objects

Features:
- input: string
- target: string
Test Set:
- bytes: 162590
- number of examples: 250

24. tracking_shuffled_objects_seven_objects

Features:
- input: string
- target: string
Test Set:
- bytes: 207274
- number of examples: 250

25. tracking_shuffled_objects_three_objects

Features:
- input: string
- target: string
Test Set:
- bytes: 122104
- number of examples: 250

26. web_of_lies

Features:
- input: string
- target: string
Test Set:
- bytes: 47582
- number of examples: 250

27. word_sorting

Features:
- input: string
- target: string
Test Set:
- bytes: 60918
- number of examples: 250
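
The per-configuration figures above can be aggregated in a few lines. The following minimal sketch hard-codes the numbers transcribed from this list, so it needs no download. The dictionary TEST_SPLITS and the helper function names are hypothetical. Treating bytes per example as a proxy for prompt length is only a rough assumption, since the byte count covers both input and target.

```python
# Test-split statistics transcribed from the configuration list above:
# config name -> (test split size in bytes, number of test examples).
TEST_SPLITS: dict[str, tuple[int, int]] = {
    "boolean_expressions": (11790, 250),
    "causal_judgement": (198021, 187),
    "date_understanding": (54666, 250),
    "disambiguation_qa": (78620, 250),
    "dyck_languages": (38432, 250),
    "formal_fallacies": (138224, 250),
    "geometric_shapes": (68560, 250),
    "hyperbaton": (38574, 250),
    "logical_deduction_five_objects": (148595, 250),
    "logical_deduction_seven_objects": (191022, 250),
    "logical_deduction_three_objects": (105831, 250),
    "movie_recommendation": (50985, 250),
    "multistep_arithmetic_two": (12943, 250),
    "navigate": (49031, 250),
    "object_counting": (30508, 250),
    "penguins_in_a_table": (70062, 146),
    "reasoning_about_colored_objects": (89579, 250),
    "ruin_names": (46537, 250),
    "salient_translation_error_detection": (277110, 250),
    "snarks": (38223, 178),
    "sports_understanding": (22723, 250),
    "temporal_sequences": (139546, 250),
    "tracking_shuffled_objects_five_objects": (162590, 250),
    "tracking_shuffled_objects_seven_objects": (207274, 250),
    "tracking_shuffled_objects_three_objects": (122104, 250),
    "web_of_lies": (47582, 250),
    "word_sorting": (60918, 250),
}


def total_examples(splits: dict[str, tuple[int, int]]) -> int:
    """Sum of test examples over all configurations."""
    return sum(n for _, n in splits.values())


def total_bytes(splits: dict[str, tuple[int, int]]) -> int:
    """Sum of test-split sizes in bytes over all configurations."""
    return sum(b for b, _ in splits.values())


def configs_with_other_size(splits: dict[str, tuple[int, int]], expected: int = 250) -> list[str]:
    """Alphabetically sorted configurations whose test split is not `expected` examples."""
    return sorted(name for name, (_, n) in splits.items() if n != expected)


def bytes_per_example(splits: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average stored bytes per test example (input plus target)."""
    return {name: b / n for name, (b, n) in splits.items()}
```

With these figures, the 27 test splits total 6,511 examples and 2,500,050 bytes. Only causal_judgement, penguins_in_a_table, and snarks fall short of 250 examples. On a bytes-per-example basis, salient_translation_error_detection has the largest examples and boolean_expressions the smallest.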
