Tags: Dataset asset, Open Source Community, Machine Learning Benchmarking, Data Science
lukaemon/bbh
The BIG-Bench Hard (BBH) dataset comprises 27 sub-tasks, each exposed as a separate configuration, for example boolean_expressions, causal_judgement, and date_understanding. Every configuration has two string features, input and target, and a single test split of 250 examples, except causal_judgement (187), penguins_in_a_table (146), and snarks (178). The tasks are drawn from BIG-Bench and were selected because earlier language models did not outperform the average human rater on them, so the dataset is mainly used to evaluate language models on difficult tasks, many of which require multi-step reasoning.
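A minimal sketch of loading one configuration and scoring a model against it. It assumes the Hugging Face datasets library is installed and that the lukaemon/bbh repository still loads with your installed version; load_dataset(path, name, split=...) is the library's own entry point, while load_bbh_test and exact_match_accuracy are hypothetical helper names. Exact string match after trimming whitespace is an assumed scoring convention, not something this page specifies.

```python
# Sketch: load one BBH configuration and score predictions by exact match.
# Assumptions: `datasets` is installed, "lukaemon/bbh" resolves with your
# installed version, and answers are compared as whitespace-trimmed strings.

# Configuration names as listed on this page (27 in total).
BBH_CONFIGS = (
    "boolean_expressions", "causal_judgement", "date_understanding",
    "disambiguation_qa", "dyck_languages", "formal_fallacies",
    "geometric_shapes", "hyperbaton", "logical_deduction_five_objects",
    "logical_deduction_seven_objects", "logical_deduction_three_objects",
    "movie_recommendation", "multistep_arithmetic_two", "navigate",
    "object_counting", "penguins_in_a_table", "reasoning_about_colored_objects",
    "ruin_names", "salient_translation_error_detection", "snarks",
    "sports_understanding", "temporal_sequences",
    "tracking_shuffled_objects_five_objects",
    "tracking_shuffled_objects_seven_objects",
    "tracking_shuffled_objects_three_objects", "web_of_lies", "word_sorting",
)


def load_bbh_test(config: str):
    """Return the test split of one configuration (hypothetical helper; needs network)."""
    if config not in BBH_CONFIGS:
        raise ValueError(f"unknown BBH configuration: {config!r}")
    # Imported lazily so the scoring helper below works without `datasets`.
    from datasets import load_dataset
    return load_dataset("lukaemon/bbh", config, split="test")


def exact_match_accuracy(predictions, targets) -> float:
    """Fraction of predictions equal to their target after stripping whitespace."""
    predictions, targets = list(predictions), list(targets)
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return hits / len(targets)


# Example usage (needs network access; `my_model` is a hypothetical callable
# that maps an input string to an answer string):
# ds = load_bbh_test("date_understanding")
# preds = [my_model(row["input"]) for row in ds]
# print(exact_match_accuracy(preds, ds["target"]))
```

Validating the configuration name before importing datasets keeps typos from turning into slower, less obvious download errors.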
Source: hugging_face
Created: Nov 28, 2025
Updated: Feb 2, 2023
BIG-Bench Hard Dataset Overview
Configuration List
1. boolean_expressions
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 11,790 bytes
     - number of examples: 250
2. causal_judgement
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 198,021 bytes
     - number of examples: 187
3. date_understanding
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 54,666 bytes
     - number of examples: 250
4. disambiguation_qa
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 78,620 bytes
     - number of examples: 250
5. dyck_languages
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 38,432 bytes
     - number of examples: 250
6. formal_fallacies
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 138,224 bytes
     - number of examples: 250
7. geometric_shapes
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 68,560 bytes
     - number of examples: 250
8. hyperbaton
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 38,574 bytes
     - number of examples: 250
9. logical_deduction_five_objects
   - Features:
     - input: string
     - target: string
   - Test set:
     - size: 148,595 bytes
     - number of examples: 250
10. logical_deduction_seven_objects
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 191,022 bytes
      - number of examples: 250
11. logical_deduction_three_objects
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 105,831 bytes
      - number of examples: 250
12. movie_recommendation
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 50,985 bytes
      - number of examples: 250
13. multistep_arithmetic_two
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 12,943 bytes
      - number of examples: 250
14. navigate
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 49,031 bytes
      - number of examples: 250
15. object_counting
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 30,508 bytes
      - number of examples: 250
16. penguins_in_a_table
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 70,062 bytes
      - number of examples: 146
17. reasoning_about_colored_objects
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 89,579 bytes
      - number of examples: 250
18. ruin_names
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 46,537 bytes
      - number of examples: 250
19. salient_translation_error_detection
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 277,110 bytes
      - number of examples: 250
20. snarks
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 38,223 bytes
      - number of examples: 178
21. sports_understanding
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 22,723 bytes
      - number of examples: 250
22. temporal_sequences
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 139,546 bytes
      - number of examples: 250
23. tracking_shuffled_objects_five_objects
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 162,590 bytes
      - number of examples: 250
24. tracking_shuffled_objects_seven_objects
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 207,274 bytes
      - number of examples: 250
25. tracking_shuffled_objects_three_objects
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 122,104 bytes
      - number of examples: 250
26. web_of_lies
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 47,582 bytes
      - number of examples: 250
27. word_sorting
    - Features:
      - input: string
      - target: string
    - Test set:
      - size: 60,918 bytes
      - number of examples: 250
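To check the aggregate figures, the per-configuration test-split sizes from the list above can be transcribed into a small table and totaled. This is an illustrative sketch: TEST_SPLITS and summarize are hypothetical names, and the byte counts are the ones reported on this page, which may differ from what a fresh download occupies on disk.

```python
# Test-split sizes transcribed from the configuration list on this page.
# Each value is (number_of_examples, size_in_bytes).
TEST_SPLITS = {
    "boolean_expressions": (250, 11790),
    "causal_judgement": (187, 198021),
    "date_understanding": (250, 54666),
    "disambiguation_qa": (250, 78620),
    "dyck_languages": (250, 38432),
    "formal_fallacies": (250, 138224),
    "geometric_shapes": (250, 68560),
    "hyperbaton": (250, 38574),
    "logical_deduction_five_objects": (250, 148595),
    "logical_deduction_seven_objects": (250, 191022),
    "logical_deduction_three_objects": (250, 105831),
    "movie_recommendation": (250, 50985),
    "multistep_arithmetic_two": (250, 12943),
    "navigate": (250, 49031),
    "object_counting": (250, 30508),
    "penguins_in_a_table": (146, 70062),
    "reasoning_about_colored_objects": (250, 89579),
    "ruin_names": (250, 46537),
    "salient_translation_error_detection": (250, 277110),
    "snarks": (178, 38223),
    "sports_understanding": (250, 22723),
    "temporal_sequences": (250, 139546),
    "tracking_shuffled_objects_five_objects": (250, 162590),
    "tracking_shuffled_objects_seven_objects": (250, 207274),
    "tracking_shuffled_objects_three_objects": (250, 122104),
    "web_of_lies": (250, 47582),
    "word_sorting": (250, 60918),
}


def summarize(splits):
    """Return (total examples, total bytes, sorted configs with < 250 examples)."""
    total_examples = sum(n for n, _ in splits.values())
    total_bytes = sum(b for _, b in splits.values())
    smaller = sorted(name for name, (n, _) in splits.items() if n < 250)
    return total_examples, total_bytes, smaller


total_examples, total_bytes, smaller = summarize(TEST_SPLITS)
print(f"{len(TEST_SPLITS)} configurations, {total_examples} test examples, {total_bytes} bytes")
print("fewer than 250 examples:", ", ".join(smaller))
```

With the figures listed above, this prints 27 configurations, 6511 test examples, and 2500050 bytes, and names causal_judgement, penguins_in_a_table, and snarks as the configurations with fewer than 250 examples.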