Tags: Open Source Community, Machine Learning, Chain-of-Thought

gsm8k_synthetic_cot

The dataset has three features: question, cot (the chain of thought), and answer. It is divided into train, valid, and test splits containing 385,620, 500, and 1,319 samples, respectively. The download size is 50,052,843 bytes and the total dataset size is 91,978,048 bytes.

Source
Hugging Face
Created
Dec 18, 2024
Updated
Dec 22, 2024

Dataset Overview

Language

  • English (en)

License

  • MIT

Dataset Information

Features

  • question: type is string
  • cot: type is sequence of strings
  • answer: type is string
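As a rough illustration of the schema, the sketch below models one row as a Python dataclass and joins it into a single training string. It assumes each element of cot is one reasoning step; the card only says the field is a sequence of strings. The class name CotExample, the helpers from_row and to_training_text, and the "Question:/Answer:" template are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CotExample:
    """One row of the dataset, mirroring the three features on the card."""

    question: str   # feature `question` (string)
    cot: List[str]  # feature `cot` (sequence of strings); assumed one step per string
    answer: str     # feature `answer` (string)

    @classmethod
    def from_row(cls, row: Dict[str, Any]) -> "CotExample":
        """Build an example from a dict keyed by feature name."""
        return cls(question=row["question"], cot=list(row["cot"]), answer=row["answer"])


def to_training_text(ex: CotExample) -> str:
    """Join question, reasoning steps, and answer with a hypothetical template."""
    steps = "\n".join(ex.cot)
    return f"Question: {ex.question}\n{steps}\nAnswer: {ex.answer}"


# Made-up row for illustration; it is not taken from the dataset.
row = {"question": "What is 2 + 3?", "cot": ["2 + 3 = 5"], "answer": "5"}
print(to_training_text(CotExample.from_row(row)))
```

Keeping the reasoning steps as a list until formatting time lets you swap in a different template without touching the stored rows.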

Data Splits

  • train:
    • Bytes: 91430680
    • Samples: 385620
  • valid:
    • Bytes: 147836
    • Samples: 500
  • test:
    • Bytes: 399532
    • Samples: 1319
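A quick consistency check on the figures above: the sketch below copies the per-split numbers from this card, confirms that the split byte counts add up to the reported dataset size, and prints each split's share of the samples. The names SPLITS, DATASET_SIZE, and total are illustrative only.

```python
# Per-split figures copied from the Data Splits list above.
SPLITS = {
    "train": {"bytes": 91_430_680, "samples": 385_620},
    "valid": {"bytes": 147_836, "samples": 500},
    "test": {"bytes": 399_532, "samples": 1_319},
}
DATASET_SIZE = 91_978_048  # "Dataset Size" from the Data Size section


def total(splits: dict, key: str) -> int:
    """Sum one field ("bytes" or "samples") across all splits."""
    return sum(split[key] for split in splits.values())


# The per-split byte counts should add up to the reported dataset size.
assert total(SPLITS, "bytes") == DATASET_SIZE

n_samples = total(SPLITS, "samples")
for name, split in SPLITS.items():
    share = split["samples"] / n_samples
    print(f"{name}: {split['samples']} samples ({share:.2%})")
```

The totals agree (91,430,680 + 147,836 + 399,532 = 91,978,048 bytes), and the train split holds about 99.5% of the 387,439 samples.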

Data Size

  • Download Size: 50052843 bytes
  • Dataset Size: 91978048 bytes
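To put the two sizes in more familiar units, the sketch below converts them to mebibytes and computes how much larger the dataset is than the download. It assumes the usual meaning of these fields in Hugging Face dataset metadata: Download Size is the size of the files fetched from the Hub, and Dataset Size is the size of the prepared data. The helper to_mib is a hypothetical name.

```python
DOWNLOAD_SIZE = 50_052_843  # bytes; assumed: size of the files fetched from the Hub
DATASET_SIZE = 91_978_048   # bytes; assumed: size of the prepared dataset


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


ratio = DATASET_SIZE / DOWNLOAD_SIZE
print(f"download: {to_mib(DOWNLOAD_SIZE):.2f} MiB")  # about 47.73 MiB
print(f"dataset:  {to_mib(DATASET_SIZE):.2f} MiB")   # about 87.72 MiB
print(f"expansion after download: {ratio:.2f}x")     # about 1.84x
```

In other words, plan for roughly 48 MiB of network transfer and close to 88 MiB of local storage for the prepared data.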

Configuration

  • config_name: default
    • data_files:
      • train: data/train-*
      • valid: data/valid-*
      • test: data/test-*
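The data_files entries are glob patterns that map each split name to the shard files in the repository. The sketch below mimics that resolution with Python's standard fnmatch module. The shard filenames in the example are hypothetical (the card does not list them), and so is the helper resolve_splits. With the real datasets library, a mapping like DATA_FILES can be passed as the data_files argument of load_dataset together with a builder such as "parquet", assuming the shards are Parquet files; the card does not state the file format.

```python
from fnmatch import fnmatch
from typing import Dict, List

# data_files patterns from the `default` config above.
DATA_FILES = {
    "train": "data/train-*",
    "valid": "data/valid-*",
    "test": "data/test-*",
}


def resolve_splits(patterns: Dict[str, str], repo_files: List[str]) -> Dict[str, List[str]]:
    """Map each split name to the repository files its glob pattern matches.

    Note: fnmatch's `*` also matches "/", unlike some glob implementations;
    this makes no difference for the flat patterns used here.
    """
    return {
        split: sorted(f for f in repo_files if fnmatch(f, pattern))
        for split, pattern in patterns.items()
    }


# Hypothetical repository listing, for illustration only.
files = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/valid-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
]
for split, matched in resolve_splits(DATA_FILES, files).items():
    print(split, matched)
```

Because each split uses a prefix plus a wildcard, a split can be sharded into any number of files without changing the configuration.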
