Dataset asset · Open Source Community · Question Answering Systems · Education

iamnguyen/edu_child_01

The dataset is intended mainly for text analysis and processing. Each record combines a text field (content), a nested metadata record, and a top-level vector of floats (vector). The metadata holds a question and its answer, an identifier, a prefix, a school ID, a sequence number, a source, a tokenized version of the question, a URL, and its own vector. The dataset is suitable for training models on text-understanding tasks such as question answering.

Source: hugging_face
Created: Nov 28, 2025
Updated: Dec 18, 2023

Dataset Overview

Dataset Features

  • content: Data type is string.
  • metadata: Structured data containing the following fields:
    • answer: Data type is string.
    • id: Data type is string.
    • prefix: Data type is string.
    • question: Data type is string.
    • school_id: Data type is string.
    • seq_num: Data type is integer (int64).
    • source: Data type is string.
    • tokenized_question: Data type is string.
    • url: Data type is string.
    • vector: Data type is a sequence of floats (float64).
  • vector: Data type is a sequence of floats (float64).
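The feature list above can be turned into a quick per-record check. This is a minimal sketch, assuming records are plain Python dicts (iterating a Hugging Face `datasets.Dataset` yields rows as dicts). `METADATA_FIELDS`, `is_float_list`, and `schema_errors` are hypothetical names, and the sample record uses invented placeholder values. The check treats missing or None values as errors, which may be stricter than the real data, because the card does not say whether any field can be null.

```python
from typing import Any

# Expected Python types for metadata fields, transcribed from the feature list:
# string -> str, int64 -> int, sequence of float64 -> list of float.
METADATA_FIELDS: dict[str, type] = {
    "answer": str,
    "id": str,
    "prefix": str,
    "question": str,
    "school_id": str,
    "seq_num": int,
    "source": str,
    "tokenized_question": str,
    "url": str,
    "vector": list,
}


def is_float_list(value: Any) -> bool:
    """True if value is a list whose items are all floats (an empty list passes)."""
    return isinstance(value, list) and all(isinstance(x, float) for x in value)


def schema_errors(record: dict[str, Any]) -> list[str]:
    """List schema problems in one record; an empty list means it conforms.

    Assumption: missing or None values count as errors, because the card
    does not say whether any field may be null.
    """
    errors: list[str] = []
    if not isinstance(record.get("content"), str):
        errors.append("content: expected str")
    if not is_float_list(record.get("vector")):
        errors.append("vector: expected list of float")
    metadata = record.get("metadata")
    if not isinstance(metadata, dict):
        errors.append("metadata: expected dict")
        return errors
    for name, expected in METADATA_FIELDS.items():
        value = metadata.get(name)
        if expected is list:
            ok = is_float_list(value)
        else:
            # bool is a subclass of int, so reject it explicitly.
            ok = isinstance(value, expected) and not isinstance(value, bool)
        if not ok:
            label = "list of float" if expected is list else expected.__name__
            errors.append(f"metadata.{name}: expected {label}")
    return errors


# Hypothetical record with placeholder values, for illustration only.
sample = {
    "content": "example passage",
    "vector": [0.1, 0.2, 0.3],
    "metadata": {
        "answer": "example answer",
        "id": "0",
        "prefix": "",
        "question": "example question",
        "school_id": "school-0",
        "seq_num": 0,
        "source": "example",
        "tokenized_question": "example question",
        "url": "https://example.com",
        "vector": [0.1, 0.2, 0.3],
    },
}
print(schema_errors(sample))  # []
```

Returning a list of messages rather than raising on the first problem makes it easy to scan all 1,015 rows and tally which fields, if any, deviate from the card.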

Dataset Split

  • train: Contains 1,015 samples, occupying 18,574,718 bytes.

Dataset Size

  • Download size: 12,148,966 bytes.
  • Dataset size: 18,574,718 bytes.
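The split and size figures can be cross-checked with a little arithmetic. The constants are copied from the card; the card does not say how the two sizes are measured, so the ratio only indicates roughly how much smaller the download is than the stated dataset size.

```python
# Figures copied from the card.
NUM_TRAIN_SAMPLES = 1_015
TRAIN_SPLIT_BYTES = 18_574_718
DATASET_BYTES = 18_574_718
DOWNLOAD_BYTES = 12_148_966

# train is the only split listed, so its size should equal the dataset size.
assert TRAIN_SPLIT_BYTES == DATASET_BYTES

avg_bytes_per_sample = DATASET_BYTES / NUM_TRAIN_SAMPLES
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"dataset size: {DATASET_BYTES / 1e6:.2f} MB ({DATASET_BYTES / 2**20:.2f} MiB)")
print(f"average per sample: {avg_bytes_per_sample:,.0f} bytes")
print(f"download / dataset size: {download_ratio:.3f}")
```

With the card's numbers this prints about 18.57 MB (17.71 MiB), roughly 18,300 bytes per sample, and a download-to-dataset ratio of about 0.654.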

Configuration

  • default: the train split is read from the files matching data/train-*.
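To illustrate which files the default configuration picks up, the sketch below approximates the data/train-* pattern with Python's `fnmatch.fnmatchcase`. The file names are hypothetical (the card does not list the actual files), and the Hub's own pattern resolution may differ from `fnmatch` in edge cases.

```python
from fnmatch import fnmatchcase

# Pattern from the card's "default" configuration.
TRAIN_PATTERN = "data/train-*"

# Hypothetical repository paths, for illustration only.
candidate_paths = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# fnmatchcase is case-sensitive and does not normalize paths per OS.
train_files = [p for p in candidate_paths if fnmatchcase(p, TRAIN_PATTERN)]
print(train_files)  # ['data/train-00000-of-00001.parquet']
```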