Dataset asset | Open Source Community | Question Answering Systems | Education
iamnguyen/edu_child_01
The dataset is primarily intended for text analysis and processing. Each record contains text content, a metadata record, and a vector. The metadata holds the question and its answer, an identifier, a prefix, a school ID, a sequence number, the source, a tokenized form of the question, a URL, and vector data. The dataset is suitable for training models for text understanding and related tasks.
Source
hugging_face
Created
Nov 28, 2025
Updated
Dec 18, 2023
Signals
69 views
Availability
Linked source ready
Dataset Overview
Dataset Features
- content: Data type is string.
- metadata: Structured data containing the following fields:
  - answer: Data type is string.
  - id: Data type is string.
  - prefix: Data type is string.
  - question: Data type is string.
  - school_id: Data type is string.
  - seq_num: Data type is integer (int64).
  - source: Data type is string.
  - tokenized_question: Data type is string.
  - url: Data type is string.
  - vector: Data type is a sequence of floats (float64).
- vector: Data type is a sequence of floats (float64).
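The feature list above can be turned into a small record check. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that the first `vector` entry belongs to `metadata` and the second is a top-level field, which is how the description reads. The names `schema_errors`, `METADATA_FIELDS`, and `TOP_LEVEL_FIELDS` are hypothetical.

```python
# Field types transcribed from the feature list. "float_seq" marks a
# sequence of floats. Assumption: one vector sits inside metadata and
# one sits at the top level of the record.
METADATA_FIELDS = {
    "answer": str,
    "id": str,
    "prefix": str,
    "question": str,
    "school_id": str,
    "seq_num": int,
    "source": str,
    "tokenized_question": str,
    "url": str,
    "vector": "float_seq",
}
TOP_LEVEL_FIELDS = {"content": str, "metadata": dict, "vector": "float_seq"}


def _matches(value, expected):
    """Return True if value has the expected type."""
    if expected == "float_seq":
        return isinstance(value, list) and all(isinstance(x, float) for x in value)
    return isinstance(value, expected)


def schema_errors(record):
    """Return the dotted paths of fields that are missing or mistyped."""
    errors = []
    for name, expected in TOP_LEVEL_FIELDS.items():
        if not _matches(record.get(name), expected):
            errors.append(name)
    metadata = record.get("metadata")
    if isinstance(metadata, dict):
        for name, expected in METADATA_FIELDS.items():
            if not _matches(metadata.get(name), expected):
                errors.append(f"metadata.{name}")
    return errors
```

An empty list means the record matches the listed schema. A record whose top-level `vector` holds integers instead of floats would be reported as `["vector"]`.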
Dataset Split
- train: Contains 1,015 samples, occupying 18,574,718 bytes.
Dataset Size
- Download size: 12,148,966 bytes.
- Dataset size: 18,574,718 bytes.
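The split and size figures above allow two quick sanity checks: the average size of one sample, and how the download size compares with the full dataset size. A minimal sketch using only the numbers listed here (the variable names are illustrative):

```python
# Figures copied from the Dataset Split and Dataset Size sections.
NUM_TRAIN_SAMPLES = 1_015
DATASET_BYTES = 18_574_718
DOWNLOAD_BYTES = 12_148_966

# Average footprint of a single training sample.
avg_bytes_per_sample = DATASET_BYTES / NUM_TRAIN_SAMPLES

# Fraction of the full dataset size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"Average size per sample: {avg_bytes_per_sample:,.0f} bytes")
print(f"Download / dataset size: {download_ratio:.1%}")
```

This works out to roughly 18,300 bytes per sample, with the download being about 65% of the dataset size. Much of each sample's footprint is likely the two float64 vectors, though the listing does not state their length.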
Configuration
- default: Includes the training data files, located at data/train-*.
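Since the source is listed as Hugging Face, the train split can presumably be loaded with the `datasets` library. The sketch below assumes that the repository `iamnguyen/edu_child_01` is publicly accessible and that `datasets` is installed. The helper name `load_train_split` is hypothetical.

```python
REPO_ID = "iamnguyen/edu_child_01"


def load_train_split():
    """Download and return the train split (requires network access)."""
    # Imported lazily so that defining this helper does not require
    # the `datasets` package to be installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")
```

Calling `load_train_split()` returns a `Dataset` object. If the listing above is current, `len()` of that object should equal the 1,015 samples reported for the train split.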