iamnguyen/edu_child_01

The dataset is intended mainly for text analysis and processing. Each record contains text content, a metadata object, and a vector. The metadata stores the answer to a question, an identifier, a prefix, the question itself, a school ID, a sequence number, the source, a tokenized version of the question, a URL, and a vector. The dataset is suitable for training models for text understanding and related tasks.

Updated 12/18/2023

hugging_face

Dataset Overview

Dataset Features

content: Data type is string.
metadata: Structured data containing the following fields:
- answer: Data type is string.
- id: Data type is string.
- prefix: Data type is string.
- question: Data type is string.
- school_id: Data type is string.
- seq_num: Data type is integer (int64).
- source: Data type is string.
- tokenized_question: Data type is string.
- url: Data type is string.
- vector: Data type is a sequence of floats (float64).
vector: Data type is a sequence of floats (float64).
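The feature list above can be turned into a quick sanity check for individual records. The following is a minimal sketch, assuming each record arrives as a plain Python dict with lists for the float64 sequences. The helper names (`validate_record`, `METADATA_SCHEMA`) and the example record values are hypothetical and are not taken from the dataset itself.

```python
from typing import Any, Callable


def is_str(v: Any) -> bool:
    return isinstance(v, str)


def is_int64(v: Any) -> bool:
    # bool is a subclass of int in Python, so rule it out explicitly,
    # then check that the value fits in a signed 64-bit range.
    return isinstance(v, int) and not isinstance(v, bool) and -2**63 <= v < 2**63


def is_float_list(v: Any) -> bool:
    # A float64 sequence is assumed to be represented as a list of Python floats.
    return isinstance(v, list) and all(isinstance(x, float) for x in v)


# Hypothetical schema table: field name -> (checker, description),
# following the metadata fields listed above.
METADATA_SCHEMA: dict[str, tuple[Callable[[Any], bool], str]] = {
    "answer": (is_str, "string"),
    "id": (is_str, "string"),
    "prefix": (is_str, "string"),
    "question": (is_str, "string"),
    "school_id": (is_str, "string"),
    "seq_num": (is_int64, "int64"),
    "source": (is_str, "string"),
    "tokenized_question": (is_str, "string"),
    "url": (is_str, "string"),
    "vector": (is_float_list, "sequence of float64"),
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems: list[str] = []
    if not is_str(record.get("content")):
        problems.append("content: expected string")
    if not is_float_list(record.get("vector")):
        problems.append("vector: expected sequence of float64")
    metadata = record.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("metadata: expected struct")
        return problems
    for name, (check, description) in METADATA_SCHEMA.items():
        if not check(metadata.get(name)):
            problems.append(f"metadata.{name}: expected {description}")
    return problems


# Made-up record used only to exercise the checker.
example = {
    "content": "example text",
    "vector": [0.12, -0.34],
    "metadata": {
        "answer": "an answer",
        "id": "1",
        "prefix": "",
        "question": "a question",
        "school_id": "s1",
        "seq_num": 0,
        "source": "example-source",
        "tokenized_question": "a question",
        "url": "https://example.com",
        "vector": [0.12, -0.34],
    },
}
print(validate_record(example))  # []
```

Returning a list of problems rather than raising on the first mismatch makes it easier to spot every field that deviates in a single pass.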

Dataset Split

train: Contains 1,015 samples, occupying 18,574,718 bytes.

Dataset Size

Download size: 12,148,966 bytes.
Dataset size: 18,574,718 bytes.
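The size figures above support a few quick derived numbers. The sketch below only rearranges the values stated in the "Dataset Split" and "Dataset Size" sections. Reading the gap between the two sizes as compression of the downloaded files is an assumption, not something this page states.

```python
# Figures copied from the "Dataset Split" and "Dataset Size" sections.
NUM_TRAIN_SAMPLES = 1_015
DATASET_BYTES = 18_574_718
DOWNLOAD_BYTES = 12_148_966

# Average stored size of one training sample.
avg_bytes_per_sample = DATASET_BYTES / NUM_TRAIN_SAMPLES
# Download size relative to the dataset size.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"Average size per sample: {avg_bytes_per_sample:,.0f} bytes")  # 18,300 bytes
print(f"Download size as share of dataset size: {download_ratio:.1%}")  # 65.4%
print(f"Dataset size: {DATASET_BYTES / 1024**2:.2f} MiB")  # 17.71 MiB
print(f"Download size: {DOWNLOAD_BYTES / 1024**2:.2f} MiB")  # 11.59 MiB
```

At roughly 18 KB per sample, much of each record's size is likely taken up by the two float64 vectors, though the page does not give the vector dimension.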

Configuration

default: Maps the train split to the data files matching data/train-*.
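The configuration identifies training files by a glob pattern rather than a fixed file name. The following is a minimal sketch of that matching, using Python's standard `fnmatch`. The repository listing is hypothetical, since the actual file names are not given on this page, and `fnmatch`'s `*` also matches `/`, so this only approximates how the Hub resolves the pattern.

```python
from fnmatch import fnmatch

# Glob pattern declared by the "default" configuration.
TRAIN_PATTERN = "data/train-*"


def train_files(paths: list[str]) -> list[str]:
    """Return the paths that match the train-split pattern."""
    return [p for p in paths if fnmatch(p, TRAIN_PATTERN)]


# Hypothetical repository listing, for illustration only.
repo_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
]
print(train_files(repo_files))  # ['data/train-00000-of-00001.parquet']
```

With the Hugging Face `datasets` library installed, the split can typically be loaded with `load_dataset("iamnguyen/edu_child_01", split="train")`, which resolves the data files through this configuration.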
