Dataset assetOpen Source CommunityNatural Language ProcessingSpanish

curated_20k_spanish

This dataset includes a feature named 'messages', which is a list containing two sub‑features: 'content' (string) and 'role' (string). The dataset is divided into a training split (train) with 20,207 samples, totaling 48,020,454 bytes. The download size is 24,914,380 bytes, and it is licensed under Apache 2.0. The language is Spanish.

Source

huggingface

Created

Dec 15, 2024

Updated

Dec 16, 2024

Signals

108 views

Availability

Linked source ready

Overview

Dataset description and usage context

Dataset Overview

Dataset Information

Features:
- messages:
  - content: data type is string
  - role: data type is string
Splits:
- train:
  - Bytes: 48020454
  - Samples: 20207
Download Size: 24914380
Dataset Size: 48020454

Configuration

Configuration Name: default
- Data Files:
  - Split: train
  - Path: data/train-*

License

License: apache-200

Language

Language: Spanish (es)

Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio