Back to datasets
Dataset assetOpen Source CommunityNatural Language ProcessingSpanish
curated_20k_spanish
This dataset includes a feature named 'messages', which is a list containing two sub‑features: 'content' (string) and 'role' (string). The dataset is divided into a training split (train) with 20,207 samples, totaling 48,020,454 bytes. The download size is 24,914,380 bytes, and it is licensed under Apache 2.0. The language is Spanish.
Source
huggingface
Created
Dec 15, 2024
Updated
Dec 16, 2024
Signals
108 views
Availability
Linked source ready
Overview
Dataset description and usage context
Dataset Overview
Dataset Information
- Features:
- messages:
- content: data type is string
- role: data type is string
- messages:
- Splits:
- train:
- Bytes: 48020454
- Samples: 20207
- train:
- Download Size: 24914380
- Dataset Size: 48020454
Configuration
- Configuration Name: default
- Data Files:
- Split: train
- Path: data/train-*
- Data Files:
License
- License: apache-200
Language
- Language: Spanish (es)
Need downstream help?
Pair the dataset with AI analysis and content workflows.
Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.