Back to datasets
Dataset assetOpen Source CommunityNatural Language ProcessingSpanish

curated_20k_spanish

This dataset includes a feature named 'messages', which is a list containing two sub‑features: 'content' (string) and 'role' (string). The dataset is divided into a training split (train) with 20,207 samples, totaling 48,020,454 bytes. The download size is 24,914,380 bytes, and it is licensed under Apache 2.0. The language is Spanish.

Source
huggingface
Created
Dec 15, 2024
Updated
Dec 16, 2024
Signals
108 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Dataset Information

  • Features:
    • messages:
      • content: data type is string
      • role: data type is string
  • Splits:
    • train:
      • Bytes: 48020454
      • Samples: 20207
  • Download Size: 24914380
  • Dataset Size: 48020454

Configuration

  • Configuration Name: default
    • Data Files:
      • Split: train
      • Path: data/train-*

License

  • License: apache-200

Language

  • Language: Spanish (es)
Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio