Dataset asset · Open Source Community · Natural Language Processing · Text Clustering

AdapterOcean/med_alpaca_standardized_cluster_8

---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: conversation_id
    dtype: int64
  - name: embedding
    sequence: float64
  - name: cluster
    dtype: int64
  splits:
  - name: train
    num_bytes: 145562012
    num_examples: 14666
  download_size: 42803368
  dataset_size: 145562012
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for "med_alpaca_standardized_cluster_8"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
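The card's `features` block declares four columns: `text` (string), `conversation_id` (int64), `embedding` (sequence of float64), and `cluster` (int64). As a rough sketch of what that schema implies for a single row, the pure-Python check below compares a row dictionary against those declared types. `validate_row`, `SCALAR_COLUMNS`, and `ALL_COLUMNS` are hypothetical helpers, not part of the `datasets` library, and the example row uses made-up placeholder values rather than real data.

```python
from typing import Any

# Hypothetical helper (not part of the `datasets` library). Expected Python
# types are transcribed from the card's `features` block:
# string -> str, int64 -> int, sequence of float64 -> list of float.
SCALAR_COLUMNS = {
    "text": str,
    "conversation_id": int,
    "cluster": int,
}
ALL_COLUMNS = set(SCALAR_COLUMNS) | {"embedding"}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return schema problems for one row; an empty list means it conforms."""
    problems = []
    missing = ALL_COLUMNS - set(row)
    extra = set(row) - ALL_COLUMNS
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    for name, expected in SCALAR_COLUMNS.items():
        if name not in row:
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    if "embedding" in row:
        embedding = row["embedding"]
        if not isinstance(embedding, list) or not all(
            isinstance(x, float) for x in embedding
        ):
            problems.append("embedding: expected a list of floats")
    return problems


# Usage with a made-up row (placeholder values, not real data).
example = {"text": "example text", "conversation_id": 0, "embedding": [0.1, -0.2, 0.3], "cluster": 0}
print(validate_row(example))  # []
```

Rows loaded from the actual train split would be checked the same way; the embedding length is deliberately not checked, because the card does not state the embedding dimension.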

Source: hugging_face
Created: Nov 28, 2025
Updated: Oct 23, 2023

Dataset Overview

Dataset Information

  • Features:
    • text: string.
    • conversation_id: 64-bit integer (int64).
    • embedding: sequence of 64-bit floats (float64).
    • cluster: 64-bit integer (int64).
  • Splits:
    • train: 14,666 examples, 145,562,012 bytes.
  • Download size: 42,803,368 bytes (about 42.8 MB).
  • Dataset size: 145,562,012 bytes (about 145.6 MB).
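The size figures above can be related with simple arithmetic. The sketch below uses only the numbers listed in the card to compute the average number of bytes per example and the ratio of dataset size to download size. Reading `download_size` as the size of the files fetched from the Hub and `dataset_size` as the size of the prepared data follows the usual Hugging Face `datasets` convention, but treat that interpretation as an assumption here.

```python
# Figures copied from the dataset card above.
NUM_EXAMPLES = 14_666
TRAIN_NUM_BYTES = 145_562_012
DOWNLOAD_SIZE = 42_803_368
DATASET_SIZE = 145_562_012

# With a single split, the split's num_bytes and the dataset size coincide.
assert TRAIN_NUM_BYTES == DATASET_SIZE

avg_bytes_per_example = TRAIN_NUM_BYTES / NUM_EXAMPLES
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE

print(f"average bytes per example: {avg_bytes_per_example:,.0f}")  # average bytes per example: 9,925
print(f"dataset size / download size: {size_ratio:.2f}x")  # dataset size / download size: 3.40x
print(f"download: {DOWNLOAD_SIZE / 1e6:.1f} MB, dataset: {DATASET_SIZE / 1e6:.1f} MB")  # download: 42.8 MB, dataset: 145.6 MB
```

Under that assumption, the prepared data is roughly 3.4 times the size of the download, so plan local disk space around the larger figure.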

Configuration

  • Default configuration:
    • Data files:
      • train: data/train-* (glob pattern).
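The `data/train-*` path is a glob pattern, so every file directly under `data/` whose name starts with `train-` is assigned to the train split. The sketch below shows the matching rule with Python's `fnmatch`, treating `*` as not crossing a `/` boundary, as in shell-style globbing. It illustrates the glob semantics only and is not the `datasets` library's own resolution code; `matches_glob` is a hypothetical helper, and the file listing is invented because the card does not list the actual shard names.

```python
from fnmatch import fnmatchcase


def matches_glob(path: str, pattern: str) -> bool:
    """Hypothetical helper: glob-style match where `*` does not cross a '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    # Different directory depth means no match, since `*` stays within one component.
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, pat) for p, pat in zip(path_parts, pattern_parts))


# Invented repository listing (actual shard names are not given in the card).
repo_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "data/extra/train-00000.parquet",
]

TRAIN_PATTERN = "data/train-*"
train_files = [f for f in repo_files if matches_glob(f, TRAIN_PATTERN)]
print(train_files)  # ['data/train-00000-of-00001.parquet']
```

Using a wildcard rather than a fixed file name lets the same configuration cover one shard or many without editing the card.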