AdapterOcean/med_alpaca_standardized_cluster_8

---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: conversation_id
    dtype: int64
  - name: embedding
    sequence: float64
  - name: cluster
    dtype: int64
  splits:
  - name: train
    num_bytes: 145562012
    num_examples: 14666
  download_size: 42803368
  dataset_size: 145562012
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "med_alpaca_standardized_cluster_8"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Updated 10/23/2023

hugging_face

Description

Dataset Overview

Dataset Information

Features:
- text: string.
- conversation_id: 64-bit integer (int64).
- embedding: sequence of 64-bit floats (float64).
- cluster: 64-bit integer (int64).
Splits:
- train: 14,666 examples, 145,562,012 bytes in total.
Download Size: 42,803,368 bytes.
Dataset Size: 145,562,012 bytes.
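
The feature list can be made concrete with a small, stdlib-only Python sketch. It mirrors the four listed features as a typed record, validates a single row against them, and derives two figures from the listed sizes. The `Record`, `validate`, and `_check_int64` names are hypothetical, and the sample row is made up. In practice rows would most likely come from the Hugging Face `datasets` library (for example `load_dataset("AdapterOcean/med_alpaca_standardized_cluster_8", split="train")`); the sketch avoids that dependency so it runs offline.

```python
from dataclasses import dataclass
from typing import List

# Figures copied from the card's Dataset Information section.
NUM_EXAMPLES = 14_666
DATASET_BYTES = 145_562_012
DOWNLOAD_BYTES = 42_803_368

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass
class Record:
    """One row of the train split (hypothetical helper); names and types follow the card."""
    text: str               # string
    conversation_id: int    # int64
    embedding: List[float]  # sequence of float64
    cluster: int            # int64


def _check_int64(name: str, value: object) -> int:
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"{name} is outside the int64 range")
    return value


def validate(row: dict) -> Record:
    """Check a raw dict against the four listed features.

    Raises ValueError on a missing field or a type mismatch.
    """
    missing = {"text", "conversation_id", "embedding", "cluster"} - set(row)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(row["text"], str):
        raise ValueError("text must be a string")
    emb = row["embedding"]
    if not isinstance(emb, (list, tuple)) or not all(isinstance(x, float) for x in emb):
        raise ValueError("embedding must be a sequence of floats")
    return Record(
        text=row["text"],
        conversation_id=_check_int64("conversation_id", row["conversation_id"]),
        embedding=list(emb),
        cluster=_check_int64("cluster", row["cluster"]),
    )


# Average bytes per example, and how much larger dataset_size is than download_size.
avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES
size_ratio = DATASET_BYTES / DOWNLOAD_BYTES

# A made-up row, used only to exercise the validator.
example = validate({"text": "sample", "conversation_id": 1,
                    "embedding": [0.1, 0.2], "cluster": 0})
print(example.conversation_id, round(avg_bytes_per_example, 1), round(size_ratio, 2))
```

With the listed figures this works out to roughly 9,925 bytes per example, and the full dataset size is about 3.4 times the download size.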

Configuration

Default Configuration:
- Data Files:
  - train: data/train-* (glob pattern matching the training data files).
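
The `data/train-*` entry is a glob pattern: any file under `data/` whose name starts with `train-` is assigned to the train split. The sketch below illustrates that matching with Python's `fnmatch.fnmatchcase`. The file listing is hypothetical, since the card does not name the actual shard files, and `files_for_split` is an illustrative helper, not part of any library.

```python
from fnmatch import fnmatchcase

# Pattern taken from the card's default configuration.
TRAIN_PATTERN = "data/train-*"

# Hypothetical repository listing: the card does not name the actual shard files.
repo_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
]


def files_for_split(files, pattern=TRAIN_PATTERN):
    """Return the paths the glob pattern selects (case-sensitive match).

    Note: fnmatch's "*" also matches "/", so this only approximates a
    directory-aware glob.
    """
    return [path for path in files if fnmatchcase(path, pattern)]


print(files_for_split(repo_files))
```

`fnmatchcase` is used instead of `fnmatch` so the result does not depend on the operating system's case-sensitivity rules.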

Topics

Natural Language Processing

Text Clustering

Source

Organization: hugging_face

Created: Unknown