Back to datasets
Dataset assetOpen Source CommunityChatbotDialogue Data

NemoSheng/codefuse_fc_v1_sharegpt

The dataset contains dialogues and tool information, primarily for training and testing models. Dialogue information is stored as a list, each dialogue having a source and content field. Tool information is stored as a string. The dataset is split into training and test sets, with 72,032 training examples and 1,250 test examples. Download size 193,720,278 bytes, total size 1,002,393,963 bytes.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 18, 2024
Signals
50 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Data Features

  • conversations:
    • from: string type
    • value: string type
  • tools: string type

Data Splits

  • train:
    • Bytes: 999,501,804.0
    • Samples: 72,032
  • test:
    • Bytes: 2,892,159.0
    • Samples: 1,250

Dataset Size

  • Download size: 193,720,278
  • Total size: 1,002,393,963.0

Configuration

  • default:
    • train: data file path data/train-*
    • test: data file path data/test-*
Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio