Dataset asset, Open Source Community, Natural Language Processing, Sentence Similarity

C-MTEB/LCQMC

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: score
    dtype: int32
  splits:
  - name: train
    num_bytes: 18419299
    num_examples: 238766
  - name: validation
    num_bytes: 760701
    num_examples: 8802
  - name: test
    num_bytes: 876457
    num_examples: 12500
  download_size: 14084841
  dataset_size: 20056457
---
# Dataset Card for "LCQMC"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Source: hugging_face
Created: Nov 28, 2025
Updated: Jul 28, 2023

Dataset Overview

Dataset Configuration

  • Default configuration (config_name: default)
    • Training data (split: train): data/train-*
    • Validation data (split: validation): data/validation-*
    • Test data (split: test): data/test-*
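
The split-to-file mapping above can be sketched in Python as follows. This is a minimal illustration, not the loader's actual implementation: `SPLIT_PATTERNS`, `split_for`, and `load_lcqmc` are hypothetical names introduced here, and the sketch assumes the Hugging Face `datasets` package is what reads the repository `C-MTEB/LCQMC`.

```python
from fnmatch import fnmatch

# Split name -> file glob, copied from the default configuration above.
SPLIT_PATTERNS = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path):
    """Return the split whose glob matches `path`, or None if none does."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_lcqmc(split=None):
    """Load C-MTEB/LCQMC from the Hugging Face Hub (needs network access).

    The import is deferred so the rest of this sketch runs even when the
    `datasets` package is not installed.
    """
    from datasets import load_dataset

    # split=None returns every split; split="train" returns just that one.
    return load_dataset("C-MTEB/LCQMC", split=split)
```

For example, `split_for("data/validation-part")` resolves to `"validation"` (the file name is made up for illustration), and `load_lcqmc("test")` would fetch only the test split.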

Dataset Information

  • Features

    • sentence1: type string
    • sentence2: type string
    • score: type int32
  • Split Details

    • Training (name: train)
      • Bytes: 18,419,299
      • Samples: 238,766
    • Validation (name: validation)
      • Bytes: 760,701
      • Samples: 8,802
    • Test (name: test)
      • Bytes: 876,457
      • Samples: 12,500
  • Dataset Size

    • Download size: 14,084,841 bytes
    • Total size: 20,056,457 bytes
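
The figures above can be cross-checked with a short Python sketch. The numbers are copied from the card; the helper names (`total`, `split_shares`, `is_valid_record`) are hypothetical, and the record check only assumes what the feature list states: two string fields and a `score` that fits in a signed 32-bit integer.

```python
# Per-split statistics, copied from the dataset information above.
SPLITS = {
    "train": {"num_bytes": 18_419_299, "num_examples": 238_766},
    "validation": {"num_bytes": 760_701, "num_examples": 8_802},
    "test": {"num_bytes": 876_457, "num_examples": 12_500},
}
DATASET_SIZE = 20_056_457  # "Total size" reported by the card

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def total(field):
    """Sum one statistic (e.g. "num_bytes") over all splits."""
    return sum(stats[field] for stats in SPLITS.values())


def split_shares():
    """Return each split's fraction of the total number of examples."""
    n = total("num_examples")
    return {name: stats["num_examples"] / n for name, stats in SPLITS.items()}


def is_valid_record(record):
    """Check a record against the declared features (string, string, int32)."""
    score = record.get("score")
    return (
        isinstance(record.get("sentence1"), str)
        and isinstance(record.get("sentence2"), str)
        and isinstance(score, int)
        and not isinstance(score, bool)  # bool is a subclass of int in Python
        and INT32_MIN <= score <= INT32_MAX
    )


if __name__ == "__main__":
    print("bytes match total size:", total("num_bytes") == DATASET_SIZE)
    print("total examples:", total("num_examples"))
    for name, share in split_shares().items():
        print(f"{name}: {share:.2%}")
```

The per-split byte counts sum exactly to the reported total size, and the three splits together hold 260,068 examples.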