Dataset asset | Open Source Community | Machine Learning | Text Classification

C-MTEB/TNews-classification

---
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': '100'
          '1': '101'
          '2': '102'
          '3': '103'
          '4': '104'
          '5': '106'
          '6': '107'
          '7': '108'
          '8': '109'
          '9': '110'
          '10': '112'
          '11': '113'
          '12': '114'
          '13': '115'
          '14': '116'
  - name: idx
    dtype: int32
  splits:
  - name: test
    num_bytes: 810970
    num_examples: 10000
  - name: train
    num_bytes: 4245677
    num_examples: 53360
  - name: validation
    num_bytes: 797922
    num_examples: 10000
  download_size: 4697191
  dataset_size: 5854569
---
# Dataset Card for "TNews-classification"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 28, 2023
Signals
108 views
Availability
Linked source ready
Dataset Overview

Configuration Information

  • Default Configuration:
    • Data Files:
      • Test Set: path data/test-*
      • Training Set: path data/train-*
      • Validation Set: path data/validation-*
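
The split patterns above are glob expressions, so any file under data/ whose name starts with the split prefix belongs to that split. A minimal sketch of that matching using only the Python standard library follows; the shard file names are hypothetical and serve only to illustrate the patterns, and `split_for` is a helper name introduced here, not part of any library.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split-to-pattern mapping copied from the default configuration.
SPLIT_PATTERNS = {
    "test": "data/test-*",
    "train": "data/train-*",
    "validation": "data/validation-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file names (assumptions, not taken from the repository).
candidates = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "README.md",
]
for path in candidates:
    print(path, "->", split_for(path))
```

In practice the Hugging Face `datasets` library resolves these patterns itself; with it installed, `load_dataset("C-MTEB/TNews-classification", split="train")` would be the usual way to fetch a single split.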

Dataset Information

  • Features:

    • Text (text): string type
    • Label (label): class label with 15 classes, listed below as class index: label name (names 105 and 111 do not occur):
      • 0: 100
      • 1: 101
      • 2: 102
      • 3: 103
      • 4: 104
      • 5: 106
      • 6: 107
      • 7: 108
      • 8: 109
      • 9: 110
      • 10: 112
      • 11: 113
      • 12: 114
      • 13: 115
      • 14: 116
    • Index (idx): int32 type
  • Dataset Splits:

    • Test Set:
      • Bytes: 810,970
      • Samples: 10,000
    • Training Set:
      • Bytes: 4,245,677
      • Samples: 53,360
    • Validation Set:
      • Bytes: 797,922
      • Samples: 10,000
  • Dataset Size:

    • Download Size: 4,697,191 bytes
    • Dataset Size: 5,854,569 bytes
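
The label mapping and split statistics above can be cross-checked with a few lines of Python. This is a sketch that hard-codes the values listed in this card rather than reading them from the dataset; `int2str` and `str2int` are local helpers named for illustration. It shows that the label names skip 105 and 111, and that the reported dataset size equals the sum of the three split sizes.

```python
# Label names in class-index order, copied from the feature list above.
LABEL_NAMES = [
    "100", "101", "102", "103", "104", "106", "107", "108",
    "109", "110", "112", "113", "114", "115", "116",
]

# Split statistics copied from the card: (bytes, examples).
SPLITS = {
    "test": (810_970, 10_000),
    "train": (4_245_677, 53_360),
    "validation": (797_922, 10_000),
}
REPORTED_DATASET_SIZE = 5_854_569


def int2str(index: int) -> str:
    """Map a class index (0..14) to its label name."""
    return LABEL_NAMES[index]


def str2int(name: str) -> int:
    """Map a label name such as "106" back to its class index."""
    return LABEL_NAMES.index(name)


# Label names in the range 100..116 that have no class index.
missing_names = sorted(set(range(100, 117)) - {int(n) for n in LABEL_NAMES})

total_bytes = sum(size for size, _ in SPLITS.values())
total_examples = sum(count for _, count in SPLITS.values())

print(int2str(5))        # 106
print(str2int("112"))    # 10
print(missing_names)     # [105, 111]
print(total_examples)    # 73360
print(total_bytes == REPORTED_DATASET_SIZE)  # True
```

Because the names are not contiguous, a model's predicted class index should be converted through the mapping rather than by adding 100.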