
ShengbinYue/DISC-Law-SFT

The DISC‑Law‑SFT dataset is a high‑quality Chinese legal supervised fine‑tuning (SFT) dataset designed to improve legal AI systems' ability to understand and generate legal text. It has two main subsets: DISC‑Law‑SFT‑Pair, which introduces legal reasoning ability, and DISC‑Law‑SFT‑Triplet, which strengthens the model's use of external legal knowledge. A common portion of general instruction data (Alpaca‑GPT4 and Firefly) is also included. The tasks cover legal information extraction, event detection, case classification, judgment prediction, case matching, text summarization, judicial public‑opinion summarization, question answering (QA), reading comprehension, and judicial examination questions. The total size is about 403 K entries, and the dataset is intended for building legal assistants, legal consulting services, and judicial exam preparation tools.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Oct 20, 2024


DISC‑Law‑SFT Dataset Overview

Basic Information

Name: DISC‑Law‑SFT Dataset
Language: Chinese
Tags: Legal
Size category: 100 M < n < 1 B
License: Apache‑2.0

DISC‑Law‑SFT contains two main subsets, plus a common portion of general instruction data:

1. DISC‑Law‑SFT‑Pair

Purpose: Introduce legal reasoning ability.
Tasks & Sizes:
- Legal Information Extraction: 32 K
- Legal Event Detection: 27 K
- Legal Case Classification: 20 K
- Legal Judgment Prediction: 11 K
- Legal Case Matching: 8 K
- Legal Text Summarization: 9 K
- Judicial Public‑Opinion Summarization: 6 K
- Legal QA: 93 K
- Legal Reading Comprehension: 38 K
- Judicial Exam: 12 K
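Each Pair entry is an instruction paired with a target response. The sketch below shows how such a pair could be turned into a prompt/completion example for supervised fine‑tuning. It assumes a hypothetical record with `input` and `output` fields and an Alpaca‑style `### Instruction` / `### Response` template; neither the field names nor the template are confirmed by this listing, so check the released files before relying on them.

```python
from dataclasses import dataclass


@dataclass
class PairRecord:
    """One instruction-response pair. Field names are hypothetical."""
    input: str   # task instruction plus the legal text it applies to
    output: str  # target answer the model should learn to produce


def to_sft_example(record: PairRecord) -> dict[str, str]:
    """Split a pair into the prompt the model sees and the completion it is trained on."""
    # Alpaca-style template: an assumption, not necessarily what DISC-Law-SFT uses.
    prompt = f"### Instruction:\n{record.input}\n\n### Response:\n"
    return {"prompt": prompt, "completion": record.output}


# Made-up record for illustration only; not taken from the dataset.
example = PairRecord(
    input="Identify the charge in this case: the defendant took a phone from a store without paying.",
    output="Theft",
)
print(to_sft_example(example)["prompt"])
```

Keeping the prompt and completion separate lets a training loop compute the loss only on the completion tokens, which is the usual choice for instruction tuning.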

2. DISC‑Law‑SFT‑Triplet

Purpose: Enhance the model's use of external legal knowledge.
Tasks & Sizes:
- Legal Judgment Prediction: 16 K
- Legal QA: 23 K
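Triplet entries add a third element, reference legal knowledge, alongside the question and answer. A minimal sketch of how that reference text could be placed in the prompt so the model learns to ground its answer in it is shown below. The `TripletRecord` fields (`input`, `reference`, `output`), the `max_refs` cap, and the prompt layout are all hypothetical choices for illustration, not the dataset's documented format.

```python
from dataclasses import dataclass, field


@dataclass
class TripletRecord:
    """Question, supporting legal provisions, answer. Field names are hypothetical."""
    input: str                                           # the legal question or case facts
    reference: list[str] = field(default_factory=list)  # retrieved provisions or knowledge
    output: str = ""                                     # target answer


def build_prompt(record: TripletRecord, max_refs: int = 3) -> str:
    """Prepend up to max_refs numbered reference passages before the question."""
    refs = record.reference[:max_refs]
    parts = []
    if refs:
        numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(refs, start=1))
        parts.append("Relevant legal provisions:\n" + numbered)
    parts.append("Question:\n" + record.input)
    parts.append("Answer:\n")
    return "\n\n".join(parts)


# Placeholder strings for illustration only; not taken from the dataset.
demo = TripletRecord(
    input="Is taking goods from a shop without paying a criminal offence?",
    reference=["<text of a retrieved statute>", "<text of a related judicial interpretation>"],
    output="<answer citing the provisions above>",
)
print(build_prompt(demo))
```

Numbering the passages gives the target answer a simple way to cite which provision it relies on; the cap keeps prompts within a model's context length when retrieval returns many passages.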

Common Portion

Tasks & Sizes:
- Alpaca‑GPT4: 48 K
- Firefly: 60 K

Total Size

Total: 403 K
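The per‑task sizes listed above add up to the stated total. The short check below sums them (in thousands of entries, as rounded on this page) and prints each part's share. The dictionary layout is simply a convenient container for the listed figures.

```python
# Per-task sizes in thousands of entries, copied from the lists above.
SUBSETS: dict[str, dict[str, int]] = {
    "DISC-Law-SFT-Pair": {
        "Legal Information Extraction": 32,
        "Legal Event Detection": 27,
        "Legal Case Classification": 20,
        "Legal Judgment Prediction": 11,
        "Legal Case Matching": 8,
        "Legal Text Summarization": 9,
        "Judicial Public-Opinion Summarization": 6,
        "Legal QA": 93,
        "Legal Reading Comprehension": 38,
        "Judicial Exam": 12,
    },
    "DISC-Law-SFT-Triplet": {"Legal Judgment Prediction": 16, "Legal QA": 23},
    "Common": {"Alpaca-GPT4": 48, "Firefly": 60},
}


def subset_totals(subsets: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum task sizes within each subset."""
    return {name: sum(tasks.values()) for name, tasks in subsets.items()}


totals = subset_totals(SUBSETS)
grand_total = sum(totals.values())
for name, size in totals.items():
    print(f"{name}: {size}K ({size / grand_total:.1%})")
print(f"Total: {grand_total}K")
# DISC-Law-SFT-Pair: 256K (63.5%)
# DISC-Law-SFT-Triplet: 39K (9.7%)
# Common: 108K (26.8%)
# Total: 403K
```

Because the listed figures are rounded, the true entry counts may differ slightly, but the legal subsets (Pair plus Triplet) make up roughly three quarters of the data.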

Availability

Status: Most data have been open‑sourced.
