
ShengbinYue/DISC-Law-SFT

The DISC‑Law‑SFT dataset is a Chinese legal supervised fine‑tuning (SFT) dataset designed to improve how legal AI systems understand and generate legal text. It consists of two subsets: DISC‑Law‑SFT‑Pair (for introducing legal reasoning) and DISC‑Law‑SFT‑Triplet (for enhancing the model's use of external legal knowledge). Its tasks span legal information extraction, event detection, case classification, judgment prediction, case matching, text summarization, judicial public‑opinion summarization, QA, reading comprehension, and judicial exam questions. The dataset totals 403 K entries and is intended for applications such as legal assistants, consulting services, and exam preparation.

Updated 10/20/2024
hugging_face

Description

DISC‑Law‑SFT Dataset Overview

Basic Information

  • Name: DISC‑Law‑SFT Dataset
  • Language: Chinese
  • Tags: Legal
  • Size category: 100M < n < 1B (as tagged at the source; the per‑task counts listed below total 403 K entries)
  • License: Apache‑2.0

Contents

DISC‑Law‑SFT contains two main subsets:

1. DISC‑Law‑SFT‑Pair

  • Purpose: Introduce legal reasoning ability.
  • Tasks & Sizes:
    • Legal Information Extraction: 32 K
    • Legal Event Detection: 27 K
    • Legal Case Classification: 20 K
    • Legal Judgment Prediction: 11 K
    • Legal Case Matching: 8 K
    • Legal Text Summarization: 9 K
    • Judicial Public‑Opinion Summarization: 6 K
    • Legal QA: 93 K
    • Legal Reading Comprehension: 38 K
    • Judicial Exam: 12 K

2. DISC‑Law‑SFT‑Triplet

  • Purpose: Enhance the model's use of external legal knowledge.
  • Tasks & Sizes:
    • Legal Judgment Prediction: 16 K
    • Legal QA: 23 K
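The difference between the two subsets can be sketched as two record shapes. This is a minimal illustration only: the field names `input`, `reference`, and `output`, the `build_prompt` helper, and the prompt layout are all assumptions made for this example and are not taken from the listing. The idea is that a Pair record maps a question directly to an answer, while a Triplet record also carries the external legal text the answer should draw on.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class PairExample:
    """Hypothetical shape of a DISC-Law-SFT-Pair record (field names assumed)."""
    input: str
    output: str


@dataclass
class TripletExample:
    """Hypothetical shape of a DISC-Law-SFT-Triplet record (field names assumed)."""
    input: str
    reference: str  # external legal knowledge, e.g. a retrieved statute
    output: str


def build_prompt(example: Union[PairExample, TripletExample]) -> str:
    """Render the model-facing prompt; the layout here is illustrative only."""
    if isinstance(example, TripletExample):
        # Triplet records place the reference text before the question.
        return f"Reference:\n{example.reference}\n\nQuestion:\n{example.input}"
    return f"Question:\n{example.input}"


pair = PairExample(input="<question>", output="<answer>")
triplet = TripletExample(
    input="<question>", reference="<statute text>", output="<answer>"
)

print(build_prompt(pair))
print(build_prompt(triplet))
```

In both cases the `output` field would serve as the training target; only the prompt side differs.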

General Portion

  • Datasets & Sizes (general‑purpose instruction data):
    • Alpaca‑GPT4: 48 K
    • Firefly: 60 K

Total Size

  • Total: 403 K
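The stated total can be checked against the per‑task figures listed above. The short sketch below copies those figures (in thousands of entries) and sums them per subset; the variable and function names are illustrative only.

```python
# Per-task sizes in thousands of entries, copied from the listing above.
PAIR = {
    "Legal Information Extraction": 32,
    "Legal Event Detection": 27,
    "Legal Case Classification": 20,
    "Legal Judgment Prediction": 11,
    "Legal Case Matching": 8,
    "Legal Text Summarization": 9,
    "Judicial Public-Opinion Summarization": 6,
    "Legal QA": 93,
    "Legal Reading Comprehension": 38,
    "Judicial Exam": 12,
}
TRIPLET = {
    "Legal Judgment Prediction": 16,
    "Legal QA": 23,
}
GENERAL = {
    "Alpaca-GPT4": 48,
    "Firefly": 60,
}


def subtotal(tasks: dict) -> int:
    """Sum the sizes (in K entries) of all tasks in one subset."""
    return sum(tasks.values())


subtotals = {
    "DISC-Law-SFT-Pair": subtotal(PAIR),
    "DISC-Law-SFT-Triplet": subtotal(TRIPLET),
    "General": subtotal(GENERAL),
}
total = sum(subtotals.values())

for name, size in subtotals.items():
    print(f"{name}: {size} K")
print(f"Total: {total} K")
```

The subtotals come to 256 K (Pair), 39 K (Triplet), and 108 K (general data), which add up to the stated 403 K.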

Availability

  • Status: Most of the data has been open‑sourced.


Topics

Legal AI
Artificial Intelligence

Source

Organization: hugging_face

Created: Unknown
