
china-ai-law-challenge/cail2018

---
annotations_creators:
- found
language_creators:
- found
language:
- zh
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1M<n<10M
source_datasets:
- original
task_categories:
- other
task_ids: []
paperswithcode_id: chinese-ai-and-law-cail-2018
pretty_name: CAIL 2018
tags:
- judgement-prediction
dataset_info:
  features:
  - name: fact
    dtype: string
  - name: relevant_articles
    sequence: int32
  - name: accusation
    sequence: string
  - name: punish_of_money
    dtype: float32
  - name: criminals
    sequence: string
  - name: death_penalty
    dtype: bool
  - name: imprisonment
    dtype: float32
  - name: life_imprisonment
    dtype: bool
  splits:
  - name: exercise_contest_train
    num_bytes: 220112348
    num_examples: 154592
  - name: exercise_contest_valid
    num_bytes: 21702109
    num_examples: 17131
  - name: exercise_contest_test
    num_bytes: 41057538
    num_examples: 32508
  - name: first_stage_train
    num_bytes: 1779653382
    num_examples: 1710856
  - name: first_stage_test
    num_bytes: 244334666
    num_examples: 217016
  - name: final_test
    num_bytes: 44194611
    num_examples: 35922
  download_size: 1167828091
  dataset_size: 2351054654
configs:
- config_name: default
  data_files:
  - split: exercise_contest_train
    path: data/exercise_contest_train-*
  - split: exercise_contest_valid
    path: data/exercise_contest_valid-*
  - split: exercise_contest_test
    path: data/exercise_contest_test-*
  - split: first_stage_train
    path: data/first_stage_train-*
  - split: first_stage_test
    path: data/first_stage_test-*
  - split: final_test
    path: data/final_test-*
---

# Dataset Card for CAIL 2018

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [Github](https://github.com/thunlp/CAIL/blob/master/README_en.md)
- **Repository:** [Github](https://github.com/thunlp/CAIL)
- **Paper:** [Arxiv](https://arxiv.org/abs/1807.02478)
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

[More Information Needed]

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@JetRunner](https://github.com/JetRunner) for adding this dataset.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Jan 16, 2024


Overview

Dataset description and usage context

Dataset Card for CAIL 2018

Dataset Description

Dataset Summary

Language: Chinese
License: Unknown
Multilinguality: Monolingual
Size Category: 1M < n < 10M
Source Dataset: Original
Task Category: Other
Papers with Code ID: chinese-ai-and-law-cail-2018
Tags: judgement-prediction
Dataset Name: CAIL 2018

Dataset Structure

Data Fields

fact: string
relevant_articles: sequence of int32
accusation: sequence of string
punish_of_money: float32
criminals: sequence of string
death_penalty: bool
imprisonment: float32
life_imprisonment: bool
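
The eight fields above can be mirrored as a typed record when working with examples in plain Python. The following is a minimal sketch, not part of the dataset's tooling: `CailRecord` and `parse_record` are hypothetical names, and the sample values are placeholders rather than real data. The card does not state the units of `imprisonment` or `punish_of_money`, so the sketch makes no assumption about them.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class CailRecord:
    """One example, mirroring the eight fields listed in the card."""
    fact: str
    relevant_articles: List[int]
    accusation: List[str]
    punish_of_money: float
    criminals: List[str]
    death_penalty: bool
    imprisonment: float
    life_imprisonment: bool


def parse_record(raw: dict) -> CailRecord:
    """Build a CailRecord from a plain dict, rejecting missing or extra keys."""
    expected = {f.name for f in fields(CailRecord)}
    if set(raw) != expected:
        # Report every key that is missing or unexpected.
        raise ValueError(f"field mismatch: {sorted(set(raw) ^ expected)}")
    return CailRecord(
        fact=str(raw["fact"]),
        relevant_articles=[int(a) for a in raw["relevant_articles"]],
        accusation=[str(a) for a in raw["accusation"]],
        punish_of_money=float(raw["punish_of_money"]),
        criminals=[str(c) for c in raw["criminals"]],
        death_penalty=bool(raw["death_penalty"]),
        imprisonment=float(raw["imprisonment"]),
        life_imprisonment=bool(raw["life_imprisonment"]),
    )


# Hypothetical placeholder values, not taken from the dataset.
example = parse_record({
    "fact": "<case description text>",
    "relevant_articles": [1],
    "accusation": ["<charge>"],
    "punish_of_money": 0.0,
    "criminals": ["<name>"],
    "death_penalty": False,
    "imprisonment": 0.0,
    "life_imprisonment": False,
})
print(example.relevant_articles)
```

Checking the key set up front makes schema drift fail loudly instead of surfacing later as a missing attribute.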

Data Splits

exercise_contest_train: 220,112,348 bytes, 154,592 samples
exercise_contest_valid: 21,702,109 bytes, 17,131 samples
exercise_contest_test: 41,057,538 bytes, 32,508 samples
first_stage_train: 1,779,653,382 bytes, 1,710,856 samples
first_stage_test: 244,334,666 bytes, 217,016 samples
final_test: 44,194,611 bytes, 35,922 samples
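
As a quick consistency check, the split figures above can be totalled. The sketch below uses only the numbers listed in this card; the name `SPLITS` is illustrative. The six splits sum to 2,168,025 examples, which agrees with the stated size category of 1M < n < 10M, and their byte counts sum to the listed dataset size of 2,351,054,654 bytes.

```python
# (num_bytes, num_examples) per split, copied from the figures in the card.
SPLITS = {
    "exercise_contest_train": (220_112_348, 154_592),
    "exercise_contest_valid": (21_702_109, 17_131),
    "exercise_contest_test": (41_057_538, 32_508),
    "first_stage_train": (1_779_653_382, 1_710_856),
    "first_stage_test": (244_334_666, 217_016),
    "final_test": (44_194_611, 35_922),
}

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(num_examples for _, num_examples in SPLITS.values())

# Per-split share of examples and average bytes per example.
for name, (num_bytes, num_examples) in SPLITS.items():
    share = num_examples / total_examples
    avg = num_bytes / num_examples
    print(f"{name:24s} {num_examples:>9,d} examples  {share:6.1%}  {avg:7.0f} bytes/example")

print(f"total: {total_examples:,d} examples, {total_bytes:,d} bytes")
```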

Dataset Size

Download Size: 1,167,828,091 bytes
Dataset Size: 2,351,054,654 bytes
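
For planning disk and bandwidth use, the two byte counts can be converted to binary gigabytes and compared. This is a small sketch using only the figures above; `to_gib` is a hypothetical helper. The card does not explain why the download is smaller than the dataset, so treating the download as a compressed on-disk form is an assumption.

```python
DOWNLOAD_BYTES = 1_167_828_091  # "Download Size" from the card
DATASET_BYTES = 2_351_054_654   # "Dataset Size" from the card


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# Fraction of the full dataset size that has to be downloaded.
ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"download: {to_gib(DOWNLOAD_BYTES):.2f} GiB")
print(f"dataset:  {to_gib(DATASET_BYTES):.2f} GiB")
print(f"download / dataset: {ratio:.1%}")
```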

Configuration

Config Name: default
Data Files:
- exercise_contest_train: data/exercise_contest_train-*
- exercise_contest_valid: data/exercise_contest_valid-*
- exercise_contest_test: data/exercise_contest_test-*
- first_stage_train: data/first_stage_train-*
- first_stage_test: data/first_stage_test-*
- final_test: data/final_test-*
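
Each split in the default config is tied to a glob pattern under `data/`. The sketch below shows how such patterns map a file path back to its split, using Python's standard `fnmatch` module. The shard file names are hypothetical, since the card lists only the patterns, and `split_for` and `load_split` are illustrative names. `load_split` assumes the Hugging Face `datasets` library is installed and that the repository ID shown at the top of this page is still valid.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split name -> data file pattern, copied from the default config.
DATA_FILES = {
    "exercise_contest_train": "data/exercise_contest_train-*",
    "exercise_contest_valid": "data/exercise_contest_valid-*",
    "exercise_contest_test": "data/exercise_contest_test-*",
    "first_stage_train": "data/first_stage_train-*",
    "first_stage_test": "data/first_stage_test-*",
    "final_test": "data/final_test-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def load_split(split: str):
    """Load one split with the `datasets` library (needs network access)."""
    if split not in DATA_FILES:
        raise KeyError(f"unknown split: {split}")
    # Imported lazily so the rest of the sketch runs without the library.
    from datasets import load_dataset
    return load_dataset("china-ai-law-challenge/cail2018", split=split)


# Hypothetical shard names, used only to exercise the patterns.
print(split_for("data/first_stage_train-00000-of-00004.parquet"))
print(split_for("README.md"))
```

Loading a small split first, for example `load_split("exercise_contest_valid")`, is a cheap way to inspect the fields before pulling the 1.7M-example `first_stage_train` split.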
