china-ai-law-challenge/cail2018

Upstream Dataset Card

  • Annotations Creators: found
  • Language Creators: found
  • Homepage: https://github.com/thunlp/CAIL/blob/master/README_en.md
  • Repository: https://github.com/thunlp/CAIL
  • Paper: https://arxiv.org/abs/1807.02478
  • Leaderboard: not listed
  • Point of Contact: not listed
  • Contributions: Thanks to @JetRunner (https://github.com/JetRunner) for adding this dataset.

All other sections of the upstream card (Dataset Summary, Supported Tasks and Leaderboards, Languages, Data Instances, Data Fields, Data Splits, Dataset Creation, Considerations for Using the Data, Dataset Curators, Licensing Information, Citation Information) are marked "[More Information Needed]". The card's machine-readable metadata is summarized under Description.

Updated 1/16/2024
hugging_face

Description

Dataset Card for CAIL 2018

Dataset Description

Dataset Summary

  • Language: Chinese
  • License: Unknown
  • Multilinguality: Monolingual
  • Size Category: 1M < n < 10M
  • Source Dataset: Original
  • Task Category: Other
  • Papers with Code ID: chinese-ai-and-law-cail-2018
  • Tags: judgement-prediction
  • Dataset Name: CAIL 2018

Dataset Structure

Data Fields

  • fact: string
  • relevant_articles: sequence of int32
  • accusation: sequence of string
  • punish_of_money: float32
  • criminals: sequence of string
  • death_penalty: bool
  • imprisonment: float32
  • life_imprisonment: bool
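
To make the field list concrete, here is a minimal sketch of a schema check in Python. The names `FIELD_TYPES`, `ELEMENT_TYPES`, and `validate_record` are illustrative and not part of the dataset or any library, and the sample record is invented for the example rather than taken from the data.

```python
from typing import Any, Dict, List

# Expected Python types per field, derived from the field list above.
# int32/float32 are storage dtypes; in plain Python they surface as int/float.
FIELD_TYPES: Dict[str, type] = {
    "fact": str,
    "relevant_articles": list,  # sequence of int32
    "accusation": list,         # sequence of string
    "punish_of_money": float,
    "criminals": list,          # sequence of string
    "death_penalty": bool,
    "imprisonment": float,
    "life_imprisonment": bool,
}

# Element types for the sequence-valued fields.
ELEMENT_TYPES: Dict[str, type] = {
    "relevant_articles": int,
    "accusation": str,
    "criminals": str,
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems: List[str] = []
    for name, expected in FIELD_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if expected is float:
            # Accept ints for float fields, but not bools (bool subclasses int).
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
            continue
        if name in ELEMENT_TYPES:
            elem = ELEMENT_TYPES[name]
            if not all(isinstance(v, elem) and not isinstance(v, bool) for v in value):
                problems.append(f"{name}: every element should be {elem.__name__}")
    return problems


# Hypothetical record, invented purely for illustration.
example = {
    "fact": "placeholder case description",
    "relevant_articles": [234],
    "accusation": ["placeholder accusation"],
    "punish_of_money": 0.0,
    "criminals": ["placeholder name"],
    "death_penalty": False,
    "imprisonment": 12.0,
    "life_imprisonment": False,
}
print(validate_record(example))  # prints []
```

A check like this can be useful before training, since the three sequence fields and the two boolean flags are easy to mishandle when records are flattened into tabular form.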

Data Splits

  • exercise_contest_train: 220,112,348 bytes, 154,592 samples
  • exercise_contest_valid: 21,702,109 bytes, 17,131 samples
  • exercise_contest_test: 41,057,538 bytes, 32,508 samples
  • first_stage_train: 1,779,653,382 bytes, 1,710,856 samples
  • first_stage_test: 244,334,666 bytes, 217,016 samples
  • final_test: 44,194,611 bytes, 35,922 samples

Dataset Size

  • Download Size: 1,167,828,091 bytes
  • Dataset Size: 2,351,054,654 bytes
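
The split figures and the reported totals can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the lists above; the variable names are illustrative only.

```python
# (num_bytes, num_examples) per split, copied from the Data Splits list.
SPLITS = {
    "exercise_contest_train": (220_112_348, 154_592),
    "exercise_contest_valid": (21_702_109, 17_131),
    "exercise_contest_test": (41_057_538, 32_508),
    "first_stage_train": (1_779_653_382, 1_710_856),
    "first_stage_test": (244_334_666, 217_016),
    "final_test": (44_194_611, 35_922),
}

REPORTED_DATASET_SIZE = 2_351_054_654   # bytes, from the Dataset Size list
REPORTED_DOWNLOAD_SIZE = 1_167_828_091  # bytes, from the Dataset Size list

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(num_examples for _, num_examples in SPLITS.values())

# The per-split byte counts add up exactly to the reported dataset size.
assert total_bytes == REPORTED_DATASET_SIZE

# The total example count falls inside the stated size category 1M < n < 10M.
assert 1_000_000 < total_examples < 10_000_000

print(total_examples)  # prints 2168025
print(round(REPORTED_DOWNLOAD_SIZE / REPORTED_DATASET_SIZE, 3))  # prints 0.497
```

So the six splits hold 2,168,025 samples in total, and the download is roughly half the size of the materialized dataset.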

Configuration

  • Config Name: default
  • Data Files:
    • exercise_contest_train: data/exercise_contest_train-*
    • exercise_contest_valid: data/exercise_contest_valid-*
    • exercise_contest_test: data/exercise_contest_test-*
    • first_stage_train: data/first_stage_train-*
    • first_stage_test: data/first_stage_test-*
    • final_test: data/final_test-*
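
The configuration above maps each split to a file pattern. As a rough sketch, the snippet below shows how such patterns resolve a file path to a split, and how a single split might be loaded. It assumes the Hugging Face `datasets` library is installed and that the dataset is published on the Hub under the ID `china-ai-law-challenge/cail2018` (the name shown at the top of this page). The helper names and the shard file name in the usage comment are hypothetical.

```python
import fnmatch
from typing import Optional

# Split name -> data file pattern, copied from the Configuration list above.
SPLIT_PATTERNS = {
    "exercise_contest_train": "data/exercise_contest_train-*",
    "exercise_contest_valid": "data/exercise_contest_valid-*",
    "exercise_contest_test": "data/exercise_contest_test-*",
    "first_stage_train": "data/first_stage_train-*",
    "first_stage_test": "data/first_stage_test-*",
    "final_test": "data/final_test-*",
}


def split_for_file(path: str) -> Optional[str]:
    """Return the split whose pattern matches the given file path, or None."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatch.fnmatchcase(path, pattern):
            return split
    return None


def load_split(split: str, repo_id: str = "china-ai-law-challenge/cail2018"):
    """Load one split by name. The default repo_id is an assumption (see lead-in)."""
    if split not in SPLIT_PATTERNS:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_PATTERNS)}"
        )
    # Third-party dependency, imported lazily so the helpers above work without it.
    from datasets import load_dataset
    return load_dataset(repo_id, split=split)


# Usage (the shard file name is a hypothetical example; loading needs network access):
#   split_for_file("data/final_test-00000-of-00001.parquet")
#   final_test = load_split("final_test")
```

Loading a small split such as `final_test` or `exercise_contest_valid` first is a cheap way to inspect the records before fetching `first_stage_train`, which accounts for most of the download.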

Topics

Legal Judgment Prediction
Legal Analysis

Source

Organization: hugging_face

Created: Unknown
