HFforLegal/laws
The Laws dataset is a collection of legal texts from various countries, intended to support the development of legal AI models by providing a standardized, easily accessible global corpus of legal documents. Each record includes the book name, the document content, a timestamp, an ID, and a hash value. The dataset is organized by country, using ISO 3166‑1 alpha‑2 codes to identify each nation's legal documents. The card also discusses ethical considerations such as privacy, bias, timeliness, and jurisdictional issues.
Description
Dataset Overview
Dataset Information
- Features:
- book: name or code of the legal book (e.g., "Civil Code", "Penal Code")
- document: full text of the legal document
- timestamp: timestamp of enactment or last update
- id: identifier of each document
- hash: SHA‑256 hash of the document field, for verification purposes
- Splits:
fr: contains 153,005 samples, total size 151,400,300 bytes
- Download Size: 64,396,801 bytes
- Dataset Size: 151,400,300 bytes
- Configuration:
default: loads the fr split from the data/fr-* path
- License: cc‑by‑4.0
- Task Types:
- Question Answering
- Text Generation
- Table Question Answering
- Language: French
- Tags:
- Law
- Finance
- Taxation
- δεξιά
- recht
- derecho
- Name: The Laws, centralizing legal texts for better use
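The default configuration above can be loaded with the Hugging Face datasets library. The following is a minimal sketch, assuming the library is installed (pip install datasets) and the Hub repository is reachable. The names load_french_laws and EXPECTED_FR_ROWS are illustrative and are not part of the dataset.

```python
"""Minimal sketch: load the fr split of HFforLegal/laws (assumes `datasets` is installed)."""

# Row count reported on this card for the fr split; useful as a sanity check.
EXPECTED_FR_ROWS = 153_005


def load_french_laws(streaming: bool = False):
    """Load the fr split defined by the default configuration (data/fr-*).

    With streaming=True an IterableDataset is returned and rows are fetched
    lazily; otherwise the whole split (about 64 MB compressed, per the card)
    is downloaded and cached locally.
    """
    # Imported inside the function so this module imports without `datasets`.
    from datasets import load_dataset

    return load_dataset("HFforLegal/laws", split="fr", streaming=streaming)
```

Usage: laws = load_french_laws() followed by len(laws) should give a count close to EXPECTED_FR_ROWS, though the figure can drift if the dataset is updated. For quick inspection, streaming=True avoids downloading the full split.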
Objectives
- Centralize legal texts worldwide in a common format to facilitate:
- Comparative legal research
- Development of multilingual legal AI models
- Cross‑jurisdictional legal research
- Improvement of legal tech tools
Dataset Structure
Each record has the five fields listed under Features: book (name or code of the legal book), document (full text of the legal document), timestamp (enactment or last update), id (identifier of each document), and hash (SHA‑256 hash of the document field, for verification purposes).
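The hash field makes it possible to check that a document's text has not been altered. The sketch below recomputes the digest. It assumes that the hash is the lowercase hexadecimal SHA‑256 of the document text encoded as UTF-8, because the card states neither the encoding nor the digest format. The LawRecord type and the helper names are hypothetical, and the field types are guesses.

```python
import hashlib
from typing import TypedDict


class LawRecord(TypedDict):
    """One row of the dataset (field types are assumptions, not from the card)."""
    book: str
    document: str
    timestamp: str
    id: str
    hash: str


def document_sha256(document: str) -> str:
    """Hex SHA-256 digest of the document text, assuming UTF-8 encoding."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


def verify_record(record: LawRecord) -> bool:
    """Return True if the stored hash matches the recomputed digest."""
    return document_sha256(record["document"]) == record["hash"].lower()
```

Applied to a loaded split, [r["id"] for r in laws if not verify_record(r)] would list the IDs whose stored hash does not match under this assumption. A large number of mismatches would more likely indicate a different hashing convention than actual corruption.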
Country Splits
- The dataset is organized by country using ISO 3166‑1 alpha‑2 codes (at the time of writing, fr is the only split listed under Splits):
- France: fr
- United States: us
- United Kingdom: gb
- Germany: de
- Japan: jp
- Brazil: br
Ethical Considerations
- Privacy: Ensure all personal information is properly anonymized.
- Bias: Be aware of potential biases in source materials and in the selection of laws.
- Timeliness: Laws evolve; always verify that the version you use is up‑to‑date.
- Jurisdiction: Legal interpretation may vary across jurisdictions. AI models trained on this data should not replace professional legal advice.
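For the timeliness point, one practical step is to flag records whose last update is old enough that they should be checked against an official source. The following is a minimal sketch. It assumes timestamps are ISO 8601 strings that datetime.fromisoformat can parse, and it treats values without a time zone as UTC. The card does not specify the timestamp format. The helper name and the one-year threshold are arbitrary. An old timestamp does not prove that a text is outdated; it only marks the record for manual re-verification.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Mapping


def stale_records(
    records: Iterable[Mapping[str, str]],
    as_of: datetime,
    max_age: timedelta = timedelta(days=365),
) -> Iterator[Mapping[str, str]]:
    """Yield records whose timestamp is older than as_of - max_age.

    as_of must be timezone-aware. Naive timestamps are assumed to be UTC.
    """
    cutoff = as_of - max_age
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        if ts < cutoff:
            yield rec
```

On Python versions before 3.11, fromisoformat rejects a trailing "Z", so such timestamps would need to be normalized first.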
Citation
- If you use this dataset in your research, please cite the following BibTeX entry:
@misc{HFforLegal2024,
author = {Louis Brulé Naudet},
title = {The Laws, centralizing legal texts for better use},
year = {2024},
howpublished = {\url{https://huggingface.co/datasets/HFforLegal/laws}}
}
Source
Organization: hugging_face