HFforLegal/laws
The Laws dataset is a collection of legal texts from various countries, intended to improve legal AI model development by providing a standardized, easily accessible global corpus of legal documents. The dataset includes features such as book name, document content, timestamp, ID, and hash value. It is organized by country, using ISO 3166‑1 alpha‑2 codes to identify the legal documents of each nation. Additionally, the dataset addresses ethical considerations such as privacy, bias, timeliness, and jurisdictional issues.
Source
hugging_face
Created
Nov 28, 2025
Updated
Sep 13, 2024
Dataset Overview
Dataset Information
- Features:
- book: name or code of the legal book (e.g., "Civil Code", "Penal Code")
- document: full text of the legal document
- timestamp: timestamp of enactment or last update
- id: identifier of each document
- hash: SHA‑256 hash of the document, for verification purposes
- Splits:
fr: contains 153,005 samples, total size 151,400,300 bytes
- Download Size: 64,396,801 bytes
- Dataset Size: 151,400,300 bytes
- Configuration:
default: loads the fr split using the data/fr-* path
- License: cc-by-4.0
- Task Types:
- Question Answering
- Text Generation
- Table Question Answering
- Language: French
- Tags:
- Law
- Finance
- Taxation
- δεξιά
- recht
- derecho
- Name: The Laws, centralizing legal texts for better use
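The configuration described above can be exercised with the Hugging Face `datasets` library. The following is a minimal sketch, assuming the `datasets` package is installed and that the repository exposes the `fr` split under its default configuration as listed. The helper name `load_laws_split` is hypothetical and not part of the dataset.

```python
REPO_ID = "HFforLegal/laws"  # repository name as given on this page


def load_laws_split(country_code: str = "fr", streaming: bool = True):
    """Load one country split of the dataset (hypothetical helper).

    Assumes the `datasets` package is installed and that the split named
    by `country_code` exists on the Hub; only `fr` is listed with data above.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    # streaming=True iterates over records without downloading
    # the full archive (about 64 MB) up front.
    return load_dataset(REPO_ID, split=country_code, streaming=streaming)


# Example usage (requires network access):
#   laws = load_laws_split("fr")
#   first = next(iter(laws))
#   print(first["book"], first["id"])
```

Streaming is used here only as a convenience for a quick look at the records; passing `streaming=False` would download the split and return a regular in-memory dataset instead.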
Objectives
- Centralize legal texts worldwide in a common format to facilitate:
- Comparative legal research
- Development of multilingual legal AI models
- Cross‑jurisdictional legal research
- Improvement of legal tech tools
Dataset Structure
- book: name or code of the legal book
- document: full text of the legal document
- timestamp: timestamp of enactment or last update
- id: identifier of each document
- hash: SHA‑256 hash of the document, for verification purposes
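Since each record carries a SHA‑256 hash of its document for verification, an integrity check could look like the sketch below. It assumes the stored value is a hexadecimal digest of the UTF-8 encoded document text, which this page does not confirm. The function names and the sample record are illustrative only.

```python
import hashlib


def document_sha256(text: str) -> str:
    """Return the hex SHA-256 digest of a document's UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_record(record: dict) -> bool:
    """Check a record's `hash` field against its `document` field.

    Assumption: the stored hash is a hex digest of the UTF-8 encoded text.
    """
    return document_sha256(record["document"]) == record["hash"].lower()


# Toy record with made-up content, not taken from the dataset.
sample = {"document": "Article 1", "hash": document_sha256("Article 1")}
print(verify_record(sample))                                   # True
print(verify_record({**sample, "document": "Article 1 bis"}))  # False
```

If real records fail this check, the encoding or digest format assumed here is probably not the one used when the dataset was built.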
Country Splits
- The dataset is organized by country using ISO 3166‑1 alpha‑2 codes:
- France: fr
- United States: us
- United Kingdom: gb
- Germany: de
- Japan: jp
- Brazil: br
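The country-to-split convention above can be captured in a small lookup table. This is a sketch only: the mapping copies the six codes listed here, the names `COUNTRY_SPLITS`, `split_for`, and `is_alpha2` are hypothetical, and the check covers the shape of a code rather than its membership in ISO 3166‑1.

```python
# Mapping copied from the list above; other countries are not covered here.
COUNTRY_SPLITS = {
    "France": "fr",
    "United States": "us",
    "United Kingdom": "gb",
    "Germany": "de",
    "Japan": "jp",
    "Brazil": "br",
}


def split_for(country: str) -> str:
    """Return the split name for a listed country; raise KeyError otherwise."""
    return COUNTRY_SPLITS[country]


def is_alpha2(code: str) -> bool:
    """Shape check only: two lowercase ASCII letters, as the splits are written."""
    return len(code) == 2 and code.isascii() and code.isalpha() and code.islower()


print(split_for("United Kingdom"))  # gb
print(is_alpha2("fra"))             # False
```

A listed code does not guarantee that the corresponding split has been published; the dataset information on this page reports data for `fr` only.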
Ethical Considerations
- Privacy: Ensure all personal information is properly anonymized.
- Bias: Be aware of potential biases in source materials and in the selection of laws.
- Timeliness: Laws evolve; always verify that the version you use is up‑to‑date.
- Jurisdiction: Legal interpretation may vary across jurisdictions. AI models trained on this data should not replace professional legal advice.
Citation
- If you use this dataset in your research, please cite the following BibTeX entry:
@misc{HFforLegal2024,
author = {Louis Brulé Naudet},
title = {The Laws, centralizing legal texts for better use},
year = {2024},
howpublished = {\url{https://huggingface.co/datasets/HFforLegal/laws}}
}