HFforLegal/laws

The Laws dataset is a collection of legal texts from various countries, intended to support the development of legal AI models by providing a standardized, easily accessible global corpus of legal documents. Each record has five features: book name, document content, timestamp, ID, and hash value. The dataset is organized by country, using ISO 3166-1 alpha-2 codes to identify the legal documents of each nation. The dataset card also discusses ethical considerations such as privacy, bias, timeliness, and jurisdictional issues.

Updated 9/13/2024

hugging_face

Description

Dataset Overview

Dataset Information

Features:
- book: name or code of the legal book (e.g., "Civil Code", "Penal Code")
- document: full text of the legal document
- timestamp: timestamp of enactment or last update
- id: identifier of each document
- hash: SHA‑256 hash of the document for verification purposes
Splits:
- fr: contains 153,005 samples, total size 151,400,300 bytes
Download Size: 64,396,801 bytes
Dataset Size: 151,400,300 bytes
Configuration:
- default: loads the fr split using the data/fr-* path
License: cc-by-4.0
Task Types:
- Question Answering
- Text Generation
- Table Question Answering
Language: French
Tags:
- Law
- Finance
- Taxation
- δεξιά
- recht
- derecho
Name: The Laws, centralizing legal texts for better use
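The configuration above maps the default config's fr split to files matching data/fr-*. The sketch below shows one way to load a split with the Hugging Face `datasets` library. It is a minimal illustration, not an official loader: the helper names `data_files_for` and `load_laws` are hypothetical, and it assumes (without confirmation from the card) that any future country split would follow the same data/<code>-* layout as fr.

```python
import re

# Split names are lowercase ISO 3166-1 alpha-2 codes, e.g. "fr".
_SPLIT_RE = re.compile(r"[a-z]{2}")


def data_files_for(split: str) -> dict[str, str]:
    """Build a `data_files` mapping for one country split.

    ASSUMPTION (hypothetical helper): every split is stored as data/<code>-*,
    like the fr split in the card's default configuration.
    """
    if not _SPLIT_RE.fullmatch(split):
        raise ValueError(f"expected a lowercase two-letter split code, got {split!r}")
    return {split: f"data/{split}-*"}


def load_laws(split: str = "fr", streaming: bool = True):
    """Load one split of HFforLegal/laws (hypothetical helper).

    `datasets` is imported lazily so the rest of this module works without it.
    With streaming=True, records are read lazily from the Hub instead of
    downloading the whole split (about 64 MB for fr) up front.
    """
    from datasets import load_dataset  # requires `pip install datasets`

    return load_dataset(
        "HFforLegal/laws",
        data_files=data_files_for(split),
        split=split,
        streaming=streaming,
    )
```

With network access, `next(iter(load_laws("fr")))` should return a single record dict with the keys book, document, timestamp, id, and hash. For the fr split alone, `load_dataset("HFforLegal/laws", split="fr")` relies on the default configuration directly and needs no `data_files` argument.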

Objectives

Centralize legal texts worldwide in a common format to facilitate:
1. Comparative legal research
2. Development of multilingual legal AI models
3. Cross‑jurisdictional legal research
4. Improvement of legal tech tools

Dataset Structure

book: name or code of the legal book
document: full text of the legal document
timestamp: timestamp of enactment or last update
id: identifier of each document
hash: SHA‑256 hash of the document for verification purposes
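Because each record carries a SHA-256 hash of its document, a record can be checked for corruption or accidental edits by recomputing that hash. The sketch below assumes the hash was computed over the UTF-8 bytes of the `document` field alone and stored as a hex string. The card does not state the exact input or encoding, so confirm this against a few real records before relying on it. `sha256_hex` and `verify_record` are illustrative names.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex-encoded SHA-256 digest of the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_record(record: dict) -> bool:
    """Return True if record["hash"] matches the SHA-256 of record["document"].

    ASSUMPTION: the stored hash covers only the UTF-8 bytes of the `document`
    field and is hex-encoded; case and surrounding whitespace are ignored.
    """
    return sha256_hex(record["document"]) == record["hash"].strip().lower()
```

If `verify_record` returns False for every record, the assumption about what was hashed is probably wrong (for example, a different normalization), not the data.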

Country Splits

The dataset is organized by country, with each split named by its lowercase ISO 3166-1 alpha-2 code (the metadata above lists only the fr split):
- France: fr
- United States: us
- United Kingdom: gb
- Germany: de
- Japan: jp
- Brazil: br
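Because split names are ISO codes rather than country names, a small lookup avoids mistakes such as requesting "uk" for the United Kingdom, whose ISO 3166-1 alpha-2 code is GB. The table below simply mirrors the list above. `COUNTRY_SPLITS` and `split_for` are hypothetical names, and a split appearing in this table does not guarantee that it is populated.

```python
# Country -> split name, mirroring the card's list (lowercase ISO 3166-1 alpha-2).
COUNTRY_SPLITS = {
    "France": "fr",
    "United States": "us",
    "United Kingdom": "gb",
    "Germany": "de",
    "Japan": "jp",
    "Brazil": "br",
}


def split_for(country_or_code: str) -> str:
    """Resolve a country name or an ISO alpha-2 code (any case) to a split name."""
    key = country_or_code.strip()
    if key in COUNTRY_SPLITS:
        return COUNTRY_SPLITS[key]
    code = key.lower()
    if code in COUNTRY_SPLITS.values():
        return code
    # e.g. "UK" lands here: it is not the ISO code for the United Kingdom.
    raise KeyError(f"no split listed for {country_or_code!r}")
```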

Ethical Considerations

Privacy: Ensure all personal information is properly anonymized.
Bias: Be aware of potential biases in source materials and in the selection of laws.
Timeliness: Laws evolve; always verify that the version you use is up‑to‑date.
Jurisdiction: Legal interpretation may vary across jurisdictions. AI models trained on this data should not replace professional legal advice.

Citation

If you use this dataset in your research, please cite the following BibTeX entry:

@misc{HFforLegal2024,
  author = {Louis Brulé Naudet},
  title = {The Laws, centralizing legal texts for better use},
  year = {2024},
  howpublished = {\url{https://huggingface.co/datasets/HFforLegal/laws}}
}

Topics

Legal Texts

AI Models

Source

Organization: hugging_face