HFforLegal/laws

The Laws dataset is a collection of legal texts from various countries, intended to support legal AI model development by providing a standardized, easily accessible global corpus of legal documents. Each record carries five fields: book name, document content, timestamp, ID, and hash value. The dataset is organized by country, with ISO 3166‑1 alpha‑2 codes identifying each nation's legal documents. The dataset card also raises ethical considerations around privacy, bias, timeliness, and jurisdiction.

Updated 9/13/2024
hugging_face

Description

Dataset Overview

Dataset Information

  • Features:
    • book: name or code of the legal book (e.g., "Civil Code", "Penal Code")
    • document: full text of the legal document
    • timestamp: timestamp of enactment or last update
    • id: identifier of each document
    • hash: SHA‑256 hash of the document for verification purposes
  • Splits:
    • fr: contains 153,005 samples, total size 151,400,300 bytes
  • Download Size: 64,396,801 bytes
  • Dataset Size: 151,400,300 bytes
  • Configuration:
    • default: loads the fr split using the data/fr-* path
  • License: cc‑by‑4.0
  • Task Types:
    • Question Answering
    • Text Generation
    • Table Question Answering
  • Language: French
  • Tags:
    • Law
    • Finance
    • Taxation
    • δεξιά
    • recht
    • derecho
  • Name: The Laws, centralizing legal texts for better use
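
As a minimal sketch of how the information above might be used, the snippet below loads the fr split with the Hugging Face `datasets` library and checks that a record carries the five listed fields. It assumes `datasets` is installed and the Hub is reachable. The helper names (`LawRecord`, `missing_fields`, `load_laws`) are hypothetical, and the field types are an assumption, since the card lists field names but not their types.

```python
from __future__ import annotations

from typing import TypedDict

REPO_ID = "HFforLegal/laws"  # repository name as given in this card

# Field names come from the feature list; the str types are an assumption.
class LawRecord(TypedDict):
    book: str
    document: str
    timestamp: str
    id: str
    hash: str

EXPECTED_FIELDS = ("book", "document", "timestamp", "id", "hash")


def missing_fields(record: dict) -> list[str]:
    """Return the expected field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def load_laws(split: str = "fr"):
    """Load one country split from the Hub (needs network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

A call such as `laws = load_laws("fr")` followed by `missing_fields(laws[0])` would then report any schema drift between the card and the published data.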

Objectives

  • Centralize legal texts worldwide in a common format to facilitate:
    1. Comparative legal research
    2. Development of multilingual legal AI models
    3. Cross‑jurisdictional legal research
    4. Improvement of legal tech tools

Dataset Structure

  • book: name or code of the legal book
  • document: full text of the legal document
  • timestamp: timestamp of enactment or last update
  • id: identifier of each document
  • hash: SHA‑256 hash of the document for verification purposes
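
The hash field lets a consumer check that a document has not been altered. The sketch below shows one way to do that with Python's standard `hashlib`. It assumes the stored value is the hexadecimal SHA-256 digest of the UTF-8 encoded `document` text; the card does not state the encoding or the exact bytes that were hashed, so a mismatch may indicate a different convention rather than corruption. The function names and the sample record are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of the text, encoded as UTF-8 (an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches_hash(record: dict) -> bool:
    """Compare the recomputed digest with the record's stored hash."""
    stored = record["hash"].strip().lower()
    return sha256_hex(record["document"]) == stored


if __name__ == "__main__":
    # Hypothetical record built for illustration, not taken from the dataset.
    text = "Article 1. Sample provision."
    record = {"document": text, "hash": sha256_hex(text)}
    print(matches_hash(record))

    # Any change to the text changes the digest, so the check fails.
    tampered = dict(record, document=text + " ")
    print(matches_hash(tampered))
```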

Country Splits

  • The dataset is organized by country using ISO 3166‑1 alpha‑2 codes:
    • France: fr
    • United States: us
    • United Kingdom: gb
    • Germany: de
    • Japan: jp
    • Brazil: br
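
The country-to-split mapping above can be written down as a small lookup table. The following sketch is illustrative only: the function names are hypothetical, the shape check confirms only that a code is two lowercase ASCII letters (not that it is an assigned ISO 3166‑1 code), and the `data/<code>-*` pattern generalizes the `data/fr-*` path from the configuration, which is an assumption for every split other than fr. The split list in this card mentions only fr, so the other codes may not correspond to published data.

```python
import re

# Codes exactly as listed in the card.
COUNTRY_SPLITS = {
    "France": "fr",
    "United States": "us",
    "United Kingdom": "gb",
    "Germany": "de",
    "Japan": "jp",
    "Brazil": "br",
}

_ALPHA2_SHAPE = re.compile(r"[a-z]{2}")


def is_alpha2_shaped(code: str) -> bool:
    """True if the code is two lowercase ASCII letters (shape check only)."""
    return _ALPHA2_SHAPE.fullmatch(code) is not None


def split_for(country: str) -> str:
    """Look up the split name for a country listed in the card."""
    try:
        return COUNTRY_SPLITS[country]
    except KeyError:
        raise KeyError(f"No split listed for {country!r}") from None


def data_glob(code: str) -> str:
    """Build a data-file pattern modeled on the card's data/fr-* path."""
    if not is_alpha2_shaped(code):
        raise ValueError(f"Not a two-letter lowercase code: {code!r}")
    return f"data/{code}-*"
```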

Ethical Considerations

  • Privacy: Ensure all personal information is properly anonymized.
  • Bias: Be aware of potential biases in source materials and in the selection of laws.
  • Timeliness: Laws evolve; always verify that the version you use is up‑to‑date.
  • Jurisdiction: Legal interpretation may vary across jurisdictions. AI models trained on this data should not replace professional legal advice.

Citation

  • If you use this dataset in your research, please cite it with the following BibTeX entry:
@misc{HFforLegal2024,
  author = {Louis Brulé Naudet},
  title = {The Laws, centralizing legal texts for better use},
  year = {2024},
  howpublished = {\url{https://huggingface.co/datasets/HFforLegal/laws}}
}


