Zellic/smart-contract-fiesta

The Zellic 2023 Smart Contract Source Index dataset is a publicly available collection of Ethereum mainnet smart contract source code, intended as an easily downloadable resource for advancing smart contract security research. It includes an index of addresses and bytecode hashes for all contracts deployed up to block 16860349, along with source code gathered from public resources. The dataset de‑duplicates source code by bytecode hash and supplies organized contract directories and metadata.

Updated 4/23/2023

Description

Dataset Overview

Dataset Name

  • Name: Zellic 2023 Smart Contract Source Index
  • Alias: Zellic Smart Contract Source Index

Dataset Description

  • Purpose: Provide a publicly downloadable Ethereum mainnet smart contract source code dataset to advance smart contract security research.
  • Applications: Static analysis, machine learning, and related research uses.

Dataset Content

  • Methodology:
    • Collect all contract addresses deployed on the Ethereum mainnet and their EVM bytecode Keccak256 hashes.
    • Build the index by fully syncing from the genesis block using a modified Geth instance.
    • De‑duplicate source code based on bytecode hash.
  • Statistics:
    • Unique Source Codes: 149,386
    • Contracts with Code: 3,897,319
    • Total Smart Contracts in Global Index: 30,586,657
    • Character Count: 6,473,548,073
    • Word Count: 712,444,206
    • Lines of Code: 90,562,628
    • Comment Lines: 62,503,873
    • Blank Lines: 24,485,549
    • Total Lines: 177,552,050
    • Unique Words: 939,288
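
The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the numbers listed; the variable names are illustrative, and the coverage ratio assumes that "Contracts with Code" counts contracts whose source code is included in the dataset.

```python
# Figures copied from the Statistics list above.
unique_sources = 149_386
contracts_with_code = 3_897_319
total_contracts = 30_586_657
code_lines = 90_562_628
comment_lines = 62_503_873
blank_lines = 24_485_549
total_lines = 177_552_050

# Code, comment, and blank lines should add up to the reported total.
assert code_lines + comment_lines + blank_lines == total_lines

# Share of indexed contracts that have source code available (assumed meaning).
source_coverage = contracts_with_code / total_contracts

# Average number of deployed contracts per unique source entry.
contracts_per_source = contracts_with_code / unique_sources

print(f"Source coverage: {source_coverage:.2%}")
print(f"Contracts per unique source: {contracts_per_source:.1f}")
```

Under that assumption, roughly 12.7% of the indexed contracts have source code, and each unique source entry corresponds to about 26 deployed contracts on average, which illustrates how much de-duplication by bytecode hash reduces the dataset.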

Dataset Structure

  • Index:
    • Filename: address_bytecodehash_index
    • Content: Mapping of all deployed contract addresses to the Keccak256 hash of their EVM bytecode.
  • Contract Source Code:
    • Storage Location: Under the organized_contracts directory, organized by bytecode hash.
    • Contents: Source files and metadata.json (includes compiler version, optimization settings, etc.).
    • Source Formats: Single‑file, multi‑file, Solidity compiler JSON input.
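
Because the index maps addresses to bytecode hashes, grouping it by hash shows which deployed contracts share a single source entry. The sketch below illustrates the idea on in-memory pairs. The on-disk format of address_bytecodehash_index is not described here, so loading it is left out; the function name and the sample values are hypothetical placeholders, not real addresses or hashes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_addresses_by_hash(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each bytecode hash to every address deployed with that bytecode."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for address, bytecode_hash in pairs:
        # Lowercase both values so differently cased hex strings compare equal.
        groups[bytecode_hash.lower()].append(address.lower())
    return dict(groups)


# Hypothetical, shortened placeholder values (not real addresses or hashes).
sample_pairs = [
    ("0xA001", "0xbeef"),
    ("0xA002", "0xBEEF"),  # same bytecode hash as the first entry
    ("0xA003", "0xcafe"),
]

groups = group_addresses_by_hash(sample_pairs)
print(len(sample_pairs), "addresses ->", len(groups), "unique bytecode hashes")
```

Each key in the result would correspond to one directory under organized_contracts, so the number of keys that have source code available is what the "Unique Source Codes" figure counts.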

Additional Information

  • Contract Languages: Not limited to Solidity; includes Vyper and other languages.
  • Source Extraction: A Bash script is provided to extract all source code.
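
As an alternative to the provided Bash script, the contract directories can also be walked programmatically. The following is a rough sketch, assuming only what is stated above: each contract directory contains a metadata.json alongside its source files. The function names are hypothetical, and no particular metadata keys are assumed, so the parsed JSON is returned as-is.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def iter_contract_metadata(root: Path) -> Iterator[Tuple[Path, Dict]]:
    """Yield (contract_directory, parsed metadata) for each metadata.json under root."""
    for metadata_path in sorted(root.rglob("metadata.json")):
        with metadata_path.open(encoding="utf-8") as handle:
            yield metadata_path.parent, json.load(handle)


def list_source_files(contract_dir: Path) -> List[Path]:
    """Return every file below a contract directory except metadata.json."""
    return sorted(
        path
        for path in contract_dir.rglob("*")
        if path.is_file() and path.name != "metadata.json"
    )
```

A typical use would be to call `iter_contract_metadata(Path("organized_contracts"))` from the dataset root and pass each yielded directory to `list_source_files`, for example to filter contracts by compiler version before running an analysis.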

Topics

Smart Contracts
Blockchain Security

Source

Organization: hugging_face

Created: Unknown
