Back to datasets
Dataset assetOpen Source CommunityBlockchain SecuritySmart Contracts
Zellic/smart-contract-fiesta
The Zellic 2023 Smart Contract Source Index dataset is a publicly available collection of Ethereum main‑net smart contract source code, intended to provide an easily downloadable resource that advances smart contract security research. It includes address and bytecode hash indices for all deployed contracts up to block 16860349, along with source code gathered from public resources. The dataset de‑duplicates source code by bytecode hash and supplies organized contract directories and metadata.
Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 23, 2023
Signals
209 views
Availability
Linked source ready
Overview
Dataset description and usage context
Dataset Overview
Dataset Name
- Name: Zellic 2023 Smart Contract Source Index
- Alias: Zellic Smart Contract Source Index
Dataset Description
- Purpose: Provide a publicly downloadable Ethereum mainnet smart contract source code dataset to advance smart contract security research.
- Applications: Includes static analysis, machine learning, etc.
Dataset Content
- Methodology:
- Collect all contract addresses deployed on the Ethereum mainnet and their EVM bytecode Keccak256 hashes.
- Build the index by fully syncing from the genesis block using a modified Geth instance.
- De‑duplicate source code based on bytecode hash.
- Statistics:
- Unique Source Codes: 149,386
- Contracts with Code: 3,897,319
- Total Smart Contracts in Global Index: 30,586,657
- Character Count: 6,473,548,073
- Word Count: 712,444,206
- Lines of Code: 90,562,628
- Comment Lines: 62,503,873
- Blank Lines: 24,485,549
- Total Lines: 177,552,050
- Unique Words: 939,288
Dataset Structure
- Index:
- Filename:
address_bytecodehash_index - Content: Mapping of all deployed contract addresses to the Keccak256 hash of their EVM bytecode.
- Filename:
- Contract Source Code:
- Storage Location: Under the
organized_contractsdirectory, organized by bytecode hash. - Contents: Source files and
metadata.json(includes compiler version, optimization settings, etc.). - Source Formats: Single‑file, multi‑file, Solidity compiler JSON input.
- Storage Location: Under the
Additional Information
- Contract Languages: Not limited to Solidity; includes Vyper and other languages.
- Source Extraction: A Bash script is provided to extract all source code.
Need downstream help?
Pair the dataset with AI analysis and content workflows.
Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.