Zellic/smart-contract-fiesta
The Zellic 2023 Smart Contract Source Index dataset is a publicly available collection of Ethereum mainnet smart contract source code, intended as an easily downloadable resource for smart contract security research. It includes an index of addresses and bytecode hashes for all contracts deployed up to block 16860349, along with source code gathered from public resources. Source code is de‑duplicated by bytecode hash and stored in organized contract directories with accompanying metadata.
Description
Dataset Overview
Dataset Name
- Name: Zellic 2023 Smart Contract Source Index
- Alias: Zellic Smart Contract Source Index
Dataset Description
- Purpose: Provide a publicly downloadable Ethereum mainnet smart contract source code dataset to advance smart contract security research.
- Applications: Static analysis, machine learning, and similar research uses.
Dataset Content
- Methodology:
  - Collect all contract addresses deployed on the Ethereum mainnet and their EVM bytecode Keccak256 hashes.
  - Build the index by fully syncing from the genesis block using a modified Geth instance.
  - De‑duplicate source code based on bytecode hash.
- Statistics:
  - Unique Source Codes: 149,386
  - Contracts with Code: 3,897,319
  - Total Smart Contracts in Global Index: 30,586,657
  - Character Count: 6,473,548,073
  - Word Count: 712,444,206
  - Lines of Code: 90,562,628
  - Comment Lines: 62,503,873
  - Blank Lines: 24,485,549
  - Total Lines: 177,552,050
  - Unique Words: 939,288
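The de‑duplication step in the methodology can be illustrated with a minimal sketch. It assumes the index has already been loaded as (address, bytecode hash) pairs; the on-disk format of the index is not described here, so the function name `group_by_bytecode_hash` and the sample values are hypothetical placeholders, not real contracts or the dataset's own tooling.

```python
from collections import defaultdict


def group_by_bytecode_hash(pairs):
    """Group contract addresses by the hash of their bytecode.

    `pairs` is any iterable of (address, bytecode_hash) strings.
    Contracts with identical bytecode share a hash, so each key in the
    result stands for one unique piece of deployed code.
    """
    groups = defaultdict(list)
    for address, bytecode_hash in pairs:
        # Normalize case so the same hex string is not counted twice.
        groups[bytecode_hash.lower()].append(address.lower())
    return dict(groups)


# Placeholder values for illustration only (not real addresses or hashes).
sample_pairs = [
    ("0xA1", "0x11"),
    ("0xA2", "0x22"),
    ("0xA3", "0x11"),  # same bytecode hash as 0xA1, so it is a duplicate
]

groups = group_by_bytecode_hash(sample_pairs)
unique_count = len(groups)
```

In this sketch, three deployed contracts collapse to two unique bytecode hashes, which mirrors how far more indexed contracts than unique source codes appear in the statistics above.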
Dataset Structure
- Index:
  - Filename: address_bytecodehash_index
  - Content: Mapping of all deployed contract addresses to the Keccak256 hash of their EVM bytecode.
- Contract Source Code:
  - Storage Location: Under the organized_contracts directory, organized by bytecode hash.
  - Contents: Source files and metadata.json (includes compiler version, optimization settings, etc.).
  - Source Formats: Single‑file, multi‑file, Solidity compiler JSON input.
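As a rough sketch of how the directory structure above might be traversed, the following walks a local copy of organized_contracts and yields every directory that contains a metadata.json file, together with the parsed metadata and the names of the other files beside it. The exact nesting depth and the metadata field names are not specified here, so the code assumes neither; the helper name `iter_contract_dirs` is hypothetical.

```python
import json
import os


def iter_contract_dirs(root):
    """Yield (directory, metadata, source_files) for each contract directory.

    A contract directory is taken to be any directory under `root` that
    contains a metadata.json file. This is an assumption about the layout,
    not a documented guarantee.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if "metadata.json" not in filenames:
            continue
        with open(os.path.join(dirpath, "metadata.json"), encoding="utf-8") as fh:
            metadata = json.load(fh)
        # Everything else in the directory is treated as source material.
        source_files = sorted(f for f in filenames if f != "metadata.json")
        yield dirpath, metadata, source_files
```

A caller could iterate with `for path, metadata, sources in iter_contract_dirs("organized_contracts"):` and inspect `metadata` to find the compiler version and optimization settings before choosing how to parse the sources.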
Additional Information
- Contract Languages: Not limited to Solidity; includes Vyper and other languages.
- Source Extraction: A Bash script is provided to extract all source code.
Source
Organization: hugging_face
Created: Unknown