The Zellic 2023 Smart Contract Source Index dataset is a publicly available collection of Ethereum mainnet smart contract source code, intended as an easily downloadable resource for smart contract security research. It includes address and bytecode hash indices for all contracts deployed up to block 16860349, along with source code gathered from public sources. The dataset de-duplicates source code by bytecode hash and supplies organized contract directories and metadata.
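De-duplication by bytecode hash can be illustrated with a minimal Python sketch. It is not the dataset's actual tooling: the function names, the (address, bytecode, source) record layout, and the use of SHA-256 as the hash are all assumptions made for illustration, and the dataset may use a different hash and on-disk layout.

```python
import hashlib
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (address, deployed bytecode as hex, source code).
Contract = Tuple[str, str, str]


def bytecode_hash(bytecode_hex: str) -> str:
    """Hash deployed bytecode given as a hex string (optional 0x prefix)."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    raw = bytes.fromhex(bytecode_hex)
    # Assumption: SHA-256 stands in for whichever hash the dataset uses.
    return hashlib.sha256(raw).hexdigest()


def dedupe_by_bytecode(
    contracts: Iterable[Contract],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build an address -> hash index and keep one source per bytecode hash."""
    address_index: Dict[str, str] = {}
    sources_by_hash: Dict[str, str] = {}
    for address, bytecode_hex, source in contracts:
        digest = bytecode_hash(bytecode_hex)
        address_index[address] = digest
        # Keep only the first source seen for identical bytecode.
        sources_by_hash.setdefault(digest, source)
    return address_index, sources_by_hash
```

Every address stays in the index, so a contract can still be looked up by address even when its source is stored only once under the shared bytecode hash.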
A separate dataset contains the source code and deployed bytecode of verified Solidity smart contracts, with their vulnerabilities classified by the Slither static analysis framework. Supported tasks include text classification, text generation, and image classification. It was created to provide a large-scale open dataset for detecting and classifying vulnerabilities in verified Solidity contracts. Accompanying text is in English, and the source code is Solidity. The dataset description covers its data instances, fields, and splits. Data were collected from sources including Smart Contract Sanctuary and Etherscan, then analyzed with Slither.