mwritescode/slither-audited-smart-contracts
The dataset contains the source code and deployed bytecode of verified Solidity smart contracts, together with a classification of their vulnerabilities produced by the Slither static analysis framework. Supported tasks include text classification, text generation, and image classification. It was created to provide a large-scale open dataset for detecting and classifying vulnerabilities in verified Solidity contracts. Annotations are in English; source code is Solidity. Contract lists were taken from Smart Contract Sanctuary, source code and bytecode were downloaded from Etherscan, and every contract was analyzed with Slither.
Description
Dataset Overview
Dataset Name
- Name: Slither Audited Smart Contracts
- Alias: Slither Audited Smart Contracts Dataset
Basic Information
- Language: English
- License: MIT
- Multilinguality: Monolingual
- Size: 100K < n < 1M
- Source: Raw data
- Task Types: Text classification, Text generation
- Task IDs: Multi‑label classification, Multi‑input text classification, Language modeling
Description
- Overview: The dataset contains source code and deployed bytecode of verified Solidity smart contracts, along with their vulnerabilities as classified by the Slither static analysis framework.
- Supported Tasks:
- Text classification: Train models for binary and multi‑label classification of contract bytecode and source code.
- Text generation: Train language models for the Solidity programming language.
- Image classification: Convert bytecode to RGB images for CNN‑based vulnerability detection.
- Language: Annotations are in English; all source code is Solidity.
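The image classification task listed above relies on turning bytecode into RGB images. The dataset card does not say how that conversion is done, so the following is only a minimal sketch under stated assumptions: each byte becomes one colour channel value, three consecutive bytes form one pixel, and the image width (`width`, a hypothetical parameter) is fixed, with the last row zero-padded.

```python
import numpy as np


def bytecode_to_rgb(bytecode_hex: str, width: int = 32) -> np.ndarray:
    """Map hex-encoded bytecode to a (height, width, 3) uint8 array.

    Assumed layout (not taken from the dataset card): bytes are read in
    order, three per pixel, and the final row is padded with zeros.
    """
    # Drop an optional "0x" prefix before decoding the hex string.
    if bytecode_hex.startswith(("0x", "0X")):
        bytecode_hex = bytecode_hex[2:]
    raw = np.frombuffer(bytes.fromhex(bytecode_hex), dtype=np.uint8)

    row_bytes = width * 3
    # Ceiling division; keep at least one row even for empty bytecode.
    height = max(1, -(-raw.size // row_bytes))

    padded = np.zeros(height * row_bytes, dtype=np.uint8)
    padded[: raw.size] = raw
    return padded.reshape(height, width, 3)


# Five bytes at width 2 give a single row of two pixels.
image = bytecode_to_rgb("0x6080604052", width=2)
print(image.shape)
```

A fixed width keeps all images the same number of columns, but the height still varies with contract size, so a CNN pipeline would usually resize or crop the result afterwards.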
Structure
- Instances: Each instance includes address, source code, and bytecode. Labels are provided in two configurations: a cleaned textual version of Slither output, and a multi‑label version consisting of integer lists representing specific vulnerability classes. Label 4 indicates contract safety.
- Fields:
- address: string, Ethereum main-net contract address.
- source_code: flattened Solidity code.
- bytecode: string, contract bytecode obtained via web3.eth.getCode().
- slither: cleaned JSON output from Slither or list of class labels, depending on the configuration.
- Splits: Six configurations exist; those without the all- prefix provide train, test, and validation splits, with test and validation each comprising ~15% of the total data.
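For multi-label training, the integer label lists described above are usually turned into fixed-length multi-hot vectors. The sketch below illustrates this on a hypothetical record. Only "label 4 means safe" comes from the card; the total of six classes (`NUM_CLASSES`), the example address, and the rule that a contract counts as safe only when 4 is its sole label are assumptions made for illustration.

```python
from typing import List

# Assumption: class ids run from 0 to 5. The card only states that 4 = safe.
NUM_CLASSES = 6
SAFE_LABEL = 4


def to_multi_hot(labels: List[int], num_classes: int = NUM_CLASSES) -> List[int]:
    """Convert a list of class ids into a 0/1 vector of length num_classes."""
    vector = [0] * num_classes
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        vector[label] = 1
    return vector


def is_safe(labels: List[int]) -> bool:
    """Treat a contract as safe only if 'safe' is its single label (assumed rule)."""
    return list(labels) == [SAFE_LABEL]


# Hypothetical record using the field names from the list above.
record = {
    "address": "0x0000000000000000000000000000000000000000",
    "source_code": "pragma solidity ^0.8.0; contract C {}",
    "bytecode": "0x6080604052",
    "slither": [1, 3],
}

print(to_multi_hot(record["slither"]))
print(is_safe(record["slither"]))
```

The same function can be mapped over every record of a multi-label configuration before feeding the vectors to a loss such as binary cross-entropy.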
Creation
- Rationale: Provide a large, free dataset for detecting and classifying verified Solidity contract vulnerabilities.
- Source: Built from verified contract lists provided by Smart Contract Sanctuary; source code and bytecode were downloaded from Etherscan using Web3.py and analyzed with Slither.
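The bytecode retrieval step can be sketched as follows. This is not the creator's collection script, only an illustration under assumptions: `w3` is expected to be a connected `web3.Web3` instance, and the snake_case `eth.get_code` method of recent Web3.py releases is used in place of the older `getCode` spelling. The helper name `fetch_bytecode` is hypothetical.

```python
def fetch_bytecode(w3, address: str) -> str:
    """Return the deployed bytecode at `address` as a 0x-prefixed hex string.

    `w3` is assumed to be a connected web3.Web3 instance; any object that
    exposes `eth.get_code(address)` and returns bytes works the same way.
    """
    code = w3.eth.get_code(address)
    # Normalise to plain bytes first so the prefix handling does not depend
    # on the return type of a particular Web3.py version.
    return "0x" + bytes(code).hex()


# Usage sketch (needs the web3 package and a node endpoint; the URL is a
# placeholder, not a real service):
#
#   from web3 import Web3
#   w3 = Web3(Web3.HTTPProvider("https://example.invalid/rpc"))
#   bytecode = fetch_bytecode(w3, contract_address)
```

An address with no deployed code returns empty bytes, so the helper yields just `"0x"` in that case, which is worth filtering out before analysis.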
Additional Information
- Creator: Martina Rossini
- License: Except for the Solidity source code, all files are under the MIT license.
- Citation: If used in research, cite as follows:
@misc{rossini2022slitherauditedcontracts,
title = {Slither Audited Smart Contracts Dataset},
author={Martina Rossini},
year={2022}
}
Source
Organization: hugging_face
Created: Unknown