mwritescode/slither-audited-smart-contracts

The dataset contains the source code and deployed bytecode of verified Solidity smart contracts, together with a classification of their vulnerabilities produced by the Slither static analysis framework. Supported tasks include text classification, text generation, and image classification. It was created to provide a large-scale open dataset for detecting and classifying vulnerabilities in verified Solidity contracts. Annotations are in English; the source code is Solidity. The contract lists come from Smart Contract Sanctuary, the source code and bytecode were downloaded from Etherscan, and the contracts were analyzed with Slither.

Updated 7/14/2022
hugging_face

Description

Dataset Overview

Dataset Name

  • Name: Slither Audited Smart Contracts
  • Alias: Slither Audited Smart Contracts Dataset

Basic Information

  • Language: English
  • License: MIT
  • Multilinguality: Monolingual
  • Size: 100K < n < 1M
  • Source: Raw data
  • Task Types: Text classification, Text generation
  • Task IDs: Multi‑label classification, Multi‑input text classification, Language modeling

Description

  • Overview: The dataset contains the source code and deployed bytecode of verified Solidity smart contracts, along with their vulnerabilities as classified by the Slither static analysis framework.
  • Supported Tasks:
    • Text classification: Train models for binary and multi‑label classification of contract bytecode and source code.
    • Text generation: Train language models for the Solidity programming language.
    • Image classification: Convert bytecode to RGB images for CNN‑based vulnerability detection.
  • Language: Annotations are in English; all source code is Solidity.
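
The image-classification task above relies on turning bytecode into RGB images. The dataset description does not specify how that conversion is done, so the following is only a minimal sketch of one common approach: every three consecutive bytes become one pixel, and the pixels are laid out in rows of a fixed width. The function name `bytecode_to_rgb_pixels`, the default width, and the zero-padding policy are all assumptions made for illustration.

```python
def bytecode_to_rgb_pixels(bytecode_hex: str, width: int = 64) -> list:
    """Map hex-encoded contract bytecode to rows of (R, G, B) pixels.

    Hypothetical helper: the grouping of 3 bytes per pixel and the
    zero-padding are illustrative choices, not taken from the dataset.
    """
    if width <= 0:
        raise ValueError("width must be positive")

    # Strip the optional "0x" prefix that hex-encoded bytecode usually carries.
    hex_body = bytecode_hex[2:] if bytecode_hex.startswith(("0x", "0X")) else bytecode_hex
    raw = bytes.fromhex(hex_body)

    # Pad with zero bytes so the length is a multiple of 3 (one byte per channel).
    raw += b"\x00" * (-len(raw) % 3)
    pixels = [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]

    # Pad the last row with black pixels so every row has the same width.
    pixels += [(0, 0, 0)] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Five bytes become two pixels; the second pixel is zero-padded.
rows = bytecode_to_rgb_pixels("0x6080604052", width=2)
print(rows)
```

The nested list can be handed to an imaging or array library to build a tensor for a CNN; a fixed width keeps images from different contracts comparable, at the cost of variable height.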

Structure

  • Instances: Each instance includes a contract address, its source code, and its bytecode. Labels are provided in two configurations: a plain-text version containing the cleaned Slither output, and a multi-label version in which each contract is tagged with a list of integers representing specific vulnerability classes. In the multi-label version, label 4 indicates that the contract is safe.
  • Fields:
    • address: string, Ethereum main‑net contract address.
    • source_code: string, flattened Solidity source code.
    • bytecode: string, contract bytecode obtained via web3.eth.getCode().
    • slither: depending on the configuration, either the cleaned JSON output from Slither or a list of integer class labels.
  • Splits: Six configurations exist. Only those without the all- prefix provide train, test, and validation splits; test and validation each comprise about 15% of the total data.
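
For multi-label training, the integer label lists described above are usually turned into fixed-length multi-hot vectors. The sketch below shows one way to do that. The helper names `to_multi_hot` and `is_safe` are hypothetical, the number of classes is passed in explicitly because the description does not state it (the value 6 in the example is illustrative only), and treating a contract as safe only when 4 is its sole label is an assumption.

```python
SAFE_LABEL = 4  # per the dataset description, label 4 marks a safe contract


def to_multi_hot(labels, num_classes):
    """Convert a list of integer class labels into a 0/1 vector."""
    vector = [0] * num_classes
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside range 0..{num_classes - 1}")
        vector[label] = 1
    return vector


def is_safe(labels):
    """Assumption: a contract counts as safe only if 4 is its sole label."""
    return list(labels) == [SAFE_LABEL]


# Illustrative values: a contract tagged with classes 1 and 3, out of 6 classes.
print(to_multi_hot([1, 3], num_classes=6))
print(is_safe([4]), is_safe([1, 3]))
```

A binary safe/vulnerable target for the binary classification task can be derived from the same field by applying `is_safe` to each instance.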

Creation

  • Rationale: Provide a large, free dataset for detecting and classifying vulnerabilities in verified Solidity contracts.
  • Source: Built from verified contract lists provided by Smart Contract Sanctuary; source code and bytecode were downloaded from Etherscan using Web3.py and analyzed with Slither.

Additional Information

  • Creator: Martina Rossini
  • License: Except for the Solidity source code, all files are under the MIT license.
  • Citation: If used in research, cite as follows:
@misc{rossini2022slitherauditedcontracts,
    title = {Slither Audited Smart Contracts Dataset},
    author={Martina Rossini},
    year={2022}
}

Topics

Blockchain Security
Smart Contract Analysis

Source

Organization: hugging_face

Created: Unknown
