
asure22/python_obfuscated_small

This dataset is intended for code analysis and processing. Each record pairs a code sample with an obfuscated version of it and carries these features: repository name, file path, function name, original string, programming language, code, code tokens, docstring, docstring tokens, SHA, URL, partition, summary, obfuscated code, code length, and obfuscated code length. The dataset has a single training split of 30,000 examples totaling 442,939,709.61 bytes (about 442.9 MB); the download size is 115,314,164 bytes (about 115.3 MB).

Source

Hugging Face

Created

Nov 28, 2025

Updated

Mar 7, 2024



Dataset Overview

Data Features

The dataset includes the following features:

repo: string type
path: string type
func_name: string type
original_string: string type
language: string type
code: string type
code_tokens: sequence of strings
docstring: string type
docstring_tokens: sequence of strings
sha: string type
url: string type
partition: string type
summary: string type
obf_code: string type
code_len: integer type (int64)
obf_code_len: integer type (int64)
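As a minimal sketch of the schema, the dataclass below mirrors the feature list, using `List[str]` for the two token sequences and `int` for the two int64 length columns. `validate_record`, `length_ratio`, and `sample_row` are hypothetical helpers and invented data for illustration only. The card does not say how `code_len` and `obf_code_len` are measured, so the sample assumes they are character counts of `code` and `obf_code`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class Record:
    """One example of the train split; names and types follow the feature list."""
    repo: str
    path: str
    func_name: str
    original_string: str
    language: str
    code: str
    code_tokens: List[str]
    docstring: str
    docstring_tokens: List[str]
    sha: str
    url: str
    partition: str
    summary: str
    obf_code: str
    code_len: int
    obf_code_len: int


# Features that are not plain strings, per the feature list.
STRING_SEQUENCE_FIELDS = {"code_tokens", "docstring_tokens"}
INT64_FIELDS = {"code_len", "obf_code_len"}


def validate_record(row: Dict[str, Any]) -> Record:
    """Return a Record if `row` has every feature with the expected type, else raise."""
    for f in fields(Record):
        if f.name not in row:
            raise KeyError(f"missing feature: {f.name}")
        value = row[f.name]
        if f.name in STRING_SEQUENCE_FIELDS:
            ok = isinstance(value, list) and all(isinstance(t, str) for t in value)
        elif f.name in INT64_FIELDS:
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, str)
        if not ok:
            raise TypeError(f"{f.name}: unexpected type {type(value).__name__}")
    return Record(**{f.name: row[f.name] for f in fields(Record)})


def length_ratio(rec: Record) -> float:
    """obf_code_len / code_len; assumes both lengths use the same unit."""
    return rec.obf_code_len / rec.code_len if rec.code_len else float("nan")


# Hypothetical row: every value is invented; lengths assumed to be character counts.
sample_row = {
    "repo": "example/repo", "path": "pkg/math_utils.py", "func_name": "add",
    "original_string": "def add(a, b):\n    return a + b",
    "language": "python",
    "code": "def add(a, b):\n    return a + b",
    "code_tokens": ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"],
    "docstring": "", "docstring_tokens": [],
    "sha": "0" * 40, "url": "https://example.com/add",
    "partition": "train", "summary": "Adds two numbers.",
    "obf_code": "def _0(_1, _2):\n    return _1 + _2",
    "code_len": 31, "obf_code_len": 34,
}

rec = validate_record(sample_row)
print(rec.func_name, round(length_ratio(rec), 3))  # add 1.097
```

Checking rows this way before training catches missing features or swapped types early; a real pipeline would more likely rely on the typed features that the loading library already enforces.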

Data Split

The dataset has a single training split:

train: 30,000 examples, 442,939,709.61 bytes (about 442.9 MB)

Dataset Size

Download size: 115,314,164 bytes (about 115.3 MB)
Dataset size: 442,939,709.61 bytes (about 442.9 MB)
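The reported sizes give a rough sense of scale. The arithmetic below divides the dataset size by the 30,000 training examples and compares the download size with the dataset size. On Hugging Face the download size usually refers to the data files fetched and the dataset size to the prepared (Arrow) data; treat that reading as an assumption for this card.

```python
# Figures copied from the card; the dataset size is reported as a float.
TRAIN_EXAMPLES = 30_000
DATASET_BYTES = 442_939_709.61477566
DOWNLOAD_BYTES = 115_314_164

avg_bytes_per_example = DATASET_BYTES / TRAIN_EXAMPLES
download_fraction = DOWNLOAD_BYTES / DATASET_BYTES

print(f"dataset size: {DATASET_BYTES / 1e6:.1f} MB")                     # 442.9 MB
print(f"download size: {DOWNLOAD_BYTES / 1e6:.1f} MB")                   # 115.3 MB
print(f"average per example: {avg_bytes_per_example:,.0f} bytes")        # 14,765 bytes
print(f"download as a share of dataset size: {download_fraction:.1%}")  # 26.0%
```

So each example averages roughly 14.8 KB once prepared, and the download is about a quarter of that footprint.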

Configuration

config_name: default
data_files:
  - split: train
    path: data/train-*
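The `data_files` entry maps the train split to every file matching the glob `data/train-*`. The sketch below approximates that selection with Python's `fnmatch` module over a hypothetical file listing; the shard name and `.parquet` extension are assumptions, not taken from the card. In practice the `datasets` library resolves the pattern itself when you call `load_dataset("asure22/python_obfuscated_small", split="train")`.

```python
from fnmatch import fnmatchcase
from typing import List

# Pattern taken from the card's data_files entry for the train split.
TRAIN_PATTERN = "data/train-*"

# Hypothetical repository listing; the card does not list the actual shard names.
repo_files = [
    ".gitattributes",
    "README.md",
    "data/train-00000-of-00001.parquet",
]


def files_for_pattern(files: List[str], pattern: str) -> List[str]:
    """Return the files matching a shell-style pattern (case-sensitive).

    Simplification: fnmatch's '*' also matches '/', whereas a path glob
    does not; that difference does not affect the names listed above.
    """
    return [name for name in files if fnmatchcase(name, pattern)]


print(files_for_pattern(repo_files, TRAIN_PATTERN))
# ['data/train-00000-of-00001.parquet']
```

Using `fnmatchcase` rather than `fnmatch` keeps matching case-sensitive on every platform, which is closer to how paths in a dataset repository behave.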
