asure22/python_obfuscated_small
This dataset is intended primarily for code analysis and processing. Each sample carries a set of code-related features: repository name, file path, function name, original string, programming language, code, code tokens, docstring, docstring tokens, SHA value, URL, partition, summary, obfuscated code, code length, and obfuscated code length. The dataset has a single training split of 30,000 samples with a total size of 442,939,709.61477566 bytes; the download size is 115,314,164 bytes.
Description
Dataset Overview
Data Features
The dataset includes the following features:
- repo: string type
- path: string type
- func_name: string type
- original_string: string type
- language: string type
- code: string type
- code_tokens: sequence of strings
- docstring: string type
- docstring_tokens: sequence of strings
- sha: string type
- url: string type
- partition: string type
- summary: string type
- obf_code: string type
- code_len: integer type (int64)
- obf_code_len: integer type (int64)
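The feature list above can be written down as a typed record, which makes it easy to sanity-check rows before processing them. This is a minimal sketch, assuming each row arrives as a plain Python dict (as rows do when a `datasets` split is iterated). `Sample`, `check_sample`, and `length_ratio` are hypothetical helpers, not part of the dataset. The card does not say whether `code_len` and `obf_code_len` count characters, tokens, or something else, so the ratio is only meaningful if both use the same unit.

```python
from typing import List, TypedDict


class Sample(TypedDict):
    """One row of the train split, mirroring the features listed on the card."""
    repo: str
    path: str
    func_name: str
    original_string: str
    language: str
    code: str
    code_tokens: List[str]
    docstring: str
    docstring_tokens: List[str]
    sha: str
    url: str
    partition: str
    summary: str
    obf_code: str
    code_len: int       # int64 on the card
    obf_code_len: int   # int64 on the card


# Python type expected for each feature: sequences become lists, int64 becomes int.
EXPECTED_TYPES = {
    name: (list if name in ("code_tokens", "docstring_tokens")
           else int if name in ("code_len", "obf_code_len")
           else str)
    for name in Sample.__annotations__
}


def check_sample(row: dict) -> List[str]:
    """Return the problems found in `row`; an empty list means it matches the schema."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"{name}: expected {expected.__name__}, got {type(row[name]).__name__}")
        elif expected is list and not all(isinstance(t, str) for t in row[name]):
            problems.append(f"{name}: expected a sequence of strings")
    return problems


def length_ratio(row: dict) -> float:
    """Obfuscated length divided by original length.

    Assumption: both fields use the same unit (the card does not specify) and code_len > 0.
    """
    return row["obf_code_len"] / row["code_len"]
```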
Data Split
The dataset includes a training split:
- train: 30,000 samples, 442,939,709.61477566 bytes in total
Dataset Size
- Download size: 115,314,164 bytes
- Dataset size: 442,939,709.61477566 bytes
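The two size figures can be related with a little arithmetic. On the Hugging Face Hub, `download_size` typically measures the files fetched and `dataset_size` the prepared data, so their quotient gives a rough sense of how much the data expands after download; treat that interpretation as an assumption, since the card does not define the fields. The constants below are copied from the card; the helper names are hypothetical.

```python
# Figures copied verbatim from the card's metadata.
DATASET_BYTES = 442_939_709.61477566
DOWNLOAD_BYTES = 115_314_164
NUM_TRAIN_SAMPLES = 30_000


def bytes_per_sample(total_bytes: float, n: int) -> float:
    """Average size of one sample, given the split's total size."""
    return total_bytes / n


def expansion_ratio(dataset_bytes: float, download_bytes: int) -> float:
    """How many times larger the prepared dataset is than the download."""
    return dataset_bytes / download_bytes


if __name__ == "__main__":
    print(f"avg bytes/sample: {bytes_per_sample(DATASET_BYTES, NUM_TRAIN_SAMPLES):,.1f}")
    print(f"dataset/download ratio: {expansion_ratio(DATASET_BYTES, DOWNLOAD_BYTES):.2f}")
```

Run as a script, this prints an average of about 14,764.7 bytes per sample and a dataset-to-download ratio of about 3.84.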
Configuration
- config_name: default
- data_files:
  - split: train
    path: data/train-*
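The `default` configuration maps the `train` split to every file matching the glob `data/train-*`. The sketch below illustrates that mapping with Python's `fnmatch`; it is only an approximation of how the Hub resolves patterns (for instance, `fnmatch`'s `*` also matches `/`), and the shard file names used with it are hypothetical, since the card does not list the actual files.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Mirrors the card's default config: one split whose files match a glob pattern.
DEFAULT_CONFIG = {
    "config_name": "default",
    "data_files": [{"split": "train", "path": "data/train-*"}],
}


def files_per_split(config: dict, repo_files: List[str]) -> Dict[str, List[str]]:
    """Assign repository files to each split whose pattern they match."""
    result: Dict[str, List[str]] = {}
    for entry in config["data_files"]:
        result[entry["split"]] = sorted(f for f in repo_files if fnmatch(f, entry["path"]))
    return result
```

In practice the `datasets` library performs this resolution itself, so `load_dataset("asure22/python_obfuscated_small", split="train")` should return the 30,000-sample split, assuming the repository is publicly accessible.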
Source
Organization: hugging_face
Created: Unknown