This dataset is primarily intended for code analysis and processing. Each sample includes the following code-related features: repository name, file path, function name, original string, programming language, code, code tokens, docstring, docstring tokens, SHA value, URL, partition, summary, obfuscated code, code length, and obfuscated code length. The dataset has a single training split of 30,000 samples with a total size of about 443 MB (reported as 442,939,709.61477566 bytes). The download size is 115,314,164 bytes (about 115 MB).
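As a rough sanity check on the figures above, the following Python sketch derives the average size per sample and the ratio of download size to total size. The constant and function names are illustrative and not part of the dataset. Reading the ratio as a compression factor assumes the download is a compressed form of the same data, which the description does not state.

```python
# Figures quoted in the dataset description.
NUM_SAMPLES = 30_000
DATASET_SIZE_BYTES = 442_939_709.61477566  # reported total size, fractional as given
DOWNLOAD_SIZE_BYTES = 115_314_164


def average_bytes_per_sample(total_bytes: float, num_samples: int) -> float:
    """Return the mean size of one sample in bytes."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return total_bytes / num_samples


def download_ratio(download_bytes: float, total_bytes: float) -> float:
    """Return download size as a fraction of the total dataset size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return download_bytes / total_bytes


if __name__ == "__main__":
    avg = average_bytes_per_sample(DATASET_SIZE_BYTES, NUM_SAMPLES)
    ratio = download_ratio(DOWNLOAD_SIZE_BYTES, DATASET_SIZE_BYTES)
    print(f"Average size per sample: {avg:,.0f} bytes")
    print(f"Download / total size:   {ratio:.1%}")
```

By this arithmetic each sample averages a little under 15 KB, and the download is roughly a quarter of the total size.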