MLCQ
The MLCQ dataset is used for code smell detection experiments and contains code snippets along with relevant code metrics.
Updated 11/26/2024
Description
MLCQ‑Experiments Dataset Overview
Source
- MLCQ dataset
Purpose
- Supports the paper "Exploring NLP Techniques for Code Smell Detection: A Comparative Study".
- Enables comparison of NLP‑based models with baseline approaches for detecting code smells.
Processing Steps
- Environment Setup:
- Create a conda environment and install dependencies.
- Commands:
conda create -n mlcqenv python=3.10
conda activate mlcqenv
conda install --file requirements.txt
- Data Extraction:
- Set a GitHub token for API access.
- Export token:
export GITHUB_TOKEN=<your_github_token>
- Run the extractor:
python DataExtractor.py
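The internals of DataExtractor.py are not shown here, so the following is only a minimal sketch of how a script might read the exported token and build authenticated request headers for the GitHub REST API. The helper name `github_headers` is hypothetical, and the real extractor may handle the token differently.

```python
import os


def github_headers(env=os.environ):
    """Build GitHub REST API headers from the GITHUB_TOKEN variable.

    Hypothetical helper for illustration; not taken from DataExtractor.py.
    """
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        # Fail early with a clear message rather than partway through extraction.
        raise RuntimeError(
            "GITHUB_TOKEN is not set; export it before running the extractor"
        )
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

Checking for the token up front means a missing export surfaces immediately instead of as unauthenticated, rate-limited API requests.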
- Baseline Model:
- Use the J48 decision tree as the baseline.
- Compute code metrics with the Designite tool.
- Steps:
python baseline/MetricsExtractor.py   # prepare .java files
python baseline/DesigniteRun.py       # run Designite
python baseline/DatasetCreator.py     # build the final dataset
python train.py                       # train & test
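Because each baseline step depends on the output of the previous one, it can help to run them from a single driver that stops at the first failure. The sketch below is one possible way to do that. The script paths are copied from the steps above, while the names `build_commands` and `run_baseline` are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script paths copied from the baseline steps; run from the repository root.
BASELINE_STEPS = [
    "baseline/MetricsExtractor.py",  # prepare .java files
    "baseline/DesigniteRun.py",      # run Designite
    "baseline/DatasetCreator.py",    # build the final dataset
    "train.py",                      # train and test
]


def build_commands(steps=BASELINE_STEPS, python=sys.executable):
    """Return one argv list per step, using the given interpreter."""
    return [[python, step] for step in steps]


def run_baseline(commands=None, runner=subprocess.run):
    """Run each step in order; check=True raises on the first failing step."""
    for argv in commands or build_commands():
        runner(argv, check=True)
```

Calling `run_baseline()` inside the activated `mlcqenv` environment would execute the four scripts in sequence. The `runner` parameter exists only so the ordering can be checked without launching any process.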
- Model Training:
- Train BiLSTM with Attention:
python bilstm_attn_train.py --batch_size 16 --epochs 20 --learning_rate 0.0001 --hidden_dim 512 --num_layers 2
- Train CodeBERT:
python bert.py
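The BiLSTM command in the Model Training step passes its hyperparameters as command-line flags. The argument parser of bilstm_attn_train.py is not shown here, so the sketch below only illustrates how flags with these names could be parsed with `argparse`. The defaults mirror the values in the command and are assumptions, not the script's documented defaults.

```python
import argparse


def build_parser():
    """Parser accepting the flags used in the BiLSTM training command.

    Defaults are illustrative assumptions, not the script's real defaults.
    """
    parser = argparse.ArgumentParser(
        description="Train BiLSTM with attention (sketch)"
    )
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--hidden_dim", type=int, default=512)
    parser.add_argument("--num_layers", type=int, default=2)
    return parser


# Parse the same flags that appear in the command from the training step.
args = build_parser().parse_args(
    ["--batch_size", "16", "--epochs", "20", "--learning_rate", "0.0001",
     "--hidden_dim", "512", "--num_layers", "2"]
)
```

Typed arguments let a malformed value such as `--epochs twenty` fail at startup rather than midway through training.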
Dependencies
- Designite tool
- CodeBERT pre‑trained model (available on Hugging Face)
Authors
- Djamel Mesbah
- Nour El Madhoun
- Hani Chalouati
- Khaldoun Al Agha
Topics
Code Smell Detection
Code Quality Analysis
Source
Organization: github
Created: 11/26/2024