TrainerML
Advanced machine learning platform with MCP integration that enables automated ML workflows from data analysis to model deployment, featuring smart preprocessing, 20+ ML algorithms, and interactive visualizations.
---
title: AutoML - MCP Hackathon
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: updated_ML.py
pinned: false
license: mit
tags:
  - machine-learning
  - mcp
  - hackathon
  - automl
  - lazypredict
  - gradio
  - mcp-server-track
  - agent-demo-track
short_description: Automated ML model comparison with LazyPredict and MCP integration
---
🤖 AutoML - MCP Hackathon Submission
Automated Machine Learning Platform with LazyPredict and Model Context Protocol Integration
🏆 Hackathon Track
Agents & MCP Hackathon - Track 1: MCP Tool / Server
🌟 Key Features
Core ML Capabilities
- 📤 Dual Data Input: Support for both local CSV file uploads and public URL data sources
- 🎯 Auto Problem Detection: Automatically determines regression vs classification tasks
- 🤖 Multi-Algorithm Comparison: LazyPredict-powered comparison of 20+ ML algorithms
- 📊 Automated EDA: Comprehensive dataset profiling with ydata-profiling
- 💾 Best Model Export: Download top-performing model as pickle file
- 📈 Performance Visualization: Interactive charts showing model comparison results
🚀 Advanced Features
- 🌐 URL Data Loading: Direct data loading from public CSV URLs with robust error handling
- 🔄 Agent-Friendly Interface: Designed for both human users and AI agent interactions
- 📊 Interactive Dashboards: Real-time model performance metrics and visualizations
- 🔍 Smart Error Handling: Comprehensive validation and user feedback system
- 💻 MCP Server Integration: Full Model Context Protocol server implementation
🛠️ How It Works
The platform provides a streamlined pipeline for automated machine learning:
Core Functions
- `load_data(file_input)`: a universal data loader that handles:
  - Local CSV file uploads through Gradio's file component
  - Public CSV URLs with HTTP/HTTPS support
  - Robust error handling and validation
  - Automatic format detection and parsing
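A minimal sketch of such a loader, assuming only pandas and requests; the actual implementation in `updated_ML.py` may differ in its details:

```python
import io

import pandas as pd
import requests


def load_data(file_input):
    """Load a CSV into a DataFrame from a Gradio upload, a local path, or a public URL."""
    # Gradio's File component passes an object with a .name attribute (a temp file path);
    # an agent may instead pass a plain string path or an http(s) URL.
    path = getattr(file_input, "name", file_input)

    if isinstance(path, str) and path.lower().startswith(("http://", "https://")):
        response = requests.get(path, timeout=30)
        response.raise_for_status()  # surface HTTP errors early
        return pd.read_csv(io.StringIO(response.text))

    return pd.read_csv(path)
```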
- `analyze_and_model(df, target_column)`: the core ML pipeline, which:
  - Generates comprehensive EDA reports using ydata-profiling
  - Automatically detects the task type (classification vs regression) based on target variable uniqueness
  - Trains and evaluates multiple models using LazyPredict
  - Selects the best performing model based on the appropriate metric
  - Creates publication-ready visualizations comparing model performance
  - Exports the best model as a serialized pickle file
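The sketch below shows one plausible wiring of these steps using the public ydata-profiling and LazyPredict APIs; it illustrates the described flow rather than reproducing the project code verbatim:

```python
import pickle

import pandas as pd
from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split
from ydata_profiling import ProfileReport


def analyze_and_model(df: pd.DataFrame, target_column: str):
    # 1. Generate a downloadable HTML EDA report
    ProfileReport(df, title="AutoML EDA Report", minimal=True).to_file("eda_report.html")

    # 2. Detect the task type: few unique target values -> classification
    y = df[target_column]
    X = df.drop(columns=[target_column])
    task = "classification" if y.nunique() <= 10 else "regression"

    # 3. Train and compare many models in one call with LazyPredict
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    lazy = (LazyClassifier(verbose=0, ignore_warnings=True) if task == "classification"
            else LazyRegressor(verbose=0, ignore_warnings=True))
    leaderboard, _ = lazy.fit(X_train, X_test, y_train, y_test)

    # 4. Select the winner by the task-appropriate metric
    metric = "Accuracy" if task == "classification" else "R-Squared"
    best_name = leaderboard.sort_values(metric, ascending=False).index[0]
    best_model = lazy.provide_models(X_train, X_test, y_train, y_test)[best_name]

    # 5. Serialize the best model for download and later reuse
    # (the real pipeline also renders a comparison chart with matplotlib/seaborn)
    with open("best_model.pkl", "wb") as f:
        pickle.dump(best_model, f)

    return task, leaderboard, "eda_report.html", "best_model.pkl"
```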
- `run_pipeline(data_source, target_column)`: the main orchestration function, which:
  - Validates all inputs and provides clear error messages
  - Coordinates the entire ML workflow from data loading to model export
  - Generates AI-powered explanations of results
  - Returns all outputs in a format optimized for both UI and API consumption
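A compact sketch of the orchestration layer, reusing the two helpers sketched above; the return order and error format are assumptions:

```python
def run_pipeline(data_source, target_column: str):
    """End-to-end entry point used by both the Gradio UI and AI agents."""
    if not data_source or not target_column:
        return "Error: please provide a data source and a target column.", None, None, None

    df = load_data(data_source)
    if target_column not in df.columns:
        return f"Error: column '{target_column}' not found in the dataset.", None, None, None

    task, leaderboard, report_path, model_path = analyze_and_model(df, target_column)
    summary = f"Detected a {task} task; best model saved to {model_path}."
    return summary, leaderboard, report_path, model_path
```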
Agent-Friendly Design
- Single Entry Point: The `run_pipeline()` function serves as the primary interface for AI agents
- Flexible Input Handling: Automatically determines whether input is a file path or URL
- Comprehensive Output: Returns all generated artifacts (models, reports, visualizations)
- Error Resilience: Robust error handling with informative feedback
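For example, once the app is running locally, an agent (or any Python client) can drive the whole pipeline through Gradio's client library. The placeholder URL and the `api_name` below are assumptions about how the endpoint is exposed:

```python
from gradio_client import Client

# Connect to the locally running app (host/port from the Server Configuration section)
client = Client("http://localhost:7860")

# A single call runs the full pipeline: load data, profile it, compare models, export the winner
result = client.predict(
    "https://example.com/housing.csv",  # placeholder public CSV URL
    "price",                            # target column
    api_name="/run_pipeline",           # assumed endpoint name
)
print(result)
```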
🚀 Quick Start
📋 Application File Comparison
| Feature | updated_ML.py | fixed_ML_MCP_backup.py |
|---|---|---|
| Core ML Pipeline | ✅ Full AutoML functionality | ✅ Full AutoML functionality |
| MCP Server | ✅ Enabled | ✅ Enhanced configuration |
| UI Interface | ✅ Clean, streamlined | ✅ Identical interface |
| Code Structure | ✅ Primary, well-documented | ✅ Backup with additional features |
| Recommended For | General use, development | Advanced MCP integration |
Running the Application
The project includes two main application files:
Primary Application: `updated_ML.py` (Recommended)

```bash
# Install dependencies
pip install -r requirements.txt

# Run the main application
python updated_ML.py
```
Backup Version: `fixed_ML_MCP_backup.py`

```bash
# Alternative version with additional MCP features
python fixed_ML_MCP_backup.py
```
Web Interface
- Choose Data Source:
  - Local Upload: Use the file upload component to select a CSV file from your computer
  - URL Input: Enter a public CSV URL (e.g., from GitHub, data repositories, or cloud storage)
- Specify Target: Enter the exact name of your target column (case-sensitive)
- Run Analysis: Click "Run Analysis & AutoML" to start the AutoML pipeline
- Review Results:
  - View detected task type (classification/regression)
  - Examine model performance metrics in the interactive table
  - Download comprehensive EDA report (HTML format)
  - Download the best performing model (pickle format)
  - View model comparison visualization
Installation & Setup
```bash
# Clone the repository
git clone [repository-url]
cd MCP_Project

# Install dependencies
pip install -r requirements.txt
```
Server Configuration
The application launches with the following settings:
- Host: `0.0.0.0` (accessible from any network interface)
- Port: `7860` (default Gradio port)
- MCP Server: Enabled for AI agent integration
- API Documentation: Available at the `/docs` endpoint
- Browser Launch: Automatic browser opening enabled
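These settings correspond to a Gradio launch call along the lines of the snippet below, with a simplified stand-in for the real UI; the `mcp_server` flag assumes a Gradio release with MCP support:

```python
import gradio as gr

# Simplified stand-in for the real interface; run_pipeline is the function sketched earlier
demo = gr.Interface(
    fn=run_pipeline,
    inputs=["text", "text"],                        # data source (path/URL) and target column
    outputs=["text", "dataframe", "file", "file"],  # summary, leaderboard, EDA report, model
)

demo.launch(
    server_name="0.0.0.0",  # listen on all network interfaces
    server_port=7860,       # default Gradio port
    mcp_server=True,        # expose the pipeline over the Model Context Protocol
    inbrowser=True,         # open a browser tab automatically
)
```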
🎯 Current Implementation
1. LazyPredict Integration
- Automated Model Training: Trains 20+ algorithms automatically
- Performance Comparison: Side-by-side evaluation of all models
- Best Model Selection: Automatically selects top performer based on accuracy/R² score
2. Comprehensive EDA
- ydata-profiling: Generates detailed dataset analysis reports
- Automatic Insights: Data quality, distributions, correlations, and missing values
- Interactive Reports: Downloadable HTML reports with comprehensive statistics
3. Smart Task Detection
- Classification: Automatically detected when target has ≤10 unique values
- Regression: Automatically detected for continuous target variables
- Adaptive Metrics: Uses appropriate evaluation metrics for each task type
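In code, the detection heuristic amounts to a single check on the target column; the threshold of 10 reflects the rule described above:

```python
import pandas as pd


def detect_task_type(target: pd.Series, max_classes: int = 10) -> str:
    """Classification if the target has few distinct values, regression otherwise."""
    return "classification" if target.nunique() <= max_classes else "regression"


detect_task_type(pd.Series([0, 1, 1, 0, 1]))               # -> "classification"
detect_task_type(pd.Series([1.5 * i for i in range(50)]))  # 50 unique values -> "regression"
```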
4. Model Persistence
- Pickle Export: Save trained models for future use
- Model Reuse: Load and apply models to new datasets
- Production Ready: Serialized models ready for deployment
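Reusing a downloaded model later is plain pickle, as in the example below; the file and column layout are illustrative, and you should only unpickle files you created yourself, since unpickling can execute arbitrary code:

```python
import pickle

import pandas as pd

# Load the model exported by the pipeline
with open("best_model.pkl", "rb") as f:
    model = pickle.load(f)

# Score new data that has the same feature columns used during training
new_data = pd.read_csv("new_houses.csv")  # illustrative file name
predictions = model.predict(new_data)
print(predictions[:5])
```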
📊 Supported Algorithms (via LazyPredict)
Classification Algorithms
- Logistic Regression, Decision Tree Classifier
- Random Forest Classifier, Extra Trees Classifier
- Gradient Boosting Classifier, AdaBoost Classifier
- XGBoost Classifier, LightGBM Classifier
- SVM Classifier, K-Nearest Neighbors
- Naive Bayes, Linear Discriminant Analysis
- Quadratic Discriminant Analysis, and more...
Regression Algorithms
- Linear Regression, Ridge Regression, Lasso Regression
- Decision Tree Regressor, Random Forest Regressor
- Extra Trees Regressor, Gradient Boosting Regressor
- XGBoost Regressor, LightGBM Regressor
- Support Vector Regression, K-Nearest Neighbors
- AdaBoost Regressor, Elastic Net, and more...
🏆 Demo Scenarios
House Price Prediction (Regression)
- Upload `sample_house_prices.csv`, included in the project
- Enter `price` as the target column name
- The system automatically detects a regression task
- Compare the performance of 20+ regression algorithms
- Download the best performing model and a detailed EDA report
Loan Approval Prediction (Classification)
- Upload `sample_loan_approval.csv`, included in the project
- Enter the loan approval status column name as the target
- The system automatically detects a classification task
- Compare the accuracy of 20+ classification algorithms
- Get comprehensive EDA report with approval insights
College Placement Analysis
- Upload `collegePlace.csv`, included in the project
- Analyze student placement outcomes
- Automatic feature analysis and model comparison
- Export trained model for future predictions
URL-Based Data Analysis
- Use public dataset URLs for instant analysis
- Example: Government open data, research datasets, cloud-hosted files
- No strict file size limits with URL-based loading, though larger files take longer to process
- Seamless integration with cloud storage platforms
🚀 Technologies Used
- Frontend: Gradio 4.0+ with soft theme and MCP server integration
- AutoML Engine: LazyPredict for automated model comparison and evaluation
- EDA Framework: ydata-profiling for comprehensive dataset analysis and reporting
- ML Libraries: scikit-learn, XGBoost, LightGBM (via LazyPredict ecosystem)
- Visualization: Matplotlib and Seaborn for model comparison charts and statistical plots
- Data Processing: pandas and numpy for efficient data manipulation and preprocessing
- Model Persistence: pickle for model serialization and export
- Web Requests: requests library for robust URL-based data loading
- MCP Integration: Model Context Protocol server for AI agent compatibility
- File Handling: tempfile for secure temporary file management
📈 Current Features
- 🔄 Dual Input Support: Upload local CSV files or provide public URLs for data loading
- 🤖 One-Click AutoML: Complete ML pipeline from data upload to trained model export
- 🎯 Intelligent Task Detection: Automatic classification vs regression detection based on target variable analysis
- 📊 Multi-Algorithm Comparison: Simultaneous comparison of 20+ algorithms with LazyPredict
- 📋 Comprehensive EDA: Detailed dataset profiling with statistical analysis and data quality reports
- 💾 Model Export: Download best performing model as pickle file for production deployment
- 📈 Performance Visualization: Clear charts showing algorithm comparison and performance metrics
- 🌐 MCP Server Integration: Full Model Context Protocol support for seamless AI assistant integration
- 🛡️ Robust Error Handling: Comprehensive validation with informative user feedback
- 🎨 Modern UI: Clean, responsive interface optimized for both human and agent interactions
🎯 Hackathon Submission Highlights
- 🤖 LazyPredict Integration: Automated comparison of 20+ ML algorithms with minimal configuration
- 🧠 Smart Automation: Intelligent task detection, data validation, and model selection
- 📊 Comprehensive Analysis: ydata-profiling powered EDA reports with statistical insights
- 👥 Dual Interface Design: Optimized for both human users and AI agent interactions
- 🌐 MCP Server Implementation: Full Model Context Protocol integration for seamless agent workflows
- 🔄 Flexible Data Loading: Support for both local uploads and URL-based data sources
- 📈 Production Ready: Exportable models, comprehensive documentation, and robust error handling
- 🎨 Modern UI/UX: Clean Gradio interface with intuitive workflow and clear feedback systems
📦 Project Structure
```
MCP_Project/
├── updated_ML.py            # Primary application file (recommended)
├── fixed_ML_MCP_backup.py   # Backup version with enhanced MCP features
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── uv.lock                  # UV dependency lockfile
├── README.md                # This documentation
├── sample_house_prices.csv  # Demo dataset for regression
├── sample_loan_approval.csv # Demo dataset for classification
├── collegePlace.csv         # Demo dataset for placement analysis
├── model_plot.png           # Sample visualization output
└── __pycache__/             # Python cache files
```
Application Files Overview
- `updated_ML.py`: The main application file with clean, streamlined code structure. Recommended for most users.
- `fixed_ML_MCP_backup.py`: Alternative version with additional MCP server configurations and enhanced features.
Both files provide the same core functionality; they differ only in configuration details and a few additional features.
📧 Contact & Support
Built with ❤️ for the Agents & MCP Hackathon 2025
This project demonstrates the power of combining LazyPredict's automated machine learning capabilities with the Model Context Protocol to create an intelligent, easy-to-use ML platform that seamlessly integrates into AI assistant workflows and provides production-ready machine learning solutions.
🔮 Features in Development
- 🧠 LLM-powered model explanations and insights
- ⚙️ Advanced feature engineering and preprocessing pipelines
- 🎯 Ensemble model creation and stacking capabilities
- 🚀 Real-time prediction API endpoints
- 🛠️ Enhanced MCP tool suite with additional ML operations
- 📊 Interactive model interpretation and SHAP value analysis
🎮 Usage Tips & Best Practices
Getting Started
- Choose Your File: Use `updated_ML.py` for standard usage, `fixed_ML_MCP_backup.py` for advanced MCP features
- Target Column: Ensure your target column name is exactly as it appears in the dataset (case-sensitive)
- Data Sources: Both local CSV uploads and public URLs are supported seamlessly
Data Loading Best Practices
- URL Loading: Use direct links to CSV files (GitHub raw URLs work great!)
- File Size: No strict limitations, but larger files may take longer to process
- Data Quality: The system handles missing values automatically, but clean data yields better results
Model Performance
- Classification: System uses Accuracy as the primary metric for model selection
- Regression: System uses R-Squared as the primary metric for model selection
- File Formats: Currently supports CSV format with automatic delimiter detection
- Column Types: Handles both numeric and categorical features automatically
Troubleshooting
- Target Not Found: Double-check column name spelling and case sensitivity
- URL Issues: Ensure URLs point directly to CSV files (not web pages)
- Performance: For large datasets, expect processing times of 2-5 minutes
Ready to experience automated machine learning? Upload your dataset or provide a URL and let LazyPredict find the best algorithm for your problem! 🚀
Transform your data into insights with just a few clicks - no ML expertise required!