Microsoft Fabric MCP Server
A comprehensive Python-based MCP (Model Context Protocol) server for interacting with Microsoft Fabric APIs, featuring advanced PySpark notebook development, testing, and optimization capabilities with LLM integration.
🚀 Features
Core Fabric Operations
- ✅ Workspace, lakehouse, warehouse, and table management
- ✅ Delta table schemas and metadata retrieval
- ✅ SQL query execution and data loading
- ✅ Report and semantic model operations
Advanced PySpark Development
- 📓 Intelligent notebook creation with 6 specialized templates
- 🔧 Smart code generation for common PySpark operations
- ✅ Comprehensive validation with syntax and best practices checking
- 🎯 Fabric-specific optimizations and compatibility checks
- 📊 Performance analysis with scoring and optimization recommendations
- 🚀 Real-time monitoring and execution insights
LLM Integration
- 🤖 Natural language interface for PySpark development
- 🧠 Context-aware assistance with conversation memory
- 🎨 Intelligent code formatting and explanations
- 📈 Smart optimization suggestions based on project patterns
🏗️ Architecture
```mermaid
graph TB
subgraph "Developer Environment"
IDE[IDE/VSCode]
DEV[Developer]
PROJ[Project Files]
end
subgraph "AI Layer"
LLM[Large Language Model<br/>Claude/GPT/etc.]
CONTEXT[Conversation Context]
REASONING[AI Reasoning Engine]
end
subgraph "MCP Layer"
MCP[MCP Server]
TOOLS[PySpark Tools]
HELPERS[PySpark Helpers]
TEMPLATES[Template Manager]
VALIDATORS[Code Validators]
GENERATORS[Code Generators]
end
subgraph "Microsoft Fabric"
API[Fabric API]
WS[Workspace]
LH[Lakehouse]
NB[Notebooks]
TABLES[Delta Tables]
SPARK[Spark Clusters]
end
subgraph "Operations Flow"
CREATE[Create Notebooks]
VALIDATE[Validate Code]
GENERATE[Generate Code]
ANALYZE[Analyze Performance]
DEPLOY[Deploy to Fabric]
end
%% Developer interactions
DEV --> IDE
IDE --> PROJ
%% LLM interactions
IDE <--> LLM
LLM <--> CONTEXT
LLM --> REASONING
%% MCP interactions
LLM <--> MCP
MCP --> TOOLS
TOOLS --> HELPERS
TOOLS --> TEMPLATES
TOOLS --> VALIDATORS
TOOLS --> GENERATORS
%% Fabric interactions
MCP <--> API
API --> WS
WS --> LH
WS --> NB
LH --> TABLES
NB --> SPARK
%% Operation flows
TOOLS --> CREATE
TOOLS --> VALIDATE
TOOLS --> GENERATE
TOOLS --> ANALYZE
CREATE --> DEPLOY
%% Data flow arrows
REASONING -.->|"Intelligent Decisions"| TOOLS
CONTEXT -.->|"Project Awareness"| VALIDATORS
%% Styling
classDef devEnv fill:#e1f5fe
classDef aiLayer fill:#fff9c4
classDef mcpLayer fill:#f3e5f5
classDef fabricLayer fill:#e8f5e8
classDef operations fill:#fff3e0
class IDE,DEV,PROJ devEnv
class LLM,CONTEXT,REASONING aiLayer
class MCP,TOOLS,HELPERS,TEMPLATES,VALIDATORS,GENERATORS mcpLayer
class API,WS,LH,NB,TABLES,SPARK fabricLayer
class CREATE,VALIDATE,GENERATE,ANALYZE,DEPLOY operations
```
Interaction Flow
- Developer requests assistance in IDE
- IDE communicates with LLM (Claude/GPT)
- LLM analyzes using context and reasoning
- LLM calls MCP server tools intelligently
- MCP tools interact with Fabric API
- Results flow back through LLM with intelligent formatting
- Developer receives contextual, smart responses
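If you want to exercise the server outside of an IDE, the `mcp` Python SDK can drive the same flow from a script. The sketch below assumes the STDIO transport described under Usage and the `list_workspaces` tool from the reference further down; in day-to-day use the LLM host (VS Code, Claude Desktop, etc.) issues these calls for you.

```python
# Minimal sketch: driving the Fabric MCP server over STDIO with the `mcp` Python SDK.
# In normal use the LLM host performs these calls; this is only for manual testing.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the server as a subprocess, the same way the IDE integration does.
    server = StdioServerParameters(command="python", args=["fabric_mcp.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Invoke one of the tools documented in the reference below.
            result = await session.call_tool("list_workspaces", arguments={})
            print(result.content)


asyncio.run(main())
```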
📋 Requirements
- Python 3.12+
- Azure credentials for authentication
- uv (from Astral): [installation instructions](https://docs.astral.sh/uv/getting-started/installation/)
- Azure CLI: [installation instructions](https://learn.microsoft.com/cli/azure/install-azure-cli)
- Optional: Node.js for the MCP Inspector: [installation instructions](https://nodejs.org/en/download)
🔧 Installation
- Clone the repository:

  ```bash
  git clone https://github.com/your-repo/fabric-mcp.git
  cd fabric-mcp
  ```

- Set up the virtual environment:

  ```bash
  uv sync
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
🚀 Usage
- Using STDIO

Connect to Microsoft Fabric:

```bash
az login --scope https://api.fabric.microsoft.com/.default
```

Running with the MCP Inspector:

```bash
uv run --with mcp mcp dev fabric_mcp.py
```

This starts the server with the MCP Inspector available at http://localhost:6274.
VSCode Integration

Add the following to your VS Code settings.json (user or workspace settings):

```json
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "stdio",
        "command": "<FullPathToProjectFolder>\\.venv\\Scripts\\python.exe",
        "args": ["<FullPathToProjectFolder>\\fabric_mcp.py"]
      }
    }
  }
}
```
- Using HTTP

Start the MCP server:

```bash
uv run python .\fabric_mcp.py --port 8081
```
VSCode Integration

Add the following to your VS Code settings.json (user or workspace settings):

```json
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "http",
        "url": "http://<localhost or remote IP>:8081/mcp/",
        "headers": {
          "Accept": "application/json,text/event-stream"
        }
      }
    }
  }
}
```
🛠️ Complete Tool Reference
1. Workspace Management
list_workspaces
List all available Fabric workspaces.
# Usage in LLM: "List all my Fabric workspaces"
set_workspace
Set the current workspace context for the session.
set_workspace(workspace="Analytics-Workspace")
2. Lakehouse Operations
list_lakehouses
List all lakehouses in a workspace.
list_lakehouses(workspace="Analytics-Workspace")
create_lakehouse
Create a new lakehouse.
create_lakehouse(
name="Sales-Data-Lake",
workspace="Analytics-Workspace",
description="Sales data lakehouse"
)
set_lakehouse
Set current lakehouse context.
set_lakehouse(lakehouse="Sales-Data-Lake")
3. Warehouse Operations
list_warehouses
List all warehouses in a workspace.
list_warehouses(workspace="Analytics-Workspace")
create_warehouse
Create a new warehouse.
create_warehouse(
name="Sales-DW",
workspace="Analytics-Workspace",
description="Sales data warehouse"
)
set_warehouse
Set current warehouse context.
set_warehouse(warehouse="Sales-DW")
4. Table Operations
list_tables
List all tables in a lakehouse.
list_tables(workspace="Analytics-Workspace", lakehouse="Sales-Data-Lake")
get_lakehouse_table_schema
Get schema for a specific table.
get_lakehouse_table_schema(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
table_name="transactions"
)
get_all_lakehouse_schemas
Get schemas for all tables in a lakehouse.
get_all_lakehouse_schemas(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake"
)
set_table
Set current table context.
set_table(table_name="transactions")
5. SQL Operations
get_sql_endpoint
Get SQL endpoint for lakehouse or warehouse.
get_sql_endpoint(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
type="lakehouse"
)
run_query
Execute SQL queries.
run_query(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
query="SELECT COUNT(*) FROM transactions",
type="lakehouse"
)
6. Data Loading
load_data_from_url
Load data from URL into tables.
load_data_from_url(
url="https://example.com/data.csv",
destination_table="new_data",
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake"
)
7. Reports & Models
list_reports
List all reports in a workspace.
list_reports(workspace="Analytics-Workspace")
get_report
Get specific report details.
get_report(workspace="Analytics-Workspace", report_id="report-id")
list_semantic_models
List semantic models in workspace.
list_semantic_models(workspace="Analytics-Workspace")
get_semantic_model
Get specific semantic model.
get_semantic_model(workspace="Analytics-Workspace", model_id="model-id")
8. Basic Notebook Operations
list_notebooks
List all notebooks in a workspace.
list_notebooks(workspace="Analytics-Workspace")
get_notebook_content
Retrieve notebook content.
get_notebook_content(
workspace="Analytics-Workspace",
notebook_id="notebook-id"
)
update_notebook_cell
Update specific notebook cells.
update_notebook_cell(
workspace="Analytics-Workspace",
notebook_id="notebook-id",
cell_index=0,
cell_content="print('Hello, Fabric!')",
cell_type="code"
)
9. Advanced PySpark Notebook Creation
create_pyspark_notebook
Create notebooks from basic templates.
create_pyspark_notebook(
workspace="Analytics-Workspace",
notebook_name="Data-Analysis",
template_type="analytics" # Options: basic, etl, analytics, ml
)
create_fabric_notebook
Create Fabric-optimized notebooks.
create_fabric_notebook(
workspace="Analytics-Workspace",
notebook_name="Fabric-Pipeline",
template_type="fabric_integration" # Options: fabric_integration, streaming
)
10. PySpark Code Generation
generate_pyspark_code
Generate code for common operations.
generate_pyspark_code(
operation="read_table",
source_table="sales.transactions",
columns="id,amount,date"
)
# Available operations:
# - read_table, write_table, transform, join, aggregate
# - schema_inference, data_quality, performance_optimization
generate_fabric_code
Generate Fabric-specific code.
generate_fabric_code(
operation="read_lakehouse",
lakehouse_name="Sales-Data-Lake",
table_name="transactions"
)
# Available operations:
# - read_lakehouse, write_lakehouse, merge_delta, performance_monitor
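The generation tools return a code snippet you can drop into a notebook cell. The exact output is template-driven, but a read_lakehouse generation might resemble the following (illustrative only, not the literal tool output):

```python
# Illustrative only: roughly the kind of snippet generate_fabric_code(operation="read_lakehouse", ...)
# is expected to produce; the actual generated code may differ.
df = spark.table("lakehouse.transactions")  # managed Delta table in the attached lakehouse
df.printSchema()
display(df.limit(10))
```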
11. Code Validation & Analysis
validate_pyspark_code
Validate PySpark code syntax and best practices.
validate_pyspark_code(code="""
df = spark.table('transactions')
df.show()
""")
validate_fabric_code
Validate Fabric compatibility.
validate_fabric_code(code="""
df = spark.table('lakehouse.transactions')
df.write.format('delta').saveAsTable('summary')
""")
analyze_notebook_performance
Comprehensive performance analysis.
analyze_notebook_performance(
workspace="Analytics-Workspace",
notebook_id="notebook-id"
)
12. Context Management
clear_context
Clear current session context.
clear_context()
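The setters above store a session context so later calls do not have to repeat the same identifiers. Assuming tools fall back to that context when arguments are omitted (as the context setters suggest), a typical session might look like this, with entity names taken from the earlier examples:

```python
# Sketch of a context-driven session; assumes omitted arguments fall back to the stored context.
set_workspace(workspace="Analytics-Workspace")
set_lakehouse(lakehouse="Sales-Data-Lake")
set_table(table_name="transactions")

# Subsequent calls can lean on the stored context instead of repeating identifiers.
run_query(query="SELECT COUNT(*) FROM transactions", type="lakehouse")

# Reset everything when switching to another project.
clear_context()
```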
📊 PySpark Templates
Basic Templates
- basic: Fundamental PySpark operations and DataFrame usage
- etl: Complete ETL pipeline with data cleaning and Delta Lake
- analytics: Advanced analytics with aggregations and window functions
- ml: Machine learning pipeline with MLlib and feature engineering
Advanced Templates
- fabric_integration: Lakehouse connectivity and Fabric-specific utilities
- streaming: Real-time processing with Structured Streaming
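Each template name is passed as the `template_type` argument of the matching creation tool: the four basic templates go through `create_pyspark_notebook`, the two advanced ones through `create_fabric_notebook`. The notebook names below are only examples.

```python
# Basic templates are created with create_pyspark_notebook ...
create_pyspark_notebook(
    workspace="Analytics-Workspace",
    notebook_name="Churn-Model",        # example name
    template_type="ml"                  # basic, etl, analytics, or ml
)

# ... while the advanced templates use create_fabric_notebook.
create_fabric_notebook(
    workspace="Analytics-Workspace",
    notebook_name="Events-Streaming",   # example name
    template_type="streaming"           # fabric_integration or streaming
)
```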
🎯 Best Practices
Fabric Optimization
```python
# ✅ Use managed tables
df = spark.table("lakehouse.my_table")

# ✅ Use Delta Lake format
df.write.format("delta").mode("overwrite").saveAsTable("my_table")

# ✅ Leverage notebookutils
import notebookutils as nbu
workspace_id = nbu.runtime.context.workspaceId
```
Performance Optimization
```python
# ✅ Cache frequently used DataFrames
df.cache()

# ✅ Use broadcast for small tables
from pyspark.sql.functions import broadcast
result = large_df.join(broadcast(small_df), "key")

# ✅ Partition large datasets
df.write.partitionBy("year", "month").saveAsTable("partitioned_table")
```
Code Quality
```python
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# ✅ Define explicit schemas
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)
])

# ✅ Handle null values
df.filter(col("column").isNotNull())
```
🔄 Example LLM-Enhanced Workflows
Natural Language Requests
Human: "Create a PySpark notebook that reads sales data, cleans it, and optimizes performance"
LLM Response:
1. Creates Fabric-optimized notebook with ETL template
2. Generates lakehouse reading code
3. Adds data cleaning transformations
4. Includes performance optimization patterns
5. Validates code for best practices (see the sketch below)
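Behind a request like this, the LLM might chain the documented tools roughly as follows. The exact sequence and arguments will vary, and the sketch assumes the generation tools return their snippets as text; names like "Sales-ETL" are examples.

```python
# Rough sketch of the tool calls behind the request above (not a fixed sequence).
create_pyspark_notebook(
    workspace="Analytics-Workspace",
    notebook_name="Sales-ETL",          # example name
    template_type="etl"
)
read_code = generate_fabric_code(
    operation="read_lakehouse",
    lakehouse_name="Sales-Data-Lake",
    table_name="transactions"
)
cleaning_code = generate_pyspark_code(operation="data_quality")
tuning_code = generate_pyspark_code(operation="performance_optimization")
validate_pyspark_code(code=read_code + cleaning_code + tuning_code)
```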
Performance Analysis
Human: "My PySpark notebook is slow. Help me optimize it."
LLM Response:
1. Analyzes notebook performance (scoring 0-100)
2. Identifies anti-patterns and bottlenecks
3. Suggests specific optimizations
4. Generates optimized code alternatives
5. Provides before/after comparisons (see the sketch below)
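A corresponding tool sequence might look like the sketch below, again assuming generated snippets come back as text that can be written into a cell.

```python
# Rough sketch of an optimization pass over an existing notebook.
analyze_notebook_performance(
    workspace="Analytics-Workspace",
    notebook_id="notebook-id"
)

# Regenerate the slow section with optimization patterns applied ...
optimized = generate_pyspark_code(operation="performance_optimization")

# ... validate it, then write it back into the offending cell.
validate_pyspark_code(code=optimized)
update_notebook_cell(
    workspace="Analytics-Workspace",
    notebook_id="notebook-id",
    cell_index=0,                # index of the cell being optimized
    cell_content=optimized,
    cell_type="code"
)
```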
🔍 Troubleshooting
Common Issues
- Authentication: ensure you ran `az login` with the correct scope
- Context: use `clear_context()` to reset session state
- Workspace: verify workspace names and permissions
- Templates: check the available template types in the documentation
Getting Help
- Use validation tools for code issues
- Check performance analysis for optimization opportunities
- Leverage LLM natural language interface for guidance
📈 Performance Metrics
The analysis tools provide:
- Operation counts per notebook cell
- Detection and flagging of performance issues
- Identification of optimization opportunities
- A 0-100 code quality score
- Fabric compatibility assessment
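The exact response shape isn't documented here, so treat the snippet below as a purely hypothetical way of consuming such a report; every field name is illustrative.

```python
# Hypothetical: field names are illustrative, not the server's actual response schema.
report = analyze_notebook_performance(
    workspace="Analytics-Workspace",
    notebook_id="notebook-id"
)
print(f"Code quality score: {report['score']}/100")
for issue in report["performance_issues"]:            # flagged anti-patterns
    print("issue:", issue)
for hint in report["optimization_opportunities"]:     # suggested improvements
    print("hint:", hint)
```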
🤝 Contributing
This project welcomes contributions! Please see our contributing guidelines for details.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
🙏 Acknowledgments
Inspired by: https://github.com/Augustab/microsoft_fabric_mcp/tree/main
Ready to supercharge your Microsoft Fabric development with intelligent PySpark assistance! 🚀