MCP Browser Server
A Model Context Protocol server that enables AI assistants to interact with web browsers through Playwright, providing automation capabilities for navigation, interaction, and screenshots.
AutoProbeMCP - a browser for your Agent
A Model Context Protocol (MCP) server that provides browser automation capabilities using Playwright. This server enables AI assistants to interact with web pages through a standardized interface.
Perfect for web automation, testing, and debugging workflows with AI assistants including:
- Chat.fans agents - Empower AI agents with web interaction capabilities in VS Code
- GitHub Copilot Chat - Enhance your development workflow with browser automation
- Any MCP-compatible AI assistant - Universal browser automation for AI tools
Features
- Multi-browser support: Chromium, Firefox, and WebKit
- Comprehensive automation: Navigate, click, type, screenshot, and more
- JavaScript execution: Run custom scripts in the browser context
- Element interaction: Wait for elements, get text content, and interact with forms
- Screenshot capabilities: Capture full pages or viewport screenshots
- Type-safe: Built with TypeScript and runtime validation using Zod
Installation
npm install
npm run build
Make sure Playwright browsers are installed:
npx playwright install
For system dependencies (Linux):
sudo npx playwright install-deps
Usage
VS Code Integration
Configure the MCP server in VS Code by adding the following to your settings.json or workspace configuration:
"mcp": {
  "servers": {
    "browser-automation": {
      "command": "node",
      "args": [
        "/home/yourUserName/mcp-browser-server/build/index.js"
      ],
      "env": {}
    }
  }
}
Once configured, Chat.fans agents and GitHub Copilot Chat can use browser automation tools for web testing, scraping, and automation tasks.
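Because the path in "args" must point at the compiled build/index.js on your own machine, it can be convenient to generate the entry instead of editing it by hand. The following is a minimal sketch, assuming a POSIX-style path; `buildMcpSettings` is a hypothetical helper that is not part of this project and only mirrors the JSON shape shown above.

```typescript
// Hypothetical helper (not part of this project): builds the "mcp" settings
// fragment shown above for a given checkout directory.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpSettings {
  mcp: { servers: Record<string, McpServerEntry> };
}

function buildMcpSettings(repoDir: string, serverName = "browser-automation"): McpSettings {
  // Strip trailing slashes so the joined path has exactly one separator.
  const base = repoDir.replace(/\/+$/, "");
  return {
    mcp: {
      servers: {
        [serverName]: {
          command: "node",
          args: [`${base}/build/index.js`],
          env: {},
        },
      },
    },
  };
}

// Paste the resulting JSON into settings.json or your workspace configuration.
const settingsJson = JSON.stringify(
  buildMcpSettings("/home/yourUserName/mcp-browser-server/"),
  null,
  2,
);
```

Using an absolute path avoids depending on the working directory VS Code happens to launch the server from.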
Available VS Code Tasks
- Build: Ctrl+Shift+P → "Tasks: Run Task" → "build"
- Development Mode: Ctrl+Shift+P → "Tasks: Run Task" → "dev"
- Test MCP Server: Ctrl+Shift+P → "Tasks: Run Task" → "test-mcp-server"
Available Tools
- launch_browser - Start a new browser instance
- navigate - Go to a specific URL
- click_element - Click on page elements
- type_text - Enter text into form fields
- screenshot - Capture page screenshots
- get_element_text - Extract text from elements
- wait_for_element - Wait for elements to appear/disappear
- evaluate_javascript - Run custom JavaScript
- get_console_logs - Get browser console logs (log, info, warn, error, debug)
- analyze_screenshot - AI-powered screenshot analysis using Gemma3 (requires Ollama)
- get_page_info - Get current page information
- close_browser - Close the browser instance
- scroll - Scroll the page in the specified direction (up/down/left/right)
- check_scrollability - Check if the page is scrollable in specific directions
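Under MCP, a client invokes any of these tools by sending a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below only constructs such messages; the `makeToolCall` helper and the id counter are illustrative, and the argument names are taken from the examples in this README rather than from the server's schema.

```typescript
// JSON-RPC 2.0 envelope used by MCP for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Illustrative helper: wraps a tool name and its arguments in a tools/call request.
function makeToolCall(toolName: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const launchCall = makeToolCall("launch_browser", { browser: "chromium", headless: true });
const navigateCall = makeToolCall("navigate", { url: "http://localhost:3000/login" });
```

In practice the AI assistant's MCP client builds these requests for you; the shape is mainly useful when debugging the server by hand.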
Example: Web Application Testing
// Launch browser in headed mode for visual debugging
await launch_browser({ browser: "chromium", headless: false });
// Navigate to login page
await navigate({ url: "http://localhost:3000/login" });
// Fill in credentials
await type_text({ selector: "input[type='email']", text: "user@example.com" });
await type_text({ selector: "input[type='password']", text: "password123" });
// Submit form
await click_element({ selector: "button[type='submit']" });
// Wait for successful login
await wait_for_element({ selector: ".dashboard", timeout: 10000 });
// Check for any console errors during login
await get_console_logs({ level: "error" });
// Take screenshot of dashboard
await screenshot({ fullPage: true, path: "dashboard.png" });
// Get all console logs for debugging
await get_console_logs();
// Scroll down to see more content
await scroll({ direction: "down", pixels: 500, behavior: "smooth" });
// Check if page can be scrolled vertically
await check_scrollability({ direction: "vertical" });
// Scroll back to top
await scroll({ direction: "up", pixels: 500 });
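The example calls get_console_logs twice, once with a level filter and once without. The effect of that filter can be pictured with the sketch below; the `ConsoleEntry` shape and the `filterLogs` helper are assumptions made for illustration, since the server's actual response format is not documented here.

```typescript
// Levels listed for get_console_logs in this README.
type ConsoleLevel = "log" | "info" | "warn" | "error" | "debug";

// Hypothetical entry shape; the server's real response format may differ.
interface ConsoleEntry {
  level: ConsoleLevel;
  text: string;
}

// Returns a copy of all entries when no level is given, otherwise only matching ones.
function filterLogs(entries: ConsoleEntry[], level?: ConsoleLevel): ConsoleEntry[] {
  return level === undefined ? entries.slice() : entries.filter((e) => e.level === level);
}

const captured: ConsoleEntry[] = [
  { level: "info", text: "App booted" },
  { level: "error", text: "POST /login failed" },
  { level: "warn", text: "Slow response" },
];

const errorsOnly = filterLogs(captured, "error"); // one entry: the failed POST
const everything = filterLogs(captured);          // all three entries
```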
Page Scrolling and Navigation
The MCP Browser Server includes two scrolling tools: scroll moves the page in a given direction, and check_scrollability reports whether further scrolling is possible.
Scroll Tool
The scroll tool allows you to scroll the page in any direction with fine-grained control:
// Scroll down by default amount (100px)
await scroll();
// Scroll in specific directions with custom distances
await scroll({ direction: "down", pixels: 300, behavior: "smooth" });
await scroll({ direction: "up", pixels: 200, behavior: "auto" });
await scroll({ direction: "left", pixels: 150 });
await scroll({ direction: "right", pixels: 150 });
// Smooth scrolling for better user experience
await scroll({ direction: "down", pixels: 500, behavior: "smooth" });
Parameters:
- direction: "up", "down", "left", "right" (default: "down")
- pixels: Number of pixels to scroll (default: 100)
- behavior: "auto" or "smooth" (default: "auto")
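One plausible way to picture these parameters is as a mapping onto the options accepted by the browser's `window.scrollBy()`. The sketch below applies the documented defaults and converts a request into `{ left, top, behavior }`; how the server actually implements scrolling is not stated in this README, and both the `toScrollByOptions` helper and the rejection of negative pixel values are assumptions.

```typescript
type Direction = "up" | "down" | "left" | "right";
type Behavior = "auto" | "smooth";

interface ScrollArgs {
  direction?: Direction;
  pixels?: number;
  behavior?: Behavior;
}

interface ScrollByOptions {
  left: number;
  top: number;
  behavior: Behavior;
}

// Applies the documented defaults and converts the request into the
// { left, top, behavior } options accepted by window.scrollBy().
function toScrollByOptions(args: ScrollArgs = {}): ScrollByOptions {
  const direction = args.direction ?? "down";
  const pixels = args.pixels ?? 100;
  const behavior = args.behavior ?? "auto";
  // Assumption: negative or non-finite distances are treated as invalid input.
  if (!isFinite(pixels) || pixels < 0) {
    throw new RangeError("pixels must be a non-negative finite number");
  }
  const left = direction === "left" ? -pixels : direction === "right" ? pixels : 0;
  const top = direction === "up" ? -pixels : direction === "down" ? pixels : 0;
  return { left, top, behavior };
}

const defaults = toScrollByOptions(); // { left: 0, top: 100, behavior: "auto" }
const backUp = toScrollByOptions({ direction: "up", pixels: 200, behavior: "smooth" });
```

Expressing direction as a signed offset keeps all four directions on one code path.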
Scrollability Check Tool
The check_scrollability tool determines whether a page can be scrolled in specific directions:
// Check both vertical and horizontal scrollability
await check_scrollability({ direction: "both" });
// Check only vertical scrolling
await check_scrollability({ direction: "vertical" });
// Check only horizontal scrolling
await check_scrollability({ direction: "horizontal" });
Response includes:
- Current scroll position
- Maximum scroll distance
- Whether scrolling is possible in each direction
- Detailed position information
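The values listed above can all be derived from standard DOM scroll metrics (`scrollTop`, `scrollHeight`, `clientHeight` and their horizontal counterparts). The following sketch shows the arithmetic; the `Scrollability` result shape and its field names are hypothetical and are not the tool's documented response format.

```typescript
// Standard DOM scroll metrics, as read from the page's scrolling element.
interface ScrollMetrics {
  scrollTop: number;
  scrollLeft: number;
  scrollHeight: number;
  scrollWidth: number;
  clientHeight: number;
  clientWidth: number;
}

// Hypothetical result shape; field names are illustrative only.
interface Scrollability {
  position: { x: number; y: number };
  max: { x: number; y: number };
  canScroll: { up: boolean; down: boolean; left: boolean; right: boolean };
}

function computeScrollability(m: ScrollMetrics): Scrollability {
  // Maximum scroll distance is the content size minus the visible size.
  const maxY = Math.max(0, m.scrollHeight - m.clientHeight);
  const maxX = Math.max(0, m.scrollWidth - m.clientWidth);
  return {
    position: { x: m.scrollLeft, y: m.scrollTop },
    max: { x: maxX, y: maxY },
    canScroll: {
      up: m.scrollTop > 0,
      down: m.scrollTop < maxY,
      left: m.scrollLeft > 0,
      right: m.scrollLeft < maxX,
    },
  };
}

// A 2400px-tall page in an 800px-tall viewport, currently at the top.
const atTop = computeScrollability({
  scrollTop: 0, scrollLeft: 0,
  scrollHeight: 2400, scrollWidth: 1280,
  clientHeight: 800, clientWidth: 1280,
});
```

Here `atTop.max.y` is 1600, so the page can scroll down but not up, and it cannot scroll horizontally at all.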
AI-Powered Screenshot Analysis
The analyze_screenshot tool provides AI-powered analysis of web pages using local Gemma3 models via Ollama. This feature can describe what's visible on a page, analyze page structure, and look for specific elements based on context.
Prerequisites
- Install Ollama: Download from ollama.ai
- Install Gemma3 model: ollama pull gemma3:4b
- Start Ollama service: ollama serve
Usage Examples
Basic Screenshot Analysis
// Take and analyze a screenshot with AI
await analyze_screenshot({
  fullPage: true,
  model: "gemma3:4b"
});
Detailed Structural Analysis
// Get detailed analysis of page structure
await analyze_screenshot({
  detailed: true,
  pretext: "Focus on navigation elements and form fields"
});
Context-Specific Analysis
// Look for specific elements or issues
await analyze_screenshot({
  pretext: "Check if there are any error messages or broken layouts",
  path: "error-check.png"
});
Parameters
- fullPage (boolean): Capture entire scrollable page vs viewport only
- path (string): Optional file path to save the screenshot
- pretext (string): Additional context or specific instructions for the AI
- model (string): AI model to use (default: "gemma3:4b")
- detailed (boolean): Request detailed structural analysis
Supported Models
- gemma3:4b (default, good balance of speed and quality)
- Any other vision-capable model available in your Ollama installation
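To make the role of the model, pretext, and detailed parameters more concrete, the sketch below builds a request body for Ollama's `POST /api/generate` endpoint, which accepts base64-encoded images in an `images` array. The `buildAnalysisRequest` helper and the prompt wording are invented for illustration; the prompt this server actually sends is not documented here.

```typescript
interface AnalyzeArgs {
  pretext?: string;
  model?: string;
  detailed?: boolean;
}

// Body for Ollama's POST /api/generate endpoint; `images` holds base64 strings.
interface OllamaGenerateBody {
  model: string;
  prompt: string;
  images: string[];
  stream: false;
}

// Prompt wording is an assumption made for this example.
function buildAnalysisRequest(screenshotBase64: string, args: AnalyzeArgs = {}): OllamaGenerateBody {
  const parts = [
    args.detailed
      ? "Describe this web page screenshot in detail, including layout and structure."
      : "Briefly describe what is visible in this web page screenshot.",
  ];
  if (args.pretext) {
    parts.push(`Additional context: ${args.pretext}`);
  }
  return {
    model: args.model ?? "gemma3:4b",
    prompt: parts.join("\n"),
    images: [screenshotBase64],
    stream: false,
  };
}

// Placeholder payload; a real call would pass the PNG bytes encoded as base64.
const requestBody = buildAnalysisRequest("<base64-png>", {
  pretext: "Check if there are any error messages or broken layouts",
});
// Sending it requires a running Ollama service (default port 11434):
// await fetch("http://localhost:11434/api/generate", { method: "POST", body: JSON.stringify(requestBody) });
```

Setting `stream: false` asks Ollama for a single JSON response instead of a stream of partial results, which is simpler to return from a tool call.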
Development & Testing
Quick Setup
# One-command setup (installs dependencies, browsers, and builds)
npm run setup
# Or step by step:
npm install
npx playwright install
npm run build
Development Commands
# Build the project
npm run build
# Run in development mode
npm run dev
# Start the server
npm run start
# Development helper (shows all available commands)
npm run dev-helper help
Testing
The project includes comprehensive tests in the tests/ directory:
# Run basic communication test
npm run test
# Run browser automation demo
npm run test:demo
# Run AI analysis test (requires Ollama)
npm run test:ai-simple
# Check system status
npm run test:status
# Run all tests
npm run test:all
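If you want to write a communication check of your own, it helps to know how messages are framed. Assuming the server uses the MCP stdio transport (as the `node build/index.js` command in the VS Code configuration suggests), each JSON-RPC message travels as one newline-terminated line. The sketch below is not the project's test code; `encodeMessage` and `LineDecoder` are illustrative names showing how a test client might frame requests and reassemble responses that arrive in arbitrary chunks.

```typescript
// Serializes one JSON-RPC message as a single newline-terminated line.
function encodeMessage(message: unknown): string {
  return JSON.stringify(message) + "\n";
}

// Accumulates output chunks and returns each complete line as a parsed message.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on a newline).
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  }
}

const decoder = new LineDecoder();
const wire = encodeMessage({ jsonrpc: "2.0", id: 1, method: "tools/list" });

// Simulate the message arriving split across two chunks.
const firstBatch = decoder.push(wire.slice(0, 10)); // nothing complete yet
const secondBatch = decoder.push(wire.slice(10));   // one parsed message
```

A real session would start with an `initialize` exchange before `tools/list`; the framing is the same for every message.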
Development Helper
Use the development helper for common tasks:
# Show all available commands
npm run dev-helper help
# Quick setup from scratch
npm run dev-helper setup
# Run comprehensive tests
npm run dev-helper test
# Clean generated files
npm run dev-helper clean
For more details about testing, see tests/README.md.
Project Structure
mcp-browser-server/
├── src/ # TypeScript source code
│ └── index.ts # Main MCP server implementation
├── build/ # Compiled JavaScript output
├── tests/ # Test scripts and documentation
│ ├── README.md # Testing documentation
│ ├── simple-test.mjs # Basic communication test
│ ├── demo-test.mjs # Browser automation demo
│ └── *.mjs # Additional test files
├── screenshots/ # Generated screenshots from tests
├── package.json # Project configuration
└── README.md # This file
License
Dual License:
- Personal Use: Free for personal, educational, and non-commercial use
- Commercial Use: Requires a separate commercial license
See LICENSE for full terms. For commercial licensing inquiries, please contact us.