WebSurfer MCP
A Model Context Protocol server that enables AI assistants to securely fetch and extract readable text content from web pages through a standardized interface.
Securely fetch and extract clean text from the web for LLMs.
MCP Server • src-layout Python package • SSRF Protection
WebSurfer is a Model Context Protocol (MCP) server designed to provide Large Language Models (LLMs) with secure and efficient access to web content.
Core Features
- Advanced URL Validation: Implements strict security controls using the ipaddress module to block private, loopback, link-local, and reserved destinations before any fetch occurs.
- Optimized Content Extraction: Utilizes trafilatura and BeautifulSoup4 to extract high-quality, readable text from HTML, effectively removing boilerplate such as navigation, headers, and scripts.
- Resource Management: Enforces strict content size limits and request timeouts to ensure system stability and performance.
- Redirect Safety: Validates every redirect hop and refuses redirects to blocked schemes, localhost, private IP literals, or unsafe DNS targets.
- Rate Limiting: Built-in request throttling to prevent service abuse and manage resource consumption.
- Robust Error Handling: Provides granular feedback for network issues, HTTP errors, and content parsing failures.
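The Resource Management feature above mentions content size limits. One way such a limit can be enforced is to read the response body in chunks and stop as soon as the cap is exceeded, instead of trusting a declared length. The sketch below illustrates that idea with the standard library only; `read_capped` and `ContentTooLarge` are hypothetical names, and this is not taken from the project's code.

```python
import io
from typing import BinaryIO


class ContentTooLarge(Exception):
    """Raised when a body exceeds the configured size cap (hypothetical name)."""


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 65536) -> bytes:
    """Read `stream` to the end, aborting once more than `max_bytes` were seen."""
    chunks: list[bytes] = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop early so an oversized body is never fully buffered.
            raise ContentTooLarge(f"body exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # io.BytesIO stands in for a network response body here.
    body = read_capped(io.BytesIO(b"hello world"), max_bytes=1024)
    print(len(body))
```

Counting bytes as they arrive means the cap also holds when the server sends no length header or an inaccurate one.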
Project Layout
websurfer-mcp/
├── src/websurfer_mcp/
│ ├── cli.py
│ ├── config.py
│ ├── extractor.py
│ ├── networking.py
│ ├── server.py
│ └── url_validation.py
├── tests/
├── docs/images/
├── pyproject.toml
└── run_tests.py
Key runtime components:
- WebSurferServer: MCP transport and tool registration.
- TextExtractor: asynchronous HTTP fetching and readable-text extraction.
- SafeResolver: DNS resolution guard that rejects private and reserved IP answers.
- URLValidator: URL normalization and SSRF-focused validation.
- Config: environment-driven runtime configuration.
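To make the role of a DNS resolution guard such as SafeResolver more concrete, here is a minimal sketch of the general technique: resolve the hostname, then reject the whole lookup if any answer falls in a blocked range. The names `resolve_safely`, `system_lookup`, and `UnsafeDNSAnswer` are hypothetical, and the project's actual SafeResolver may be structured quite differently.

```python
import ipaddress
import socket
from collections.abc import Callable, Iterable


class UnsafeDNSAnswer(Exception):
    """Raised when a hostname resolves to a blocked address (hypothetical name)."""


def system_lookup(host: str) -> list[str]:
    """Resolve `host` with the OS resolver and return the IP strings."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def resolve_safely(
    host: str,
    lookup: Callable[[str], Iterable[str]] = system_lookup,
) -> list[str]:
    """Return the resolved addresses, or raise if any one of them is blocked."""
    answers = list(lookup(host))
    if not answers:
        raise UnsafeDNSAnswer(f"{host}: no addresses returned")
    for text in answers:
        ip = ipaddress.ip_address(text)
        # A single bad answer rejects the lookup, not just that address.
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            raise UnsafeDNSAnswer(f"{host} resolves to blocked address {ip}")
    return answers


if __name__ == "__main__":
    # A fake lookup keeps the demo offline and deterministic.
    print(resolve_safely("ok.test", lambda h: ["93.184.216.34"]))
```

The `lookup` parameter is injectable so the guard can be tested without network access. A check like this is most useful when the subsequent connection uses the validated addresses, so that a second lookup cannot return a different answer.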
Installation
Prerequisites
- Python 3.12 or higher
- uv package manager
Setup
- Clone the repository:
  git clone https://github.com/crybo-rybo/websurfer-mcp
  cd websurfer-mcp
- Install runtime dependencies:
  uv sync
- Install development tooling:
  uv sync --group dev
Usage
Server Execution
The server communicates via standard I/O (stdio) and is compatible with any MCP-compliant client.
Use either the console script or the package module:
uv run websurfer-mcp serve
uv run python -m websurfer_mcp serve
Manual Testing
You can verify the extraction functionality directly from the command line:
uv run websurfer-mcp test --url "https://example.com"
Desktop Client Integration
Claude Desktop
To use WebSurfer MCP with Claude Desktop, add the following configuration to your claude_desktop_config.json file.
Path locations:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Configuration:
Replace /path/to/websurfer-mcp with the absolute path to your cloned repository.
After updating the configuration, restart Claude Desktop to enable the search_url tool.
{
  "mcpServers": {
    "websurfer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/websurfer-mcp",
        "run",
        "python",
        "-m",
        "websurfer_mcp",
        "serve"
      ]
    }
  }
}
Configuration
The server can be configured using the following environment variables:
| Variable | Default | Description |
|---|---|---|
| MCP_DEFAULT_TIMEOUT | 10 | Default request timeout in seconds. |
| MCP_MAX_TIMEOUT | 60 | Maximum allowed timeout in seconds. |
| MCP_MAX_REDIRECTS | 10 | Maximum number of redirect hops to follow. |
| MCP_USER_AGENT | websurfer-mcp/0.2.0 | User-Agent string for outgoing requests. |
| MCP_MAX_CONTENT_LENGTH | 10485760 | Maximum content size in bytes (default 10 MB). |
Development
Run the test suite:
uv run pytest
Run quality checks:
uv run ruff check .
uv run ruff format .
Run a focused module:
uv run python run_tests.py --module test_server
Security
WebSurfer MCP is designed with security as a primary concern. It explicitly blocks:
- Private IP ranges (e.g., 10.0.0.0/8, 192.168.0.0/16)
- Loopback addresses (e.g., 127.0.0.1, ::1)
- Link-local and reserved addresses
- Non-HTTP/HTTPS schemes (e.g., file://, ftp://, javascript:)
- Redirect hops that resolve to blocked destinations
- DNS answers that resolve public-looking hostnames to private or reserved IPs
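The first four items in this list can be checked on the URL string alone, before any network activity. The sketch below shows one way to do that with the standard library's ipaddress and urllib.parse modules. The function names are hypothetical and the project's URLValidator may apply additional rules. Hostnames that are not IP literals pass this step and would still need a DNS-level check.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_blocked_ip(text: str) -> bool:
    """Return True if the IP literal `text` is in a blocked range.

    Raises ValueError when `text` is not an IP address at all.
    """
    ip = ipaddress.ip_address(text)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


def check_url(url: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a URL, without performing any DNS lookup."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False, "blocked scheme"
    host = parts.hostname  # lowercased, with IPv6 brackets removed
    if not host:
        return False, "missing host"
    if host == "localhost" or host.endswith(".localhost"):
        return False, "localhost"
    try:
        blocked = is_blocked_ip(host)
    except ValueError:
        # Not an IP literal: a name like example.com, to be resolved safely later.
        return True, "hostname"
    return (False, "blocked address") if blocked else (True, "public address")


if __name__ == "__main__":
    for candidate in ("https://example.com", "http://127.0.0.1/", "file:///etc/passwd"):
        print(candidate, check_url(candidate))
```

Applying the same check to every redirect target, and pairing it with validation of resolved addresses, covers the last two items in the list.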
Developed with the Model Context Protocol.