WebSurfer MCP

A Model Context Protocol server that enables AI assistants to securely fetch and extract readable text content from web pages through a standardized interface.

Last updated: 11/23/2025

A powerful Model Context Protocol (MCP) server that enables Large Language Models (LLMs) to fetch and extract readable text content from web pages. This tool provides a secure, efficient, and feature-rich way for AI assistants to access web content through a standardized interface.

✨ Features

  • 🔒 Secure URL Validation: Blocks dangerous schemes, private IPs, and localhost domains
  • 📄 Smart Content Extraction: Extracts clean, readable text from HTML pages using advanced parsing
  • ⚡ Rate Limiting: Built-in rate limiting to prevent abuse (60 requests/minute)
  • 🛡️ Content Type Filtering: Only processes supported content types (HTML, plain text, XML)
  • 📏 Size Limits: Configurable content size limits (default: 10MB)
  • ⏱️ Timeout Management: Configurable request timeouts with validation
  • 🔧 Comprehensive Error Handling: Detailed error messages for various failure scenarios
  • 🧪 Full Test Coverage: 45+ unit tests covering all functionality

🏗️ Architecture

The project consists of several key components:

Core Components

  • MCPURLSearchServer: Main MCP server implementation
  • TextExtractor: Handles web content fetching and text extraction
  • URLValidator: Validates and sanitizes URLs for security
  • Config: Centralized configuration management

Key Features

  • Async/Await: Built with modern Python async patterns for high performance
  • Resource Management: Proper cleanup of network connections and resources
  • Context Managers: Safe resource handling with automatic cleanup
  • Logging: Comprehensive logging for debugging and monitoring
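
The async and context-manager patterns listed above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `FakeConnection`, `open_connection`, and `fetch_all` are hypothetical names, and the fake connection stands in for a real HTTP session so the example runs without network access.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeConnection:
    """Hypothetical stand-in for a network session (not the project's class)."""

    def __init__(self) -> None:
        self.closed = False

    async def fetch(self, url: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        if self.closed:
            raise RuntimeError("connection already closed")
        return f"content of {url}"

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def open_connection():
    """Yield a connection and close it on exit, even if the body raises."""
    conn = FakeConnection()
    try:
        yield conn
    finally:
        await conn.close()


async def fetch_all(urls):
    """Fetch several URLs concurrently over one connection."""
    async with open_connection() as conn:
        results = await asyncio.gather(*(conn.fetch(u) for u in urls))
    # The connection is closed here; it is returned only so callers can inspect it.
    return results, conn
```

Calling `asyncio.run(fetch_all([...]))` runs the fetches concurrently and leaves the connection closed afterwards, whether or not a fetch fails.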

🚀 Installation

Prerequisites

  • Python 3.12 or higher
  • uv package manager (recommended)

Quick Start

  1. Clone the repository:

    git clone https://github.com/crybo-rybo/websurfer-mcp
    cd websurfer-mcp
    
  2. Install dependencies:

    uv sync
    
  3. Verify installation:

    uv run python -c "import mcp_url_search_server; print('Installation successful!')"
    

🎯 Usage

Starting the MCP Server

The server communicates via stdio (standard input/output) and can be integrated with any MCP-compatible client.

# Start the server
uv run run_server.py serve

# Start with custom log level
uv run run_server.py serve --log-level DEBUG
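
For orientation, this is roughly what a client writes to the server's standard input when a session begins. MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages, and the first one is an `initialize` request. The sketch below only builds that message; the client name, version, and the protocol version string are assumptions for illustration, and in practice an MCP client library performs this handshake for you.

```python
import json


def build_initialize_request(request_id: int = 1) -> bytes:
    """Build an MCP `initialize` request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use whatever your client and server agree on.
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    # json.dumps emits no raw newlines by default, so the message stays on one line.
    return (json.dumps(message) + "\n").encode("utf-8")
```

A client would write these bytes to the stdin of the process started with `uv run run_server.py serve` and read the JSON response from its stdout.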

Testing URL Search Functionality

Test the URL search functionality directly:

# Test with a simple URL
uv run run_server.py test --url "https://example.com"

# Test with custom timeout
uv run run_server.py test --url "https://httpbin.org/html" --timeout 15

Example Test Output

{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "content_type": "text/html",
  "status_code": 200,
  "text_length": 1250,
  "text_preview": "Example Domain This domain is for use in illustrative examples in documents..."
}

🛠️ Configuration

The server can be configured using environment variables:

Variable                Default                      Description
MCP_DEFAULT_TIMEOUT     10                           Default request timeout in seconds
MCP_MAX_TIMEOUT         60                           Maximum allowed timeout in seconds
MCP_USER_AGENT          MCP-URL-Search-Server/1.0.0  User agent string for requests
MCP_MAX_CONTENT_LENGTH  10485760                     Maximum content size in bytes (10MB)

Example Configuration

export MCP_DEFAULT_TIMEOUT=15
export MCP_MAX_CONTENT_LENGTH=5242880  # 5MB
uv run run_server.py serve
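
As a rough sketch of how these variables could be read on the Python side, the snippet below loads them with the documented defaults. The `Settings` class and both function names are hypothetical, and clamping a requested timeout to the maximum is an assumption; the real `config.py` may reject out-of-range values instead.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    default_timeout: int
    max_timeout: int
    user_agent: str
    max_content_length: int


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), using documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        default_timeout=int(env.get("MCP_DEFAULT_TIMEOUT", "10")),
        max_timeout=int(env.get("MCP_MAX_TIMEOUT", "60")),
        user_agent=env.get("MCP_USER_AGENT", "MCP-URL-Search-Server/1.0.0"),
        max_content_length=int(env.get("MCP_MAX_CONTENT_LENGTH", "10485760")),
    )


def effective_timeout(requested, settings: Settings) -> int:
    """Use the default when no timeout is given; clamp to [1, max_timeout] (assumed policy)."""
    if requested is None:
        return settings.default_timeout
    return max(1, min(int(requested), settings.max_timeout))
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the process environment.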

🧪 Testing

Running All Tests

# Run all tests with verbose output
uv run python -m unittest discover tests -v

# Run tests with coverage (if coverage is installed)
uv run coverage run -m unittest discover tests
uv run coverage report

Running Specific Test Files

# Run only integration tests
uv run python -m unittest tests.test_integration -v

# Run only text extraction tests
uv run python -m unittest tests.test_text_extractor -v

# Run only URL validation tests
uv run python -m unittest tests.test_url_validator -v

Test Results

All 45 tests should pass successfully:

test_content_types_immutable (test_config.TestConfig.test_content_types_immutable) ... ok
test_default_configuration_values (test_config.TestConfig.test_default_configuration_values) ... ok
test_404_error_handling (test_integration.TestMCPURLSearchIntegration.test_404_error_handling) ... ok
...
----------------------------------------------------------------------
Ran 45 tests in 1.827s

OK

🔧 Development

Project Structure

websurfer-mcp/
├── mcp_url_search_server.py   # Main MCP server implementation
├── text_extractor.py          # Web content extraction logic
├── url_validator.py           # URL validation and security
├── config.py                  # Configuration management
├── run_server.py              # Command-line interface
├── run_tests.py               # Test runner utilities
├── tests/                     # Test suite
│   ├── test_integration.py    # Integration tests
│   ├── test_text_extractor.py # Text extraction tests
│   ├── test_url_validator.py  # URL validation tests
│   └── test_config.py         # Configuration tests
├── pyproject.toml             # Project configuration
└── README.md                  # This file

🔒 Security Features

URL Validation

  • Scheme Blocking: Blocks file://, javascript:, ftp:// schemes
  • Private IP Protection: Blocks access to private IP ranges (10.x.x.x, 192.168.x.x, etc.)
  • Localhost Protection: Blocks localhost and local domain access
  • URL Length Limits: Prevents extremely long URLs
  • Format Validation: Ensures proper URL structure
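
The checks listed above could look something like the following, using only the standard library. This is an illustrative sketch rather than the project's `URLValidator`: the function name, the 2048-character limit, and the `.local` suffix rule are assumptions. It also does not resolve DNS, so a public hostname that points at a private address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # everything else (file, javascript, ftp, ...) is rejected
MAX_URL_LENGTH = 2048                # assumed limit; the real value is not documented here


def is_url_allowed(url: str) -> bool:
    """Return True only for well-formed http(s) URLs that target a public host."""
    if not url or len(url) > MAX_URL_LENGTH:
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # malformed URL, e.g. an unterminated IPv6 literal
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    host = host.rstrip(".")
    if host == "localhost" or host.endswith((".localhost", ".local")):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname rather than an IP literal; no DNS check is done here
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )
```

For example, `is_url_allowed("https://example.com")` passes, while `file:///etc/passwd`, `http://192.168.1.1/`, and `http://localhost:8000` are all rejected.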

Content Safety

  • Content Type Filtering: Only processes supported text-based content types
  • Size Limits: Configurable maximum content size (default: 10MB)
  • Rate Limiting: Prevents abuse with configurable limits
  • Timeout Protection: Configurable request timeouts
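
To make the rate-limiting and content-type rules concrete, here is a small sketch. It uses a sliding-window limiter, which is one common way to enforce "60 requests per minute"; the source does not say which algorithm the server uses. The class name, function name, and the exact list of MIME types are assumptions.

```python
import time
from collections import deque

# Assumed MIME types for "HTML, plain text, XML"; the server's real list may differ.
SUPPORTED_TYPES = ("text/html", "text/plain", "text/xml", "application/xml")


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so tests can control time
        self._timestamps = deque()   # times of accepted requests, oldest first

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True


def is_supported_content_type(header_value: str) -> bool:
    """Compare only the media type, ignoring parameters such as charset."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in SUPPORTED_TYPES
```

A request handler would call `limiter.allow()` before fetching and `is_supported_content_type(...)` on the response's `Content-Type` header before extracting text.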

📊 Performance

  • Async Processing: Non-blocking I/O for high concurrency
  • Connection Pooling: Efficient HTTP connection reuse
  • DNS Caching: Reduces DNS lookup overhead
  • Resource Cleanup: Automatic cleanup prevents memory leaks

🙏 Acknowledgments

  • Built with the Model Context Protocol (MCP)
  • Uses aiohttp for async HTTP requests
  • Leverages trafilatura for content extraction
  • Powered by BeautifulSoup for HTML parsing

Happy web surfing with your AI assistant! 🚀