
Web-curl MCP Server

A powerful tool for fetching and extracting text content from web pages and APIs, supporting web scraping, REST API requests, and Google Custom Search integration.

GitHub Stars: 9 · Last Updated: 3/3/2026

Google Custom Search API

The Google Custom Search API is free within usage limits (100 queries per day; additional queries require payment). For full details on quotas, pricing, and restrictions, see the official documentation.

Web-curl

Web-curl Logo

Developed by Rayss

🚀 Open Source Project
🛠️ Built with Node.js & TypeScript (Node.js v18+ required)



Web-curl MCP server

🎬 Demo Video

Click here to watch the demo video directly in your browser.

If your platform supports it, you can also download and play demo/demo_1.mp4 directly.


📚 Table of Contents

  • Changelog / Update History
  • Overview
  • Features
  • Architecture
  • Installation
  • Usage
  • CLI Usage
  • MCP Server Usage
  • Configuration
  • Examples
  • Troubleshooting
  • Tips & Best Practices
  • Contributing & Issues
  • License & Attribution

๐Ÿ“ Changelog / Update History

See CHANGELOG.md for a complete history of updates and new features.

๐Ÿ“ Overview

Web-curl is a powerful tool for fetching and extracting text content from web pages and APIs. Use it as a standalone CLI or as an MCP (Model Context Protocol) server. Web-curl leverages Puppeteer for robust web scraping and supports advanced features such as resource blocking, custom headers, authentication, and Google Custom Search.


✨ Features

🚀 Deep Research & Automation (v1.4.2)

  • Advanced Browser Automation: Full control over Chromium via Puppeteer (click, type, scroll, hover, key presses).
  • Always-On Session Persistence: Browser profiles are now always persistent. Login sessions, cookies, and cache are automatically saved in a local user_data/ directory.
  • Token-Efficient Snapshots:
    • Accessibility Tree: Clean, structured snapshots instead of messy HTML.
    • HTML Slice Mode: Raw HTML with startIndex/endIndex for safe chunking when needed.
    • Viewport Filtering: Automatically filters out elements not visible on screen, saving up to 90% of context tokens on long pages.
  • Chrome DevTools Integration (implemented, but hidden from list_tools):
    • Network Monitoring (browser_network_requests)
    • Console Logs (browser_console_messages)
  • Parallel Search:
    • multi_search: Run multiple Google searches at once (only exposed search tool).
  • Intelligent Resource Management:
    • Idle Auto-Close: Browser automatically shuts down after 15 minutes of inactivity to save RAM/CPU.
    • Tab Rotation: Automatically replaces the oldest tab when the 10-tab limit is reached.
  • Media & Documents:
    • Full-Page Screenshots: Capture high-quality screenshots with a 5-day auto-cleanup lifecycle and custom destination support.
    • Document Parsing: Extract text from PDF and DOCX files directly from URLs.

Storage & Download Details

  • ๐Ÿ—‚๏ธ Error log rotation: logs/error-log.txt is rotated when it exceeds ~1MB (renamed to error-log.txt.bak) to prevent unbounded growth.
  • ๐Ÿงน Logs & temp cleanup: old temporary files in the logs/ directory are cleaned up at startup.
  • ๐Ÿ›‘ Browser lifecycle: Puppeteer browser instances are closed in finally blocks to avoid Chromium temp file leaks.
  • ๐Ÿ”Ž Content extraction:
    • Returns raw text, HTML, and Readability "main article" when available. Readability attempts to extract the primary content of a webpage, removing headers, footers, sidebars, and other non-essential elements, providing a cleaner, more focused text.
    • Readability output is subject to startIndex/maxLength/chunkSize slicing when requested.
  • 🚫 Resource blocking: blockResources is now always forced to false, so page resources (images, CSS, scripts) are never blocked and pages load with all assets.
  • โฑ๏ธ Timeout control: navigation and API request timeouts are configurable via tool arguments.
  • ๐Ÿ’พ Output: results can be printed to stdout or written to a file via CLI options.
  • โฌ‡๏ธ Download behavior (download_file):
    • destinationFolder accepts relative paths (resolved against the project root) or absolute paths.
    • The server creates destinationFolder if it does not exist.
    • Downloads are streamed using Node streams + pipeline to minimize memory use and ensure robust writes.
    • Filenames are derived from the URL path (e.g., https://.../path/file.jpg -> file.jpg). If no filename is present, the fallback name is downloaded_file.
    • Overwrite semantics: by default the implementation will overwrite an existing file with the same name.
  • ๐Ÿ–ฅ๏ธ Usage modes: CLI and MCP server (stdin/stdout transport).
  • ๐ŸŒ REST client: fetch_api returns JSON/text when appropriate and base64 for binary responses.
  • ๐Ÿ” Google Custom Search: requires APIKEY_GOOGLE_SEARCH and CX_GOOGLE_SEARCH.
  • ๐Ÿค– Smart command:
    • Auto language detection (franc-min) and optional translation (dynamic translate import).
    • Query enrichment is heuristic-based; results depend on the detected intent.
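The download naming rule described above can be sketched as a small pure function. This is an illustration of the documented behavior, not the server's actual code; the helper name deriveFilename is hypothetical (the real logic lives in src/index.ts).

```typescript
// Sketch of download_file's documented naming rule: use the last
// segment of the URL path, falling back to "downloaded_file" when
// the path carries no usable basename.
function deriveFilename(url: string): string {
  const pathname = new URL(url).pathname;        // e.g. "/path/file.jpg"
  const base = pathname.split("/").pop() ?? "";  // last path segment
  return base !== "" ? base : "downloaded_file"; // documented fallback
}

console.log(deriveFilename("https://example.com/path/file.jpg")); // "file.jpg"
console.log(deriveFilename("https://example.com/"));              // "downloaded_file"
```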

๐Ÿ—๏ธ Architecture

This section outlines the high-level architecture of Web-curl.

graph TD
    A[User/MCP Host] --> B(CLI / MCP Server)
    B --> C{Tool Handlers}
    C -- browser_flow --> D["Puppeteer (Web Scraping)"]
    C -- fetch_api --> E["REST Client"]
    C -- multi_search --> F["Google Custom Search API"]
    C -- parse_document --> G["Document Parser (PDF/DOCX)"]
    C -- download_file --> H["File System (Downloads)"]
    D --> I["Web Content"]
    E --> J["External APIs"]
    F --> K["Google Search Results"]
    H --> L["Local Storage"]
  • CLI & MCP Server: src/index.ts implements both the CLI entry point and the MCP server.
  • Web Scraping: Puppeteer provides headless browsing and content extraction.
  • REST Client: src/rest-client.ts provides a flexible HTTP client for API requests.

โš™๏ธ MCP Server Configuration Example

To integrate web-curl as an MCP server, add the following configuration to your mcp_settings.json:

{
  "mcpServers": {
    "web-curl": {
      "command": "node",
      "args": [
        "build/index.js"
      ],
      "disabled": false,
      "alwaysAllow": [
        "browser_flow",
        "browser_configure",
        "browser_close",
        "multi_search",
        "fetch_api",
        "download_file",
        "parse_document"
      ],
      "env": {
        "APIKEY_GOOGLE_SEARCH": "YOUR_GOOGLE_API_KEY",
        "CX_GOOGLE_SEARCH": "YOUR_CX_ID"
      }
    }
  }
}

🔑 How to Obtain a Google API Key and CX

  1. Get a Google API Key:
    • Go to Google Cloud Console.
    • Create/select a project, then go to APIs & Services > Credentials.
    • Click Create Credentials > API key and copy it.
  2. Get a Custom Search Engine (CX) ID:
    • Go to Google Custom Search Engine.
    • Create/select a search engine, then copy the Search engine ID (CX).
  3. Enable Custom Search API:
    • In Google Cloud Console, go to APIs & Services > Library.
    • Search for Custom Search API and enable it.

Replace YOUR_GOOGLE_API_KEY and YOUR_CX_ID in the config above.
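Once you have both values, the search request multi_search ultimately issues follows Google's public Custom Search JSON API shape. A minimal sketch of the request URL (illustrative only; buildSearchUrl is not part of this project's API):

```typescript
// Build a Google Custom Search JSON API request URL.
// Endpoint and the key/cx/q query parameters come from Google's
// public documentation; pass in the same values you would set as
// APIKEY_GOOGLE_SEARCH and CX_GOOGLE_SEARCH in the config above.
function buildSearchUrl(key: string, cx: string, query: string): string {
  const url = new URL("https://www.googleapis.com/customsearch/v1");
  url.searchParams.set("key", key); // API key from Google Cloud Console
  url.searchParams.set("cx", cx);   // Search engine ID (CX)
  url.searchParams.set("q", query); // the search terms
  return url.toString();
}
```

Fetching that URL with any HTTP client returns a JSON body whose items array holds the search results.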


๐Ÿ› ๏ธ Installation

# Clone the repository
git clone https://github.com/rayss868/MCP-Web-Curl
cd web-curl

# Install dependencies
npm install

# Build the project
npm run build
  • Prerequisites: Ensure you have Node.js (v18+) and Git installed on your system.

Puppeteer installation notes

  • Windows: Just run npm install.

  • Linux / Ubuntu Server: You must install extra dependencies for Chromium to handle rendering and screenshots in a headless environment. Run:

    sudo apt-get update && sudo apt-get install -y \
      fonts-liberation \
      libasound2 \
      libatk-bridge2.0-0 \
      libatk1.0-0 \
      libc6 \
      libcairo2 \
      libcups2 \
      libdbus-1-3 \
      libexpat1 \
      libfontconfig1 \
      libgbm1 \
      libgcc1 \
      libglib2.0-0 \
      libgtk-3-0 \
      libnspr4 \
      libnss3 \
      libpango-1.0-0 \
      libpangocairo-1.0-0 \
      libstdc++6 \
      libx11-6 \
      libx11-xcb1 \
      libxcb1 \
      libxcomposite1 \
      libxcursor1 \
      libxdamage1 \
      libxext6 \
      libxfixes3 \
      libxi6 \
      libxrandr2 \
      libxrender1 \
      libxss1 \
      libxtst6 \
      lsb-release \
      wget \
      xdg-utils
    

For more details, see the Puppeteer troubleshooting guide.


🚀 Usage

CLI Usage

The CLI supports fetching and extracting text content from web pages.

# Basic usage
node build/index.js https://example.com

# With options
node build/index.js --timeout 30000 https://example.com

# Save output to a file
node build/index.js -o result.json https://example.com
Command Line Options
  • --timeout <ms>: Set navigation timeout (default: 60000)
  • -o <file>: Output result to specified file

MCP Server Usage

Web-curl can be run as an MCP server for integration with Roo Context or other MCP-compatible environments.

Exposed Tools (v1.4.2)

Only the tools below are exposed via list_tools to reduce tool-chaining in agent clients.

  • browser_flow: One-call browser workflow (optional navigate → optional actions → return ONE result).

  • browser_configure: Set proxy/user-agent/viewport (session persistence is always on via user_data/).

  • browser_close: Close browser and tabs (also auto-closes after 15 minutes of inactivity).

  • multi_search: Run multiple Google searches in parallel (the only exposed search entrypoint).

  • fetch_api: REST API request with response truncation (limit).

  • download_file: Download a file from a URL.

  • parse_document: Extract text from PDF/DOCX URLs.

Running as MCP Server
npm run start

The server will communicate via stdin/stdout and expose the tools as defined in src/index.ts.
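On the stdio transport, each message is a newline-delimited JSON-RPC 2.0 object per the Model Context Protocol specification. A sketch of the envelope a client would write to the server's stdin to invoke a tool (this shows the wire format, not this server's internal code; makeToolCall is a hypothetical helper):

```typescript
// JSON-RPC 2.0 request envelope for invoking an MCP tool, per the
// Model Context Protocol spec: method "tools/call" with the tool
// name and its arguments in params.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Example: ask the server to run fetch_api against an arbitrary URL.
const req = makeToolCall(1, "fetch_api", {
  url: "https://api.github.com/repos/nodejs/node",
  method: "GET",
});
console.log(JSON.stringify(req)); // one line of JSON, written to the server's stdin
```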


🚦 HTML Slicing Example (Recommended for Large Pages)

Use browser_flow with result: { type: "snapshot", mode: "html" } when you need raw HTML but want to keep the response small.

Client request for first slice:

{
  "name": "browser_flow",
  "arguments": {
    "result": {
      "type": "snapshot",
      "mode": "html",
      "startIndex": 0,
      "endIndex": 20000
    }
  }
}

Response (example):

{
  "mode": "html",
  "totalLength": 123456,
  "startIndex": 0,
  "endIndex": 20000,
  "remainingCharacters": 103456,
  "content": "<html>...first slice...</html>"
}
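The totalLength and remainingCharacters fields in the response make client-side pagination straightforward: keep requesting slices until nothing remains. A sketch of the slice planning (planSlices is an illustrative helper, not part of the server):

```typescript
// Given the totalLength reported by the first snapshot response and a
// chosen chunk size, compute every (startIndex, endIndex) pair a
// client should request to cover the whole document.
function planSlices(totalLength: number, chunk: number): Array<[number, number]> {
  const slices: Array<[number, number]> = [];
  for (let start = 0; start < totalLength; start += chunk) {
    slices.push([start, Math.min(start + chunk, totalLength)]);
  }
  return slices;
}

console.log(planSlices(123456, 20000));
// The first pair is [0, 20000], matching the example request above;
// the final pair ends at 123456, the reported totalLength.
```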

🧩 Configuration

  • Session Persistence: Always enabled. Logins and cookies are automatically reused across restarts.
  • Timeout: Set navigation and API request timeouts.
  • Environment Variables: Used for Google Search API integration (used by multi_search).

💡 Examples {#examples}

Make a REST API Request
{
  "name": "fetch_api",
  "arguments": {
    "url": "https://api.github.com/repos/nodejs/node",
    "method": "GET",
    "headers": {
      "Accept": "application/vnd.github.v3+json"
    },
    "limit": 10000
  }
}
Download File
{
  "name": "download_file",
  "arguments": {
    "url": "https://example.com/image.jpg",
    "destinationFolder": "downloads"
  }
}

Note: destinationFolder can be either a relative path (resolved against the project root) or an absolute path. The server will create the destination folder if it does not exist.
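That resolution rule can be sketched in a few lines (illustrative only; resolveDestination is not the server's actual function, and the project root is assumed to be whatever directory the server was started from):

```typescript
import * as path from "node:path";

// Documented destinationFolder resolution: absolute paths are used
// as-is; relative paths resolve against the project root.
function resolveDestination(destinationFolder: string, projectRoot: string): string {
  return path.isAbsolute(destinationFolder)
    ? destinationFolder
    : path.resolve(projectRoot, destinationFolder);
}

console.log(resolveDestination("downloads", "/srv/web-curl"));
// "/srv/web-curl/downloads" on POSIX systems
```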

Configure Browser
{
  "name": "browser_configure",
  "arguments": {
    "proxy": "http://proxy.example.com:8080",
    "viewport": { "width": 1920, "height": 1080 }
  }
}

Note: Session persistence is always enabled. Cookies and login sessions are automatically stored in the user_data/ directory.


๐Ÿ› ๏ธ Troubleshooting {#troubleshooting}

  • Timeout Errors: Increase the timeout parameter if requests are timing out.
  • Google Search Fails: Ensure APIKEY_GOOGLE_SEARCH and CX_GOOGLE_SEARCH are set in your environment.
  • Error Logs: Check the logs/error-log.txt file for detailed error messages.

🧠 Tips & Best Practices {#tips--best-practices}

  • For large pages, use maxLength and startIndex to fetch content in slices.
  • Always validate your tool arguments to avoid errors.
  • Secure your API keys and sensitive data using environment variables.
  • Review the MCP tool schemas in src/index.ts for all available options.

๐Ÿค Contributing & Issues {#contributing--issues}

Contributions are welcome! If you want to contribute, fork this repository and submit a pull request.
If you find any issues or have suggestions, please open an issue on the repository page.


📄 License & Attribution {#license--attribution}

This project was developed by Rayss.
For questions, improvements, or contributions, please contact the author or open an issue in the repository.


