# 🔍 Web Analyzer MCP
A powerful MCP (Model Context Protocol) server for intelligent web content analysis and summarization. Built with FastMCP, this server provides smart web scraping, content extraction, and AI-powered question-answering capabilities.
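For orientation, this is roughly what a FastMCP tool registration looks like; a minimal sketch assuming the published `fastmcp` package, not the project's actual `server.py`:

```python
# Minimal sketch of a FastMCP server exposing one tool (illustrative only;
# the real tools live in web_analyzer_mcp/server.py).
from fastmcp import FastMCP

mcp = FastMCP("web-analyzer")

@mcp.tool()
def url_to_markdown(url: str) -> str:
    """Extract and summarize key web page content as markdown."""
    ...  # extraction pipeline (see Architecture below)

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```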
## ✨ Features
### 🎯 Core Tools
- **`url_to_markdown`** - Extract and summarize key web page content
  - Analyzes content importance using custom algorithms
  - Removes ads, navigation, and irrelevant content
  - Keeps only essential information (tables, images, key text)
  - Outputs structured markdown optimized for analysis
- **`web_content_qna`** - AI-powered Q&A about web content
  - Extracts relevant content sections from web pages
  - Uses intelligent chunking and relevance matching
  - Answers questions using OpenAI GPT models
### 🚀 Key Features
- **Smart Content Ranking**: Algorithm-based content importance scoring
- **Essential Content Only**: Removes clutter, keeps what matters
- **Multi-IDE Support**: Works with Claude Desktop, Cursor, VS Code, PyCharm
- **Flexible Models**: Choose from GPT-3.5, GPT-4, GPT-4 Turbo, or GPT-5
## 📦 Installation
### Prerequisites
- uv (Python package manager)
- Chrome/Chromium browser (for Selenium)
- OpenAI API key (for Q&A functionality)
### 🚀 Quick Start with uv (Recommended)
```bash
# Clone the repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Run directly with uv (auto-installs dependencies)
uv run mcp-webanalyzer
```
### Installing via Smithery
To install web-analyzer-mcp for Claude Desktop automatically via Smithery:
```bash
npx -y @smithery/cli install @kimdonghwi94/web-analyzer-mcp --client claude
```
### IDE/Editor Integration
#### Install Claude Desktop
Add to your `claude_desktop_config.json` file. See the Claude Desktop MCP documentation for more details.
```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```
#### Install Claude Code (VS Code Extension)
Add the server using the Claude Code CLI:
```bash
claude mcp add web-analyzer -e OPENAI_API_KEY=your_api_key_here -e OPENAI_MODEL=gpt-4 -- uv --directory /path/to/web-analyzer-mcp run mcp-webanalyzer
```
#### Install Cursor IDE
Add to your Cursor settings (File > Preferences > Settings > Extensions > MCP):
```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```
#### Install JetBrains AI Assistant
See the JetBrains AI Assistant documentation for more details.
- In JetBrains IDEs, go to Settings → Tools → AI Assistant → Model Context Protocol (MCP)
- Click + Add
- Click Command in the top-left corner of the dialog and select the As JSON option from the list
- Add this configuration and click OK:
```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```
## 🎛️ Tool Descriptions
### `url_to_markdown`
Converts web pages to clean markdown format with essential content extraction.

**Parameters:**
- `url` (string): The web page URL to analyze

**Returns:** Clean markdown content with structured data preservation

### `web_content_qna`
Answers questions about web page content using intelligent content analysis.

**Parameters:**
- `url` (string): The web page URL to analyze
- `question` (string): Question about the page content

**Returns:** AI-generated answer based on page content
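A minimal client-side sketch of invoking both tools, assuming FastMCP's `Client` connecting to the server script over stdio; the URL and question are placeholders:

```python
# Hypothetical usage sketch: call both tools from a FastMCP client.
# Assumes the fastmcp package; script path and inputs are placeholders.
import asyncio
from fastmcp import Client

async def main() -> None:
    async with Client("web_analyzer_mcp/server.py") as client:
        markdown = await client.call_tool(
            "url_to_markdown", {"url": "https://example.com"}
        )
        answer = await client.call_tool(
            "web_content_qna",
            {"url": "https://example.com", "question": "What is this page about?"},
        )
        print(markdown, answer, sep="\n\n")

asyncio.run(main())
```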
## 🏗️ Architecture
### Content Extraction Pipeline
- **URL Validation** - Ensures proper URL format
- **HTML Fetching** - Uses Selenium for dynamic content
- **Content Parsing** - BeautifulSoup for HTML processing
- **Element Scoring** - Custom algorithm ranks content importance (see the sketch after this list)
- **Content Filtering** - Removes duplicates and low-value content
- **Markdown Conversion** - Structured output generation
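As a rough illustration of the scoring and filtering steps above, here is a minimal sketch assuming BeautifulSoup; the tag weights and threshold are invented, not the project's actual algorithm (which lives in `web_analyzer_mcp/web_extractor.py`):

```python
# Illustrative content-scoring sketch (not the project's real algorithm).
from bs4 import BeautifulSoup

NOISE_TAGS = ["nav", "aside", "footer", "script", "style", "form"]

def score_element(el) -> float:
    """Crude importance score: favor text-dense elements, tables, images."""
    score = len(el.get_text(strip=True)) / 100.0  # reward text density
    if el.name == "table":
        score += 5.0  # tables count as essential structured data
    if el.find("img") is not None:
        score += 1.0  # keep elements that carry images
    return score

def extract_main_content(html: str, threshold: float = 1.0) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(NOISE_TAGS):
        tag.decompose()  # drop ads/navigation-style clutter outright
    kept = []
    for el in soup.find_all(["p", "table", "h1", "h2", "h3", "ul", "ol"]):
        if score_element(el) >= threshold:
            kept.append(el.get_text(" ", strip=True))
    return kept
```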
### Q&A Processing Pipeline
- **Content Chunking** - Intelligent text segmentation
- **Relevance Scoring** - Matches content to questions
- **Context Selection** - Picks most relevant chunks
- **Answer Generation** - OpenAI GPT integration (sketched below)
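A minimal sketch of the four Q&A steps, assuming the `openai` v1 client; the chunk size, overlap-based relevance, and prompt are invented placeholders, not the code in `rag_processor.py`:

```python
# Illustrative Q&A pipeline sketch (invented helpers, not the real module).
# Assumes OPENAI_API_KEY is set in the environment.
import os
from openai import OpenAI

def chunk_text(text: str, size: int = 1500) -> list[str]:
    """Step 1: naive fixed-size segmentation."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_chunks(chunks: list[str], question: str, k: int = 3) -> list[str]:
    """Steps 2-3: rank chunks by question-word overlap, keep the best k."""
    words = set(question.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(words & set(c.lower().split())),
        reverse=True,
    )[:k]

def answer(page_text: str, question: str) -> str:
    """Step 4: ask an OpenAI GPT model, grounded in the selected chunks."""
    context = "\n---\n".join(top_chunks(chunk_text(page_text), question))
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=os.getenv("OPENAI_MODEL", "gpt-4"),
        messages=[
            {"role": "system", "content": "Answer using only the given context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```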
## 🏗️ Project Structure
```text
web-analyzer-mcp/
├── web_analyzer_mcp/        # Main Python package
│   ├── __init__.py          # Package initialization
│   ├── server.py            # FastMCP server with tools
│   ├── web_extractor.py     # Web content extraction engine
│   └── rag_processor.py     # RAG-based Q&A processor
├── scripts/                 # Build and utility scripts
│   └── build.js             # Node.js build script
├── README.md                # English documentation
├── README.ko.md             # Korean documentation
├── package.json             # npm configuration and scripts
├── pyproject.toml           # Python package configuration
├── .env.example             # Environment variables template
└── dist-info.json           # Build information (generated)
```
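The `.env.example` template presumably holds the same two variables used in the IDE configurations above; a plausible sketch:

```bash
# Plausible .env contents (same variables as the IDE configs above)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4
```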
## 🛠️ Development
### Modern Development with uv
```bash
# Clone repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Development commands
uv run mcp-webanalyzer    # Start development server
uv run python -m pytest   # Run tests
uv run ruff check .       # Lint code
uv run ruff format .      # Format code
uv sync                   # Sync dependencies

# Install development dependencies
uv add --dev pytest ruff mypy

# Create production build
npm run build
```
### Alternative: Traditional Python Development
```bash
# Set up Python environment (if not using uv)
pip install -e ".[dev]"

# Development commands
python -m web_analyzer_mcp.server   # Start server
python -m pytest tests/             # Run tests
python -m ruff check .              # Lint code
python -m ruff format .             # Format code
python -m mypy web_analyzer_mcp/    # Type checking
```
## 🤝 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## 📋 Roadmap
- Support for more content types (PDFs, videos)
- Multi-language content extraction
- Custom extraction rules
- Caching for frequently accessed content
- Webhook support for real-time updates
## ⚠️ Limitations
- Requires Chrome/Chromium for JavaScript-heavy sites
- OpenAI API key needed for Q&A functionality
- Rate limited to prevent abuse
- Some sites may block automated access
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙋‍♂️ Support
- Create an issue for bug reports or feature requests
- Contribute to discussions in the GitHub repository
- Check the documentation for detailed guides
## 🌟 Acknowledgments
- Built with the FastMCP framework
- Inspired by HTMLRAG techniques for web content processing
- Thanks to the MCP community for feedback and contributions
Made with ❤️ for the MCP community