MCP PDF to Markdown Converter and Crawler
This project converts PDF documents to Markdown and crawls web content using a Model Context Protocol (MCP) architecture. It comprises two main modules, convert_pdf for PDF upload and conversion and crawl_mcp for web crawling, along with a client application that orchestrates operations using a reactive agent.
Project Structure
The core components of this project are:
- convert_pdf: A FastMCP server (running on http://127.0.0.1:8001) responsible for handling PDF file uploads and converting them to Markdown. It includes two endpoints:
  - /upload/mcp/upload_pdf_tool: Handles PDF file uploads via multipart form data.
  - /mcp: Converts uploaded PDFs to Markdown using the convert_pdf_to_markdown_tool.
- crawl_mcp: A server module for crawling web content. For details on running this module, see src/crawl_mcp/README.md.
- client: A client application that acts as an intelligent agent. It uses LangChain and LangGraph to interact with the MCP servers, upload PDFs, and trigger conversions or crawling tasks.
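The endpoint layout of the convert_pdf server can be written down as a small Python sketch. The base URL and the two paths come from the list above; the constant names and the endpoint_urls helper are hypothetical and only illustrate how a client script might assemble the addresses.

```python
from urllib.parse import urljoin

# Base URL and paths as listed for the convert_pdf server.
CONVERT_PDF_BASE_URL = "http://127.0.0.1:8001"
UPLOAD_PATH = "/upload/mcp/upload_pdf_tool"
MCP_PATH = "/mcp"


def endpoint_urls(base_url: str = CONVERT_PDF_BASE_URL) -> dict[str, str]:
    """Return the full URLs of the two convert_pdf endpoints (hypothetical helper)."""
    return {
        "upload": urljoin(base_url, UPLOAD_PATH),
        "convert": urljoin(base_url, MCP_PATH),
    }


if __name__ == "__main__":
    for name, url in endpoint_urls().items():
        print(f"{name}: {url}")
```

Keeping the base URL in one place makes it easier to point a client at a different host or port, for example when the server runs in Docker on another machine.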
Getting Started
Follow these steps to set up and run the project:
1. Prerequisites
- Python 3.9+
- uv: A fast Python package installer and resolver. Install it via pip if not already present:
      pip install uv
2. Project Setup
- Clone the repository (if applicable) or navigate to your project root:
      cd /path/to/your/MCP
- Create and Sync Virtual Environment: uv will create a .venv directory and install all necessary dependencies based on your pyproject.toml.
      uv sync
- Activate the Virtual Environment: This ensures all commands run within your isolated environment.
  - macOS/Linux:
        source .venv/bin/activate
  - Windows (Command Prompt):
        .venv\Scripts\activate.bat
  - Windows (PowerShell):
        .venv\Scripts\Activate.ps1
- Create .env file: Create a file named .env in the project root (MCP/) and add your Google Gemini API key:
      GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_HERE"
  Replace "YOUR_GEMINI_API_KEY_HERE" with your actual API key.
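To confirm that the key from the last setup step is in place before starting anything, a check like the following may help. It is a minimal sketch using only the standard library. How the project's own code loads the key is not shown in this README, so the load_env_file and get_gemini_key helpers are hypothetical; only the variable name GEMINI_API_KEY_2 and the placeholder value come from the setup steps above.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in KEY="value".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_gemini_key(path: str = ".env") -> str:
    """Return GEMINI_API_KEY_2 from the process environment or the .env file."""
    key = os.environ.get("GEMINI_API_KEY_2") or load_env_file(path).get("GEMINI_API_KEY_2", "")
    if not key or key == "YOUR_GEMINI_API_KEY_HERE":
        raise RuntimeError("GEMINI_API_KEY_2 is missing or still set to the placeholder")
    return key
```

Calling get_gemini_key() from the project root raises an error if the key is absent or was never replaced, which is usually easier to diagnose than an authentication failure later on.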
3. Running the Modules
Each module has its own setup and running instructions. Refer to the module-specific READMEs for details:
- Convert PDF Module: See src/convert_pdf/README.md for instructions on running the convert_pdf server.
- Crawl MCP Module: See src/crawl_mcp/README.md for instructions on running the crawl_mcp server.
4. Docker
The convert_pdf module can be run using Docker Compose with a single service:
- Service: mcp-convert-server (port 8001)
- Functionality: Handles PDF uploads and conversion to Markdown.
To run:
cd src/convert_pdf
docker-compose up --build -d
For crawl_mcp Docker instructions, refer to src/crawl_mcp/README.md.
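Because the container starts in detached mode, the server may need a moment before it accepts connections. The sketch below polls the port until a TCP connection succeeds. The wait_for_port helper and its default timeout are assumptions for illustration; only the host and port 8001 come from this README.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8001, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

Calling wait_for_port() after docker-compose up returns True once the port is open. This confirms only that a process is listening, not that the MCP endpoints respond correctly.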
5. Testing with Client
To test the modules, use the client application located in src/client/. Ensure the relevant servers are running, then run the client script you need (the * below stands for the script's file name):
uv run python src/client/*
For example, to test the convert_pdf module, ensure a PDF file (e.g., input/sample.pdf) exists in the project's input directory and run:
uv run python src/client/test_client.py
For testing crawl_mcp, refer to its README for specific client instructions.
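Before running the convert_pdf test, it can be worth checking that the sample input is really a PDF. The following is a minimal sketch under stated assumptions: the looks_like_pdf helper is hypothetical and not part of the client, and it inspects only the file header, not the full document structure.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF header bytes."""
    pdf_path = Path(path)
    if not pdf_path.is_file():
        return False
    with pdf_path.open("rb") as handle:
        # PDF files begin with the marker %PDF- followed by a version number.
        return handle.read(5) == b"%PDF-"
```

For instance, looks_like_pdf("input/sample.pdf") returning False points to a missing or mislabelled file, so a failed client run is not a server-side problem.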
6. Directory Structure
MCP/
├── src/
│   ├── convert_pdf/
│   │   ├── README.md
│   │   ├── src/
│   │   │   ├── __init__.py
│   │   │   ├── convert_mcp.py
│   │   │   ├── pdf2md.py
│   │   │   └── upload_api.py
│   │   ├── uploaded/
│   │   ├── output/
│   │   ├── processed_files.json
│   │   ├── docker-compose.yml
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   └── uv.lock
│   ├── crawl_mcp/
│   │   ├── README.md
│   │   └── (other module files)
│   └── client/
│       ├── test_client.py
│       └── (other client scripts)
├── .env
└── README.md
Notes
- Ensure the .env file is correctly configured with your API key.
- The convert_pdf module handles both upload and conversion on port 8001, consolidating functionality for efficiency.
- For detailed module configurations, refer to the respective READMEs.
- If encountering issues (e.g., ClientDisconnect or import errors), check logs with:
      docker-compose logs mcp-convert-server