JUHE API Marketplace
thanharmstrong86
MCP Server

MCP PDF to Markdown Converter

A multi-server system that converts PDF documents to Markdown format using FastMCP architecture with upload and convert servers orchestrated by a reactive client agent.

GitHub Stars: 0
Last Updated: 11/17/2025

README Documentation

MCP PDF to Markdown Converter and Crawler 📄➡️📝

This project provides a system for converting PDF documents to Markdown and crawling web content, built on a Model Context Protocol (MCP) architecture. It comprises two main modules, convert_pdf for PDF upload and conversion and crawl_mcp for web crawling, along with a client application that orchestrates operations using a reactive agent.

Project Structure

The core components of this project are:

  • convert_pdf: A FastMCP server (running on http://127.0.0.1:8001) responsible for handling PDF file uploads and converting them to Markdown. It includes two endpoints:
    • /upload/mcp/upload_pdf_tool: Handles PDF file uploads via multipart form data.
    • /mcp: Converts uploaded PDFs to Markdown using the convert_pdf_to_markdown_tool.
  • crawl_mcp: A server module for crawling web content. For details on running this module, see src/crawl_mcp/README.md.
  • client: A client application that acts as an intelligent agent. It uses LangChain and LangGraph to interact with the MCP servers, upload PDFs, and trigger conversions or crawling tasks.
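
To make the upload endpoint concrete, here is a minimal sketch that builds a multipart form-data request for it using only the Python standard library. The URL is taken from the description above; the form field name "file" and the helper name build_upload_request are assumptions for illustration and may not match what upload_api.py expects.

```python
import uuid
import urllib.request

# URL from the README; the form field name "file" is an assumption
# and may differ from what the server actually expects.
UPLOAD_URL = "http://127.0.0.1:8001/upload/mcp/upload_pdf_tool"


def build_upload_request(pdf_bytes: bytes, filename: str,
                         url: str = UPLOAD_URL,
                         field_name: str = "file") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one PDF."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the convert_pdf server running, the request could be sent with urllib.request.urlopen(build_upload_request(data, "sample.pdf")), where data holds the bytes of the PDF. Building the request separately from sending it makes the payload easy to inspect first.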

Getting Started 🚀

Follow these steps to set up and run the project:

1. Prerequisites

  • Python 3.9+
  • uv: A fast Python package installer and resolver. Install it via pip if not already present:
    pip install uv
    

2. Project Setup

  1. Clone the repository (if applicable) or navigate to your project root.

    cd /path/to/your/MCP
    
  2. Create and Sync Virtual Environment: uv will create a .venv directory and install all necessary dependencies based on your pyproject.toml.

    uv sync
    
  3. Activate the Virtual Environment: This ensures all commands run within your isolated environment.

    • macOS/Linux:
      source .venv/bin/activate
      
    • Windows (Command Prompt):
      .venv\Scripts\activate.bat
      
    • Windows (PowerShell):
      .venv\Scripts\Activate.ps1
      
  4. Create .env file: Create a file named .env in the project root (MCP/) and add your Google Gemini API key:

    GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_HERE"
    

    Replace "YOUR_GEMINI_API_KEY_HERE" with your actual API key.
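
The README does not say how the servers load this file, so the following is only a sketch of a pre-flight check that the key is present and no longer the placeholder. The helper names parse_env_text and get_gemini_key are hypothetical, and the parser handles only simple KEY=VALUE lines.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(env_text: str) -> str:
    """Return GEMINI_API_KEY_2, failing early if missing or still a placeholder."""
    key = parse_env_text(env_text).get("GEMINI_API_KEY_2", "")
    if not key or key == "YOUR_GEMINI_API_KEY_HERE":
        raise ValueError("GEMINI_API_KEY_2 is not set in .env")
    return key
```

A typical call would pass the file contents, for example get_gemini_key(open(".env").read()) from the project root.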

3. Running the Modules

Each module has its own setup and running instructions. Refer to the module-specific READMEs for details:

  • Convert PDF Module: See src/convert_pdf/README.md for instructions on running the convert_pdf server.
  • Crawl MCP Module: See src/crawl_mcp/README.md for instructions on running the crawl_mcp server.

4. Docker

The convert_pdf module can be run using Docker Compose with a single service:

  • Service: mcp-convert-server (port 8001)
  • Functionality: Handles PDF uploads and conversion to Markdown.

To run:

cd src/convert_pdf
docker-compose up --build -d

For crawl_mcp Docker instructions, refer to src/crawl_mcp/README.md.
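
Because the convert_pdf container is started in detached mode, it can help to confirm that port 8001 is accepting connections before running a client. The sketch below is one way to do that with the standard library; is_port_open is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that the MCP server is healthy.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8001,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling is_port_open() with the defaults checks the mcp-convert-server port listed above.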

5. Testing with Client

To test the modules, use the client application located in src/client/. Ensure the relevant servers are running, then execute:

uv run python src/client/<script_name>.py

For example, to test the convert_pdf module, ensure a PDF file (e.g., input/sample.pdf) exists in the project's input directory and run:

uv run python src/client/test_client.py

For testing crawl_mcp, refer to its README for specific client instructions.
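
For orientation, MCP tool invocations are JSON-RPC 2.0 messages with the method tools/call. The sketch below builds such a message for convert_pdf_to_markdown_tool. The argument name file_name is a hypothetical placeholder, since the tool's actual parameters are not listed here, and a real session also needs the MCP initialization handshake, which client libraries normally handle.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "file_name" is a hypothetical argument name; check the tool's schema
# (for example via a tools/list request) for the real parameters.
payload = build_tool_call("convert_pdf_to_markdown_tool",
                          {"file_name": "sample.pdf"})
```

This only shows the shape of the message a client sends to the /mcp endpoint; it does not replace the client in src/client/.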

6. Directory Structure

MCP/
├── src/
│   ├── convert_pdf/
│   │   ├── README.md
│   │   ├── src/
│   │   │   ├── __init__.py
│   │   │   ├── convert_mcp.py
│   │   │   ├── pdf2md.py
│   │   │   └── upload_api.py
│   │   ├── uploaded/
│   │   ├── output/
│   │   ├── processed_files.json
│   │   ├── docker-compose.yml
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   └── uv.lock
│   ├── crawl_mcp/
│   │   ├── README.md
│   │   └── (other module files)
│   └── client/
│       ├── test_client.py
│       └── (other client scripts)
├── .env
└── README.md

Notes

  • Ensure the .env file is correctly configured with your API key.
  • The convert_pdf module handles both upload and conversion on port 8001, consolidating functionality for efficiency.
  • For detailed module configurations, refer to the respective READMEs.
  • If you encounter issues (e.g., ClientDisconnect or import errors), check the container logs from src/convert_pdf with:
    docker-compose logs mcp-convert-server
    
