JUHE API Marketplace

Local Documents MCP Server

A Model Context Protocol server that allows AI assistants to discover, load, and process local documents on Windows systems, with support for multiple file formats and OCR capabilities for scanned PDFs.

GitHub Stars: 1
Last Updated: 3/10/2026
MCP Server Configuration

{
  "name": "local-documents",
  "command": "uv",
  "args": [
    "--directory",
    "C:\\Users\\YourUsername\\Documents\\LocalDocs",
    "run",
    "server.py",
    "C:\\Users\\YourUsername\\Documents\\MyDocuments",
    "30000"
  ]
}

README Documentation

📚 Local Documents MCP Server

A Model Context Protocol (MCP) server for interacting with local documents on Windows systems. This server provides tools to list, load, and process documents with support for OCR on scanned PDFs.

✨ Features

  • 📁 Document Discovery: List all documents in a specified directory
  • ⚔ Document Processing: Convert various document formats to markdown
  • 🔍 OCR Support: Extract text from scanned PDFs using Tesseract OCR
  • 🎯 Token Management: Automatic content truncation based on token limits
  • 📄 Multi-format Support: Handle Word docs, PDFs, PowerPoint, Excel, and more
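
The token-based truncation listed above can be sketched as follows. This is a minimal illustration, not the code in utils/max_tokens.py: the function names `truncate_to_tokens` and `tiktoken_codec` and the `cl100k_base` encoding are assumptions. The tokenizer is passed in as a pair of callables so the logic can be tried without loading tiktoken.

```python
from typing import Callable, Sequence


def truncate_to_tokens(
    text: str,
    max_tokens: int,
    encode: Callable[[str], Sequence[int]],
    decode: Callable[[Sequence[int]], str],
) -> tuple[str, int, bool]:
    """Return (content, tokens_used, was_truncated) for a token budget."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text, len(tokens), False
    # Keep only the leading tokens and decode them back to text.
    return decode(tokens[:max_tokens]), max_tokens, True


def tiktoken_codec(encoding_name: str = "cl100k_base"):
    """Build (encode, decode) from tiktoken; the encoding name is an assumption."""
    import tiktoken  # imported lazily so truncate_to_tokens has no hard dependency

    enc = tiktoken.get_encoding(encoding_name)
    return enc.encode, enc.decode
```

With tiktoken installed, `truncate_to_tokens(markdown_text, 30000, *tiktoken_codec())` would apply the same limit as the example configuration. Returning the token count and a truncation flag alongside the content lets the caller report token usage.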

🛠️ Tools Available

  • list_documents: Find documents by path, name, and extension
  • load_documents: Extract document content as markdown
  • load_scanned_document: Extract text from scanned PDFs using OCR
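
To make the idea behind list_documents concrete, here is a rough sketch of discovery by path, name, and extension using only the standard library. It is not the project's utils/path_files.py; the function name `find_documents`, its parameters, the returned keys, and the default extension set are all hypothetical.

```python
from pathlib import Path

# Hypothetical default; the real server may recognise a different set.
DEFAULT_EXTENSIONS = (".docx", ".xlsx", ".pptx", ".pdf")


def find_documents(root, name_contains="", extensions=DEFAULT_EXTENSIONS):
    """Recursively list files under `root`, filtered by name and extension."""
    allowed = {ext.lower() for ext in extensions}
    needle = name_contains.lower()
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in allowed:
            continue
        # Case-insensitive substring match on the file name without extension.
        if needle and needle not in path.stem.lower():
            continue
        matches.append(
            {
                "path": str(path),
                "name": path.stem,
                "extension": path.suffix.lower(),
            }
        )
    return matches
```

For example, `find_documents(r"C:\Users\YourUsername\Documents\MyDocuments", name_contains="invoice", extensions=[".pdf"])` would return only PDF files whose name contains "invoice".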

💻 System Requirements

  • Operating System: Windows 10/11
  • Python: 3.13 or higher
  • Package Manager: uv (recommended)

📋 Prerequisites Installation

1. 🐍 Python 3.13

Download and install Python 3.13 from python.org

2. ⚔ UV Package Manager

Install uv using pip:

pip install uv

3. 📖 Poppler for Windows

Purpose: Required for PDF processing and conversion to images for OCR.

  1. Download the latest Poppler Windows release from: https://github.com/oschwartz10612/poppler-windows/releases/

  2. Extract the ZIP file to:

    D:\Program Files\poppler-24.08.0
    
  3. The Poppler binaries should be located at:

    D:\Program Files\poppler-24.08.0\Library\bin
    

Alternative locations: You can install Poppler in any directory; just update the .env file with the correct path.

4. 👁️ Tesseract OCR

Purpose: Required for extracting text from scanned documents and images.

  1. Download Tesseract for Windows from: https://github.com/UB-Mannheim/tesseract/wiki

  2. Install Tesseract following the installer instructions

  3. Make sure Tesseract is added to your system PATH, or note the installation directory

🚀 Project Installation

1. 📥 Clone or Download the Project

git clone <your-repo-url>
cd LocalDocs

2. 📦 Install Python Dependencies

uv sync

This will install all required dependencies from pyproject.toml:

  • markitdown[docx,pdf,pptx,xls,xlsx]>=0.1.2 - Document conversion
  • mcp[cli]>=1.10.1 - MCP server framework
  • opencv-python>=4.11.0.86 - Image processing
  • pdf2image>=1.17.0 - PDF to image conversion
  • pytesseract>=0.3.13 - Tesseract OCR wrapper
  • python-dotenv>=1.1.1 - Environment variable management
  • tiktoken>=0.9.0 - Token counting

3. ⚙️ Configure Environment Variables

Create or update the .env file in the project root:

POPPLER_PATH="D:\\Program Files\\poppler-24.08.0\\Library\\bin"

Note: Update the path to match your Poppler installation location.
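
Before wiring the server into a client, it can help to confirm that both external tools are reachable. The following is a small sketch under stated assumptions: `check_prerequisites` is a hypothetical helper, not part of the project, and it looks for `pdftoppm` only because pdf2image relies on that Poppler binary.

```python
import shutil
from pathlib import Path


def check_prerequisites(poppler_path, tesseract_cmd="tesseract"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    if not poppler_path:
        problems.append("POPPLER_PATH is not set; add it to the .env file")
    else:
        bin_dir = Path(poppler_path)
        # pdf2image shells out to pdftoppm, so its presence is a useful signal.
        candidates = [bin_dir / "pdftoppm.exe", bin_dir / "pdftoppm"]
        if not bin_dir.is_dir():
            problems.append(f"Poppler directory does not exist: {bin_dir}")
        elif not any(c.is_file() for c in candidates):
            problems.append(f"pdftoppm not found in Poppler directory: {bin_dir}")

    if shutil.which(tesseract_cmd) is None:
        problems.append(f"'{tesseract_cmd}' is not on PATH; install Tesseract or add it")

    return problems
```

A typical call would be `check_prerequisites(os.environ.get("POPPLER_PATH"))` after the .env file has been loaded, for instance with python-dotenv's `load_dotenv()`.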

🔧 Configuration for MCP Clients

🤖 Claude Desktop Configuration

Add the server to your Claude Desktop configuration file (claude_desktop_config.json). The server takes two positional arguments after server.py:

  • First argument: Path to your documents directory

    • Example: "C:\\Users\\YourUsername\\Documents\\MyDocuments"
    • Use double backslashes for Windows paths in JSON
  • Second argument: Maximum tokens per document

    • Example: "30000"
    • Adjust based on your needs and Claude's token limits

📝 Example Configurations

A complete entry for the mcpServers section; adjust both paths to your own project and document locations:

{
  "mcpServers": {
    "local-documents": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\Users\\YourUsername\\Documents\\LocalDocs",
        "run",
        "server.py",
        "C:\\Users\\YourUsername\\Documents\\MyDocuments",
        "30000"
      ]
    }
  }
}
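
Hand-escaping Windows paths in JSON is easy to get wrong. As an optional aid, the entry above can be generated from Python so that the backslashes are escaped automatically. This is only a sketch; `build_server_entry` is a hypothetical helper name, and the paths are the placeholders from the example.

```python
import json


def build_server_entry(project_dir, documents_dir, max_tokens=30000):
    """Build the "local-documents" entry for the mcpServers section."""
    return {
        "command": "uv",
        "args": [
            "--directory",
            str(project_dir),
            "run",
            "server.py",
            str(documents_dir),
            str(max_tokens),  # the example passes the limit as a string
        ],
    }


config = {
    "mcpServers": {
        "local-documents": build_server_entry(
            r"C:\Users\YourUsername\Documents\LocalDocs",
            r"C:\Users\YourUsername\Documents\MyDocuments",
        )
    }
}

# json.dumps doubles each backslash, matching the hand-written example.
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the existing configuration file rather than replacing it, since other servers may already be registered under mcpServers.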

🎯 Usage

🚀 Starting the Server

Claude Desktop starts the server automatically on launch, using the configured settings.

🔄 Available Operations

  1. 📋 List Documents: Discover all documents in your configured directory
  2. 📄 Load Standard Documents: Process Word docs, PDFs, PowerPoint, Excel files
  3. 🔍 Load Scanned Documents: Use OCR to extract text from scanned PDFs

📊 Response Format

The server returns structured responses with:

  • Document paths and metadata
  • Token usage information
  • Processing time (for OCR operations)
  • Extracted content in markdown format

🛠️ Troubleshooting

⚠️ Common Issues

  1. 🔍 Poppler not found

    • Verify Poppler installation path
    • Check .env file configuration
    • Ensure path uses double backslashes in Windows
  2. 👁️ Tesseract not found

    • Verify Tesseract installation
    • Add Tesseract to system PATH
    • Restart command prompt/PowerShell
  3. 🔐 Permission denied errors

    • Ensure the document directory is accessible
    • Check file permissions
    • Run as administrator if necessary
  4. ❌ Import errors

    • Verify all dependencies are installed: uv sync
    • Check Python version: python --version
    • Ensure you're using Python 3.13 or higher
  5. ⏳ Large document processing

    • Reduce token limit for better performance
    • Consider splitting large documents
    • Monitor memory usage during OCR operations
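
One way to keep OCR memory in check is to rasterise and recognise a few pages at a time instead of the whole PDF at once. The sketch below assumes the pdf2image and pytesseract packages from the dependency list; it is not the project's utils/ocr.py. The function names, the batch size, the DPI, and the `eng` language code are illustrative choices.

```python
def page_batches(total_pages, batch_size=5):
    """Yield inclusive, 1-based (first_page, last_page) ranges."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def ocr_pdf_in_batches(pdf_path, poppler_path=None, batch_size=5, dpi=200, lang="eng"):
    """OCR a scanned PDF a few pages at a time to bound peak memory."""
    # Imported lazily so page_batches can be used without the OCR stack.
    import pytesseract
    from pdf2image import convert_from_path, pdfinfo_from_path

    total_pages = pdfinfo_from_path(pdf_path, poppler_path=poppler_path)["Pages"]
    page_texts = []
    for first, last in page_batches(total_pages, batch_size):
        # Only the pages of the current batch are held in memory as images.
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last, poppler_path=poppler_path
        )
        page_texts.extend(pytesseract.image_to_string(img, lang=lang) for img in images)
    return "\n\n".join(page_texts)
```

Smaller batches and a lower DPI reduce memory use at the cost of speed and, for the DPI, recognition quality.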

🐛 Debug Information

To get more detailed error information, check the Claude Desktop logs or run the server manually in a PowerShell window.
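
Because the server talks over stdio, running it by hand mostly shows startup errors. A quick way to confirm that it also answers requests is a small client using the `mcp` Python package, sketched below. `server_command` and `list_server_tools` are hypothetical helper names, and the command line simply mirrors the configuration example.

```python
import asyncio


def server_command(project_dir, documents_dir, max_tokens=30000):
    """Mirror the command line from the configuration example."""
    args = [
        "--directory", str(project_dir),
        "run", "server.py",
        str(documents_dir), str(max_tokens),
    ]
    return "uv", args


async def list_server_tools(project_dir, documents_dir, max_tokens=30000):
    """Start the server over stdio and return the tool names it advertises."""
    # Imported lazily: requires the `mcp` package from the project's dependencies.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    command, args = server_command(project_dir, documents_dir, max_tokens)
    params = StdioServerParameters(command=command, args=args)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]


# Example (paths are placeholders):
# print(asyncio.run(list_server_tools(r"C:\path\to\LocalDocs", r"C:\path\to\MyDocuments")))
```

If the server starts correctly, the returned list should contain the three tools described earlier; an exception here usually points at the same problem Claude Desktop would hit.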

📁 File Structure

LocalDocs/
├── server.py              # Main MCP server
├── pyproject.toml         # Project dependencies
├── .env                   # Environment configuration
├── README.md              # This documentation
├── src/
│   └── instructions.md    # Assistant instructions
└── utils/
    ├── __init__.py
    ├── markitdown.py      # Document conversion
    ├── max_tokens.py      # Token management
    ├── ocr.py             # OCR processing
    ├── path_files.py      # File discovery
    └── prompts.py         # Instruction loading

📄 Supported Document Formats

  • 📊 Microsoft Office: .docx, .xlsx, .pptx
  • 📖 PDF: Regular PDFs and scanned PDFs (via OCR)

⚔ Performance Considerations

  • 🔍 OCR Processing: Scanned documents take significantly longer to process
  • 🎯 Token Limits: Adjust based on your document sizes and Claude's context window
  • 💾 Memory Usage: Large documents and OCR operations can be memory-intensive

🤝 Contributing

When contributing to this project:

  1. Ensure compatibility with Windows and Python 3.13
  2. Test with various document formats
  3. Verify OCR functionality with scanned documents
  4. Update documentation for any new features

📚 Related Documentation

  • MCP Documentation
  • Claude Desktop MCP Guide
  • PDF2Image
  • Poppler PDF Processing
  • Tesseract OCR
  • MarkItDown

🗺️ Roadmap and Future Enhancements

🔮 Planned Features

  • 🧠 Vector Storage and RAG Integration: Future versions will include vector-based document storage to:

    • Reduce token consumption by avoiding repeated text extraction
    • Enable semantic search across document collections
    • Provide more efficient document retrieval and chunking
    • Support persistent document indexing
  • 🔍 Enhanced OCR Validation: OCR for scanned books has not yet been fully validated and may encounter issues with:

    • Complex layouts and formatting
    • Multi-column documents
    • Poor quality scans
    • Non-standard fonts or languages

💡 Current Recommendations

🚀 For Large Context Models

  • 🤖 Gemini Models: With 1M+ token context windows, you can process very long documents without truncation
  • 🎯 Token Management: Current implementation supports up to 128K tokens by default, but can be adjusted for larger context models
  • 📖 Document Processing: Consider using higher token limits (e.g., 500K-1M) when working with:

    • Complete books or long reports
    • Multiple related documents
    • Comprehensive document analysis

⚠️ Limitations to Consider

  • 🔍 OCR Reliability: Scanned document processing is experimental and may require manual validation
  • ⏳ Processing Time: Large documents and OCR operations can be time-intensive
  • 💾 Memory Usage: High-resolution scanned documents may require significant system resources
