
PDFtotext MCP Server

A reliable server for extracting text from PDF documents using the poppler-utils' pdftotext utility, compatible with any Model Context Protocol client.

GitHub Stars: 0 | Last Updated: 11/23/2025


A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.


🚀 Why This Server?

Unlike some other PDF MCP servers, which can suffer from log output interfering with the protocol stream, complex dependencies, and reliability issues, pdftotext-mcp is:

  • ✅ Actually works - Clean JSON-RPC communication with no stray output polluting stdout
  • ✅ Reliable - Built on the mature, widely used pdftotext utility from poppler-utils
  • ✅ Lightweight - Minimal dependencies, maximum compatibility
  • ✅ Production tested - Successfully tested with Claude Desktop and other MCP clients
  • ✅ Feature complete - Page-specific extraction, layout preservation, encoding options
  • ✅ Error handling - Comprehensive validation and helpful error messages

📋 Features

  • 📄 Extract text from entire PDF documents or specific pages
  • 🎨 Preserve original layout formatting (optional)
  • 🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)
  • 📊 Comprehensive metadata in responses (word count, file info, etc.)
  • 🛡️ File validation and security checks
  • ⚡ Fast processing with configurable timeouts
  • 🔍 Detailed error reporting with troubleshooting hints

🔧 Prerequisites

You must have pdftotext installed on your system:

Ubuntu/Debian

sudo apt update
sudo apt install poppler-utils

macOS

brew install poppler

Windows

# Using Chocolatey
choco install poppler

# Using Scoop
scoop install poppler

Verify Installation

pdftotext -v
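If you script your setup, you can run the same check from code before launching the server. The sketch below is a minimal example in Python using only the standard library; the helper name pdftotext_version is hypothetical and not part of pdftotext-mcp. poppler's pdftotext prints its version banner on stderr, so the sketch reads stderr and falls back to stdout.

```python
import shutil
import subprocess
from typing import Optional


def pdftotext_version() -> Optional[str]:
    """Return the first line of `pdftotext -v`, or None if pdftotext is not on PATH."""
    exe = shutil.which("pdftotext")
    if exe is None:
        return None
    # The exit status is ignored; only the printed banner is used.
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # poppler writes the version banner to stderr; fall back to stdout just in case.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(pdftotext_version() or "pdftotext not found: install poppler (see above)")
```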

📦 Installation

Option 1: Global Installation (Recommended)

npm install -g pdftotext-mcp

Option 2: Use with npx (No Installation)

npx pdftotext-mcp

Option 3: Local Development

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start

โš™๏ธ Configuration

Add to your MCP client configuration:

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}

Or with npx:

{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}
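If claude_desktop_config.json already lists other servers, add the entry rather than replacing the file. A small Python sketch of that merge follows; add_pdftotext_server is a hypothetical helper, and locating the config file is left out because its path depends on your operating system.

```python
import json


def add_pdftotext_server(config_text: str, use_npx: bool = False) -> str:
    """Return config JSON with a "pdftotext" entry added under "mcpServers".

    Existing servers and other top-level keys are preserved.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    if use_npx:
        servers["pdftotext"] = {"command": "npx", "args": ["pdftotext-mcp"]}
    else:
        servers["pdftotext"] = {"command": "pdftotext-mcp"}
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"command": "other-mcp"}}}'
print(add_pdftotext_server(existing, use_npx=True))
```

Writing the result back to the file, and restarting the client afterwards, is still up to you.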

Other MCP Clients

The server works with any MCP-compatible client. Use pdftotext-mcp as the command.

🎯 Usage

The server provides a single, powerful tool: read_pdf_text

Basic Usage

Extract entire document

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf"
  }
}

Extract specific page

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "page": 2
  }
}

Preserve layout formatting

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "layout": true
  }
}

Custom encoding

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "encoding": "Latin1"
  }
}
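The examples above show the tool name and arguments in simplified form. On the wire, an MCP client sends them as a JSON-RPC 2.0 tools/call request whose params carry name and arguments; with the stdio transport each message is written as a single line of JSON. The Python sketch below builds such a request. The request id is arbitrary, read_pdf_text_request is a hypothetical name, and in practice your MCP client constructs this message for you.

```python
import json
from typing import Any, Dict, Optional


def read_pdf_text_request(request_id: int, path: str, page: Optional[int] = None,
                          layout: bool = False, encoding: Optional[str] = None) -> str:
    """Serialize an MCP tools/call request for read_pdf_text as one line of JSON."""
    arguments: Dict[str, Any] = {"path": path}
    # Only include optional arguments when they differ from the documented defaults.
    if page is not None:
        arguments["page"] = page
    if layout:
        arguments["layout"] = True
    if encoding is not None:
        arguments["encoding"] = encoding
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_pdf_text", "arguments": arguments},
    }
    # json.dumps without indent emits no newlines, matching line-delimited stdio framing.
    return json.dumps(message)


print(read_pdf_text_request(1, "./document.pdf", page=2))
```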

Response Format

Success Response

{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}

Error Response

{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
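Because both shapes share the success flag, client code can branch on it. The sketch below assumes you already have the response object as JSON text (how it is wrapped inside the MCP tool result is not covered here); summarize_result is a hypothetical name.

```python
import json


def summarize_result(payload: str) -> str:
    """Turn a read_pdf_text response (JSON text) into a one-line summary."""
    data = json.loads(payload)
    if data.get("success"):
        return (f"{data['file']}: {data['wordCount']} words, "
                f"{data['textLength']} characters (pages: {data['pageSpecific']})")
    # Fall back to UNKNOWN_ERROR if a response lacks an errorType.
    return f"{data.get('errorType', 'UNKNOWN_ERROR')}: {data.get('error', 'no message')}"


ok = json.dumps({"success": True, "file": "document.pdf", "pageSpecific": "all",
                 "textLength": 5234, "wordCount": 892})
err = json.dumps({"success": False, "error": "File not found: ./nonexistent.pdf",
                  "errorType": "FILE_NOT_FOUND"})
print(summarize_result(ok))   # document.pdf: 892 words, 5234 characters (pages: all)
print(summarize_result(err))  # FILE_NOT_FOUND: File not found: ./nonexistent.pdf
```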

📚 API Reference

Tool: read_pdf_text

Extracts text content from PDF files using pdftotext.

Parameters

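| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| path | string | ✅ | - | Path to PDF file (relative or absolute) |
| page | number | ❌ | all pages | Specific page to extract (1-based) |
| layout | boolean | ❌ | false | Preserve original text layout |
| encoding | string | ❌ | "UTF-8" | Output text encoding |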
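Before calling the tool, a client can check its arguments against the parameter table. This Python sketch mirrors only the documented types and defaults; validate_arguments is hypothetical, and it is deliberately stricter than JSON's number type by requiring a whole-number page.

```python
from typing import Any, Dict, List

# Encodings listed under "Supported Encodings" in this README.
SUPPORTED_ENCODINGS = {"UTF-8", "Latin1", "ASCII"}


def validate_arguments(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems with read_pdf_text arguments (empty if valid)."""
    problems: List[str] = []
    path = args.get("path")
    if not isinstance(path, str) or not path:
        problems.append("path is required and must be a non-empty string")
    page = args.get("page")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if page is not None and (isinstance(page, bool) or not isinstance(page, int) or page < 1):
        problems.append("page must be a whole number >= 1 (pages are 1-based)")
    if not isinstance(args.get("layout", False), bool):
        problems.append("layout must be true or false")
    encoding = args.get("encoding", "UTF-8")
    if not isinstance(encoding, str) or encoding not in SUPPORTED_ENCODINGS:
        problems.append("encoding must be one of: " + ", ".join(sorted(SUPPORTED_ENCODINGS)))
    return problems


print(validate_arguments({"path": "./document.pdf", "page": 0}))
```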

Supported Encodings

  • UTF-8 (default)
  • Latin1
  • ASCII
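This README does not document how these arguments reach pdftotext. One plausible mapping onto pdftotext's own options is sketched below in Python: -f and -l select the first and last page, -layout keeps the physical layout, -enc sets the output encoding, and an output file name of - sends the text to stdout. This is an assumption about the server's internals, not its actual source, and build_pdftotext_argv is hypothetical. Running pdftotext -listenc on your machine shows which encoding names your poppler build accepts.

```python
from typing import List, Optional


def build_pdftotext_argv(path: str, page: Optional[int] = None,
                         layout: bool = False, encoding: str = "UTF-8") -> List[str]:
    """Build a pdftotext command line for the given read_pdf_text arguments."""
    argv = ["pdftotext"]
    if page is not None:
        # Setting -f (first page) and -l (last page) to the same value extracts one page.
        argv += ["-f", str(page), "-l", str(page)]
    if layout:
        argv.append("-layout")  # keep the original physical layout of the text
    argv += ["-enc", encoding]
    # An output file name of "-" makes pdftotext write the text to stdout.
    argv += [path, "-"]
    return argv


print(build_pdftotext_argv("./document.pdf", page=2, layout=True))
```

Passing the resulting list straight to subprocess.run, without a shell, avoids quoting problems with file names that contain spaces.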

Error Types

  • FILE_NOT_FOUND - PDF file doesn't exist
  • PERMISSION_DENIED - Cannot read the file
  • INVALID_PDF - File is not a valid PDF
  • PDFTOTEXT_ERROR - pdftotext utility error
  • UNKNOWN_ERROR - Unexpected error
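Because errorType is a fixed set of strings, a client can map it straight to the fixes listed under Troubleshooting. A minimal Python sketch follows; the hint wording is condensed from this README and hint_for is a hypothetical name.

```python
# Hints condensed from the Troubleshooting section of this README.
HINTS = {
    "FILE_NOT_FOUND": "Use an absolute path and check the server's working directory.",
    "PERMISSION_DENIED": "Make the file and its directory readable (e.g. chmod 644 / 755).",
    "INVALID_PDF": "Confirm the file is really a PDF (file document.pdf) and not corrupted.",
    "PDFTOTEXT_ERROR": "Check that poppler is installed and that pdftotext -v runs.",
}


def hint_for(error_type: str) -> str:
    """Return a troubleshooting hint for a documented errorType value."""
    # UNKNOWN_ERROR and any unrecognized value share the generic fallback.
    return HINTS.get(error_type, "Unexpected error: check the MCP client logs for details.")


print(hint_for("FILE_NOT_FOUND"))
```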

🔧 Troubleshooting

"pdftotext is not available"

Solution: Install poppler (the package is poppler-utils on Ubuntu/Debian; see Prerequisites)

"File not found"

Solutions:

  • Use absolute paths: /home/user/document.pdf
  • Check file exists: ls -la /path/to/file.pdf
  • Verify MCP server working directory
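A relative path such as ./document.pdf is most likely resolved against the server process's working directory, which your MCP client chooses and which may not be the folder you expect; the path field in a success response shows the absolute path that was used. The Python sketch below reproduces ordinary working-directory resolution so you can predict where the server will look; that resolution rule is an assumption, and resolve_pdf_path is hypothetical.

```python
import os


def resolve_pdf_path(path: str, server_cwd: str) -> str:
    """Resolve `path` against the server's working directory."""
    # os.path.join discards server_cwd when `path` is already absolute;
    # normpath then collapses "." and ".." segments.
    return os.path.normpath(os.path.join(server_cwd, path))


print(resolve_pdf_path("./document.pdf", "/home/user"))  # /home/user/document.pdf (POSIX)
```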

"Permission denied"

Solutions:

  • Check file permissions: chmod 644 document.pdf
  • Ensure directory is readable: chmod 755 /path/to/directory/

"File is not a valid PDF"

Solutions:

  • Verify file is actually a PDF: file document.pdf
  • Check for file corruption
  • Try with a different PDF file
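As a quick complement to the file command, you can check the header yourself: PDF files normally begin with the bytes %PDF-. The Python sketch below checks only that. It is a heuristic: it cannot detect damage later in the file, and some readers tolerate a few junk bytes before the header that this check would reject. has_pdf_header and looks_like_pdf are hypothetical helpers.

```python
PDF_SIGNATURE = b"%PDF-"


def has_pdf_header(data: bytes) -> bool:
    """Return True if `data` begins with the PDF header signature."""
    return data.startswith(PDF_SIGNATURE)


def looks_like_pdf(path: str) -> bool:
    """Read just enough of the file to check its header."""
    with open(path, "rb") as f:
        return has_pdf_header(f.read(len(PDF_SIGNATURE)))


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "looks like a PDF" if looks_like_pdf(name) else "is NOT a PDF")
```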

MCP Connection Issues

Solutions:

  • Restart your MCP client completely
  • Check configuration syntax in config file
  • Verify pdftotext-mcp is accessible in PATH
  • Check MCP client logs for detailed errors
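The configuration and PATH checks above can be automated. The Python sketch below parses the config text, looks for a pdftotext entry under mcpServers, and checks that its command resolves on PATH. check_client_config is hypothetical; also note that a desktop client may be launched with a different PATH than your terminal, so a command found here can still be missing for the client.

```python
import json
import shutil
from typing import List


def check_client_config(config_text: str) -> List[str]:
    """Return problems found in an MCP client config for the pdftotext server."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        # Trailing commas and comments are common culprits: JSON allows neither.
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or "pdftotext" not in servers:
        return ['no "pdftotext" entry under "mcpServers"']
    entry = servers["pdftotext"]
    command = entry.get("command") if isinstance(entry, dict) else None
    if not isinstance(command, str) or not command:
        return ['the "pdftotext" entry has no "command"']
    if shutil.which(command) is None:
        return [f"command {command!r} was not found on this PATH"]
    return []


print(check_client_config('{"mcpServers": {"pdftotext": {"command": "pdftotext-mcp"}}}'))
```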

🧪 Testing

# Run tests
npm test

# Run tests with watch mode
npm run test:watch

# Run linter
npm run lint

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install

Running Locally

npm start

Code Style

This project uses ESLint. Run npm run lint to check code style.

📄 License

MIT - see LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Built for the Model Context Protocol ecosystem
  • Uses poppler-utils pdftotext utility
  • Inspired by the need for reliable PDF processing in MCP environments

🔗 Related

  • Model Context Protocol Documentation
  • Claude Desktop MCP Configuration
  • Poppler Utils Documentation

Made for the MCP community
