RAG Anything MCP Server
An MCP (Model Context Protocol) server that provides comprehensive RAG (Retrieval-Augmented Generation) capabilities for processing and querying directories of documents using the raganything library, with full multimodal support.
Features
- End-to-End Document Processing: Complete document parsing with multimodal content extraction
- Multimodal RAG: Support for images, tables, equations, and text processing
- Batch Processing: Process entire directories with multiple file types
- Advanced Querying: Both pure text and multimodal-enhanced queries
- Multiple Query Modes: hybrid, local, global, naive, mix, and bypass modes
- Vision Processing: Advanced image analysis using GPT-4o's vision capabilities
- Persistent Storage: RAG instances maintained per directory for efficient querying
Available Tools
process_directory
Process all files in a directory for comprehensive RAG indexing with multimodal support.
Required Parameters:
- directory_path: Path to the directory containing files to process
- api_key: OpenAI API key for LLM and embedding functions

Optional Parameters:
- working_dir: Custom working directory for RAG storage
- base_url: OpenAI API base URL (for custom endpoints)
- file_extensions: List of file extensions to process (default: ['.pdf', '.docx', '.pptx', '.txt', '.md'])
- recursive: Process subdirectories (default: True)
- enable_image_processing: Enable image analysis (default: True)
- enable_table_processing: Enable table extraction (default: True)
- enable_equation_processing: Enable equation processing (default: True)
- max_workers: Concurrent processing workers (default: 4)
process_single_document
Process a single document with full multimodal analysis.
Required Parameters:
- file_path: Path to the document to process
- api_key: OpenAI API key

Optional Parameters:
- working_dir: Custom working directory for RAG storage
- base_url: OpenAI API base URL
- output_dir: Output directory for parsed content
- parse_method: Document parsing method (default: "auto")
- enable_image_processing: Enable image analysis (default: True)
- enable_table_processing: Enable table extraction (default: True)
- enable_equation_processing: Enable equation processing (default: True)
query_directory
Pure text query against processed documents using LightRAG.
Parameters:
- directory_path: Path to the processed directory
- query: The question to ask about the documents
- mode: Query mode - "hybrid", "local", "global", "naive", "mix", or "bypass" (default: "hybrid")
query_with_multimodal_content
Enhanced query with additional multimodal content (tables, equations, etc.).
Parameters:
- directory_path: Path to the processed directory
- query: The question to ask
- multimodal_content: List of multimodal content dictionaries
- mode: Query mode (default: "hybrid")
Example multimodal_content:
[
  {
    "type": "table",
    "table_data": "Method,Accuracy\\nRAGAnything,95.2%\\nBaseline,87.3%",
    "table_caption": "Performance comparison"
  },
  {
    "type": "equation",
    "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
    "equation_caption": "Document relevance probability"
  }
]
list_processed_directories
List all directories that have been processed and are available for querying.
get_rag_info
Get detailed information about the RAG configuration and status for a directory.
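A typical inspection sequence might look like the sketch below; note that the directory_path parameter name for get_rag_info is assumed here to mirror the other tools:

list_processed_directories()

get_rag_info(
    directory_path="/path/to/documents"
)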
Usage Examples
1. Basic Directory Processing
process_directory(
directory_path="/path/to/documents",
api_key="your-openai-api-key"
)
2. Advanced Directory Processing
process_directory(
directory_path="/path/to/research_papers",
api_key="your-openai-api-key",
file_extensions=[".pdf", ".docx"],
enable_image_processing=True,
enable_table_processing=True,
max_workers=6
)
3. Pure Text Query
query_directory(
directory_path="/path/to/documents",
query="What are the main findings in these research papers?",
mode="hybrid"
)
4. Multimodal Query with Table Data
query_with_multimodal_content(
directory_path="/path/to/documents",
query="Compare these results with the document findings",
multimodal_content=[{
"type": "table",
"table_data": "Method,Accuracy,Speed\\nRAGAnything,95.2%,120ms\\nBaseline,87.3%,180ms",
"table_caption": "Performance comparison"
}],
mode="hybrid"
)
5. Single Document Processing
process_single_document(
file_path="/path/to/important_paper.pdf",
api_key="your-openai-api-key",
enable_image_processing=True
)
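6. End-to-End Workflow
A directory must be processed before it can be queried. The sketch below simply chains the two documented tools; the path and question are placeholders:

process_directory(
    directory_path="/path/to/documents",
    api_key="your-openai-api-key"
)

query_directory(
    directory_path="/path/to/documents",
    query="Which methods are evaluated, and on which datasets?",
    mode="hybrid"
)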
Setup Requirements
1. Environment Variables
export OPENAI_API_KEY="your-openai-api-key-here"
2. Install Dependencies
uv sync
3. Run the MCP Server
python main.py
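To exercise the server programmatically rather than through an MCP-enabled client application, a minimal test script could look like the sketch below. It assumes the official MCP Python SDK (the mcp package) and its stdio client API, and that main.py is the server entry point as shown above:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server as a stdio subprocess (entry point from step 3 above)
    params = StdioServerParameters(command="python", args=["main.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # List the exposed tools (should include process_directory, query_directory, ...)
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Call a tool that needs no prior processing
            result = await session.call_tool("list_processed_directories", arguments={})
            print(result)

asyncio.run(main())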
Query Modes Explained
- hybrid: Combines local and global search (recommended for most use cases)
- local: Focuses on local context and entity relationships
- global: Provides broader, document-level insights and summaries
- naive: Simple keyword-based search without graph reasoning
- mix: Combines multiple approaches for comprehensive results
- bypass: Direct access without RAG processing
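If you are unsure which mode fits a question, run the same query in several modes and compare the answers, for example (path and question are placeholders):

for mode in ["hybrid", "local", "global", "naive"]:
    query_directory(
        directory_path="/path/to/documents",
        query="Summarize the main contributions of these papers.",
        mode=mode
    )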
Multimodal Content Types
The server supports processing and querying with:
- Images: Automatic caption generation and visual analysis
- Tables: Structure extraction and content analysis
- Equations: LaTeX parsing and mathematical reasoning
- Charts/Graphs: Visual data interpretation
- Mixed Content: Combined analysis of multiple content types
API Configuration
The server uses OpenAI's APIs by default:
- LLM: GPT-4o-mini for text processing
- Vision: GPT-4o for image analysis
- Embeddings: text-embedding-3-large (3072 dimensions)
You can customize the base_url parameter to use:
- Azure OpenAI
- OpenAI-compatible APIs
- Custom model endpoints
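For example, pointing the server at an OpenAI-compatible endpoint only requires passing base_url alongside the API key; the URL below is a placeholder:

process_directory(
    directory_path="/path/to/documents",
    api_key="your-api-key",
    base_url="https://your-endpoint.example.com/v1"
)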
File Support
Supported file formats include:
- PDF documents
- Microsoft Word (.docx)
- PowerPoint presentations (.pptx)
- Text files (.txt)
- Markdown files (.md)
- And more via the raganything library
Performance Notes
- Concurrent Processing: Use max_workers to control parallel document processing
- Memory Usage: Large documents with many images may require significant memory
- API Costs: Vision processing (GPT-4o) is more expensive than text processing
- Storage: Processed data is stored locally for efficient re-querying