FastMCP Document Analyzer
A comprehensive document analysis server that performs sentiment analysis, keyword extraction, readability scoring, and text statistics while providing document management capabilities including storage, search, and organization.
🔍 FastMCP Document Analyzer
A comprehensive document analysis server built with the modern FastMCP framework
📋 Table of Contents
- 🌟 Features
- 🚀 Quick Start
- 📦 Installation
- 🔧 Usage
- 🛠️ Available Tools
- 📊 Sample Data
- 🏗️ Project Structure
- 🔄 API Reference
- 🧪 Testing
- 📚 Documentation
- 🤝 Contributing
🌟 Features
📖 Document Analysis
- 🎭 Sentiment Analysis: VADER + TextBlob dual-engine sentiment classification
- 🔑 Keyword Extraction: TF-IDF and frequency-based keyword identification
- 📚 Readability Scoring: Multiple metrics (Flesch Reading Ease, Flesch-Kincaid Grade Level, Automated Readability Index)
- 📊 Text Statistics: Word count, sentences, paragraphs, and more
🗂️ Document Management
- 💾 Persistent Storage: JSON-based document collection with metadata
- 🔍 Smart Search: TF-IDF cosine-similarity search
- 🏷️ Tag System: Category and tag-based organization
- 📈 Collection Insights: Comprehensive statistics and analytics
🚀 FastMCP Advantages
- ⚡ Simple Setup: 90% less boilerplate than standard MCP
- 🔒 Type Safety: Full type validation with Pydantic
- 🎯 Modern API: Decorator-based tool definitions
- 🌐 Multi-Transport: STDIO, HTTP, and SSE support
🚀 Quick Start
1. Clone and Setup
```bash
git clone <repository-url>
cd document-analyzer
python -m venv venv
source venv/Scripts/activate  # Windows (Git Bash); in cmd.exe run venv\Scripts\activate
# source venv/bin/activate    # macOS/Linux
```
2. Install Dependencies
```bash
pip install -r requirements.txt
```
3. Initialize NLTK Data
```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('vader_lexicon'); nltk.download('stopwords'); nltk.download('punkt_tab')"
```
4. Run the Server
```bash
python fastmcp_document_analyzer.py
```
5. Test Everything
```bash
python test_fastmcp_analyzer.py
```
📦 Installation
System Requirements
- Python 3.10 or higher (FastMCP 2.x and the underlying `mcp` SDK require 3.10+)
- 500MB free disk space
- Internet connection (for initial NLTK data download)
Dependencies
```text
fastmcp>=2.3.0          # Modern MCP framework
textblob>=0.17.1        # Sentiment analysis
nltk>=3.8.1             # Natural language processing
textstat>=0.7.3         # Readability metrics
scikit-learn>=1.3.0     # Machine learning utilities
numpy>=1.24.0           # Numerical computing
pandas>=2.0.0           # Data manipulation
python-dateutil>=2.8.2  # Date handling
```
Optional: Virtual Environment
```bash
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (macOS/Linux)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```
🔧 Usage
Starting the Server
Default (STDIO Transport)
```bash
python fastmcp_document_analyzer.py
```
HTTP Transport (for web services)
```bash
python fastmcp_document_analyzer.py --transport http --port 9000
```
With Custom Host
```bash
python fastmcp_document_analyzer.py --transport http --host 0.0.0.0 --port 8080
```
Basic Usage Examples
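```python
# Assumes the tools are importable and callable as plain functions (true when
# @mcp.tool returns the original function; newer FastMCP releases may not).
from fastmcp_document_analyzer import (
    analyze_document, extract_keywords, get_collection_stats, search_documents,
)

# Analyze a document
result = analyze_document("doc_001")
print(f"Sentiment: {result['sentiment_analysis']['overall_sentiment']}")
# Extract keywords
keywords = extract_keywords("Artificial intelligence is transforming healthcare", 5)
print([kw['keyword'] for kw in keywords])
# Search documents
results = search_documents("machine learning", 3)
print(f"Found {len(results)} relevant documents")
# Get collection statistics
stats = get_collection_stats()
print(f"Total documents: {stats['total_documents']}")
```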
🛠️ Available Tools
Core Analysis Tools
Tool | Description | Example |
---|---|---|
analyze_document | 🔍 Complete document analysis | analyze_document("doc_001") |
get_sentiment | 😊 Sentiment analysis | get_sentiment("I love this!") |
extract_keywords | 🔑 Keyword extraction | extract_keywords(text, 10) |
calculate_readability | 📖 Readability metrics | calculate_readability(text) |
Document Management Tools
Tool | Description | Example |
---|---|---|
add_document | 📝 Add new document | add_document("id", "title", "content") |
get_document | 📄 Retrieve document | get_document("doc_001") |
delete_document | 🗑️ Delete document | delete_document("old_doc") |
list_documents | 📋 List all documents | list_documents("Technology") |
Search and Discovery Tools
Tool | Description | Example |
---|---|---|
search_documents | 🔍 TF-IDF similarity search | search_documents("AI", 5) |
search_by_tags | 🏷️ Tag-based search | search_by_tags(["AI", "tech"]) |
get_collection_stats | 📊 Collection statistics | get_collection_stats() |
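The README does not spell out how `search_by_tags` matches tags. A minimal sketch, assuming case-insensitive comparison, "any requested tag" (OR) semantics, and ranking by number of matching tags; `search_by_tags_sketch` and the toy records are hypothetical, not the project's code.

```python
from typing import Any, Dict, List


def search_by_tags_sketch(
    documents: List[Dict[str, Any]], tags: List[str]
) -> List[Dict[str, Any]]:
    """Return documents sharing at least one requested tag, most matches first.

    Assumptions (not confirmed by this README): tags are compared
    case-insensitively, and a document matches if ANY requested tag is present.
    """
    wanted = {tag.lower() for tag in tags}
    scored = []
    for doc in documents:
        doc_tags = {tag.lower() for tag in doc.get("tags", [])}
        overlap = len(wanted & doc_tags)
        if overlap:
            scored.append((overlap, doc))
    # Python's sort is stable, so documents with equal overlap keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = [
    {"id": "doc_001", "tags": ["AI", "technology", "future", "ethics"]},
    {"id": "doc_002", "tags": ["space"]},
    {"id": "doc_003", "tags": ["tech", "ai"]},
]
print([d["id"] for d in search_by_tags_sketch(docs, ["AI", "tech"])])
# ['doc_003', 'doc_001']
```

Whether the real tool uses OR or AND matching is worth confirming in storage/document_storage.py before relying on either behavior.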
📊 Sample Data
The server comes pre-loaded with 16 diverse documents covering:
Category | Documents | Topics |
---|---|---|
Technology | 4 | AI, Quantum Computing, Privacy, Blockchain |
Science | 3 | Space Exploration, Healthcare, Ocean Conservation |
Environment | 2 | Climate Change, Sustainable Agriculture |
Society | 3 | Remote Work, Mental Health, Transportation |
Business | 2 | Economics, Digital Privacy |
Culture | 2 | Art History, Wellness |
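Per-category counts like those in the table are the kind of breakdown a collection-statistics tool reports. A minimal sketch of computing them from stored records; `collection_stats_sketch` and its `documents_by_category` and `unique_tags` keys are hypothetical (only `total_documents` appears in this README).

```python
from collections import Counter
from typing import Any, Dict, List


def collection_stats_sketch(documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a collection; keys other than total_documents are hypothetical."""
    categories = Counter(doc.get("category", "Uncategorized") for doc in documents)
    tags = {tag for doc in documents for tag in doc.get("tags", [])}
    return {
        "total_documents": len(documents),
        "documents_by_category": dict(categories),
        "unique_tags": len(tags),
    }


# Records carrying only the category counts from the table above.
sample = (
    [{"category": "Technology"}] * 4
    + [{"category": "Science"}] * 3
    + [{"category": "Environment"}] * 2
    + [{"category": "Society"}] * 3
    + [{"category": "Business"}] * 2
    + [{"category": "Culture"}] * 2
)
stats = collection_stats_sketch(sample)
print(stats["total_documents"])  # 16
```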
Sample Document Structure
```json
{
  "id": "doc_001",
  "title": "The Future of Artificial Intelligence",
  "content": "Artificial intelligence is rapidly transforming...",
  "author": "Dr. Sarah Chen",
  "category": "Technology",
  "tags": ["AI", "technology", "future", "ethics"],
  "language": "en",
  "created_at": "2024-01-15T10:30:00"
}
```
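A minimal sketch of loading one such record into a typed object, assuming the JSON keys shown above and ISO 8601 timestamps. The `Document` dataclass, its defaults, and `from_dict` are hypothetical and not part of the project's storage module.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical typed view of one stored document record."""

    id: str
    title: str
    content: str
    author: str = ""
    category: str = ""
    tags: List[str] = field(default_factory=list)
    language: str = "en"  # assumed default
    created_at: datetime = field(default_factory=datetime.now)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Document":
        # Copy so the caller's dict is untouched, then parse the ISO 8601 timestamp.
        values = dict(data)
        if "created_at" in values:
            values["created_at"] = datetime.fromisoformat(values["created_at"])
        # Unknown keys raise TypeError, which surfaces schema drift early.
        return cls(**values)


raw = """{
  "id": "doc_001",
  "title": "The Future of Artificial Intelligence",
  "content": "Artificial intelligence is rapidly transforming...",
  "author": "Dr. Sarah Chen",
  "category": "Technology",
  "tags": ["AI", "technology", "future", "ethics"],
  "language": "en",
  "created_at": "2024-01-15T10:30:00"
}"""
doc = Document.from_dict(json.loads(raw))
print(doc.title, doc.created_at.date())
# The Future of Artificial Intelligence 2024-01-15
```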
🏗️ Project Structure
document-analyzer/
├── 📁 analyzer/ # Core analysis engine
│ ├── __init__.py
│ └── document_analyzer.py # Sentiment, keywords, readability
├── 📁 storage/ # Document storage system
│ ├── __init__.py
│ └── document_storage.py # JSON storage, search, management
├── 📁 data/ # Sample data
│ ├── __init__.py
│ └── sample_documents.py # 16 sample documents
├── 📄 fastmcp_document_analyzer.py # 🌟 Main FastMCP server
├── 📄 test_fastmcp_analyzer.py # Comprehensive test suite
├── 📄 requirements.txt # Python dependencies
├── 📄 documents.json # Persistent document storage
├── 📄 README.md # This documentation
├── 📄 FASTMCP_COMPARISON.md # FastMCP vs Standard MCP
├── 📄 .gitignore # Git ignore patterns
└── 📁 venv/ # Virtual environment (optional)
🔄 API Reference
Document Analysis
analyze_document(document_id: str) -> Dict[str, Any]
Performs comprehensive analysis of a document.
Parameters:
- document_id (str): Unique document identifier
Returns:
```json
{
  "document_id": "doc_001",
  "title": "Document Title",
  "sentiment_analysis": {
    "overall_sentiment": "positive",
    "confidence": 0.85,
    "vader_scores": {...},
    "textblob_scores": {...}
  },
  "keywords": [
    {"keyword": "artificial", "frequency": 5, "relevance_score": 2.3}
  ],
  "readability": {
    "flesch_reading_ease": 45.2,
    "reading_level": "Difficult",
    "grade_level": "Grade 12"
  },
  "basic_statistics": {
    "word_count": 119,
    "sentence_count": 8,
    "paragraph_count": 1
  }
}
```
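The `basic_statistics` counts can be approximated with the standard library alone. A rough sketch, assuming paragraphs are separated by blank lines and sentences end in `.`, `!` or `?`; `basic_statistics_sketch` is hypothetical, and since the project downloads NLTK's `punkt` tokenizer its own counts may differ on abbreviations and similar edge cases.

```python
import re
from typing import Dict


def basic_statistics_sketch(text: str) -> Dict[str, int]:
    """Approximate word, sentence, and paragraph counts with regular expressions."""
    # Words: runs of word characters, allowing one internal apostrophe ("don't").
    words = re.findall(r"\b\w+(?:'\w+)?\b", text)
    # Sentences: non-empty chunks between runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Paragraphs: non-empty chunks separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
    }


sample = "AI is changing medicine. Is it safe?\n\nRegulators are watching closely!"
print(basic_statistics_sketch(sample))
# {'word_count': 11, 'sentence_count': 3, 'paragraph_count': 2}
```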
get_sentiment(text: str) -> Dict[str, Any]
Analyzes the sentiment of arbitrary text.
Parameters:
- text (str): Text to analyze
Returns:
```json
{
  "overall_sentiment": "positive",
  "confidence": 0.85,
  "vader_scores": {
    "compound": 0.7269,
    "positive": 0.294,
    "negative": 0.0,
    "neutral": 0.706
  },
  "textblob_scores": {
    "polarity": 0.5,
    "subjectivity": 0.6
  }
}
```
Document Management
add_document(...) -> Dict[str, Any]
Adds a new document to the collection.
Parameters:
- id (str): Unique document ID
- title (str): Document title
- content (str): Document content
- author (str, optional): Author name
- category (str, optional): Document category
- tags (List[str], optional): List of tags
- language (str, optional): Language code
Returns:
```json
{
  "status": "success",
  "message": "Document 'my_doc' added successfully",
  "document_count": 17
}
```
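A minimal sketch of adding a document to a JSON-backed collection, assuming the file holds one JSON object keyed by document ID. `JsonDocumentStore` is hypothetical (it is not the project's storage/document_storage.py), and the error response for duplicate IDs is an assumption; the success response mirrors the shape above.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional


class JsonDocumentStore:
    """Hypothetical JSON-file store keyed by document ID."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.documents: Dict[str, Dict[str, Any]] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.documents = json.load(f)

    def _save(self) -> None:
        # Write to a temp file in the same directory, then atomically replace,
        # so a crash never leaves a half-written documents file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".json")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.documents, f, indent=2)
        os.replace(tmp_path, self.path)

    def add_document(
        self,
        id: str,
        title: str,
        content: str,
        author: str = "",
        category: str = "",
        tags: Optional[List[str]] = None,
        language: str = "en",
    ) -> Dict[str, Any]:
        if id in self.documents:
            return {"status": "error", "message": f"Document '{id}' already exists"}
        self.documents[id] = {
            "id": id, "title": title, "content": content, "author": author,
            "category": category, "tags": tags or [], "language": language,
        }
        self._save()
        return {
            "status": "success",
            "message": f"Document '{id}' added successfully",
            "document_count": len(self.documents),
        }


with tempfile.TemporaryDirectory() as tmpdir:
    store = JsonDocumentStore(os.path.join(tmpdir, "documents.json"))
    print(store.add_document("my_doc", "Title", "Some content")["message"])
    # Document 'my_doc' added successfully
    print(store.add_document("my_doc", "Again", "Duplicate")["status"])  # error
```

Rejecting duplicate IDs instead of overwriting keeps `add_document` distinct from an update operation.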
Search and Discovery
search_documents(query: str, limit: int = 10) -> List[Dict[str, Any]]
Ranks documents by TF-IDF cosine similarity to the query.
Parameters:
- query (str): Search query
- limit (int): Maximum number of results (default 10)
Returns:
```json
[
  {
    "id": "doc_001",
    "title": "AI Document",
    "similarity_score": 0.8542,
    "content_preview": "First 200 characters...",
    "tags": ["AI", "technology"]
  }
]
```
🧪 Testing
Run All Tests
```bash
python test_fastmcp_analyzer.py
```
Test Categories
- ✅ Server Initialization: FastMCP server setup
- ✅ Sentiment Analysis: VADER and TextBlob integration
- ✅ Keyword Extraction: TF-IDF and frequency analysis
- ✅ Readability Calculation: Multiple readability metrics
- ✅ Document Analysis: Full document processing
- ✅ Document Search: TF-IDF similarity search
- ✅ Collection Statistics: Analytics and insights
- ✅ Document Management: CRUD operations
- ✅ Tag Search: Tag-based filtering
Expected Test Output
```text
=== Testing FastMCP Document Analyzer ===
✓ FastMCP server module imported successfully
✓ Server initialized successfully
✓ Sentiment analysis working
✓ Keyword extraction working
✓ Readability calculation working
✓ Document analysis working
✓ Document search working
✓ Collection statistics working
✓ Document listing working
✓ Document addition and deletion working
✓ Tag search working
=== All FastMCP tests completed successfully! ===
```
📚 Documentation
Additional Resources
- 📖 FastMCP Documentation
- 📖 MCP Protocol Specification
- 📖 FASTMCP_COMPARISON.md - FastMCP vs Standard MCP
Key Concepts
Sentiment Analysis
Uses a dual-engine approach:
- VADER: lexicon- and rule-based, tuned for social media text
- TextBlob: lexicon-based by default (Pattern analyzer), reporting polarity and subjectivity; suited to general prose
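The README does not say how the two engines' outputs become `overall_sentiment` and `confidence`. One plausible rule, sketched here purely as an assumption: average VADER's compound score and TextBlob's polarity (both in [-1, 1]) and apply VADER's customary ±0.05 neutral band. `combine_sentiment` is hypothetical; the get_sentiment example in the API reference reports a confidence of 0.85 for the scores used below, while this sketch yields about 0.61, so the project's confidence formula is evidently different.

```python
from typing import Any, Dict


def combine_sentiment(vader_compound: float, textblob_polarity: float) -> Dict[str, Any]:
    """Merge two sentiment scores in [-1, 1] into a label and a confidence.

    Assumptions: equal weighting of the engines and VADER's customary
    +/-0.05 neutral band; the project's real rule is not documented here.
    """
    combined = (vader_compound + textblob_polarity) / 2
    if combined >= 0.05:
        label = "positive"
    elif combined <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    # Crude confidence: magnitude of the averaged score, already within [0, 1].
    return {"overall_sentiment": label, "confidence": round(abs(combined), 4)}


# Scores taken from the get_sentiment example in the API reference.
print(combine_sentiment(0.7269, 0.5)["overall_sentiment"])  # positive
```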
Keyword Extraction
Combines multiple approaches:
- TF-IDF: Term frequency-inverse document frequency
- Frequency Analysis: Simple word frequency counting
- Relevance Scoring: Weighted combination of both methods
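A dependency-free sketch of the combined scoring, assuming `relevance = w * tf_idf + (1 - w) * count / max_count` with `w = 0.5`. The weight, the normalization, the tiny stopword list, and `extract_keywords_sketch` are all assumptions; the project lists scikit-learn and NLTK as dependencies and presumably uses them instead.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List

# Hypothetical tiny stopword list; the project downloads NLTK's full English list.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "it", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords_sketch(
    text: str, corpus: List[str], top_n: int = 5, weight: float = 0.5
) -> List[Dict[str, Any]]:
    """Rank words in `text` by weight * TF-IDF + (1 - weight) * normalized frequency."""
    tokens = tokenize(text)
    if not tokens:
        return []
    counts = Counter(tokens)
    corpus_vocab = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus_vocab)
    max_count = max(counts.values())
    scored = []
    for word, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for vocab in corpus_vocab if word in vocab)
        # Smoothed IDF, the same formula scikit-learn's TfidfVectorizer uses by default.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        relevance = weight * tf * idf + (1 - weight) * count / max_count
        scored.append({"keyword": word, "frequency": count, "relevance_score": round(relevance, 4)})
    scored.sort(key=lambda kw: kw["relevance_score"], reverse=True)
    return scored[:top_n]


corpus = [
    "Artificial intelligence is transforming healthcare and medicine.",
    "Quantum computing promises faster simulation.",
    "Healthcare costs are rising.",
]
text = "Artificial intelligence is transforming healthcare. Intelligence helps doctors."
print([kw["keyword"] for kw in extract_keywords_sketch(text, corpus, 3)])
# ['intelligence', 'helps', 'doctors']
```

Words that are rare across the corpus ("helps", "doctors") outrank common ones ("healthcare") at equal frequency, which is the IDF term doing its job.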
Readability Metrics
Provides multiple readability scores:
- Flesch Reading Ease: typically 0-100 (higher = easier); very simple or very dense text can fall outside that range
- Flesch-Kincaid Grade: US grade level
- ARI: Automated Readability Index
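All three scores are closed-form formulas over word, sentence, syllable, and character counts. A sketch using the standard published coefficients; the function names are hypothetical, and because syllable counting is the hard part (the project lists `textstat` for it), these functions take the counts as inputs. textstat's own results may differ slightly from these textbook forms.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher is easier."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI: uses characters per word instead of syllables; also a US grade level."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Example counts: 100 words, 5 sentences, 150 syllables, 480 characters.
print(round(flesch_reading_ease(100, 5, 150), 1))  # 59.6
print(round(flesch_kincaid_grade(100, 5, 150), 1))  # 9.9
print(round(automated_readability_index(480, 100, 5), 1))  # 11.2
```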
Document Search
Uses TF-IDF vectorization with cosine similarity:
- Converts documents to numerical vectors
- Calculates similarity between query and documents
- Returns ranked results with similarity scores
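The three steps can be sketched without dependencies. This is an illustration under assumptions: the tokenizer, the smoothed IDF, and `search_sketch` are hypothetical, and the project presumably relies on scikit-learn (listed in its requirements) for vectorization. Output fields mirror `search_documents`.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List


def _tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def _tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    # Raw term count times IDF; terms unseen in the corpus get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def _cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search_sketch(query: str, documents: List[Dict[str, Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Rank documents by cosine similarity between TF-IDF vectors of query and text."""
    doc_tokens = [_tokens(d["title"] + " " + d["content"]) for d in documents]
    n = len(documents)
    df = Counter(t for toks in doc_tokens for t in set(toks))
    # Smoothed IDF, matching scikit-learn's default formula.
    idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}
    query_vec = _tfidf(_tokens(query), idf)
    results = []
    for doc, toks in zip(documents, doc_tokens):
        score = _cosine(query_vec, _tfidf(toks, idf))
        if score > 0:
            results.append({
                "id": doc["id"],
                "title": doc["title"],
                "similarity_score": round(score, 4),
                "content_preview": doc["content"][:200],
                "tags": doc.get("tags", []),
            })
    results.sort(key=lambda r: r["similarity_score"], reverse=True)
    return results[:limit]


docs = [
    {"id": "doc_001", "title": "AI Document", "content": "Machine learning models learn from data."},
    {"id": "doc_002", "title": "Oceans", "content": "Coral reefs need protection."},
    {"id": "doc_003", "title": "Learning", "content": "Students learning new skills."},
]
for hit in search_sketch("machine learning", docs, 3):
    print(hit["id"], hit["similarity_score"])
```

Because matching is on exact tokens, a query for "learn" would not match "learning" here; stemming or embeddings would be needed for truly semantic matches.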
🤝 Contributing
Development Setup
```bash
# Clone repository
git clone <repository-url>
cd document-analyzer
# Create development environment
python -m venv venv
source venv/Scripts/activate  # Windows (Git Bash); on macOS/Linux use source venv/bin/activate
pip install -r requirements.txt
# Run tests
python test_fastmcp_analyzer.py
```
Adding New Tools
FastMCP makes it easy to add new tools:
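```python
from typing import Any, Dict

# `mcp` is the FastMCP server instance defined in fastmcp_document_analyzer.py.
# The parenthesized form works across all FastMCP 2.x releases.
@mcp.tool()
def my_new_tool(param: str) -> Dict[str, Any]:
    """
    🔧 Description of what this tool does.

    Args:
        param: Parameter description

    Returns:
        Return value description
    """
    # Implementation here
    return {"result": "success"}
```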
Code Style
- Use type hints for all functions
- Add comprehensive docstrings
- Include error handling
- Follow PEP 8 style guidelines
- Add emoji icons for better readability
Testing New Features
- Add your tool to the main server file
- Create test cases in the test file
- Run the test suite to ensure everything works
- Update documentation as needed
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- FastMCP Team for the excellent framework
- NLTK Team for natural language processing tools
- TextBlob Team for sentiment analysis capabilities
- Scikit-learn Team for machine learning utilities
Made with ❤️ using FastMCP
🚀 Ready to analyze documents? Start with:
```bash
python fastmcp_document_analyzer.py
```