# Job URL Analyzer MCP Server
A comprehensive FastAPI-based microservice for analyzing job URLs and extracting detailed company information. Built with modern async Python, this service crawls job postings and company websites to build rich company profiles with data enrichment from external providers.
## Features

- **Intelligent Web Crawling**: Respectful crawling with robots.txt compliance and rate limiting
- **Content Extraction**: Advanced HTML parsing using Selectolax for fast, accurate data extraction
- **Data Enrichment**: Pluggable enrichment providers (Crunchbase, LinkedIn, custom APIs)
- **Quality Scoring**: Completeness and confidence metrics for extracted data
- **Markdown Reports**: Comprehensive company analysis reports
- **Observability**: OpenTelemetry tracing, Prometheus metrics, structured logging
- **Production Ready**: Docker, Kubernetes, health checks, graceful shutdown
- **Well Tested**: Comprehensive test suite with 80%+ coverage
## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   FastAPI App   │────▶│  Orchestrator   │────▶│   Web Crawler   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│ Content Extract │     │    Database     │
└─────────────────┘     │  (SQLAlchemy)   │
         │              └─────────────────┘
         ▼
┌─────────────────┐     ┌─────────────────┐
│   Enrichment    │────▶│    Providers    │
│     Manager     │     │ (Crunchbase,etc)│
└─────────────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│ Report Generator│
└─────────────────┘
```
## Quick Start

### Prerequisites

- Python 3.11+
- Poetry (for dependency management)
- Docker & Docker Compose (optional)

### Local Development

1. **Clone and set up**

   ```bash
   git clone https://github.com/subslink326/job-url-analyzer-mcp.git
   cd job-url-analyzer-mcp
   poetry install
   ```

2. **Configure the environment (optional)**

   The application has sensible defaults and can run without environment configuration. To customize settings, create a `.env` file; see `src/job_url_analyzer/config.py` for the available settings.

3. **Set up the database**

   ```bash
   poetry run alembic upgrade head
   ```

4. **Run the development server**

   ```bash
   poetry run python -m job_url_analyzer.main
   # Server starts at http://localhost:8000
   ```

### Docker Deployment

- **Development**

  ```bash
  docker-compose up --build
  ```

- **Production**

  ```bash
  docker-compose -f docker-compose.prod.yml up -d
  ```
## API Usage

### Analyze a Job URL

```bash
curl -X POST "http://localhost:8000/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://company.com/jobs/software-engineer",
    "include_enrichment": true,
    "force_refresh": false
  }'
```
### Response Example

```json
{
  "profile_id": "123e4567-e89b-12d3-a456-426614174000",
  "source_url": "https://company.com/jobs/software-engineer",
  "company_profile": {
    "name": "TechCorp",
    "description": "Leading AI company...",
    "industry": "Technology",
    "employee_count": 150,
    "funding_stage": "Series B",
    "total_funding": 25.0,
    "headquarters": "San Francisco, CA",
    "tech_stack": ["Python", "React", "AWS"],
    "benefits": ["Health insurance", "Remote work"]
  },
  "completeness_score": 0.85,
  "confidence_score": 0.90,
  "processing_time_ms": 3450,
  "enrichment_sources": ["crunchbase"],
  "markdown_report": "# TechCorp - Company Analysis Report\n..."
}
```
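For programmatic access, the same call can be made from Python. This is a minimal client sketch using only the standard library, based on the request and response shapes shown above; the `API_BASE` value assumes the local development server from the Quick Start.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumed local dev server address


def build_payload(url: str, include_enrichment: bool = True,
                  force_refresh: bool = False) -> dict:
    """Request body matching the /analyze example above."""
    return {
        "url": url,
        "include_enrichment": include_enrichment,
        "force_refresh": force_refresh,
    }


def analyze(url: str, **kwargs) -> dict:
    """POST a job URL to /analyze and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/analyze",
        data=json.dumps(build_payload(url, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

With the server running, `analyze("https://company.com/jobs/software-engineer")` would return the response dictionary shown above.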
## Configuration

### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `DEBUG` | Enable debug mode | `false` |
| `HOST` | Server host | `0.0.0.0` |
| `PORT` | Server port | `8000` |
| `DATABASE_URL` | Database connection string | `sqlite+aiosqlite:///./data/job_analyzer.db` |
| `MAX_CONCURRENT_REQUESTS` | Max concurrent HTTP requests | `10` |
| `REQUEST_TIMEOUT` | HTTP request timeout (seconds) | `30` |
| `CRAWL_DELAY` | Delay between requests (seconds) | `1.0` |
| `RESPECT_ROBOTS_TXT` | Respect robots.txt | `true` |
| `ENABLE_CRUNCHBASE` | Enable Crunchbase enrichment | `false` |
| `CRUNCHBASE_API_KEY` | Crunchbase API key | `""` |
| `DATA_RETENTION_DAYS` | Data retention period (days) | `90` |
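The two crawl-throttling settings typically work together: a semaphore caps in-flight requests at `MAX_CONCURRENT_REQUESTS`, and each worker sleeps for `CRAWL_DELAY` after a fetch. The sketch below illustrates that pattern in isolation; it is not the project's crawler, and the `PoliteFetcher` name is hypothetical.

```python
import asyncio


class PoliteFetcher:
    """Caps concurrency and inserts a delay after each request,
    mirroring the MAX_CONCURRENT_REQUESTS and CRAWL_DELAY settings."""

    def __init__(self, max_concurrent: int = 10, delay: float = 1.0):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._delay = delay

    async def fetch(self, url: str, do_request):
        # do_request is any async callable that performs the actual HTTP GET
        async with self._sem:
            result = await do_request(url)
            await asyncio.sleep(self._delay)  # be polite between requests
            return result
```

A semaphore (rather than a fixed worker pool) keeps the implementation simple while still guaranteeing that no more than `max_concurrent` requests hit a host at once.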
## Monitoring

### Metrics Endpoints

- **Health check**: `GET /health`
- **Prometheus metrics**: `GET /metrics`

### Key Metrics

- `job_analyzer_requests_total` - Total API requests
- `job_analyzer_analysis_success_total` - Successful analyses
- `job_analyzer_completeness_score` - Data completeness distribution
- `job_analyzer_crawl_requests_total` - Crawl requests by status
- `job_analyzer_enrichment_success_total` - Enrichment success by provider
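The completeness score surfaced in the API response and the `job_analyzer_completeness_score` metric can be understood as the fraction of expected profile fields that were actually filled. The sketch below shows one plausible way to compute it; the field list and function name are illustrative assumptions, not the project's actual scoring logic.

```python
PROFILE_FIELDS = (  # illustrative subset of fields from the response example
    "name", "description", "industry", "employee_count", "headquarters",
)


def completeness_score(profile: dict, fields: tuple = PROFILE_FIELDS) -> float:
    """Fraction of expected fields holding a non-empty value, rounded to 2 dp."""
    filled = sum(1 for f in fields if profile.get(f) not in (None, "", [], {}))
    return round(filled / len(fields), 2)
```

A profile with only `name` and `industry` set would score 2/5 = 0.4 under this scheme, while a fully populated profile scores 1.0.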
## Testing

### Run Tests

```bash
# Unit tests
poetry run pytest

# With coverage
poetry run pytest --cov=job_url_analyzer --cov-report=html

# Integration tests only
poetry run pytest -m integration

# Skip slow tests
poetry run pytest -m "not slow"
```
## Deployment

### Kubernetes

```bash
# Apply manifests
kubectl apply -f kubernetes/

# Check deployment
kubectl get pods -l app=job-analyzer
kubectl logs -f deployment/job-analyzer
```
### Production Checklist
- Environment variables configured
- Database migrations applied
- SSL certificates configured
- Monitoring dashboards set up
- Log aggregation configured
- Backup strategy implemented
- Rate limiting configured
- Resource limits set
## Development

### Project Structure

```
job-url-analyzer/
├── src/job_url_analyzer/      # Main application code
│   ├── enricher/              # Enrichment providers
│   ├── main.py                # FastAPI application
│   ├── config.py              # Configuration
│   ├── models.py              # Pydantic models
│   ├── database.py            # Database models
│   ├── crawler.py             # Web crawler
│   ├── extractor.py           # Content extraction
│   ├── orchestrator.py        # Main orchestrator
│   └── report_generator.py    # Report generation
├── tests/                     # Test suite
├── alembic/                   # Database migrations
├── kubernetes/                # K8s manifests
├── monitoring/                # Monitoring configs
├── docker-compose.yml         # Development setup
├── docker-compose.prod.yml    # Production setup
└── Dockerfile                 # Container definition
```
### Code Quality

The project uses:

- **Black** for code formatting
- **Ruff** for linting
- **MyPy** for type checking
- **Pre-commit** hooks for quality gates

```bash
# Set up pre-commit
poetry run pre-commit install

# Run quality checks
poetry run black .
poetry run ruff check .
poetry run mypy src/
```
## Recent Changes

### Dependency Updates

- **Fixed**: Replaced the non-existent `aiohttp-robotparser` dependency with `robotexclusionrulesparser` for robots.txt parsing
- **Improved**: The setup process now works out of the box without requiring a `.env` file
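The robots.txt check that `robotexclusionrulesparser` performs can be illustrated with the standard library's `urllib.robotparser`, which exposes the same allow/disallow semantics. This is a stand-alone sketch of the check itself, not the project's crawler code, and the `is_allowed` helper and sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 1
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether a URL may be crawled under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A crawler honoring `RESPECT_ROBOTS_TXT` would run a check like this before every fetch and skip any URL the rules disallow.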
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass (`poetry run pytest`)
6. Commit your changes (`git commit -m 'Add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Support
- Documentation: This README and inline code comments
- Issues: GitHub Issues for bug reports and feature requests
- Discussions: GitHub Discussions for questions and community
Built with ❤️ using FastAPI, SQLAlchemy, and modern Python tooling.