
MCP Job Search


This project implements a LinkedIn job scraper with persistent job indexing, deep scanning, and filtering capabilities using Cloudflare Workers. It scrapes LinkedIn job listings, performs detailed analysis of each job against a candidate profile using OpenAI, stores matches in Cloudflare KV, and exposes MCP-compatible HTTP endpoints.

Architecture

This implementation uses:

  • Cloudflare Workers for serverless execution
  • Cloudflare's Playwright fork for web scraping
  • Cloudflare KV for persistent data storage
  • OpenAI API for job analysis and matching
  • MCP (Model Context Protocol) for tool integration

Current Status

The worker is fully functional and can be run locally. It includes a complete MCP server with authentication, CORS handling, a health check endpoint, and SSE endpoints for real-time updates.

Some tools are currently implemented as stubs that return mock data, which allows the end-to-end flow to be tested. The plan management tools (get_plan, create_plan, update_plan) and the email digest functionality (send_digest) are fully implemented.

Setup

Prerequisites

  • Node.js (v18 or later)
  • npm or yarn
  • Cloudflare account (for deployment)

Environment Variables

Create a .env file in the project root with the following variables:

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-4o

# LinkedIn Credentials (for scraping)
LINKEDIN_EMAIL=your-linkedin-email@example.com
LINKEDIN_PASSWORD=your-linkedin-password

# Email Configuration (for digests)
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=your-smtp-username
SMTP_PASS=your-smtp-password
DIGEST_FROM=jobs@example.com
DIGEST_TO=you@example.com

# Application Settings
TIMEZONE=Australia/Sydney
ACCESS_TOKEN=your-secure-random-token
DEEP_SCAN_CONCURRENCY=2

The ACCESS_TOKEN is used for API authentication and should be a secure random string.
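
One simple way to generate a suitable token, assuming openssl is available on your system:

openssl rand -hex 32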

Installation

  1. Install dependencies:

    npm install
    
  2. Set up your environment variables in .env

  3. Create a job search plan (see Plan Management section below)

Running the Worker

Start the development server:

npm run dev

The worker will be available at http://localhost:8787.
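
You can verify it is running via the unauthenticated health check endpoint:

curl http://localhost:8787/health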

Core Features

Plan-Driven Search

Define your profile, search terms, and scan criteria in a job search plan. The system uses this plan to:

  • Target relevant job searches on LinkedIn
  • Analyze job matches against your profile using OpenAI
  • Score jobs based on fit and requirements

Persistent Job Index

All scraped jobs are stored persistently in Cloudflare KV with:

  • Job details and metadata
  • Match scores and analysis
  • Scan history and timestamps
  • Deduplication to avoid processing the same job twice
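
For illustration only, a stored job record might look roughly like this; the actual field names used by the worker are not documented here, so treat these as assumptions:

{
  "id": "4012345678",
  "title": "Senior Software Engineer",
  "company": "Example Corp",
  "url": "https://www.linkedin.com/jobs/view/4012345678",
  "matchScore": 0.82,
  "deepScanned": true,
  "lastScannedAt": "2025-08-20T10:00:00Z"
}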

Deep Scanning

Visits each job posting to extract comprehensive details:

  • Full job description and requirements
  • Company information and culture
  • Salary and benefits information
  • AI-powered analysis against your profile

Email Digests

Automated email summaries of your best job matches:

  • Configurable match score thresholds
  • Rich HTML formatting with job details
  • Direct links to job postings
  • Scheduled delivery options

API Reference

MCP Tools

The following tools are available via the MCP server:

Plan Management

  • get_plan: Get the current job search plan
  • create_plan: Create a new job search plan from a description
  • update_plan: Update the existing job search plan

Job Scanning & Analysis

  • scan: Scan LinkedIn job pages using Playwright; if a URL is provided, scans that page, otherwise uses the URLs from the plan
  • rescan: Rescan existing jobs using URLs from the last scan or current plan
  • deep_scan_job: Manually deep scan a specific LinkedIn job URL for testing and debugging
  • failed_jobs: Get a report of jobs that failed during deep scanning with error analysis

Job Index Management

  • get_job_index: Get the current raw job index data for inspection (with filtering options)
  • reset_job_index: Reset the job index to start fresh - removes all stored jobs

System Operations

  • status: Check the status of background jobs (scan progress, errors, etc.)
  • send_digest: Send digest email with job matches to specified email address
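
The available tools can also be listed programmatically. A minimal sketch, assuming the server implements the standard MCP tools/list method over JSON-RPC (token and URL are placeholders):

curl -X POST http://localhost:8787/mcp \
  -H "Authorization: Bearer your-access-token" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'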

HTTP Endpoints

The worker exposes HTTP endpoints for direct API access:

Core Endpoints

  • GET /health - Health check endpoint (no authentication required)
  • POST /mcp - MCP server endpoint (handles all tool calls with authentication)

Note: All MCP tools are accessed via the /mcp endpoint using the MCP protocol. The worker uses token-based authentication for the MCP endpoint.
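
For example, fetching the current plan via the get_plan tool might look like the following, assuming the standard MCP tools/call request shape:

curl -X POST http://localhost:8787/mcp \
  -H "Authorization: Bearer your-access-token" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"get_plan","arguments":{}}}'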

Plan Management

The job search plan is the core configuration that drives the entire system. It defines:

Plan Structure

{
  "profile": {
    "name": "Your Name",
    "experience": "Senior Software Engineer with 8+ years...",
    "skills": ["JavaScript", "React", "Node.js", "AWS"],
    "preferences": {
      "remote": true,
      "location": "Sydney, Australia",
      "salary_min": 120000
    }
  },
  "searches": [
    {
      "keywords": "Senior Software Engineer React",
      "location": "Sydney",
      "filters": {
        "experience_level": "mid_senior",
        "job_type": "full_time"
      }
    }
  ],
  "scan_prompt": "Analyze this job posting for a senior software engineer..."
}

Creating a Plan

You can create a plan in several ways:

  1. Via MCP Tool: Use the create_plan tool with a natural language description
  2. Via HTTP API: POST to /plan with either structured JSON or a natural-language description
  3. Direct File: Create a plan.json file in the project root

Plan Examples

Natural Language Description:

I'm a senior full-stack developer with 8 years experience in React, Node.js, and AWS. 
I'm looking for remote senior engineer roles in fintech or healthcare, 
preferably $120k+ with equity options.

Structured JSON:

{
  "profile": {
    "name": "Senior Developer",
    "experience": "8+ years full-stack development",
    "skills": ["React", "Node.js", "AWS", "TypeScript"]
  },
  "searches": [
    {
      "keywords": "Senior Full Stack Engineer",
      "location": "Remote"
    }
  ]
}
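
To submit either form over HTTP, a request along these lines should work; the exact body accepted by /plan is not specified above, so the description field name here is an assumption:

curl -X POST http://localhost:8787/plan \
  -H "Authorization: Bearer your-access-token" \
  -H "Content-Type: application/json" \
  -d '{"description": "Senior full-stack developer, 8 years of React, Node.js and AWS, seeking remote fintech or healthcare roles, $120k+ with equity"}'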

Deployment

Local Development

For local development, the worker runs using Wrangler:

npm run dev

This starts a local development server at http://localhost:8787.

Production Deployment

To deploy to Cloudflare Workers:

  1. Configure Wrangler: Ensure you have a wrangler.toml file configured
  2. Set Environment Variables: Configure secrets in Cloudflare Workers dashboard
  3. Deploy: Run the deployment command:

    npm run deploy

Environment Variables in Production

Set these as secrets in your Cloudflare Workers environment:

wrangler secret put OPENAI_API_KEY
wrangler secret put LINKEDIN_EMAIL
wrangler secret put LINKEDIN_PASSWORD
wrangler secret put SMTP_HOST
wrangler secret put SMTP_USER
wrangler secret put SMTP_PASS
wrangler secret put ACCESS_TOKEN
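
Non-secret settings (model name, SMTP port, timezone, digest addresses) do not need to be stored as secrets; the usual Wrangler convention is to define them under [vars] in wrangler.toml, for example:

[vars]
OPENAI_MODEL = "gpt-4o"
SMTP_PORT = "587"
DIGEST_FROM = "jobs@example.com"
DIGEST_TO = "you@example.com"
TIMEZONE = "Australia/Sydney"
DEEP_SCAN_CONCURRENCY = "2"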

Implementation Status

✅ Fully Implemented Features

Core Infrastructure

  • MCP server with complete tool integration
  • Cloudflare Workers runtime environment
  • Token-based authentication and CORS handling
  • Background job processing with status tracking

Plan Management

  • Plan Creation & Updates: AI-powered plan generation from natural language descriptions
  • Plan Storage: Persistent storage in Cloudflare KV
  • Search URL Generation: Automatic LinkedIn search URL creation
  • Plan Feedback: AI analysis and recommendations for plan improvement

Job Scanning & Analysis

  • LinkedIn Scraping: Full Playwright-based job page scraping
  • Deep Scanning: Individual job analysis with OpenAI integration
  • Background Processing: Non-blocking scan operations with status tracking
  • Error Handling: Comprehensive error reporting and failed job analysis
  • Fallback Matching: Keyword-based matching when AI is unavailable

Job Index Management

  • Persistent Storage: Cloudflare KV-based job index with deduplication
  • Job Tracking: Scan status, match scores, and metadata storage
  • Index Inspection: Detailed job index viewing with filtering options
  • Index Reset: Complete job index cleanup functionality

Email Digest System

  • SMTP Integration: Nodemailer-based email sending
  • HTML Email Generation: Rich formatting with job details and links
  • Auto-digest: Automatic email sending after scan completion
  • Job Tracking: Mark jobs as sent to avoid duplicates

Debugging & Monitoring

  • Manual Deep Scan: Test individual job URLs for debugging
  • Failed Jobs Report: Detailed analysis of scan failures with error categorization
  • Status Monitoring: Real-time background job status tracking

Authentication

API endpoints are protected with token-based authentication. Include your ACCESS_TOKEN in requests:

curl -H "Authorization: Bearer your-access-token" http://localhost:8787/status

Troubleshooting

Common Issues

  1. LinkedIn Authentication: Ensure your LinkedIn credentials are correct and the account isn't locked
  2. OpenAI API: Verify your API key has sufficient credits and proper permissions
  3. Email Delivery: Check SMTP settings and ensure the sender email is authorized
  4. Environment Variables: Verify all required variables are set in your .env file

Known Warnings

When testing email functionality, you may see network-related warnings in the Cloudflare Workers environment:

  • "Failed to resolve IPv4 addresses with current network"
  • "Possible EventEmitter memory leak detected"

These warnings are environmental and do not prevent the email functionality from working correctly.

Development Tips

  • Use the /health endpoint to verify the worker is running
  • Check the browser console for detailed error messages
  • Use the mock data endpoints for testing without external dependencies
  • Test plan creation with natural language descriptions before implementing complex JSON structures

Architecture Notes

Data Storage

The worker uses Cloudflare KV for persistent storage of job indexes, search plans, and scan history.

CORS Handling

Comprehensive CORS support is included for cross-origin requests from web applications.

SSE Support

Server-Sent Events are supported for real-time updates during long-running operations like job scanning.
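
For example, a stream can be observed from the command line; the /sse path below is a placeholder, as the actual SSE endpoint path is not documented in this README:

curl -N -H "Accept: text/event-stream" \
  -H "Authorization: Bearer your-access-token" \
  http://localhost:8787/sse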

Limitations

The Worker implementation has some limitations compared to the Node.js version:

  1. No Raw File Storage: The Worker cannot store raw HTML or job extraction files due to lack of filesystem access.
  2. No Screenshots: Screenshot capture is not supported in the Worker environment.
  3. Limited Storage: Job data is stored in Cloudflare KV, which imposes size limits (25 MiB per value).
