
MCP Job Search


This project implements a LinkedIn job scraper with persistent job indexing, deep scanning, and filtering capabilities using Cloudflare Workers. It scrapes LinkedIn job listings, performs detailed analysis of each job against a candidate profile using OpenAI, stores matches in Cloudflare KV, and exposes MCP-compatible HTTP endpoints.

Architecture

This implementation uses:

  • Cloudflare Workers for serverless execution
  • Cloudflare's Playwright fork for web scraping
  • Cloudflare KV for persistent data storage
  • OpenAI API for job analysis and matching
  • MCP (Model Context Protocol) for tool integration

Current Status

The worker is fully functional and can be run locally. It includes a complete MCP server with authentication, CORS handling, a health check endpoint, and SSE endpoints for real-time updates.

Some tools are currently implemented as stubs that return mock data, which allows the end-to-end flow to be tested. The plan management tools (get_plan, create_plan, update_plan) and the email digest functionality (send_digest) are fully implemented.

Setup

Prerequisites

  • Node.js (v18 or later)
  • npm or yarn
  • Cloudflare account (for deployment)

Environment Variables

Create a .env file in the project root with the following variables:

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-4o

# LinkedIn Credentials (for scraping)
LINKEDIN_EMAIL=your-linkedin-email@example.com
LINKEDIN_PASSWORD=your-linkedin-password

# Email Configuration (for digests)
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=your-smtp-username
SMTP_PASS=your-smtp-password
DIGEST_FROM=jobs@example.com
DIGEST_TO=you@example.com

# Application Settings
TIMEZONE=Australia/Sydney
ACCESS_TOKEN=your-secure-random-token
DEEP_SCAN_CONCURRENCY=2

The ACCESS_TOKEN is used for API authentication and should be a secure random string.
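
One simple way to generate a suitable token, assuming openssl is available on your system:

openssl rand -hex 32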

Installation

  1. Install dependencies:

    npm install
    
  2. Set up your environment variables in .env

  3. Create a job search plan (see Plan Management section below)

Running the Worker

Start the development server:

npm run dev

The worker will be available at http://localhost:8787.
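
You can verify it is running via the unauthenticated health check endpoint:

curl http://localhost:8787/health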

Core Features

Plan-Driven Search

Define your profile, search terms, and scan criteria in a job search plan. The system uses this plan to:

  • Target relevant job searches on LinkedIn
  • Analyze job matches against your profile using OpenAI
  • Score jobs based on fit and requirements

Persistent Job Index

All scraped jobs are stored persistently in Cloudflare KV with:

  • Job details and metadata
  • Match scores and analysis
  • Scan history and timestamps
  • Deduplication to avoid processing the same job twice
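
For illustration only, a stored job record might look roughly like this; the actual field names used by the worker are not documented here, so treat these as assumptions:

{
  "id": "4012345678",
  "title": "Senior Software Engineer",
  "company": "Example Corp",
  "url": "https://www.linkedin.com/jobs/view/4012345678",
  "matchScore": 0.82,
  "deepScanned": true,
  "lastScannedAt": "2025-08-20T10:00:00Z"
}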

Deep Scanning

Visits each job posting to extract comprehensive details:

  • Full job description and requirements
  • Company information and culture
  • Salary and benefits information
  • AI-powered analysis against your profile

Email Digests

Automated email summaries of your best job matches:

  • Configurable match score thresholds
  • Rich HTML formatting with job details
  • Direct links to job postings
  • Scheduled delivery options

API Reference

MCP Tools

The following tools are available via the MCP server:

Plan Management

  • get_plan: Get the current job search plan
  • create_plan: Create a new job search plan from a description
  • update_plan: Update the existing job search plan

Job Scanning & Analysis

  • scan: Scan LinkedIn job pages using Playwright; if a URL is provided, scans that page, otherwise uses the URLs from the plan
  • rescan: Rescan existing jobs using URLs from the last scan or current plan
  • deep_scan_job: Manually deep scan a specific LinkedIn job URL for testing and debugging
  • failed_jobs: Get a report of jobs that failed during deep scanning with error analysis

Job Index Management

  • get_job_index: Get the current raw job index data for inspection (with filtering options)
  • reset_job_index: Reset the job index to start fresh - removes all stored jobs

System Operations

  • status: Check the status of background jobs (scan progress, errors, etc.)
  • send_digest: Send digest email with job matches to specified email address
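
The available tools can also be listed programmatically. A minimal sketch, assuming the server implements the standard MCP tools/list method over JSON-RPC (token and URL are placeholders):

curl -X POST http://localhost:8787/mcp \
  -H "Authorization: Bearer your-access-token" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'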

HTTP Endpoints

The worker exposes HTTP endpoints for direct API access:

Core Endpoints

  • GET /health - Health check endpoint (no authentication required)
  • POST /mcp - MCP server endpoint (handles all tool calls with authentication)

Note: All MCP tools are accessed via the /mcp endpoint using the MCP protocol. The worker uses token-based authentication for the MCP endpoint.
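
For example, fetching the current plan via the get_plan tool might look like the following, assuming the standard MCP tools/call request shape:

curl -X POST http://localhost:8787/mcp \
  -H "Authorization: Bearer your-access-token" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"get_plan","arguments":{}}}'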

Plan Management

The job search plan is the core configuration that drives the entire system. It defines:

Plan Structure

{
  "profile": {
    "name": "Your Name",
    "experience": "Senior Software Engineer with 8+ years...",
    "skills": ["JavaScript", "React", "Node.js", "AWS"],
    "preferences": {
      "remote": true,
      "location": "Sydney, Australia",
      "salary_min": 120000
    }
  },
  "searches": [
    {
      "keywords": "Senior Software Engineer React",
      "location": "Sydney",
      "filters": {
        "experience_level": "mid_senior",
        "job_type": "full_time"
      }
    }
  ],
  "scan_prompt": "Analyze this job posting for a senior software engineer..."
}

Creating a Plan

You can create a plan in several ways:

  1. Via MCP Tool: Use the create_plan tool with a natural language description
  2. Via HTTP API: POST to /plan with either structured JSON or a natural-language description
  3. Direct File: Create a plan.json file in the project root

Plan Examples

Natural Language Description:

I'm a senior full-stack developer with 8 years experience in React, Node.js, and AWS. 
I'm looking for remote senior engineer roles in fintech or healthcare, 
preferably $120k+ with equity options.

Structured JSON:

{
  "profile": {
    "name": "Senior Developer",
    "experience": "8+ years full-stack development",
    "skills": ["React", "Node.js", "AWS", "TypeScript"]
  },
  "searches": [
    {
      "keywords": "Senior Full Stack Engineer",
      "location": "Remote"
    }
  ]
}
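
To submit either form over HTTP, a request along these lines should work; the exact body accepted by /plan is not specified above, so the description field name here is an assumption:

curl -X POST http://localhost:8787/plan \
  -H "Authorization: Bearer your-access-token" \
  -H "Content-Type: application/json" \
  -d '{"description": "Senior full-stack developer, 8 years of React, Node.js and AWS, seeking remote fintech or healthcare roles, $120k+ with equity"}'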

Deployment

Local Development

For local development, the worker runs using Wrangler:

npm run dev

This starts a local development server at http://localhost:8787.

Production Deployment

To deploy to Cloudflare Workers:

  1. Configure Wrangler: Ensure you have a wrangler.toml file configured
  2. Set Environment Variables: Configure secrets in Cloudflare Workers dashboard
  3. Deploy: Run the deployment command:

    npm run deploy

Environment Variables in Production

Set these as secrets in your Cloudflare Workers environment:

wrangler secret put OPENAI_API_KEY
wrangler secret put LINKEDIN_EMAIL
wrangler secret put LINKEDIN_PASSWORD
wrangler secret put SMTP_HOST
wrangler secret put SMTP_USER
wrangler secret put SMTP_PASS
wrangler secret put ACCESS_TOKEN
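
Non-secret settings (model name, SMTP port, timezone, digest addresses) do not need to be stored as secrets; the usual Wrangler convention is to define them under [vars] in wrangler.toml, for example:

[vars]
OPENAI_MODEL = "gpt-4o"
SMTP_PORT = "587"
DIGEST_FROM = "jobs@example.com"
DIGEST_TO = "you@example.com"
TIMEZONE = "Australia/Sydney"
DEEP_SCAN_CONCURRENCY = "2"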

Implementation Status

✅ Fully Implemented Features

Core Infrastructure

  • MCP server with complete tool integration
  • Cloudflare Workers runtime environment
  • Token-based authentication and CORS handling
  • Background job processing with status tracking

Plan Management

  • Plan Creation & Updates: AI-powered plan generation from natural language descriptions
  • Plan Storage: Persistent storage in Cloudflare KV
  • Search URL Generation: Automatic LinkedIn search URL creation
  • Plan Feedback: AI analysis and recommendations for plan improvement

Job Scanning & Analysis

  • LinkedIn Scraping: Full Playwright-based job page scraping
  • Deep Scanning: Individual job analysis with OpenAI integration
  • Background Processing: Non-blocking scan operations with status tracking
  • Error Handling: Comprehensive error reporting and failed job analysis
  • Fallback Matching: Keyword-based matching when AI is unavailable

Job Index Management

  • Persistent Storage: Cloudflare KV-based job index with deduplication
  • Job Tracking: Scan status, match scores, and metadata storage
  • Index Inspection: Detailed job index viewing with filtering options
  • Index Reset: Complete job index cleanup functionality

Email Digest System

  • SMTP Integration: Nodemailer-based email sending
  • HTML Email Generation: Rich formatting with job details and links
  • Auto-digest: Automatic email sending after scan completion
  • Job Tracking: Mark jobs as sent to avoid duplicates

Debugging & Monitoring

  • Manual Deep Scan: Test individual job URLs for debugging
  • Failed Jobs Report: Detailed analysis of scan failures with error categorization
  • Status Monitoring: Real-time background job status tracking

Authentication

API endpoints are protected with token-based authentication. Include your ACCESS_TOKEN in requests:

curl -H "Authorization: Bearer your-access-token" http://localhost:8787/status

Troubleshooting

Common Issues

  1. LinkedIn Authentication: Ensure your LinkedIn credentials are correct and the account isn't locked
  2. OpenAI API: Verify your API key has sufficient credits and proper permissions
  3. Email Delivery: Check SMTP settings and ensure the sender email is authorized
  4. Environment Variables: Verify all required variables are set in your .env file

Known Warnings

When testing email functionality, you may see network-related warnings in the Cloudflare Workers environment:

  • "Failed to resolve IPv4 addresses with current network"
  • "Possible EventEmitter memory leak detected"

These warnings are environmental and do not prevent the email functionality from working correctly.

Development Tips

  • Use the /health endpoint to verify the worker is running
  • Check the browser console for detailed error messages
  • Use the mock data endpoints for testing without external dependencies
  • Test plan creation with natural language descriptions before implementing complex JSON structures

Architecture Notes

Data Storage

The worker uses Cloudflare KV for persistent storage of job indexes, search plans, and scan history.

CORS Handling

Comprehensive CORS support is included for cross-origin requests from web applications.

SSE Support

Server-Sent Events are supported for real-time updates during long-running operations like job scanning.
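
For example, a stream can be observed from the command line; the /sse path below is a placeholder, as the actual SSE endpoint path is not documented in this README:

curl -N -H "Accept: text/event-stream" \
  -H "Authorization: Bearer your-access-token" \
  http://localhost:8787/sse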

Limitations

The Worker implementation has some limitations compared to the Node.js version:

  1. No Raw File Storage: The Worker cannot store raw HTML or job extraction files due to lack of filesystem access.
  2. No Screenshots: Screenshot capture is not supported in the Worker environment.
  3. Limited Storage: Job data is stored in Cloudflare KV, which imposes size limits (25 MiB per value).
