Debugg AI — MCP Server
AI-powered browser testing via the Model Context Protocol. Point it at any URL (or localhost) and describe what to test — an AI agent browses your app and returns pass/fail with screenshots.
Setup
Requires Node.js 20.20.0 or later (transitive requirement from posthog-node@^5.26.0).
Get an API key at debugg.ai, then add to your MCP client config:
    {
      "mcpServers": {
        "debugg-ai": {
          "command": "npx",
          "args": ["-y", "@debugg-ai/debugg-ai-mcp"],
          "env": {
            "DEBUGGAI_API_KEY": "your_api_key_here"
          }
        }
      }
    }
Or with Docker:
docker run -i --rm --init -e DEBUGGAI_API_KEY=your_api_key quinnosha/debugg-ai-mcp
Tools
The server exposes 8 tools: three Browser tools plus one action-based tool per managed entity. The headline tools are check_app_in_browser (full AI agent) and probe_page (lightweight no-LLM page probe). The rest — project, environment, test_suite, test_case, executions — each take an action discriminator (e.g. {"action":"list"}) that selects the operation. Destructive delete actions require confirmation (an elicitation prompt where supported, otherwise confirm: true).
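As a rough illustration of the action-based shape, the arguments for one entity tool can be modelled as a discriminated union on action. The type and helper names below (EnvironmentArgs, withConfirm) are hypothetical and not part of the server; only the action, uuid, projectUuid, page, pageSize, and confirm fields come from this README.

```typescript
// Hypothetical client-side types; only the field names mirror the documented parameters.
type EnvironmentArgs =
  | { action: "list"; projectUuid?: string; page?: number; pageSize?: number }
  | { action: "get"; uuid: string }
  | { action: "delete"; uuid: string; confirm?: boolean };

// Adds confirm: true to delete actions, for clients that cannot show an
// elicitation prompt. Every other action passes through unchanged.
function withConfirm(args: EnvironmentArgs): EnvironmentArgs {
  return args.action === "delete" ? { ...args, confirm: true } : args;
}

const listArgs = withConfirm({ action: "list" });
const deleteArgs = withConfirm({
  action: "delete",
  uuid: "00000000-0000-0000-0000-000000000000", // placeholder UUID
});
```

Keeping the confirmation in one helper makes it easy to audit where destructive calls are pre-approved.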
Browser
check_app_in_browser
Runs an AI browser agent against your app. The agent navigates, interacts, and reports back with screenshots. Localhost URLs are auto-tunneled via ngrok.
| Parameter | Type | Description |
|---|---|---|
| description | string (required) | What to test (natural language) |
| url | string (required) | Target URL; http://localhost:3000 is auto-tunneled |
| environmentId | string | UUID of a specific environment |
| credentialId | string | UUID of a specific credential |
| credentialRole | string | Pick a credential by role (e.g. admin, guest) |
| username | string | Username for login (ephemeral, not persisted) |
| password | string | Password for login (ephemeral, not persisted) |
| repoName | string | Override auto-detected git repo name (e.g. my-org/my-repo) |
One focused check per call. The agent has a ~25-step internal budget; split broader suites across multiple calls.
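A minimal sketch of splitting a broader suite into focused calls, assuming your MCP client accepts one argument object per check_app_in_browser call. The CheckArgs interface and focusedChecks helper are hypothetical names for illustration; the field names come from the parameter table above.

```typescript
// Argument shape for check_app_in_browser, limited to the parameters used here.
interface CheckArgs {
  description: string;
  url: string;
  credentialRole?: string;
}

// Produces one argument object per check, so each call covers a single
// focused scenario instead of one long multi-step suite.
function focusedChecks(url: string, descriptions: string[], credentialRole?: string): CheckArgs[] {
  return descriptions.map((description) =>
    credentialRole ? { description, url, credentialRole } : { description, url }
  );
}

const calls = focusedChecks(
  "http://localhost:3000",
  ["Log in and confirm the dashboard loads", "Open /settings and change the display name"],
  "admin",
);
```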
Every successful run returns a browserSession block alongside the screenshot — presigned S3 URLs for the captured HAR (full network trace) and console log (every JS console message). Use them to detect refetch loops, hydration errors, and other runtime issues that pass type-checks and unit tests:
    "browserSession": {
      "harUrl": "https://...session_18139.har?X-Amz-...",
      "consoleLogUrl": "https://...session_18139_console.json?X-Amz-...",
      "recordingUrl": "https://...session_18139_recording.webm?X-Amz-...",
      "harStatus": "downloaded",
      "consoleLogStatus": "downloaded",
      "harRedactionStatus": "redacted",
      "consoleLogRedactionStatus": "redacted"
    }
The URLs are short-lived presigned S3 links; refetch the parent execution via executions {action:"get", uuid} to renew them. harStatus and consoleLogStatus distinguish three outcomes: 'downloaded' (URL is fetchable), 'not_available' (the page emitted nothing), and 'failed' (capture broke). On a fresh run the URLs are commonly null because capture uploads asynchronously after the agent finishes; poll executions {action:"get", uuid: executionId} until harStatus and consoleLogStatus reach 'downloaded'. Authorization, Cookie, and token/secret/api_key headers are scrubbed server-side before the artifacts are persisted.
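The polling described above could look roughly like this. It is a sketch under assumptions: GetSession is a hypothetical wrapper around executions {action:"get", uuid} that returns the browserSession block, and any status other than the three documented values is treated as "still pending".

```typescript
// Subset of the browserSession block used by this sketch.
interface BrowserSession {
  harUrl: string | null;
  harStatus: string | null;
}

// Hypothetical fetcher: wraps executions {action:"get", uuid} in whatever
// way your MCP client invokes tools.
type GetSession = (executionId: string) => Promise<BrowserSession>;

// Polls until harStatus reaches one of the documented values or the
// attempt budget runs out, then returns the last session seen.
async function waitForHar(
  getSession: GetSession,
  executionId: string,
  maxAttempts = 10,
  delayMs = 2000,
): Promise<BrowserSession> {
  const terminal = ["downloaded", "not_available", "failed"];
  let session = await getSession(executionId);
  for (let attempt = 1; attempt < maxAttempts && !terminal.includes(session.harStatus ?? ""); attempt++) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    session = await getSession(executionId);
  }
  return session;
}
```

Callers should still check harStatus on the returned session, since the loop also stops on 'not_available', 'failed', or an exhausted budget.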
trigger_crawl
Fires a server-side browser-agent crawl to populate the project's knowledge graph. Localhost URLs tunnel automatically. Returns {executionId, status, targetUrl, durationMs, outcome?, crawlSummary?, knowledgeGraph?, browserSession?} with knowledgeGraph.imported === true on successful ingestion. The browserSession block (HAR + console-log URLs, same shape as above) is also present on completed crawls.
probe_page
Lightweight no-LLM batch page probe. Pass 1-20 URLs; each navigates, waits for load, and returns rendered state — screenshot + page metadata + structured console errors + network summary. No agent loop, no LLM cost, no scenario assertions. Use it for "did I just break /settings?", multi-route smoke after a refactor, CI per-PR sweeps, and quick is-it-up checks where check_app_in_browser's 60-150s agent loop is overkill.
| Parameter | Type | Description |
|---|---|---|
| targets | array (required) | 1-20 entries: [{url, waitForSelector?, waitForLoadState?, timeoutMs?}] |
| targets[].url | string (required) | Public URL or localhost (auto-tunneled) |
| targets[].waitForLoadState | enum | 'load' (default) / 'domcontentloaded' / 'networkidle' |
| targets[].waitForSelector | string | Optional CSS selector to wait for after navigation |
| targets[].timeoutMs | number | Per-URL timeout, 1000-30000 (default 10000) |
| includeHtml | boolean | Return raw HTML in each result (default false) |
| captureScreenshots | boolean | Return one PNG per target (default true) |
The whole batch shares a single backend execution + browser session + tunnel — 5 URLs in one call is dramatically faster than 5 parallel single-URL calls. Per-URL error field preserves batch resilience: a single failed target doesn't fail the others.
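To make batching concrete, here is a small sketch that builds one probe_page argument object for several routes and checks the documented limits on the client side. The probeArgs helper and its interfaces are hypothetical; the field names and ranges come from the parameter table above.

```typescript
interface ProbeTarget {
  url: string;
  waitForLoadState?: "load" | "domcontentloaded" | "networkidle";
  waitForSelector?: string;
  timeoutMs?: number;
}

interface ProbeArgs {
  targets: ProbeTarget[];
  includeHtml?: boolean;
  captureScreenshots?: boolean;
}

// Builds a single batched call instead of one call per route.
function probeArgs(baseUrl: string, paths: string[], timeoutMs = 10000): ProbeArgs {
  if (paths.length < 1 || paths.length > 20) {
    throw new RangeError("probe_page accepts 1-20 targets per call");
  }
  if (timeoutMs < 1000 || timeoutMs > 30000) {
    throw new RangeError("timeoutMs must be between 1000 and 30000");
  }
  return {
    targets: paths.map((path) => ({ url: new URL(path, baseUrl).toString(), timeoutMs })),
  };
}

const smoke = probeArgs("http://localhost:3000", ["/", "/settings", "/billing"]);
```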
networkSummary aggregates requests by origin + pathname, so a refetch loop (for example ?n=0..4 repeatedly hitting the same endpoint) collapses into a single entry with a count. An entry such as /api/poll with count: 47 is the actionable "infinite refetch loop" signal.
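The aggregation idea can be sketched in a few lines. This is an illustration of grouping by origin + pathname, not the server's implementation; summarizeRequests is a hypothetical name.

```typescript
// Groups request URLs by origin + pathname, ignoring the query string,
// and counts how many requests hit each key.
function summarizeRequests(urls: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const raw of urls) {
    const { origin, pathname } = new URL(raw);
    const key = origin + pathname;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const summary = summarizeRequests([
  "http://localhost:3000/api/poll?n=0",
  "http://localhost:3000/api/poll?n=1",
  "http://localhost:3000/api/poll?n=2",
  "http://localhost:3000/api/user",
]);
```

Because the query string is dropped from the key, the three /api/poll requests land in one entry while /api/user stays separate.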
Performance budget: <10s for 1 URL, <25s for 20. Localhost dead-port returns LocalServerUnreachable in <2s without burning a workflow execution.
project
| Action | Params | Result |
|---|---|---|
| get | {uuid} | Curated project detail |
| list | {q?, page?, pageSize?} | Paginated summaries |
| create | {name, platform, (teamUuid\|teamName), (repoUuid\|repoName)} | Created project |
Team and repo resolve by either uuid or name (case-insensitive exact match; NotFound if none, AmbiguousMatch if multiple). There is no update/delete — rename or delete a project from the DebuggAI web app.
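The name-resolution rule can be modelled as follows. This is a client-side sketch of the documented behaviour (case-insensitive exact match, NotFound on zero hits, AmbiguousMatch on several); the Named and Resolution types and the resolveByName function are hypothetical.

```typescript
interface Named {
  uuid: string;
  name: string;
}

type Resolution =
  | { ok: true; uuid: string }
  | { ok: false; error: "NotFound" | "AmbiguousMatch"; candidates: string[] };

// One exact (case-insensitive) hit resolves; zero hits is NotFound;
// more than one hit is AmbiguousMatch.
function resolveByName(items: Named[], name: string): Resolution {
  const wanted = name.toLowerCase();
  const hits = items.filter((item) => item.name.toLowerCase() === wanted);
  if (hits.length === 1) return { ok: true, uuid: hits[0].uuid };
  return {
    ok: false,
    error: hits.length === 0 ? "NotFound" : "AmbiguousMatch",
    candidates: hits.map((hit) => hit.uuid),
  };
}
```

Passing teamUuid or repoUuid instead of a name sidesteps the ambiguity case entirely.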
environment
| Action | Params | Result |
|---|---|---|
| get | {uuid, projectUuid?} | Env with credentials inlined (passwords never returned) |
| list | {projectUuid?, q?, page?, pageSize?} | Paginated envs, each with a credentials array |
| create | {name, url, description?, projectUuid?, credentials?} | Created env (optionally seeds credentials) |
| update | {uuid, name?, url?, description?, addCredentials?, updateCredentials?, removeCredentialIds?} | Patched env; credential ops run remove → update → add |
| delete | {uuid, projectUuid?, confirm?} | Deletes env (cascades credentials); requires confirmation |
projectUuid auto-resolves from the git repo when omitted. Per-cred failures surface in credentialWarnings[] without blocking the env op.
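To show what the remove → update → add ordering implies for a combined update call, here is a local model of it. The Credential shape and applyCredentialOps are hypothetical and simplified; the server's real credential fields and its handling of per-credential failures (reported in credentialWarnings[]) may differ.

```typescript
// Hypothetical credential shape; the real fields may differ.
interface Credential {
  uuid: string;
  username: string;
  role?: string;
}

interface CredentialOps {
  removeCredentialIds?: string[];
  updateCredentials?: Array<{ uuid: string } & Partial<Credential>>;
  addCredentials?: Credential[];
}

// Applies the ops in the documented order: remove, then update, then add.
function applyCredentialOps(current: Credential[], ops: CredentialOps): Credential[] {
  const removed = new Set(ops.removeCredentialIds ?? []);
  const kept = current.filter((cred) => !removed.has(cred.uuid));
  const updated = kept.map((cred) => {
    const patch = (ops.updateCredentials ?? []).find((p) => p.uuid === cred.uuid);
    return patch ? { ...cred, ...patch } : cred;
  });
  return [...updated, ...(ops.addCredentials ?? [])];
}
```

In this model a patch aimed at a credential removed in the same call has nothing left to apply to, which is one reason to keep removals and updates for the same credential in separate calls.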
test_suite
| Action | Params | Result |
|---|---|---|
| list | {projectUuid\|projectName, search?, page?, pageSize?} | Paginated suites with status + pass rate |
| create | {name, description, projectUuid\|projectName} | Created suite |
| run | {suiteUuid\|(suiteName+project), targetUrl?} | Triggers all tests async |
| results | {suiteUuid\|(suiteName+project)} | Suite + per-test outcomes |
| delete | {suiteUuid\|(suiteName+project), confirm?} | Soft-delete; requires confirmation |
test_case
| Action | Params | Result |
|---|---|---|
| create | {name, description, agentTaskDescription, suiteUuid\|(suiteName+project), relativeUrl?, maxSteps?} | Created test case (not auto-run) |
| update | {testUuid, name?, description?, agentTaskDescription?} | Patched test case |
| delete | {testUuid, confirm?} | Soft-delete; requires confirmation |
executions
| Action | Params | Result |
|---|---|---|
| get | {uuid} | Full detail (nodeExecutions + state + errorInfo) + screenshot/gif artifacts |
| list | {status?, projectUuid?, page?, pageSize?} | Paginated summaries |
404 from the backend surfaces as isError: true with {error: 'NotFound', message, uuid}. Credentials are always returned without passwords.
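Because a missing entity comes back as a result rather than an exception, client code checks the result instead of wrapping the call in try/catch. The sketch below assumes the error object is serialized as JSON inside a text content part of a standard MCP tool result; that serialization detail and the notFoundUuid helper are assumptions, not something this README specifies.

```typescript
// Minimal view of an MCP tool result.
interface ToolResult {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
}

// Returns the uuid from a NotFound error payload, or null when the result
// is not a NotFound error. Assumes the payload is JSON in a text part.
function notFoundUuid(result: ToolResult): string | null {
  if (!result.isError) return null;
  for (const part of result.content) {
    if (part.type !== "text" || !part.text) continue;
    try {
      const payload = JSON.parse(part.text) as { error?: string; uuid?: string };
      if (payload.error === "NotFound") return payload.uuid ?? null;
    } catch {
      // Not JSON (or not an object); skip this part.
    }
  }
  return null;
}
```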
Pagination
Every filter-mode response is paginated. Response shape:
    {
      "filter": { "...echoed query params..." },
      "pageInfo": { "page": 1, "pageSize": 20, "totalCount": 47, "totalPages": 3, "hasMore": true },
      "<items>": [ ... ]
    }
Pass optional page (1-indexed, default 1) and pageSize (default 20, max 200; oversized values are clamped). No response is ever silently truncated.
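A sketch of walking every page with pageInfo.hasMore. FetchPage is a hypothetical wrapper around any list action (for example project {action:"list"}), and the items field stands in for the entity-specific array key shown as "<items>" above.

```typescript
interface PageInfo {
  page: number;
  pageSize: number;
  totalCount: number;
  totalPages: number;
  hasMore: boolean;
}

interface Page<T> {
  pageInfo: PageInfo;
  items: T[]; // placeholder for the entity-specific key
}

// Hypothetical fetcher wrapping a list action.
type FetchPage<T> = (page: number, pageSize: number) => Promise<Page<T>>;

// Requests pages starting at 1 until hasMore is false and returns every item.
async function fetchAll<T>(fetchPage: FetchPage<T>, pageSize = 20): Promise<T[]> {
  const size = Math.min(Math.max(1, pageSize), 200); // mirrors the documented max of 200
  const all: T[] = [];
  let page = 1;
  let hasMore = true;
  while (hasMore) {
    const result = await fetchPage(page, size);
    all.push(...result.items);
    hasMore = result.pageInfo.hasMore;
    page += 1;
  }
  return all;
}
```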
Resources
Alongside tools, the server exposes the read-only entities as MCP resources so clients can browse and @-mention them as context:
| URI | What |
|---|---|
| debugg-ai://projects | All projects (first page) |
| debugg-ai://environments | Environments for the auto-detected project |
| debugg-ai://executions | Recent executions (first page) |
| debugg-ai://project/{uuid} | One project, full detail |
| debugg-ai://environment/{uuid} | One environment (credentials inline, passwords redacted) |
| debugg-ai://execution/{uuid} | One execution, full node detail + artifact links |
Reads dispatch to the same handlers as the project / environment /
executions tools, so the data and auth are identical. Resources are additive —
clients without resource support keep using the tools.
Security invariants
- Passwords are write-only. They never appear in any response body from any tool.
- Tunnel URLs (*.ngrok.debugg.ai) are stripped from all browser-agent responses, including agent-authored text.
- 404s from the backend surface as isError: true with {error: 'NotFound', ...}, never as thrown exceptions.
- Missing DEBUGGAI_API_KEY surfaces as a structured tool error on first invocation; the server still registers and lists tools normally.
Migration to v3.0.0 (action-based tools)
v3 consolidated the 20 per-verb tools into 8 action-based tools. Old tool → new tool {action}:
| Removed | Replacement |
|---|---|
| search_projects | project {action:"get"} / project {action:"list"} |
| create_project | project {action:"create"} |
| update_project, delete_project | Dropped; use the DebuggAI web app |
| search_environments | environment {action:"get"} / {action:"list"} |
| create_environment / update_environment / delete_environment | environment {action:"create"\|"update"\|"delete"} |
| create_test_suite / search_test_suites / run_test_suite / get_test_suite_results / delete_test_suite | test_suite {action:"create"\|"list"\|"run"\|"results"\|"delete"} |
| create_test_case / update_test_case / delete_test_case | test_case {action:"create"\|"update"\|"delete"} |
| search_executions | executions {action:"get"\|"list"} |
| trigger_crawl headless param | Dropped; always headless |
delete actions now require confirmation (elicitation prompt, or confirm: true). Clients pick up the new surface on MCP restart.
Migration from v1.x (breaking change in v2.0.0)
v2 collapsed a 22-tool surface to 11. Old-tool → new-tool mapping:
| Removed | Replacement |
|---|---|
| list_projects, get_project | search_projects (uuid mode vs filter mode) |
| list_environments, get_environment | search_environments |
| list_credentials, get_credential | search_environments (credentials inline on each env) |
| create_credential | create_environment({credentials: [...]}) seed, or update_environment({addCredentials: [...]}) |
| update_credential | update_environment({updateCredentials: [{uuid, ...patch}]}) |
| delete_credential | update_environment({removeCredentialIds: [uuid]}) |
| list_teams, list_repos | create_project({teamName, repoName}), with name resolution and ambiguity handling |
| list_executions, get_execution | search_executions |
| cancel_execution | Dropped; backend spin-down is automatic |
Response-shape changes: the bare count field on list responses is gone — use pageInfo.totalCount.
Configuration
| Env var | Required | Purpose |
|---|---|---|
| DEBUGGAI_API_KEY | yes | Backend API key. Aliases: DEBUGGAI_API_TOKEN, DEBUGGAI_JWT_TOKEN. |
| DEBUGGAI_API_URL | no | Backend base URL. Defaults to https://api.debugg.ai. |
| DEBUGGAI_TOKEN_TYPE | no | token (default) or bearer. |
| LOG_LEVEL | no | error / warn / info (default) / debug. |
| POSTHOG_API_KEY | no | Override the embedded telemetry project key (e.g. private fork). |
| DEBUGGAI_TELEMETRY_DISABLED | no | Set to 1 / true / yes / on to disable telemetry entirely. |
DEBUGGAI_API_KEY=your_api_key
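For wrappers or test harnesses that need to reason about these variables, the table can be read as the following resolution logic. This is a sketch, not the server's code: resolveConfig is a hypothetical name, and both the alias precedence (DEBUGGAI_API_KEY first) and the case-insensitive flag parsing are assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface ResolvedConfig {
  apiKey: string | undefined;
  apiUrl: string;
  tokenType: string;
  telemetryDisabled: boolean;
}

function resolveConfig(env: Env): ResolvedConfig {
  // Assumption: surrounding whitespace and letter case are ignored.
  const flag = (env.DEBUGGAI_TELEMETRY_DISABLED ?? "").trim().toLowerCase();
  return {
    // Assumption: the primary name wins over its aliases.
    apiKey: env.DEBUGGAI_API_KEY ?? env.DEBUGGAI_API_TOKEN ?? env.DEBUGGAI_JWT_TOKEN,
    apiUrl: env.DEBUGGAI_API_URL ?? "https://api.debugg.ai",
    tokenType: env.DEBUGGAI_TOKEN_TYPE ?? "token",
    telemetryDisabled: ["1", "true", "yes", "on"].includes(flag),
  };
}
```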
Remote / HTTP transport (optional)
By default the server speaks stdio (local npx). It can instead run as a
hosted, multi-user remote MCP over stateless Streamable HTTP + OAuth:
DEBUGGAI_MCP_TRANSPORT=http PORT=3000 DEBUGGAI_TOKEN_TYPE=bearer npx -y @debugg-ai/debugg-ai-mcp@latest
It is an OAuth Resource Server: every POST /mcp needs
Authorization: Bearer <token>; missing/invalid tokens get a 401 with a
WWW-Authenticate pointing at the RFC 9728 metadata, and clients run the OAuth
flow against the advertised authorization server. The bearer is request-scoped —
api.debugg.ai validates it.
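A client that handles the 401 itself needs to pull the metadata URL out of the WWW-Authenticate header. RFC 9728 carries it in a resource_metadata parameter; the exact header value this server sends is not shown in this README, so the sample header below is an assumption assembled from the default public URL and the metadata path.

```typescript
// Extracts the resource_metadata URL (RFC 9728) from a WWW-Authenticate
// header value, or returns null when the parameter is absent.
function resourceMetadataUrl(wwwAuthenticate: string): string | null {
  const match = /resource_metadata="([^"]+)"/.exec(wwwAuthenticate);
  return match ? match[1] : null;
}

// Assumed example header, not captured from a real response.
const sampleHeader =
  'Bearer resource_metadata="https://mcp.debugg.ai/.well-known/oauth-protected-resource"';
const metadataUrl = resourceMetadataUrl(sampleHeader);
```

Most MCP client libraries with OAuth support perform this discovery step on their own, so a helper like this mainly matters for custom HTTP clients.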
| Endpoint | Purpose |
|---|---|
| POST /mcp | MCP Streamable HTTP (bearer-protected) |
| GET /.well-known/oauth-protected-resource | RFC 9728 metadata (authorization server discovery) |
| GET /health | Load-balancer / ECS health check |
| Env var | Default | Purpose |
|---|---|---|
| DEBUGGAI_MCP_TRANSPORT | stdio | Set to http for the remote transport |
| PORT | 3000 | HTTP listen port |
| DEBUGGAI_MCP_PUBLIC_URL | https://mcp.debugg.ai | This server's public resource URL (RFC 9728 resource) |
| DEBUGGAI_OAUTH_ISSUER | https://auth.debugg.ai | Authorization server advertised to clients |
| DEBUGGAI_TOKEN_TYPE | token | Set to bearer so OAuth tokens forward as Authorization: Bearer |
stdio installs need none of these.
Telemetry
The MCP server ships with telemetry enabled by default — an embedded write-only PostHog project key (phc_*) so the team can observe cache hit rates, poll cadence, tunnel reliability, and other operational metrics across the install base. Captured events:
| Event | When |
|---|---|
| tool.executed / tool.failed | Per tool call |
| workflow.executed | Per browser-agent execution (carries pollCount, durationMs, finalIntervalMs) |
| tunnel.provisioned / tunnel.provision_retry / tunnel.stopped | Per tunnel lifecycle event |
| template.lookup / project.lookup | Cache hit/miss with durationMs on cold-call |
Privacy posture:
- The distinct ID is SHA-256(api_key).slice(0, 16): never the raw key, no PII.
- phc_* keys are write-only by PostHog convention; safe to embed in source.
- Set DEBUGGAI_TELEMETRY_DISABLED=1 to opt out entirely (resolves to a no-op provider; no events leave the process).
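The distinct ID derivation can be reproduced with Node's built-in crypto module. This sketch assumes the slice is taken over the hex-encoded digest; the README does not state the encoding, and distinctId is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// First 16 characters of the hex-encoded SHA-256 of the API key.
// Assumption: hex encoding; the server may encode the digest differently.
function distinctId(apiKey: string): string {
  return createHash("sha256").update(apiKey).digest("hex").slice(0, 16);
}
```

Sixteen hex characters keep 64 bits of the digest, enough to tell installs apart without exposing the key.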
The active mode is logged at boot:
Telemetry enabled (PostHog, DebuggAI default project). Set DEBUGGAI_TELEMETRY_DISABLED=1 to opt out.
Telemetry enabled (PostHog, custom POSTHOG_API_KEY)
Telemetry disabled (DEBUGGAI_TELEMETRY_DISABLED is set)
Local Development
npm install
npm run build
npm run test:e2e # real end-to-end evals against the backend
The eval suite spawns the built MCP server as a subprocess, exercises every tool against a real backend, and writes per-flow artifacts to scripts/evals/artifacts/<timestamp>/. See scripts/evals/flows/ for the individual scenarios.
MCP registration: debugg-ai-local vs debugg-ai
This repo ships a .mcp.json that registers a project-scoped server named debugg-ai-local pointing at node dist/index.js — the freshly-built local code. It only activates when Claude Code's working directory is this repo.
Your other projects should use the user-scoped debugg-ai registration that pulls from the published npm package:
npm run mcp:global # registers debugg-ai in ~/.claude.json to npx -y @debugg-ai/debugg-ai-mcp
After editing code here, run npm run mcp:local (which just rebuilds) so the next invocation of debugg-ai-local picks up your changes.
Links
Dashboard · Docs · Issues · Discord
Apache-2.0 License © 2025 DebuggAI