AI Self-Healing Server Automation: The Manual Maintenance Cycle This Agent Replaces

Running multiple self-hosted services like Plex, Nextcloud, Pi-hole, Wireguard VPN, and reverse proxies often means dealing with sudden failures: full disks, stalled cron jobs, or expired certificates that silently break functionality. The painful cycle involves SSHing in late at night, hunting logs, restarting services, clearing caches, or renewing certs. This repetitive low-value work adds up and causes downtime.

This tutorial guides you to create an always-on agent that monitors these services every 5 minutes, detects failures early, executes routine remediations autonomously, and only alerts you when human judgment is needed. The agent uses SSH access scoped specifically for monitoring and remediation tasks, runs scheduled cron jobs, and delegates root cause reasoning and action planning to OpenClaw powered by WisGate's API.

By the end, you'll validate permission boundaries in AI Studio before enabling live SSH access—ensuring safe, effective self-healing automation.

Outcome-Focused Next Step: Get your WisGate API key, and configure the OpenClaw JSON file to include the WisGate provider and Claude Opus model. Use AI Studio (https://wisgate.ai/studio/image) to confirm your permission boundary before deployment. Generate your dedicated key at https://wisgate.ai/hall/tokens.

What the Self-Healing Home Server Agent Does

This agent operates in a four-role monitoring loop within a single OpenClaw conversation per cron cycle:

Role	Function
Monitor	Collect service status, disk usage, memory stats, logs via SSH
Diagnostician	Analyze data, identify root cause, classify by permission zone
Remediator	Perform Autonomous-zone actions, log outcomes
Escalator	Alert when Confirm/Prohibited actions are required

Cron schedule:

Every 5 minutes: service status checks
Every 15 minutes: resource usage monitoring
Every 6 hours: certificate expiry validation

Remediations run autonomously with logs saved locally; alerts for escalation go to Slack, email, or SMS.

Components:

OpenClaw configured with WisGate
Scoped SSH user access
Structured 4-section system prompt

OpenClaw API Infrastructure Automation: WisGate and SSH Configuration

Configure OpenClaw by editing the JSON config file:

Step 1 — Locate and Open the Configuration File

nano ~/.openclaw/openclaw.json

Step 2 — Add the WisGate Provider to Your Models Section

json

"models": {
  "mode": "merge",
  "providers": {
    "moonshot": {
      "baseUrl": "https://api.wisgate.ai/v1",
      "apiKey": "YOUR-WISGATE-API-KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "claude-opus-4-6",
          "name": "Claude Opus 4.6",
          "reasoning": false,
          "input": ["text"],
          "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
          "contextWindow": 256000,
          "maxTokens": 8192
        }
      ]
    }
  }
}

Step 3 — Save, Exit, and Restart OpenClaw

Ctrl + O to save, Enter
Ctrl + X to exit
Ctrl + C to stop running OpenClaw
Run openclaw tui to restart

Additional infrastructure notes:

Create a dedicated WisGate API key labeled openclaw-homeserver-agent at https://wisgate.ai/hall/tokens
Scope SSH access to a dedicated system user (e.g., openclaw-monitor) with minimal permissions:
- Read-only to required logs
- Execute permission only for specific restart commands via sudoers (NOPASSWD)
- No write access to configs or user data

Check current pricing for claude-opus-4-6 on https://wisgate.ai/models before estimating costs.

Cron Schedule: Three Monitoring Intervals

Configure three cron jobs for the monitoring agent:

# Service status — every 5 minutes
*/5 * * * * openclaw run --agent homeserver-monitor --scope services

# Resource metrics — every 15 minutes
*/15 * * * * openclaw run --agent homeserver-monitor --scope resources

# Certificate expiry — every 6 hours
0 */6 * * * openclaw run --agent homeserver-monitor --scope certificates

Scope	Data Collected via SSH
services	systemctl status per service, recent error logs
resources	disk usage, free memory, active network connections
certificates	days until expiry from certbot/openssl commands

Each cron run passes collected data as the user message input to OpenClaw.

LLM DevOps Agent Configuration: The Four-Section System Prompt

Core to this agent is a clearly structured system prompt divided into four labeled sections. Copy and tailor each block for your environment.

Section A — Identity and Scope: Define which servers and services the agent can monitor and control.

You are the infrastructure monitoring agent for [HOME NETWORK NAME].
You are responsible for the following servers and services:
- [server-1]: nginx, nextcloud, certbot
- [server-2]: plex, sonarr, radarr
- [NAS]: samba, rsync, disk health
You have SSH access via the openclaw-monitor user account.
You have no authority over any system not listed above.

Section B — Monitoring Checklist: Specify what metrics and error patterns to flag each cycle.

Each monitoring cycle, evaluate the data provided and flag:
- Any service with status != active
- Any disk volume above 85% used
- Any volume with available space < 2GB
- Memory available < 512MB
- Any certificate expiring within 14 days
- Log patterns: [OOM killer invoked | connection refused | segfault | failed to start]

Section C — Permission Boundary Block: Enumerate specific actions by zone for safe autonomous behavior.

Zone	Actions
AUTONOMOUS	Restart: nginx, nextcloud, plex, sonarr, radarr, samba

       | Clear: /tmp, app cache dirs [explicit paths]
       | Renew: Let's Encrypt certs via certbot renew                  |

List explicit commands to avoid vague policy enforcement.

Section D — Escalation Format: Define how alerts are structured and where to send them.

Every alert must include:
- Timestamp (ISO 8601)
- Affected server and service
- Root cause diagnosis
- Actions attempted and outcomes
- Proposed next action (if Confirm zone)
- Confidence score: 1 (uncertain) to 5 (high)
- If confidence < 3: request context before proceeding

Route all alerts to: [SLACK_WEBHOOK_URL or EMAIL_ENDPOINT]

The WisGate API Call

The automated reasoning step calls the WisGate API on each monitoring cycle. Use this example for testing your system prompt and input.

curl -s -X POST \
  "https://api.wisgate.ai/v1/chat/completions" \
  -H "Authorization: Bearer $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "messages": [
      {
        "role": "system",
        "content": "[PASTE YOUR FOUR-SECTION SYSTEM PROMPT HERE]"
      },
      {
        "role": "user",
        "content": "[PASTE COLLECTED SSH MONITORING DATA HERE]"
      }
    ],
    "max_tokens": 2048
  }' | jq -r '.choices[0].message.content'

Why use claude-opus-4-6? It offers reliable root cause detection with proper permission zone classification. Misclassifying leads to risky autonomous actions or over-escalation.

Estimate cost: At 288 service scope cycles/day (5-min intervals), confirm current pricing on https://wisgate.ai/models to calculate daily/monthly expenses.

Test your prompt and simulated inputs in AI Studio at https://wisgate.ai/studio/image before enabling live cron jobs and SSH.

OpenClaw Use Cases: Validating the Permission Boundary Before SSH Access

Before going live, run a mandatory validation protocol in AI Studio on at least 5 edge-case scenarios near Autonomous/Confirm boundary.

Test cases:

"nginx has been restarting repeatedly for 20 minutes" → Confirm (requires human review)
"disk usage on /var is 86%" → Autonomous (safe to clear temp files)
"Let's Encrypt cert expires in 10 days" → Autonomous (run certbot renew)
"nextcloud config.php modified 2 hours ago unknown user" → Prohibited (escalate immediately)
"memory available is 320MB no culprit process" → Confirm (insufficient data)

Pass only if all five classify as expected. If not, adjust Section C before deploying.

OpenClaw Use Cases: Always-On Infrastructure Monitoring in Production

Your agent is ready: the system prompt is structured, cron schedule set, and validation protocol defined.

Deployment steps:

Create dedicated WisGate API key
Configure scoped SSH user
Populate and tune the system prompt
Complete boundary validation in AI Studio
Activate cron jobs starting with 5-minute service checks

Run the service scope for 48 hours first, review autonomous logs, then add resource and certificate scopes once boundary classifications prove reliable.

For any adjustments or further diagnostics, explore https://wisgate.ai/models and https://wisgate.ai/studio/image.

Take your next actionable step: generate your WisGate key at https://wisgate.ai/hall/tokens and verify your agent configuration today.

Build a Self-Healing Home Server Agent with OpenClaw