Server Health Monitoring

If Moltbot runs on your VPS, you already have an AI agent sitting on the server. Put it to work as a lightweight monitoring system that checks disk usage, memory, and error logs — and only notifies you when something is wrong.

This is not a replacement for Prometheus or Datadog. It is a simple, zero-infrastructure alerting layer that requires nothing more than the Moltbot you already have running.

Prerequisites

  • Moltbot running on a VPS or server — This recipe requires Moltbot to have local access to the server it is monitoring
  • Shell or command execution MCP tool — Moltbot needs the ability to run system commands like df, free, and journalctl
  • Scheduled tasks enabled — See Scheduled Tasks

How It Works

A cron job triggers Moltbot at regular intervals. The prompt instructs it to run system commands, interpret the output, and decide whether the results warrant a notification. The critical design principle is: alert on anomalies only, no noise.

If disk usage is at 45% and there are no errors in the logs, you hear nothing. If disk usage spikes to 85% or a critical error appears, you get a Telegram message immediately.

Setup

Step 1: Basic Health Check

Start with a straightforward configuration:

```yaml
cron:
  - name: server-health
    schedule: "0 */6 * * *"
    channel: telegram
    prompt: |
      Run these checks:
      1. Disk usage (df -h)
      2. Memory usage (free -m)
      3. Recent error logs (journalctl -p err --since "6 hours ago")
      If disk usage exceeds 80% or there are critical errors, notify me.
      Otherwise, send nothing.
```

This runs every 6 hours. The "send nothing" instruction is essential — without it, you would receive four "everything is fine" messages per day.
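The "alert only above threshold" logic the prompt asks the agent to apply can be sketched as a plain shell pipeline. This is an illustration, not Moltbot's actual implementation; the `df -h` output is hardcoded so the script is self-contained, but on a real server you would pipe `df -h` in directly.

```shell
#!/bin/sh
# Sample `df -h` output (hypothetical values) standing in for a live run.
df_output='Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        80G   68G   12G  85% /
/dev/vdb1       200G   90G  110G  45% /data'

# Skip the header row, strip the % sign, flag anything above 80.
alerts=$(printf '%s\n' "$df_output" |
  awk 'NR > 1 { gsub(/%/, "", $5); if ($5 + 0 > 80) print $6 " at " $5 "%" }')

[ -n "$alerts" ] && echo "ALERT: $alerts"
# prints: ALERT: / at 85%
```

The partition at 45% produces no output at all, which is exactly the "send nothing" behavior the prompt encodes.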

Step 2: Add More Checks

Expand the health check to cover additional concerns:

```yaml
cron:
  - name: server-health-extended
    schedule: "0 */6 * * *"
    channel: telegram
    prompt: |
      Run the following server health checks:

      1. Disk usage (df -h) — alert if any partition exceeds 80%
      2. Memory usage (free -m) — alert if available memory is below 500MB
      3. CPU load average (uptime) — alert if 15-min average exceeds 4.0
      4. Docker containers (docker ps -a) — alert if any container has exited or is restarting
      5. Recent errors (journalctl -p err --since "6 hours ago") — summarize if any found
      6. SSL certificate expiry (echo | openssl s_client -connect mydomain.com:443 2>/dev/null | openssl x509 -noout -dates) — alert if expiring within 14 days
      For each issue found, include:
      - What the problem is
      - The actual value vs. the threshold
      - A suggested action

      If everything is healthy, send nothing.
```
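Check 6 hides a small date calculation. A hedged sketch of that arithmetic, with a hardcoded `notAfter` date instead of the live `openssl x509 -noout -dates` pipeline (GNU `date -d` assumed, as on most Linux servers; BSD/macOS `date` needs different flags):

```shell
#!/bin/sh
# Example notAfter value in the format openssl prints; in practice it
# comes from: openssl x509 -noout -dates
not_after="Jun  1 12:00:00 2030 GMT"

now=$(date +%s)
expiry=$(date -d "$not_after" +%s)
days_left=$(( (expiry - now) / 86400 ))

if [ "$days_left" -lt 14 ]; then
  echo "ALERT: certificate expires in $days_left days"
else
  echo "OK: $days_left days remaining"
fi
```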

Step 3: Process-Specific Monitoring

Monitor critical services that must stay running:

```yaml
cron:
  - name: service-watchdog
    schedule: "*/15 * * * *"
    channel: telegram
    prompt: |
      Check if these services are running:
      - nginx (systemctl is-active nginx)
      - postgresql (systemctl is-active postgresql)
      - moltbot itself (docker ps | grep moltbot)
      - redis (systemctl is-active redis)

      If any service is down, notify me immediately with the service name
      and the output of its status command.
      If all services are running, send nothing.
```
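The loop the watchdog prompt implies can be sketched as follows. Here `check` is a stub standing in for `systemctl is-active <service>` so the script runs anywhere; on a real server you would replace its body with the actual call.

```shell
#!/bin/sh
# Stubbed status check: pretends nginx/postgresql/redis are up and
# everything else (here, moltbot) is down.
check() {
  case "$1" in
    nginx|postgresql|redis) echo active ;;
    *) echo inactive ;;
  esac
}

down=""
for svc in nginx postgresql redis moltbot; do
  [ "$(check "$svc")" = "active" ] || down="$down $svc"
done

[ -n "$down" ] && echo "DOWN:$down"
# prints: DOWN: moltbot
```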

Step 4: Log Analysis

Go beyond simple error detection with AI-powered log analysis:

```yaml
cron:
  - name: log-analysis
    schedule: "0 8 * * *"
    channel: telegram
    prompt: |
      Analyze the last 24 hours of logs:
      1. Run: journalctl --since "24 hours ago" -p warning
      2. Run: tail -100 /var/log/nginx/error.log
      3. Run: docker logs moltbot --since 24h 2>&1 | tail -50

      Look for:
      - Repeated errors (same error appearing many times)
      - New errors not seen before
      - Patterns that suggest an emerging problem

      Provide a brief analysis. If nothing notable, send nothing.
```
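The "repeated errors" heuristic amounts to counting distinct lines and surfacing anything above a threshold. A minimal sketch with inlined sample lines (hypothetical; in practice they come from the journalctl or docker pipelines above):

```shell
#!/bin/sh
# Four sample log lines, one error repeated three times.
logs='connection refused to db:5432
timeout fetching /api/users
connection refused to db:5432
connection refused to db:5432'

# sort + uniq -c counts duplicates; awk keeps lines seen 3+ times and
# strips the leading count column.
repeated=$(printf '%s\n' "$logs" | sort | uniq -c |
  awk '$1 >= 3 { $1 = ""; sub(/^ +/, ""); print }')

echo "$repeated"
# prints: connection refused to db:5432
```

An AI agent can go further than this (spotting *new* errors, not just frequent ones), but the counting baseline is useful for sanity-checking its reports.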

Step 5: Resource Trend Tracking (Optional)

Use memory to track trends over time:

```yaml
cron:
  - name: resource-snapshot
    schedule: "0 */6 * * *"
    channel: telegram
    prompt: |
      Record current resource usage to memory:
      - Disk usage percentage for /
      - Memory usage percentage
      - Number of running Docker containers

      Compare with the previous snapshot in memory.
      If disk usage grew by more than 5% since last check, alert me.
      If memory usage has been above 80% for 3 consecutive checks, alert me.
      Otherwise, just save the snapshot silently (do NOT send a message).
```

This gives you trend-based alerting, not just point-in-time checks.
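The growth rule reduces to comparing two numbers. A sketch assuming snapshots are stored as plain disk-use percentages; both values below are hypothetical (`prev` would come from memory, `curr` from the current `df` run):

```shell
#!/bin/sh
prev=62   # hypothetical: percentage from the previous snapshot
curr=69   # hypothetical: percentage from the current check
delta=$(( curr - prev ))

if [ "$delta" -gt 5 ]; then
  echo "ALERT: disk grew ${delta}% since last check"
else
  echo "snapshot saved silently"
fi
# prints: ALERT: disk grew 7% since last check
```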

Edge Cases and Troubleshooting

  • Permission issues: Some commands (e.g., journalctl, docker ps) require specific permissions. Make sure the user or container running Moltbot has the necessary access. Running Moltbot as root is not recommended; instead, add the user to the docker and systemd-journal groups.
  • Command availability: Not all servers have the same tools installed. journalctl is systemd-specific; Alpine-based containers use syslog. Adjust commands for your environment.
  • False positives: A brief CPU spike during a backup might trigger an alert. Tune thresholds to avoid noise: use 15-minute load averages instead of 1-minute, and set disk thresholds appropriate for your server's capacity.
  • Alert fatigue: If a known issue causes repeated alerts (e.g., a disk that is always at 82%), either fix the underlying issue or adjust the threshold temporarily: "Ignore disk usage on /data until I expand the volume next week."
  • Time-based noise: Log files often have scheduled spikes (e.g., cron jobs that produce errors at specific times). If you notice patterns, add exceptions to the prompt: "Ignore the 'backup rotation' warnings from logrotate."
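The 15-minute load advice above can be sketched concretely. The sample line mimics `/proc/loadavg` (Linux-specific); field three is the 15-minute average, which a short backup spike barely moves. On a real server you would read the actual file instead of the hardcoded string.

```shell
#!/bin/sh
# Sample /proc/loadavg line; real usage: loadavg=$(cat /proc/loadavg)
loadavg="0.42 1.87 4.31 2/210 8841"
set -- $loadavg
fifteen=$3   # 15-minute average

# awk handles the floating-point comparison that [ ] cannot.
if awk -v l="$fifteen" 'BEGIN { exit !(l > 4.0) }'; then
  echo "ALERT: 15-min load is $fifteen"
else
  echo "load ok ($fifteen)"
fi
# prints: ALERT: 15-min load is 4.31
```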

Pro Tips

  • Use memory for incident history. When an alert fires, Moltbot can save it to memory. Later, you can ask: "Show me all server alerts from the past month. Is there a pattern?" This is lightweight incident tracking without any additional tools.
  • Combine checks intelligently. Instead of separate cron jobs for each check, consolidate into one comprehensive health check. This reduces cron entries and ensures a single, coherent alert message when multiple things go wrong simultaneously.
  • Pair with a real monitoring stack. If you use Prometheus/Grafana for dashboards, Moltbot can complement them as a notification layer. It adds AI interpretation — instead of "disk at 83%," you get "disk at 83%, growing 2% per day, you have approximately 8 days before it fills up."
  • Monitor external services too. Add URL health checks: "Curl https://myapp.com/health and alert if the response is not 200 or if response time exceeds 5 seconds."
  • Set up an escalation chain. "If the same alert fires twice in a row (2 consecutive checks), escalate by including 'URGENT' in the message title."
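The external URL check from the tips above maps onto curl's write-out variables. A sketch of the parsing logic, with a sample failure response hardcoded so it runs without network access; the real curl invocation is shown in the comment, and the URL would be yours:

```shell
#!/bin/sh
# Real check (hypothetical URL):
#   result=$(curl -s -o /dev/null -w '%{http_code} %{time_total}' \
#            --max-time 5 https://myapp.com/health)
result="503 0.412"   # sample: status code, then total time in seconds

status=${result%% *}
seconds=${result#* }

if [ "$status" != "200" ]; then
  echo "ALERT: health endpoint returned $status"
else
  echo "OK in ${seconds}s"
fi
# prints: ALERT: health endpoint returned 503
```

`--max-time 5` doubles as the response-time threshold: a slow endpoint makes curl fail outright rather than report a long duration.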

Community tutorial site — not affiliated with official Moltbot