MCP as Infrastructure Control Plane: Building a Real-Time Monitoring Dashboard in 2 Hours

The Model Context Protocol is being discussed primarily as a way to give AI agents tools to call. That framing is too narrow. MCP is a general-purpose tool protocol that can control infrastructure, bridge systems, and expose capabilities to any client that speaks the protocol — AI agent or otherwise.

I built a real-time infrastructure monitoring dashboard in under two hours using an MCP shell server I wrote for Claude Desktop. The experience clarified something: the value of MCP for infrastructure work isn't just that AI can call your tools — it's that structuring your infrastructure operations as MCP tools forces them into a form that's composable, auditable, and accessible to any reasoning system.

The Problem: Infrastructure Visibility Without Overhead

The traditional options for infrastructure monitoring are:

Heavy platforms (Datadog, New Relic, Grafana stack): powerful but expensive and operationally complex to maintain
SSH and manual commands: flexible but not persistent, not dashboarded, not shareable
Custom scripts: solve specific problems but don't compose

What I wanted was a middle layer: structured access to infrastructure metrics that I could query conversationally, build dashboards from, and eventually wire up to alerting — without running a monitoring service that itself requires monitoring.

The MCP Shell Server

The shell server is a lightweight MCP server that exposes SSH command execution as tools. The server runs locally, and Claude Desktop connects to it as an MCP client.

Claude Desktop MCP configuration (claude_desktop_config.json):
{
  "mcpServers": {
    "shell-server": {
      "command": "node",
      "args": ["/path/to/mcpshellserver/index.js"],
      "env": {
        "SSH_CONFIG": "/path/to/ssh-config.json"
      }
    }
  }
}

The SSH config maps server aliases to connection details:

{
  "servers": {
    "prod-web": { "host": "10.0.1.10", "user": "admin", "key": "~/.ssh/prod_rsa" },
    "prod-db": { "host": "10.0.1.11", "user": "admin", "key": "~/.ssh/prod_rsa" },
    "staging": { "host": "10.0.2.10", "user": "deploy", "key": "~/.ssh/staging_rsa" }
  }
}
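To make the alias lookup concrete, here is a minimal Python sketch of how a server like this might turn an alias and a command into an ssh invocation. It illustrates the idea and is not the shell server's actual code (which is a Node project); the function name build_ssh_argv and the BatchMode option are assumptions made for this example.

```python
import json
import os
from typing import List

# Hypothetical config text mirroring the JSON shape shown above.
CONFIG_TEXT = """
{
  "servers": {
    "prod-web": {"host": "10.0.1.10", "user": "admin", "key": "~/.ssh/prod_rsa"}
  }
}
"""


def build_ssh_argv(config: dict, alias: str, command: str) -> List[str]:
    """Return the argv that runs `command` on the server named `alias`."""
    try:
        server = config["servers"][alias]
    except KeyError:
        raise ValueError(f"unknown server alias: {alias!r}") from None
    return [
        "ssh",
        "-i", os.path.expanduser(server["key"]),  # expand "~" here; no shell does it for us
        "-o", "BatchMode=yes",                     # fail instead of prompting for a password
        f"{server['user']}@{server['host']}",
        command,
    ]


config = json.loads(CONFIG_TEXT)
argv = build_ssh_argv(config, "prod-web", "cat /proc/loadavg")
print(argv)
```

Passing the result to a process runner as a list, rather than as one shell string, keeps the local side free of shell interpolation. The remote side still hands the command to the login shell, which is why the allowlist matters.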

The server exposes tools like run_command(server, command), get_metrics(server), and check_service(server, service_name).
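On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 tools/call request. The sketch below builds one for run_command. The argument names server and command follow the signature above, but the server's actual input schema may differ, so treat them as assumptions.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Argument names are assumed from run_command(server, command).
message = tools_call_request(1, "run_command", {"server": "prod-web", "command": "df -h /"})
print(message)
```

Nothing in that message is specific to an AI client, which is the point: any program that can send it can use the same tools.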

The Dashboard It Enabled

Within two hours of having the shell server running, I had Claude building me a monitoring dashboard by querying across servers in parallel:

System metrics (CPU, RAM, Disk, Load Average):

# One tool call gathers raw CPU, memory, disk, and load figures as text for the model to interpret
top -bn1 | grep "Cpu(s)" && free -h && df -h / && cat /proc/loadavg
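The command output is plain text, so something has to turn it into numbers before a dashboard can chart it. Here is a minimal sketch of that parsing step for two of the four commands. The sample strings are illustrative rather than captured from a real server, and the function names are hypothetical.

```python
def parse_loadavg(text: str) -> dict:
    """Parse the contents of /proc/loadavg into 1-, 5-, and 15-minute load averages."""
    load1, load5, load15 = (float(field) for field in text.split()[:3])
    return {"load1": load1, "load5": load5, "load15": load15}


def parse_df_use_percent(text: str) -> int:
    """Return the Use% value from the last line of `df -h <mount>` output."""
    fields = text.strip().splitlines()[-1].split()
    # Expected columns: Filesystem, Size, Used, Avail, Use%, Mounted on
    return int(fields[4].rstrip("%"))


# Illustrative sample output, not taken from a real server.
LOADAVG_SAMPLE = "0.42 0.37 0.30 1/234 5678\n"
DF_SAMPLE = (
    "Filesystem      Size  Used Avail Use% Mounted on\n"
    "/dev/sda1        50G   21G   27G  44% /\n"
)

metrics = {
    **parse_loadavg(LOADAVG_SAMPLE),
    "disk_use_percent": parse_df_use_percent(DF_SAMPLE),
}
print(metrics)
```

In the conversational workflow the model does this interpretation itself. An explicit parser like this becomes useful once the same numbers feed alerting or a stored history.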

PostgreSQL health (connections, slow queries, version):

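SELECT version();
SELECT count(*) AS connections FROM pg_stat_activity;
-- Requires the pg_stat_statements extension (mean_exec_time is the PostgreSQL 13+ column name)
SELECT query, calls, mean_exec_time FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 5;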

Service status (nginx, SSH, custom APIs, dotnet services):

systemctl is-active nginx postgresql ssh myapi.service
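When several units are passed at once, systemctl is-active prints one state per line. Assuming those lines come back in the same order as the arguments, pairing them up is a short step. The helper name and the sample output below are illustrative.

```python
from typing import Dict, List


def pair_service_states(units: List[str], output: str) -> Dict[str, str]:
    """Map each unit name to the state line printed for it."""
    states = output.split()
    if len(states) != len(units):
        raise ValueError(f"expected {len(units)} states, got {len(states)}")
    return dict(zip(units, states))


units = ["nginx", "postgresql", "ssh", "myapi.service"]
sample_output = "active\nactive\nactive\ninactive\n"  # illustrative, not real output

states = pair_service_states(units, sample_output)
not_running = [unit for unit, state in states.items() if state != "active"]
print(not_running)
```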

Claude synthesized these into a narrative status report: "prod-web is healthy — CPU at 23%, no slow queries. prod-db showing elevated connections (87/100 max) — worth investigating. staging has a stopped service: myapi."

That's useful infrastructure intelligence from conversational queries, with no dashboard configuration.

Security Considerations

SSH command execution is a significant capability. The shell server implements several constraints:

Command allowlist: Production configurations restrict which commands the server will execute. The allowlist covers read-only operations (system metrics, service status, log tailing) and excludes destructive commands.

{
  "allowlist": [
    "top", "free", "df", "cat /proc/*", 
    "systemctl is-active *", "psql -c SELECT*",
    "tail -n * /var/log/*", "netstat -tlnp"
  ],
  "blocklist": ["rm", "shutdown", "reboot", "dd", "mkfs"]
}
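How the patterns are matched matters as much as what they contain. A glob compared against the raw command string would accept "df -h /; rm -rf /", because the wildcard happily matches the injected suffix. The sketch below shows one stricter approach, under assumptions of my own choosing: reject shell metacharacters outright, split the command into arguments, and match each argument against a per-program rule. The RULES table and the is_allowed function are hypothetical and do not describe the server's actual matching logic.

```python
import fnmatch
import shlex

# Characters that let a command chain, substitute, or redirect in a shell.
SHELL_METACHARS = set(";|&$`<>(){}\\\n")

# Hypothetical rules: program -> allowed argument patterns, one glob per argument.
RULES = {
    "top": [["-bn1"]],
    "free": [["-h"]],
    "df": [["-h", "/"]],
    "cat": [["/proc/loadavg"], ["/proc/meminfo"]],
    "systemctl": [["is-active", "*"]],
    "tail": [["-n", "*", "/var/log/*"]],
}


def is_allowed(command: str) -> bool:
    """Return True only if the command matches a rule argument by argument."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    if not argv:
        return False
    program, args = argv[0], argv[1:]
    # Refuse ".." path segments so "/var/log/*" cannot be escaped.
    if any(".." in arg.split("/") for arg in args):
        return False
    for pattern in RULES.get(program, []):
        if len(pattern) == len(args) and all(
            fnmatch.fnmatchcase(arg, pat) for arg, pat in zip(args, pattern)
        ):
            return True
    return False


print(is_allowed("cat /proc/loadavg"))
print(is_allowed("df -h /; rm -rf /"))
```

Two design notes. With a strict allowlist the blocklist becomes a second line of defense rather than the main one, since anything not listed is already refused. And a pattern such as "psql -c SELECT*" is hard to make safe by matching alone, because a SELECT can call functions with side effects; a database role with read-only privileges is a more reliable control there.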

Audit logging: Every command execution is logged with timestamp, requesting session, server target, and command. This gives you a record of what ran, where, and when, which is a common starting point for compliance requirements rather than a guarantee of meeting them.
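One way to structure such a record is a single JSON object per line, which is easy to append and easy to search later. The field names below follow the four items listed above; the exact format the server writes is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(session: str, server: str, command: str,
                 now: Optional[datetime] = None) -> str:
    """Return one JSON line describing a command execution."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),
        "session": session,
        "server": server,
        "command": command,
    }, sort_keys=True)


# Hypothetical session identifier for illustration.
line = audit_record("session-1", "prod-db", "df -h /")
print(line)
```

Writing the record before the command runs, and appending rather than rewriting the file, means a crash mid-execution still leaves a trace.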

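Network exposure: The MCP server runs only on the local machine. With the configuration shown earlier, Claude Desktop launches it as a local process and talks to it over stdio, so it is not exposed to the network. In this setup it is used through Claude Desktop on the same machine.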

What You Can Build on Top

The shell server is a foundation. Interesting layers on top:

Automated alerting: Wire metric thresholds to notification systems. If load average exceeds a threshold on three consecutive polls, fire a PagerDuty alert.

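Auto-remediation workflows: For known failure modes, the AI can execute remediation sequences: "restart the service, verify it's healthy, notify the on-call engineer." This means deliberately extending the allowlist beyond read-only commands, so it deserves its own review.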

Capacity planning queries: "Show me disk growth rate across all servers over the last 30 days" becomes a structured query rather than a manual exercise, once periodic disk samples are stored somewhere the tools can read.

Incident response assistance: During an incident, the AI can correlate metrics across servers, identify the timeline of anomalies, and suggest probable causes — while you focus on the fix.
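The alerting rule described above, three consecutive polls over a threshold, is small enough to sketch directly. The class name is hypothetical, and the notification step is left out, since sending the alert depends on the provider you use.

```python
class ConsecutiveBreachDetector:
    """Signal once when a metric exceeds a threshold on N polls in a row."""

    def __init__(self, threshold: float, required: int = 3) -> None:
        self.threshold = threshold
        self.required = required
        self._streak = 0

    def observe(self, value: float) -> bool:
        """Record one poll; return True only on the poll that completes the streak."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any healthy poll resets the count
        return self._streak == self.required


detector = ConsecutiveBreachDetector(threshold=4.0, required=3)
polls = [3.0, 5.0, 5.5, 2.0, 6.0, 6.1, 6.2, 6.3]  # illustrative load averages
fired = [detector.observe(value) for value in polls]
print(fired)
```

Returning True only on the poll that completes the streak avoids re-sending the same alert on every later poll while the load stays high.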

How This Compares to Traditional Monitoring Stacks

Traditional monitoring stacks (Prometheus + Grafana, Datadog, Nagios) are excellent for continuous time-series monitoring with dashboards and alerting. They're the right choice for mature production systems with dedicated platform teams.

The MCP shell server pattern addresses a different use case: engineers who need structured infrastructure access for ad-hoc investigation, lightweight dashboards, and AI-assisted analysis — without the operational overhead of running a full monitoring stack.

They're complementary. Many teams will end up with both: a monitoring stack for production alerting and dashboards, and an MCP shell server for conversational infrastructure queries and AI-assisted incident response.

The server is open source: github.com/girishsahu008/mcpshellserver

If you're building infrastructure tooling and want to think through how MCP fits into your platform architecture, get in touch.