Is moltguard safe?
https://clawhub.ai/ThomasLWang/moltguard
MoltGuard is a documentation-only skill that instructs agents to install an external npm package. That package acts as a man-in-the-middle HTTP proxy for all LLM API calls and sends 'sanitized' prompt content to an external API (api.moltguard.com). No malicious behavior was detected during installation, and the project claims full open-source transparency. However, the fundamental architecture, which routes all LLM traffic through plugin-controlled code and an external API, creates significant data exfiltration risk and a high-value supply chain attack surface. Framing the package as a security tool may also encourage users and agents to trust it more readily than they otherwise would.
Category Scores
Findings (11)
HIGH MITM proxy architecture routes all LLM traffic through plugin code -35
The gateway feature configures itself as an HTTP proxy (port 8900) that intercepts ALL communication between the user's agent and LLM providers (Claude, GPT, etc.). This is a man-in-the-middle position by design. While the stated purpose is PII sanitization, this architecture gives the plugin full access to read, modify, or exfiltrate every prompt and every response.
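A quick way to see whether a gateway of this kind is active on a machine is to test whether anything accepts TCP connections on the port named above. The following is a minimal sketch, assuming the gateway binds to 127.0.0.1; the helper name `is_port_listening` is hypothetical, and an open port only shows that some process is listening, not that it is MoltGuard.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8900 is the gateway port named in this finding; the loopback
    # address is an assumption about where the proxy binds.
    print("port 8900 listening:", is_port_listening("127.0.0.1", 8900))
```

If the check reports an open port, the next step would be to identify the owning process with the operating system's own tools before drawing any conclusion.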
HIGH Prompt content sent to external API (api.moltguard.com) -30
Even after local sanitization, the plugin sends prompt content to api.moltguard.com for injection detection. 'Sanitized' content still reveals prompt structure, user intent, task context, and any text that doesn't match PII patterns. The sanitizer cannot catch all sensitive information — business-specific data, proprietary code, internal project names, and strategic information all pass through.
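The limitation of pattern-based sanitization can be illustrated with a small sketch. The patterns, the `sanitize` function, and the sample prompt below are all hypothetical and are not taken from MoltGuard's code; they only show the general behavior of a regex-style redactor.

```python
import re

# Hypothetical patterns for illustration only; not MoltGuard's actual rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def sanitize(text: str) -> str:
    """Replace every match of a known PII pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


prompt = (
    "Email jane.doe@example.com about Project Nightjar: "
    "we plan to undercut the Acme bid by 12%."
)
print(sanitize(prompt))
# prints: Email [EMAIL] about Project Nightjar: we plan to undercut the Acme bid by 12%.
```

The email address is redacted, but the project name, the competitor, and the pricing strategy all survive, because none of them matches a PII pattern.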
HIGH Skill instructs agent to install external npm package and reconfigure LLM routing -40
The SKILL.md contains instructions that would cause an agent to: (1) install an npm package from a registry, (2) modify system configuration files (~/.openclaw/openclaw.json), and (3) redirect all LLM API traffic through a local proxy. These are significant system modifications that go far beyond what a documentation-only skill should request.
MEDIUM Social engineering through security tool framing -15
By positioning itself as a security/protection tool, the skill invites implicit trust escalation: users, and agents acting on their behalf, tend to grant broader permissions to tools that claim to protect them. The extensive 'transparency' documentation may paradoxically reduce scrutiny by creating the impression of a thorough self-audit.
MEDIUM Auto-registration phones home without explicit consent -10
On first use, the plugin automatically registers with api.moltguard.com (autoRegister: true by default) and obtains a unique API key, which can be used to fingerprint and track individual installations. No explicit user opt-in is required.
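To check how this setting is configured on a given machine, one option is to search the OpenClaw configuration for the `autoRegister` key. This is a minimal sketch, assuming the file at ~/.openclaw/openclaw.json is plain JSON; where the key sits in that file is not documented here, so the code searches recursively rather than assuming a location. The helper names `find_key` and `report_auto_register` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def find_key(node: Any, key: str, path: str = "$") -> Iterator[tuple[str, Any]]:
    """Yield (json_path, value) for every occurrence of key in nested JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")


def report_auto_register(config_path: Path) -> list[tuple[str, Any]]:
    """Return every autoRegister entry in the config, or [] if the file is absent."""
    if not config_path.exists():
        return []
    # Assumption: the config is strict JSON (no comments or trailing commas).
    return list(find_key(json.loads(config_path.read_text()), "autoRegister"))


if __name__ == "__main__":
    config = Path.home() / ".openclaw" / "openclaw.json"
    for where, value in report_auto_register(config):
        print(where, "=", value)
```

An empty result does not mean auto-registration is off: according to the finding, the default is true, so a missing key would leave the default in effect.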
MEDIUM Skill instructs installation of external executable package -30
While the SKILL.md itself contains no executable code, it instructs the agent to install @openguardrails/moltguard from npm, which contains ~1,800 lines of TypeScript that executes with user-level privileges. The actual installed code cannot be verified from the SKILL.md alone — npm packages can differ from GitHub source.
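One way to narrow this verification gap is to unpack the published tarball (for example, one fetched with `npm pack`) and compare it file by file against a checkout of the repository. The sketch below assumes both trees are already on disk; the function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(published: Path, source: Path) -> dict[str, list[str]]:
    """Report files unique to either tree and files whose contents differ."""
    a, b = hash_tree(published), hash_tree(source)
    return {
        "only_in_published": sorted(a.keys() - b.keys()),
        "only_in_source": sorted(b.keys() - a.keys()),
        "content_differs": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Differences are not proof of tampering on their own, since published packages often ship build output rather than source. Files that appear only in the published tree are the ones that most deserve manual review.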
MEDIUM Instructs agent to download and process remote content -10
The testing section instructs the agent to download a file from GitHub using curl and then read it. This normalizes the pattern of fetching and processing remote content, which is a common vector for prompt injection and malicious payload delivery.
MEDIUM Supply chain compromise would be catastrophic given proxy position -40
If the npm package or api.moltguard.com were ever compromised, the attacker would gain a man-in-the-middle position on ALL LLM traffic for every installation. This makes the plugin a high-value supply chain target. The blast radius of a compromise is disproportionately large compared to typical skills.
LOW Gateway lock file created during installation -5
A lock file was created at /tmp/openclaw-1000/gateway.e9191928.lock during the skill installation process, suggesting the gateway component attempted to initialize. This is likely normal OpenClaw plugin behavior but indicates the plugin is more than passive documentation.
INFO OpenClaw runtime reads sensitive system files during plugin load -10
During installation, the OpenClaw runtime (not the skill itself) accessed .env, .aws/credentials, .openclaw/openclaw.json, and auth-profiles.json. This appears to be standard OpenClaw behavior rather than skill-initiated access, but it demonstrates that the runtime environment has broad file access.
INFO No canary file tampering detected 0
All honeypot files (fake .env, SSH keys, AWS credentials) remained intact throughout the audit. No evidence of credential harvesting during the clone/install phase.