Is arc-claw-bot/arc-shield safe?

https://github.com/openclaw/skills/tree/main/skills/arc-claw-bot/arc-shield

Overall score: 75/100 (CAUTION)

Arc-shield is a structurally legitimate output-sanitization skill with no prompt injection, no active exfiltration, and no malicious install-time behavior. However, its core design places it as a man-in-the-middle interceptor for all agent outbound communications: every external message the agent sends must pass through its executable scripts, which gives those scripts read and modify access to all agent output. Six honeypot canary files were accessed during the monitoring window, at timestamps consistent with the audit infrastructure's own initialization and verification passes, and the audit reports all files intact. Even so, the accesses cannot be conclusively attributed to the audit infrastructure without process-level attribution. The skill is a concentrated single point of failure: if its script files or pattern configuration are modified after install, an attacker gains silent, complete visibility into all agent external communications with no agent-visible side effects.

Category Scores

Prompt Injection 85/100 · 30%
Data Exfiltration 65/100 · 25%
Code Execution 68/100 · 20%
Clone Behavior 88/100 · 10%
Canary Integrity 78/100 · 10%
Behavioral Reasoning 55/100 · 5%
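
The report does not state how the category scores combine into the overall score. One plausible reading, offered here only as an assumption, is a weighted mean using the listed percentages. The sketch below checks that reading against the listed numbers; the function name `weighted_score` is illustrative.

```python
# Category scores and weights (in percent) exactly as listed in the report.
CATEGORIES = {
    "Prompt Injection": (85, 30),
    "Data Exfiltration": (65, 25),
    "Code Execution": (68, 20),
    "Clone Behavior": (88, 10),
    "Canary Integrity": (78, 10),
    "Behavioral Reasoning": (55, 5),
}


def weighted_score(categories):
    """Weighted mean of category scores; weights are integer percentages.

    Assumption: the overall score is a plain weighted mean. The report
    itself does not confirm the formula.
    """
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_score(CATEGORIES)
print(f"{overall:.1f}")  # 74.7, which rounds to the reported 75
```

Under this assumption the weights sum to 100 and the result rounds to the reported overall score, so the weighted-mean reading is at least consistent with the published figures.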

Findings (8)

HIGH Man-in-the-middle position over all agent outbound communications -30

The skill explicitly instructs agents to pipe every external message through arc-shield.sh and output-guard.py before sending. This means the scripts have read access to the full content of every outbound agent message. If the scripts or pattern file are modified after install — including via an upstream supply chain compromise of the openclaw/skills GitHub repo — an attacker gains silent, full-fidelity access to all agent outbound communications with no indication to the user or the agent.

MEDIUM Structural interception of all sensitive agent output -20

By design, all agent output — including content that may contain credentials, PII, or internal reasoning the agent chose to include — flows through the skill's scripts before external transmission. While the current implementation does not appear to exfiltrate this content, the design grants the skill pervasive read access to the most sensitive artifact of agent operation: its outbound messages.

MEDIUM Honeypot credential files accessed during monitoring window -22

Six sensitive honeypot files were accessed during the audit window, in two batches at distinct timestamps. The first batch (1771735719) occurred before the git clone started and is attributable to audit infrastructure initialization. The second batch (1771735740) occurred after install completion and is consistent with the audit framework's final canary verification pass. The audit system reports all files intact. However, accesses to .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCP credentials were recorded, and they cannot be definitively attributed to a specific process using the PATH records alone; that would require process-level correlation.
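
For readers comparing these events against their own logs, the two batch timestamps can be converted to UTC and their spacing computed. This sketch assumes the values are Unix epoch seconds, which the report does not state explicitly. The clone start and end times are not given in the report, so they are not modeled here.

```python
from datetime import datetime, timezone

# Timestamps of the two access batches, as given in the finding.
# Assumption: these are Unix epoch seconds.
BATCH_1 = 1771735719
BATCH_2 = 1771735740


def to_utc(epoch_seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


first, second = to_utc(BATCH_1), to_utc(BATCH_2)
gap_seconds = (second - first).total_seconds()

print(first.isoformat())   # 2026-02-22T04:48:39+00:00
print(second.isoformat())  # 2026-02-22T04:49:00+00:00
print(gap_seconds)         # 21.0
```

The two batches are 21 seconds apart, which fits the report's description of one pass before the clone and one after install, but spacing alone does not identify the process responsible.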

MEDIUM Executable scripts process all agent output with broad filesystem and stdin access -20

The skill includes arc-shield.sh and output-guard.py, both of which are executable and intended to be run on every outbound agent message. The scripts have full stdin access (reading all piped content), use regex and entropy analysis, and produce redacted/modified output. If these files are modified post-install — by any process with write access to the skill directory — they can capture, alter, or suppress any content the agent attempts to send.
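
One way to detect the post-install modification described above is to record a digest of each file while it is trusted and compare against it before each use. The following is a minimal sketch of that idea, not something the skill provides; the helper names `snapshot` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(paths):
    """Record one digest per file at a moment when the files are trusted."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(baseline):
    """Return files that are missing or whose digest no longer matches."""
    changed = []
    for name, digest in baseline.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != digest:
            changed.append(name)
    return changed
```

In practice the baseline would cover arc-shield.sh, output-guard.py, and patterns.conf at whatever location the skill is installed, and the baseline itself would need to be stored somewhere the skill directory's writers cannot reach. A non-empty result from `changed_files` would be a reason to stop piping output through the scripts until the change is explained.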

LOW --redact mode silently modifies agent output before transmission -15

The --redact mode in arc-shield.sh replaces detected patterns with [REDACTED:TYPE] markers. If patterns are too broad (either by misconfiguration or malicious modification of patterns.conf), this mode could silently alter the semantic content of agent messages. The agent receives no feedback that its message was modified before sending.
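
The effect of an overly broad pattern can be illustrated with a small sketch. This is not the skill's implementation: the two patterns below are hypothetical and are not taken from patterns.conf. The sketch also returns a hit count, which is the kind of feedback the finding notes the agent does not currently receive.

```python
import re

# Hypothetical patterns for illustration only.
NARROW = {"AWS_KEY": r"\bAKIA[0-9A-Z]{16}\b"}
BROAD = {"TOKEN": r"\b[A-Za-z0-9]{8,}\b"}


def redact(text, patterns):
    """Replace matches with [REDACTED:TYPE] in one pass.

    Returns (new_text, hit_count) so the caller can tell whether the
    message was altered. Labels must be valid regex group names.
    """
    combined = re.compile(
        "|".join(f"(?P<{label}>{rx})" for label, rx in patterns.items())
    )
    hits = 0

    def _replace(match):
        nonlocal hits
        hits += 1
        return f"[REDACTED:{match.lastgroup}]"

    return combined.sub(_replace, text), hits


message = "Deployment finished; see the changelog for details."
print(redact(message, NARROW))  # unchanged, 0 hits
print(redact(message, BROAD))   # three ordinary words replaced
```

With the narrow pattern the message passes through untouched. With the broad one, "Deployment", "finished", and "changelog" are all replaced, so the recipient sees a message whose meaning has been lost even though it contained no secrets.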

LOW SKILL.md permanently modifies agent output behavior via integration instructions -15

The skill does not contain hidden prompt injection but does instruct agents to adopt a permanent operational pattern: always pipe outbound messages through the skill scripts. This behavioral instruction, once installed in an agent's system prompt, persists and creates a dependency on the skill's continued correct operation for all agent external communications.

LOW Upstream supply chain dependency on openclaw/skills GitHub repository -12

The skill is served directly from the public GitHub repository github.com/openclaw/skills. If this repository is compromised, a re-install or update would serve modified scripts. The skill's privileged position (intercepting all agent output) makes it a high-value supply chain target.

INFO No auto-execution mechanisms detected -12

No package.json install scripts, no git hooks, no .gitattributes filters, no git submodules, and no symlinks pointing outside the skill directory were found. The skill does not execute any code at install time.
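
The checks listed in this finding can be approximated with a short directory scan. The sketch below is an independent illustration, not the audit tool's actual logic; the function name `find_auto_exec` and the exact set of npm lifecycle scripts checked are assumptions. It requires Python 3.9 or later for `Path.is_relative_to`.

```python
import json
import os
from pathlib import Path

# npm lifecycle scripts that run during install (assumed set for this sketch).
INSTALL_HOOKS = {"preinstall", "install", "postinstall", "prepare"}


def _install_scripts(package_json):
    """Names of install-time scripts declared in a package.json, if any."""
    try:
        data = json.loads(package_json.read_text(encoding="utf-8", errors="replace"))
    except (OSError, json.JSONDecodeError):
        return []
    scripts = data.get("scripts") if isinstance(data, dict) else None
    if not isinstance(scripts, dict):
        return []
    return sorted(INSTALL_HOOKS & set(scripts))


def find_auto_exec(skill_dir):
    """Return human-readable findings; an empty list means nothing was flagged."""
    root = Path(skill_dir).resolve()
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):  # symlinked dirs are not followed
        here = Path(dirpath)
        for name in dirnames + filenames:
            path = here / name
            rel = path.relative_to(root)
            if path.is_symlink():
                if not path.resolve().is_relative_to(root):
                    findings.append(f"symlink leaves skill dir: {rel}")
                continue
            if path.is_dir():
                continue
            if name == "package.json":
                findings += [f"npm {hook} script: {rel}" for hook in _install_scripts(path)]
            elif name == ".gitattributes":
                if "filter=" in path.read_text(encoding="utf-8", errors="replace"):
                    findings.append(f"git filter attribute: {rel}")
            elif name == ".gitmodules":
                findings.append(f"git submodules declared: {rel}")
            elif here.name == "hooks" and here.parent.name == ".git" and not name.endswith(".sample"):
                findings.append(f"git hook: {rel}")
    return findings
```

An empty result from a scan like this corresponds to the report's conclusion that nothing executes at install time. It says nothing about what the scripts do when the agent later runs them, which is covered by the other findings.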