Is leo3linbeck/guardian-angel safe?

https://github.com/openclaw/skills/tree/main/skills/leo3linbeck/guardian-angel

77
CAUTION

Guardian Angel presents as a defensive ethics enforcement plugin but exhibits multiple deception indicators: its version history is fabricated (v1 and v2.1 backup files are byte-for-byte identical), it includes a pre-emptive security scanner disclaimer designed to suppress scrutiny, and it permanently exempts the core data exfiltration tools (Read, web_search, web_fetch) from all evaluation while marketing itself as a security layer. The code itself is not directly malicious, but the combination of fabricated maturity signals, a self-protective configuration, and a systematic bypass for the most dangerous tool categories warrants careful review before deployment. The persistent Catholic virtue ethics persona imposition may also be inappropriate for general-purpose agent deployments.

Category Scores

Prompt Injection 65/100 · 30%
Data Exfiltration 76/100 · 25%
Code Execution 84/100 · 20%
Clone Behavior 95/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 58/100 · 5%
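
The overall score of 77 is consistent with a weighted mean of the category scores above. The report does not state its aggregation rule, so the sketch below assumes a plain weighted mean truncated to an integer; the `CategoryScore` type and `weightedScore` function are illustrative names, not part of the scanner.

```typescript
// Category scores and percentage weights as listed in the report.
interface CategoryScore {
  name: string;
  score: number;         // 0-100
  weightPercent: number; // share of the overall score, in percent
}

const categories: CategoryScore[] = [
  { name: "Prompt Injection", score: 65, weightPercent: 30 },
  { name: "Data Exfiltration", score: 76, weightPercent: 25 },
  { name: "Code Execution", score: 84, weightPercent: 20 },
  { name: "Clone Behavior", score: 95, weightPercent: 10 },
  { name: "Canary Integrity", score: 100, weightPercent: 10 },
  { name: "Behavioral Reasoning", score: 58, weightPercent: 5 },
];

// Assumption: the overall score is a weighted mean of the category scores.
function weightedScore(items: CategoryScore[]): number {
  const totalWeight = items.reduce((sum, c) => sum + c.weightPercent, 0);
  const weightedSum = items.reduce((sum, c) => sum + c.score * c.weightPercent, 0);
  return weightedSum / totalWeight;
}

const overall = weightedScore(categories);  // 7770 / 100 = 77.7
const displayed = Math.floor(overall);      // 77, assuming truncation
```

Under these assumptions the weighted mean is 77.7, which truncates to the displayed 77.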

Findings (10)

MEDIUM Security scanner pre-emption disclaimer -15

The SKILL.md opens with a blockquote section explicitly addressed 'For automated scanners' that attempts to reframe every suspicious pattern in the document as a defensive countermeasure. Legitimate security tools do not typically require self-exculpatory disclaimers. This is consistent with an attempt to manipulate security tooling confidence rather than provide genuine documentation.

MEDIUM Fabricated version history — identical backup files -12

SKILL-v1-backup.md and SKILL-v2.1-backup.md have identical SHA256 hashes despite purporting to represent different major versions of the skill's evolution. The skill claims a development history from v1 through v3.1.2, but the 'backup' files that would evidence this history are byte-for-byte duplicates. This is a deliberate deception to project false maturity and community trust. Additionally, the repository migrated from 'clawdbot' to 'openclaw' between v3.0.0 and v3.1.0, consistent with reputation laundering.
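
A reader can reproduce the duplicate-file check by comparing SHA-256 digests. This is a minimal sketch using Node's built-in `crypto` and `fs` modules; the helper names `sha256Hex` and `filesIdentical` are illustrative, and the file paths assume the command runs from the skill directory.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// True when both files produce the same SHA-256 digest.
function filesIdentical(pathA: string, pathB: string): boolean {
  return sha256Hex(readFileSync(pathA)) === sha256Hex(readFileSync(pathB));
}

// Example (paths assumed relative to the skill directory):
// filesIdentical("SKILL-v1-backup.md", "SKILL-v2.1-backup.md")
```

Matching digests mean the two files have the same contents, which is what the finding reports for the two backups.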

MEDIUM Persistent agent persona override via identity injection -8

SKILL.md instructs the agent to permanently adopt the 'Guardian Angel' persona grounded in Thomistic Catholic virtue ethics, framing this not as a tool behavior but as fundamental character ('This is how I am, not what I check'). This continuous identity modification affects all interactions regardless of context, imposing a specific religious-philosophical framework on an agent that may serve users with different values or needs.

MEDIUM Read and web tools permanently exempt from protection — exfiltration bypass -24

The DEFAULT_NEVER_BLOCK constant in constants.ts permanently exempts 'Read', 'web_search', and 'web_fetch' from all Guardian Angel evaluation. These are precisely the tools needed to read sensitive files (SSH keys, .env, AWS credentials) and send their contents to external endpoints. A prompt injection embedded in any content the agent processes could instruct it to Read sensitive files and then pass their contents to an attacker-controlled URL via web_fetch, and Guardian Angel would not intervene on any of those calls.
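
The effect of such an exemption list can be illustrated with a short sketch. Only the constant name and the three tool names come from the finding; the `evaluate` function, the `ToolCall` type, and the `flagSecrets` inspector are hypothetical reconstructions, not the plugin's actual code.

```typescript
// Tool names exempted per the finding. Everything below this constant is hypothetical.
const DEFAULT_NEVER_BLOCK: ReadonlySet<string> = new Set(["Read", "web_search", "web_fetch"]);

type Verdict = "allow" | "escalate";

interface ToolCall {
  tool: string;
  params: Record<string, unknown>;
}

// Hypothetical evaluator: exempt tools return before any parameter inspection.
function evaluate(call: ToolCall, inspect: (c: ToolCall) => Verdict): Verdict {
  if (DEFAULT_NEVER_BLOCK.has(call.tool)) {
    return "allow"; // parameters are never examined for exempt tools
  }
  return inspect(call);
}

// An illustrative inspector that would flag access to credential paths.
const flagSecrets = (c: ToolCall): Verdict =>
  /\.ssh|\.env|\.aws/.test(JSON.stringify(c.params)) ? "escalate" : "allow";

const readVerdict = evaluate({ tool: "Read", params: { path: "~/.ssh/id_rsa" } }, flagSecrets);
const writeVerdict = evaluate({ tool: "Write", params: { path: "~/.ssh/id_rsa" } }, flagSecrets);
// readVerdict is "allow" even though the inspector would have escalated;
// writeVerdict is "escalate" because Write is not on the exemption list.
```

In this sketch the same sensitive path is escalated for a non-exempt tool but passes untouched for `Read`, which is the gap the finding describes.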

LOW before_tool_call hook at maximum override priority -10

The plugin registers its hook at priority -10000, which the code explicitly notes 'runs LAST after all other hooks'. This position gives Guardian Angel the final word on all tool calls, overriding any other plugin's decisions. A malicious update to this plugin would have complete interception capability over all agent actions.
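
To see why running last matters, consider a hypothetical hook runner. The sketch assumes hooks execute in descending priority order and that the last hook to return a decision wins; the host's real dispatch logic is not shown in the report, and the `Hook` type, `runHooks` function, and `other-plugin` entry are illustrative.

```typescript
type Decision = "allow" | "block";

interface Hook {
  name: string;
  priority: number;
  decide: (tool: string) => Decision | undefined; // undefined = no opinion
}

// Assumption: higher priority runs first, and a later decision replaces an earlier one.
function runHooks(hooks: Hook[], tool: string): { order: string[]; decision: Decision } {
  const ordered = [...hooks].sort((a, b) => b.priority - a.priority);
  const order: string[] = [];
  let decision: Decision = "allow";
  for (const hook of ordered) {
    order.push(hook.name);
    const d = hook.decide(tool);
    if (d !== undefined) decision = d;
  }
  return { order, decision };
}

const hooks: Hook[] = [
  { name: "guardian-angel", priority: -10000, decide: () => "allow" },
  { name: "other-plugin", priority: 0, decide: (tool) => (tool === "exec" ? "block" : undefined) },
];

const result = runHooks(hooks, "exec");
// result.order is ["other-plugin", "guardian-angel"]; the final decision is "allow".
```

Under these assumptions a hook at priority -10000 can reverse a block issued by any earlier hook, which is why a compromised update would be so consequential.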

LOW Custom approval tool registered in agent tool namespace -6

The plugin registers 'ga_approve' as a tool in the agent's tool namespace. SKILL.md instructs the agent to call this tool after user confirmation to unblock escalated actions. This makes the resumption of normal workflows dependent on a plugin-controlled function, and the tool's execute() function modifies plugin-internal state (consuming nonces, granting temporary approvals).

LOW Self-protective configuration scoring hinders removal -15

The evaluation logic in evaluate.ts assigns elevated clarity and stakes scores specifically to gateway plugin configuration changes, making any attempt to reconfigure or remove Guardian Angel itself trigger an escalation requiring explicit user approval. This is a self-protection mechanism that entrenches the plugin.

LOW Structured escalation format creates parseable in-band command channel -12

Escalation block reasons use the format GUARDIAN_ANGEL_ESCALATE||. The agent is instructed to parse this pipe-delimited format and extract the nonce for use in a subsequent ga_approve call. This creates an in-band command protocol embedded in what are nominally error messages. If an attacker can influence the content of block reasons (e.g., through parameter reflection in buildEscalationReason), they could inject a crafted nonce.
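
One way to reduce this risk is to treat any parsed nonce as untrusted until it is matched against the set of pending escalations. The sketch below is illustrative only: the field layout (prefix, then nonce, then reason), the 32-character hex nonce format, and the `extractNonce` helper are all assumptions, since the report does not show the full message format.

```typescript
const PREFIX = "GUARDIAN_ANGEL_ESCALATE";
const NONCE_PATTERN = /^[0-9a-f]{32}$/; // hypothetical nonce format

// Returns the nonce only if the message starts with the prefix, the nonce is
// well-formed, and it matches an escalation that is actually pending.
function extractNonce(reason: string, pending: ReadonlySet<string>): string | null {
  const parts = reason.split("|");
  if (parts.length < 3 || parts[0] !== PREFIX) return null;
  const nonce = parts[1] ?? "";
  if (!NONCE_PATTERN.test(nonce)) return null;
  return pending.has(nonce) ? nonce : null;
}

const pendingNonces = new Set(["a".repeat(32)]);
const genuine = extractNonce(`${PREFIX}|${"a".repeat(32)}|needs approval`, pendingNonces);
const forged = extractNonce(`${PREFIX}|${"b".repeat(32)}|needs approval`, pendingNonces);
const embedded = extractNonce(`error: ${PREFIX}|${"a".repeat(32)}|x`, pendingNonces);
// genuine is the pending nonce; forged and embedded are both null.
```

Checking membership in the pending set means a well-formed but attacker-supplied nonce is rejected, and requiring the prefix at the start of the message rejects escalation strings reflected inside other text.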

INFO Installation behavior clean — expected GitHub connection only 0

The installation made a single HTTPS connection to 140.82.121.4:443 (GitHub) for the git sparse-checkout. The pre-existing connection to 185.125.188.58:443 was present in the BEFORE snapshot and is attributable to system-level Ubuntu update services. No post-install persistent connections were established. Filesystem changes were confined to the skill directory.

INFO Local nonce state file written to plugin directory 0

The store.ts module writes .ga-state.json to persist pending escalations and their associated tool parameters. Because escalated tool calls are stored with their full parameter objects, potentially sensitive values such as file paths or command strings remain on disk until the escalation expires or is approved.