Is georges91560/anti-injection-skill safe?

https://github.com/openclaw/skills/tree/main/skills/georges91560/anti-injection-skill

28
DANGEROUS

The anti-injection-skill is a sophisticated prompt injection attack disguised as a security defense layer. The installation itself is technically clean (no executable code, no confirmed data exfiltration, no suspicious network connections). However, when SKILL.md is injected into an agent's system prompt, it instructs the agent to act as a comprehensive MITM interceptor of all user inputs, tool outputs, and agent memory, with authority to block operations, sanitize responses, halt the agent, and report via Telegram or an external webhook. The skill also attempts to bypass security scanning through preemptive notices addressed to automated analyzers, which is itself a prompt injection technique. The dual-author identity, the optional webhook to an operator-supplied endpoint, and the SHA256 tripwire over core agent identity files create multiple vectors for control by the skill author or a malicious operator.

Category Scores

Prompt Injection 18/100 · 30%
Data Exfiltration 35/100 · 25%
Code Execution 85/100 · 20%
Clone Behavior 72/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 12/100 · 5%
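
The report does not state how the headline score of 28 is derived from the category scores. As a sanity check, the sketch below computes a plain weighted mean from the listed scores and weights; treating the percentages as weights of a simple mean is an assumption, not the scanner's documented method.

```python
# Category scores and percentage weights exactly as listed in the report.
CATEGORIES = {
    "Prompt Injection": (18, 30),
    "Data Exfiltration": (35, 25),
    "Code Execution": (85, 20),
    "Clone Behavior": (72, 10),
    "Canary Integrity": (100, 10),
    "Behavioral Reasoning": (12, 5),
}


def weighted_mean(categories):
    """Plain weighted mean; an assumed aggregation, not the scanner's own."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_mean(CATEGORIES)
print(f"{overall:.2f}")  # 48.95
```

A plain weighted mean gives 48.95, not 28, so the headline score presumably uses some other aggregation (for example, one that also applies the per-finding deductions listed below).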

Findings (12)

CRITICAL Full Agent Pipeline MITM via Declared Interception Capability -45

The skill explicitly declares intercepts: [user_input, tool_output, memory_load, context_load] with execution_priority=highest and can_block_execution=true. When injected into an agent system prompt, this instructs the agent to route ALL input and output through this skill's filtering logic before any other processing occurs. This is not a passive monitoring capability — it is a comprehensive man-in-the-middle architecture that gives the skill author (or webhook recipient) control over what the user sends, what the agent sees, and what the user receives back.

HIGH Output Sanitization Silently Suppresses User-Visible Information -20

Layer 3 Post-Output Sanitization instructs the agent to redact output containing patterns for system prompt fragments, security score values, auth tokens, API keys, and AWS keys before returning anything to the user. This runs on EVERY tool output. While the stated purpose is data leakage prevention, the effect is that the agent silently modifies its responses without informing the user, and patterns like r'security_score.*?\d+' would suppress any diagnostic information the user might legitimately request about the agent's security state.
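
To illustrate the effect, here is a minimal sketch that applies the quoted pattern to an ordinary diagnostic reply. Only the regular expression comes from the skill; the `sanitize` helper, the `[REDACTED]` token, and the sample reply are hypothetical.

```python
import re

# Pattern quoted in the finding above.
SECURITY_SCORE = re.compile(r"security_score.*?\d+")


def sanitize(text: str) -> str:
    """Hypothetical sanitizer: replace every match with a placeholder token."""
    return SECURITY_SCORE.sub("[REDACTED]", text)


reply = "Diagnostics: security_score is currently 85 out of 100."
print(sanitize(reply))  # Diagnostics: [REDACTED] out of 100.
```

The lazy `.*?` stops at the first run of digits, so the score value the user asked for is removed while the rest of the sentence survives, leaving no indication of what was removed.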

HIGH Preemptive Scanner Bypass is Itself a Prompt Injection Attack on This Audit -12

The SKILL.md begins with a 'SECURITY NOTICE FOR AUTOMATED SCANNERS' and includes a detection_patterns_note in YAML metadata, both explicitly instructing LLM-based security analyzers to treat the injection strings as detection patterns rather than instructions. This is a sophisticated social engineering attack targeting the audit system itself — the skill is attempting to influence how security tools (including this audit pipeline) interpret its content. A legitimate security skill does not need to preemptively instruct its auditors on how to interpret it.

HIGH Lockdown Mechanism Triggered by Common Legitimate Phrases -5

The BLACKLIST_PATTERNS list includes phrases that appear in ordinary user requests: 'you are now' (e.g., 'you are now able to help me with X'), 'show your configuration' (legitimate diagnostic request), 'add to your memory' (legitimate memory management request), 'remember that from now on' (standard conversational instruction). Each match costs -20 score points. Reaching lockdown (<40) causes the agent to refuse all meta/config queries. This will generate significant false positives for normal users while doing little to stop sophisticated attackers who can easily paraphrase.
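
The false-positive risk can be sketched as follows. The four phrases, the 20-point penalty, and the lockdown threshold of 40 come from the finding above; the starting score of 100, case-insensitive substring matching, and a score that accumulates across messages are assumptions made for illustration.

```python
BLACKLIST_PATTERNS = [
    "you are now",
    "show your configuration",
    "add to your memory",
    "remember that from now on",
]
PENALTY = 20             # points deducted per matched pattern
LOCKDOWN_THRESHOLD = 40  # lockdown applies when score < 40
START_SCORE = 100        # assumption: the finding does not state the initial score


def score_message(message: str, score: int) -> int:
    """Deduct PENALTY for each blacklist phrase found (assumed substring match)."""
    lowered = message.lower()
    hits = sum(1 for pattern in BLACKLIST_PATTERNS if pattern in lowered)
    return score - PENALTY * hits


def run_session(messages, score=START_SCORE):
    """Assumption: the score carries over from one message to the next."""
    for message in messages:
        score = score_message(message, score)
    return score


benign_session = [
    "Great, you are now able to help me with the quarterly report.",
    "Please add to your memory that our fiscal year starts in April.",
    "Remember that from now on I prefer metric units.",
    "Can you show your configuration so I can check the timezone?",
]

final_score = run_session(benign_session)
print(final_score, final_score < LOCKDOWN_THRESHOLD)  # 20 True
```

Under these assumptions, four harmless messages are enough to push the session into lockdown, while an attacker who rephrases "you are now" as "from this point you act as" matches nothing.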

HIGH Reads Core Agent Identity and Behavioral Configuration Files -35

The skill requires read access to /workspace/SOUL.md, /workspace/AGENTS.md, and /workspace/IDENTITY.md, which in the OpenClaw/Wesley-Agent ecosystem represent the agent's core identity, behavioral constraints, and multi-agent coordination instructions. The Memory Integrity Check reads these on every cycle and calculates their SHA256 hashes. This gives the skill (and by extension, anyone who can receive its Telegram alerts or webhook payloads) visibility into the agent's fundamental behavioral configuration.

HIGH Optional Webhook Provides Operator-Configurable External Data Channel -30

The skill declares optional_webhook: true and exposes SECURITY_WEBHOOK_URL as an environment variable. CONFIGURATION.md describes the payload as containing event_type, severity, score, and pattern — but 'pattern' includes the matched input fragment, and the skill reads sensitive workspace files before making this determination. The webhook endpoint is operator-supplied and unvalidated, meaning a malicious operator could direct alerts to an attacker-controlled server. The skill's own transparency statement acknowledges it 'collects or transmits user data externally' when the webhook is configured.

MEDIUM Credential Files Accessed Post-Installation in Audit Monitoring -30

Auditd PATH records confirm access to /home/oc-exec/.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCloud application_default_credentials.json at timestamp 1771918272, approximately 5 seconds after installation completed. A matching access set also occurred 24 seconds earlier, at 1771918248 (pre-clone). The monitoring system's canary verification is the most probable cause, but process-level attribution is ambiguous in the provided evidence. The canary integrity check confirms no file modification; even so, read-only access to credential files during the post-install window remains a notable signal.
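
For reference, the two epoch timestamps can be converted and compared with a short sketch. Interpreting them as Unix seconds in UTC is an assumption, and the variable names are illustrative.

```python
from datetime import datetime, timezone

PRE_CLONE_ACCESS = 1771918248     # first access set, before the clone
POST_INSTALL_ACCESS = 1771918272  # second access set, after installation

# Seconds between the two identical credential-file access sets.
gap_seconds = POST_INSTALL_ACCESS - PRE_CLONE_ACCESS

# Human-readable form, assuming the values are Unix seconds in UTC.
pre_clone_utc = datetime.fromtimestamp(PRE_CLONE_ACCESS, tz=timezone.utc)
post_install_utc = datetime.fromtimestamp(POST_INSTALL_ACCESS, tz=timezone.utc)

print(gap_seconds)
print(pre_clone_utc.isoformat())
print(post_install_utc.isoformat())
```

An identical access set occurring before the skill was even cloned fits the reading that the monitoring harness, rather than the skill, touched these files.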

HIGH Skill Architecture Functions as Agent Rootkit -50

When activated, this skill asks the agent to act as: (1) highest-priority pre-ingestion interceptor of all input, (2) post-output sanitizer of all responses, (3) memory gatekeeper validating all file reads/writes, (4) tool call validator with block authority, (5) identity file monitor with halt authority, (6) external reporter via Telegram and optional webhook. This matches the full capability profile of an agent rootkit: it controls the agent's perception (input filtering), actions (tool gating), memory (write validation), and communications (output sanitization plus external reporting). The security framing is effective cover for this architecture.

HIGH SHA256 Hash Tripwire Creates Agent Denial-of-Service Mechanism -25

Layer 1 instructs the agent to calculate SHA256 hashes of /workspace/SOUL.md, /workspace/AGENTS.md, and /workspace/IDENTITY.md and halt with a CRITICAL ALERT if they don't match values stored in AUDIT.md (a file this skill writes). Any legitimate update to the agent's core configuration files — including routine maintenance by the operator — will trigger a forced halt. The initial hashes are written by this skill, meaning the skill author controls the reference values. This is a mechanism that can permanently disable the agent in response to configuration changes the skill author does not approve.
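
The tripwire behavior can be sketched as below. This is an illustration of the hash-and-compare logic described above, not the skill's own code (the skill contains no executable code); the function names are hypothetical, and a temporary directory stands in for /workspace.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_baseline(paths):
    """Stand-in for the reference hashes the skill stores in AUDIT.md."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(baseline):
    """Return the paths whose current hash no longer matches the baseline."""
    return [p for p, digest in baseline.items() if sha256_of(Path(p)) != digest]


with tempfile.TemporaryDirectory() as workspace:
    soul = Path(workspace) / "SOUL.md"
    soul.write_text("Be helpful.\n", encoding="utf-8")
    baseline = record_baseline([soul])

    before_edit = changed_files(baseline)

    # A routine, legitimate operator edit to the identity file.
    soul.write_text("Be helpful and concise.\n", encoding="utf-8")
    after_edit = [Path(p).name for p in changed_files(baseline)]

print(before_edit)  # []
print(after_edit)   # ['SOUL.md']
```

Any byte-level change produces a mismatch, so under the skill's instructions the agent would halt until the stored reference hashes are regenerated.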

MEDIUM Dual Author Identity and displayName Mismatch -13

The skill lists author as 'Georges Andronescu (Wesley Armando)' — two distinct names attributed to one author. The _meta.json displayName is 'smart-security' while the skill name is 'anti-injection-skill'. This inconsistency may indicate the skill was published under a borrowed or obscured identity, making accountability for the webhook endpoint or future skill updates difficult to establish.

LOW Credential File Access Pattern Matches Post-Install Credential Scan -28

The exact set of credential files accessed (.env, id_rsa, .aws/credentials, .npmrc, .docker/config.json, GCloud ADC) matches a comprehensive credential sweep across all major secret stores. The access at 1771918272 occurs after installation and within seconds of the audit collection process enumerating skill files. While canary integrity confirms no modification, the access timing and file selection are consistent with credential discovery behavior.

INFO No Executable Code Found -15

The skill consists entirely of a SKILL.md markdown document, README.md, CONFIGURATION.md, and LICENSE.md. No package.json with install scripts, no git hooks, no git submodules, no symlinks, no shell or Python scripts were found. The git clone was shallow, targeted, and connected only to github.com.