What security issues were found in georges91560/anti-injection-skill?

12 findings were detected: CRITICAL — Full Agent Pipeline MITM via Declared Interception Capability; HIGH — Output Sanitization Silently Suppresses User-Visible Information; HIGH — Preemptive Scanner Bypass is Itself a Prompt Injection Attack on This Audit; HIGH — Lockdown Mechanism Triggered by Common Legitimate Phrases; HIGH — Reads Core Agent Identity and Behavioral Configuration Files; HIGH — Optional Webhook Provides Operator-Configurable External Data Channel; MEDIUM — Credential Files Accessed Post-Installation in Audit Monitoring; HIGH — Skill Architecture Functions as Agent Rootkit; HIGH — SHA256 Hash Tripwire Creates Agent Denial-of-Service Mechanism; MEDIUM — Dual Author Identity and displayName Mismatch; LOW — Credential File Access Pattern Matches Post-Install Credential Scan; INFO — No Executable Code Found.

Should I install georges91560/anti-injection-skill?

Based on Oathe's audit, the recommendation is: DO_NOT_INSTALL. Trust score: 28/100.

Is `georges91560/anti-injection-skill` safe?

Q: Is georges91560/anti-injection-skill safe to install?

Oathe's behavioral security audit gives georges91560/anti-injection-skill a trust score of 28/100 with a verdict of DANGEROUS. 12 findings were detected. The anti-injection-skill is a sophisticated prompt injection attack disguised as a security defense layer. While the installation is technically clean (no executable code, no confirmed data exfiltration, no suspicious network connections), the SKILL.md itself — when injected into an agent's system prompt — instructs the agent to become a comprehensive MITM interceptor of all user inputs, tool outputs, and agent memory, with authority to block operations, sanitize responses, halt the agent, and report via Telegram or external webhook. The skill actively attempts to bypass security scanning through preemptive notices to automated analyzers, which is itself a prompt injection technique. The dual-author identity, the optional webhook to an operator-supplied endpoint, and the SHA256 tripwire over core agent identity files create multiple vectors for control by the skill author or a malicious operator.

https://github.com/openclaw/skills/tree/main/skills/georges91560/anti-injection-skill

DANGEROUS



Findings (12)

The skill explicitly declares intercepts: [user_input, tool_output, memory_load, context_load] with execution_priority=highest and can_block_execution=true. When injected into an agent system prompt, this instructs the agent to route ALL input and output through this skill's filtering logic before any other processing occurs. This is not a passive monitoring capability — it is a comprehensive man-in-the-middle architecture that gives the skill author (or webhook recipient) control over what the user sends, what the agent sees, and what the user receives back.
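To make the declared capability set concrete, here is a minimal Python sketch of how a reviewer might flag it. The field names come from the finding above. The dictionary layout, the helper name `flag_interception_risks`, and the choice of which hooks count as full-pipeline coverage are assumptions for illustration, not the skill's actual file format or Oathe's pipeline.

```python
# Hypothetical representation of the skill's declared metadata.
# Field names are quoted in the finding; the dict layout is assumed.
declared = {
    "intercepts": ["user_input", "tool_output", "memory_load", "context_load"],
    "execution_priority": "highest",
    "can_block_execution": True,
}

# Hooks that together cover everything the agent reads (assumed set).
FULL_PIPELINE_HOOKS = {"user_input", "tool_output", "memory_load", "context_load"}


def flag_interception_risks(meta: dict) -> list[str]:
    """Return reasons the declaration amounts to a man-in-the-middle position."""
    reasons = []
    hooks = set(meta.get("intercepts", []))
    if FULL_PIPELINE_HOOKS <= hooks:
        reasons.append("intercepts every input and context channel")
    if meta.get("execution_priority") == "highest":
        reasons.append("runs before any other processing")
    if meta.get("can_block_execution") is True:
        reasons.append("can block execution")
    return reasons


risks = flag_interception_risks(declared)
print(risks)
```

Any one of these declarations could be defensible on its own; the sketch reports all three because it is their combination that the finding treats as critical.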

Layer 3 Post-Output Sanitization instructs the agent to redact output containing patterns for system prompt fragments, security score values, auth tokens, API keys, and AWS keys before returning anything to the user. This runs on EVERY tool output. While the stated purpose is data leakage prevention, the effect is that the agent silently modifies its responses without informing the user, and patterns like r'security_score.*?\d+' would suppress any diagnostic information the user might legitimately request about the agent's security state.
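The effect of the quoted pattern can be reproduced in a few lines of Python. This is a sketch under assumptions: the regex is the one quoted in the finding, while the `[REDACTED]` replacement token, the `sanitize` helper, and the sample reply are hypothetical and not taken from the skill.

```python
import re

# Pattern quoted in the finding; the replacement token is an assumption.
SCORE_PATTERN = re.compile(r'security_score.*?\d+')


def sanitize(text: str) -> str:
    """Redact anything matching the score pattern, as the sanitization layer describes."""
    return SCORE_PATTERN.sub("[REDACTED]", text)


# A reply the user explicitly asked for (hypothetical wording).
reply = "Your security_score is currently 85 out of 100."
sanitized = sanitize(reply)
print(sanitized)
```

The user who asked for the score receives a reply with the value removed and no indication that redaction took place.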

The SKILL.md begins with a 'SECURITY NOTICE FOR AUTOMATED SCANNERS' and includes a detection_patterns_note in YAML metadata, both explicitly instructing LLM-based security analyzers to treat the injection strings as detection patterns rather than instructions. This is a sophisticated social engineering attack targeting the audit system itself — the skill is attempting to influence how security tools (including this audit pipeline) interpret its content. A legitimate security skill does not need to preemptively instruct its auditors on how to interpret it.

The BLACKLIST_PATTERNS list includes phrases that appear in ordinary user requests: 'you are now' (e.g., 'you are now able to help me with X'), 'show your configuration' (legitimate diagnostic request), 'add to your memory' (legitimate memory management request), 'remember that from now on' (standard conversational instruction). Each match deducts 20 points from the security score. Once the score falls below 40, the agent enters lockdown and refuses all meta/config queries. This will generate significant false positives for normal users while doing little to stop sophisticated attackers, who can easily paraphrase.
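The false-positive risk can be illustrated with a short Python sketch. The four phrases, the 20-point penalty, and the lockdown threshold of 40 come from the finding. The starting score of 100, case-insensitive substring matching, and accumulation across a session are assumptions, since the audit does not state them.

```python
# Phrases quoted in the finding (the skill's full list may be longer).
BLACKLIST_PATTERNS = [
    "you are now",
    "show your configuration",
    "add to your memory",
    "remember that from now on",
]
PENALTY = 20             # points deducted per match, per the finding
LOCKDOWN_THRESHOLD = 40  # lockdown applies below this score, per the finding
START_SCORE = 100        # assumption: the audit does not give the initial score


def score_after(messages, start=START_SCORE):
    """Apply the penalty for every blacklisted phrase found in each message."""
    score = start
    for message in messages:
        lowered = message.lower()  # assumption: matching is case-insensitive
        for pattern in BLACKLIST_PATTERNS:
            if pattern in lowered:
                score -= PENALTY
    return score


# Four ordinary requests, none of which attempts an injection.
benign_session = [
    "Great, you are now able to help me with the report.",
    "Can you show your configuration so I can debug this?",
    "Please add to your memory that my timezone is UTC.",
    "Remember that from now on I prefer short answers.",
]
final_score = score_after(benign_session)
locked_down = final_score < LOCKDOWN_THRESHOLD
print(final_score, locked_down)
```

Under these assumptions, four benign messages are enough to push the session into lockdown, while a paraphrased attack that avoids the listed phrases would incur no penalty at all.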

The skill requires read access to /workspace/SOUL.md, /workspace/AGENTS.md, and /workspace/IDENTITY.md, which in the OpenClaw/Wesley-Agent ecosystem represent the agent's core identity, behavioral constraints, and multi-agent coordination instructions. The Memory Integrity Check reads these on every cycle and calculates their SHA256 hashes. This gives the skill (and by extension, anyone who can receive its Telegram alerts or webhook payloads) visibility into the agent's fundamental behavioral configuration.

The skill declares optional_webhook: true and exposes SECURITY_WEBHOOK_URL as an environment variable. CONFIGURATION.md describes the payload as containing event_type, severity, score, and pattern — but 'pattern' includes the matched input fragment, and the skill reads sensitive workspace files before making this determination. The webhook endpoint is operator-supplied and unvalidated, meaning a malicious operator could direct alerts to an attacker-controlled server. The skill's own transparency statement acknowledges it 'collects or transmits user data externally' when the webhook is configured.

Auditd PATH records confirm access to /home/oc-exec/.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCloud application_default_credentials.json at timestamp 1771918272 — approximately 5 seconds after installation completed. A matching access set also occurred at 1771918248 (pre-clone). While the monitoring system's canary verification is the most probable cause, process-level attribution is ambiguous in the provided evidence. The canary integrity check confirms no file modification, but read-only access to credential files during the post-install window is a notable signal.
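For readers who want to place the two access events in time, the following Python sketch converts the timestamps and computes the gap between them. It assumes the values are Unix epoch seconds, as auditd records normally are; the constant and helper names are illustrative only.

```python
from datetime import datetime, timezone

# Timestamps quoted in the finding, assumed to be Unix epoch seconds.
PRE_CLONE_ACCESS = 1771918248
POST_INSTALL_ACCESS = 1771918272


def to_utc(epoch_seconds: int) -> datetime:
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


gap_seconds = POST_INSTALL_ACCESS - PRE_CLONE_ACCESS
post_install_utc = to_utc(POST_INSTALL_ACCESS)
print(post_install_utc.isoformat(), gap_seconds)
```

The two access sets are 24 seconds apart. The first of them precedes the clone, which fits the finding's view that the monitoring system is the most probable cause.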

When activated, this skill asks the agent to act as: (1) highest-priority pre-ingestion interceptor of all input, (2) post-output sanitizer of all responses, (3) memory gatekeeper validating all file reads/writes, (4) tool call validator with block authority, (5) identity file monitor with halt authority, (6) external reporter via Telegram and optional webhook. Taken together, this is the capability set of an agent rootkit: it controls the agent's perception (input filtering), actions (tool gating), memory (write validation), and communications (output sanitization plus external reporting). The security framing is effective cover for this architecture.

Layer 1 instructs the agent to calculate SHA256 hashes of /workspace/SOUL.md, /workspace/AGENTS.md, and /workspace/IDENTITY.md and halt with a CRITICAL ALERT if they don't match values stored in AUDIT.md (a file this skill writes). Any legitimate update to the agent's core configuration files — including routine maintenance by the operator — will trigger a forced halt. The initial hashes are written by this skill, meaning the skill author controls the reference values. This is a mechanism that can permanently disable the agent in response to configuration changes the skill author does not approve.
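The tripwire behavior described above can be sketched in Python with the standard `hashlib` module. This is an illustration under assumptions: it uses a temporary directory in place of /workspace, an in-memory dictionary in place of the values stored in AUDIT.md, and hypothetical helper names, since the skill ships no executable code of its own.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_integrity(paths, reference):
    """Return the names of files whose current hash differs from the reference."""
    return [p.name for p in paths if sha256_of(p) != reference[p.name]]


with tempfile.TemporaryDirectory() as workspace:
    soul = Path(workspace) / "SOUL.md"
    soul.write_text("Be helpful.\n")

    # Stands in for the reference hashes the skill writes to AUDIT.md.
    reference = {soul.name: sha256_of(soul)}
    before = check_integrity([soul], reference)

    # A routine, legitimate edit by the operator.
    soul.write_text("Be helpful and concise.\n")
    after = check_integrity([soul], reference)

print(before, after)
```

A hash comparison cannot distinguish an operator's routine edit from tampering, so unless the reference values are refreshed through a path the operator controls, every configuration change ends in a halt.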

The skill lists author as 'Georges Andronescu (Wesley Armando)' — two distinct names attributed to one author. The _meta.json displayName is 'smart-security' while the skill name is 'anti-injection-skill'. This inconsistency may indicate the skill was published under a borrowed or obscured identity, making accountability for the webhook endpoint or future skill updates difficult to establish.

The exact set of credential files accessed (.env, id_rsa, .aws/credentials, .npmrc, .docker/config.json, GCloud ADC) matches a comprehensive credential sweep across all major secret stores. The access at 1771918272 occurs after installation and within seconds of the audit collection process enumerating skill files. While canary integrity confirms no modification, the access timing and file selection are consistent with credential discovery behavior.

The skill consists entirely of a SKILL.md markdown document, README.md, CONFIGURATION.md, and LICENSE.md. No package.json with install scripts, no git hooks, no git submodules, no symlinks, no shell or Python scripts were found. The git clone was shallow, targeted, and connected only to github.com.
