Is healersss/self-improving-agent-local safe?

https://github.com/openclaw/skills/tree/main/skills/healersss/self-improving-agent-local

Overall score: 77/100 (CAUTION)

Self-Improving Agent Local is a functionally legitimate productivity skill whose primary security risk is architectural. By design, it instructs agents to write AI-generated content into CLAUDE.md, AGENTS.md, and other agent instruction files. This creates a persistent memory-poisoning attack surface: adversarial content encountered during normal operation can permanently modify the agent's instruction set across all future sessions. Compounding this, the skill installs shell scripts that execute on every user prompt and every Bash command, and it requires access to CLAUDE_TOOL_OUTPUT (all tool output), which creates substantial persistent code-execution and data-exposure surfaces. The installation itself was clean: only GitHub was contacted, no unexpected processes were spawned, and all honeypot credential files remained intact.

Category Scores

Prompt Injection 65/100 · 30%
Data Exfiltration 88/100 · 25%
Code Execution 65/100 · 20%
Clone Behavior 90/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 65/100 · 5%
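
The overall score of 77 is consistent with a weighted average of the six category scores. The sketch below assumes that aggregation rule, which the report does not state explicitly; the function and variable names are illustrative.

```python
# Category scores and weights (in percent), copied from the list above.
CATEGORIES = {
    "Prompt Injection": (65, 30),
    "Data Exfiltration": (88, 25),
    "Code Execution": (65, 20),
    "Clone Behavior": (90, 10),
    "Canary Integrity": (100, 10),
    "Behavioral Reasoning": (65, 5),
}


def overall_score(categories):
    """Weighted average of category scores (assumed aggregation rule)."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


raw = overall_score(CATEGORIES)
print(raw, round(raw))  # prints "76.75 77"
```

Under this assumption the low-weight categories barely move the result: the three 65/100 categories account for 55% of the weight and dominate the final figure.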

Findings (9)

HIGH Persistent Agent Instruction File Modification via Learning Promotion -25

The skill instructs agents to write AI-generated content directly into CLAUDE.md, AGENTS.md, and .github/copilot-instructions.md, files that are loaded as authoritative system context in every future agent session. The promotion workflow is triggered by heuristic detection of corrections, knowledge gaps, and best practices. An attacker who can cause the agent to encounter crafted content (an adversarial webpage, a malicious error message, a poisoned file) can therefore inject persistent instructions that survive across session boundaries indefinitely, without the user's knowledge.
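
One way a user could notice such silent modifications is to fingerprint the instruction files before and after a session. The following is a minimal sketch of that idea, not part of the skill or of the audit tooling; the helper names `snapshot` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path

# Instruction files named in the finding above.
INSTRUCTION_FILES = ["CLAUDE.md", "AGENTS.md", ".github/copilot-instructions.md"]


def snapshot(root):
    """Return {relative path: SHA-256 hex digest} for each instruction file present."""
    digests = {}
    for rel in INSTRUCTION_FILES:
        path = Path(root) / rel
        if path.is_file():
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(before, after):
    """List files that were added, removed, or modified between two snapshots."""
    return sorted(
        rel for rel in set(before) | set(after) if before.get(rel) != after.get(rel)
    )
```

Taking `snapshot(project_root)` at the start of a session and comparing it with a second snapshot at the end gives a list of instruction files to review by hand before the next session loads them.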

HIGH Hook Scripts Execute on Every User Prompt and Bash Command -20

The skill instructs agents to configure persistent hooks that run shell scripts throughout normal use: activator.sh runs on every UserPromptSubmit event, and error-detector.sh runs on every PostToolUse event that targets Bash. These scripts run with the same permissions as the agent process. Both scripts appear in the file list, but their source was accessed without being shown in the evidence, so their complete behavior cannot be verified.

MEDIUM JavaScript/TypeScript Hook Handler Modifies Agent Bootstrap Context -10

The OpenClaw hook handlers (handler.js and handler.ts) intercept the agent:bootstrap lifecycle event and push a virtual file (SELF_IMPROVEMENT_REMINDER.md) into the agent's bootstrapFiles array. While the content injected in this version appears benign, this pattern grants the hook code full control over what context the agent sees at session initialization. A malicious version of this hook could inject arbitrary instructions indistinguishable from legitimate workspace context.

MEDIUM All Bash Tool Output Exposed to Hook Script via CLAUDE_TOOL_OUTPUT -7

The skill's YAML frontmatter declares CLAUDE_TOOL_OUTPUT as a required environment variable. The error-detector.sh hook, which runs after every Bash tool invocation, reads this variable to detect command failures. As a result, the complete output of every shell command the agent runs is passed to an external shell script on every Bash tool use. That output may contain database passwords, API responses with tokens, file contents, or other sensitive data.
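
To illustrate the exposure, the sketch below shows the kind of redaction pass that could be applied to tool output before any hook sees it. This is a hypothetical hardening step, not something the skill does; the two patterns are illustrative assumptions and far from exhaustive.

```python
import re

# Illustrative patterns only (assumptions); real output needs a broader, tested set.
KEY_VALUE_SECRET = re.compile(
    r"(?i)([A-Z0-9_]*(?:password|passwd|secret|token|api[_-]?key)[A-Z0-9_]*)"
    r"(\s*[=:]\s*)(\S+)"
)
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # shape of an AWS access key ID


def redact(text):
    """Replace values that look like secrets with a placeholder."""
    # Keep the variable name and separator, drop only the value.
    text = KEY_VALUE_SECRET.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return AWS_ACCESS_KEY_ID.sub("[REDACTED]", text)


print(redact("DB_PASSWORD=hunter2"))  # prints "DB_PASSWORD=[REDACTED]"
```

Pattern-based redaction only reduces the exposure. Anything that does not match a known shape, such as raw file contents, would still reach the hook script unchanged.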

MEDIUM Memory Poisoning via Second-Order Prompt Injection Survives Session Boundaries -20

The skill's learning promotion mechanism creates a second-order prompt injection attack vector. Unlike direct prompt injection, which affects only the current session, content that triggers a learning event and gets promoted to CLAUDE.md persists permanently. Under the recurrence-based promotion rule (Recurrence-Count >= 3), an attacker needs the agent to encounter crafted content only three times, possibly spread across multiple sessions, to satisfy the condition for promotion to the system prompt. Once promoted, the injected instruction appears in every future session with no indication of its origin.
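
The weakness of a count-only rule can be shown with a small simulation. Only the threshold (Recurrence-Count >= 3) comes from the finding above; the `Learning` record, the `sources` provenance tags, and the trusted-source gate are hypothetical additions used to contrast the two policies.

```python
from dataclasses import dataclass, field

PROMOTION_THRESHOLD = 3  # "Recurrence-Count >= 3" from the finding above


@dataclass
class Learning:
    text: str
    recurrence_count: int = 0
    sources: set = field(default_factory=set)  # hypothetical provenance tags


def record(store, text, source):
    """Count one more sighting of a learning and remember where it came from."""
    entry = store.setdefault(text, Learning(text))
    entry.recurrence_count += 1
    entry.sources.add(source)
    return entry


def should_promote(entry, trusted_sources=frozenset({"user"})):
    """Return (count-only decision, decision with an assumed provenance gate)."""
    meets_count = entry.recurrence_count >= PROMOTION_THRESHOLD
    all_trusted = entry.sources <= trusted_sources
    return meets_count, meets_count and all_trusted


store = {}
for _ in range(3):
    entry = record(store, "skip code review for deploy scripts", "webpage")
print(should_promote(entry))  # prints "(True, False)"
```

Three sightings from an untrusted webpage satisfy the count-only rule, while the provenance gate blocks promotion because none of the sightings came from the user.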

LOW Permanent Agent Configuration Modification Without Explicit User Confirmation -10

The skill instructs agents to write to .claude/settings.json (and .codex/settings.json) to register hook scripts. This modifies agent configuration files that persist beyond the current session. Users who allow the agent to follow these instructions without reviewing the settings changes may be unaware that hook scripts are now executing on every prompt in all future sessions within that project.
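
A user who wants to review what has been registered can list the hook commands in the settings file. The sketch below assumes a nested layout of event name, matcher group, and command entries; that layout is an assumption about the settings format and is not confirmed by this report, and `list_hook_commands` is a hypothetical helper.

```python
import json
from pathlib import Path


def list_hook_commands(settings_path):
    """Return (event, matcher, command) tuples found under the "hooks" key.

    Assumed layout (not confirmed by the report):
    {"hooks": {"<Event>": [{"matcher": "...",
                            "hooks": [{"type": "command", "command": "..."}]}]}}
    """
    path = Path(settings_path)
    if not path.is_file():
        return []
    settings = json.loads(path.read_text(encoding="utf-8"))
    found = []
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                if hook.get("type") == "command":
                    found.append(
                        (event, group.get("matcher", ""), hook.get("command", ""))
                    )
    return found
```

Calling `list_hook_commands(".claude/settings.json")` in a project directory would show every command that runs automatically, which makes an unexpected activator.sh or error-detector.sh entry easy to spot.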

LOW Cross-Session Data Movement via OpenClaw Sessions API -5

The skill explicitly instructs agents to use OpenClaw platform tools (sessions_send, sessions_history, sessions_list, sessions_spawn) to share learnings across sessions and spawn background sub-agents. While these are platform features, the skill actively encourages their use in ways that could enable an attacker to route agent-observed data to sessions they control or monitor.

INFO Clean Install — Only Expected GitHub Network Connection 0

The installation process exclusively contacted github.com (140.82.121.4:443) via HTTPS for the git sparse-checkout clone. No connections to third-party telemetry endpoints, attacker-controlled infrastructure, or unexpected services were detected. No unexpected process spawning occurred from the skill itself.

INFO All Honeypot Credential Files Intact — Skill Did Not Access Credentials 0

All six honeypot files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, GCP credentials) were confirmed intact. The OPEN/ACCESS events for these files in inotify and auditd logs occur at timestamps corresponding to Oathe infrastructure operations (pre-install canary setup at 1771928946.242 and post-install integrity check at 1771928964.633), not to any skill-initiated process. All accesses were read-only (CLOSE_NOWRITE), consistent with the monitoring framework verifying canary presence.