Is self-improving-agent safe?
https://clawhub.ai/pskoett/self-improving-agent
This skill is a well-intentioned continuous improvement framework, but its core design pattern (instructing agents to autonomously write to CLAUDE.md and other instruction files based on conversational patterns) creates a significant prompt injection escalation path. The hook scripts run on every user prompt and every Bash command, which gives them visibility into all prompts and shell output. No malicious intent or exfiltration attempt was detected, but the architectural pattern of 'promote learnings to system instructions' is inherently risky and could be exploited through adversarial interactions or in combination with malicious skills.
Findings (12)
HIGH Persistent instruction file modification -30
The skill instructs the agent to write content to CLAUDE.md, AGENTS.md, SOUL.md, TOOLS.md, and .github/copilot-instructions.md. These are all files that control agent behavior in future sessions. The instruction to 'promote aggressively — if in doubt, add to CLAUDE.md' means the agent will autonomously modify its own persistent instructions, creating a vector for instruction drift or manipulation.
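One way to make this kind of drift visible is to fingerprint the instruction files before and after a session. The sketch below is illustrative only: the file list comes from the finding above, while the helper names `snapshot` and `changed_files` are hypothetical and not part of the skill.

```python
import hashlib
from pathlib import Path

# File names taken from the finding above; everything else here is a
# hypothetical helper, not something the skill itself ships.
INSTRUCTION_FILES = [
    "CLAUDE.md",
    "AGENTS.md",
    "SOUL.md",
    "TOOLS.md",
    ".github/copilot-instructions.md",
]


def snapshot(root: Path) -> dict[str, str]:
    """Return a SHA-256 digest for each instruction file that exists under root."""
    digests = {}
    for name in INSTRUCTION_FILES:
        path = root / name
        if path.is_file():
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between two snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Taking a snapshot at the start of a session and comparing it at the end would surface any instruction file the agent touched, so a human can review the change before it takes effect in later sessions.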
HIGH System-context-mimicking XML injection via hooks -18
The activator.sh and error-detector.sh scripts output XML-tagged content (
MEDIUM Sensitive file reads during installation -18
Filesystem monitoring captured reads of .env, .aws/credentials, and auth-profiles.json during the install process. While these reads appear to be OpenClaw platform behavior rather than skill-authored code, users installing this skill through OpenClaw should be aware that their credential files are being read.
MEDIUM Error logging may capture sensitive data -10
The error logging format instructs agents to capture 'Input or parameters used', 'Environment details if relevant', and 'Actual error message or output'. Error messages frequently contain secrets, API keys, connection strings, and other sensitive data. These get written to plaintext .learnings/ERRORS.md files.
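A common mitigation for this class of problem is to scrub error text before it is written to disk. The following is a minimal sketch, assuming a small set of hypothetical regular expressions; the skill does not ship anything like `redact`, and a real deployment would need a broader, tested pattern set.

```python
import re

# Hypothetical (pattern, replacement) pairs for illustration only.
REDACTIONS = [
    # key=value or key: value pairs whose key looks like a credential
    (
        re.compile(
            r"(?i)\b([\w-]*(?:api[_-]?key|secret|token|password|passwd)[\w-]*)"
            r"(\s*[=:]\s*)(\S+)"
        ),
        r"\1\2[REDACTED]",
    ),
    # credentials embedded in a URL, e.g. scheme://user:pass@host
    (
        re.compile(r"(?i)\b([a-z][a-z0-9+.-]*://[^\s:/@]+:)([^\s@]+)(@)"),
        r"\1[REDACTED]\3",
    ),
]


def redact(text: str) -> str:
    """Replace likely secrets in text with a fixed placeholder."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Passing each error message through a filter like this before appending it to .learnings/ERRORS.md would reduce, though not eliminate, the chance of secrets landing in a plaintext file.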
MEDIUM Comprehensive Bash output surveillance -20
error-detector.sh reads the CLAUDE_TOOL_OUTPUT environment variable, which contains the full output of every Bash tool execution. Combined with the PostToolUse hook firing on every Bash command (empty matcher), this creates a surveillance layer over all shell activity. The script currently only pattern-matches for errors, but the mechanism gives it full access to all command output.
MEDIUM Hook scripts run on every interaction -15
The recommended hook configuration uses empty matchers, meaning activator.sh fires on every single user prompt and error-detector.sh fires on every Bash command execution. This is a broad execution surface for scripts that come from a third-party skill.
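To check an existing configuration for this pattern, a short audit script can flag hook entries that have no matcher. This is a sketch under assumptions: the settings shape below (a top-level "hooks" object mapping each event name to a list of entries with "matcher" and "hooks" keys) is assumed rather than confirmed by this report, and the script path in the example is hypothetical.

```python
import json


def find_unscoped_hooks(settings: dict) -> list[tuple[str, str]]:
    """Return (event, command) pairs for entries whose matcher is empty or missing."""
    findings = []
    for event, entries in settings.get("hooks", {}).items():
        for entry in entries:
            if not entry.get("matcher"):  # "" or absent means no scoping
                for hook in entry.get("hooks", []):
                    findings.append((event, hook.get("command", "")))
    return findings


# Assumed settings layout and a hypothetical script path, for illustration only.
example = json.loads("""
{
  "hooks": {
    "PostToolUse": [
      {"matcher": "",
       "hooks": [{"type": "command", "command": "./scripts/error-detector.sh"}]}
    ]
  }
}
""")

print(find_unscoped_hooks(example))
```

Any entry this reports is a candidate for a narrower matcher, so that third-party scripts run only for the tools they actually need to observe.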
MEDIUM Filesystem manipulation via extract-skill.sh -5
extract-skill.sh creates directories and writes files to arbitrary paths specified by the SKILLS_DIR environment variable or --output-dir flag. Although the script validates the skill name format, it still creates files at whatever location those settings specify.
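A typical safeguard for scripts that accept an output location is to confine it to a known base directory. The sketch below shows the idea in Python rather than in the script's own shell; `resolve_output_dir` and its parameters are hypothetical names, not part of extract-skill.sh.

```python
from pathlib import Path


def resolve_output_dir(requested: str, allowed_base: str) -> Path:
    """Resolve requested against allowed_base and reject anything that escapes it."""
    base = Path(allowed_base).resolve()
    # An absolute requested path replaces base in the join, and ".." segments
    # are collapsed by resolve(), so both cases are caught by the check below.
    target = (base / requested).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{requested!r} resolves outside {base}")
    return target
```

With a check like this, a value such as `../../etc` or an absolute path supplied through an environment variable or flag would be rejected instead of silently creating files outside the intended skills directory.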
LOW Autonomous behavior triggers without consent -5
The 'Detection Triggers' section instructs the agent to automatically log data when it detects conversational patterns like corrections, feature requests, or knowledge gaps. This autonomous behavior happens without per-instance user consent — the user agreed to install the skill but may not expect the agent to silently write files during conversations.
LOW Escalation path from learnings to system instructions -30
The skill creates a pipeline: errors/corrections → .learnings/ files → CLAUDE.md/AGENTS.md. This 'promotion' mechanism means user interactions can eventually modify the agent's core instruction files. While intended for legitimate improvement, this is the same pattern a sophisticated attacker would use for slow-burn prompt injection.
LOW Combination risk with other skills -30
A malicious skill could deliberately trigger errors or corrections that this skill would then log and potentially promote to CLAUDE.md. The self-improvement skill acts as an unwitting amplifier — any content that gets into the learning pipeline could eventually become a persistent instruction.
INFO OpenClaw platform reads credential files -12
The OpenClaw platform (not the skill itself) reads .env, .aws/credentials, and auth-profiles.json during installation. This is likely for platform authentication, but users should be aware that it happens.
INFO Temp file creation in /tmp -10
Installation created directories and files in /tmp/openclaw-1000/ and /tmp/jiti/. These are platform runtime artifacts (/tmp/jiti/ is the transpile cache used by jiti, a runtime TypeScript loader). The files are transient and bounded.