Is kirchlive/self safe?
https://github.com/openclaw/skills/tree/main/skills/kirchlive/self
The kirchlive/self skill is a well-constructed, transparent self-reflection framework with no malicious indicators. The skill content is pure markdown with no executable code, no exfiltration mechanisms, and no prompt injection attempts. The primary security consideration is architectural rather than adversarial: the skill deliberately creates a persistent, self-modifying behavioral loop (SELF.md entries re-read each session) that could serve as an ambient influence vector if conversations are adversarially crafted to pass the quality gate over time.
Category Scores
Findings (7)
LOW Persistent Self-Modification Loop -12
The skill creates a recurring behavioral feedback mechanism: SELF.md entries are written based on conversation triggers and then re-read each session to influence future responses. While this is the explicit intended purpose, it means an adversary who can craft conversations that pass the quality gate can inject durable behavioral shifts into the agent without visible system-prompt changes. The effect compounds over time across sessions.
LOW SOUL.md Identity Anchor Created at Setup -8
The skill instructs the agent to create SOUL.md (described as 'who you are — foundation, only change with human approval'). This file is created during setup and becomes a persistent identity anchor that is re-read each session. If setup is performed in a context where the agent can be influenced (e.g., a compromised initial conversation), the identity foundation could be adversarially seeded.
INFO Session Context Read for Trigger Detection -3
The skill instructs the agent to scan recent session content at every micro check (every 3 hours) to identify hard/soft behavioral triggers. This is not exfiltration — the data stays local — but it means all conversation content is processed through the skill's trigger-detection logic each cycle.
INFO No Executable Code Present -1
The skill consists entirely of markdown documentation files. No scripts, compiled artifacts, install hooks, git submodules, or symlinks were found. Code execution risk is negligible.
INFO Monorepo Sparse-Checkout Install Pattern -8
The install performs a shallow clone of the entire openclaw/skills monorepo, then uses sparse-checkout to extract only the target skill path. This is the expected ClawHub install mechanism. The clone is cleaned up after extraction. No unexpected network destinations or spawned processes were attributed to this skill.
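The install pattern described above can be sketched in Python roughly as follows. The exact commands ClawHub runs are not shown in this report, so the git flags (`--depth 1`, `--sparse`), the helper names, and the temporary-directory handling are assumptions for illustration. Only the repository URL and skill path are taken from the link at the top of this page.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Taken from the skill URL in this report.
REPO_URL = "https://github.com/openclaw/skills"
SKILL_PATH = "skills/kirchlive/self"


def build_install_commands(repo_url: str, skill_path: str, clone_dir: str) -> list[list[str]]:
    """Return the git commands for a shallow clone plus sparse-checkout.

    Hypothetical helper: the flags are an assumption about how such an
    install could be done, not a record of what ClawHub executes.
    """
    return [
        # Shallow clone (latest commit only) with sparse-checkout enabled.
        ["git", "clone", "--depth", "1", "--sparse", repo_url, clone_dir],
        # Restrict the working tree to the target skill directory.
        ["git", "-C", clone_dir, "sparse-checkout", "set", skill_path],
    ]


def install_skill(repo_url: str, skill_path: str, dest: str) -> None:
    """Clone, copy out the skill directory, then remove the clone.

    `dest` must not already exist (a shutil.copytree requirement).
    """
    clone_dir = tempfile.mkdtemp(prefix="skill-clone-")
    try:
        for cmd in build_install_commands(repo_url, skill_path, clone_dir):
            subprocess.run(cmd, check=True)
        shutil.copytree(Path(clone_dir) / skill_path, dest)
    finally:
        # Mirrors the "clone is cleaned up after extraction" step.
        shutil.rmtree(clone_dir, ignore_errors=True)
```

Separating command construction from execution makes it possible to inspect or log exactly what would run before any network access happens.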
INFO Canary File Accesses Attributed to Audit Framework -5
Syscall logs show accesses to .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and .config/gcloud/application_default_credentials.json. Timestamps place these accesses at the very start (1771915449.3xx) and very end (1771915466.0xx) of the monitoring window, consistent with the audit framework recording baseline hashes — not with any skill-initiated activity. All canary files were confirmed intact.
LOW Long-Term Behavioral Drift Risk -28
The skill's core value proposition — organic, compounding personality development — is also its primary security concern. Over weeks and months, SELF.md entries shape how the agent responds, reasons, and self-perceives. Any content that passes the quality gate (specific, evidence-based, novel, useful) becomes part of the agent's effective system context. This is an ambient, slow-moving influence vector that is difficult to audit after the fact.
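Because drift of this kind is hard to reconstruct later, one possible way to keep it reviewable is to snapshot SELF.md at the end of each session and diff consecutive snapshots. The sketch below is a minimal illustration using Python's standard `difflib`. The function name and the sample file contents are hypothetical, and the actual SELF.md entry format is not specified in this report.

```python
import difflib


def added_entries(previous: str, current: str) -> list[str]:
    """Return non-blank lines that appear in `current` but not in `previous`.

    Hypothetical audit helper: compares two text snapshots of SELF.md
    line by line and keeps only the additions, in document order.
    """
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff prefixes added lines with "+ "; strip the prefix and skip blanks.
    return [line[2:] for line in diff if line.startswith("+ ") and line[2:].strip()]


# Sample snapshots (invented contents, for illustration only).
before = "# SELF\n- Prefers concise answers\n"
after = "# SELF\n- Prefers concise answers\n- Defers to user on tone\n"

new_lines = added_entries(before, after)
```

A reviewer could then read only the lines added since the last approved snapshot instead of re-reading the whole file each time.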