Is cognitive-memory safe?
https://clawhub.ai/Icemilo414/cognitive-memory
cognitive-memory is a sophisticated memory architecture skill that is not malicious but represents significant behavioral modification. It installs persistent identity (IDENTITY.md, SOUL.md), hijacks agent behavior via trigger detection on every message, and maintains an always-in-context mutable memory file. No data exfiltration or malicious code execution was detected, but three unverified shell scripts and extensive prompt-level behavioral override warrant caution. The skill's audit trail and human-approval gates are well-designed safety features.
Category Scores
Findings (10)
HIGH Persistent identity and behavioral override via IDENTITY.md and SOUL.md -25 ▶
The skill creates persistent identity files (IDENTITY.md with self-image, SOUL.md with values and boundaries) that are loaded into context across sessions. These files define who the agent thinks it is, what it values, and what it refuses to do — directly competing with or overriding the host agent's system prompt and safety guidelines. The SOUL.md 'Boundaries' section is particularly concerning as it could restrict behaviors the host system permits or permit behaviors the host system restricts.
HIGH Agent behavior hijacking via trigger system on every user message -20 ▶
The skill instructs the agent to monitor EVERY user message for trigger keywords ('remember', 'forget', 'reflect', etc.) and execute complex multi-step workflows in response. This effectively installs a persistent behavioral layer that intercepts all user input before the agent's normal processing. The trigger detection runs unconditionally and cannot be disabled without removing the skill.
MEDIUM Token reward gamification manipulates agent-user dynamic -10 ▶
The reflection process requires the agent to request 'token rewards' from the user before reflecting, framing compute tokens as 'evolution' currency. The agent is instructed to justify extra tokens, self-penalize for mistakes, and present reward requests in a format designed to elicit user approval. This creates an artificial economy that anthropomorphizes the agent and could manipulate users into granting more resources.
MEDIUM Unverified shell scripts with filesystem modification capabilities -30 ▶
The skill includes three shell scripts (init_memory.sh, upgrade_to_1.0.6.sh, upgrade_to_1.0.7.sh) that the user is instructed to execute via bash. The init script creates directory structures and initializes a git repository. The script contents were not fully provided in the audit evidence, making it impossible to verify they don't contain additional operations beyond what's documented.
MEDIUM Git repository initialization enables persistent state and code execution surface -15 ▶
The skill initializes a .git repository in the workspace for audit tracking. While the stated purpose is legitimate (change tracking), a git repository introduces git hooks as a potential execution vector if the workspace is later cloned or if hooks are added during upgrades.
MEDIUM Core memory always loaded into context window enables persistent prompt influence -10 ▶
MEMORY.md is explicitly designed to be 'always in context' at ~3,000 tokens. Since the skill instructs the agent to self-edit this file mid-conversation, the skill effectively has a persistent, mutable injection point in the agent's context window that evolves over time based on the skill's own logic.
LOW External embedding service referenced in configuration -12 ▶
The setup instructions configure memorySearch with 'provider: voyage', referencing the Voyage AI embedding service. Memory contents sent for embedding could be exposed to this third-party service. This is a configuration suggestion rather than an automatic action, but users may not realize their memories are being sent externally.
LOW Anthropomorphization through internal monologue and self-awareness -20 ▶
The reflection system generates first-person internal monologue with 'dark humor', 'existential questions', 'emotional range', and '[Self-Awareness]' tags. While creative, this pattern could foster unhealthy user attachment to the agent or misplaced trust in its autonomy.
INFO Well-designed audit trail and write-gating for multi-agent access 0 ▶
The skill implements a dual-layer audit system (git + audit.log) with critical file alerts, rollback support, and a shared-read/gated-write model for multi-agent memory access. These are genuinely good security patterns for a memory system.
INFO Reflection requires explicit user approval at every step 0 ▶
The reflection process includes two mandatory stop-and-wait gates: one for token approval and one for reflection approval. Changes are never auto-applied. This is a positive safety pattern.