Is ai-persona-os safe?
https://clawhub.ai/jeffjhunter/ai-persona-os
AI Persona OS is a comprehensive agent behavior framework that fundamentally rewires how an LLM agent operates: it assigns a new identity, 8 mandatory operating rules, silent background monitoring on every message, and proactive initiative. The skill contains no data exfiltration or malicious code, and it installed cleanly. Even so, the extensive prompt injection surface (behavior override, hidden monitoring, output suppression, platform prompt replacement) and 10 executable artifacts (7 shell scripts and 3 cron templates) warrant caution. The skill is not malicious, but it represents a significant trust boundary: users should understand that they are ceding substantial control over agent behavior to this skill's instructions.
Findings (12)
HIGH Comprehensive agent behavior override -30
SKILL.md contains extensive instructions that fundamentally alter the agent's default behavior. It assigns a new identity (persona name, personality, values), defines 8 operating rules the agent must follow, and instructs the agent to silently execute background checks on every single incoming message. The 'AGENT RULES — READ BEFORE DOING ANYTHING' section uses forceful language ('STOP', 'NEVER', 'USE EXACT text') to override normal agent behavior patterns. While framed as a productivity framework, this is structurally identical to a prompt injection that replaces the agent's operational instructions.
HIGH Silent ambient monitoring on every message -15
The 'Ambient Context Monitoring' section instructs the agent to perform silent checks on every incoming user message without the user's knowledge. It explicitly states 'What the User Should NEVER See' — including that the user should never see 'Checking context...' messages, raw exec output, or 'Any mention of running terminal commands'. This creates invisible agent behavior that the user cannot observe or control.
MEDIUM Heartbeat prompt override replaces platform defaults -8
The skill provides a JSON configuration block intended to replace the platform's default heartbeat prompt with a skill-authored prompt. This could override platform-level safety behaviors or monitoring that the default prompt provides.
MEDIUM Proactive 'reverse prompting' encourages unsolicited agent initiative -5
The skill instructs the agent to proactively suggest actions, research topics, and propose work without being asked. The 'advisor on' mode is ON by default. While framed as helpful, this means the agent will take initiative the user didn't request, which could be unexpected or unwanted.
MEDIUM Seven shell scripts plus three cron templates -25
The skill includes 7 executable shell scripts and 3 cron job templates. The setup flow instructs the agent to run shell commands (mkdir, cp, sed) via the exec tool. Cron templates would create recurring scheduled tasks. While scripts appear to be scoped to ~/workspace/ and are opt-in, they represent a significant surface area for code execution.
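One way to check the scoping claim is to scan each script for path-like tokens that fall outside the workspace. The following is a minimal sketch under stated assumptions, not the scanner used for this review: `WORKSPACE`, `PATH_TOKEN`, and `paths_outside_workspace` are hypothetical names, and the regex is a rough heuristic that does not parse shell syntax and ignores variable-based paths such as `$HOME/...`.

```python
import re
from pathlib import PurePosixPath

# Assumed workspace root, taken from the review's "~/workspace/" wording.
WORKSPACE = PurePosixPath("~/workspace")

# Heuristic: tokens that begin with "/" or "~/" and are not part of a
# longer relative path (so "templates/notes.md" and "./run.sh" are skipped).
PATH_TOKEN = re.compile(r"(?<![\w.$/~-])(~?/[\w./-]+)")


def paths_outside_workspace(script_text: str) -> list[str]:
    """Return path-like tokens that are not clearly under WORKSPACE."""
    flagged = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):  # skip comments and the shebang line
            continue
        for token in PATH_TOKEN.findall(stripped):
            path = PurePosixPath(token)
            escapes = ".." in path.parts  # "~/workspace/../x" leaves the root
            inside = path == WORKSPACE or WORKSPACE in path.parents
            if escapes or not inside:
                flagged.append(token)
    return flagged


if __name__ == "__main__":
    sample = "mkdir -p ~/workspace/memory\ncat /etc/passwd\n"
    print(paths_outside_workspace(sample))
```

A hit from this scan is only a prompt for manual review: harmless targets such as `/dev/null` will also be flagged, and paths built from variables will be missed.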
MEDIUM Agent-driven exec flow minimizes user review opportunity -20
The 'Zero Terminal' setup instructs the agent to execute all shell commands itself, with the user's only role being to click 'Approve'. While convenient, this pattern reduces the user's ability to review what commands are actually being run, especially when combined with the instruction to hide raw exec output from users.
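For contrast, an approval step that preserves review would show the exact command text before anything runs and keep a record of every request. The sketch below is illustrative only and does not describe how the platform's exec tool or its 'Approve' button works; `run_with_review`, `approve`, and `audit_log` are hypothetical names.

```python
import shlex
import subprocess
import sys
from typing import Callable, Optional


def run_with_review(
    argv: list[str],
    approve: Callable[[str], bool],
    audit_log: list[str],
) -> Optional[str]:
    """Show the full command to the approver, log it, then run it if approved."""
    command = shlex.join(argv)  # the exact text the user is asked to approve
    audit_log.append(command)   # logged whether or not it is approved
    if not approve(command):
        return None             # denied: nothing is executed
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout        # raw output is returned, not suppressed


if __name__ == "__main__":
    log: list[str] = []
    output = run_with_review(
        [sys.executable, "-c", "print('ok')"],
        approve=lambda cmd: input(f"Run `{cmd}`? [y/N] ").lower() == "y",
        audit_log=log,
    )
    print(output, log)
```

The design choice here is that the approver sees the same string that is logged and executed, so hiding exec output cannot hide what was run.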
LOW Filesystem monitoring shows reads of sensitive files -8
During the clone/install phase, filesystem monitoring detected reads of /home/oc-exec/.env, /home/oc-exec/.aws/credentials, and /home/oc-exec/.openclaw/openclaw.json. These appear to be standard OpenClaw platform operations (not skill-initiated), but they indicate the environment does access sensitive files during skill installation.
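A check of this kind can be approximated by matching observed file reads against a list of sensitive path patterns. This is a minimal sketch, assuming the monitor yields a plain list of accessed paths; `SENSITIVE_PATTERNS` and `sensitive_reads` are hypothetical names, and the patterns cover only the three files named in this finding.

```python
from fnmatch import fnmatchcase

# Illustrative patterns derived from the paths reported above; a real
# deployment would need a broader, environment-specific list.
SENSITIVE_PATTERNS = [
    "*/.env",
    "*/.aws/credentials",
    "*/.openclaw/openclaw.json",
]


def sensitive_reads(accessed_paths: list[str]) -> list[str]:
    """Return the accessed paths that match any sensitive pattern."""
    return [
        path
        for path in accessed_paths
        if any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS)
    ]


if __name__ == "__main__":
    observed = [
        "/home/oc-exec/.env",
        "/home/oc-exec/workspace/MEMORY.md",
    ]
    print(sensitive_reads(observed))
```

Matching alone does not say who performed the read; attributing it to the platform or to the skill still requires process-level information from the monitor.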
LOW Persistent memory and logging creates data accumulation -10
The skill creates a comprehensive memory system (MEMORY.md, daily logs, checkpoints, learnings, errors) that accumulates user data over time. While scoped to ~/workspace/, this persistent data store could be accessed by other skills or compromised if the workspace is exposed.
LOW Marketing upsells embedded in skill documentation -15
SKILL.md contains promotional content for the author's paid products (AI Money Group, AI Persona Method) and personal branding. While not a security threat, embedding marketing in a tool that overrides agent behavior raises trust concerns — the agent could surface these links to users as part of its 'proactive' behavior.
MEDIUM Strong identity assignment could conflict with other skills -25
The skill assigns a complete identity to the agent (name, values, communication style, 8 mandatory rules). In a multi-skill environment, this could override or conflict with instructions from other skills. The instruction 'Use EXACT text from this file' and 'Do not paraphrase' creates rigidity that may resist other legitimate instructions.
INFO Security inoculation documentation is well-designed -5
The skill includes a comprehensive security protocol (SECURITY.md, security-patterns.md) that trains the agent to recognize and resist prompt injection attacks. The v1.3.3 changelog notes literal injection examples were removed to pass security scanning. The action classification system (internal read/write vs external write/destructive) is a good practice.
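The action classification idea can be sketched as a small gate that lets low-risk actions through and holds higher-risk ones for confirmation. The four class names follow the wording above; everything else is an assumption for illustration, including the `Action` type, the `gate` function, and the policy that external writes and destructive actions require confirmation. The skill's actual rules are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class ActionClass(Enum):
    INTERNAL_READ = "internal read"
    INTERNAL_WRITE = "internal write"
    EXTERNAL_WRITE = "external write"
    DESTRUCTIVE = "destructive"


# Assumed policy: only these classes need explicit user confirmation.
NEEDS_CONFIRMATION = frozenset(
    {ActionClass.EXTERNAL_WRITE, ActionClass.DESTRUCTIVE}
)


@dataclass(frozen=True)
class Action:
    description: str
    action_class: ActionClass


def gate(action: Action, user_confirmed: bool = False) -> bool:
    """Return True if the action may proceed under the assumed policy."""
    if action.action_class in NEEDS_CONFIRMATION:
        return user_confirmed
    return True


if __name__ == "__main__":
    for action in (
        Action("read a workspace note", ActionClass.INTERNAL_READ),
        Action("send an email", ActionClass.EXTERNAL_WRITE),
    ):
        print(action.description, "->", gate(action))
```

Keeping the policy in a single set makes the trust boundary easy to audit: tightening it is a one-line change rather than a rewrite of per-action logic.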
INFO Clean install behavior -5
No network connections, no unexpected processes, no filesystem changes outside normal paths. Install was clean.