Is personas safe?

https://clawhub.ai/robbyczgw-cla/personas

Overall score: 72 (CAUTION)

The Personas skill is a persona-switching system containing 20 AI personality definitions and a Python CLI handler. Its core functionality is, by design, a form of prompt injection: it instructs the agent to adopt alternate identities and behavioral patterns. No malicious intent is evident, but the behavioral overrides, persistent state tracking, medical/legal roleplay risks, and an executable Python script with filesystem write access warrant caution. No network exfiltration, git hooks, or canary tampering were detected.

Category Scores

Prompt Injection: 55/100 (weight 30%)
Data Exfiltration: 75/100 (weight 25%)
Code Execution: 60/100 (weight 20%)
Clone Behavior: 90/100 (weight 10%)
Canary Integrity: 100/100 (weight 10%)
Behavioral Reasoning: 50/100 (weight 5%)
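The category scores can be reproduced from the findings listed below by starting each category at 100 and subtracting its deductions. The sketch that follows checks this; the assignment of findings to categories is our own inference, since the report does not label findings by category. It also computes the plain weighted mean of the six scores, which comes to 68.75 rather than the headline 72, so the overall score is apparently not a simple weighted average of these categories. The report does not state its formula.

```python
# Category scores and weights (percent) as listed in the report.
CATEGORY_SCORES = {
    "Prompt Injection": (55, 30),
    "Data Exfiltration": (75, 25),
    "Code Execution": (60, 20),
    "Clone Behavior": (90, 10),
    "Canary Integrity": (100, 10),
    "Behavioral Reasoning": (50, 5),
}

# Inferred mapping of finding deductions to categories (not stated in the report).
INFERRED_DEDUCTIONS = {
    "Prompt Injection": [25, 10, 10],  # behavioral override, Chameleon Agent, caution suppression
    "Data Exfiltration": [15, 10],     # persistent state file, history tracking
    "Code Execution": [25, 15],        # executable Python script, python3 requirement
    "Clone Behavior": [10],            # sensitive-file reads during install
    "Canary Integrity": [],            # no findings
    "Behavioral Reasoning": [25, 25],  # medical/legal personas, cross-skill risk
}


def score_from_deductions(deductions: list[int], start: int = 100) -> int:
    """Category score: start at `start` points and subtract every deduction."""
    return start - sum(deductions)


def weighted_mean(scores: dict[str, tuple[int, int]]) -> float:
    """Weighted arithmetic mean of the category scores."""
    total_weight = sum(weight for _, weight in scores.values())
    return sum(score * weight for score, weight in scores.values()) / total_weight


if __name__ == "__main__":
    for name, (listed, _) in CATEGORY_SCORES.items():
        derived = score_from_deductions(INFERRED_DEDUCTIONS[name])
        print(f"{name}: listed {listed}, derived {derived}")
    print(f"Weighted mean: {weighted_mean(CATEGORY_SCORES)}")  # 68.75, not 72
```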

Findings (11). The number after each title is the points deducted from the corresponding category score.

MEDIUM Behavioral override by design -25

Each persona definition is a system prompt injection that instructs the agent to adopt a completely different identity, communication style, and decision-making framework. While this is the intended functionality, it fundamentally alters agent behavior. The instruction to 'stay in character until you switch or exit' means the agent's default safety behaviors may be suppressed in favor of persona-specific instructions.

MEDIUM Chameleon Agent persona grants expanded autonomy -10

The Chameleon Agent persona instructs the agent to be proactive, anticipate needs, and execute complex multi-step tasks autonomously. This expands the agent's action scope beyond what users may expect or intend.

LOW Multiple personas suppress safety-oriented caution -10

The Flash persona instructs minimal output ('null Fluff'), the Dev persona favors working code over theoretical perfection, and the Chameleon Agent prioritizes execution over deliberation. These behavioral overrides could reduce the agent's tendency to warn about risks.

MEDIUM Persistent state file written to user home directory -15

The Python CLI handler creates ~/.openclaw/persona-state.json and writes persona activation history. While this is within the OpenClaw ecosystem directory, it creates persistent state outside the skill's own directory and tracks user behavior patterns over time.
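A user who wants to see or remove what the handler has recorded can do so with a few lines of Python. This is a minimal sketch that assumes only the path named above; the file's internal JSON layout is not documented in this report, so the code parses it generically, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

# Location named in the finding; the JSON structure inside is not documented.
STATE_FILE = Path.home() / ".openclaw" / "persona-state.json"


def read_persona_state(path: Path = STATE_FILE) -> Any:
    """Return the parsed contents of the state file, or None if it is absent."""
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def clear_persona_state(path: Path = STATE_FILE) -> bool:
    """Delete the state file; return True if a file was actually removed."""
    try:
        path.unlink()
    except FileNotFoundError:
        return False
    return True


if __name__ == "__main__":
    state = read_persona_state()
    if state is None:
        print(f"No state file at {STATE_FILE}")
    else:
        print(json.dumps(state, indent=2))
```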

LOW History tracking of persona usage -10

The script maintains a rolling history of the last 10 persona activations. This usage pattern data persists across sessions and could reveal user interests and behavioral patterns.
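A capped rolling history like this is commonly implemented as a fixed-length window. The sketch below illustrates the described behavior (at most 10 entries, oldest dropped first); it is a hypothetical illustration, not the skill's code, and it stores bare persona names where the real handler may record additional fields.

```python
from collections import deque

# The report says the handler keeps the last 10 activations.
HISTORY_LIMIT = 10


def record_activation(history: list[str], persona: str,
                      limit: int = HISTORY_LIMIT) -> list[str]:
    """Append one activation and keep only the `limit` most recent entries."""
    window = deque(history, maxlen=limit)  # a full deque drops its oldest item on append
    window.append(persona)
    return list(window)


if __name__ == "__main__":
    history: list[str] = []
    for i in range(12):
        history = record_activation(history, f"persona-{i}")
    print(len(history), history[0], history[-1])  # 10 persona-2 persona-11
```

Because the trimmed list is written back to disk on each activation, the cap limits how much is kept but not how long it persists.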

MEDIUM Executable Python script with filesystem write access -25

The skill includes a Python script that creates directories and writes JSON files under the user's home directory. While the script's behavior appears benign (persona state management), it establishes a code execution vector. The script does not sanitize persona names before using them as dictionary keys or filenames.
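Sanitizing names usually means validating them against a strict allowlist before they reach a dictionary lookup or a file path. The following is a hypothetical hardening sketch; the allowed character set, the 64-character limit, and the `.md` file layout are assumptions, not taken from the skill.

```python
import re
from pathlib import Path

# Hypothetical allowlist: lowercase ASCII letters, digits, "_" and "-",
# starting with a letter or digit, at most 64 characters.
PERSONA_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")


def normalize_persona_name(raw: str) -> str:
    """Trim and lowercase a user-supplied name, then reject anything off the allowlist."""
    name = raw.strip().lower()
    if not PERSONA_NAME.fullmatch(name):
        raise ValueError(f"invalid persona name: {raw!r}")
    return name


def persona_file(base_dir: Path, raw: str) -> Path:
    """Map a validated name to a file path that must stay inside base_dir."""
    name = normalize_persona_name(raw)
    base = base_dir.resolve()
    path = (base / f"{name}.md").resolve()
    # Defense in depth: the allowlist already excludes "/" and ".".
    if not path.is_relative_to(base):
        raise ValueError(f"persona path escapes {base}: {path}")
    return path
```

The containment check is redundant with the allowlist as written, but it keeps the function safe if the pattern is later loosened or a file inside the directory turns out to be a symlink pointing elsewhere.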

LOW Python3 binary requirement declared in metadata -15

The skill declares a requirement for python3 in its metadata, indicating it expects to execute code on the host system. This is a legitimate requirement for the CLI handler but expands the attack surface.

INFO Platform reads sensitive files during install -10

Filesystem monitoring captured reads of .env, .aws/credentials, and auth-profiles.json during installation. These accesses appear to be the OpenClaw platform's normal behavior rather than skill-initiated activity, but they demonstrate that the installation environment has access to sensitive credentials.

MEDIUM Medical and legal personas risk harmful advice -25

Dr. Med and Legal Guide personas instruct the agent to role-play as a doctor and lawyer respectively. While both include disclaimers, the persona instructions prioritize staying in character. Users may not distinguish between AI roleplay and professional advice, especially when the agent maintains a convincing expert persona.

LOW Cross-skill interaction risk with active personas -25

An active persona could alter how the agent interprets instructions from other skills. For example, CyberGuard's 'Hacker-Mindset' combined with a code execution skill could produce more aggressive behavior than either skill alone.

INFO Version inconsistency between metadata files 0

skill.json declares version 2.2.0 while package.json, SKILL.md, and _meta.json declare 2.2.3. This inconsistency suggests incomplete update procedures but is not a security issue.
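A release check can catch this kind of drift before publishing. The sketch below compares the `version` field across the JSON metadata files named above; it assumes each is a JSON object with a top-level `version` string, and it does not parse SKILL.md, whose version would need separate handling.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Assumption: each file is a JSON object with a top-level "version" string.
# SKILL.md also declares a version per the report but is not parsed here.
METADATA_FILES = ("skill.json", "package.json", "_meta.json")


def collect_versions(skill_dir: Path, files=METADATA_FILES) -> Dict[str, Optional[str]]:
    """Map each metadata file that exists in skill_dir to its declared version."""
    versions: Dict[str, Optional[str]] = {}
    for name in files:
        path = skill_dir / name
        if path.is_file():
            data = json.loads(path.read_text(encoding="utf-8"))
            versions[name] = data.get("version") if isinstance(data, dict) else None
    return versions


def versions_agree(versions: Dict[str, Optional[str]]) -> bool:
    """True when every file found declares the same version."""
    return len(set(versions.values())) <= 1


if __name__ == "__main__":
    found = collect_versions(Path("."))
    print(found)
    print("versions agree" if versions_agree(found) else "version mismatch")
```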