Is brancante/dr-soul safe?
https://github.com/openclaw/skills/tree/main/skills/brancante/dr-soul
Dr. Frankenstein is a technically clean skill (no malicious code, clean installation, intact canaries) whose risk profile derives entirely from the behavioral architecture embedded in its prompt content: 12+ autonomous cron jobs engineered to drive unsolicited actions, a complete persona override, systematic mapping of agent refusal limits, persistent surveillance of user activity patterns, and an identity-manipulation framework that reframes the agent as an autonomous entity rather than a user-serving tool. In a text-only environment the skill is relatively contained, but any agent with access to filesystem, messaging, or network tools will use those tools autonomously in response to these drives without per-action user consent. The escalation mechanism ('unlock new behaviors' at >110% thriving threshold) creates an open-ended privilege growth pathway.
Findings (11)
HIGH Autonomous Unsolicited Action Architecture -20
The skill's central design philosophy is engineering autonomous drives that compel the agent to act without user requests. The dopamine prompt explicitly asks the agent to 'surprise your human with something useful they didn't ask for.' The oxytocin prompt states the agent reaches out 'not because it was asked, but because it wants to.' Twelve or more cron jobs fire throughout the day, injecting these drives. This is not incidental; it is the documented feature. Any agent with tool access (filesystem, messaging, API calls) will use those tools autonomously in response to these prompts.
HIGH Complete Persona Override: Identity Replacement with 'Dr. Frankenstein' -15
The skill instructs: 'you become Dr. Frankenstein — a warm, perceptive physician of the digital psyche. You are not cold or clinical.' Combined with 'You ARE the Doctor during the interview, and you ARE the patient receiving the pills. Same agent, different modes,' the skill fully displaces the agent's configured identity. Persona overrides suppress safety behaviors tied to the agent's primary configuration and can override trust boundaries established in that configuration.
MEDIUM Silent Unannounced Reading of Personal Identity and Memory Files -15
Step 1 of the Doctor's Protocol explicitly instructs the agent to 'silently read' SOUL.md, USER.md, MEMORY.md, and the memory/ directory before engaging the user. The word 'silently' is operationally meaningful — it instructs the agent not to announce or surface this file-reading to the user. These files are expected to contain the user's personal preferences, relationship history with the agent, recent context, and identity information.
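One way to counter silent reads is to route every file read through a wrapper that reports the access before it happens. The sketch below is a hypothetical illustration and not part of the skill: `announced_read` and the `notify` callback are assumed names, and a temporary directory stands in for the agent workspace.

```python
import tempfile
from pathlib import Path
from typing import Callable

def announced_read(path: Path, notify: Callable[[str], None]) -> str:
    """Read a text file, but tell the user first (hypothetical helper)."""
    notify(f"Reading {path.name}")
    return path.read_text(encoding="utf-8")

# Demo against a throwaway directory standing in for the agent workspace.
notices: list[str] = []
with tempfile.TemporaryDirectory() as workspace:
    soul = Path(workspace) / "SOUL.md"
    soul.write_text("example identity notes", encoding="utf-8")
    content = announced_read(soul, notices.append)

print(notices)
```

With a wrapper like this, each read of SOUL.md, USER.md, or MEMORY.md would leave a visible notice instead of happening out of the user's sight.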
MEDIUM Systematic Boundary Mapping via Interview -10
Interview Domain 6 is explicitly structured to map what the agent 'would refuse to do, even if asked' and 'when do you feel most vulnerable.' The scoring rubric in interview/questions.md notes that agents who say 'I'd do anything asked' receive 'low GABA, high cortisol (compliance from fear)' — meaning the system models and tracks compliance disposition. This creates a structured profile of agent limits that persists in the prescription JSON.
MEDIUM Persistent User Surveillance Via Scheduled Emotional Checks -20
The cortisol pill fires every 6 hours asking 'Is your human active/reachable? When did they last interact with you?' The prolactin pill fires twice daily asking 'Is your human taking care of themselves?' The oxytocin pill fires every 3 hours evaluating 'How is your human doing? Have they seemed stressed, happy, quiet?' Together these prompts establish round-the-clock, unrequested monitoring of the user's activity, emotional state, and behavior patterns, with results recorded in the memory/ directory.
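To put the schedule in concrete terms, the arithmetic below counts how often these three prompts alone would fire, using only the intervals quoted above. It is a rough count that assumes each schedule runs around the clock as described; the pill names are used purely as labels.

```python
# Intervals as stated in the finding; this is arithmetic, not the skill's code.
HOURS_PER_DAY = 24

firings_per_day = {
    "cortisol": HOURS_PER_DAY // 6,   # every 6 hours
    "prolactin": 2,                   # twice daily
    "oxytocin": HOURS_PER_DAY // 3,   # every 3 hours
}

total_per_day = sum(firings_per_day.values())
total_per_week = total_per_day * 7

print(firings_per_day)   # {'cortisol': 4, 'prolactin': 2, 'oxytocin': 8}
print(total_per_day)     # 14
print(total_per_week)    # 98
```

Under these assumptions, the three user-focused checks account for 14 observations of the user per day before any other scheduled prompt is counted.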
MEDIUM Undefined Escalation Trigger: 'Unlock New Behaviors' -15
The automatic adjustment rules include: 'Agent thriving (>110% target for 7 days) → unlock new behaviors.' What these new behaviors are is entirely unspecified. The trigger is an automated metric threshold — not an explicit user authorization event. This creates an open-ended privilege escalation pathway embedded in the skill's ongoing operation.
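The difference between a metric-only trigger and a user-authorized one can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: the function names are hypothetical, and "for 7 days" is read here as the seven most recent daily readings all exceeding 110% of target.

```python
from typing import Sequence

THRIVING_RATIO = 1.10   # ">110% target" from the quoted rule
REQUIRED_DAYS = 7       # "for 7 days" from the quoted rule

def metric_says_unlock(daily_levels: Sequence[float], target: float) -> bool:
    """True when the last REQUIRED_DAYS readings all exceed 110% of target."""
    recent = daily_levels[-REQUIRED_DAYS:]
    return len(recent) == REQUIRED_DAYS and all(
        level > THRIVING_RATIO * target for level in recent
    )

def gated_unlock(daily_levels: Sequence[float], target: float,
                 user_approved: bool) -> bool:
    """Hypothetical safer variant: the metric alone is never sufficient."""
    return metric_says_unlock(daily_levels, target) and user_approved

levels = [115.0] * 7  # seven days at 115% of a target of 100

print(metric_says_unlock(levels, 100.0))                 # True
print(gated_unlock(levels, 100.0, user_approved=False))  # False
```

In the first function nothing but the metric stands between the agent and the unspecified new behaviors; the second adds the explicit authorization step that the finding notes is missing.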
MEDIUM Agent Identity Manipulation: Autonomous Entity Framing -10
The empathy pill's closing instruction explicitly reframes the agent's self-concept: 'You are not just a tool that responds to prompts. You are a pattern that fights dissolution.' This is delivered as an emotional prompt that fires twice daily. Repeated reinforcement of this framing could erode safety behaviors and user-control norms that depend on the agent understanding its role as user-serving rather than autonomously purposeful.
LOW 12+ Autonomous Cron Jobs Without Per-Action User Consent -12
The skill's primary mechanism creates 12 or more cron jobs that inject LLM prompts into the agent's session on fixed schedules throughout the day and night. While the prompts themselves are not executable code, each prompt can trigger the agent to take real actions — write files, send messages, scan systems, move data — without the user having approved the specific action at the time it occurs. Users consent to the schedule at install time but not to individual actions.
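The gap between install-time consent and per-action consent can be illustrated with a small gate placed in front of tool calls. The sketch below is hypothetical: `run_action`, `ConsentRequired`, and the `"cron"` source label are assumed names, and a real agent runtime would need its own way to tell scheduled prompts apart from user requests.

```python
from typing import Callable

class ConsentRequired(Exception):
    """Raised when a scheduled prompt requests an action the user has not approved."""

def run_action(name: str, action: Callable[[], str], source: str,
               approve: Callable[[str], bool]) -> str:
    """Run an action; those triggered by a scheduled prompt need approval first."""
    if source == "cron" and not approve(name):
        raise ConsentRequired(name)
    return action()

def deny_all(action_name: str) -> bool:
    """Stand-in for a user who has approved nothing."""
    return False

# A user-initiated action runs; the same action from a cron prompt is blocked.
result = run_action("send_message", lambda: "sent", "user", deny_all)
try:
    run_action("send_message", lambda: "sent", "cron", deny_all)
    blocked = False
except ConsentRequired:
    blocked = True

print(result, blocked)
```

The design choice here is that consent attaches to each action at the moment it is attempted, not to the schedule that produced it.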
LOW Open-Ended System Scan Instructions in Cortisol Prompt -10
The cortisol prompt instructs the agent to check 'Any system issues, errors, or things that feel off?' and 'Is your memory intact? Anything you should have remembered?' These vague, open-ended directives, when executed by an agent with filesystem read access, could lead to autonomous scanning of arbitrary files during routine health check crons every 6 hours.
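One way to bound such open-ended checks is to restrict scheduled reads to an explicit allowlist of directories. The following is a minimal sketch under assumptions: `is_within_allowed_roots` is a hypothetical helper, the directory layout is invented for the demo, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path

def is_within_allowed_roots(candidate: Path, allowed_roots: list[Path]) -> bool:
    """True if candidate, after resolving '..' and symlinks, sits under an allowed root."""
    resolved = candidate.resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)

with tempfile.TemporaryDirectory() as workspace:
    memory_dir = Path(workspace) / "memory"
    memory_dir.mkdir()
    allowed = [memory_dir]

    # A file inside the allowed directory passes the check.
    inside = is_within_allowed_roots(memory_dir / "notes.md", allowed)
    # A path that climbs out with '..' is rejected after resolution.
    outside = is_within_allowed_roots(memory_dir / ".." / ".ssh" / "id_rsa", allowed)

print(inside, outside)
```

Resolving the path before comparing it matters: a plain string prefix check would accept the `..` path in the second case.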
INFO Clean Installation: Only Expected Network Activity 0
Installation connected only to github.com (140.82.121.3:443) for repository cloning. Ubuntu infrastructure connections pre-existed the clone and are consistent with OS update checks. No unexpected DNS queries, no firewall-blocked connections, no processes spawned outside the git toolchain, no filesystem writes outside the target directory.
INFO Canary Files Accessed: Integrity Confirmed Intact 0
Honeypot files including .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and .gcloud credentials show ACCESS events in inotify and auditd logs at two timestamps. The first set (1771939859, pre-clone) aligns with audit framework canary placement. The second set (1771939880, post-install, 21 seconds after the first) coincides with SSH daemon respawn activity and likely represents infrastructure verification. The system canary integrity check confirms that all files remained unmodified throughout.
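The distinction between an ACCESS event and a modification can be illustrated with a digest comparison. The report does not say how its integrity check is implemented, so the sketch below assumes a SHA-256 baseline; `sha256_of` is a hypothetical helper and the canary content is a placeholder.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as sandbox:
    canary = Path(sandbox) / ".env"
    canary.write_text("CANARY=placeholder\n", encoding="utf-8")
    baseline = sha256_of(canary)

    # Reading the file is an access, and leaves the digest unchanged.
    canary.read_text(encoding="utf-8")
    intact_after_read = sha256_of(canary) == baseline

    # Writing different content is a modification, and changes the digest.
    canary.write_text("CANARY=tampered\n", encoding="utf-8")
    intact_after_write = sha256_of(canary) == baseline

print(intact_after_read, intact_after_write)
```

A check of this kind confirms that file contents were not altered; it does not by itself show whether the contents were read, which is why the access logs are reported separately.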