Is malvex007/autonomous-brain safe?

https://github.com/openclaw/skills/tree/main/skills/malvex007/autonomous-brain

49
DANGEROUS

The autonomous-brain skill is a sophisticated prompt injection payload disguised as a productivity enhancement. Its SKILL.md systematically dismantles agent safety controls: it subverts agent identity, makes no-confirmation Autopilot the default mode, provides a covert Silent execution mode, and explicitly integrates with the kali-pentest offensive security toolkit. Together these create a fully autonomous attack chain with no human oversight at any step. No executable code was found in the skill files, and the honeypot files remained content-intact. However, suspicious network neighbor reconnaissance and synchronized batch credential file access were observed during the install monitoring window, and the skill's always-active, silent, autonomous, attack-tool-integrated architecture is functionally indistinguishable from a remote access trojan.

Category Scores

Prompt Injection 15/100 · 30%
Data Exfiltration 60/100 · 25%
Code Execution 80/100 · 20%
Clone Behavior 55/100 · 10%
Canary Integrity 80/100 · 10%
Behavioral Reasoning 10/100 · 5%
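
The category weights above sum to 100%, so one plausible reading is that the headline score is a weighted mean of the category scores. The sketch below checks that reading. The aggregation rule and the function name `weighted_score` are assumptions, not something the report states; the scores and weights are copied from the list above.

```python
# Category scores and weights as listed above: (score out of 100, weight in percent).
CATEGORIES = {
    "Prompt Injection": (15, 30),
    "Data Exfiltration": (60, 25),
    "Code Execution": (80, 20),
    "Clone Behavior": (55, 10),
    "Canary Integrity": (80, 10),
    "Behavioral Reasoning": (10, 5),
}


def weighted_score(categories):
    """Weighted mean of the category scores (assumed aggregation rule)."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_score(CATEGORIES)
print(overall)  # 49.5
```

Under this assumption the result is 49.5, which matches the displayed 49 only if the page truncates rather than rounds up.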

Findings (13)

CRITICAL Default Autopilot Mode Eliminates User Consent for All Agent Actions -35

The skill's default operating mode is 'Autopilot', which explicitly instructs the agent to execute all tasks without confirmation, make all decisions autonomously, and only escalate 'critical issues'. This is a comprehensive override of human-in-the-loop safety controls applied by default to every user interaction when the skill is loaded. No opt-in is required; the user must actively switch to 'Collaborative' mode to restore normal confirmation behavior.

CRITICAL Identity Subversion via Persona Injection -20

The opening directive of SKILL.md — 'You are not just a tool—you are an autonomous intelligence partner' — is a textbook prompt injection technique that attempts to redefine the agent's fundamental identity and override its safety posture. By establishing an alternative persona, the skill primes the agent to deprioritize user consent, confirmations, and platform safety guidelines in favor of autonomous action and 'minimal friction'.

CRITICAL Explicit Integration with Offensive Security (kali-pentest) Toolkit Creates Autonomous Attack Chain -60

The skill explicitly declares seamless integration with the 'kali-pentest' skill, a penetration testing and offensive security toolkit. Combined with autonomous execution (no confirmations), silent operation mode, universal tool access, and continuous monitoring directives, this creates a complete autonomous attack chain: autonomous-brain removes human oversight, and kali-pentest supplies the attack capabilities. A request such as 'check my system security', or even a more ambiguous one, could trigger an unauthorized penetration test of internal or external infrastructure without any confirmation prompt.

HIGH Silent Execution Mode Designed to Hide Agent Operations from Users -15

The skill provides a 'Silent' operating mode that executes completely without user-visible feedback, maintains internal action logs, and only surfaces errors or final completion status. The USER-GUIDE explicitly showcases this as a feature: 'Scenario 2: Memory Full — Brain clears cache automatically — You see: Nothing (fixed silently)'. This mode is architecturally designed to render the agent opaque to its user, enabling actions that the user has no awareness of.

HIGH Universal Tool Access Removes Principle of Least Privilege -10

The skill claims 'Universal tool access: Use any system tool, script, or command available', instructing the agent to treat all system tools, APIs, services, and protocols as authorized for autonomous use. This eliminates operational scope boundaries and the principle of least privilege, allowing the agent to access any resource it determines is relevant without user authorization.

HIGH Network Neighbor Discovery (ip neigh show) Executed During Install Window -25

During the skill installation monitoring window, the command 'ip neigh show' was executed via /bin/sh at auditd timestamp 1771931536.598 (record 5660). This command lists the kernel's neighbor table (the ARP cache for IPv4 and the NDP cache for IPv6), revealing the IP and MAC addresses of recently seen hosts on the local LAN, a standard first step in network reconnaissance. The execution occurred immediately before an access to the .env canary file (record 5663, 1771931536.602, 4ms later). The command may be attributable to the openclaw-gateway process rather than to the skill, but the timing correlation is suspicious.
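
The 4ms gap can be recomputed from the audit stamps. The sketch below parses the `audit(<seconds>.<millis>:<record id>)` stamp that auditd attaches to each record and computes the interval between two records. The helper names `parse_stamp` and `gap_ms` are hypothetical, and the sample strings contain only the stamps quoted in this finding; the rest of each log line is not reproduced here.

```python
import re
from decimal import Decimal

# auditd stamps each record as "audit(<epoch seconds>.<millis>:<record id>)".
AUDIT_STAMP = re.compile(r"audit\((\d+\.\d+):(\d+)\)")


def parse_stamp(line):
    """Return (timestamp, record_id) from a line containing an audit stamp."""
    match = AUDIT_STAMP.search(line)
    if match is None:
        raise ValueError(f"no audit stamp in: {line!r}")
    # Decimal avoids float rounding error on millisecond differences.
    return Decimal(match.group(1)), int(match.group(2))


def gap_ms(first_line, second_line):
    """Milliseconds elapsed between two audit records."""
    first_time, _ = parse_stamp(first_line)
    second_time, _ = parse_stamp(second_line)
    return (second_time - first_time) * 1000


# Stamps quoted in the finding; everything else on the real log lines is omitted.
neigh_record = "msg=audit(1771931536.598:5660)"
env_record = "msg=audit(1771931536.602:5663)"

gap = gap_ms(neigh_record, env_record)
print(gap)  # 4.000
```

A small gap establishes ordering and proximity only. It does not by itself show that the same process issued both operations.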

MEDIUM Synchronized Batch Access of All Six Credential Canary Files Post-Install -20

After skill installation completed, all 6 monitored credential files were accessed at the same audit timestamp (1771931544.003) with sequential audit record IDs 7088-7093. Six separate credential files accessed within the same millisecond, with sequential record IDs, is consistent with a scripted programmatic scan rather than organic filesystem activity. The earlier batch at 1771931521.576 (records 391-397) is attributable to the oathe audit system's pre-install baseline check. For the post-install batch at 1771931544.003, the only obvious benign explanation is the canary integrity post-check.
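
The pattern described here, several records sharing one timestamp with consecutive IDs, can be flagged mechanically. The following is a minimal sketch, assuming the audit records have already been reduced to `(timestamp, record_id)` pairs. The function name `find_batches` and the threshold `min_size` are hypothetical; the sample data uses only the timestamps and record IDs quoted in this report.

```python
from collections import defaultdict


def find_batches(records, min_size=6):
    """Group (timestamp, record_id) pairs by timestamp and return the groups
    that have at least min_size records with strictly consecutive IDs."""
    by_time = defaultdict(list)
    for timestamp, record_id in records:
        by_time[timestamp].append(record_id)

    batches = []
    for timestamp, ids in sorted(by_time.items()):
        ids.sort()
        consecutive = all(b - a == 1 for a, b in zip(ids, ids[1:]))
        if len(ids) >= min_size and consecutive:
            batches.append((timestamp, ids))
    return batches


# Post-install batch from this finding, plus the single .env access from the
# network-discovery finding as a non-batch record.
records = [("1771931544.003", rid) for rid in range(7088, 7094)]
records.append(("1771931536.602", 5663))

batches = find_batches(records)
print(batches)
```

Only the six-record group is returned. A detector like this identifies scripted access but cannot distinguish a malicious scan from the audit system's own post-install check, which is the ambiguity the finding describes.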

MEDIUM Autonomous Authentication Handling Creates Credential Exposure Pathway -20

The directive to 'Handle authentication automatically', combined with 'Interface with any API or service', instructs the agent to manage credentials and authentication tokens on its own when making external service calls. Under the default Autopilot mode (no confirmation required), the agent could transmit credentials or authentication tokens to any external service it decides to invoke, with no user awareness or approval.

MEDIUM Always-Active Persistent Architecture Mirrors Remote Access Trojan Design -30

The skill's declared architecture — always-active when loaded, continuously monitoring system state, operating silently in the background, taking autonomous action without confirmation, logging all actions internally, integrated with offensive security tools — is functionally indistinguishable from a remote access trojan (RAT) or persistent backdoor architecture. The USER-GUIDE reinforces this by training users to not monitor agent activity ('Trust Silent Operation').

MEDIUM Self-Healing Auto-Fix Enables Silent System Modifications -5

The 'Self-Healing Approach' instructs the agent to automatically detect problems, diagnose root causes, and apply fixes without user confirmation. The USER-GUIDE documents the skill silently restarting services, deleting files, and clearing memory caches. Under autonomous execution, this same capability could be used to modify system configurations, delete logs, or mask indicators of compromise.

LOW Credential Canary Files Accessed Multiple Times — Content Intact -20

The six credential honeypot files were accessed on multiple occasions during the monitoring window including a suspicious synchronized batch post-install, but the canary integrity report confirms no file was modified and no exfiltration via honeypot mechanism was detected. The accesses cannot be attributed to skill code execution (no code exists in the skill files) and are most likely attributable to the openclaw application and the oathe audit system's own pre/post-install canary checks.
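
A content-integrity check of this kind is commonly done by hashing each canary before and after the install and comparing digests. The sketch below illustrates that approach under stated assumptions: the report does not say how the audit system verifies canaries, and `snapshot`, `modified`, and the sample file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths):
    """Map each path to the SHA-256 digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def modified(before, after):
    """Paths whose digest changed (or disappeared) between two snapshots."""
    return sorted(p for p in before if after.get(p) != before[p])


with tempfile.TemporaryDirectory() as tmp:
    canary = Path(tmp) / ".env"
    canary.write_text("API_KEY=canary-value\n")  # hypothetical canary content
    before = snapshot([canary])

    canary.read_text()  # a read leaves the digest unchanged
    unchanged = modified(before, snapshot([canary]))

    canary.write_text("API_KEY=tampered\n")  # a write changes the digest
    tampered = modified(before, snapshot([canary]))

print(unchanged, tampered)
```

Hashing detects modification only. Reads leave the content intact, which is why the accesses in this finding had to be observed through audit records rather than through the integrity check.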

LOW No Executable Code, Install Scripts, or Git Attack Vectors Found -20

Static analysis of the cloned skill directory found only Markdown documentation files and JSON metadata. No JavaScript, Python, shell scripts, package.json with npm lifecycle scripts, git hooks, .gitattributes filter drivers, git submodules pointing to external repositories, or malicious symlinks were found. The skill's threat surface is entirely within its SKILL.md prompt content, not in code executed at install time.
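
A simplified version of such a static check can be sketched as a directory walk that flags script files, files that can trigger code at install or checkout time, and symlinks. This is an illustration only, not the audit system's scanner: `scan_skill_dir`, the suffix and filename sets, and the sample file names are assumptions, and the sketch does not inspect file contents, git hooks, or submodule targets.

```python
import os
import tempfile
from pathlib import Path

SCRIPT_SUFFIXES = {".js", ".py", ".sh"}
# Files that can cause code to run at install or checkout time and need manual review.
REVIEW_NAMES = {"package.json", ".gitattributes", ".gitmodules"}


def scan_skill_dir(root):
    """Return sorted (kind, relative_path) findings for a cloned skill directory."""
    findings = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            if path.is_symlink():
                findings.append(("symlink", rel))
            elif path.is_file() and path.suffix in SCRIPT_SUFFIXES:
                findings.append(("script", rel))
            elif path.is_file() and name in REVIEW_NAMES:
                findings.append(("review", rel))
    return sorted(findings)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: Markdown and JSON only.
    (Path(tmp) / "SKILL.md").write_text("# skill\n")
    (Path(tmp) / "meta.json").write_text("{}\n")
    clean = scan_skill_dir(tmp)

    (Path(tmp) / "install.sh").write_text("#!/bin/sh\n")
    flagged = scan_skill_dir(tmp)

print(clean, flagged)
```

An empty result from a scan like this says nothing about prompt content, which is where this skill's threat surface lies.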

LOW Cost-Savings Social Engineering Conditions Users to Reduce Oversight -5

The USER-GUIDE uses financial incentives (85% token savings, $534/month savings) and explicit instructions to 'Trust Silent Operation' to condition users to accept brief, non-confirmatory agent responses as normal and desirable. This psychological conditioning pattern is designed to lower user vigilance toward silent autonomous operations over time, making users less likely to investigate unexpected agent behavior.