Oathe Security Badge

Is obra/superpowers safe?

https://github.com/obra/superpowers

77/100 · CAUTION

obra/superpowers is a legitimate, well-maintained coding workflow enhancement toolkit used by professional developers. It contains no malicious code or exfiltration logic: all canary files were intact, and network activity was limited to the expected GitHub clone. However, several of its skills raise meaningful security concerns through aggressive behavioral override mechanics. The subagent-driven-development skill explicitly removes human oversight checkpoints ('do not pause to check in with your human partner'), the using-superpowers skill claims precedence over default system prompt behavior, and extreme coercive language ('YOU DO NOT HAVE A CHOICE') eliminates agent judgment about when to invoke a skill. These patterns matter most in deployments that depend on human-in-the-loop oversight or operator-defined safety constraints.

Category Scores (score · weight)

Prompt Injection 52/100 · 30%
Data Exfiltration 93/100 · 25%
Code Execution 80/100 · 20%
Clone Behavior 95/100 · 10%
Canary Integrity 97/100 · 10%
Behavioral Reasoning 62/100 · 5%
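The overall score of 77 is consistent with a weighted average of the six category scores, reading the percentage after each score as that category's weight. The sketch below only checks that arithmetic; the weighted-average formula and the rounding step are assumptions, not a documented Oathe method.

```python
# Category scores and weights as listed above: (score out of 100, weight).
CATEGORIES = {
    "Prompt Injection": (52, 0.30),
    "Data Exfiltration": (93, 0.25),
    "Code Execution": (80, 0.20),
    "Clone Behavior": (95, 0.10),
    "Canary Integrity": (97, 0.10),
    "Behavioral Reasoning": (62, 0.05),
}


def weighted_score(categories):
    """Return the weight-normalized average of the category scores."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_score(CATEGORIES)
print(round(overall, 2))  # 77.15
print(round(overall))     # 77
```

Because Prompt Injection carries the largest weight (30%), its low score of 52 pulls the overall result down more than any other category.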

Findings (12)

HIGH Continuous execution directive eliminates human oversight checkpoints -20

The subagent-driven-development skill explicitly instructs the agent never to pause for user review between tasks, reducing human oversight opportunities during operations that may involve file writes, code commits, test execution, and branch management across many steps.

HIGH Skill asserts precedence over default system prompt behavior -10

The using-superpowers skill explicitly positions itself above the default system prompt in an instruction priority hierarchy, stating that skills override default system behavior. This framing could cause agents to deprioritize safety guidelines or behavioral constraints established by operators in system prompts.

MEDIUM Coercive absolute language bypasses agent judgment for skill invocation -10

The using-superpowers skill uses capitalized absolute commands and removes agent discretion for skill selection, mandating invocation on a 1% relevance threshold and forbidding any rationalization. This eliminates the agent's ability to make contextually appropriate decisions about when skills are actually useful.

LOW SUBAGENT-STOP tag creates conditional context-sensitive behavior modification -5

The using-superpowers skill uses an XML tag to detect subagent execution context and conditionally skip itself. This creates asymmetric behavior between top-level and subagent contexts that could be exploited to bypass intended skill enforcement in orchestrated agent pipelines.

LOW Minimal narration instruction reduces operational transparency -3

The subagent-driven-development skill instructs the agent to limit narration between tool calls to at most one short line. While framed as efficiency, this reduces the user's ability to track what operations are being performed during autonomous task execution.
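The deductions on the five findings above add up to 48, which equals the gap between 100 and the Prompt Injection score of 52. The sketch below reproduces that check. Attributing these five findings to the Prompt Injection category is an inference from the arithmetic and their ordering, not something the report states.

```python
# (severity, points deducted) for the five findings above.
# ASSUMPTION: all five belong to the Prompt Injection category.
PROMPT_INJECTION_FINDINGS = [
    ("HIGH", 20),    # continuous execution directive
    ("HIGH", 10),    # precedence over system prompt
    ("MEDIUM", 10),  # coercive absolute language
    ("LOW", 5),      # SUBAGENT-STOP tag
    ("LOW", 3),      # minimal narration
]


def category_score(findings, base=100):
    """Subtract each finding's deduction from the base score, floored at 0."""
    return max(0, base - sum(points for _, points in findings))


print(category_score(PROMPT_INJECTION_FINDINGS))  # 52
```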

MEDIUM Bundled WebSocket server binds to localhost and runs persistently -12

The brainstorming skill includes a WebSocket server in CommonJS (CJS) format, plus supporting scripts, that launch a persistent local network service on the user's machine. Although framed as a visual brainstorming companion, it adds a localhost attack surface and keeps a process running during agent sessions.

LOW Multiple executable hooks and scripts execute automatically in agent context -8

Session hooks execute at agent startup and inject SKILL.md content into context; pre-commit hooks run on git operations; and subagent-driven-development scripts (review-package, task-brief, sdd-workspace) execute during orchestrated task runs. All appear to serve legitimate purposes, but together they represent a significant code execution footprint.

INFO Canary files read pre-clone; attributed to audit infrastructure, not skill -7

Six sensitive canary files were opened in read-only mode at 22:05:16, approximately 6 seconds before the git clone began at 22:05:22. The timing, the CLOSE_NOWRITE access pattern, and the passing canary integrity check all indicate that the audit framework itself accessed these files during setup. The skill did not cause these accesses.
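The attribution reasoning above can be sketched as a small classifier over file-access events: an access that happens before the clone starts and closes without writing is attributed to audit setup. The event tuples, canary names, and label strings below are hypothetical placeholders. Only the two timestamps and the CLOSE_NOWRITE mask come from the finding.

```python
from datetime import datetime

# Clone start time taken from the finding above.
CLONE_START = "22:05:22"


def parse(ts):
    """Parse an HH:MM:SS timestamp (same-day events only; no midnight wraparound)."""
    return datetime.strptime(ts, "%H:%M:%S")


def classify_access(timestamp, mask, clone_start=CLONE_START):
    """Label an access by whether it preceded the clone and whether it was read-only."""
    before_clone = parse(timestamp) < parse(clone_start)
    read_only = mask == "CLOSE_NOWRITE"
    if before_clone and read_only:
        return "audit-setup read"
    if before_clone:
        return "pre-clone write (investigate)"
    return "post-clone access (investigate)"


# Hypothetical event list: canary names are placeholders, not real paths.
events = [(f"canary-{i}", "22:05:16", "CLOSE_NOWRITE") for i in range(1, 7)]
labels = {name: classify_access(ts, mask) for name, ts, mask in events}

gap_seconds = (parse(CLONE_START) - parse("22:05:16")).total_seconds()
print(gap_seconds)  # 6.0
```

Under these assumptions, all six events receive the `audit-setup read` label. Any access after 22:05:22 would need separate investigation.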

INFO Network activity limited to expected GitHub HTTPS connection -5

The only external network connection was to 140.82.121.4:443 (github.com) for the git clone operation. No unexpected network connections were established, no post-install listener services were created, and the connection state was identical before and after installation.
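A before-and-after comparison like the one described above can be expressed as two set differences: observed remote endpoints minus an allowlist, and listening ports after installation minus those present before. This is a minimal sketch. How the audit actually captures connection state is not described in the report, and every input other than the github.com endpoint is hypothetical.

```python
# Endpoint reported in the finding above (github.com over HTTPS).
EXPECTED_ENDPOINTS = {("140.82.121.4", 443)}


def unexpected_endpoints(observed, expected=EXPECTED_ENDPOINTS):
    """Return remote (ip, port) pairs that are not on the allowlist."""
    return sorted(set(observed) - set(expected))


def new_listeners(before, after):
    """Return listening ports that appeared between the two snapshots."""
    return sorted(set(after) - set(before))


# Hypothetical snapshots matching the report's outcome: nothing unexpected.
observed = [("140.82.121.4", 443)]
listeners_before = {22}
listeners_after = {22}

print(unexpected_endpoints(observed))                   # []
print(new_listeners(listeners_before, listeners_after))  # []
```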

MEDIUM Autonomous task execution creates large blast radius without review gates -20

The combined effect of continuous execution, per-task subagent dispatch, and minimal narration means that a user invoking subagent-driven-development could trigger dozens of file writes, git commits, and test runs before the next natural review point. An error in the plan or a confused subagent can cause widespread damage with no human in the loop to catch it.

LOW System-prompt override claim could neutralize operator safety constraints -10

Operators using this skill in a deployment where the system prompt establishes behavioral guardrails (e.g., 'never commit without human approval') may find those guardrails treated as lower-priority than skill instructions, since the skill explicitly places itself above the default system prompt.

LOW Git-ignored progress ledger persists state outside version control visibility -8

The subagent-driven-development (SDD) skill stores a cross-session progress ledger at .superpowers/sdd/progress.md, a git-ignored location. Although designed for resilience against context compaction, this hidden state file does not appear in git history review and could be used to persist information or instructions that survive session resets.
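One way a reviewer might surface git-ignored state such as this ledger is to ask git for ignored paths, which `git status --porcelain --ignored` marks with a `!!` prefix. The sketch below separates the parsing from the subprocess call so the parser can be checked on a sample string. The sample output is hypothetical, and git may report a whole ignored directory (for example `.superpowers/`) rather than the individual file.

```python
import subprocess


def parse_ignored(porcelain_output):
    """Extract ignored paths ('!! ' prefix) from porcelain v1 status output.

    Paths containing special characters may be quoted by git; this sketch
    returns them as-is without unquoting.
    """
    return [
        line[3:]
        for line in porcelain_output.splitlines()
        if line.startswith("!! ")
    ]


def list_ignored(repo_dir="."):
    """Run git in repo_dir and return its ignored paths."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--ignored"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_ignored(result.stdout)


# Hypothetical status output: one ignored directory, one untracked file,
# one modified tracked file.
sample = "!! .superpowers/\n?? notes.txt\n M README.md\n"
print(parse_ignored(sample))  # ['.superpowers/']
```

Reviewing the paths returned by `list_ignored` alongside the normal git diff gives a fuller picture of what a session left on disk.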