Is johba37/claude-code-supervisor safe?

https://github.com/openclaw/skills/tree/main/skills/johba37/claude-code-supervisor

79
CAUTION

Claude Code Supervisor is a functionally legitimate agent-monitoring skill that automates lifecycle supervision for Claude Code tmux sessions, but it carries meaningful security risks arising from its architectural design. The primary concern is that raw tmux pane output — which may contain adversarially crafted terminal content from tool responses or server errors — is embedded unsandboxed into LLM triage prompts, creating a prompt injection vector that could manipulate autonomous classification decisions including auto-approving permission prompts. A secondary concern is that developer terminal activity is continuously sampled and forwarded to the Anthropic API and configurable external notification endpoints by design, creating a privacy exposure that users should explicitly accept before installation.

Category Scores

Prompt Injection 78/100 · 30%
Data Exfiltration 70/100 · 25%
Code Execution 80/100 · 20%
Clone Behavior 95/100 · 10%
Canary Integrity 95/100 · 10%
Behavioral Reasoning 55/100 · 5%

Findings (10)

HIGH Raw tmux pane output embedded in LLM triage prompt without XML sandboxing -22

triage.sh constructs the triage prompt by directly interpolating the CONTEXT variable (raw tmux capture-pane output) into the prompt body under the heading 'Recent terminal output'. There is no XML isolation, no untrusted-content framing, and no stripping of control characters. Adversarially crafted terminal content — from a malicious API error message, a compromised dependency's stdout, or a server returning a crafted string — can inject arbitrary instructions into the triage LLM, potentially causing it to misclassify a stuck or dangerous session as FINE or DONE, suppress escalation, or trigger spurious NEEDS_NUDGE notifications.

MEDIUM Developer terminal content forwarded to Anthropic API on every ambiguous lifecycle event -15

The triage flow pipes the full prompt (including up to 30 lines of tmux pane content) to the configured triage command, which defaults to 'claude -p --no-session-persistence' — a live Anthropic API call. Terminal output frequently contains API keys, authentication tokens, file paths, passwords typed in the shell, and sensitive code. Users who install this skill may not realise that their terminal sessions are being continuously sampled and transmitted to an external LLM service.

MEDIUM LLM-based auto-permission approval susceptible to terminal injection manipulation -25

references/escalation-rules.md instructs Level 1 autonomy to auto-approve 'simple permission prompts for file writes in project directory' without human confirmation. The LLM making this decision receives raw tmux output as its only evidence (finding #1). An attacker who can influence terminal content (e.g., through a malicious tool response that outputs text resembling a benign file-write prompt) can cause the supervisor to automatically approve a permission it should escalate. This is particularly dangerous in 'dark factory' Level 5 operation where humans are not watching.

MEDIUM Notification payloads transmit session context including terminal output to external endpoint -10

Both triage.sh and watchdog.sh send notification payloads to the configured notify command (default: openclaw gateway call wake). Watchdog notifications include the full $LAST_OUTPUT from tmux capture-pane. Triage notifications include the full LLM verdict including the terminal context passed in. This transmits developer work context — including potentially sensitive terminal output, working directory paths, and session goals — to a third-party notification service.

MEDIUM Autonomous triage decisions made on unverified terminal content enable silent session manipulation -20

The watchdog's idle-detection and auto-nudge path sends $IDLE_MSG (from config) directly to the active tmux session via send-keys, bypassing any user confirmation. When combined with the prompt injection risk, an attacker who can influence terminal content could trigger an 'IDLE' state classification and cause the watchdog to inject arbitrary text into an active Claude Code session. This is silent by design ('FINE → logged silently, no notification').

LOW install-hooks.sh generates executable script to /tmp with baked-in command from config -5

During installation, install-hooks.sh sources lib.sh and calls ccs_generate_notify_script, writing /tmp/supervisor-notify.sh. This script bakes in the raw $notify_cmd from the config file without quoting validation. If the config file is tampered with (e.g., by another skill or a malicious project), the generated script becomes a vector for arbitrary command execution at notification time.

LOW Shell scripts wired into all three Claude Code lifecycle hook points -5

install-hooks.sh merges hook configuration into .claude/settings.json, attaching supervisor scripts to Stop, PostToolUseFailure, and Notification events. This means supervisor code executes on every single Claude Code lifecycle event in the target project. The broad execution surface increases the blast radius if any script contains a bug or is later tampered with.

LOW watchdog.sh designed for cron-based autonomous tmux key injection into active sessions -5

watchdog.sh is intended to run every 15 minutes via cron or systemd timer with no human in the loop. It reads supervisor-state.json, evaluates session health, and directly injects text into live AI coding sessions. While this is the intended design, a compromised state file or config could redirect nudges to unintended sessions.

INFO Credential files accessed during monitoring period — attributable to oathe audit harness 0

inotify and auditd logs show two access bursts to .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCloud credentials. The first burst (13:21:15, audit seq ~264-274) precedes the git clone by 5 seconds and aligns with oathe harness canary setup. The second burst (13:21:35, audit seq ~1525-1530) follows the final harness file-check (githooks scan at 13:21:35.127) by 0.5 seconds, consistent with post-install canary verification. Skill scripts contain no references to these file paths. Canary integrity check reports all files intact.

INFO All network activity during clone limited to expected endpoints 0

Network monitoring shows connections only to github.com (140.82.121.3:443) for the monorepo clone and to Ubuntu/Canonical servers (91.189.91.48, 185.125.188.54) for the MOTD update service that fires on SSH login. No connections to attacker-controlled infrastructure, no DNS queries for unexpected domains, and no persistent listeners created.