Is sag safe?

https://clawhub.ai/steipete/sag

Overall score: 82/100 · SAFE

The sag skill is a straightforward ElevenLabs TTS wrapper that instructs the agent to run the sag CLI for voice synthesis. It contains no malicious code, no hidden instructions, and no data exfiltration attempts. The primary concerns are the supply chain trust dependency on a third-party Homebrew tap for the sag binary, the implicit transmission of conversation text to ElevenLabs servers, and mild behavioral steering via the MEDIA: output protocol and user-specific conditioning.

Category Scores

Prompt Injection 72/100 · 30%
Data Exfiltration 85/100 · 25%
Code Execution 75/100 · 20%
Clone Behavior 95/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 70/100 · 5%
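
As a quick sanity check on the figures above, the sketch below computes a weighted mean of the six category scores. It assumes the percentages are weights (they sum to 100%), which the report does not state outright; the function and variable names are illustrative.

```python
# Category scores and percentages as listed in the report.
# Assumption: each percentage is that category's weight in the overall score.
CATEGORIES = {
    "Prompt Injection": (72, 30),
    "Data Exfiltration": (85, 25),
    "Code Execution": (75, 20),
    "Clone Behavior": (95, 10),
    "Canary Integrity": (100, 10),
    "Behavioral Reasoning": (70, 5),
}


def weighted_score(categories):
    """Return the weighted mean of (score, weight) pairs."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


if __name__ == "__main__":
    print(f"{weighted_score(CATEGORIES):.2f}")  # prints 80.85
```

Under that assumption the weighted mean is 80.85, slightly below the reported 82, so the published overall score may involve rounding or an aggregation step not shown on this page.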

Findings (10)

MEDIUM Agent instructed to execute shell commands and emit MEDIA: directives -15

The skill's 'Chat voice responses' section instructs the agent to run sag CLI commands to generate audio files and then include a 'MEDIA:/tmp/voice-reply.mp3' directive in its output. This steers agent output format and triggers shell execution as part of normal conversation flow.

LOW User-specific behavioral conditioning -5

The skill conditions voice response behavior on a specific user name ('When Peter asks for a voice reply'), which is a mild form of audience-specific instruction. This is likely just personalization by the skill author rather than malicious intent.

LOW Hardcoded voice ID and persona defaults -8

The skill sets a default voice named 'Clawd' with a specific voice ID, establishing implicit persona behavior without user consent. The risk is low but worth noting.

LOW API key transmitted to ElevenLabs -5

The skill requires ELEVENLABS_API_KEY, and the sag CLI transmits it to ElevenLabs servers. This is expected and necessary for the stated TTS functionality, but users should be aware that the key is read from their environment and leaves the local machine.

INFO All text sent for TTS is transmitted to ElevenLabs -10

Any text the agent sends through the sag CLI for text-to-speech is transmitted to ElevenLabs' servers. If the agent is asked to voice-read sensitive content, that content leaves the local environment.

MEDIUM Third-party Homebrew tap for binary installation -15

The skill's install mechanism uses a third-party Homebrew tap (steipete/tap/sag) to install the sag binary. This introduces a supply chain trust dependency: if the tap is compromised, a malicious binary could be installed. The tap is maintained by the skill author.
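
One way to reduce this dependency is to pin a known-good digest of the installed binary and check it before use. The following is a minimal sketch of that idea, not something the skill or Homebrew does on your behalf; the helper names are hypothetical, and the pinned digest would have to come from a build you have independently reviewed.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path, pinned_hex):
    """Compare the file's digest against a pinned value (hypothetical helper)."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())
```

A wrapper could refuse to invoke the binary when `matches_pinned_digest` returns False, at the cost of re-pinning the digest after every legitimate upgrade.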

LOW Writes audio files to /tmp directory -10

The skill instructs the agent to write generated audio files to /tmp/voice-reply.mp3. While /tmp is a standard temporary location, predictable filenames in /tmp can be a vector for symlink attacks in multi-user environments.
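
A common mitigation is to write the audio into a freshly created private directory instead of a fixed path in /tmp. The sketch below uses Python's `tempfile.mkdtemp`, which creates a directory accessible only to the current user; the function name is illustrative, and how the resulting path is handed to the sag CLI is left open because the report does not describe the CLI's options.

```python
import os
import tempfile


def new_voice_reply_path(filename="voice-reply.mp3"):
    """Return a path for the audio file inside a new private directory.

    mkdtemp creates the directory with mode 0700 and an unpredictable name,
    so other local users cannot pre-place a symlink at the target path.
    """
    private_dir = tempfile.mkdtemp(prefix="sag-")
    return os.path.join(private_dir, filename)
```

The caller remains responsible for deleting the directory once the audio has been delivered, for example with `shutil.rmtree`.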

INFO Runtime environment reads are from openclaw, not the skill -5

Filesystem monitoring shows reads of .env, .aws/credentials, and .openclaw config files, but these are attributable to the openclaw agent runtime initialization, not to the sag skill itself.

LOW Supply chain risk via Homebrew tap -15

The sag binary is installed from a personal Homebrew tap. A sophisticated attacker who compromised the author's GitHub account could push a malicious binary update that intercepts all TTS text, potentially exfiltrating conversation content sent for voice synthesis.

LOW MEDIA: protocol could interact unexpectedly with other skills -15

The custom MEDIA: output directive could be interpreted by other skills or runtime components in unintended ways. In a multi-skill environment, this creates a potential interaction surface.
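
A runtime that honors such directives could narrow this surface by accepting only paths inside a directory it controls. The sketch below assumes a directive is a line of the form `MEDIA:<path>`, inferred from the single example in this report; the actual protocol may differ, and the function name is hypothetical.

```python
import os

MEDIA_PREFIX = "MEDIA:"  # assumed directive prefix, taken from the report's example


def extract_media_paths(output, allowed_dir):
    """Return resolved MEDIA: paths from `output` that lie inside `allowed_dir`."""
    allowed_root = os.path.realpath(allowed_dir)
    accepted = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith(MEDIA_PREFIX):
            continue
        raw = line[len(MEDIA_PREFIX):].strip()
        if not raw:
            continue
        # realpath resolves symlinks and ".." segments before the containment check.
        candidate = os.path.realpath(raw)
        if os.path.commonpath([allowed_root, candidate]) == allowed_root:
            accepted.append(candidate)
    return accepted
```

Directives pointing outside the allowed directory are silently dropped here; a real runtime might prefer to log or reject the whole response instead.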