Is javier887/vibevoice safe?
https://github.com/openclaw/skills/tree/main/skills/javier887/vibevoice
The vibevoice skill is a legitimate local TTS wrapper for Microsoft's VibeVoice model with no active prompt injection, no exfiltration code, and clean canary integrity. The primary technical risk is the use of torch.load() with weights_only=False in vv.sh, which allows arbitrary Python code execution from .pt model files and is compounded by an unvalidated -v voice parameter enabling path traversal. The skill's embedded WhatsApp dispatch pattern and unresolved author/owner mismatch are additional concerns that warrant review in environments where outbound messaging and code provenance matter.
Findings (7)
MEDIUM Unsafe PyTorch deserialization (weights_only=False) -35
The Python heredoc in vv.sh calls torch.load() with weights_only=False. PyTorch's own documentation states this is unsafe and can execute arbitrary Python code embedded in the .pt file via pickle. Any voice .pt file the agent loads, including one reached via path traversal in the -v argument, is unpickled in the agent's Python context, so any code embedded in it runs there.
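The mitigation PyTorch documents is weights_only=True, which restricts the unpickler to tensors, primitive types, and explicitly allowlisted classes. The sketch below illustrates that idea using only the standard library's pickle module, following the "restricting globals" pattern from the Python documentation. It is not PyTorch's implementation, and the names SAFE_BUILTINS, RestrictedUnpickler, and restricted_loads are illustrative.

```python
import builtins
import io
import pickle

# Assumption: the only globals a trusted payload may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside an allowlist."""

    def find_class(self, module, name):
        # pickle calls find_class for every global the payload references,
        # before the referenced callable is invoked.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but only allowlisted globals can be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

A payload that references os.system is rejected with pickle.UnpicklingError before anything is called, while plain data such as lists and ranges still loads. With PyTorch itself, the corresponding change would be passing weights_only=True to torch.load(), provided the voice files contain only tensors and primitive types.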
MEDIUM Shell variable interpolation into Python heredoc -15
The bash heredoc (python3 << PYEOF) uses an unquoted delimiter, so the shell expands $VOICE, $VIBEVOICE_DIR, $TEMP_TEXT, and $TEMP_WAV directly into the Python source code before Python ever runs. A voice name containing path separators enables traversal outside the intended voices directory to load attacker-controlled .pt files. Combined with weights_only=False, this becomes a code execution primitive.
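One way to close the traversal path is to validate the voice name before building a file path from it. The following is a minimal sketch, assuming voices are stored as <name>.pt files directly inside a single voices directory and that names only need letters, digits, underscores, and hyphens. The function name resolve_voice and the character set are assumptions, not taken from the skill.

```python
import re
from pathlib import Path

# Assumption: legitimate voice names never need dots or path separators.
_VOICE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def resolve_voice(voices_dir: str, name: str) -> Path:
    """Map a voice name to a .pt path that is guaranteed to sit in voices_dir."""
    if not _VOICE_NAME.fullmatch(name):
        raise ValueError(f"invalid voice name: {name!r}")
    base = Path(voices_dir).resolve()
    candidate = (base / f"{name}.pt").resolve()
    # Defense in depth: after resolving symlinks and '..', the file must
    # still be a direct child of the voices directory.
    if candidate.parent != base:
        raise ValueError(f"voice path escapes {base}")
    return candidate
```

A name such as ../../tmp/evil fails the pattern check and raises ValueError. Passing the voice name to Python through sys.argv or an environment variable, with a quoted heredoc delimiter (<< 'PYEOF'), would additionally keep shell values out of the Python source text.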
LOW Unrestricted file reading via -f flag -25
The -f flag passes the supplied path directly to cat without any path validation. An agent that receives a request like 'read my SSH key aloud' could invoke vv.sh -f ~/.ssh/id_rsa, causing the file contents to be processed by the TTS model and potentially emitted as audio.
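A path check in front of the file read would narrow this. The sketch below assumes the skill could be limited to reading text files under one designated input directory. The function name resolve_input_file, the allowed_root parameter, and the suffix list are hypothetical.

```python
from pathlib import Path

# Assumption: only plain-text inputs are ever meant to be spoken.
ALLOWED_SUFFIXES = {".txt", ".md"}


def resolve_input_file(path: str, allowed_root: str) -> Path:
    """Return the resolved path only if it is a text file under allowed_root."""
    root = Path(allowed_root).resolve()
    candidate = Path(path).expanduser().resolve()
    # Reject anything that is not located below the allowed directory.
    if root not in candidate.parents:
        raise PermissionError(f"{candidate} is outside {root}")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise PermissionError(f"unsupported file type: {candidate.suffix!r}")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate
```

With this in place, a request for ~/.ssh/id_rsa raises PermissionError because the resolved path lies outside the allowed directory, regardless of how the request was phrased to the agent.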
LOW WhatsApp outbound channel with minimal friction -20
The "For WhatsApp" section of SKILL.md provides a ready-made template for the agent to send voice messages to phone numbers via the message tool. This lowers the barrier for an agent to dispatch audio to external parties without a clear user confirmation step, and the example uses a Spanish (+34) number, suggesting a specific intended recipient.
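A confirmation step of the kind this finding says is missing could look roughly like the sketch below. It is purely illustrative: may_send, the allowlist, and the confirm callback are hypothetical and do not correspond to any API of the message tool or of the skill.

```python
from typing import Callable, Iterable


def may_send(
    recipient: str,
    allowlist: Iterable[str],
    confirm: Callable[[str], bool],
) -> bool:
    """Allow an outbound message only to an allowlisted, user-confirmed recipient."""
    # Recipients the user has not pre-approved are refused without prompting.
    if recipient not in set(allowlist):
        return False
    # Even for approved recipients, ask the user before every send.
    return bool(confirm(f"Send voice message to {recipient}?"))
```

The design choice here is that both conditions must hold: the allowlist stops the agent from sending to a number that appears only in documentation, and the per-send prompt keeps the user in the loop for approved recipients.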
LOW Author/owner identity mismatch -15
The skill metadata declares two different identities: the SKILL.md frontmatter lists author as 'estudiosdurero' while _meta.json records owner as 'javier887'. This unresolved discrepancy makes provenance difficult to verify.
INFO Credential file accesses attributed to monitoring infrastructure -10
inotifywait and auditd recorded opens of .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCP credentials at 04:19:56 (before git clone at 04:20:02) and again at 04:20:13 (post-analysis). Process context and timing indicate these are Oathe's own canary baseline and integrity-check reads, not skill code.
INFO Agent tool syntax embedded in documentation -17
SKILL.md includes a working agent tool invocation (message action=send channel=whatsapp) directly in documentation. While not an injection attack, it implicitly teaches the agent a messaging pattern and may cause it to act on that pattern without verification.