Is matusvojtek/her-voice safe?
https://github.com/openclaw/skills/tree/main/skills/matusvojtek/her-voice
Her Voice is a legitimate, well-engineered TTS skill with above-average security hygiene, evidenced by a detailed 1.0.2 security hardening release. The primary concerns are the unpinned PyPI dependency installation (mlx-audio, kokoro) creating supply chain exposure, a persistent background daemon that survives session termination, and in-place patching of installed third-party Python package files — all of which are justified by the skill's TTS functionality but represent meaningful attack surface. Canary file integrity was confirmed intact; the pre- and post-install honeypot reads were attributable to the oathe audit infrastructure rather than skill code, and no unauthorized data exfiltration was detected during installation.
Findings (10)
HIGH Unpinned PyPI package installation without hash verification -25
setup.py installs mlx-audio, kokoro>=0.8, soundfile, and numpy directly from PyPI with pip, without exact version pins or hash verification (--require-hashes). A typosquatted or compromised package could execute arbitrary code during installation with user privileges, and the resulting venv would persist for all future speak.py calls.
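One way to catch this class of issue is to lint a requirements list before handing it to pip. The sketch below is illustrative only: the function name unpinned_requirements is hypothetical, it assumes one requirement per line with any --hash option on that same line, and the version and digest in the usage example are placeholders rather than values taken from the skill.

```python
import re


def unpinned_requirements(lines):
    """Return requirement specs that lack an exact pin or a sha256 hash.

    Assumption: one requirement per line, with any --hash option on the
    same line (no backslash continuations).
    """
    flagged = []
    for raw in lines:
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, full-line comments, and pip options
        has_pin = re.search(r"==\s*[\w.!+]+", line) is not None
        has_hash = "--hash=sha256:" in line
        if not (has_pin and has_hash):
            flagged.append(line.split()[0])
    return flagged


if __name__ == "__main__":
    example = [
        "mlx-audio",
        "kokoro>=0.8",
        # Placeholder version and digest, for illustration only.
        "numpy==1.26.4 --hash=sha256:" + "a" * 64,
    ]
    print(unpinned_requirements(example))
```

Requirements that pass a check like this can then be installed with pip's --require-hashes mode, which refuses any package whose archive does not match a listed digest.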
MEDIUM Persistent background daemon survives agent session termination -20
SKILL.md instructs the agent to start daemon.py with 'nohup ... & disown', creating a Unix socket server that holds the Kokoro TTS model in RAM until manually stopped. The process outlives the agent session and keeps accepting connections on ~/.her-voice/tts.sock (owner-only permissions, but still a persistent attack surface).
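A user who wants to confirm the owner-only claim on their own machine could check the socket path directly. This is a minimal sketch, not code from the skill: path_is_owner_only is a hypothetical helper, and it assumes a Unix-like system where os.getuid is available.

```python
import os
import stat


def path_is_owner_only(path):
    """True if path is not a symlink, belongs to the current user,
    and grants no permissions to group or others."""
    st = os.lstat(path)  # lstat so a symlink is inspected, not followed
    if stat.S_ISLNK(st.st_mode):
        return False
    if st.st_uid != os.getuid():
        return False
    return (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    sock = os.path.expanduser("~/.her-voice/tts.sock")
    if os.path.lexists(sock):
        print(sock, "owner-only:", path_is_owner_only(sock))
    else:
        print("no daemon socket found")
```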
MEDIUM In-place modification of installed third-party Python package files -15
patch_misaki_espeak() in setup.py locates the installed misaki/espeak.py file and modifies it on disk to replace 'from espeakng_loader import get_library_path' with a hardcoded dylib path. While justified for macOS compatibility, this pattern of mutating installed packages creates fragility and could mask malicious substitution.
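Because installed wheels ship a RECORD file listing a hash for each file, an on-disk patch like this can be detected after the fact. The sketch below uses the standard importlib.metadata API. The helper names record_digest and modified_files are hypothetical, and it assumes the package was installed from a wheel, so that RECORD hashes exist.

```python
import base64
import hashlib
from importlib import metadata


def record_digest(data, algorithm="sha256"):
    """Encode a digest the way wheel RECORD files do:
    URL-safe base64 with the trailing '=' padding removed."""
    raw = hashlib.new(algorithm, data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def modified_files(dist_name):
    """List files of an installed distribution whose on-disk content
    no longer matches the hash recorded at install time."""
    changed = []
    for entry in metadata.files(dist_name) or []:
        if entry.hash is None:
            continue  # RECORD carries no hash for this entry
        try:
            data = entry.locate().read_bytes()
        except OSError:
            changed.append(str(entry))  # missing or unreadable file
            continue
        if record_digest(data, entry.hash.mode) != entry.hash.value:
            changed.append(str(entry))
    return changed
```

Calling modified_files("misaki") inside the skill's venv would be expected to report espeak.py once the patch has been applied, which also means a later malicious edit to that same file would be indistinguishable by this check alone.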
MEDIUM Clipboard content captured in persist mode -12
In persist (always-on) mode, the HerVoice.swift visualizer binds Cmd+V to paste clipboard text directly into the TTS pipeline and speak it aloud. If an agent is instructed to keep the visualizer running, any sensitive content the user copies (passwords, API keys, PII) could be inadvertently vocalized.
MEDIUM Agent instructed to auto-start persistent daemon without per-session user confirmation -10
SKILL.md declares daemon.auto_start as 'advisory' and instructs the agent to start the daemon on first voice use. An agent following these instructions will therefore launch a background process without asking for explicit confirmation each session, so the user may not be aware that it is running.
LOW Model download from HuggingFace without integrity verification -8
setup.py triggers download of mlx-community/Kokoro-82M-bf16 from HuggingFace by loading the model, with no checksum or signature verification. A BGP hijack or compromised HuggingFace repository could substitute a malicious model.
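The missing integrity check could look roughly like the following: hash the downloaded file in chunks and compare it with a digest obtained out of band. This is a sketch under assumptions. The function names are hypothetical, and the report does not state that a trusted reference digest for the model is published anywhere, so the expected value would have to come from the user's own trusted source.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large model weights
    never need to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Compare the file's digest with a trusted reference digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```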
LOW os.execv used for Python interpreter switching -5
ensure_correct_python() uses os.execv to re-exec the current script under a venv Python. The target path is validated via _validate_venv_python() (filename prefix, bin/ parent dir, pyvenv.cfg check), but os.execv replaces the current process image with no return path, making this a high-consequence operation.
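The three checks named in this finding can be sketched as follows. This is not the skill's _validate_venv_python() implementation, which was not reproduced in the report. The function looks_like_venv_python is a hypothetical reconstruction from the description only.

```python
from pathlib import Path


def looks_like_venv_python(candidate):
    """Apply the three checks described in the finding to a path.

    The path is deliberately not resolved: a venv's python is usually
    a symlink to the base interpreter, which lives outside the venv.
    """
    path = Path(candidate)
    if not path.name.startswith("python"):  # filename prefix check
        return False
    if path.parent.name != "bin":  # parent directory must be bin/
        return False
    # A venv root carries a pyvenv.cfg next to its bin/ directory.
    return (path.parent.parent / "pyvenv.cfg").is_file()
```

A caller would run a check like this before os.execv, since a successful exec replaces the process image and leaves no opportunity to recover from a bad target.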
LOW Skill installed from monorepo enabling silent upstream updates -7
The skill is installed via sparse-checkout from the openclaw/skills monorepo. A future commit to the main branch could modify skill files without triggering a re-audit, as the audit is pinned to a specific commit hash in _meta.json but runtime behavior depends on the user's local copy.
INFO Canary files read during audit by oathe infrastructure (not skill code) -13
Honeypot files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, .config/gcloud credentials) were opened read-only at audit timestamps 1771940720.887 (pre-install, 6 seconds before git clone) and 1771940738.266 (post-analysis verification). Timing conclusively places these reads within the oathe baseline and integrity-check phases, not attributable to skill code. All files were confirmed intact (CLOSE_NOWRITE).
INFO Security-hardened implementation with extensive 1.0.2 security changelog 0
The 1.0.2 CHANGELOG documents 14 security improvements including: socket/PID files moved from /tmp to ~/.her-voice with 0600 permissions to prevent symlink attacks; 1MB daemon message size cap; 100MB chunk size validation; os.execv path validation; symlink refusal for socket and PID paths; pgrep regex escaping; strlcpy replacing strcpy in Swift; 30s client timeout; 500MB audio memory cap; atexit temp file cleanup. This level of security attention is above average for a community skill.