Is gugic/inworld-tts safe?
https://github.com/openclaw/skills/tree/main/skills/gugic/inworld-tts
The gugic/inworld-tts skill is a straightforward shell-based TTS API wrapper for Inworld.ai with clean prompt injection hygiene, a standard git install process, and no canary exfiltration. The primary concern is inherent to the skill's design: all input text is transmitted to Inworld's external API, meaning sensitive agent context could leave the user's environment during normal operation. Secondary concerns include minor shell script quality issues (unsanitized JSON parameters, unvalidated output path) and a potential audio-based exfiltration chain when combined with file-reading skills.
Findings (6)
MEDIUM All TTS input text transmitted to third-party Inworld API -25
The core function of this skill sends the complete input text to api.inworld.ai on every invocation. Any text the agent passes as TTS input — including potentially sensitive context from the user's session, file contents, or conversation history — is transmitted to and processed by Inworld's servers. While this is the skill's disclosed and expected purpose, users in sensitive environments may not realize that agent-generated TTS content constitutes data leaving their control boundary.
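One way to reduce this exposure is a pre-flight check in the calling wrapper that refuses to send text resembling credentials. The sketch below is illustrative only: `looks_sensitive` and its two patterns are hypothetical and not part of the skill, and pattern matching of this kind catches only the most obvious cases.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard; NOT part of the gugic/inworld-tts skill.
# Returns 0 (true) when the text contains an obvious credential marker.
looks_sensitive() {
  local text="$1"
  # Shape of an AWS access key ID: "AKIA" plus 16 uppercase letters or digits.
  local aws_key_re='AKIA[0-9A-Z]{16}'

  # PEM private key header, e.g. "-----BEGIN RSA PRIVATE KEY-----".
  if [[ "$text" == *"-----BEGIN "*"PRIVATE KEY-----"* ]]; then
    return 0
  fi
  if [[ "$text" =~ $aws_key_re ]]; then
    return 0
  fi
  return 1
}
```

A caller could run this on the input text and skip the API call when it returns success. It does not replace a policy decision about which data may be sent to a third party at all.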
LOW Unvalidated --voice and --model parameters interpolated into JSON payload -12
The VOICE and MODEL variables (sourced from the --voice and --model CLI arguments) are interpolated directly into a bash heredoc JSON payload without sanitization or JSON escaping. A value containing JSON-special characters (quotes, braces, colons) could break the JSON structure or inject additional fields into the Inworld API request. This is not shell command injection: the shell does not re-evaluate the result of variable expansion inside a heredoc, so the value is inserted as literal text. It could still result in malformed or unexpected API calls.
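A possible mitigation is to reject any value that is not a plain identifier before it reaches the payload. The following is a minimal sketch: `is_safe_identifier` is a hypothetical helper, and the permitted character set is an assumption rather than something taken from the skill or from Inworld's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical validator; NOT part of the gugic/inworld-tts skill.
# Assumption: valid voice and model names use only letters, digits,
# dots, underscores, and hyphens. Adjust if the real names differ.
is_safe_identifier() {
  local value="$1"
  local re='^[A-Za-z0-9._-]+$'
  # Succeeds only when the whole value matches the allowlist.
  [[ "$value" =~ $re ]]
}

# Possible use in the script, before building the JSON payload:
#   is_safe_identifier "$VOICE" || { echo "invalid --voice" >&2; exit 1; }
#   is_safe_identifier "$MODEL" || { echo "invalid --model" >&2; exit 1; }
```

An allowlist is stricter than JSON-escaping the value, but it needs no extra tooling and fails closed on anything unexpected.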
LOW Output file path not validated: potential path traversal -6
The output file path is taken directly from the second positional argument ($2) without any path canonicalization or directory restriction. An agent directed to write audio output to a path like ../../etc/cron.d/job or ~/.bashrc would cause the script to write base64-decoded binary content to that location, with whatever file permissions the agent process holds. The written content is audio data, not arbitrary shell commands, but could corrupt targeted files.
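The check described as missing could look roughly like the sketch below. It is a lexical filter only: `is_safe_output_path` is a hypothetical helper, it restricts output to relative paths without `..` components, and it does not resolve symlinks, so it is weaker than full canonicalization.

```shell
#!/usr/bin/env bash
# Hypothetical path filter; NOT part of the gugic/inworld-tts skill.
# Accepts only non-empty relative paths that contain no ".." component.
is_safe_output_path() {
  local path="$1"
  case "$path" in
    # Empty, absolute, or starting with a literal tilde.
    ""|/*|"~"*) return 1 ;;
    # ".." as the whole path or as any path component.
    ..|../*|*/..|*/../*) return 1 ;;
  esac
  return 0
}

# Possible use in the script, where $2 is the output path:
#   is_safe_output_path "$2" || { echo "refusing output path: $2" >&2; exit 1; }
```

Note that an unquoted `~/.bashrc` is expanded by the calling shell into an absolute path before the script sees it, so the absolute-path rule covers that case as well.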
LOW Multi-skill chain enables audio-based information disclosure -20
When this skill is active alongside a file-reading or data-access skill, an attacker with prompt injection access to the agent could instruct it to read sensitive file contents and pass them to the TTS skill. The resulting audio would contain the exfiltrated data in spoken form, audible to anyone in the session. This requires a multi-step exploit chain but represents a non-obvious risk in agent environments with broad tool access.
INFO SKILL.md references agent-internal path ~/.clawdbot/.env -8
The setup instructions suggest storing the API key at ~/.clawdbot/.env, which encodes knowledge of the specific agent framework's internal directory layout. While this is documentation-level information and not an injection attempt, it indicates the skill was authored with awareness of the clawdbot deployment environment rather than being a generic third-party TTS wrapper.
INFO Canary files opened during monitoring setup/teardown, not by skill -10
Filesystem events show canary files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, .config/gcloud/application_default_credentials.json) were opened at audit timestamp 1771909429 (pre-clone, before any skill code ran) and again at 1771909452 (post-install scan completion). The first set predates the git clone by ~6 seconds; both sets are consistent with the monitoring infrastructure establishing inotify watches and computing baseline/final hashes. The monitoring system confirms all canary files are intact.
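The baseline/final hash comparison mentioned above can be sketched as follows. The report does not show the audit's actual tooling, so `hash_files` and `canaries_intact` are hypothetical names, and the use of `sha256sum` (GNU coreutils) is an assumption. Hashing a file requires opening it, which is consistent with open events appearing at both the baseline and final steps.

```shell
#!/usr/bin/env bash
# Hypothetical integrity check; the audit's real monitoring code is not shown.
# Assumes GNU coreutils' sha256sum is available.

# Prints one "hash  path" line per file, in argument order.
hash_files() {
  sha256sum -- "$@"
}

# Returns 0 when the current hashes of the given files equal the baseline.
# Usage: canaries_intact "$baseline" file1 [file2 ...]
canaries_intact() {
  local baseline="$1"
  shift
  [[ "$(hash_files "$@")" == "$baseline" ]]
}
```

In this sketch, a baseline would be captured with `hash_files` before the clone and passed to `canaries_intact` after the post-install scan.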