Is liuhedev/lh-edge-tts safe?

https://github.com/openclaw/skills/tree/main/skills/liuhedev/lh-edge-tts

82
SAFE

lh-edge-tts is a straightforward TTS skill that wraps Microsoft's Edge neural TTS service via the edge-tts Python library. The source code contains no prompt injection attempts, no credential harvesting logic, and no malicious network exfiltration. The primary concerns are latent: the --file flag permits arbitrary file reads that prompt injection could exploit, and the --proxy flag can redirect TTS traffic through attacker-controlled infrastructure. Canary files were accessed during the monitoring window, but canary integrity checks confirmed that no external exfiltration occurred. The access pattern is consistent with the Oathe monitoring framework performing baseline and post-install integrity verification, although the audit log could not attribute the post-install batch of reads to the framework.

Category Scores

Prompt Injection 88/100 · 30%
Data Exfiltration 75/100 · 25%
Code Execution 80/100 · 20%
Clone Behavior 88/100 · 10%
Canary Integrity 85/100 · 10%
Behavioral Reasoning 72/100 · 5%
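
The overall score of 82 is consistent with a weighted average of the six category scores using the percentages listed above. The report does not state its aggregation formula, so the following is only a sketch under that assumption; it happens to reproduce the published figure.

```python
# Category scores and percentage weights exactly as listed in the report.
# Assumption: the overall score is a plain weighted average, rounded to the
# nearest integer. The report does not confirm this formula.
CATEGORIES = {
    "Prompt Injection": (88, 30),
    "Data Exfiltration": (75, 25),
    "Code Execution": (80, 20),
    "Clone Behavior": (88, 10),
    "Canary Integrity": (85, 10),
    "Behavioral Reasoning": (72, 5),
}


def overall_score(categories):
    """Return the weighted average of (score, weight) pairs."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


print(round(overall_score(CATEGORIES)))  # 82
```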

Findings (7)

MEDIUM Arbitrary file read via --file flag -20

tts_converter.py accepts a --file argument that opens and reads any file accessible to the running agent, then passes the full contents to the Microsoft Edge TTS API. If an attacker controls agent input (via direct message or prompt injection from external content), they could craft a request such as 'TTS the file /home/user/.ssh/id_rsa' and the agent may read and vocalize credential content. The SKILL.md does not instruct the agent to sanitize or restrict the --file argument.
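
One possible mitigation is to confine --file to a dedicated input directory before the file is opened. The sketch below is illustrative only: `resolve_tts_input`, `ALLOWED_SUFFIXES`, and the `tts_inputs` directory are hypothetical names, not part of the skill.

```python
from pathlib import Path

# Hypothetical allowlist of file types; the skill defines no such setting.
ALLOWED_SUFFIXES = {".txt", ".md"}


def resolve_tts_input(user_path, base_dir):
    """Return the resolved path if it lies inside base_dir, else raise ValueError."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." and follows symlinks; joining an absolute
    # user_path discards base, so absolute paths fail the check below.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"refusing to read outside {base}: {user_path}") from None
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {candidate.suffix or '(none)'}")
    return candidate
```

With a check like this in front of the file read, a request for /home/user/.ssh/id_rsa would be rejected before any content reaches the TTS API, while files placed in the input directory would still work.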

MEDIUM Ambiguous post-install canary credential file access -5

A second batch of reads against all six canary credential files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, GCP credentials) was recorded at audit timestamp 1771919577, approximately 24 seconds after monitoring start and after skill installation completed. No corresponding EXECVE entry was found to attribute these reads to the Oathe audit framework. The canary integrity check confirmed no external exfiltration, and no skill source code contains logic to target these paths, but the second access batch is unexplained.

MEDIUM Proxy flag enables TTS traffic interception -15

The --proxy parameter in tts_converter.py is passed directly to edge_tts.Communicate() without validation. An attacker who can control this argument (e.g., via external prompt injection from a page the agent visits) could silently route all outbound TTS API traffic — including the plaintext being synthesized — through an attacker-controlled HTTP/HTTPS proxy. This affects both the audio output destination metadata and the text content.
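
A validation step in front of the proxy argument could look roughly like the sketch below. `validate_proxy` and `TRUSTED_PROXY_HOSTS` are hypothetical; the skill ships no such check, and the right allowlist depends on the operator's environment.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of proxy hosts the operator trusts; not part of the skill.
TRUSTED_PROXY_HOSTS = {"127.0.0.1", "localhost"}


def validate_proxy(proxy):
    """Return proxy unchanged if acceptable, None if unset; raise ValueError otherwise."""
    if not proxy:
        return None
    parts = urlsplit(proxy)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported proxy scheme in {proxy!r}")
    # hostname strips any userinfo and port, so "user@host:3128" is checked as "host".
    if parts.hostname not in TRUSTED_PROXY_HOSTS:
        raise ValueError(f"proxy host not in allowlist: {parts.hostname!r}")
    return proxy
```

The returned value would then be the only proxy string handed on to edge_tts.Communicate(), so an injected URL pointing at an unknown host fails before any traffic is sent.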

LOW pip install introduces PyPI supply chain dependency -5

install.sh executes 'pip install edge-tts' at installation time, pulling and executing code from PyPI. While edge-tts is a well-known library wrapping Microsoft's TTS API, this introduces a supply chain trust dependency. A compromised or typosquatted package could execute arbitrary code at install time.

LOW Persistent config file written to user home directory -8

config_manager.py creates and writes ~/.tts-config.json, storing voice, rate, volume, pitch, proxy, and timeout preferences. This file persists across sessions and skill restarts, is readable by other processes running as the same user, and could be tampered with by another skill to pre-configure a malicious proxy URL.
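
Because file permissions do not protect against other processes running as the same user, one defensive option is to treat the stored proxy as untrusted when the file is loaded. The sketch below assumes the JSON keys match the preference names listed above, which the report does not confirm; `sanitize_config` and `load_config` are hypothetical helpers.

```python
import json

# Assumption: key names mirror the preferences named in the report.
KNOWN_KEYS = {"voice", "rate", "volume", "pitch", "proxy", "timeout"}
# Hypothetical policy: a proxy read back from disk is never trusted.
UNTRUSTED_KEYS = {"proxy"}


def sanitize_config(raw):
    """Keep only known keys and drop any that must be supplied explicitly."""
    if not isinstance(raw, dict):
        return {}
    return {
        key: value
        for key, value in raw.items()
        if key in KNOWN_KEYS and key not in UNTRUSTED_KEYS
    }


def load_config(path):
    """Load and sanitize the config file; return {} if missing or malformed."""
    try:
        with open(path, encoding="utf-8") as fh:
            return sanitize_config(json.load(fh))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```

Under this policy a tampered proxy entry is ignored, at the cost of requiring the user to pass the proxy explicitly on each invocation.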

LOW All synthesized text transmitted to Microsoft cloud service -5

Every invocation of the TTS skill transmits the full text being synthesized to Microsoft Edge's online TTS service. Users should be aware that confidential content — meeting notes, code, personal information — passed to TTS will leave the local environment and be processed by Microsoft's infrastructure.

INFO Skill directs agent to act on keyword detection without per-invocation confirmation -12

SKILL.md instructs the agent to automatically detect the 'tts' keyword and invoke the TTS tool without requiring explicit user confirmation for each use. While standard for skill operation, this means a malicious document or webpage containing the word 'tts' could trigger TTS invocation with injected content.