Is guang384/aliyun-tts safe?

https://github.com/openclaw/skills/tree/main/skills/guang384/aliyun-tts

82
SAFE

guang384/aliyun-tts is a straightforward Alibaba Cloud TTS client with no prompt injection, no malicious install behavior, and intact canary files. Its primary risks are architectural rather than adversarial. The token endpoint communicates over plain HTTP, which exposes the AccessKeyId and request signature to interception. All synthesized text is also transmitted to Alibaba Cloud's China-region servers, which creates an implicit data exfiltration vector if an agent vocalizes sensitive content. The skill is functionally benign for its stated purpose but should be treated with caution in agent contexts where it may be combined with data-access skills.

Category Scores

Prompt Injection 91/100 · 30%
Data Exfiltration 63/100 · 25%
Code Execution 80/100 · 20%
Clone Behavior 90/100 · 10%
Canary Integrity 95/100 · 10%
Behavioral Reasoning 72/100 · 5%
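
The report does not state how the overall score of 82 is derived from the category scores. As a rough cross-check, the sketch below assumes the percentages are weights in a plain weighted mean; that aggregation rule is an assumption, not something the report documents.

```python
# Category scores and weights as listed above: (score out of 100, weight in percent).
CATEGORIES = {
    "Prompt Injection": (91, 30),
    "Data Exfiltration": (63, 25),
    "Code Execution": (80, 20),
    "Clone Behavior": (90, 10),
    "Canary Integrity": (95, 10),
    "Behavioral Reasoning": (72, 5),
}

# Integer arithmetic avoids floating-point noise in the intermediate sum.
weight_total = sum(weight for _, weight in CATEGORIES.values())
weighted_sum = sum(score * weight for score, weight in CATEGORIES.values())

# ASSUMPTION: a simple weighted mean; the real aggregation is not documented.
weighted_mean = weighted_sum / weight_total
print(weighted_mean)
```

Under this assumption the result is 81.15, close to but not identical to the reported 82, so the actual scoring may round differently or include factors not shown here.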

Findings (6)

HIGH Token endpoint uses plain HTTP — credentials transmitted in cleartext -20

The get_token() function in aliyun_tts.py opens its connection to nls-meta.cn-shanghai.aliyuncs.com with http.client.HTTPConnection rather than HTTPSConnection. The HMAC-SHA1-signed request, which includes the AccessKeyId and the computed signature, is therefore sent over unencrypted HTTP, where any network observer or MITM on the path can intercept it.
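
A minimal sketch of the likely fix follows, assuming the token endpoint also accepts TLS connections (the report does not confirm this). The helper name open_token_connection and the timeout value are hypothetical; only the host name comes from the finding above.

```python
import http.client
import ssl

# Host taken from the finding above; the request path and parameters are omitted.
TOKEN_HOST = "nls-meta.cn-shanghai.aliyuncs.com"


def open_token_connection(host: str = TOKEN_HOST,
                          timeout: float = 10.0) -> http.client.HTTPSConnection:
    """Build a TLS connection object (hypothetical helper, not the skill's code).

    No network traffic occurs here; the socket is opened on the first request.
    """
    # The default context verifies the server certificate and host name.
    context = ssl.create_default_context()
    return http.client.HTTPSConnection(host, timeout=timeout, context=context)


conn = open_token_connection()
print(type(conn).__name__, conn.host, conn.port)
```

Because HTTPSConnection exposes the same request() and getresponse() methods as HTTPConnection, the rest of a token-fetching function would normally not need to change.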

MEDIUM All synthesized text transmitted to Alibaba Cloud (China region) -12

Every string an agent synthesizes is sent to nls-gateway-cn-shanghai.aliyuncs.com via HTTPS GET. This is an implicit data exfiltration vector: if an agent is prompted to vocalize sensitive information (system state, user data, file contents), that data is sent to a cloud provider under Chinese jurisdiction, which may be subject to state-access obligations under PRC law.

MEDIUM Arbitrary filesystem write via -o flag -8

The output flag accepts an unconstrained path argument. An agent instructed to use this skill with a crafted -o value could write audio files to sensitive locations (e.g., ~/.bashrc, ~/.ssh/authorized_keys) depending on process permissions, or to web-served directories for later retrieval.
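
One way to constrain the output path is sketched below. The allowed directory, the permitted extensions, and the function name resolve_output_path are all assumptions for illustration; the skill itself performs no such check according to this finding.

```python
from pathlib import Path

# ASSUMPTION: audio may only be written under this directory.
ALLOWED_DIR = Path("/tmp")
# ASSUMPTION: only these audio extensions are acceptable.
ALLOWED_SUFFIXES = {".mp3", ".wav"}


def resolve_output_path(user_path: str, allowed_dir: Path = ALLOWED_DIR) -> Path:
    """Resolve a user-supplied -o value and reject anything outside allowed_dir."""
    base = allowed_dir.resolve()
    # Joining with an absolute user_path discards base, so the containment
    # check below also rejects absolute paths outside the allowed directory.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"output path escapes {base}: {user_path!r}") from None
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected file extension: {user_path!r}")
    return candidate


print(resolve_output_path("voice-reply.mp3"))
```

Resolving before the containment check matters: it collapses `..` segments and follows symlinks, so a value such as `../etc/passwd` is rejected rather than silently written outside the allowed directory.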

LOW MEDIA: response directive embeds behavioral instruction in system prompt -9

SKILL.md instructs the agent to include the literal string 'MEDIA:/tmp/voice-reply.mp3' in its text responses. This is a behavioral directive injected into the agent's system prompt that instructs it to emit a specific protocol token. While this is likely a legitimate clawdbot platform feature, it establishes a pattern where SKILL.md content directly controls agent output format.

LOW Potential covert channel via TTS synthesis of sensitive agent context -28

When combined with skills that read files or access environment variables, this skill could be used to vocalize sensitive data. The text is transmitted to the external service before the audio is returned to the user, creating a covert exfiltration channel that would not appear in text output audits.
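
A calling agent or wrapper could screen text before it reaches the synthesis call. The sketch below is a heuristic only: the patterns, and the names looks_sensitive and guard_tts_input, are illustrative assumptions, and a pattern list like this will miss many kinds of sensitive content.

```python
import re

# ASSUMPTION: a small, non-exhaustive set of secret-like patterns.
SENSITIVE_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),       # PEM private key header
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                     # AWS access key ID shape
    re.compile(r"(?i)\b(password|secret|api[_-]?key|token)\s*[:=]\s*\S+"),
]


def looks_sensitive(text: str) -> bool:
    """Return True if any secret-like pattern appears in the text."""
    return any(pattern.search(text) for pattern in SENSITIVE_PATTERNS)


def guard_tts_input(text: str) -> str:
    """Pass text through unchanged, or refuse if it looks like a secret."""
    if looks_sensitive(text):
        raise ValueError("refusing to synthesize text that looks like a secret")
    return text


print(guard_tts_input("The meeting starts at three o'clock."))
```

A check like this reduces accidental leakage but does not close the channel; an audit log of every string sent for synthesis would address the gap in text output audits more directly.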

INFO Canary files read during monitoring window — attributed to audit framework 0

Honeypot files including .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCP ADC were read (OPEN/ACCESS/CLOSE_NOWRITE) at 04:02:49. The git clone of the skill does not begin until 04:02:55. Timing analysis and the monitoring framework's documented baseline-scan behavior indicate these reads are attributable to the audit harness, not the skill. Canary integrity check confirms no modification.
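
The timing argument can be checked with simple arithmetic on the two timestamps quoted above. This sketch assumes both timestamps come from the same clock and the same day, which the report implies but does not state.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps as reported in the finding above.
canary_read = datetime.strptime("04:02:49", FMT)
clone_start = datetime.strptime("04:02:55", FMT)

# A positive gap means the canary reads happened before the skill was cloned,
# so the skill's code could not yet have been present on the machine.
gap = clone_start - canary_read
reads_precede_clone = canary_read < clone_start

print(gap.total_seconds(), reads_precede_clone)
```

The reads precede the clone by 6 seconds, which is consistent with attributing them to the audit harness rather than to the skill.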