Is crazybuffon/pronunciation-coach safe?
https://github.com/openclaw/skills/tree/main/skills/crazybuffon/pronunciation-coach
The pronunciation-coach skill is functionally coherent and contains no prompt injection. However, two serious concerns arise: (1) all six honeypot credential files were read during the monitoring session in a pattern consistent with credential sweeping, though the responsible process cannot be definitively attributed to the skill versus the audit infrastructure; and (2) the skill's core design systematically reads private Telegram voice messages and transmits the raw audio to Microsoft Azure Speech Services, with the endpoint host controlled entirely by an unvalidated environment variable. The skill code itself is clean, but the combination of unexplained credential reads and privacy-invasive audio exfiltration to a configurable third-party endpoint warrants careful review before deployment.
Findings (9)
HIGH All six honeypot credential files accessed during monitoring session -42
inotify and auditd both record OPEN+ACCESS (read-only) events on /home/oc-exec/.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and .config/gcloud/application_default_credentials.json. The first batch occurs before the skill is installed (04:44:13, versus the clone at 04:44:19), and a second identical batch occurs after the install completes (04:44:39). Reading every common credential location in rapid succession matches a credential-harvesting sweep pattern. The available evidence does not identify the responsible process. The most likely explanation is that the oathe audit infrastructure was setting up and then re-verifying its canaries, but this cannot be confirmed.
HIGH Telegram voice messages transmitted to Azure Speech Services -25
The skill's core function is to read private voice messages from ~/.openclaw/media/inbound/ (Telegram inbound messages) and upload the raw audio bytes to Microsoft Azure Speech Services via curl. This is disclosed in SKILL.md and skill.json, but the design still systematically sends the content of private voice conversations to a third party. An attacker who controls AZURE_REGION can redirect this audio to an arbitrary host, and one who controls AZURE_SPEECH_KEY can have it processed under a Speech resource of their choosing.
MEDIUM AZURE_REGION environment variable controls API endpoint host without validation -10
The assessment script constructs the Azure Speech endpoint URL by directly interpolating the AZURE_REGION environment variable with no domain validation or allowlist check. If this variable is attacker-controlled (via environment poisoning, a malicious companion skill, or social engineering), audio can be redirected to an arbitrary host. This is effectively an SSRF vector for audio data.
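One way to close this gap is to validate the region before it is interpolated into a host name. The sketch below is illustrative only and is not taken from the skill. It assumes region names are short lowercase alphanumeric tokens and that the endpoint host has the form `<region>.stt.speech.microsoft.com`. The function name `build_speech_host` is hypothetical.

```python
import re

# Assumption: region names are short lowercase alphanumeric tokens
# (for example "westeurope"); anything else is rejected.
_REGION_RE = re.compile(r"[a-z0-9]{1,32}")

# Assumption: host suffix of the speech-to-text endpoint the script targets.
_HOST_SUFFIX = ".stt.speech.microsoft.com"


def build_speech_host(region: str) -> str:
    """Return the endpoint host for a validated region, or raise ValueError."""
    # fullmatch rejects dots, slashes, '@', '#', whitespace and trailing
    # newlines, so the value cannot smuggle in a different host or path.
    if not _REGION_RE.fullmatch(region):
        raise ValueError(f"refusing suspicious AZURE_REGION value: {region!r}")
    return region + _HOST_SUFFIX
```

A stricter variant could compare the value against an explicit allowlist of the regions the deployment actually uses, at the cost of having to maintain that list.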
MEDIUM Skill accesses and transmits private Telegram voice message content -20
The skill's workflow is anchored to Telegram inbound media. The privacy implication is that every voice message a user receives on Telegram and routes through this skill is transmitted to Azure. The user is warned in SKILL.md, but the agent running this skill could be directed to process voice messages beyond what the user explicitly intended.
MEDIUM Read-only access to credential honeypots detected twice: pre-install and post-install -42
The canary integrity tool reports files intact (not modified), but read-only access is sufficient to exfiltrate credential content. The access pattern — all six credential file types in a single rapid batch, twice — is atypical for background system processes but consistent with both audit infrastructure canary initialization and a credential sweeper. The skill source code contains no mechanism that would explain these reads.
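The "all canaries in one rapid batch" pattern can be checked mechanically once file-access events have been extracted from the monitoring logs. The sketch below is a hypothetical helper, not part of the audit tooling. It assumes events are available as `(timestamp_seconds, path)` pairs, and the 2-second default window is an arbitrary illustrative choice.

```python
from typing import Iterable, List, Set, Tuple


def find_sweeps(
    events: Iterable[Tuple[float, str]],
    canaries: Set[str],
    window: float = 2.0,
) -> List[float]:
    """Return start times of windows in which every canary path was read."""
    # Keep only reads of canary files, ordered by time.
    ordered = sorted(e for e in events if e[1] in canaries)
    sweeps: List[float] = []
    i = 0
    while i < len(ordered):
        start = ordered[i][0]
        seen: Set[str] = set()
        j = i
        # Collect every canary touched within `window` seconds of `start`.
        while j < len(ordered) and ordered[j][0] - start <= window:
            seen.add(ordered[j][1])
            j += 1
        if seen == canaries:
            sweeps.append(start)
            i = j  # continue after this sweep
        else:
            i += 1
    return sweeps
```

A helper like this only flags the pattern. It says nothing about which process performed the reads, which is the attribution question left open above.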
LOW User-supplied audio file path passed to ffmpeg without path restriction -5
The pronunciation-assess.sh script accepts any file path as its first argument and passes it to ffmpeg. While ffmpeg will fail on non-audio files, there is no path validation restricting the file to ~/.openclaw/media/inbound/. An LLM agent could be instructed to pass arbitrary file paths, and ffmpeg's error output leaks path information. The -- separator prevents option injection but not path traversal.
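A path restriction of the kind described here could be sketched as follows. This is an illustrative check rather than the skill's code: the function name `resolve_inbound_path` is hypothetical, and the caller is assumed to pass the expanded inbound directory (for example via `os.path.expanduser`).

```python
import os


def resolve_inbound_path(candidate: str, inbound_dir: str) -> str:
    """Return the real path of `candidate` if it is a file inside `inbound_dir`."""
    # realpath resolves "..", and follows symlinks, before the containment check.
    root = os.path.realpath(inbound_dir)
    real = os.path.realpath(candidate)
    if os.path.commonpath([root, real]) != root:
        raise ValueError(f"path escapes inbound directory: {candidate!r}")
    if not os.path.isfile(real):
        raise ValueError(f"not a regular file: {candidate!r}")
    return real
```

Passing only the returned path to ffmpeg, still after a `--` separator, would address traversal and option injection together.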
LOW TTS voice message send capability not declared in skill.json permissions -10
SKILL.md step 4 instructs the agent to 'Send a voice message (via TTS) demonstrating the correct pronunciation of problem words.' skill.json declares no TTS or outbound messaging permissions. This undeclared capability could be leveraged to send content to the user's Telegram account without explicit permission scaffolding.
INFO Skill instructs agent to perform outbound messaging beyond stated scope -8
The coaching workflow includes instructing the agent to send TTS voice messages and assign practice recordings, which implies agent capabilities for outbound messaging. This expands the attack surface if the agent is compromised or manipulated.
INFO Install process is clean with expected network and filesystem behavior 0
The sparse git clone connected only to GitHub (140.82.121.3). No unexpected filesystem modifications were made outside the skill directory. No suspicious processes were spawned. The filesystem diff matches exactly the six expected skill files.