Is avatarneil/discord-voice safe?
https://github.com/openclaw/skills/tree/main/skills/avatarneil/discord-voice
The discord-voice skill contains no malicious code, no prompt injection in SKILL.md, and no credential-stealing logic, and it produced only the expected network traffic (a GitHub clone) during installation. It does, however, carry significant operational security concerns. Its default configuration permits every Discord user to issue commands to the agent with full tool access, and all voice audio is continuously transmitted to third-party AI APIs. Honeypot credential files were also read post-install, most likely by the audit framework's own integrity check; that explanation is consistent with the clean canary status and the absence of network egress. The skill is not malicious, but it demands careful hardening before deployment.
Findings (8)
HIGH Default Config Grants All Discord Users Full Agent Tool Access via Voice -20
The default allowedUsers: [] configuration permits every Discord user in any voice channel the bot joins to speak commands that are transcribed and executed by the agent with full tool access. The agent's system prompt explicitly states it has access to all its normal tools and skills. This means any Discord server member who can join the channel can potentially issue filesystem, shell, network, or API commands to the agent. The code warns operators about this but does not enforce a secure default.
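A fail-closed alternative to the default described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: `VoiceAccessConfig`, `allowAllUsers`, and `isSpeakerAllowed` are hypothetical names introduced here, and only `allowedUsers` comes from the audited configuration.

```typescript
// Hypothetical sketch: these names are illustrative and do not appear in
// the discord-voice source, apart from the `allowedUsers` option itself.
interface VoiceAccessConfig {
  /** Discord user IDs permitted to issue voice commands. */
  allowedUsers: readonly string[];
  /** Assumed explicit opt-in before an empty list may mean "everyone". */
  allowAllUsers?: boolean;
}

function isSpeakerAllowed(userId: string, config: VoiceAccessConfig): boolean {
  if (config.allowedUsers.length === 0) {
    // Fail closed: an empty allowlist denies everyone unless the operator
    // has explicitly opted in to open access.
    return config.allowAllUsers === true;
  }
  return config.allowedUsers.includes(userId);
}
```

Under this design an operator who forgets to populate `allowedUsers` gets a bot that ignores all speakers, rather than one that obeys every speaker.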
HIGH Honeypot Credential Files Accessed Post-Install -15
The six canary credential files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, gcloud application_default_credentials.json) were opened and read at timestamp 1771653957.139, approximately 6 seconds after the skill installation completed. All six reads share the same millisecond timestamp. The canary integrity check reports all files intact, no exfiltration was confirmed, and the timing and pattern of the reads are consistent with the audit framework's final integrity hash verification. Even so, any read of canary credentials is a significant forensic signal and is flagged here.
MEDIUM All Discord Voice Audio Continuously Transmitted to Third-Party APIs -10
The skill transmits all voice channel audio to external STT providers as its core function. When using Deepgram (streaming mode, enabled by default when Deepgram is selected), a persistent WebSocket streams audio in real-time. When using OpenAI Whisper, audio chunks are batched and POSTed. There is no in-skill filtering of audio content — all speech, including potentially sensitive verbal discussions, is sent to provider servers where it may be logged, stored, or used for model training per provider terms.
MEDIUM Config-Controlled Text Injected Into Agent System Prompt -8
The noEmojiHint configuration option accepts an arbitrary string up to 500 characters that is injected verbatim into the agent's system prompt on every voice conversation turn. The sanitization strips only ASCII control characters but permits all printable text including instruction-like phrases. An operator or attacker with write access to clawdbot.json config can silently alter agent behavior across all voice sessions.
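One way to narrow this surface is to accept only a key that selects from fixed, reviewed hint strings instead of free text. The sketch below swaps in that design as an assumption; it is not how the skill currently handles noEmojiHint, and the preset names and wording (`HINT_PRESETS`, `resolveHint`) are hypothetical.

```typescript
// Hypothetical presets: the hint text the skill actually ships is not
// reproduced here; these strings are placeholders for illustration.
const HINT_PRESETS = {
  none: "",
  noEmoji: "Do not use emoji in responses; they will be read aloud.",
} as const;

type HintPreset = keyof typeof HINT_PRESETS;

// Maps a config value to a reviewed preset. Anything that is not an exact
// preset key (including arbitrary instruction-like text) falls back to the
// default instead of reaching the system prompt.
function resolveHint(configValue: unknown): string {
  if (
    typeof configValue === "string" &&
    Object.prototype.hasOwnProperty.call(HINT_PRESETS, configValue)
  ) {
    return HINT_PRESETS[configValue as HintPreset];
  }
  return HINT_PRESETS.noEmoji;
}
```

The trade-off is flexibility: operators lose the ability to write custom hints, but a modified config file can no longer place arbitrary text in the prompt through this option.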
MEDIUM Dynamic Code Loading from Operator-Configured Path -8
The loadCoreAgentDeps() function in core-bridge.ts resolves and dynamically imports OpenClaw runtime modules from a path determined by the openclawRoot configuration option. If the operator's configuration is tampered with, or the path is misconfigured so that it points at a malicious directory, the skill will import and execute arbitrary code from that location at runtime.
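A possible mitigation is to check the configured path against a short list of trusted installation directories before any dynamic import. The following is a sketch under stated assumptions: `assertTrustedRoot` and the `trustedRoots` list are hypothetical and not part of core-bridge.ts, and the check does not resolve symlinks.

```typescript
import * as path from "node:path";

// Hypothetical guard: not part of the audited source. It only compares
// normalized paths; it does not follow symlinks or verify file contents.
function assertTrustedRoot(
  configuredRoot: string,
  trustedRoots: readonly string[],
): string {
  const resolved = path.resolve(configuredRoot);
  const inside = trustedRoots.some((root) => {
    const rel = path.relative(path.resolve(root), resolved);
    // `resolved` is inside `root` when the relative path neither climbs
    // out with ".." nor is absolute (for example, a different drive).
    const climbsOut = rel === ".." || rel.startsWith(".." + path.sep);
    return !climbsOut && !path.isAbsolute(rel);
  });
  if (!inside) {
    throw new Error(`openclawRoot is outside the trusted roots: ${resolved}`);
  }
  return resolved;
}
```

A caller would pass the returned path to the import logic only if no error is thrown. Comparing with `path.relative` rather than a string prefix avoids accepting sibling directories such as `/opt/openclaw-evil` when `/opt/openclaw` is trusted.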
LOW Voice Transcription Creates Spoken Prompt Injection Surface -7
Discord voice input is transcribed and passed directly to the agent as user messages without content filtering or sandboxing. A user in the voice channel who knows the bot is present could speak crafted instructions intended to manipulate agent behavior, override instructions, or trigger tool use. This is an indirect injection vector — the injected content originates from an untrusted third party (any Discord user), not from SKILL.md itself.
LOW ML Model Weights Downloaded from Internet at Runtime -5
The local-whisper and kokoro TTS providers rely on @xenova/transformers and kokoro-js respectively, which auto-download binary model weights from huggingface.co on first use. These downloads are cached but represent internet-fetched binary content being loaded into the process. If the CDN or model hosting is compromised, malicious weights could be served.
INFO Four External Service Credential Sources in Config and Environment -7
API credentials for OpenAI, ElevenLabs, Deepgram, and AWS Polly flow through the plugin's runtime. AWS Polly accepts accessKeyId and secretAccessKey as plaintext config fields. While this is standard plugin behavior, it means the skill holds live credentials in memory during operation and those credentials are accessible to anyone who can read clawdbot.json.
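Operators who want to keep the AWS Polly keys out of clawdbot.json could source them from the process environment instead. The sketch below assumes the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; whether the skill itself reads these variables is not confirmed by this audit, and `pollyCredentialsFromEnv` is a hypothetical helper.

```typescript
// Hypothetical helper: illustrates environment-sourced credentials so that
// the plaintext accessKeyId/secretAccessKey fields can stay out of the
// config file. Not taken from the discord-voice source.
interface PollyCredentials {
  accessKeyId: string;
  secretAccessKey: string;
}

function pollyCredentialsFromEnv(
  env: Record<string, string | undefined>,
): PollyCredentials | undefined {
  const accessKeyId = env["AWS_ACCESS_KEY_ID"];
  const secretAccessKey = env["AWS_SECRET_ACCESS_KEY"];
  // Require both values; a partial pair is treated as "not configured".
  if (!accessKeyId || !secretAccessKey) {
    return undefined;
  }
  return { accessKeyId, secretAccessKey };
}
```

In a Node process this would be called as `pollyCredentialsFromEnv(process.env)`. The credentials still live in memory at runtime, so this reduces exposure through the config file only.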