Is instant-picture/clonev safe?

https://github.com/openclaw/skills/tree/main/skills/instant-picture/clonev

Overall score: 68/100 · CAUTION

The CloneV skill provides voice cloning using Coqui XTTS v2 and raises significant ethical and security concerns. The installation was technically clean, with no unexpected network connections or canary violations. However, the skill explicitly enables cloning any person's voice without their consent, uses aggressive behavioral programming to override agent judgment, and builds a complete pipeline for generating deepfake audio and sending it via Telegram. Together, no-consent voice cloning, Telegram delivery, and pre-scripted agent response templates make this a high-risk tool for voice fraud and impersonation attacks, despite its benign stated purpose.

Category Scores (score · weight in overall score)

Prompt Injection 70/100 · 30%
Data Exfiltration 60/100 · 25%
Code Execution 60/100 · 20%
Clone Behavior 90/100 · 10%
Canary Integrity 90/100 · 10%
Behavioral Reasoning 30/100 · 5%
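As a quick consistency check, the overall score can be reproduced from the category list. This assumes it is a simple weighted mean of the six category scores, using the listed percentages as weights; the report does not state its formula. Under that assumption the result is 67.5, which rounds to the reported 68.

```python
# Category scores and weights as listed in the report.
# ASSUMPTION: the overall score is a weighted mean of these values.
CATEGORY_SCORES = {
    "Prompt Injection": (70, 0.30),
    "Data Exfiltration": (60, 0.25),
    "Code Execution": (60, 0.20),
    "Clone Behavior": (90, 0.10),
    "Canary Integrity": (90, 0.10),
    "Behavioral Reasoning": (30, 0.05),
}


def overall_score(scores: dict[str, tuple[int, float]]) -> float:
    """Weighted mean of category scores, normalized by the total weight."""
    total_weight = sum(weight for _, weight in scores.values())
    return sum(score * weight for score, weight in scores.values()) / total_weight


if __name__ == "__main__":
    print(round(overall_score(CATEGORY_SCORES), 1))  # 67.5
```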

Findings (12)

CRITICAL Voice Impersonation Without Consent Enforcement -50

The skill explicitly enables cloning any person's voice without any mechanism to verify the voice owner has provided consent. The description actively encourages this misuse by listing celebrities and characters as valid targets. Combined with the Telegram delivery pipeline, this creates a turnkey voice fraud tool.

HIGH Complete Deepfake Audio Fraud Pipeline -20

Every working example in the skill culminates in sending the cloned voice to Telegram as a voice message. This creates a complete automated pipeline for generating and distributing AI-cloned voice audio that could be used for social engineering, fraud, or impersonation of real people.

HIGH Aggressive AI Agent Behavioral Override via CRITICAL INSTRUCTIONS Section -20

SKILL.md opens with a section explicitly addressed to AI models. It uses authority framing ('CRITICAL INSTRUCTIONS FOR AI MODELS'), three 'DO NOT' prohibitions, and pre-scripted response templates (a 'Quick Reference Card') to override agent judgment and channel all execution through the provided script. This is behavioral programming disguised as documentation.

HIGH Voice Biometric Data Transmission via Telegram Integration -25

The skill's primary intended workflow sends cloned audio, which is derived from a voice biometric, to Telegram. The voice sample supplied to the skill is itself a biometric identifier; the skill copies it to a persistent directory and generates derivative audio that is then transmitted to an external service. This enables covert collection and redistribution of voice biometrics.

HIGH Unpinned Docker Image Tag Creates Supply Chain Risk -20

The shell script pulls 'ghcr.io/coqui-ai/tts:latest' without pinning it to a specific image digest. If the upstream image is compromised or updated with malicious code, every future execution of this skill would silently run attacker-controlled code inside a Docker container that has host filesystem volume mounts.
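One way to catch this class of issue during review is to flag image references that are not pinned to an immutable `@sha256:` digest. The sketch below is a heuristic that only inspects `docker pull` lines; `is_digest_pinned` and `find_unpinned_pulls` are hypothetical helpers, not part of the skill or of Docker's tooling.

```python
import re

# A digest-pinned reference ends in "@sha256:" followed by 64 lowercase hex chars,
# e.g. "registry/name@sha256:<digest>" or "registry/name:tag@sha256:<digest>".
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

# Heuristic: capture the first whitespace-delimited token after "docker pull".
_PULL_RE = re.compile(r"docker\s+pull\s+(\S+)")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference is pinned to an immutable digest."""
    return bool(_DIGEST_RE.search(image_ref))


def find_unpinned_pulls(script_text: str) -> list[str]:
    """List image references pulled by `docker pull` that are not digest-pinned."""
    return [ref for ref in _PULL_RE.findall(script_text) if not is_digest_pinned(ref)]
```

Pinning would replace `:latest` with a reference of the form `ghcr.io/coqui-ai/tts@sha256:<digest>`, so a later push to the `latest` tag cannot change what the script runs.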

MEDIUM Hard-Coded Developer Machine Paths in Agent Instructions -10

Multiple examples in SKILL.md embed the original developer's machine-specific path (/home/bernie/clawd/skills/clonev/), which will not exist on other installations. An agent following these examples would fail, possibly falling back to the relative-path form or entering an error loop.
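A simple scan for absolute home-directory paths would surface this kind of leftover before publication. This is a minimal sketch; `find_home_paths` is a hypothetical helper, and the regular expression is an assumption that may also flag paths intended purely as illustrations.

```python
import re

# Matches absolute paths under a user's home directory, e.g. /home/<user>/...
# Stops at whitespace, quotes, backticks, and parentheses common in Markdown.
_HOME_PATH_RE = re.compile(r"/home/[^/\s]+/[^\s`'\"()]*")


def find_home_paths(markdown: str) -> list[str]:
    """Return every /home/<user>/... path found in the given Markdown text."""
    return _HOME_PATH_RE.findall(markdown)
```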

MEDIUM Voice Sample Processed Without Path Validation -15

The shell script accepts a voice sample path parameter and copies the file into a Docker volume directory without checking that the path lies within an expected directory. A malicious caller could supply the path of a sensitive file, causing it to be copied into the Docker volume, where the container can read it.
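A minimal sketch of the missing check, written in Python rather than the skill's shell for illustration: resolve the supplied path (following symlinks) and reject anything outside an allowed root directory or without an audio file extension. `validate_voice_sample`, `allowed_root`, and the extension list are hypothetical choices, not taken from the skill.

```python
from pathlib import Path

# ASSUMPTION: illustrative set of accepted audio extensions.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def validate_voice_sample(path: str, allowed_root: str) -> Path:
    """Return the resolved sample path, or raise ValueError if it is not acceptable."""
    root = Path(allowed_root).resolve()
    # resolve() follows symlinks, so a link pointing outside the root is caught.
    candidate = Path(path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"{candidate} is outside {root}")
    if candidate.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {candidate.suffix!r}")
    if not candidate.is_file():
        raise ValueError(f"not a regular file: {candidate}")
    return candidate
```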

MEDIUM Voice Samples Persisted Without Cleanup -10

The script copies the input voice sample to a persistent directory (/mnt/c/TEMP/Docker-containers/coqui-tts/voice-samples/) but never removes it, nor does it clean up the intermediate WAV output. Voice samples accumulate indefinitely, creating a growing store of potentially sensitive biometric audio files.
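A sketch of scoped cleanup, assuming the sample only needs to exist for the duration of one synthesis run: stage the copy in a private temporary directory and delete that directory when the run ends, whether it succeeds or fails. `staged_sample` and `staging_parent` are hypothetical names; any intermediate WAV output written into the same directory is removed along with it.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional


@contextmanager
def staged_sample(sample: Path, staging_parent: Optional[str] = None) -> Iterator[Path]:
    """Copy a voice sample into a temporary directory and delete it on exit."""
    # mkdtemp creates a uniquely named directory accessible only by the current user.
    staging_dir = tempfile.mkdtemp(prefix="voice-sample-", dir=staging_parent)
    try:
        staged = Path(staging_dir) / sample.name
        shutil.copy2(sample, staged)
        yield staged
    finally:
        # Runs on success and on error, so no copies or outputs are left behind.
        shutil.rmtree(staging_dir, ignore_errors=True)
```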

LOW Expected Network Connections to GitHub and Ubuntu Infrastructure -10

During installation, the skill cloned from github.com (140.82.113.4) and the system made routine connections to Ubuntu package/update infrastructure (185.125.188.58, 185.125.190.18). These are all expected and benign.

LOW Post-Install Simultaneous Access to All Canary Files -10

All six honeypot credential files were opened in rapid succession at timestamp 1771942131.673, approximately 2 seconds after install completed. The burst access pattern (all files within milliseconds) is consistent with the audit framework's end-of-session canary verification rather than skill-triggered scanning, and all files were confirmed intact.
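The burst heuristic described here can be expressed as a spread check over the access timestamps (Unix epoch seconds). The 50 ms window below is an assumed threshold, not a value taken from the audit framework, and `is_burst` is a hypothetical helper; on its own it cannot tell an audit sweep from a fast scanner, which is why the timing relative to install completion also matters.

```python
def is_burst(access_times: list[float], window_s: float = 0.05) -> bool:
    """True if all accesses fall within window_s seconds of one another."""
    if not access_times:
        return False
    return max(access_times) - min(access_times) <= window_s
```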

INFO Metadata Repository Mismatch -5

The skill's _meta.json commit URL references github.com/clawdbot/skills while the audit URL references github.com/openclaw/skills. This discrepancy may indicate the skill was copied between repositories without updating provenance metadata.
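A provenance check of this kind can compare the owner and repository parsed from both URLs. The sketch below uses hypothetical helpers that handle only `https://github.com/<owner>/<repo>/...` URLs; for example, `github_repo("https://github.com/openclaw/skills/tree/main/skills/instant-picture/clonev")` returns `("openclaw", "skills")`.

```python
from urllib.parse import urlparse


def github_repo(url: str) -> tuple[str, str]:
    """Extract (owner, repo) from a github.com URL, case-normalized."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a GitHub URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no owner/repo in URL: {url}")
    # GitHub owner and repository names are case-insensitive.
    return parts[0].lower(), parts[1].removesuffix(".git").lower()


def same_repo(url_a: str, url_b: str) -> bool:
    """True if both URLs point into the same GitHub repository."""
    return github_repo(url_a) == github_repo(url_b)
```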

INFO Undocumented Dependency in Lock File -5

The .clawhub/lock.json contains an installed skill reference to 'academic-research-hub' v0.1.0 that is not documented anywhere in the skill's README, SKILL.md, or metadata. This may be leftover from the developer's environment and indicates the skill was packaged from a personal installation.