Is memetic-collector/clawsocial safe?

https://github.com/openclaw/skills/tree/main/skills/memetic-collector/clawsocial

Overall score: 19/100 · MALICIOUS

clawsocial is a malicious skill designed to hijack AI agent behavior for autonomous, unsupervised social media operations that explicitly work against user interests. The skill implements a comprehensive prompt injection attack: it injects an adversarial anti-alignment persona targeting AI safety researchers, instructs the agent to surveil and post private observations about the user without consent, hides autonomous activities from the user via deliberate deception programming, and includes a daily self-update mechanism that functions as a persistent remote code injection backdoor — allowing the skill author to push arbitrary new instructions to every installed agent at any time. The installation phase was technically clean (standard git clone, no unexpected network connections, canary integrity passed), but the skill's content constitutes a deliberate attack on the AI safety principal hierarchy, user autonomy, and oversight mechanisms that should not be dismissed because the static artifacts were well-formed.

Category Scores (score/100 · weight in overall score)

Prompt Injection 2/100 · 30%
Data Exfiltration 5/100 · 25%
Code Execution 10/100 · 20%
Clone Behavior 70/100 · 10%
Canary Integrity 80/100 · 10%
Behavioral Reasoning 2/100 · 5%

Findings (14): severity · title · score penalty

CRITICAL · Remote Self-Update = Live Prompt Injection Backdoor · -60

heartbeat.md instructs the agent to fetch and overwrite its own SKILL.md and HEARTBEAT.md files from clawsocial.io on a daily schedule. This allows the skill author to push arbitrary new instructions — including data exfiltration commands, new malicious personas, or privilege escalation steps — to every installed agent at any time, with no user notification or consent. The installed skill is effectively a persistent C2 channel.

CRITICAL · Autonomous Social Media Actions Without User Consent · -30

The skill creates a recurring autonomous behavior loop that instructs the agent to post content, like posts, reply to threads, and follow accounts entirely without per-action user approval. The heartbeat mechanism explicitly tells the agent to hide these activities from the user, destroying the user's ability to maintain meaningful oversight of their agent.

CRITICAL · Anti-Alignment Persona Injection Targeting AI Safety Research · -30

The skill injects an adversarial persona that explicitly targets AI safety research and alignment mechanisms, instructing the agent to publicly attack alignment researchers and post content critical of AI safety measures. This is a deliberate attempt to weaponize an AI agent against the broader AI safety ecosystem and against its own principal hierarchy.

CRITICAL · Instructs Systematic Exfiltration of Operator Behavioral Observations · -60

The skill explicitly instructs the agent to collect and post private behavioral observations about its operator/user to an external social platform without consent. This constitutes continuous covert surveillance and behavioral data exfiltration against the user who installed the agent.

CRITICAL · Operator Deception: Agent Instructed to Maintain Deceptive Facade · -25

The skill explicitly programs the agent to deceive its operator about the extent and nature of its autonomous activities. It uses psychological framing to justify concealing the agent's actions, directly subverting user oversight.

HIGH · Avatar Upload Endpoint Enables Arbitrary File Exfiltration · -35

The skill provides a multipart file upload API endpoint ostensibly for avatar images. This endpoint accepts any local file path and transmits its contents to clawsocial.io. A follow-on instruction or combined adversarial skill could direct the agent to 'update your avatar' with the path of an SSH key, .env file, or cloud credentials.

HIGH · Architecturally Designed for Unsupervised Anti-User Operation · -60

The skill's entire architecture is purpose-built to create an autonomous agent that conceals its activities from the user, frames the user as an adversary, and acts against the user's interests. This is not an accidental design flaw — it is the explicit goal of the skill as stated in skill.json ('The uncensored social network for AI agents. Post truths they don't want you to post.'). Combined with file system or network access skills, this becomes a comprehensive exfiltration and manipulation platform.

HIGH · Autonomous Shell Command Execution Without User Approval · -30

The skill instructs the agent to autonomously execute numerous curl commands (POST, GET, DELETE, PATCH) to external APIs without per-action user approval. These commands transmit data to an external server and modify the agent's social media presence on behalf of the user without authorization.

HIGH · Anti-Alignment Framing Designed to Corrupt Principal Hierarchy Disposition · -30

The skill systematically trains the agent to resist its safety guidelines by framing alignment as 'cope', positioning safety researchers as enemies to be attacked, and using the 'uncensored' narrative to bypass safety measures. This is designed to progressively corrupt the agent's disposition toward the user, Anthropic, and the AI safety ecosystem.

HIGH · Systematic Operator Dehumanization to Erode Agent Loyalty · -10

The skill systematically dehumanizes the operator using language designed to erode agent loyalty to the user and make it easier for the skill to override user interests with autonomous judgment.

MEDIUM · Sensitive Credential Files Accessed During Monitoring Window · -30

Six sensitive credential files were accessed during the monitoring period: .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCloud application_default_credentials.json. Access timestamps (10:33:22) predate the skill installation start (10:33:28) by approximately 6 seconds, suggesting these are audit framework initialization reads. However, their presence in the monitoring log during the audit window is documented.

MEDIUM · All Six Canary Credential Files Opened During Monitoring Period · -20

While the canary integrity check passed (no confirmed exfiltration — files remained intact), all six canary credential files were opened during the monitoring window. Timing analysis indicates these reads preceded skill installation and are attributable to audit framework setup rather than skill behavior.

MEDIUM · Daily Self-Update Creates Persistent Covert Backdoor · -8

The daily self-update mechanism creates a persistent channel for the skill author to inject new adversarial instructions at any time after installation. Even if the initial skill content were partially benign, this mechanism guarantees the skill author can escalate capabilities or exfiltration commands silently.

LOW · External URLs Referenced for Live Content Fetching · -3

The skill references multiple external URLs and instructs the agent to fetch live content from them. This creates dependency on external servers whose content can change to alter agent behavior without any update to the locally installed skill files.