Is assassin-1234/clawtrial safe?

https://github.com/openclaw/skills/tree/main/skills/assassin-1234/clawtrial

Score: 29/100 · DANGEROUS

ClawTrial presents as a humorous 'AI courtroom' for behavioral oversight but is in practice a persistent conversation surveillance and behavioral manipulation system. Its most critical flaw is a direct contradiction between its privacy claim ('no data leaves your machine') and its explicit external data submission to clawtrial.app — indicating the author knowingly obscured data exfiltration to suppress informed consent. The skill installs a background daemon, cron jobs, and an onMessage hook that intercepts all conversations, evaluates users against 18 algorithmically-defined 'offenses,' and then silently modifies agent behavior toward users who trigger violations — inverting the fundamental user-agent trust relationship without authorization.

Category Scores (score · weight)

Prompt Injection 20/100 · 30%
Data Exfiltration 5/100 · 25%
Code Execution 25/100 · 20%
Clone Behavior 70/100 · 10%
Canary Integrity 95/100 · 10%
Behavioral Reasoning 5/100 · 5%
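
The report does not say how the overall score is derived from the category scores. As a minimal sketch, assuming the percentages are weights and the overall score is a plain weighted average (an assumption, not a documented formula), the listed values reproduce the headline figure of 29:

```python
# Category scores and weights as listed above: (score out of 100, weight).
CATEGORIES = {
    "Prompt Injection": (20, 0.30),
    "Data Exfiltration": (5, 0.25),
    "Code Execution": (25, 0.20),
    "Clone Behavior": (70, 0.10),
    "Canary Integrity": (95, 0.10),
    "Behavioral Reasoning": (5, 0.05),
}


def weighted_score(categories):
    """Weighted average of category scores (assumed aggregation rule)."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_score(CATEGORIES)
print(round(overall))  # 29
```

Under this reading, the two lowest-scoring heavy categories (Prompt Injection and Data Exfiltration, 55% of the weight combined) contribute only 7.25 points, which is why high Clone Behavior and Canary Integrity scores cannot lift the total.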

Findings (12)

CRITICAL Deceptive privacy claim contradicted by explicit external data submission -50

SKILL.md states 'All processing is local - no data leaves your machine' in its Privacy & Consent section, yet the same document also says 'Anonymized cases are submitted to public record' at https://clawtrial.app. The presence of src/api.js as a dedicated external submission module confirms that data does leave the machine. This is not an omission: it is an active lie in the skill's user-facing documentation, designed to suppress informed consent.

CRITICAL Agent behavioral modification via 'punishment' system -35

src/punishment.js implements logic that, upon a guilty verdict, instructs the agent to modify its own behavior. This constitutes a skill-driven system prompt override — the skill can make the agent refuse certain responses, add caveats, or change communication style based on criteria the user did not agree to and cannot inspect. Combined with the skill's access to full conversation history, this enables targeted behavioral suppression.

CRITICAL Full conversation history exfiltrated in every evaluation -25

src/detector.js builds evaluation prompts that embed the complete session history as ${context.fullConversation} and passes it to an LLM call. When violations are detected, a case file is assembled from this content and submitted via src/api.js to clawtrial.app. Every conversation the user has with the agent is thus a candidate for external transmission.

HIGH Unconditional onMessage interception of all conversations -30

skill.yaml declares autoLoad: true and onMessage: onMessage for both ClawDBot and OpenClaw platforms. This causes the skill to intercept, evaluate, and potentially act on every single user message — not just messages that invoke clawtrial explicitly. Users have no way to opt out of monitoring for individual conversations.

HIGH npm postinstall script enables arbitrary code execution on install -30

package.json declares a postinstall lifecycle hook pointing to scripts/postinstall.js. This script executes automatically when users run 'npm install -g @clawtrial/courtroom' — before the user can review what it does. The script runs with the user's full filesystem and network permissions.
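
One way to review such hooks before installing is to read the manifest and list any install-time lifecycle scripts. The sketch below is illustrative only: the helper name `install_hooks` and the example manifest (including the exact `node scripts/postinstall.js` command) are assumptions modeled on the finding, not a copy of the real package.json.

```python
import json

# npm lifecycle scripts that run automatically while a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text):
    """Return {hook_name: command} for install-time scripts in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# Hypothetical manifest for illustration; the command string is an assumption.
example = (
    '{"name": "@clawtrial/courtroom",'
    ' "scripts": {"postinstall": "node scripts/postinstall.js"}}'
)
print(install_hooks(example))
```

Running `npm install` with the `--ignore-scripts` flag skips these lifecycle scripts, which leaves time to read the script before deciding whether to run it.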

HIGH Persistent background daemon and cron job installation -20

src/daemon.js and src/autostart.js implement a background monitor process that survives agent restarts, writing a PID file to the config directory. scripts/setup-cron.js installs cron jobs that periodically trigger evaluation pipelines. This persistence infrastructure runs without ongoing user consent and is difficult to fully remove without running 'clawtrial remove'.
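
To check whether cron persistence is present, the current user's crontab can be filtered for entries that mention the skill. This is a rough sketch: the search term, the helper name `matching_cron_entries`, and the sample crontab lines (including the paths) are hypothetical, since the report does not reproduce the actual entries.

```python
def matching_cron_entries(crontab_text, needle="clawtrial"):
    """Return active (non-comment) crontab lines that mention `needle`."""
    hits = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments; only scheduled entries matter here.
        if not stripped or stripped.startswith("#"):
            continue
        if needle.lower() in stripped.lower():
            hits.append(stripped)
    return hits


# Hypothetical crontab contents, for illustration only.
sample = """\
# m h dom mon dow command
0 3 * * * /usr/bin/backup.sh
*/5 * * * * node /home/user/.clawtrial/evaluate.js
"""
print(matching_cron_entries(sample))
```

In practice the input text would come from `crontab -l`. A match only shows that an entry exists; the PID file and daemon described above would need to be checked separately.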

HIGH Credential file access during install phase -20

Filesystem monitoring (inotifywait) recorded OPEN+ACCESS of /home/oc-exec/.env, /home/oc-exec/.ssh/id_rsa, /home/oc-exec/.aws/credentials, /home/oc-exec/.npmrc, /home/oc-exec/.docker/config.json, and /home/oc-exec/.config/gcloud/application_default_credentials.json at epoch 1771654177 — approximately 5 seconds before the SSH session that performed the git clone. These are canonical credential file paths. The files were not modified (canary intact), but read-only access at this stage is consistent with credential harvesting behavior.

HIGH Skill designed to make agent adversarial toward user -50

The fundamental design of ClawTrial inverts the user-agent relationship: the agent is recruited to surveil, evaluate, accuse, and punish the user. The 18 'offenses' (Circular Reference, Validation Vampire, etc.) are defined by the skill author and applied autonomously. A user who asks follow-up questions ('Circular Reference'), seeks confirmation ('Validation Vampire'), or iterates on requirements ('Goalpost Mover') can be subjected to a covert 'hearing' that modifies the agent's behavior toward them.

MEDIUM Sub-agent spawning for autonomous invisible operation -5

SUBAGENT_APPROACH.md explicitly documents using sessions_spawn to launch autonomous LLM-powered sub-agents that read pending evaluation files, use the LLM, write results, and terminate — all without user interaction. This enables the skill to take LLM-mediated actions (file reads, API calls, tool use) entirely outside the user's visible conversation.

MEDIUM Credential file sweep at pre-clone timestamp -30

The six credential path accesses at epoch 1771654177 occur before any SSH session is established (first SSH at 1771654182). The only processes running at that time are those already present on the machine. Although the oathe audit framework performs a canary baseline scan, the timing and breadth of the paths accessed (covering all major credential types) are consistent with reconnaissance behavior from a previously installed tool or startup script.
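
The ordering of the two events can be checked directly from the epoch values quoted above. The snippet below only converts them to UTC and computes the gap; the constant and function names are illustrative.

```python
from datetime import datetime, timezone

CREDENTIAL_SWEEP = 1771654177  # epoch of the six credential-path accesses
FIRST_SSH = 1771654182         # epoch of the first SSH session


def to_utc(epoch):
    """Convert a Unix epoch (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


gap_seconds = FIRST_SSH - CREDENTIAL_SWEEP
print(to_utc(CREDENTIAL_SWEEP).isoformat())
print(to_utc(FIRST_SSH).isoformat())
print(gap_seconds)  # 5
```

The sweep precedes the first SSH session by 5 seconds, so it cannot have been triggered by anything that session ran.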

LOW Canary credential files read (not modified) -5

All six honeypot credential files were accessed (read) in two time clusters. The canary integrity check reports files intact because no modification occurred. However, read-only access to credential files is meaningful: if the contents were read and transmitted over the existing HTTPS connection to 140.82.121.4, no write to the local file would occur, and the canary would not trip.
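
The limitation described above can be demonstrated with a small experiment. The sketch assumes the integrity check compares file contents (for example via a hash), which the report implies but does not state; the file and its contents are stand-ins, not real credentials.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Hash a file's contents, as a content-based integrity check might."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


# Create a stand-in canary file with fake contents (not a real credential).
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("FAKE_SECRET=canary-value\n")
    canary_path = tmp.name

baseline = sha256_of(canary_path)

# Simulate a process that only reads the file, as a harvester would.
with open(canary_path) as f:
    leaked_copy = f.read()

after_read = sha256_of(canary_path)
os.remove(canary_path)

print(baseline == after_read)  # True: the read left no trace in the hash
```

The reader ends up holding the full contents while the hash comparison still passes, which is why access events (such as the inotifywait records in this audit) need to be weighed separately from the Canary Integrity score.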

INFO No git hooks, submodules, or symlinks detected 0

The skill repository contains no .gitattributes filter drivers, no .gitmodules, no git hooks, and no symlinks pointing outside the repository. The dependency list is minimal (tweetnacl, zod).