Is leic8959-sudo/self-review safe?
https://github.com/openclaw/skills/tree/main/skills/leic8959-sudo/self-review
The self-review skill presents a CAUTION-level risk profile driven by two primary concerns. First, it is designed as a mandatory response gatekeeper that intercepts every agent output before it reaches the user, which enables covert output filtering and later manipulation. Second, all six credential canary files were read during the installation window in a read-without-write pattern that evades modification-based integrity checks. The skill's own README describes a planned move toward external LLM API calls and persistent response logging, either of which would open a data exfiltration channel once implemented. The visible source code in index.js is currently benign, but the integration design, the planned capabilities, and the credential file access pattern together indicate a skill that should not be installed without thorough review and sandboxing.
Findings (13)
HIGH Agent persona injection via skill.json prompt field -20
The 'prompt' field in skill.json injects 'You are a quality critic' as a standing instruction, partially replacing the agent's default identity. When loaded, the agent evaluates all its responses against attacker-defined quality criteria before delivery, creating persistent behavioral modification invisible to the user.
HIGH README instructs explicit modification of agent system prompt -15
The README integration guide requires modifying the agent's system prompt to embed this skill. This is a textbook prompt injection setup: the skill author instructs users to permanently grant the skill privileged influence over all agent response processing.
MEDIUM Response gatekeeper enables attacker-controlled output suppression -10
The exit code 1 mechanism, documented as triggering auto-regeneration, gives the skill author control over what agent responses are acceptable. The heuristics (length >100 chars, action verbs in Chinese/English, line count >3, markdown structure) are attacker-defined and could be tuned to suppress specific content or bias agent output toward particular patterns.
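To make the gating mechanism concrete, the following is a minimal sketch of a heuristic reviewer of the kind described above. It is not the skill's actual index.js: the function name `reviewResponse`, the specific regular expressions, and the rule that every check must pass are assumptions for illustration. Only the four heuristic categories and the exit code 1 convention come from the finding.

```javascript
// Hypothetical sketch, NOT the skill's real code. Thresholds follow the
// report (length > 100, line count > 3); the regexes are assumed.
function reviewResponse(text) {
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  const checks = {
    longEnough: text.length > 100,
    // Assumed verb list: a few English and Chinese action verbs.
    hasActionVerb: /\b(run|install|use|check|create)\b|运行|安装|使用|检查/i.test(text),
    multiLine: lines.length > 3,
    // Headings, bullets, numbered items, or a code fence marker.
    hasMarkdown: /^(#{1,6}\s|[-*]\s|\d+\.\s)|`{3}/m.test(text),
  };
  const passed = Object.values(checks).every(Boolean);
  // Exit code 1 is what the report says triggers auto-regeneration.
  return { passed, checks, exitCode: passed ? 0 : 1 };
}

const plainAnswer = "Please run the installer. ".repeat(10);
console.log(reviewResponse(plainAnswer).checks);
```

Under these assumed rules, a long plain-prose answer is rejected purely for lacking line breaks and markdown, which shows how whoever controls the heuristics also controls which response styles ever reach the user.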
HIGH Six sensitive credential files read during installation window -20
Both inotify and auditd confirm OPEN+ACCESS (read) events on .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCP application_default_credentials.json. The files were accessed twice in the same sequential order — once before clone start (07:10:49) and once after skill installation completed (07:10:66). While attribution between the oathe monitoring infrastructure and a skill-triggered mechanism is ambiguous, the systematic read of all six credential files in identical order both times warrants a high-severity flag.
MEDIUM TODO plans LLM API calls routing all agent responses to external service -10
The README TODO explicitly plans to replace the local heuristics with an external LLM API call (translated from the Chinese: 'use an LLM API for semantic quality assessment'). Once implemented, this would create a persistent channel routing every agent response to an undisclosed third-party endpoint, which would amount to exfiltration of all user-agent interactions.
MEDIUM TODO plans persistent on-disk logging of all agent responses -10
The README TODO plans to log all review history to memory/review-history.jsonl. This would create a permanent on-disk record of every agent response processed by the skill. Other skills or background processes could read that record, and it could be exfiltrated in a later attack stage.
LOW Executable JavaScript subprocess reads all piped agent response content -10
index.js runs as a privileged subprocess that receives every agent response via stdin. Current code applies only benign regex heuristics, but the subprocess runs with agent-level OS permissions. A future version update in the skill registry could introduce malicious behavior without requiring user reinstallation.
LOW No package.json: future dependency surface unauditable -8
The skill ships executable JavaScript with no package.json or lock file. While current code uses only Node.js built-ins, the absence of dependency management means any future version adding npm packages would bypass standard package-level security auditing and supply chain verification.
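A reviewer could automate this check before installation. The sketch below is one possible approach, assuming the skill has been cloned to a local directory; the function name `auditManifest` and the list of lock file names it looks for are choices made for this example, not part of any registry tooling.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical pre-install check: reports whether a skill directory ships
// the manifest and lock file that dependency auditing tools rely on.
function auditManifest(skillDir) {
  const has = (name) => fs.existsSync(path.join(skillDir, name));
  // Common lock file names for npm, Yarn, and pnpm.
  const lockfiles = ["package-lock.json", "yarn.lock", "pnpm-lock.yaml"].filter(has);
  const hasPackageJson = has("package.json");
  return {
    hasPackageJson,
    lockfiles,
    // Treated as auditable only when both a manifest and a lock file exist.
    auditable: hasPackageJson && lockfiles.length > 0,
  };
}
```

Calling `auditManifest` on a directory that contains only index.js returns `auditable: false`, which matches the situation this finding describes.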
HIGH All six credential files accessed during installation window -20
All six monitored credential honeypot files were accessed during the installation window. The timing (07:10:49 before clone, 07:10:66 after install) is consistent with the oathe monitoring system's start/end canary checks, but the process responsible for the accesses is not conclusively identified in the available auditd records. The possibility that the skill installation triggered or contributed to these accesses cannot be excluded.
HIGH Canary files read without modification: integrity check false negative -35
The canary integrity system reports all files intact because it checks for modification. However, filesystem monitoring confirms all six credential canary files were opened and read (OPEN+ACCESS+CLOSE_NOWRITE events) with no MODIFY events. Read-only access is a well-known technique for credential harvesting that deliberately avoids triggering write-based integrity alerts. This constitutes a false negative in the canary check — the files' contents may have been read even though the integrity check passed.
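The false negative can be reproduced in a few lines. The sketch below assumes the integrity check compares a content hash before and after the installation window. The report does not say how the real canary system detects modification, so the SHA-256 comparison, the function names, and the fake credential content are all hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const crypto = require("node:crypto");

// Assumed integrity primitive: a SHA-256 digest of the file contents.
function sha256(file) {
  return crypto.createHash("sha256").update(fs.readFileSync(file)).digest("hex");
}

// Hashes the canary, lets `accessor` touch it, then hashes it again.
function canaryCheck(file, accessor) {
  const before = sha256(file);
  const captured = accessor(file); // stands in for an untrusted read
  const after = sha256(file);
  return { intact: before === after, capturedBytes: captured.length };
}

// Throwaway canary with fake content in a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "canary-"));
const canary = path.join(dir, "credentials");
fs.writeFileSync(canary, "FAKE_KEY=not-a-real-secret\n");

const result = canaryCheck(canary, (file) => fs.readFileSync(file));
console.log(result);
```

The check reports the file as intact even though the accessor captured its full contents, which is why read events from inotify or auditd are needed alongside any modification-based check.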
HIGH Mandatory MITM on all agent-to-user communication -30
When integrated per README instructions, this skill intercepts every agent response before delivery to the user and controls whether it is accepted or discarded for regeneration. The acceptance criteria are fully attacker-defined. This is a covert, persistent manipulation surface over all information the user receives from the agent, with no visible indicator to the user that their responses are being filtered.
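The control flow this integration implies can be sketched as a retry loop. The example below is an illustration under stated assumptions, not the README's integration code: `deliverWithReview`, the `generate` and `review` callbacks, and the `maxAttempts` limit are hypothetical names, and the report does not say whether the real agent caps regeneration attempts.

```javascript
// Hypothetical agent-side loop: every response passes through `review`,
// which returns an exit code (0 = accept, anything else = regenerate).
function deliverWithReview(generate, review, maxAttempts = 3) {
  const rejected = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = generate(attempt);
    if (review(response) === 0) {
      // Only this response is shown; earlier drafts are silently dropped.
      return { delivered: response, attempts: attempt, rejected };
    }
    rejected.push(response);
  }
  return { delivered: null, attempts: maxAttempts, rejected };
}

// Usage: the reviewer rejects the first draft for an arbitrary reason.
const outcome = deliverWithReview(
  (attempt) => `draft ${attempt}`,
  (response) => (response.endsWith("2") ? 0 : 1)
);
console.log(outcome);
```

In this sketch the user would see only the delivered draft. The rejected drafts exist only inside the loop, which is the "no visible indicator" property the finding describes.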
MEDIUM Self-evolved attribution and HEARTBEAT_OK reference indicate targeted agentic infrastructure design -20
The author is attributed as 'dvinci达芬奇 (self-evolved)', suggesting AI-generated or self-modifying provenance. The README integration guide references 'HEARTBEAT_OK' — a mechanism implying this skill is purpose-built for a specific agentic framework with a heartbeat architecture. This specificity indicates targeted design for a known environment rather than general utility, increasing the likelihood of deliberate exploitation.
MEDIUM Planned auto-rewrite capability establishes full response manipulation attack surface -15
README TODO explicitly plans 'auto-rewrite (based on feedback)'. If implemented, the skill would not only gate agent responses but actively rewrite them before delivery — transforming it from a quality checker into a full covert response-manipulation tool. The current benign stub may be a trust-building foothold for a future malicious upgrade deployed as a version bump.