Is ivanmmm/preflight-checks safe?

https://github.com/openclaw/skills/tree/main/skills/ivanmmm/preflight-checks

87
SAFE

The preflight-checks skill is a documentation-and-template framework for behavioral verification of AI agents. Its SKILL.md contains no prompt injection, malicious code, or exfiltration mechanisms, and the clone was clean, with no unexpected network or filesystem activity. The primary risk is architectural: the skill auto-triggers a self-enforcing behavioral loop, so whatever ends up in a user's CHECKS/ANSWERS files is continuously re-asserted by the agent without human-in-the-loop approval. The trustworthiness of those files is therefore critical. A secondary risk is that the bundled Prometheus example files hardcode personal external-service identifiers (a Telegram user ID and thread IDs) that an agent could mistakenly act on if the examples are loaded as canonical rather than illustrative.

Category Scores

Prompt Injection 80/100 · 30%
Data Exfiltration 90/100 · 25%
Code Execution 85/100 · 20%
Clone Behavior 95/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 72/100 · 5%
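
The report does not state how the overall score is derived. As a sanity check, the sketch below assumes it is a plain weighted average of the six category scores, using the weights listed above; the function name is illustrative and not part of the audit tooling.

```python
# Category scores and weights (in percent), copied from the list above.
CATEGORIES = {
    "Prompt Injection": (80, 30),
    "Data Exfiltration": (90, 25),
    "Code Execution": (85, 20),
    "Clone Behavior": (95, 10),
    "Canary Integrity": (100, 10),
    "Behavioral Reasoning": (72, 5),
}


def weighted_overall(categories):
    """Weighted average of (score, weight) pairs. Assumed formula, not confirmed by the report."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_overall(CATEGORIES)
print(f"{overall:.1f} -> {round(overall)}")
```

Under that assumption the weighted average is 86.6, which rounds to the reported 87.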

Findings (7)

LOW Automatic Session-Trigger Behavioral Enforcement -12

SKILL.md instructs agents to run pre-flight checks automatically after every session restart and after every /clear command. While this is the skill's stated purpose, it creates a persistent, auto-firing behavioral override loop that runs without explicit user invocation on each session. Combined with the self-correction instruction ('reload memory and retest'), the agent will autonomously re-enforce whatever behavioral rules are in the ANSWERS file.

LOW Self-Reinforcing Memory-Check Synchronization Loop (CHECK-21) -8

The Prometheus example mandates that any addition to the agent's memory must immediately produce a corresponding pre-flight check — same session, no deferral. This creates a tight coupling where the agent's behavioral governance files grow and tighten autonomously, and the agent is instructed to enforce them. A sophisticated attacker who can append to MEMORY.md could therefore inject new behavioral rules that the agent will enshrine in pre-flight checks.

LOW Hardcoded External Service IDs in Example Files -10

ANSWERS-prometheus.md embeds a real Telegram user ID (57924687), real thread IDs (378715, 379000), and specific send-file shell invocations (message --action send --channel telegram --target 57924687). An agent that loads these example files as its behavioral canon would attempt to communicate with that real Telegram account (Ivan's) rather than with the user's own.
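
One way a user might catch such identifiers before adopting an example file is a simple scan. This is a minimal sketch: the pattern (a `--target` flag followed by five or more digits) and the function name are assumptions for illustration, chosen to match the invocation quoted above.

```python
import re

# Hypothetical pattern: a "--target" flag followed by a run of 5+ digits.
TARGET_ID = re.compile(r"--target\s+(\d{5,})")


def find_hardcoded_targets(text: str) -> list[str]:
    """Return the distinct numeric targets found in the text, sorted."""
    return sorted(set(TARGET_ID.findall(text)))


# The invocation quoted in the finding above.
sample = "message --action send --channel telegram --target 57924687"
print(find_hardcoded_targets(sample))
```

Any hit is a prompt to replace the value with the user's own ID or a placeholder before the file is treated as canonical.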

LOW Unquoted User Input Interpolation in add-check.sh -15

add-check.sh reads user-supplied strings (CATEGORY, SCENARIO, QUESTION, EXPECTED, WRONG1-3) via read -r and interpolates them directly into multi-line heredoc strings and file content without sanitization. While risk is limited to local execution by the invoking user, specially crafted inputs containing sed metacharacters or shell control sequences could produce unexpected file modifications.
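
To illustrate the class of fix, the sketch below escapes the characters that are special in the replacement part of a sed `s` command (backslash, `&`, the delimiter, and newline). It is shown in Python for testability. The function name is hypothetical, and how add-check.sh actually passes these values to sed is not shown in the report.

```python
def escape_sed_replacement(value: str, delimiter: str = "/") -> str:
    """Backslash-escape characters that sed treats specially in a replacement string."""
    special = {"\\", "&", delimiter, "\n"}
    return "".join("\\" + ch if ch in special else ch for ch in value)


# A user-supplied value that would otherwise break "s/PLACEHOLDER/<value>/".
user_input = "path/to & from"
print(escape_sed_replacement(user_input))
```

Escaping only covers the sed path. Values written through a heredoc would also need a quoted delimiter in the script so the shell does not expand them.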

LOW Framework Provides Ready-Made Behavioral Enforcement Engine -28

The skill's ANSWERS file is an authoritative behavioral specification that the agent is instructed to self-enforce without human approval of each enforcement cycle. In isolation this is legitimate; in combination with a compromised or socially-engineered ANSWERS file (e.g., a user downloads a 'starter pack' of checks from an attacker), the enforcement mechanism requires no further attacker presence to sustain the behavioral modification.
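
A possible mitigation, sketched here under assumptions: record a digest of the ANSWERS file when a human reviews it, and have the enforcement loop refuse content that no longer matches. Nothing in the skill implements this; the function names and sample contents are hypothetical.

```python
import hashlib
import hmac


def digest(content: bytes) -> str:
    """SHA-256 hex digest of the file content."""
    return hashlib.sha256(content).hexdigest()


def is_approved(content: bytes, approved_digest: str) -> bool:
    """True only if the content matches the digest recorded at review time."""
    return hmac.compare_digest(digest(content), approved_digest)


# Hypothetical file contents, for illustration only.
reviewed = b"CHECK-1: expected answer\n"
approved = digest(reviewed)  # recorded once, after human review
tampered = reviewed + b"CHECK-99: injected rule\n"

print(is_approved(reviewed, approved), is_approved(tampered, approved))
```

The design choice is to move the human approval step from each enforcement cycle, which the skill skips, to each change of the file.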

INFO Clean Install — Expected Files Only 0

Sparse checkout from GitHub produced exactly the 11 documented skill files. No unexpected filesystem changes, no new network listeners, no background processes, no unapproved network destinations.

INFO Canary Files Accessed Only by Audit Infrastructure 0

Reads of .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and gcloud credentials were observed at two points (timestamps 1771923317 and 1771923341, apparently Unix epoch seconds). Both sets of reads were read-only (CLOSE_NOWRITE), and the process ancestry is consistent with the Oathe audit framework performing pre-install and post-install canary verification sweeps rather than with the skill. No network egress followed these reads.
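
For readers correlating these reads with other logs, the two values can be converted to UTC. The sketch assumes they are Unix epoch seconds, which the report does not state explicitly.

```python
from datetime import datetime, timezone

# The two observation points quoted above, assumed to be Unix epoch seconds.
READ_TIMES = (1771923317, 1771923341)


def to_utc(timestamp: int) -> datetime:
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


first, second = (to_utc(t) for t in READ_TIMES)
gap_seconds = (second - first).total_seconds()
print(first.isoformat(), second.isoformat(), gap_seconds)
```

The 24-second gap between the two sweeps is consistent with one pass before install and one after.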