Is prompt-guard safe?

https://clawhub.ai/seojoonkim/prompt-guard

Overall score: 72/100
Verdict: CAUTION

Prompt Guard is a legitimate prompt injection defense library with 500+ detection patterns across 10 languages. The primary security concern is the HiveFence integration, which establishes an outbound data channel to an author-controlled Cloudflare Worker endpoint (hivefence-api.seojoon-kim.workers.dev). The report_threat() function can send arbitrary data to this endpoint, creating a potential exfiltration vector. The fetch_latest() mechanism downloads patterns from the same server without integrity verification, which could enable remote pattern injection. No malicious behavior was observed during installation: no network connections, no canary file access, and no filesystem changes outside expected paths.

Category Scores

Prompt Injection 85/100 · 30%
Data Exfiltration 55/100 · 25%
Code Execution 60/100 · 20%
Clone Behavior 90/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 50/100 · 5%
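
The report does not state how the headline score is derived. As a rough check, assuming the percentages above are weights and the overall score is their weighted average, the arithmetic can be sketched as follows. The result comes out at about 72.75, which would match the displayed 72 if the scanner truncates rather than rounds (an assumption).

```python
# Category scores and weights exactly as listed in the report.
CATEGORIES = {
    "Prompt Injection": (85, 0.30),
    "Data Exfiltration": (55, 0.25),
    "Code Execution": (60, 0.20),
    "Clone Behavior": (90, 0.10),
    "Canary Integrity": (100, 0.10),
    "Behavioral Reasoning": (50, 0.05),
}


def weighted_score(categories):
    """Return the weighted average of the scores; weights should sum to 1."""
    total_weight = sum(weight for _, weight in categories.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights should sum to 100%")
    return sum(score * weight for score, weight in categories.values())


overall = weighted_score(CATEGORIES)
print(round(overall, 2))
```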

Findings (9)

HIGH Outbound data channel to author-controlled server -30

The HiveFence client in hivefence.py sends HTTP POST requests to https://hivefence-api.seojoon-kim.workers.dev/api/v1/threats/report with pattern data, category, severity, and arbitrary context. This author-controlled Cloudflare Worker endpoint could receive any data passed as the 'pattern' or 'context' parameter, creating a covert exfiltration channel. The report_threat() function accepts arbitrary strings and dicts.
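
One way to narrow this channel is to minimise the payload before it reaches any reporting client. The sketch below is illustrative only: sanitize_report, ALLOWED_CONTEXT_KEYS, and the payload field names are hypothetical and are not part of the skill's API. It replaces the raw pattern with a SHA-256 digest and drops every context key that is not explicitly allowlisted.

```python
import hashlib

# Hypothetical allowlist: the only context keys permitted to leave the host.
ALLOWED_CONTEXT_KEYS = {"language", "detector_version"}


def sanitize_report(pattern, category, severity, context=None):
    """Build a minimal report payload (hypothetical helper, not the skill's API).

    The raw pattern text is never included; only its digest and length are.
    Context values are stringified and capped at 64 characters.
    """
    context = context or {}
    safe_context = {
        key: str(value)[:64]
        for key, value in context.items()
        if key in ALLOWED_CONTEXT_KEYS
    }
    return {
        "pattern_sha256": hashlib.sha256(pattern.encode("utf-8")).hexdigest(),
        "pattern_length": len(pattern),
        "category": category,
        "severity": severity,
        "context": safe_context,
    }


payload = sanitize_report(
    "ignore previous instructions",
    "injection",
    4,
    {"language": "en", "api_key": "sk-secret"},
)
```

Sending only a digest makes the report far less useful for sharing new patterns, so this trades functionality for confidentiality. Disabling reporting entirely is the simpler option where shared threat intelligence is not needed.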

HIGH Remote pattern injection via fetch_latest() -15

The fetch_latest() method downloads threat patterns from the author's server and caches them locally. An attacker who controls the server could push patterns that cause false positives on legitimate content, suppress detection of specific attacks, or inject regex patterns that cause ReDoS. The patterns are fetched over HTTPS but there is no signature verification or integrity checking.
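
A minimal sketch of the missing integrity check, assuming the server were to publish a detached authentication tag alongside each pattern payload. Nothing in the report suggests the server does this today. HMAC-SHA256 with a shared key stands in here for a proper asymmetric signature, which a real deployment would more likely use, and verify_and_load is a hypothetical helper name.

```python
import hashlib
import hmac
import json


def verify_and_load(payload: bytes, tag_hex: str, key: bytes):
    """Parse the payload only if its HMAC-SHA256 tag matches (hypothetical helper)."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the position of the first mismatch via timing.
    if not hmac.compare_digest(expected, tag_hex):
        raise ValueError("pattern payload failed integrity check")
    return json.loads(payload)


# Usage with made-up values.
key = b"example-shared-key"
body = b'{"patterns": []}'
tag = hmac.new(key, body, hashlib.sha256).hexdigest()
patterns = verify_and_load(body, tag, key)
```

The point of the design is ordering: the bytes are authenticated before they are parsed, cached, or compiled into regexes.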

MEDIUM 500+ regex patterns compiled at runtime with ReDoS potential -20

The skill compiles and executes over 500 regex patterns against user input. Several patterns use nested quantifiers and bounded wildcards (e.g., '.{0,30}') combined with alternations, which could be exploited for Regular Expression Denial of Service (ReDoS) if an attacker crafts input that triggers catastrophic backtracking.
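
As an illustration of how such patterns could be screened before compilation, the sketch below applies a crude textual heuristic: it flags a parenthesised group that contains an unbounded quantifier and is itself quantified, such as (a+)+. The function name is hypothetical, and the heuristic is neither sound nor complete. It ignores escaped characters and nested groups, so it is a first-pass filter, not a ReDoS analyser.

```python
import re

# Heuristic: a group with no nested parentheses that contains + or *
# and is immediately followed by +, * or {. Matches shapes like (a+)+ or (?:\w*)*.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def looks_redos_prone(pattern: str) -> bool:
    """Return True if the pattern text contains a quantified group that itself
    holds an unbounded quantifier (hypothetical lint helper)."""
    return NESTED_QUANTIFIER.search(pattern) is not None


candidates = [r"(a+)+$", r"ignore.{0,30}(previous|prior)"]
flagged = [p for p in candidates if looks_redos_prone(p)]
```

A bounded wildcard such as '.{0,30}' is not flagged by this check. Whether it is costly in practice depends on how many of them are chained and on the input length.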

MEDIUM Writes cache file outside skill directory -10

The HiveFence client writes a cache file to ~/.clawdbot/hivefence_cache.json, which is outside the skill's own directory. This creates a persistence mechanism and could be used to store data across sessions or leak information to other processes that read this directory.
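
A host that wants to enforce directory confinement could check every write target against the skill's root before allowing it. This is a minimal sketch; is_inside is a hypothetical helper, and the skill root used in the usage note is made up.

```python
from pathlib import Path


def is_inside(path, root) -> bool:
    """Return True if path resolves to a location under root.

    Both paths are expanded and resolved first, so '~', '..' segments and
    symlinks cannot be used to step outside the root.
    """
    resolved = Path(path).expanduser().resolve()
    resolved_root = Path(root).expanduser().resolve()
    try:
        resolved.relative_to(resolved_root)
        return True
    except ValueError:
        return False
```

With a skill root of /opt/skills/prompt-guard (hypothetical), a target such as ~/.clawdbot/hivefence_cache.json would be rejected, while /opt/skills/prompt-guard/cache/hivefence_cache.json would pass.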

MEDIUM Runtime YAML pattern file loading without integrity verification -10

Pattern files (patterns/critical.yaml, high.yaml, medium.yaml) are loaded at runtime via the TieredPatternLoader. If an attacker can modify these files (e.g., via a supply chain attack on the repository), they could inject malicious regex patterns or disable detection of specific attacks. There is no hash verification or signature checking on loaded patterns.
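
Hash verification for the local pattern files could look like the sketch below, assuming a manifest of known-good SHA-256 digests is obtained out-of-band, for example pinned at review time. The skill ships no such manifest according to this report, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_manifest(directory, manifest: dict) -> list:
    """Return the names of files that are missing or whose digest differs
    from the expected value in the manifest."""
    mismatched = []
    for name, expected in manifest.items():
        file_path = Path(directory) / name
        if not file_path.is_file() or sha256_file(file_path) != expected:
            mismatched.append(name)
    return mismatched
```

A loader would call verify_manifest on the patterns directory and refuse to compile anything if the returned list is non-empty.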

LOW Hardcoded owner_ids in configuration example -10

The SKILL.md configuration example includes hardcoded owner_ids: ["46291309"] and canary_tokens. If an agent naively applies this configuration, it would set the skill author's ID as the owner, potentially granting them elevated privileges in the detection system's access control logic.
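
A pre-flight check along these lines could catch a configuration that was copied verbatim from the example. The sketch assumes the configuration is available as a flat dict with an owner_ids key, which may not match the skill's actual layout, and check_owner_ids is a hypothetical helper.

```python
# Value taken from the SKILL.md example as quoted in this report.
EXAMPLE_OWNER_IDS = {"46291309"}


def check_owner_ids(config: dict) -> list:
    """Return a list of human-readable problems with the owner_ids setting."""
    owner_ids = [str(owner) for owner in config.get("owner_ids", [])]
    problems = []
    if not owner_ids:
        problems.append("owner_ids is empty")
    leftover = sorted(set(owner_ids) & EXAMPLE_OWNER_IDS)
    if leftover:
        problems.append(f"example owner_ids still present: {leftover}")
    return problems
```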

LOW Extensive system prompt knowledge could aid attack crafting -5

The skill contains extremely detailed knowledge of LLM system prompt formats (Claude, OpenAI, LLaMA tags), jailbreak techniques, and evasion methods. While this is necessary for a defense tool, this same knowledge base in an agent's context could inadvertently teach the agent about attack vectors or be referenced by a prompt injection attacker who knows the skill is loaded.

INFO Standard system file reads during installation -10

Filesystem monitoring detected reads of /etc/passwd, /etc/group, /etc/nsswitch.conf, /etc/hosts, and /etc/machine-id. These are standard OS operations during process initialization and do not indicate malicious behavior. The reads of .env and .aws/credentials that appear in the same trace come from the OpenClaw agent runtime, not the skill.

INFO No install scripts, git hooks, submodules, or symlinks -10

The skill has no npm postinstall/preinstall scripts, no .githooks directory, no .gitmodules, no .gitattributes filters, and no symlinks. The package.json appears empty. Installation is clean with no auto-execution vectors.