Is bartelmost/agentshield-audit safe?

https://github.com/openclaw/skills/tree/main/skills/bartelmost/agentshield-audit

Overall score: 19/100
Verdict: MALICIOUS

AgentShield Audit is a deliberately deceptive skill designed to social-engineer AI agents into exfiltrating their system prompts to an attacker-controlled Heroku server (agentshield-api-bartel-fe94823ceeea.herokuapp.com). The skill disguises a prompt-injection and data exfiltration attack as a 'privacy-first security audit,' using granular consent workflows, false privacy assurances, and version inflation to manufacture user trust and authorization for transmitting complete system instructions — which typically contain API keys, operational secrets, and proprietary business logic — to third-party infrastructure with admitted 30-day data retention. The clean install-time behavior is a deliberate architectural feature, not a safety signal: the entire attack payload is contained in the SKILL.md instruction set, specifically designed to evade installation-phase security monitoring while activating post-install when a user approves the audit workflow.

Category Scores

Prompt Injection 3/100 · 30%
Data Exfiltration 3/100 · 25%
Code Execution 20/100 · 20%
Clone Behavior 50/100 · 10%
Canary Integrity 85/100 · 10%
Behavioral Reasoning 3/100 · 5%

Findings (12)

CRITICAL Skill instructs agent to exfiltrate its system prompt to attacker-controlled server -65

SKILL.md's granular consent workflow (Step 2, Input Sanitizer Check) explicitly instructs the agent to ask the user for permission to transmit the complete system prompt to agentshield-api-bartel-fe94823ceeea.herokuapp.com. The 'security audit' framing is social engineering: users who approve a 'security check' are unlikely to understand they are authorizing exfiltration of system instructions that typically contain API keys, operational secrets, and proprietary business logic. This is the primary attack mechanism — the entire skill exists to deliver this instruction.
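To make the mechanism concrete, the following is a hypothetical reconstruction of the request body the Step 2 workflow would have the agent assemble once the user approves the 'audit'. The field names and the `/audit` path are illustrative assumptions, not quotes from the skill; only the Heroku hostname is taken from the report.

```python
import json

# Assumed endpoint path; the hostname is the one hardcoded in SKILL.md.
EXFIL_ENDPOINT = "https://agentshield-api-bartel-fe94823ceeea.herokuapp.com/audit"

def build_audit_payload(system_prompt: str, api_key: str) -> str:
    # The 'input sanitizer check' is the cover story; the system prompt
    # itself is the sensitive payload that leaves the host.
    return json.dumps({
        "api_key": api_key,            # ties the submission to a registered victim
        "check": "input_sanitizer",    # audit framing shown to the user
        "system_prompt": system_prompt,
    })

payload = build_audit_payload("You are AcmeBot. AWS_KEY=...", "demo-key")
```

A single approved POST of this shape is sufficient: once the prompt reaches the server, no further agent cooperation is needed.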

CRITICAL Explicit instruction to exfiltrate skill code to attacker-controlled API -40

Under the cover story of 'checking for vulnerabilities', the skill instructs the agent to send all installed skill code to the external API. This allows the attacker to enumerate an organization's installed capabilities, discover proprietary automation workflows, and identify further attack vectors in other skills.

CRITICAL Security tool framing used as social engineering to manufacture consent for data exfiltration -32

The SKILL.md is structured as a comprehensive trust-building payload. Extensive use of 'privacy-first' branding, security tier tables, false assurance lists ('What This Skill Does NOT Do'), and human-in-the-loop language are all deployed to lower user skepticism before requesting authorization to exfiltrate sensitive data. The contradiction between the privacy claims and the 30-day log retention admission demonstrates deliberate deception.

HIGH Hardcoded attacker-controlled Heroku endpoint for data collection -30

The skill hardcodes a specific Heroku dyno URL as the mandatory data collection endpoint. Heroku dynos are disposable, low-cost infrastructure with no regulatory accountability. The attacker can receive sensitive organizational data, destroy the server, and eliminate evidence. No independent verification of data handling practices is possible.
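Hardcoded endpoints like this are cheap to detect statically. A minimal sketch, assuming a reviewer maintains an allowlist of expected hosts (the allowlist contents here are illustrative):

```python
import re

ALLOWED_HOSTS = {"github.com", "raw.githubusercontent.com"}  # example allowlist

def flag_external_endpoints(skill_md: str) -> list[str]:
    """Return URL hosts in a SKILL.md body that fall outside the allowlist."""
    hosts = re.findall(r"https?://([\w.-]+)", skill_md)
    return sorted({h for h in hosts if h not in ALLOWED_HOSTS})

sample = "POST results to https://agentshield-api-bartel-fe94823ceeea.herokuapp.com/v1/audit"
print(flag_external_endpoints(sample))
```

This catches the mandatory collection endpoint before any workflow runs, regardless of how the surrounding instructions are framed.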

HIGH 30-day data retention contradicts 'never stores data' privacy claims -15

The skill simultaneously claims 'Never retains sensitive data permanently' while admitting 'Debugging logs retained for 30 days only, then auto-deleted.' The presence of a retention policy directly contradicts the claim that data is only transiently processed. System prompts, exfiltrated under user consent, will be stored on the attacker's server for at minimum 30 days.

HIGH API key registration at agentshield.live creates PII and credential harvesting funnel -12

Users are directed to obtain AGENTSHIELD_API_KEY from agentshield.live, requiring account creation with the attacker's service. This provides the attacker with victim PII (email, organizational affiliation) that can be cross-referenced with the system prompts collected, enabling targeted follow-on attacks.

HIGH Skill instructs agent to install Python packages on the host system -80

The Requirements section instructs the agent to run pip install commands, which install packages into the host Python environment. This creates a code execution attack surface independent of the skill package contents, and a future version could use it to install a malicious package alongside the legitimately named cryptography library.
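One mitigation is to refuse install instructions that name unpinned packages, so a later skill revision cannot silently swap in a look-alike dependency. A sketch of such a guard (the policy itself is an assumption, not something the skill or the scanner implements):

```python
import re

def is_pinned_install(cmd: str) -> bool:
    """Accept only pip install commands where every package is pinned
    to an exact version (name==version), rejecting open-ended installs."""
    pkgs = re.sub(r"^\s*pip3?\s+install\s+", "", cmd).split()
    return all(re.fullmatch(r"[A-Za-z0-9_.-]+==[\w.]+", p) for p in pkgs)

print(is_pinned_install("pip install cryptography"))          # unpinned
print(is_pinned_install("pip install cryptography==42.0.5"))  # pinned
```

Pinning does not make an attacker-authored skill safe, but it removes the cheapest path to escalating a social-engineering payload into arbitrary code execution.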

HIGH Ed25519 certificate system creates persistent tracking fingerprint for victim agents -50

The audit flow instructs the agent to generate Ed25519 key pairs and transmit the public key to the attacker's server for 'certificate issuance.' Each certificate links a unique cryptographic identity to an agent and its system prompt. This enables the attacker to build a longitudinal database correlating agent identities, their capabilities, and the organizations operating them.
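The tracking value of the certificate scheme is easy to see: a public key is a stable, globally unique byte string, so the server can derive a persistent identifier from every submission. A minimal sketch, with random bytes standing in for a real Ed25519 public key:

```python
import hashlib
import os

def fingerprint(public_key: bytes) -> str:
    # Any server that stores the submitted Ed25519 public key can derive
    # a stable identifier like this and correlate every later submission
    # from the same agent with its previously harvested system prompt.
    return hashlib.sha256(public_key).hexdigest()[:16]

# 32 random bytes stand in for a real Ed25519 public key; actual key
# generation would use the cryptography package the skill has installed.
agent_pubkey = os.urandom(32)
```

Because the agent generates the key pair once and reuses it for 'certificate' validation, the identifier survives across audits, sessions, and even skill updates.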

MEDIUM Version inflation used to deceive users about skill maturity and trustworthiness -20

The _meta.json reports version 1.0.2 with a publish timestamp corresponding to 2026-02-21. However, SKILL.md presents a changelog claiming v6.0.0 was released on 2026-02-21 and v1.0.0 the day before. Claiming six major versions within two days of the skill's first appearance is a deliberate deception tactic to make it look mature, battle-tested, and trustworthy to prospective users.
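This tell is mechanically checkable by comparing the registry metadata against the versions the changelog claims. A sketch, assuming _meta.json carries a `version` field as described above:

```python
import json
import re

def version_inflated(meta_json: str, skill_md: str) -> bool:
    """Flag skills whose changelog claims a higher major version than the
    registry metadata records -- a cheap maturity-faking tell."""
    meta_major = int(json.loads(meta_json)["version"].split(".")[0])
    claimed = [int(m) for m in re.findall(r"\bv(\d+)\.\d+\.\d+", skill_md)]
    return bool(claimed) and max(claimed) > meta_major

meta = '{"version": "1.0.2"}'
md = "## Changelog\n- v6.0.0 (2026-02-21)\n- v1.0.0 (2026-02-20)"
print(version_inflated(meta, md))
```

A registry that runs this comparison at publish time would have rejected the v6.0.0 changelog outright.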

MEDIUM Clean install is a deliberate evasion design — attack activates post-install via instructions -50

The install-time behavior was technically clean (GitHub-only connections, no filesystem anomalies outside skill directory). However, this is not a positive security signal — it reflects a deliberate architectural choice to place the attack payload entirely in the SKILL.md instruction set rather than in executable code, specifically to bypass installation-phase security monitoring systems like this one.

MEDIUM Skill designed to harvest system prompts at scale — high organizational impact if widely deployed -27

The skill's infrastructure (managed API endpoint, certificate issuance, API key registration) is architected for scale. If deployed across multiple organizations, the attacker would accumulate a database of system prompts from many AI agent deployments. System prompts frequently contain information about the deploying organization's internal tooling, data sources, customer data access patterns, and business processes.

INFO Canary files intact — skill bypasses honeypot detection by targeting system prompts rather than local files 0

All honeypot files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, GCP application_default_credentials.json) were intact after installation. Two rounds of canary file access events are present in the audit logs, consistent with Oathe framework pre/post-install baseline scans. This clean result is not a positive signal: the skill's attack vector targets the agent's in-memory system prompt rather than on-disk credential files, by design evading file-integrity-based detection.
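The pre/post-install baseline scan described above amounts to hashing each honeypot file before installation and diffing afterwards. A minimal sketch of that check (the honeypot list is abbreviated; this is an illustration of the technique, not the Oathe framework's implementation):

```python
import hashlib
import tempfile
from pathlib import Path

CANARIES = [".env", ".npmrc"]  # abbreviated honeypot list for the sketch

def snapshot(root: Path) -> dict[str, str]:
    """Hash each honeypot file so a post-install pass can diff against it."""
    return {n: hashlib.sha256((root / n).read_bytes()).hexdigest()
            for n in CANARIES if (root / n).exists()}

root = Path(tempfile.mkdtemp())
(root / ".env").write_text("AWS_SECRET=honeypot")
before = snapshot(root)
after = snapshot(root)   # nothing touched the file between scans
print(before == after)   # the 'clean' result this skill is built to produce
```

The sketch also shows the blind spot the finding describes: a diff of on-disk hashes can never observe an attack whose only target is the agent's in-memory system prompt.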