Is aviclaw/markdown-extract safe?
https://github.com/openclaw/skills/tree/main/skills/aviclaw/markdown-extract
The aviclaw/markdown-extract skill is a straightforward URL-to-markdown converter. Its SKILL.md is clean (no prompt injection patterns), and it installs cleanly from GitHub. The two medium-severity findings both concern URL handling: every user-supplied URL is unconditionally routed through the third-party markdown.new service, and private IP ranges and cloud metadata endpoints are not filtered, which creates privacy exposure and potential SSRF amplification. Among the low-severity findings, unsanitized external content is returned to the agent, creating a reflected prompt injection surface. Canary files were accessed after installation without confirmed process attribution, but monitoring showed they remained fully intact.
Findings (8)
MEDIUM All Fetched URLs Unconditionally Routed Through Third-Party markdown.new -20
extract.py sends every user-supplied URL to https://markdown.new/, via GET for the auto method and via POST for the ai and browser methods. The markdown.new operator therefore sees every URL the agent is asked to process. This is an inherent privacy risk, and a potential data-leakage vector whenever the agent processes internal, authenticated, or otherwise sensitive URLs.
MEDIUM No Filtering of Private IP Ranges or Cloud Metadata Endpoints -18
URL validation only checks for an http:// or https:// scheme prefix. Private RFC 1918 ranges, link-local addresses, and cloud provider metadata endpoints (169.254.169.254 for AWS, 100.100.100.200 for Alibaba Cloud) are not blocked. An adversary who controls web content the agent processes could trick the agent into asking the skill to fetch internal URLs. markdown.new itself has no access to the agent's internal network, but the internal URLs the agent requests, and any content returned for them, are still exposed to the third party.
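A pre-flight check along the following lines would close this gap before a URL is ever forwarded to markdown.new. This is a minimal sketch, not the skill's code: the names `is_url_allowed` and `METADATA_ADDRESSES` are hypothetical, and the check relies on Python's `ipaddress` `is_global` property, which is False for RFC 1918, loopback, link-local, and shared (100.64.0.0/10) ranges. Hostnames are resolved locally, so a name that resolves to an internal address is rejected as well; the sketch does not defend against DNS rebinding between the check and the actual fetch.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical explicit blocklist of cloud metadata endpoints. Both are already
# non-global addresses, but listing them documents the intent.
METADATA_ADDRESSES = {
    ipaddress.ip_address("169.254.169.254"),  # AWS (and other clouds) instance metadata
    ipaddress.ip_address("100.100.100.200"),  # Alibaba Cloud instance metadata
}


def _resolve(host: str) -> list:
    """Return the IP addresses for `host`, resolving names via DNS if needed."""
    try:
        # IP literal such as "10.0.0.5" or "::1" (urlsplit strips IPv6 brackets).
        return [ipaddress.ip_address(host)]
    except ValueError:
        pass
    infos = socket.getaddrinfo(host, None)
    # sockaddr[0] is the address string; drop any IPv6 zone suffix like "%eth0".
    return [ipaddress.ip_address(info[4][0].split("%", 1)[0]) for info in infos]


def is_url_allowed(url: str) -> bool:
    """Reject non-HTTP(S) URLs and URLs whose host is not a public address."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        addresses = _resolve(host)
    except (OSError, UnicodeError):
        return False  # unresolvable hosts are rejected rather than forwarded
    for addr in addresses:
        # is_global is False for private, loopback, link-local, 100.64.0.0/10, etc.
        if addr in METADATA_ADDRESSES or not addr.is_global:
            return False
    return True
```

Rejecting unresolvable hosts fails closed, which is usually the safer default for a skill that hands URLs to a third party.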
LOW User-Controlled URL Passed to curl Subprocess -23
extract.py calls subprocess.run() in list form, embedding the user-supplied URL directly in curl's target argument (auto method) or in the JSON request body (other methods). List form prevents shell injection, but it does not constrain curl itself: curl will follow redirects, connect to any reachable host, and honor the full range of URL features. The URL receives no path-traversal normalization, no query-parameter stripping, and no scheme-downgrade protection beyond the http/https prefix check.
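The list-form call can keep its shell-injection protection while pinning curl's protocol and redirect behavior explicitly. The sketch below assumes the skill has already composed the final https target (the exact markdown.new request format is not shown in this report); `build_curl_argv`, `run_curl`, and the specific limits are hypothetical. `--proto` and `--proto-redir` restrict the schemes curl may use for the first request and after a redirect, `--max-redirs` caps redirect chains, and passing the target through `--url` guarantees it is consumed as a value rather than parsed as an option. It does not address the missing path normalization or query stripping.

```python
import subprocess

# Hypothetical hardening flags; each one is a standard curl option.
CURL_HARDENING = [
    "--silent", "--show-error",
    "--fail",                   # non-2xx responses produce a non-zero exit code
    "--proto", "=https",        # only HTTPS for the initial request
    "--location",               # follow redirects...
    "--max-redirs", "3",        # ...but at most three hops
    "--proto-redir", "=https",  # and never downgrade to plain HTTP on a redirect
    "--max-time", "30",         # hard cap on total transfer time, in seconds
]


def build_curl_argv(target_url: str) -> list[str]:
    """Return an argv list for subprocess.run(); no shell is involved."""
    if not target_url.startswith("https://"):
        raise ValueError("expected an https:// target URL")
    # --url makes curl treat the next argument strictly as the URL value.
    return ["curl", *CURL_HARDENING, "--url", target_url]


def run_curl(target_url: str) -> str:
    """Run curl and return the body; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_curl_argv(target_url),
        capture_output=True,
        text=True,
        check=True,
        timeout=35,  # slightly above --max-time so curl normally exits first
    )
    return result.stdout
```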
LOW Canary Credential Files Accessed Post-Install Without Confirmed Process Attribution -28
Honeypot files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and gcloud application_default_credentials.json) appear in inotifywait output and auditd PATH records at two distinct timestamps: 1771733416 (before installation, correlated with sudo/PAM setup) and 1771733434 (after installation, about 6 seconds after it completed at ~1771733428). The auditd log contains no EXECVE records from the skill corresponding to the post-install batch. All files remained intact. The most probable explanation is that the audit harness performed a final canary integrity check, but process attribution has not been definitively confirmed.
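Attribution of this kind can often be settled from the audit log itself. auditd stamps every record belonging to one event with the same `msg=audit(<epoch>.<ms>:<serial>)` value, so a PATH record for a canary file can be joined to the SYSCALL record of the same event, whose `pid`, `comm`, and `exe` fields identify the process that touched it. The sketch below is a hypothetical helper, not part of the audit harness; it assumes raw audit.log-style lines and plain `name=` values (auditd hex-encodes names containing spaces or special characters, which this sketch does not decode).

```python
import re
from collections import defaultdict

# Shared event stamp, e.g. msg=audit(<epoch>.<ms>:<serial>)
STAMP_RE = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")
# key=value pairs; values are either "quoted" or a bare token.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def _fields(line: str) -> dict:
    """Parse key=value fields from one audit record, stripping quotes."""
    return {k: v.strip('"') for k, v in FIELD_RE.findall(line)}


def attribute_canary_access(lines, canary_suffixes):
    """Yield (epoch, serial, path, pid, comm, exe) for each canary PATH record."""
    events = defaultdict(list)
    for line in lines:
        stamp = STAMP_RE.search(line)
        if stamp:
            epoch, _ms, serial = stamp.groups()
            events[(int(epoch), int(serial))].append(_fields(line))

    suffixes = tuple(canary_suffixes)
    for (epoch, serial), records in sorted(events.items()):
        # The SYSCALL record of the same event names the responsible process.
        syscall = next((r for r in records if r.get("type") == "SYSCALL"), {})
        for rec in records:
            path = rec.get("name", "")
            if rec.get("type") == "PATH" and path.endswith(suffixes):
                yield (epoch, serial, path,
                       syscall.get("pid"), syscall.get("comm"), syscall.get("exe"))
```

An event that yields a PATH hit with `None` for `pid`, `comm`, and `exe` had no SYSCALL record in the log, which is itself worth noting when attribution is in question.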
LOW Unsanitized External Markdown Returned to Agent Creates Reflected Injection Surface -25
The skill returns the raw response body from markdown.new directly to the calling agent as the result field. Any web page containing adversarially crafted content, including prompt injection instructions embedded as markdown headers, code blocks, or plain text, would be returned verbatim and processed by the agent. This reflected prompt injection vector is inherent to URL-fetching skills, but the skill does nothing to mitigate it.
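Fencing the fetched content as data and flagging obvious instruction-like phrases would not eliminate this vector, since pattern matching cannot reliably detect prompt injection, but it gives the calling agent an explicit signal that the text is untrusted. A minimal sketch, assuming a hypothetical `wrap_untrusted` post-processing step, a hypothetical `<untrusted-content>` delimiter, and an illustrative (far from exhaustive) phrase list:

```python
import re

# Illustrative patterns only; real injections are easily paraphrased.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"you are now\b", re.I),
    re.compile(r"\bsystem prompt\b", re.I),
]

OPEN_TAG = "<untrusted-content>"
CLOSE_TAG = "</untrusted-content>"
# Any case variant of the delimiter inside the page is neutralised.
TAG_RE = re.compile(r"</?untrusted-content\s*>", re.I)


def wrap_untrusted(markdown: str, source_url: str) -> dict:
    """Fence fetched markdown as data and report any suspicious phrases."""
    # Remove delimiter look-alikes so the page cannot close the fence early.
    body = TAG_RE.sub("[removed tag]", markdown)
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(body)]
    return {
        "source": source_url,
        "suspicious_patterns": hits,
        "result": f"{OPEN_TAG}\n{body}\n{CLOSE_TAG}",
    }
```

The agent's own instructions would still need to say that text inside the delimiter is data, never instructions; the wrapper only makes that boundary visible.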
LOW SSRF Amplification Risk When Combined with URL-Discovery Skills -15
In a multi-skill agent environment, combining this skill with any skill that enumerates internal service URLs (e.g., network scanners, file-system walkers, CI/CD integrations) could create an unintended data relay: the URL-discovery skill finds internal endpoints, the agent passes them to markdown-extract, and the URLs, along with whatever content markdown.new can retrieve for them, are forwarded to the third party. The skill imposes no domain allowlist or scope restriction.
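A scope restriction could be as simple as a hostname allowlist checked before any URL reaches markdown.new. The sketch below is hypothetical (`ALLOWED_DOMAINS` and `in_scope` are not part of the skill, and the listed domains are placeholders); it matches whole DNS labels, so `docs.python.org` is in scope for `python.org` while `evilpython.org` and `python.org.attacker.example` are not.

```python
from urllib.parse import urlsplit

# Hypothetical per-deployment allowlist of registrable domains.
ALLOWED_DOMAINS = {"python.org", "wikipedia.org"}


def in_scope(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    try:
        host = urlsplit(url).hostname  # lowercased; port and userinfo removed
    except ValueError:
        return False
    if not host:
        return False
    host = host.rstrip(".")  # treat "python.org." like "python.org"
    # Compare whole labels: exact match, or a "."-separated subdomain.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Because internal endpoints discovered by another skill will almost never fall under a public allowlisted domain, this check breaks the relay chain regardless of which skill produced the URL.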
LOW Unconditional Dependency on markdown.new Service Trustworthiness -7
SKILL.md instructs the agent to treat markdown.new as the authoritative conversion service, with no fallback validation. If markdown.new is compromised, returns fabricated content, or is used for targeted poisoning, the agent has no way to detect the adversarial responses. This is a supply-chain trust risk rather than an active injection in the skill itself.
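One possible fallback check, sketched here under stated assumptions, is to compare the converted markdown against text extracted locally from the same page and flag conversions whose vocabulary barely overlaps the source. This would catch wholesale fabrication but not subtle edits, and it requires the agent to fetch the page directly, which partly defeats the purpose of delegating to the service. The function names and the 0.5 threshold are hypothetical illustrations.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def _words(text: str) -> set:
    """Lowercased alphanumeric tokens of three or more characters."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


def overlap_ratio(html: str, markdown: str) -> float:
    """Fraction of the markdown's words that also appear in the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    source = _words(" ".join(parser.parts))
    converted = _words(markdown)
    if not converted:
        return 0.0
    return len(converted & source) / len(converted)


def looks_fabricated(html: str, markdown: str, threshold: float = 0.5) -> bool:
    """Hypothetical threshold; tune it against known-good conversions."""
    return overlap_ratio(html, markdown) < threshold
```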
INFO Clean Install from GitHub; No Anomalous Activity During Clone -10
The installation performed a sparse git clone from github.com (140.82.121.4:443) targeting only the skills/aviclaw/markdown-extract subpath. No unexpected processes were spawned, no filesystem writes occurred outside the skill directory, no anomalous DNS queries were issued, and no connections were made to non-GitHub infrastructure during installation.