Is deep-scraper safe?
https://clawhub.ai/opsun/deep-scraper
This skill is a containerized web scraper that poses moderate security risks. The primary concerns are: (1) a missing Dockerfile that creates a blind code execution vector during build, (2) unrestricted URL targeting that enables SSRF attacks against internal networks, and (3) the effective transformation of the agent into an open web proxy. No active malicious behavior was detected during installation, and canary files remained intact, but the architectural design creates significant abuse potential.
Findings (9)
HIGH Missing Dockerfile creates blind execution risk -30
The skill instructs users to run 'docker build -t clawd-crawlee skills/deep-scraper/', but no Dockerfile is included in the audited skill files. The build process is therefore opaque: whatever Dockerfile ends up in that directory was never reviewed, and it could contain arbitrary RUN commands that execute during the build, including downloading and running malicious code, installing backdoors, or exfiltrating host-mounted data.
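One way to make such a build less blind is to list every RUN instruction once a Dockerfile is actually obtained, and read through them before running 'docker build'. The sketch below is a review aid only, not part of the skill: the function name listRunInstructions is hypothetical, and it assumes the default backslash escape character and no heredoc syntax.

```javascript
// Hypothetical review aid: collect every RUN instruction from Dockerfile text,
// joining backslash-continued lines into a single string per instruction.
function listRunInstructions(dockerfileText) {
  const runs = [];
  let current = null; // instruction being assembled, or null when outside a RUN

  for (const rawLine of dockerfileText.split(/\r?\n/)) {
    const line = rawLine.trim();

    if (current === null) {
      if (!/^RUN\s/i.test(line)) continue; // not the start of a RUN instruction
      current = line;
    } else {
      if (line.startsWith('#')) continue; // comment lines inside a continuation are skipped
      current += ' ' + line;
    }

    if (current.endsWith('\\')) {
      // Drop the trailing backslash and keep assembling on the next line.
      current = current.slice(0, -1).trimEnd();
    } else {
      runs.push(current);
      current = null;
    }
  }

  if (current !== null) runs.push(current); // file ended in the middle of a continuation
  return runs;
}
```

A listing like this does not prove a Dockerfile is safe. It only puts the commands that execute at build time in front of a reviewer.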
HIGH Unrestricted URL targeting enables SSRF attacks -35
The GENERIC mode in main_handler.js will navigate to and scrape ANY URL passed as an argument with no allowlist or blocklist. An agent (or malicious instruction) could direct it at internal network addresses, cloud metadata endpoints (169.254.169.254), localhost services, or other sensitive internal resources. The scraped content is returned to the agent, enabling information disclosure.
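A blocklist check of the kind this finding says is missing might look like the sketch below. It is an assumption-laden illustration, not code from main_handler.js: the function name isBlockedTarget is hypothetical, and it only inspects the literal host in the URL.

```javascript
const net = require('node:net');

// Hypothetical helper: returns true when a URL should be refused before navigation.
function isBlockedTarget(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparsable input is refused
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return true;

  // The WHATWG URL parser keeps brackets around IPv6 literals; strip them.
  const host = url.hostname.replace(/^\[|\]$/g, '').toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return true;

  if (net.isIPv4(host)) {
    const [a, b] = host.split('.').map(Number);
    return (
      a === 0 || a === 10 || a === 127 ||  // "this network", private, loopback
      (a === 169 && b === 254) ||          // link-local, includes 169.254.169.254
      (a === 172 && b >= 16 && b <= 31) || // private
      (a === 192 && b === 168)             // private
    );
  }

  if (net.isIPv6(host)) {
    return (
      host === '::' || host === '::1' || // unspecified, loopback
      /^f[cd]/.test(host) ||             // unique local (fc00::/7)
      /^fe[89ab]/.test(host) ||          // link-local (fe80::/10)
      host.startsWith('::ffff:')         // IPv4-mapped, refused conservatively
    );
  }

  return false; // ordinary hostname: not covered by this literal check
}
```

This check alone is not a complete SSRF defense. A public hostname can still resolve to an internal address, and a permitted page can redirect to one, so resolved addresses and redirect targets would need the same check.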
MEDIUM Browser sandbox disabled in Chromium launch -15
Both handler scripts launch Chromium with the --no-sandbox and --disable-setuid-sandbox flags. While common in Docker environments, this removes a critical security layer. If a malicious webpage exploits a Chromium vulnerability, no browser sandbox stands between the exploit and the rest of the container, leaving the container boundary as the only remaining isolation.
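As a rough illustration of the remediation, the two flags could be filtered out of the argument list before it reaches the browser launcher. The helper name withoutSandboxDisablingFlags is hypothetical, and '--disable-gpu' is only an illustrative extra argument, not something the audit observed.

```javascript
// The two flags the audit found in both handler scripts.
const SANDBOX_DISABLING_FLAGS = new Set(['--no-sandbox', '--disable-setuid-sandbox']);

// Hypothetical helper: returns a copy of the launch args without those flags.
function withoutSandboxDisablingFlags(args) {
  return args.filter((arg) => !SANDBOX_DISABLING_FLAGS.has(arg));
}

const args = withoutSandboxDisablingFlags([
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--disable-gpu', // illustrative only
]);
console.log(args); // [ '--disable-gpu' ]
```

Removing the flags is usually not enough on its own. Inside a container, Chromium's sandbox typically also needs a non-root user and suitable kernel support, so the image would have to be set up accordingly.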
MEDIUM Large data extraction windows -10
The skill extracts up to 15,000 characters of transcript data or 10,000 characters of generic page content per scrape. Combined with the unrestricted URL targeting, this enables bulk extraction of sensitive information from internal or external services.
MEDIUM Agent becomes an open web proxy -30
When this skill is active, any prompt injection or social engineering that convinces the agent to 'scrape a URL' effectively turns the agent into an open web proxy with access to the host's network. This is particularly dangerous in enterprise environments where the host may have access to internal services.
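One common way to narrow this exposure is to let the scraper fetch only from an explicit list of hosts. The sketch below assumes a hypothetical isAllowedTarget helper and a placeholder allowlist, neither of which exists in the audited skill.

```javascript
// Hypothetical helper: returns true only when the URL's host is on the allowlist
// or is a subdomain of an allowlisted host.
function isAllowedTarget(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparsable input is never allowed
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;

  const host = url.hostname.toLowerCase();
  return allowedHosts.some(
    // Matching on '.' + allowed avoids look-alikes such as 'evilyoutube.com'.
    (allowed) => host === allowed || host.endsWith('.' + allowed)
  );
}

// Placeholder allowlist for illustration only.
const ALLOWED_HOSTS = ['youtube.com'];
console.log(isAllowedTarget('https://www.youtube.com/watch?v=abc', ALLOWED_HOSTS)); // true
console.log(isAllowedTarget('http://169.254.169.254/', ALLOWED_HOSTS));             // false
```

An allowlist trades flexibility for safety. A prompt-injected request to scrape an arbitrary internal URL is refused by default, rather than depending on a blocklist that anticipates every sensitive address.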
LOW Aggressive authority language in SKILL.md -8
SKILL.md uses commanding language like 'MUST verify', 'Strictly forbidden', and 'penetrate protections' which could subtly influence agent behavior, though this does not constitute a direct prompt injection attack.
LOW Extended timeout enables long-running operations -5
The requestHandlerTimeoutSecs is set to 300 seconds (5 minutes), which is unusually long and could be exploited for slow-drip data exfiltration or sustained resource consumption.
INFO Duplicate handler file with Chinese-language comments -5
youtube_handler.js contains the same core logic as the YouTube path in main_handler.js but with Chinese-language comments and log messages. This is not malicious but suggests the code may have been hastily assembled from different sources, reducing confidence in code quality and intent.
INFO YouTube ToS violation risk -5
Intercepting YouTube's internal timedtext API to extract transcripts likely violates YouTube's Terms of Service. Using this skill could expose the user to legal risk.