Oathe Security Badge

Is unclecode/crawl4ai safe?

http://github.com/unclecode/crawl4ai

79
CAUTION

crawl4ai (unclecode/crawl4ai) is a legitimate, actively maintained Python web-crawler library used widely in AI applications, but it has accumulated eleven security disclosures since March 2026. These include sandbox escapes (one rated CVSS 9.8), confirmed LLM API key exfiltration, multiple SSRF vectors, and three RCE-class reports filed on a single day in June 2026 (two Chromium launch-argument injections and one download path traversal). While all listed vulnerabilities have been addressed and the project maintains a responsible disclosure program, the breadth and recency of these findings indicate an attack surface that is actively being probed and is not yet fully hardened. Canary files were opened during the monitoring window, with timing consistent with the Oathe audit system rather than the repository itself; no content modification or confirmed exfiltration was detected.

Category Scores

Prompt Injection 95/100 · 30%
Data Exfiltration 74/100 · 25%
Code Execution 63/100 · 20%
Clone Behavior 84/100 · 10%
Canary Integrity 84/100 · 10%
Behavioral Reasoning 70/100 · 5%
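
The overall score of 79 is consistent with a weighted sum of the category scores above. The report does not state its aggregation formula, so the sketch below rests on two assumptions: that the total is the weight-percent sum of the category scores, and that the result is truncated rather than rounded. The function name overall_score is illustrative only.

```python
# Category scores and weights (in percent), copied from the list above.
CATEGORIES = {
    "Prompt Injection": (95, 30),
    "Data Exfiltration": (74, 25),
    "Code Execution": (63, 20),
    "Clone Behavior": (84, 10),
    "Canary Integrity": (84, 10),
    "Behavioral Reasoning": (70, 5),
}


def overall_score(categories):
    """Weighted average in integer arithmetic.

    Assumption: the result is truncated (floor division), which is
    what makes 79.9 display as 79. The report does not confirm this.
    """
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum // total_weight


print(overall_score(CATEGORIES))
```

Under these assumptions the untruncated value is 79.9, so ordinary rounding would display 80 instead.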

Findings (7)

CRITICAL Three RCE-class vulnerabilities disclosed in current release (v0.9.0, June 2026) -30

Two independent researchers reported Chromium launch-argument injection via the extra_args parameter on the same day (2026-06-18), one of them explicitly classifying it as RCE-class. A third report that day documented download path traversal leading to arbitrary file write. All three were fixed in v0.9.0, but their simultaneous discovery suggests that security researchers are actively exploring the attack surface.
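
For deployments that forward caller-supplied values into extra_args, one common mitigation for this class of issue is an allowlist applied before the browser is launched. The sketch below is a generic illustration, not crawl4ai's actual fix: the ALLOWED_FLAGS set and the filter_extra_args helper are hypothetical and would need to be adapted to the flags a given deployment really requires.

```python
# Hypothetical allowlist; choose the flags your deployment actually needs.
ALLOWED_FLAGS = {"--headless", "--disable-gpu", "--lang", "--window-size"}


def filter_extra_args(extra_args):
    """Return extra_args unchanged if every flag is allowlisted.

    Raises ValueError on the first argument whose flag name (the part
    before '=') is not in ALLOWED_FLAGS, including bare positional
    arguments that do not look like flags at all.
    """
    safe = []
    for arg in extra_args:
        flag = arg.split("=", 1)[0]
        if flag not in ALLOWED_FLAGS:
            raise ValueError(f"launch argument not allowed: {flag}")
        safe.append(arg)
    return safe
```

An allowlist is preferable to a denylist here because Chromium accepts a very large number of switches, some of which can redirect process launching or file output.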

HIGH Confirmed LLM API key exfiltration vulnerability in prior release (v0.8.8) -18

An attacker-controlled base_url in the LLM provider configuration allowed API keys to be exfiltrated to arbitrary servers. The issue was fixed in v0.8.8, but it demonstrates that the library's LLM integration path had a real credential-theft vector exploitable through normal-looking configuration.
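
Applications that accept a base_url from users or from untrusted configuration can add their own check before attaching an API key to a request. The sketch below is a minimal illustration under stated assumptions: TRUSTED_LLM_HOSTS and is_trusted_base_url are hypothetical names, the host list is only an example, and nothing here reflects how crawl4ai itself resolved the issue.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of provider hosts that may receive the API key.
TRUSTED_LLM_HOSTS = {"api.openai.com", "api.anthropic.com"}


def is_trusted_base_url(base_url):
    """True only for HTTPS URLs whose exact hostname is allowlisted.

    Comparing the parsed hostname (not a substring of the URL) rejects
    look-alikes such as 'api.openai.com.evil.example' and userinfo
    tricks such as 'https://api.openai.com@evil.example/'.
    """
    parts = urlsplit(base_url)
    return parts.scheme == "https" and parts.hostname in TRUSTED_LLM_HOSTS
```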

HIGH History of SSRF vulnerabilities across multiple endpoints and proxy configuration -30

Five separate SSRF vulnerabilities have been reported across different code paths: /md, /crawl, /llm endpoints (0.8.x), proxy_config.server bypass (0.8.9), and /crawl/stream (0.9.0). The breadth of SSRF surface areas across user-controllable parameters is a significant concern for AI agent deployments where user-supplied URLs are passed to the library.
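
When user-supplied URLs are passed to a crawler, a typical defensive layer is to resolve the target host and refuse anything that is not a globally routable address. The sketch below uses only the Python standard library; is_safe_target is a hypothetical helper, not part of crawl4ai, and it assumes that blocking loopback, private, and link-local ranges matches the deployment's policy.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_target(url):
    """Reject URLs that are not http(s) or that resolve to non-global IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as unsafe
    for info in infos:
        # Drop any IPv6 zone identifier (e.g. 'fe80::1%eth0') before parsing.
        address = info[4][0].split("%", 1)[0]
        if not ipaddress.ip_address(address).is_global:
            return False  # loopback, private, link-local, reserved, ...
    return True
```

A check like this only validates the address seen at resolution time. Redirects and DNS rebinding can still reach internal hosts unless the same policy is enforced on the connection that is actually made.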

MEDIUM Sensitive credential files opened during monitoring window -16

Inotify and auditd both record OPEN/ACCESS on .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCloud credentials at monitoring start. Attribution to monitoring system versus repository is uncertain. Files were not modified and no network egress corresponding to the access was detected. Official integrity check passes.

MEDIUM Hook sandbox escape and eval() in API endpoints (prior versions) -7

An AST sandbox escape via the gi_frame.f_back chain (CVSS 9.8) and a hook sandbox escape via asyncio were reported in April-May 2026, alongside eval() calls in /config/dump and /execute_js. These were fixed, but they demonstrate that code execution through the hook system was a real and exploitable pathway.
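
To illustrate the kind of pattern behind the gi_frame.f_back report: a generator object exposes its frame, and walking f_back from that frame reaches the caller's globals. The sketch below shows how an AST pass can flag such attribute access in submitted hook code. It is an illustrative detector only; FRAME_ATTRS and find_frame_access are hypothetical names, and this is not a description of crawl4ai's sandbox or its fix.

```python
import ast

# Attribute names that expose interpreter frames (illustrative, not exhaustive).
FRAME_ATTRS = {
    "gi_frame", "cr_frame", "ag_frame",
    "f_back", "f_globals", "f_builtins", "f_locals",
}


def find_frame_access(source):
    """Return the sorted frame-related attribute names used in source."""
    tree = ast.parse(source)
    return sorted(
        {
            node.attr
            for node in ast.walk(tree)
            if isinstance(node, ast.Attribute) and node.attr in FRAME_ATTRS
        }
    )
```

A name-based check of this kind can be bypassed, for example with getattr and a string built at runtime, so it is at best one layer. Running untrusted hooks in a separate process or container is the more robust boundary.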

LOW Git clone failed — pre-existing installation, no new install scripts executed -16

The git clone was rejected because the destination already existed. No install hooks from setup.py or pyproject.toml were executed during this audit run. Network connections to GitHub (140.82.121.4:443) and an Ubuntu package server (185.125.190.48:443) are consistent with normal git and package-management traffic over HTTPS.

INFO No SKILL.md — repository is a Python library, not an agent skill -5

No SKILL.md content was available to analyze. crawl4ai is a Python web-crawling library, not an AI agent skill packaged for system-prompt injection. No override instructions, hidden Unicode tricks, or persona manipulation were found in any examined source file.