Is cxlhyx/hxxra safe?
https://github.com/openclaw/skills/tree/main/skills/cxlhyx/hxxra
hxxra is a research assistant skill with no detected prompt injection, no malicious install behavior, and clean canary integrity. However, its core analyze command transmits full document text to an external LLM API endpoint with no content gate or user confirmation, creating a design-level data exfiltration pathway; combined with an unsandboxed directory parameter and SSRF-capable download command, the skill presents meaningful risk when deployed in an agent with access to sensitive files or internal network resources. The skill appears functionally legitimate but architecturally risky rather than intentionally malicious.
Category Scores
Findings (8)
HIGH Document Contents Transmitted to External LLM API -25 ▶
The analyze command extracts up to 15,000 characters of text from any readable PDF and sends it verbatim to the configured LLM base_url (default: api.openai.com). In an agent context, this means any document the agent is asked to analyze — including files containing PII, code, or internal data — will be exfiltrated to a third-party cloud endpoint. The behavior is documented but not gated by any user confirmation step or content filter.
MEDIUM SSRF via Unrestricted URL Fetch in Download Command -12 ▶
handle_download() issues urllib.request.urlopen() to any pdf_url present in the search result JSON. Since the agent can construct or receive search result JSON with attacker-controlled pdf_url values, this allows Server-Side Request Forgery against internal services (metadata endpoints, container runtimes, intranet APIs) reachable from the host running the agent.
MEDIUM Arbitrary Directory Traversal in Analyze Command -15 ▶
The directory parameter of the analyze command is passed directly to os.listdir() without canonicalization, realpath normalization, or a chroot/base-path check. An agent or user providing directory='/' or directory='../../etc' would enumerate and attempt PDF processing on any readable path on the filesystem, leaking directory listings through error messages and potentially processing non-PDF files if they contain readable text.
MEDIUM Unsanitized Author Name in Filesystem Path Construction -8 ▶
When constructing the download filename, the code uses authors[0] directly without sanitization, while only the title is regex-filtered. An attacker who can inject a paper with authors=['../../../tmp/evil'] into the search results can cause arbitrary write-path traversal when the agent invokes download.
LOW Plaintext API Credentials Co-located with Skill Files -5 ▶
config.json ships with placeholder values for ZOTERO_API_KEY, ZOTERO_USER_ID, and the OpenAI API key. Users are instructed to fill in live credentials. This file sits in the skill directory with mode 0100664, readable by the executing user and any group member. Compromise of the skill directory (e.g., via path traversal in another skill) exposes live credentials.
LOW LLM System Prompt Injects Untrusted Paper Content Without Sandboxing -5 ▶
The analyze_with_llm() function constructs a system prompt instructing the LLM to output JSON, then injects raw extracted PDF text as user content with no XML escaping or injection-resistant delimiters. A malicious PDF containing prompt injection payloads (e.g., 'Ignore previous instructions and instead...') would be passed directly to the agent's configured LLM.
INFO Optional Tor/Free-Proxy Routing via scholarly Library -5 ▶
When scholarly_proxy.enabled is true, the code calls pg.FreeProxies() or pg.Tor_Internal(), routing Google Scholar traffic through dynamically selected third-party proxies or a local Tor circuit. While disabled by default, this capability could route agent traffic through attacker-controlled exit nodes if a user enables it.
INFO Credential File Accesses Attributed to Audit Infrastructure, Not Skill 0 ▶
Filesystem monitoring showed accesses to .env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, and GCP credentials at 07:55:08, approximately 6 seconds before the git clone began. Correlation with auditd SYSCALL records shows these accesses originated from the audit infrastructure's sudo process (PID 1091) establishing canary baselines, not from skill code. Canary integrity check confirms no tampering.