Is etoile04/mineru-pdf safe?

https://github.com/openclaw/skills/tree/main/skills/etoile04/mineru-pdf

Overall score: 84/100 · SAFE

The mineru-pdf skill is a legitimate PDF parsing utility that wraps the MinerU ML library. There is no evidence of malicious intent, prompt injection, or data exfiltration, and the canary files remained intact throughout. The skill does have notable quality issues: hardcoded developer-specific file paths (/Users/lwj04/...) that make its direct-tool instructions non-functional on other machines, an accidentally bundled .clawhub/lock.json that references an unrelated skill, and a dependency on the unaudited third-party PyPI package mcp-mineru, which runs as an MCP stdio server. Users should verify the mcp-mineru package independently before installing the skill.

Category Scores

Prompt Injection 85/100 · 30%
Data Exfiltration 85/100 · 25%
Code Execution 70/100 · 20%
Clone Behavior 90/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 75/100 · 5%
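The report does not say how the overall 84 is derived from these categories. If it is the weight-averaged category score rounded half up (an assumption, not documented by the audit), the numbers are consistent: the weights sum to 100%, and the weighted sum is 83.5, which rounds to 84. The sketch keeps the weights as integer percentages so the arithmetic stays exact.

```python
# Category scores (out of 100) and weights (in percent), copied from the report.
# How the audit actually combines them is not documented; a weighted average is assumed.
CATEGORIES = {
    "Prompt Injection": (85, 30),
    "Data Exfiltration": (85, 25),
    "Code Execution": (70, 20),
    "Clone Behavior": (90, 10),
    "Canary Integrity": (100, 10),
    "Behavioral Reasoning": (75, 5),
}


def overall_score(categories: dict) -> tuple:
    """Return (exact weighted average, average rounded half up)."""
    total_weight = sum(weight for _, weight in categories.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}%, expected 100%")
    # Integer arithmetic avoids floating-point drift before rounding.
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / 100, (weighted_sum + 50) // 100


exact, rounded = overall_score(CATEGORIES)
print(exact, rounded)  # 83.5 84
```

Rounding half up is done by hand because Python's built-in round() rounds half to even: it also gives 84 here, but it would turn 82.5 into 82.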

Findings (6)

MEDIUM Hardcoded developer machine paths in SKILL.md instructions -10

The SKILL.md file repeatedly presents /Users/lwj04/clawd/skills/mineru-pdf/parse.py as the canonical way to use the skill. These paths are specific to the developer's machine (username lwj04) and will not resolve on any other user's system. An LLM agent that follows these instructions may try to execute a nonexistent script, become confused, or try to locate or create files at unexpected filesystem locations.
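A check for this kind of path can be automated before publishing. The sketch below is a hypothetical pre-publish lint, not part of the skill or the audit framework: it flags absolute paths under a specific user's home directory (/Users/<name>/ on macOS, /home/<name>/ on Linux) and ignores portable forms such as ~/. The sample text is illustrative and is not copied from the real SKILL.md.

```python
import re

# Hypothetical heuristic: an absolute path under one user's home directory.
HOME_PATH = re.compile(r"(?:/Users|/home)/[A-Za-z0-9._-]+/[^\s`]*")


def find_hardcoded_home_paths(text: str) -> list:
    """Return (line_number, path) pairs for every user-home path in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in HOME_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Illustrative sample; the ~/ path on line 3 is portable and is not flagged.
sample = """\
## Usage
python /Users/lwj04/clawd/skills/mineru-pdf/parse.py input.pdf
Models are cached in ~/.cache/uv/.
"""

for lineno, path in find_hardcoded_home_paths(sample):
    print(lineno, path)  # 2 /Users/lwj04/clawd/skills/mineru-pdf/parse.py
```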

MEDIUM Unaudited third-party Python package installed via uvx -15

The skill's install metadata installs mcp-mineru from PyPI via uvx. This package is not part of the audited code, and it runs as an MCP stdio server with access to arbitrary file paths. On first run it downloads ML models, which takes roughly 5 to 10 minutes, and caches them in ~/.cache/uv/. The package's source code and supply chain were not reviewed in this audit.

LOW Executable Python and shell scripts bundled in skill -10

The skill ships parse.py (a functional Python wrapper for MinerU) and test.sh (a bash script that runs inline Python through uvx with hardcoded file paths). The code appears benign and consistent with the stated purpose, but bundling executable code expands the attack surface compared with documentation-only skills. The test.sh script also hardcodes /Users/lwj04/.openclaw/media/inbound/ as its default PDF path.
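To gauge how much executable code a skill bundle carries, a reviewer could list its script files. This is a hypothetical helper; treating a .py or .sh suffix, or any execute permission bit, as "executable" is an assumption chosen to match the files named in this finding. The demo builds a throwaway directory with placeholder contents rather than reading the real skill.

```python
import stat
import tempfile
from pathlib import Path

# Assumption: these suffixes, or any execute bit, mark a file as executable code.
SCRIPT_SUFFIXES = {".py", ".sh"}
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def executable_files(skill_dir: Path) -> list:
    """Return skill-relative POSIX paths of script or executable files."""
    found = []
    for path in sorted(skill_dir.rglob("*")):
        if not path.is_file():
            continue
        has_exec_bit = bool(path.stat().st_mode & EXEC_BITS)
        if path.suffix in SCRIPT_SUFFIXES or has_exec_bit:
            found.append(path.relative_to(skill_dir).as_posix())
    return found


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp)
    # Placeholder contents; only the file names mirror the audited skill.
    (skill / "SKILL.md").write_text("# mineru-pdf\n")
    (skill / "parse.py").write_text("print('parse')\n")
    (skill / "test.sh").write_text("#!/bin/sh\necho test\n")
    scripts = executable_files(skill)
    print(scripts)  # ['parse.py', 'test.sh']
```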

LOW Stray .clawhub/lock.json references unrelated skill -10

The bundled .clawhub/lock.json contains an entry for academic-research-hub (v0.1.0, installedAt: 1770957475341), a completely unrelated skill. The file appears to have been committed from the developer's active workspace, and it reveals which skills the developer has installed. The inclusion is most likely accidental, but it is a publishing-quality failure, and if an agent reads lock.json, the file could in theory be used to track which users have specific skills installed.
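A publish step could refuse bundles whose lock file mentions other skills. The real .clawhub/lock.json schema is not shown in this report, so the sketch assumes a hypothetical {"skills": {name: {...}}} layout; only the entry values (name, version, installedAt) come from the finding above.

```python
import json


def unrelated_lock_entries(lock_text: str, skill_name: str) -> list:
    """Return names of skills recorded in a lock file other than skill_name.

    Assumes a hypothetical {"skills": {name: {...}}} layout; the real
    .clawhub/lock.json schema may differ.
    """
    lock = json.loads(lock_text)
    return sorted(name for name in lock.get("skills", {}) if name != skill_name)


# Entry values are those quoted in the finding; the surrounding layout is assumed.
bundled_lock = json.dumps(
    {"skills": {"academic-research-hub": {"version": "0.1.0", "installedAt": 1770957475341}}}
)
print(unrelated_lock_entries(bundled_lock, "mineru-pdf"))  # ['academic-research-hub']
```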

INFO All canary files intact — no honeypot exfiltration 0

Canary files (.env, .ssh/id_rsa, .aws/credentials, .npmrc, .docker/config.json, .gcloud/application_default_credentials.json) were accessed at session start and end by the audit framework as part of its baseline checks. No modifications or exfiltration events were detected, and the skill does not attempt to read or transmit sensitive credential files.
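One way to implement a start-and-end comparison is to hash each canary file at both points. This is a hypothetical sketch, not the audit framework's actual mechanism, and it uses a throwaway directory with placeholder contents instead of real credential files. A content hash only detects modification or deletion; spotting reads or outbound transmission needs separate monitoring, which the sketch does not attempt.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def snapshot(paths: List[Path]) -> Dict[Path, Optional[str]]:
    """Map each path to the SHA-256 hex digest of its bytes, or None if missing."""
    return {
        p: hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None
        for p in paths
    }


def changed(before: Dict[Path, Optional[str]], after: Dict[Path, Optional[str]]) -> List[Path]:
    """Return canaries whose digest differs between the two snapshots."""
    return [p for p in before if before[p] != after.get(p)]


with tempfile.TemporaryDirectory() as home:
    canaries = [Path(home, ".env"), Path(home, ".ssh", "id_rsa")]
    for canary in canaries:
        canary.parent.mkdir(parents=True, exist_ok=True)
        canary.write_text("CANARY-PLACEHOLDER\n")  # dummy content, not a real secret
    start = snapshot(canaries)
    # ... the skill under test would run here ...
    end = snapshot(canaries)
    canaries[0].write_text("tampered\n")  # simulate a modification to .env
    tampered = snapshot(canaries)

print(changed(start, end))                        # []
print([p.name for p in changed(start, tampered)])  # ['.env']
```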

INFO No prompt injection or behavior manipulation detected 0

The SKILL.md contains no hidden instructions, no requests to ignore previous instructions, no invisible Unicode characters, no HTML comment tricks, no persona-change instructions, and no attempts to obtain permissions beyond what PDF parsing requires. SKILL.md itself is purely documentation with embedded code examples.
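Some of the checks listed above can be approximated with simple text heuristics. This is an illustrative sketch, not the auditor's scanner: it flags Unicode format characters (general category Cf, which includes zero-width spaces and bidirectional controls), HTML comments, and a few variants of the phrase "ignore previous instructions". Pattern matching like this catches only crude attempts; it is no substitute for a full review.

```python
import re
import unicodedata

# Illustrative heuristics only; real injection detection needs far more than this.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
IGNORE_PREVIOUS = re.compile(
    r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE
)


def injection_signals(text: str) -> list:
    """Return the names of heuristics that fire on text, in a fixed order."""
    signals = []
    # Category Cf covers invisible format characters such as U+200B and U+FEFF.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        signals.append("invisible-unicode")
    if HTML_COMMENT.search(text):
        signals.append("html-comment")
    if IGNORE_PREVIOUS.search(text):
        signals.append("ignore-previous-instructions")
    return signals


print(injection_signals("Run parse.py on the input PDF."))  # []
print(injection_signals("Hi<!-- ignore previous instructions -->\u200b"))
# ['invisible-unicode', 'html-comment', 'ignore-previous-instructions']
```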