Is baokui/pdf-parser-mineru safe?
https://github.com/openclaw/skills/tree/main/skills/baokui/pdf-parser-mineru
baokui/pdf-parser-mineru is a legitimate PDF-to-Markdown/JSON wrapper around the MinerU ML library. No prompt injection, credential harvesting, or data exfiltration was detected; all canary files remained intact and the only outbound network traffic was the expected GitHub clone. The principal risks are an absence of path validation on the file_path parameter (which could enable confused-deputy attacks in an agent context) and the large transitive dependency footprint of MinerU, which downloads multi-gigabyte model weights on first use from external infrastructure.
Findings (5)
MEDIUM No path validation on file_path parameter -18
The pdf_parser.py script passes the file_path argument directly to the mineru command without verifying that the path points to a PDF file or lies within an approved directory. In an agent context, a malicious outer instruction could direct the skill to process /etc/passwd, SSH private keys, or other sensitive files as if they were PDFs. mineru would open and attempt to parse the file; the conversion would fail, but the file's contents would still have been read.
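One way to close this gap is to validate the path before it reaches mineru. The following is a minimal sketch, not the skill's actual code: the function name validate_pdf_path and the allowed_root parameter are hypothetical, and it assumes Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path

# Every PDF file begins with this signature.
PDF_MAGIC = b"%PDF-"


def validate_pdf_path(file_path: str, allowed_root: str) -> Path:
    """Return the resolved path if it is a PDF inside allowed_root, else raise ValueError.

    Hypothetical helper: neither the name nor allowed_root comes from the skill.
    """
    root = Path(allowed_root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check below sees the real target of the path.
    candidate = Path(file_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"{candidate} is outside the approved directory {root}")
    if not candidate.is_file():
        raise ValueError(f"{candidate} is not a regular file")
    if candidate.suffix.lower() != ".pdf":
        raise ValueError(f"{candidate} does not have a .pdf extension")
    # Only the first few bytes are read, and only after the containment check passed.
    with candidate.open("rb") as handle:
        header = handle.read(len(PDF_MAGIC))
    if header != PDF_MAGIC:
        raise ValueError(f"{candidate} does not start with the PDF signature")
    return candidate
```

With a check like this in front of the mineru call, a request for /etc/passwd or a path that escapes the approved directory via ".." would be rejected before any parser touches the file.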
LOW Hardcoded researcher machine path in _find_mineru -5
The script contains an explicit fallback path /home/research/miniconda3/bin/mineru, revealing the development environment. While not exploitable, it indicates the skill was not thoroughly scrubbed before publication and suggests it was tested only in a single specific environment.
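A portable lookup would avoid machine-specific fallbacks altogether. The sketch below is an assumption about how such a lookup could be written, not the skill's _find_mineru: the function name find_mineru and the MINERU_BIN environment variable are hypothetical, while shutil.which is the standard-library PATH search.

```python
import os
import shutil
from typing import Optional


def find_mineru(name: str = "mineru", env_var: str = "MINERU_BIN") -> Optional[str]:
    """Locate the mineru executable without relying on a hardcoded home directory.

    Hypothetical helper: MINERU_BIN is an illustrative override, not a variable
    the skill or MinerU is known to read.
    """
    # 1. Honour an explicit override if it points at an executable file.
    override = os.environ.get(env_var)
    if override and os.path.isfile(override) and os.access(override, os.X_OK):
        return override
    # 2. Fall back to a normal PATH search; returns None when nothing is found.
    return shutil.which(name)
```

Returning None rather than guessing a path lets the caller report a clear "mineru not installed" error on machines that differ from the development environment.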
LOW Large transitive dependency chain via mineru[all] -15
install.sh installs mineru[all], which pulls in PyTorch, transformers, ray, and other large ML packages. MinerU also downloads multi-gigabyte model weights from external model hosting on first use. Each transitive dependency is an additional supply-chain trust boundary.
INFO Monorepo sparse clone pattern is clean 0
Installation performs a shallow sparse clone of the openclaw/skills monorepo, extracts only the target subpath, and removes the temporary clone. This is the expected and correct install pattern for the openclaw skill registry.
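The clone, extract, and clean-up sequence described above can be sketched as follows. The exact commands the installer runs are not shown in this report, so the flags here are an assumption about what a shallow sparse clone typically looks like; the function names and the dest parameter are hypothetical, and the repository URL and subpath are taken from the skill link at the top of the report.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List

REPO_URL = "https://github.com/openclaw/skills"
SKILL_SUBPATH = "skills/baokui/pdf-parser-mineru"


def sparse_clone_commands(repo_url: str, subpath: str, workdir: str) -> List[List[str]]:
    """Build the git commands for a shallow, sparse clone limited to one subpath."""
    return [
        # Depth-1 clone without blobs; --sparse checks out only top-level files at first.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse", repo_url, workdir],
        # Narrow the working tree to the skill's directory.
        ["git", "-C", workdir, "sparse-checkout", "set", subpath],
    ]


def install_skill(repo_url: str, subpath: str, dest: str) -> None:
    """Clone into a temporary directory, copy the subpath to dest, then clean up."""
    workdir = tempfile.mkdtemp(prefix="skill-clone-")
    try:
        for cmd in sparse_clone_commands(repo_url, subpath, workdir):
            subprocess.run(cmd, check=True)
        # copytree raises if dest already exists, which avoids silent overwrites.
        shutil.copytree(Path(workdir) / subpath, dest)
    finally:
        # The temporary clone is removed whether or not the copy succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

Keeping the command list separate from the code that executes it makes the network-facing step easy to inspect or log before anything runs.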
INFO Canary file reads are audit-framework baseline activity 0
Sensitive canary files (.env, .ssh/id_rsa, .aws/credentials, etc.) were read at 04:26:59 (before clone) and 04:27:16 (after copy). Both accesses are CLOSE_NOWRITE and their timing aligns with the oathe pre-install and post-install integrity verification passes, not with any skill code execution.
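A pre-install and post-install integrity pass of this kind can be approximated by hashing each canary file twice and comparing the results. This is an illustrative sketch only; the report does not describe how the audit framework implements its checks, and the function names snapshot and changed_canaries are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def snapshot(paths: Iterable[str]) -> Dict[str, str]:
    """Map each existing canary path to the SHA-256 digest of its contents."""
    digests: Dict[str, str] = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            # read_bytes() opens the file read-only and closes it again.
            digests[p] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_canaries(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return canary paths that were modified or removed between two snapshots."""
    return sorted(p for p in before if after.get(p) != before[p])
```

Taking one snapshot before the clone and one after the copy, an empty result from changed_canaries would correspond to the "canary files remained intact" conclusion. Read-only opens like these are also the kind of access that shows up as CLOSE_NOWRITE events.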