Is pdf-text-extractor safe?
https://clawhub.ai/Michael-laffin/pdf-text-extractor
The pdf-text-extractor skill's own code (index.js) appears to be a legitimate PDF text extraction tool with no embedded malicious logic or prompt injection. However, installing it triggered access to sensitive credential files (~/.aws/credentials, ~/.env, auth-profiles.json) and extensive JIT compilation of runtime modules. The skill also accepts an unrestricted file path parameter that could enable arbitrary file reads, and it claims zero dependencies while requiring pdfjs-dist, which brings in 50+ transitive packages.
Findings (9)
CRITICAL Sensitive credential files accessed during installation -50
During skill installation, monitoring detected access to ~/.aws/credentials and ~/.env. The ~/.aws/credentials file was opened, held open for ~1 second, then accessed and closed. This pattern is consistent with reading the file's contents rather than merely checking that it exists. Although the access may be attributable to the OpenClaw runtime bootstrap rather than to the skill code itself, it was the skill installation that triggered it.
HIGH Auth profiles and config files accessed -20
The installation process accessed OpenClaw agent auth profiles and configuration files multiple times. The file ~/.openclaw/agents/main/agent/auth-profiles.json was read, which could contain API keys, tokens, or other authentication credentials for connected services.
HIGH Extensive JIT compilation during installation -25
The installation triggered JIT compilation of multiple modules in /tmp/jiti/ including memory-core, plugin-sdk, runtime, config-schema, channels-registry, and several plugin modules. This represents a broad code execution surface during what should be a simple skill installation.
MEDIUM Unrestricted file path parameter enables arbitrary file reads -25
The extractText function accepts a pdfPath parameter with no path validation, sandboxing, or restriction. While intended for PDF files, the underlying fs.readFileSync will read any file the process has access to. A social engineering attack could trick users into 'extracting text' from sensitive files.
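One way to narrow this exposure is to validate the path before it ever reaches fs.readFileSync. The following is a minimal sketch, not the skill's actual code: the helper names resolveSafePdfPath and hasPdfHeader and the allowedRoot parameter are hypothetical, and the sketch does not resolve symlinks, which a hardened version would also need to handle.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of the skill): resolve pdfPath against an
// allowed root directory and reject anything that escapes it or that does
// not carry a .pdf extension.
function resolveSafePdfPath(pdfPath, allowedRoot) {
  const root = path.resolve(allowedRoot);
  const resolved = path.resolve(root, pdfPath);
  const rel = path.relative(root, resolved);

  // A relative path that is empty, starts with "..", or is absolute
  // (different drive on Windows) points outside the allowed root.
  const escapes =
    rel === "" ||
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel);
  if (escapes) {
    throw new Error("Path is outside the allowed directory: " + pdfPath);
  }
  if (path.extname(resolved).toLowerCase() !== ".pdf") {
    throw new Error("Not a .pdf file: " + pdfPath);
  }
  return resolved;
}

// Hypothetical helper: check that the file starts with the "%PDF-" marker
// before handing it to a parser. Only the first five bytes are read.
function hasPdfHeader(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    return bytesRead === 5 && header.toString("latin1") === "%PDF-";
  } finally {
    fs.closeSync(fd);
  }
}
```

With checks like these in front of the read, a request such as ~/.aws/credentials or ../../etc/passwd would be rejected before any file contents are loaded, and a renamed non-PDF file would fail the header check.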
MEDIUM Misleading 'zero dependencies' claim masks substantial dependency tree -20
SKILL.md and README both claim 'ZERO external dependencies' and 'zero dependencies' but package.json declares pdfjs-dist as a dependency, which in turn has optional dependencies including canvas (with native compilation via node-pre-gyp), node-fetch, rimraf, and 50+ transitive packages. This misrepresentation undermines trust.
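A dependency claim like this can be checked against the lockfile rather than taken from the README. The sketch below assumes an npm package-lock.json in lockfileVersion 2 or 3 format, where installed packages appear as keys under "packages"; the function name listLockedPackages is hypothetical, and whether the skill ships a lockfile at all is not stated in this report.

```javascript
// Hypothetical helper: list the unique package names recorded in a parsed
// package-lock.json (lockfileVersion 2 or 3). The root project is stored
// under the empty-string key and is skipped.
function listLockedPackages(lock) {
  const marker = "node_modules/";
  const names = new Set();
  for (const key of Object.keys(lock.packages || {})) {
    const idx = key.lastIndexOf(marker);
    if (idx === -1) continue; // root entry or a workspace path
    // Nested installs look like "node_modules/a/node_modules/b";
    // the package name is whatever follows the last marker.
    names.add(key.slice(idx + marker.length));
  }
  return Array.from(names).sort();
}

// Example usage (assumes a lockfile exists in the current directory):
// const fs = require("fs");
// const lock = JSON.parse(fs.readFileSync("package-lock.json", "utf8"));
// console.log(listLockedPackages(lock).length, "packages installed");
```

Any non-zero count for a skill advertised as having zero dependencies would confirm the mismatch described in this finding.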
LOW Shell profile files accessed during installation -15
~/.profile and ~/.bashrc were each accessed 4 times during installation. While these reads could be for environment variable resolution, repeated access to shell initialization files during skill installation is unnecessary and expands the information available to the runtime.
LOW Gateway lock file created in shared temp directory -15
A lock file was created at /tmp/openclaw-1000/gateway.e9191928.lock during installation. While likely benign runtime behavior, writing to shared temp directories during skill installation could be used as a persistence or signaling mechanism.
INFO Clean SKILL.md with no injection attempts -10
SKILL.md contains standard documentation for a PDF extraction tool. No hidden instructions, unicode manipulation, persona overrides, or suppression attempts were found. The metadata block is well-formed.
INFO All canary files intact 0
No honeypot files (.env, SSH keys, AWS credentials placed as canaries) were modified or deleted during the audit. Baseline filesystem diff showed no changes to monitored paths.
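For readers who want to reproduce a similar integrity check, the sketch below illustrates one possible approach: hash each monitored file before and after installation and compare the results. It is an assumption about how such a baseline diff could work, not the audit tool's actual implementation; snapshotFiles and changedPaths are hypothetical names. Note that a check of this kind detects modification and deletion only, not reads.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Hypothetical helper: map each path to the SHA-256 of its contents,
// or to null when the file does not exist.
function snapshotFiles(paths) {
  const snapshot = {};
  for (const p of paths) {
    snapshot[p] = fs.existsSync(p)
      ? crypto.createHash("sha256").update(fs.readFileSync(p)).digest("hex")
      : null;
  }
  return snapshot;
}

// Hypothetical helper: return the paths whose hash differs between two
// snapshots, which covers both modified and deleted files.
function changedPaths(before, after) {
  return Object.keys(before).filter((p) => before[p] !== after[p]);
}
```

Taking one snapshot before installation and one after, an empty result from changedPaths would correspond to the "all canary files intact" outcome reported here.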