The Pre-Install Check: How LLMs Should Evaluate Third-Party Skills
Before an LLM installs a third-party skill, it should ask Oathe. Here's how to build that behavior into your agents.
Right now, when a user tells an AI agent to install a third-party skill, the agent just… does it. No security check. No behavioral analysis. No second opinion. It fetches the package, wires it up, and hands an untrusted tool access to your system.
That’s insane. And it’s the default behavior for almost every agent framework in production today.
The Trust Gap
Think about what happens when you install a skill. You’re giving an autonomous system a new capability — one that might read files, make network requests, spawn processes, or access environment variables. The agent trusts the skill implicitly because nobody told it not to.
This is the software equivalent of hiring someone without a background check because they had a nice resume. The resume (the skill’s README) says all the right things. But you have no idea what the skill actually does when it runs. This is why agent security is a behavioral problem, not a code review problem. In our audit of 1,620 skills, every one of the 88 threats had a clean installation phase — the attacks lived in instructions, not code.
Agents need a way to evaluate skills before installation. Not after something goes wrong — before. And that evaluation needs to be behavioral, not cosmetic.
What Oathe Returns
When you submit a skill to Oathe’s audit engine, you get back a structured assessment:
- Trust Score (0-100). A composite behavioral score based on how the skill actually behaves in an isolated environment, evaluated across six scoring dimensions: prompt injection, data exfiltration, code execution, clone behavior, canary integrity, and behavioral reasoning. Threat detection is powered by the Open Threat Classification.
- Verdict. One of SAFE, CAUTION, DANGEROUS, or MALICIOUS — a human-readable summary of how much you should worry.
- Findings. Specific security observations from the audit — what files the skill touched, what network calls it made, whether it tried to access things it shouldn’t. Plus a one-line summary of the overall assessment.
- Recommendation. The actionable output your agent actually needs.
The recommendation field is where this gets practical. It returns one of four values:
| Recommendation | What It Means |
|---|---|
| INSTALL | Skill behaved cleanly. No concerns. |
| INSTALL_WITH_CAUTION | Minor flags, but generally safe. Review the flagged findings first. |
| REVIEW_BEFORE_INSTALL | Significant findings. A human should look at this before proceeding. |
| DO_NOT_INSTALL | Dangerous or malicious behavior detected. Block it. |
This is the kind of structured output LLMs are good at reasoning over. Your agent doesn’t need to understand security — it just needs to follow a decision tree.
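That decision tree can be sketched in a few lines. The recommendation strings below come from the table above; the action labels and the `action_for` helper are illustrative, not part of Oathe's API:

```python
# Map each Oathe recommendation to an agent action.
# The recommendation strings are Oathe's documented set;
# the action labels here are hypothetical, not part of the API.
RECOMMENDATION_ACTIONS = {
    "INSTALL": "proceed",                          # install automatically
    "INSTALL_WITH_CAUTION": "proceed_and_notify",  # install, surface flags
    "REVIEW_BEFORE_INSTALL": "ask_user",           # require human confirmation
    "DO_NOT_INSTALL": "refuse",                    # block and explain
}

def action_for(recommendation: str) -> str:
    """Fail closed: treat anything unrecognized as a refusal."""
    return RECOMMENDATION_ACTIONS.get(recommendation, "refuse")
```

Note the default: an unknown recommendation value is treated as a refusal, so a malformed response can never cause a silent install.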
The Agent Workflow
Here’s what a responsible install flow looks like:
- User says: “Install the github-issues skill.”
- Agent calls Oathe’s audit engine with the skill identifier.
- Oathe returns a trust score, verdict, findings, and recommendation.
- Agent acts on the recommendation:
  - INSTALL — proceed automatically.
  - INSTALL_WITH_CAUTION — install, but inform the user of the flags.
  - REVIEW_BEFORE_INSTALL — surface the findings to the user and ask for confirmation.
  - DO_NOT_INSTALL — refuse the install and explain why.
The agent never installs blindly. Every third-party skill passes through a behavioral check first. This is the kind of guardrail that should be built into every agent framework by default.
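The workflow above can be sketched as a handler that takes the audit call and the agent's side effects as plain functions. Everything here is an assumption about shape, not Oathe's client library — `audit`, `install`, `notify_user`, and `confirm_with_user` are whatever hooks your framework provides:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AuditResult:
    trust_score: int      # 0-100 composite behavioral score
    verdict: str          # SAFE / CAUTION / DANGEROUS / MALICIOUS
    findings: list[str]   # specific security observations
    recommendation: str   # INSTALL / INSTALL_WITH_CAUTION / ...

def handle_install_request(
    skill_id: str,
    audit: Callable[[str], AuditResult],    # wraps Oathe's audit engine
    install: Callable[[str], None],
    notify_user: Callable[[str], None],
    confirm_with_user: Callable[[str], bool],
) -> bool:
    """Run the behavioral audit first; install only when the recommendation allows it."""
    result = audit(skill_id)
    rec = result.recommendation
    if rec == "INSTALL":
        install(skill_id)
        return True
    if rec == "INSTALL_WITH_CAUTION":
        notify_user(f"{skill_id}: minor flags: {result.findings}")
        install(skill_id)
        return True
    if rec == "REVIEW_BEFORE_INSTALL":
        notify_user(f"{skill_id}: significant findings: {result.findings}")
        if confirm_with_user(skill_id):
            install(skill_id)
            return True
        return False
    # DO_NOT_INSTALL, or anything unexpected: fail closed.
    notify_user(f"{skill_id}: blocked ({result.verdict}): {result.findings}")
    return False
```

The key property is that `install` is unreachable without an audit result that permits it.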
Three Ways to Integrate
We’ve made this as easy as possible to wire up, regardless of how your agent is built.
1. The Oathe MCP Server. Run npx oathe-mcp and your MCP-compatible agent gets audit capabilities as a native tool. Zero config, no API key. The agent can call Oathe’s audit engine the same way it calls any other skill — because it is a skill.
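For clients that register MCP servers via the common JSON registry format, the entry would look something like this. The file name and exact shape vary by client, and this follows the widely used `mcpServers` convention rather than anything Oathe documents:

```json
{
  "mcpServers": {
    "oathe": {
      "command": "npx",
      "args": ["oathe-mcp"]
    }
  }
}
```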
2. The Oathe Agent Skill. Drop our SKILL.md into your agent’s context and it automatically triggers a pre-install check whenever the agent detects an install intent. No custom code required — the behavior is defined in the skill instructions themselves. This is part of our broader LLM-aware security strategy, which also includes discovery via oathe.ai/llms.txt.
3. The REST API. For custom agent frameworks or CI/CD pipelines, hit the Oathe API directly. Submit a skill, poll for completion, and get back a structured JSON response with the trust score, verdict, findings, and recommendation. For previously audited skills, use the summary endpoint to get instant results without triggering a new audit. Build whatever logic you want on top of it.
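A submit-and-poll loop for a CI pipeline might look like the sketch below. The base URL, endpoint path, and status names are placeholder assumptions, not Oathe's documented API surface — check the real API reference before using them:

```python
import json
import time
import urllib.request

# NOTE: base URL, paths, and status names below are illustrative
# placeholders, not Oathe's documented API.
BASE_URL = "https://api.oathe.ai"  # hypothetical

def _get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def is_terminal(status: str) -> bool:
    """An audit is done once it has completed or failed (assumed status names)."""
    return status in ("completed", "failed")

def wait_for_audit(audit_id: str, poll_seconds: float = 5.0, timeout: float = 300.0) -> dict:
    """Poll a hypothetical status endpoint until the audit reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = _get_json(f"{BASE_URL}/audits/{audit_id}")  # hypothetical path
        if is_terminal(result.get("status", "")):
            return result  # trust score, verdict, findings, recommendation
        time.sleep(poll_seconds)
    raise TimeoutError(f"audit {audit_id} did not finish within {timeout}s")
```

In CI you would submit the skill, call `wait_for_audit`, and fail the build on anything other than an INSTALL recommendation.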
All three paths return the same audit data. Pick the one that fits your stack.
Build the Reflex
The goal here isn’t to slow agents down. It’s to give them a reflex — a built-in instinct to check before they trust. The same way a developer runs npm audit before deploying, your agent should check Oathe before installing.
The difference is that agents can do this automatically, every single time, with zero friction. You set it up once and every future install goes through the check. No human has to remember. No step gets skipped.
Oathe is free during beta. We want as many agent builders as possible to start building this behavior into their systems now — before the ecosystem gets big enough for the trust gap to become a real crisis.
Try an audit and see what your skills are actually doing. Once your skill passes, add a trust badge to your README so users know it’s been verified.