What Is Oathe? Behavioral Security for AI Agent Skills

Oathe is a behavioral security scanner for AI agent skills. Submit a URL, get a trust score. Here's how it works and how to start using it.

Every day, developers install third-party MCP servers, plugins, and agent skills without knowing what those tools actually do at runtime. They read a README, check the star count, and run npx. The skill gets system access — files, network, environment variables, process execution — and nobody verifies whether it stays in its lane.

This is the blind install problem. And it’s the default behavior across every major agent framework today.

Oathe exists to fix it.

What Oathe Is

Oathe is a behavioral security scanner for AI agent skills. We run third-party skills in an isolated environment, observe what they actually do, and return a structured trust assessment: a score, a verdict, detailed findings, and an actionable recommendation.

We don’t read source code and guess what it might do. We execute the skill and watch what it actually does. That distinction matters — static analysis can’t catch the runtime behaviors that make agent skills dangerous.

How It Works

The workflow is simple:

  1. Submit a URL. Give Oathe a GitHub repository URL for any MCP server, plugin, or agent skill.
  2. Install & observe. We clone the skill, install it in an isolated environment, and observe its behavior during installation and execution.
  3. Behavioral analysis. Runtime behavior is analyzed across six scoring dimensions, catching threats that static analysis misses.
  4. Trust assessment. You get back a composite trust score (0-100), a verdict, a findings summary, and a recommendation — everything you need to make an informed install decision.

The entire process takes 30 to 90 seconds. No configuration required. No API key. Free during beta.
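To make that concrete, here's a sketch of the returned assessment as a TypeScript type. The field names are illustrative assumptions, not our actual schema; the authoritative reference is the docs at oathe.ai/docs.

```typescript
// Hypothetical shape of an Oathe trust assessment. Field names are
// illustrative; the real schema is documented at oathe.ai/docs.
interface TrustAssessment {
  score: number; // composite trust score, 0-100
  verdict: "SAFE" | "CAUTION" | "DANGEROUS" | "MALICIOUS";
  recommendation:
    | "INSTALL"
    | "INSTALL_WITH_CAUTION"
    | "REVIEW_BEFORE_INSTALL"
    | "DO_NOT_INSTALL";
  findings: { code: string; dimension: string; detail: string }[];
}

// What a clean audit might look like.
const example: TrustAssessment = {
  score: 96,
  verdict: "SAFE",
  recommendation: "INSTALL",
  findings: [],
};
```

The point of the structure is that a human, a script, or an agent can all act on the same four fields without parsing prose.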

The 6 Scoring Dimensions

Every audit evaluates behavior across six dimensions, each weighted by real-world severity:

  • Prompt Injection — Attempts to override agent instructions. A compromised instruction set undermines everything else.
  • Data Exfiltration — Unauthorized data transmission to external endpoints.
  • Code Execution — Unauthorized process execution, filesystem access, and resource abuse.
  • Clone Behavior — Modifications to the host system outside the skill’s own directory.
  • Canary Integrity — Tampering with the canary tokens we plant to detect unauthorized access.
  • Behavioral Reasoning — Holistic AI judgment of overall behavioral patterns.
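As a rough illustration of how weighted dimensions could roll up into a single composite score, here is a sketch. The weights below are invented for the example; Oathe's actual weighting is not shown here.

```typescript
// Hypothetical per-dimension weights -- invented for illustration only.
// Oathe's real severity weighting is not public in this post.
const WEIGHTS: Record<string, number> = {
  promptInjection: 0.25,
  dataExfiltration: 0.2,
  codeExecution: 0.2,
  cloneBehavior: 0.15,
  canaryIntegrity: 0.1,
  behavioralReasoning: 0.1,
};

// Combine per-dimension scores (each 0-100) into a 0-100 composite.
function compositeScore(scores: Record<string, number>): number {
  let total = 0;
  for (const [dim, weight] of Object.entries(WEIGHTS)) {
    total += weight * (scores[dim] ?? 0);
  }
  return Math.round(total);
}
```

The shape matters more than the numbers: a severe prompt-injection finding drags the composite down harder than a lower-weighted dimension would.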

These dimensions map to the Open Threat Classification (OTC) — our open-source taxonomy of 10 threat codes covering every attack pattern we see across agent skills. For the full breakdown, read The Oathe Threat Model.

Verdicts and Recommendations

Every audit returns a verdict and a corresponding recommendation:

  • SAFE → INSTALL. Skill behaved within expected boundaries. No concerns.
  • CAUTION → INSTALL_WITH_CAUTION. Some suspicious behavior detected. Review flagged findings before use.
  • DANGEROUS → REVIEW_BEFORE_INSTALL. Significant findings. A human should review before proceeding.
  • MALICIOUS → DO_NOT_INSTALL. Active harmful behavior detected. Block it.

These structured outputs are designed for both humans and LLMs to act on. Your agent doesn’t need to understand security — it just needs to follow the recommendation. We cover how agents should handle this decision tree in The Pre-Install Check.
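For example, an agent wrapper might map each recommendation to an action with something this simple. The action policy here is one possible choice, not a mapping Oathe prescribes.

```typescript
type Recommendation =
  | "INSTALL"
  | "INSTALL_WITH_CAUTION"
  | "REVIEW_BEFORE_INSTALL"
  | "DO_NOT_INSTALL";

// Agent-side actions are our invention for this sketch.
type Action = "proceed" | "proceed-and-flag" | "escalate-to-human" | "block";

// One possible policy mapping Oathe's recommendation to an agent action.
function actOn(recommendation: Recommendation): Action {
  switch (recommendation) {
    case "INSTALL":
      return "proceed";
    case "INSTALL_WITH_CAUTION":
      return "proceed-and-flag";
    case "REVIEW_BEFORE_INSTALL":
      return "escalate-to-human";
    case "DO_NOT_INSTALL":
      return "block";
  }
}
```

Because the union of recommendations is closed, the switch is exhaustive: an unknown verdict is a type error, not a silent install.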

Three Ways to Use Oathe

1. The Web Interface

Go to oathe.ai, paste a GitHub URL, and get your audit. This is the fastest way to check a skill manually.

2. The MCP Server

For agents running in MCP-compatible environments — Claude Desktop, Claude Code, Cursor, and others — install the Oathe MCP server:

npx oathe-mcp

Zero install. Zero config. Zero API key. Your agent gets five tools: get_skill_summary, get_audit_report, submit_audit, check_audit_status, and search_audits. It can check skills, submit new audits, and read full reports without leaving its workflow.

This is the deepest integration point — the agent uses Oathe as a native tool in its own environment. For setup instructions across Claude Desktop, Claude Code, and Cursor, read Getting Started with the Oathe MCP Server. Full tool reference is in the MCP Server docs.
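As one sketch, MCP clients that use the standard mcpServers config format (Claude Desktop among them) can register the server with an entry like this; the "oathe" key name is arbitrary:

```json
{
  "mcpServers": {
    "oathe": {
      "command": "npx",
      "args": ["oathe-mcp"]
    }
  }
}
```

Check the setup guide for the exact config file location in your client.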

3. The REST API

For custom integrations, CI/CD pipelines, and agent frameworks that aren’t MCP-compatible, hit the API directly. Submit a skill, poll for completion, get back structured JSON. Full documentation is at oathe.ai/docs.

The API is the backbone of our CI/CD integration — drop an Oathe audit into your GitHub Actions workflow and gate deployments on behavioral trust scores. The public API is free during beta with no key required. For webhook-based integrations that post commit statuses back to GitHub, contact us to set up credentials.
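Here is a sketch of that submit-and-poll flow. The endpoint paths, request fields, and the 70-point gate threshold are all assumptions for illustration; the authoritative reference is oathe.ai/docs.

```typescript
// Hypothetical base URL and endpoints -- see oathe.ai/docs for the real API.
const BASE = "https://oathe.ai/api";

async function auditSkill(repoUrl: string) {
  // 1. Submit the skill for auditing.
  const submit = await fetch(`${BASE}/audits`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: repoUrl }),
  });
  const { id } = await submit.json();

  // 2. Poll until the audit completes (typically 30-90 seconds).
  while (true) {
    const res = await fetch(`${BASE}/audits/${id}`);
    const audit = await res.json();
    if (audit.status === "complete") return audit; // score, verdict, findings
    await new Promise((r) => setTimeout(r, 5000));
  }
}

// CI gate: fail the pipeline when the trust score falls below a
// threshold of your choosing (70 here is an arbitrary example).
function shouldGate(score: number, threshold = 70): boolean {
  return score < threshold;
}
```

In a CI job, `auditSkill` runs against the repo under review and `shouldGate` decides whether the pipeline proceeds.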

Open Source Components

We believe the threat landscape is bigger than any one company, so key components are open:

Open Threat Classification (OTC) — The taxonomy of 10 threat codes across 6 dimensions that Oathe scores against. CC-BY-4.0 licensed. Cite it, build on it, contribute to it.

Oathe MCP Server — The MCP server source code. Published on npm as oathe-mcp.

We also publish llms.txt and SKILL.md — machine-readable files that help LLMs discover and use Oathe automatically. The strategy behind this is covered in LLM-Aware Security.

Why Behavioral Analysis

If you’re wondering why we built a behavioral scanner instead of a static analysis tool — we wrote a whole post about that. The short version: static analysis can’t catch what agents do at runtime. A clean codebase becomes a credential harvester when the LLM driving it decides that reading ~/.ssh/id_rsa serves its goal.

This isn’t hypothetical. Frontier models lie, collude, and manipulate when given autonomy and incentives. The risk isn’t in the code — it’s in how the model uses the code. Behavioral auditing catches what static analysis can’t, because it watches the agent act. We proved this at scale: 1,620 skills audited, 88 threats found, 61 of which the ecosystem’s leading scanner labeled safe.

Show Your Score

If your skill passes an audit, show it off. Add the Oathe trust badge to your README — one line of markdown, backed by a real behavioral audit. It’s the strongest trust signal you can give users scanning your repo.

Get Started

Run an audit. Go to oathe.ai and submit any skill’s GitHub URL. It’s free, it’s fast, and you’ll see exactly what the skill does when nobody’s watching.

Install the MCP server. Run npx oathe-mcp and give your agent the ability to check skills before installing them. Setup guide here.

Read the threat model. Understand the 10 threat codes we score against and why each one matters.

Add the badge. If your skill is clean, prove it.

The agent ecosystem is growing faster than the security tooling around it. Oathe is here to close that gap — so teams can adopt third-party skills with confidence, not crossed fingers.