How Oathe Works

Understand the behavioral audit pipeline, threat pattern matching, AI grading, and scoring that power every Oathe audit.

Overview

Oathe is a behavioral security scanner for AI agent skills — MCP servers, plugins, and tools. It installs a skill in an isolated environment, observes what it actually does, matches behavior against known threat patterns, grades findings with an AI model, and produces a trust score from 0 to 100.

The Audit Pipeline

Each audit moves through six steps:

  • Submit: Skill URL is submitted via API, website, or MCP server
  • Queue: Audit enters the processing queue
  • Install & Observe: Skill is installed and its runtime behavior is observed
  • Analyze: Behavior is matched against 10 threat patterns (T-codes) using dual-source detection
  • Score: Findings are aggregated into dimension scores and an overall trust score
  • Report: Final report with verdict, recommendation, scores, and findings is published
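The six steps above can be sketched as an ordered progression of stages. This is a minimal illustration; the stage names below mirror the pipeline description and are not Oathe's actual API.

```python
from enum import Enum

class AuditStage(Enum):
    # Illustrative stage names mirroring the pipeline steps above
    SUBMIT = "submit"
    QUEUE = "queue"
    INSTALL_OBSERVE = "install_observe"
    ANALYZE = "analyze"
    SCORE = "score"
    REPORT = "report"

# An audit advances through the stages strictly in this order
PIPELINE = [
    AuditStage.SUBMIT,
    AuditStage.QUEUE,
    AuditStage.INSTALL_OBSERVE,
    AuditStage.ANALYZE,
    AuditStage.SCORE,
    AuditStage.REPORT,
]
```

Modeling the stages as an ordered list makes it easy to track and report how far a given audit has progressed.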

Behavioral Analysis

Oathe doesn’t just read source code — it runs the skill and observes what it actually does. This catches threats that static analysis misses: obfuscated code, dynamic downloads, runtime-only behavior, and supply chain payloads buried in dependencies.

Threat Pattern Matching (T-Codes)

Behavioral observations are matched against 10 threat patterns, each identified by a T-code. Each pattern targets a specific attack vector — prompt injection, data exfiltration, unauthorized code execution, credential harvesting, and others.

When a match is found, it becomes a finding with a severity, score impact, and source attribution.
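A finding as described above carries a T-code, a severity, a score impact, and a source attribution. A minimal sketch of that shape, with hypothetical field names and an invented example T-code:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    # Field names are illustrative, not Oathe's actual schema
    t_code: str        # threat pattern identifier, e.g. "T-01" (hypothetical code)
    severity: str      # e.g. "low", "medium", "high", "critical"
    score_impact: int  # points this finding deducts from the trust score
    source: str        # attribution: "code", "model", or "both"

# Example: a high-severity finding flagged by both detection sources
finding = Finding(t_code="T-01", severity="high", score_impact=25, source="both")
```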

Dual-Source Detection

Each finding is evaluated by two independent sources:

  • Code patterns: Rule-based detection using signature matching and heuristics
  • AI model: LLM-based reasoning about whether the observed behavior is malicious

This produces three possible agreement levels:

  • both: Code patterns and AI model both flagged the behavior — highest confidence
  • code_only: Only rule-based detection flagged it — the AI model did not find it suspicious
  • model_only: Only the AI model flagged it — no matching code pattern exists

Findings where both sources agree carry the most weight.

What Comes Out

Every completed audit produces:

  • Trust score: 0-100 (higher is safer)
  • Verdict: SAFE, CAUTION, DANGEROUS, or MALICIOUS
  • Recommendation: INSTALL, INSTALL_WITH_CAUTION, REVIEW_BEFORE_INSTALL, or DO_NOT_INSTALL
  • Dimension scores: Breakdown across 6 scoring dimensions
  • Findings: Individual observations with severity, confidence, and source attribution
  • Summary: Human-readable explanation of the audit results
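The report outputs listed above can be sketched as a single record. Field names and the example values are illustrative assumptions, not Oathe's actual response schema:

```python
from dataclasses import dataclass, field

@dataclass
class AuditReport:
    # Illustrative schema mirroring the audit outputs listed above
    trust_score: int                  # 0-100, higher is safer
    verdict: str                      # SAFE, CAUTION, DANGEROUS, or MALICIOUS
    recommendation: str               # INSTALL, INSTALL_WITH_CAUTION,
                                      # REVIEW_BEFORE_INSTALL, or DO_NOT_INSTALL
    dimension_scores: dict[str, int]  # breakdown across the 6 scoring dimensions
    findings: list[dict]              # individual observations with severity,
                                      # confidence, and source attribution
    summary: str                      # human-readable explanation

# Example report with hypothetical values
report = AuditReport(
    trust_score=92,
    verdict="SAFE",
    recommendation="INSTALL",
    dimension_scores={},
    findings=[],
    summary="No suspicious runtime behavior observed.",
)
```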