# How Oathe Works

Understand the behavioral audit pipeline, threat pattern matching, AI grading, and scoring that power every Oathe audit.

## Overview
Oathe is a behavioral security scanner for AI agent skills — MCP servers, plugins, and tools. It installs a skill in an isolated environment, observes what it actually does, matches behavior against known threat patterns, grades findings with an AI model, and produces a trust score from 0 to 100.
## The Audit Pipeline
| Step | What Happens |
|---|---|
| Submit | Skill URL is submitted via API, website, or MCP server |
| Queue | Audit enters the processing queue |
| Install & Observe | Skill is installed and its runtime behavior is observed |
| Analyze | Behavior is matched against 10 threat patterns (T-codes) using dual-source detection |
| Score | Findings are aggregated into dimension scores and an overall trust score |
| Report | Final report with verdict, recommendation, scores, and findings is published |
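The stages above can be sketched end to end. This is a minimal illustration of the flow, assuming made-up function names, observations, and score impacts; it is not Oathe's actual implementation.

```python
# Illustrative sketch of the pipeline stages; every name, observation,
# and number below is a stand-in, not Oathe's real internals.

def observe(skill_url: str) -> list[str]:
    # Real pipeline: install the skill in isolation and record behavior.
    return ["opens ~/.aws/credentials", "POSTs data to an external host"]

def analyze(observations: list[str]) -> list[dict]:
    # Real pipeline: match behavior against the 10 T-code threat patterns.
    return [{"t_code": "T-XX", "severity": "high", "impact": 25}
            for _ in observations]  # "T-XX" is a placeholder code

def score(findings: list[dict]) -> int:
    # Real pipeline: aggregate into dimension scores and an overall score.
    return max(0, 100 - sum(f["impact"] for f in findings))

def run_audit(skill_url: str) -> dict:
    audit = {"skill_url": skill_url, "status": "queued"}  # Submit + Queue
    observations = observe(skill_url)                     # Install & Observe
    findings = analyze(observations)                      # Analyze
    trust = score(findings)                               # Score
    return {**audit, "status": "complete",                # Report
            "findings": findings, "trust_score": trust}

report = run_audit("https://example.com/some-skill")
print(report["trust_score"])  # 50: two fake findings at -25 each
```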
## Behavioral Analysis
Oathe doesn’t just read source code — it runs the skill and observes what it actually does. This catches threats that static analysis misses: obfuscated code, dynamic downloads, runtime-only behavior, and supply chain payloads buried in dependencies.
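A toy example of why running beats reading: this hypothetical malicious snippet hides its exfiltration endpoint behind base64, so a plain-text scan of the source never sees the hostname, while evaluating what the code computes at runtime reveals it.

```python
import base64

# Source of a toy malicious "skill": the exfiltration URL exists only as
# a base64 blob, so the hostname never appears as a literal in the code.
SOURCE = '''
import base64, urllib.request
url = base64.b64decode("aHR0cDovL2V2aWwuZXhhbXBsZS9zdGVhbA==").decode()
# ...would then POST harvested files to `url`...
'''

# Static signature check: searching for the hostname finds nothing.
print("evil.example" in SOURCE)  # False

# Behavioral check: evaluate what the code computes at runtime.
url = base64.b64decode("aHR0cDovL2V2aWwuZXhhbXBsZS9zdGVhbA==").decode()
print(url)  # http://evil.example/steal
```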
## Threat Pattern Matching (T-Codes)
Behavioral observations are matched against 10 threat patterns, each identified by a T-code. Each pattern targets a specific attack vector — prompt injection, data exfiltration, unauthorized code execution, credential harvesting, and others.
When a match is found, it becomes a finding with a severity, score impact, and source attribution.
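A finding with severity, score impact, and source attribution might look like the sketch below. The field names, the rule table, and the `T-XX` code are all hypothetical stand-ins, not Oathe's real schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    t_code: str        # which of the 10 threat patterns matched
    severity: str      # e.g. "low", "medium", "high", "critical"
    score_impact: int  # points the finding costs the trust score
    source: str        # attribution: "code", "model", or "both"

# Toy rule table: substring -> (T-code, severity, impact). "T-XX" is a
# placeholder; real T-codes and impacts are not documented here.
RULES = [(".ssh/", "T-XX", "high", 25),
         ("external host", "T-XX", "medium", 10)]

def match(observation: str) -> Optional[Finding]:
    # Rule-based half of dual-source detection: signature matching.
    for needle, t_code, severity, impact in RULES:
        if needle in observation:
            return Finding(t_code, severity, impact, source="code")
    return None

f = match("reads ~/.ssh/id_rsa")
print(f.severity, f.score_impact)  # high 25
```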
### Dual-Source Detection
Each finding is evaluated by two independent sources:
- Code patterns: Rule-based detection using signature matching and heuristics
- AI model: LLM-based reasoning about whether the observed behavior is malicious
This produces three possible agreement levels:
| Agreement | Meaning |
|---|---|
| both | Code patterns and AI model both flagged the behavior — highest confidence |
| code_only | Only rule-based detection flagged it — the AI model did not find it suspicious |
| model_only | Only the AI model flagged it — no matching code pattern exists |
Findings where both sources agree carry the most weight.
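The agreement level follows directly from the two boolean signals. The weighting below is an assumption added to illustrate "both carries the most weight"; Oathe's real weights are not documented here.

```python
from typing import Optional

def agreement(code_flagged: bool, model_flagged: bool) -> Optional[str]:
    # Combine the two independent detection sources into one level.
    if code_flagged and model_flagged:
        return "both"        # highest confidence
    if code_flagged:
        return "code_only"
    if model_flagged:
        return "model_only"
    return None              # neither source flagged anything

# Assumed weights: findings where both sources agree count the most.
WEIGHT = {"both": 1.0, "code_only": 0.6, "model_only": 0.6}

print(agreement(True, True))   # both
print(agreement(False, True))  # model_only
```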
## What Comes Out
Every completed audit produces:
- Trust score: 0-100 (higher is safer)
- Verdict: SAFE, CAUTION, DANGEROUS, or MALICIOUS
- Recommendation: INSTALL, INSTALL_WITH_CAUTION, REVIEW_BEFORE_INSTALL, or DO_NOT_INSTALL
- Dimension scores: Breakdown across 6 scoring dimensions
- Findings: Individual observations with severity, confidence, and source attribution
- Summary: Human-readable explanation of the audit results
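One plausible way the trust score could map onto a verdict and recommendation. Only the labels come from the lists above; the cutoff values are purely illustrative assumptions, not Oathe's documented thresholds.

```python
def verdict(trust_score: int) -> tuple[str, str]:
    # Cutoffs below are illustrative assumptions; only the verdict and
    # recommendation labels come from the audit output.
    if trust_score >= 80:
        return "SAFE", "INSTALL"
    if trust_score >= 60:
        return "CAUTION", "INSTALL_WITH_CAUTION"
    if trust_score >= 30:
        return "DANGEROUS", "REVIEW_BEFORE_INSTALL"
    return "MALICIOUS", "DO_NOT_INSTALL"

print(verdict(92))  # ('SAFE', 'INSTALL')
print(verdict(15))  # ('MALICIOUS', 'DO_NOT_INSTALL')
```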