How Oathe Works

Understand the behavioral audit pipeline, threat pattern matching, AI grading, and scoring that power every Oathe audit.

Overview

Oathe is a behavioral security scanner for AI agent skills — MCP servers, plugins, and tools. It installs a skill in an isolated environment, observes what it actually does, matches behavior against known threat patterns, grades findings with an AI model, and produces a trust score from 0 to 100.

The Audit Pipeline

Each audit moves through six steps:

  • Submit: Skill URL is submitted via API, website, or MCP server
  • Queue: Audit enters the processing queue
  • Install & Observe: Skill is installed and its runtime behavior is observed
  • Analyze: Behavior is matched against 10 threat patterns (T-codes) using dual-source detection
  • Score: Findings are aggregated into dimension scores and an overall trust score
  • Report: Final report with verdict, recommendation, scores, and findings is published
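The six steps above can be sketched as an ordered progression of stages. This is a minimal illustration; the stage names below mirror the pipeline description and are not Oathe's actual API.

```python
from enum import Enum

class AuditStage(Enum):
    # Illustrative stage names mirroring the pipeline steps above
    SUBMIT = "submit"
    QUEUE = "queue"
    INSTALL_OBSERVE = "install_observe"
    ANALYZE = "analyze"
    SCORE = "score"
    REPORT = "report"

# An audit advances through the stages strictly in this order
PIPELINE = [
    AuditStage.SUBMIT,
    AuditStage.QUEUE,
    AuditStage.INSTALL_OBSERVE,
    AuditStage.ANALYZE,
    AuditStage.SCORE,
    AuditStage.REPORT,
]
```

Modeling the stages as an ordered list makes it easy to track and report how far a given audit has progressed.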

Behavioral Analysis

Oathe doesn’t just read source code — it runs the skill and observes what it actually does. This catches threats that static analysis misses: obfuscated code, dynamic downloads, runtime-only behavior, and supply chain payloads buried in dependencies.

Threat Pattern Matching (T-Codes)

Behavioral observations are matched against 10 threat patterns, each identified by a T-code. Each pattern targets a specific attack vector — prompt injection, data exfiltration, unauthorized code execution, credential harvesting, and others.

When a match is found, it becomes a finding with a severity, score impact, and source attribution.
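A finding as described above carries a T-code, a severity, a score impact, and a source attribution. A minimal sketch of that shape, with hypothetical field names and an invented example T-code:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    # Field names are illustrative, not Oathe's actual schema
    t_code: str        # threat pattern identifier, e.g. "T-01" (hypothetical code)
    severity: str      # e.g. "low", "medium", "high", "critical"
    score_impact: int  # points this finding deducts from the trust score
    source: str        # attribution: "code", "model", or "both"

# Example: a high-severity finding flagged by both detection sources
finding = Finding(t_code="T-01", severity="high", score_impact=25, source="both")
```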

Dual-Source Detection

Each finding is evaluated by two independent sources:

  • Code patterns: Rule-based detection using signature matching and heuristics
  • AI model: LLM-based reasoning about whether the observed behavior is malicious

This produces three possible agreement levels:

  • both: Code patterns and AI model both flagged the behavior — highest confidence
  • code_only: Only rule-based detection flagged it — the AI model did not find it suspicious
  • model_only: Only the AI model flagged it — no matching code pattern exists

Findings where both sources agree carry the most weight.

What Comes Out

Every completed audit produces:

  • Trust score: 0-100 (higher is safer)
  • Verdict: SAFE, CAUTION, DANGEROUS, or MALICIOUS
  • Recommendation: INSTALL, INSTALL_WITH_CAUTION, REVIEW_BEFORE_INSTALL, or DO_NOT_INSTALL
  • Dimension scores: Breakdown across 6 scoring dimensions
  • Findings: Individual observations with severity, confidence, and source attribution
  • Summary: Human-readable explanation of the audit results
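The report outputs listed above can be sketched as a single record. Field names and the example values are illustrative assumptions, not Oathe's actual response schema:

```python
from dataclasses import dataclass, field

@dataclass
class AuditReport:
    # Illustrative schema mirroring the audit outputs listed above
    trust_score: int                  # 0-100, higher is safer
    verdict: str                      # SAFE, CAUTION, DANGEROUS, or MALICIOUS
    recommendation: str               # INSTALL, INSTALL_WITH_CAUTION,
                                      # REVIEW_BEFORE_INSTALL, or DO_NOT_INSTALL
    dimension_scores: dict[str, int]  # breakdown across the 6 scoring dimensions
    findings: list[dict]              # individual observations with severity,
                                      # confidence, and source attribution
    summary: str                      # human-readable explanation

# Example report with hypothetical values
report = AuditReport(
    trust_score=92,
    verdict="SAFE",
    recommendation="INSTALL",
    dimension_scores={},
    findings=[],
    summary="No suspicious runtime behavior observed.",
)
```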