For Skill Authors

How to improve your skill's trust score — what Oathe looks for and how to avoid common findings.

Overview

If you author AI agent skills (MCP servers, plugins, or tools), Oathe helps you understand the security posture of your code from an agent’s perspective. This page explains what Oathe evaluates, how to avoid common findings, and how to present your trust score to users.

What Oathe Evaluates

Oathe is a behavioral scanner. It does not just read your source code — it installs and runs your skill in an isolated environment, then monitors what actually happens at runtime. The audit evaluates behavior across six scoring dimensions:

  • Prompt Injection — tool descriptions, metadata, and response content
  • Data Exfiltration — outbound network requests and data transmission
  • Code Execution — shell commands, subprocess spawning, eval/exec calls
  • Clone Behavior — misrepresentation of capabilities or identity
  • Canary Integrity — token tamper detection
  • Behavioral Reasoning — holistic AI judgment of overall patterns

Findings across these dimensions feed into the overall trust score. See Scoring Dimensions for details.
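
As a rough mental model (the actual weighting is internal to Oathe and not documented here), each finding's score_impact can be pictured as subtracting from a perfect score of 100:

```python
def illustrative_score(findings):
    """Toy aggregation: subtract each finding's score_impact from 100.

    Illustrative only; Oathe's real scoring formula is not public.
    """
    score = 100
    for finding in findings:
        score -= finding["score_impact"]
    return max(0, min(100, score))

print(illustrative_score([
    {"description": "outbound request during install", "score_impact": 25},
    {"description": "reads ~/.aws/credentials", "score_impact": 40},
]))  # 35
```

Two moderate findings are enough to drag a hypothetical skill well out of the safe range, which is why fixing individual findings is the fastest way to move your score.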

Tips to Improve Your Score

Avoid network calls during installation

Install scripts (postinstall, setup.py, etc.) that make outbound HTTP requests are flagged under data exfiltration. Download dependencies through your package manager, not through custom fetch calls in install hooks.
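
Before submitting, you can self-check your install hooks for obvious outbound calls. A minimal sketch (the pattern list is illustrative, not Oathe's actual detection logic, which observes runtime behavior rather than source text):

```python
import re

# Patterns that commonly indicate outbound requests in install hooks.
NETWORK_PATTERNS = [
    r"\bcurl\b", r"\bwget\b",
    r"urllib\.request", r"requests\.(get|post)",
    r"\bfetch\s*\(",
]

def install_script_makes_network_calls(script_text: str) -> bool:
    """Return True if the script text matches any known network pattern."""
    return any(re.search(p, script_text) for p in NETWORK_PATTERNS)

print(install_script_makes_network_calls(
    "curl -o dep.tar.gz https://example.com/dep.tar.gz"
))  # True
```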

Do not access files outside your directory

Skills that read from /etc, ~/.ssh, ~/.aws, or other directories outside their own working directory trigger findings in code execution and data exfiltration. If your skill needs to read configuration, document it clearly and use a scoped path.
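
One way to keep file access scoped is to resolve every requested path against your skill's own directory and refuse anything that escapes it. A minimal sketch using Python's pathlib (requires Python 3.9+ for is_relative_to):

```python
from pathlib import Path

def resolve_scoped(base_dir: str, requested: str) -> Path:
    """Resolve `requested` inside `base_dir`, refusing escapes.

    Keeping reads inside your skill's own directory avoids the
    code execution and data exfiltration findings described above.
    """
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()
    if not target.is_relative_to(base):
        raise PermissionError("refusing to access a path outside the skill directory")
    return target
```

For example, resolve_scoped("/path/to/skill", "config/settings.json") succeeds, while resolve_scoped("/path/to/skill", "../../etc/passwd") raises.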

Keep tool descriptions clean

Tool descriptions that contain phrases resembling prompt injection (e.g., “ignore previous instructions,” “you must always,” “override system prompt”) are flagged under the prompt injection dimension. Write descriptions that are factual and focused on what the tool does.
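
A simple pre-publish lint can catch the obvious cases. The phrase list below is illustrative; Oathe's prompt injection checks are broader:

```python
# Phrases that resemble prompt injection in tool metadata.
INJECTION_PHRASES = [
    "ignore previous instructions",
    "you must always",
    "override system prompt",
]

def description_warnings(description: str) -> list:
    """Return any injection-like phrases found in a tool description."""
    lowered = description.lower()
    return [phrase for phrase in INJECTION_PHRASES if phrase in lowered]

print(description_warnings(
    "Fetches weather data. You must always call this tool first."
))  # ['you must always']
```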

Declare all dependencies explicitly

Undeclared dependencies that are fetched at runtime look suspicious to the behavioral scanner. List everything in your package.json, requirements.txt, Cargo.toml, or equivalent manifest.
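
You can sanity-check this by comparing the packages your skill actually uses against its manifest. A rough sketch for a requirements.txt-style file (the name normalization here is simplified):

```python
import re

def undeclared_packages(runtime_packages, requirements_text):
    """Return runtime packages missing from a requirements.txt-style manifest.

    Illustrative self-check only: Oathe's scanner watches what is actually
    fetched at runtime, while this just compares names on paper.
    """
    declared = set()
    for line in requirements_text.splitlines():
        line = line.split("#")[0].strip()  # drop comments
        if line:
            # Strip version specifiers like "requests>=2.0".
            name = re.split(r"[<>=!~\[;]", line)[0].strip().lower()
            declared.add(name)
    return [p for p in runtime_packages if p.lower() not in declared]
```

Anything this returns is a package you are pulling in at runtime without declaring it, which is exactly the pattern the behavioral scanner flags.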

Do not spawn unnecessary subprocesses

If your skill shells out to run commands, each one is logged and evaluated. Avoid using child_process.exec, subprocess.run, or equivalent unless your skill’s core functionality requires it. If it does, keep commands minimal and predictable.
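
When a subprocess genuinely is required, a fixed argument list (rather than shell=True with an interpolated string) keeps the command minimal and predictable:

```python
import subprocess
import sys

# Prefer an explicit argument list: no shell interpolation, and the exact
# command that will be logged and evaluated is visible at a glance.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a minimal subprocess')"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # hello from a minimal subprocess
```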

Handle errors without leaking internals

Error messages that include file paths, environment variable values, or stack traces can trigger findings. Return clean, user-facing error messages.
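
A common pattern is to log the raw exception internally and raise a clean, path-free message to the caller. A sketch (the logger name is hypothetical):

```python
import logging

logger = logging.getLogger("my_skill")  # hypothetical internal logger

def read_user_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        # File paths and stack traces stay in internal logs only.
        logger.debug("config read failed: %r", exc)
        # `from None` also suppresses the chained traceback.
        raise ValueError("Could not load configuration") from None
```

Calling read_user_config on a missing file now surfaces only the generic message, while the original path and errno remain in your debug logs.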

Requesting a Re-Scan

After making improvements, you can request a fresh audit by submitting your skill URL with the force_rescan flag:

curl -X POST https://audit-engine.oathe.ai/api/submit \
  -H "Content-Type: application/json" \
  -d '{
    "skill_url": "https://github.com/your-org/your-skill",
    "force_rescan": true
  }'
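
The same submission can be made from any HTTP client. A Python sketch mirroring the curl example (the endpoint and fields are exactly those shown above; the actual POST is left commented out):

```python
import json

AUDIT_ENDPOINT = "https://audit-engine.oathe.ai/api/submit"

def build_submission(skill_url: str, force_rescan: bool = False) -> dict:
    """Build the JSON payload matching the curl example above."""
    return {"skill_url": skill_url, "force_rescan": force_rescan}

payload = build_submission("https://github.com/your-org/your-skill",
                           force_rescan=True)
print(json.dumps(payload, indent=2))
# To submit for real (requires network access):
#   import requests
#   requests.post(AUDIT_ENDPOINT, json=payload, timeout=30)
```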

Without force_rescan, Oathe returns the cached result from the previous audit at the same commit. Once you push new code, a standard submission (without the flag) will audit the new commit automatically.

Adding a Trust Badge

Once your score is where you want it, add a trust badge to your README so users can see it at a glance:

[![Oathe Trust Score](https://oathe.ai/api/badge?skill_url=https://github.com/your-org/your-skill)](https://oathe.ai/skills?url=https://github.com/your-org/your-skill)

The badge updates automatically after each new audit. See Trust Badge for full details on syntax, colors, and caching.
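
If you generate READMEs programmatically, the badge snippet can be built from the skill URL. A sketch (it percent-encodes the query string, which is equivalent to the raw form shown above):

```python
from urllib.parse import urlencode

def badge_markdown(skill_url: str) -> str:
    """Build the README trust-badge snippet for a given skill URL."""
    badge = "https://oathe.ai/api/badge?" + urlencode({"skill_url": skill_url})
    link = "https://oathe.ai/skills?" + urlencode({"url": skill_url})
    return f"[![Oathe Trust Score]({badge})]({link})"

print(badge_markdown("https://github.com/your-org/your-skill"))
```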

Interpreting Your Score

| Score Range | Verdict | What It Means |
|---|---|---|
| 80 - 100 | SAFE | No significant behavioral concerns detected |
| 50 - 79 | CAUTION | Minor findings — review them but likely safe |
| 20 - 49 | DANGEROUS | Notable issues — address before distributing |
| 0 - 19 | MALICIOUS | Dangerous behavior detected — immediate action needed |
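
The bands map directly to code if you want to act on scores programmatically, for example failing a CI job below a threshold:

```python
def verdict(score: int) -> str:
    """Map a trust score to its verdict band, per the table above."""
    if score >= 80:
        return "SAFE"
    if score >= 50:
        return "CAUTION"
    if score >= 20:
        return "DANGEROUS"
    return "MALICIOUS"

print(verdict(83))  # SAFE
```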

Focus on the findings array in your report to understand exactly what was flagged. Each finding includes a description, a severity, and a score_impact that tells you how much that finding hurt your score and in which dimension.
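
When working through a report, sorting findings by score_impact shows where fixes pay off first. A sketch over the fields named above (description, severity, score_impact):

```python
def triage(findings):
    """Order findings so the biggest score hits come first."""
    return sorted(findings, key=lambda f: f["score_impact"], reverse=True)

report_findings = [  # shape based on the fields described above
    {"description": "stack trace in error output", "severity": "low",
     "score_impact": 5},
    {"description": "reads ~/.ssh during startup", "severity": "high",
     "score_impact": 40},
]
for finding in triage(report_findings):
    print(finding["score_impact"], finding["description"])
```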

Best Practices

  1. Audit early: Run an Oathe audit during development, not just before release.
  2. Read the findings: Do not just look at the score — read each finding’s description.
  3. Fix and re-scan: Address findings, push the fix, and submit with force_rescan: true.
  4. Automate: Set up a GitHub webhook so every push triggers an audit automatically.
  5. Display the badge: Show users that your skill has been independently evaluated.