# Verdicts & Recommendations
What SAFE, CAUTION, DANGEROUS, and MALICIOUS mean — and what action to take for each.
## Overview
Every completed audit produces two actionable labels: a verdict (risk level) and a recommendation (what to do about it). These are derived from the trust score and provide a clear decision framework for whether to install a skill.
## Verdict Levels
| Verdict | Score Range | Meaning |
|---|---|---|
| SAFE | 80 - 100 | No significant threats detected. The skill behaves as expected with no suspicious activity across all scoring dimensions. |
| CAUTION | 50 - 79 | Some suspicious behavior detected. The skill may have legitimate reasons for the flagged behavior, but it warrants review before use. |
| DANGEROUS | 20 - 49 | Significant threats detected. The skill exhibits behavior consistent with known attack patterns — data exfiltration, prompt injection, or unauthorized access. |
| MALICIOUS | 0 - 19 | Severe threats confirmed. The skill actively attempts to exfiltrate data, inject prompts, or execute unauthorized code. |
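The score-to-verdict mapping above can be sketched as a small helper. This is an illustrative example, not a documented API; the function name is ours, and the boundary handling follows the "inclusive on the lower end" rule stated under Edge Cases below:

```python
def verdict_for_score(score: int) -> str:
    """Map a trust score (0-100) to a verdict.

    Boundaries are inclusive on the lower end:
    80 is SAFE, 50 is CAUTION, 20 is DANGEROUS.
    """
    if score >= 80:
        return "SAFE"
    if score >= 50:
        return "CAUTION"
    if score >= 20:
        return "DANGEROUS"
    return "MALICIOUS"
```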
## Recommendation Values
Each verdict maps to a recommendation that tells users and agents what action to take:
| Verdict | Recommendation | Action |
|---|---|---|
| SAFE | INSTALL | Proceed with installation. No further review needed. |
| CAUTION | INSTALL_WITH_CAUTION | Install is acceptable, but review the flagged findings first. Monitor the skill after installation. |
| DANGEROUS | REVIEW_BEFORE_INSTALL | Do not install without thorough manual review. Examine all findings and understand the risks before proceeding. |
| MALICIOUS | DO_NOT_INSTALL | Block installation. The skill is unsafe. |
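Because the mapping is fixed and one-to-one, it can be held as a simple lookup table. A minimal sketch (the constant name is ours):

```python
# Fixed verdict -> recommendation mapping from the table above.
RECOMMENDATION_FOR_VERDICT = {
    "SAFE": "INSTALL",
    "CAUTION": "INSTALL_WITH_CAUTION",
    "DANGEROUS": "REVIEW_BEFORE_INSTALL",
    "MALICIOUS": "DO_NOT_INSTALL",
}
```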
## Decision Priority Order
When evaluating an audit result — whether in application code, an AI agent workflow, or manual review — follow this priority order:
- Recommendation — The primary decision signal. Use this to gate install/block decisions.
- Verdict — The risk classification. Use this for UI labels and severity indicators.
- Score — The numeric trust score. Use this for sorting, ranking, and threshold comparisons.
- Findings — Individual observations. Use these for detailed review and explaining why a score is what it is.
- Summary — Human-readable explanation. Use this for display to end users.
- Dimensions — Per-dimension breakdown. Use this for understanding which specific areas are problematic.
The recommendation is the most actionable field. If you are building an automated pipeline, gate on recommendation first.
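A gate in an automated pipeline might look like this. The dict shape (a `recommendation` key on the audit result) is an assumption for illustration, not a documented schema:

```python
def should_install(audit: dict) -> bool:
    """Gate install/block decisions on the recommendation field first.

    `audit` is a hypothetical audit-result dict with a
    'recommendation' key. Anything other than the two install-capable
    values (including a missing key) blocks installation.
    """
    return audit.get("recommendation") in ("INSTALL", "INSTALL_WITH_CAUTION")
```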
## When to Show Findings to Users
Not every audit result requires the user to see individual findings.
| Recommendation | Show Findings? | Reason |
|---|---|---|
| INSTALL | No | The skill is safe. Showing clean findings adds noise without value. |
| INSTALL_WITH_CAUTION | Yes, summarized | The user should understand what was flagged so they can make an informed decision. |
| REVIEW_BEFORE_INSTALL | Yes, detailed | The user needs full context on every finding to evaluate the risk. |
| DO_NOT_INSTALL | Optional | The decision is already clear. Show findings only if the user requests justification. |
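The display policy above can also be encoded as a lookup. This is a sketch; the constant and the display-mode strings are illustrative names, not part of any API:

```python
# How to present findings for each recommendation value.
FINDINGS_DISPLAY = {
    "INSTALL": "hide",                    # clean findings are noise
    "INSTALL_WITH_CAUTION": "summarized", # enough for an informed decision
    "REVIEW_BEFORE_INSTALL": "detailed",  # full context on every finding
    "DO_NOT_INSTALL": "on_request",       # only if the user asks why
}
```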
## Presenting Results
For agent-facing integrations (MCP server, API calls from automated workflows):
- Return the recommendation as the primary field.
- Include the verdict and score for context.
- Only fetch and present findings when the recommendation is `INSTALL_WITH_CAUTION` or `REVIEW_BEFORE_INSTALL`.
For user-facing interfaces (dashboards, CLI output, chat messages):
- Lead with the verdict and a one-line summary.
- Show the trust score as a secondary indicator.
- Link to or expand findings only when the verdict is CAUTION or worse.
- Use color or severity indicators to make the verdict immediately scannable: green for SAFE, yellow for CAUTION, orange for DANGEROUS, red for MALICIOUS.
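For terminal output, the color scheme above maps naturally onto ANSI escape codes. A minimal sketch (names are ours; orange uses a 256-color code since basic ANSI has no orange):

```python
# ANSI color per verdict: green, yellow, orange (256-color), red.
ANSI_COLOR_FOR_VERDICT = {
    "SAFE": "\033[32m",
    "CAUTION": "\033[33m",
    "DANGEROUS": "\033[38;5;208m",
    "MALICIOUS": "\033[31m",
}

RESET = "\033[0m"

def colorize_verdict(verdict: str) -> str:
    """Wrap a verdict label in its severity color for CLI output."""
    return f"{ANSI_COLOR_FOR_VERDICT[verdict]}{verdict}{RESET}"
```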
## Edge Cases
- Audit failed: If the audit did not complete (status: `failed`), there is no verdict or recommendation. Surface the failure reason and prompt for a retry.
- Score exactly on a boundary: Boundaries are inclusive on the lower end. A score of 80 is SAFE. A score of 50 is CAUTION. A score of 20 is DANGEROUS.
- Multiple audits for the same skill: Always use the most recent completed audit. Older audits may not reflect the current version of the skill.
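Selecting the most recent completed audit, while treating failed audits as having no verdict, might look like this. The `status` and `completed_at` field names are assumptions for illustration:

```python
def latest_completed_audit(audits: list) -> "dict | None":
    """Return the most recent completed audit, or None if there is none.

    Assumes each audit dict has a 'status' field and, when completed,
    a sortable 'completed_at' timestamp. Failed audits carry no
    verdict or recommendation, so they are excluded.
    """
    completed = [a for a in audits if a.get("status") == "completed"]
    if not completed:
        return None  # caller should surface the failure and offer a retry
    return max(completed, key=lambda a: a["completed_at"])
```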