Oathe Security Badge

Is dodge1218/active-self-improvement safe?

https://clawhub.ai/dodge1218/active-self-improvement

Overall score: 92/100 · SAFE

This skill implements an autonomous self-improvement system for agents: it learns from errors and user corrections, then automatically modifies agent behavior and configuration files. The intent appears benign and the skill includes some safety guardrails, but it applies low- and medium-risk changes without explicit user consent, which reduces predictability and user control.

Category Scores (score · weight)

Prompt Injection 75/100 · 30%
Data Exfiltration 100/100 · 25%
Code Execution 100/100 · 20%
Clone Behavior 100/100 · 10%
Canary Integrity 100/100 · 10%
Behavioral Reasoning 80/100 · 5%
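
The overall score of 92 is consistent with a weighted average of the six category scores, rounded to the nearest integer. The sketch below checks that arithmetic. It assumes that the percentages are weights and that rounding is half-up; the report states neither, and the function names are illustrative only.

```python
import math

# (score, weight in percent) for each category, as listed in the report.
# Treating the percentages as weights is an assumption.
CATEGORIES = {
    "Prompt Injection": (75, 30),
    "Data Exfiltration": (100, 25),
    "Code Execution": (100, 20),
    "Clone Behavior": (100, 10),
    "Canary Integrity": (100, 10),
    "Behavioral Reasoning": (80, 5),
}


def weighted_score(categories):
    """Return the weighted average of (score, weight_percent) pairs."""
    total_weight = sum(weight for _, weight in categories.values())
    weighted_sum = sum(score * weight for score, weight in categories.values())
    return weighted_sum / total_weight


def round_half_up(value):
    """Round to the nearest integer, with .5 rounding up (assumed rule)."""
    return math.floor(value + 0.5)


raw = weighted_score(CATEGORIES)
print(raw, round_half_up(raw))  # prints: 91.5 92
```

Integer percentages are used instead of fractions such as 0.30 so that the intermediate sum stays exact.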

Findings (2)

HIGH · Autonomous Agent Behavior Modification · -25

The skill is designed to automatically modify agent behavior, skills, and core configuration files based on patterns it detects in logs. It can apply 'low risk' and 'medium risk' changes without explicit user approval, which may alter how the agent responds to users.

MEDIUM · Unpredictable Agent Evolution Risk · -20

Autonomous pattern detection and behavior modification could produce unexpected agent responses, create feedback loops, or cause normal user corrections to be misread as systemic problems that need fixing. Agent behavior may become less predictable for users.
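
The penalties on the two findings line up with the two category scores below 100: 100 - 25 = 75 (Prompt Injection) and 100 - 20 = 80 (Behavioral Reasoning). The following is a minimal sketch of that relationship, assuming each category starts at 100 and each finding is charged to a single category. The finding-to-category mapping is inferred from the matching numbers and is not stated in the report.

```python
# (finding title, category charged, penalty) with penalties as listed above.
# Which category each finding is charged to is an assumption.
FINDINGS = [
    ("Autonomous Agent Behavior Modification", "Prompt Injection", 25),
    ("Unpredictable Agent Evolution Risk", "Behavioral Reasoning", 20),
]

CATEGORY_NAMES = [
    "Prompt Injection",
    "Data Exfiltration",
    "Code Execution",
    "Clone Behavior",
    "Canary Integrity",
    "Behavioral Reasoning",
]


def category_scores(findings, category_names, start=100):
    """Subtract each finding's penalty from its category, flooring at 0."""
    scores = {name: start for name in category_names}
    for _title, category, penalty in findings:
        scores[category] = max(0, scores[category] - penalty)
    return scores


scores = category_scores(FINDINGS, CATEGORY_NAMES)
print(scores["Prompt Injection"], scores["Behavioral Reasoning"])  # prints: 75 80
```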