Auto Mode: The Safety Classifier

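In Auto mode, a background classifier model reviews actions before they run and blocks those that exceed your intent or put your infrastructure at risk. Actions matched by your custom rules, and local actions already known to be safe, are resolved before they reach the classifier.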

How it works

flowchart TD
  A[Action] --> B{Custom Rules?}
  B -- Match --> R[Follow Rules]
  B -- No Match --> C{Safe Local Action?}
  C -- Yes --> X[Execute]
  C -- No --> CL[Safety Classifier]
  CL -- Safe --> X
  CL -- Risky --> N[Notify & Block]
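
The flow above can be sketched in a few lines of Python. This is only an illustration of the routing order shown in the diagram, not the actual implementation: the names `Action`, `Verdict`, `route`, and the three `demo_*` helpers are hypothetical, and the real classifier is a model, not a one-line predicate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    EXECUTE = "execute"
    BLOCK = "notify_and_block"


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical label, e.g. "edit_file" or "deploy"
    target: str  # hypothetical target, e.g. a path or an environment name


# A custom rule returns a Verdict when it matches, or None when it does not.
Rule = Callable[[Action], Optional[Verdict]]


def route(
    action: Action,
    custom_rules: List[Rule],
    is_safe_local: Callable[[Action], bool],
    classifier_says_safe: Callable[[Action], bool],
) -> Verdict:
    # 1. Custom rules win: the first matching rule decides the outcome.
    for rule in custom_rules:
        verdict = rule(action)
        if verdict is not None:
            return verdict
    # 2. Known-safe local actions run without consulting the classifier.
    if is_safe_local(action):
        return Verdict.EXECUTE
    # 3. Everything else goes to the safety classifier.
    return Verdict.EXECUTE if classifier_says_safe(action) else Verdict.BLOCK


# Stand-ins for illustration only (assumptions, not real product behavior).
def demo_rule(action: Action) -> Optional[Verdict]:
    if action.target.startswith("secrets/"):
        return Verdict.BLOCK
    return None


def demo_safe_local(action: Action) -> bool:
    return action.kind == "edit_file"


def demo_classifier(action: Action) -> bool:
    return action.kind != "deploy"
```

With these stand-ins, `route(Action("deploy", "production"), [demo_rule], demo_safe_local, demo_classifier)` falls through both early checks and is blocked by the classifier stub, while an `edit_file` action under `secrets/` is blocked by the custom rule before the safe-local check is ever consulted.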

Key Protections

Blocked by Default: Production deployments, sending sensitive data to external endpoints, bulk cloud storage deletion, and irreversible file destruction.
Allowed by Default: Workspace file edits, installing manifest dependencies, and reading .env (if used locally).

The Human Factor

The classifier also honors explicit instructions you give in the conversation. If you say "don't push to main", it blocks pushes to main even if the default rules would allow them.

Warning: Context compaction can drop your instruction from the conversation history, causing the classifier to "forget" it. For restrictions that must always hold, use permissions.deny.
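
A toy model may help show why the warning matters. The sketch below assumes, purely for illustration, that compaction truncates history to the most recent messages (real compaction may summarize instead) and that a persistent deny rule lives outside the conversation. The `Session` class and its fields are hypothetical and do not reflect the actual permissions.deny rule syntax.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Session:
    history: List[str] = field(default_factory=list)   # conversation messages
    deny_rules: Set[str] = field(default_factory=set)  # persistent, outside history

    def compact(self, keep_last: int) -> None:
        # Toy compaction: keep only the most recent messages (keep_last >= 1).
        self.history = self.history[-keep_last:]

    def is_blocked(self, action: str) -> bool:
        # Persistent deny rules always apply.
        if action in self.deny_rules:
            return True
        # Conversational instructions apply only while they are still in history.
        return any(f"don't {action}" in message for message in self.history)


session = Session()
session.history.append("don't push to main")
session.history.extend(["refactor the parser", "run the tests", "fix the lint errors"])

blocked_before = session.is_blocked("push to main")   # instruction still in history
session.compact(keep_last=2)
blocked_after = session.is_blocked("push to main")    # instruction was compacted away

session.deny_rules.add("push to main")
blocked_with_rule = session.is_blocked("push to main")  # survives any compaction
```

In this model the instruction protects you only while the message that carries it survives, whereas the deny rule is checked independently of the history.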
