Auto Mode: The Safety Classifier

⏱ Est. reading time: 3 min · Updated on 5/8/2026

In Auto mode, a background classifier model reviews actions before they run and blocks those that go beyond what you asked for or put your infrastructure at risk.

How it works

flowchart TD
  A[Action] --> B{Custom Rules?}
  B -- Match --> R[Follow Rules]
  B -- No Match --> C{Safe Local Action?}
  C -- Yes --> X[Execute]
  C -- No --> CL[Safety Classifier]
  CL -- Safe --> X
  CL -- Risky --> N[Notify & Block]
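
The routing in the flowchart can be sketched in a few lines of Python. This is only an illustration of the order of checks, not the product's implementation: route_action, Outcome, and the three demo predicates are hypothetical names, and the real classifier is a model, not a string check.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    FOLLOW_RULES = "follow rules"
    EXECUTE = "execute"
    BLOCK = "notify and block"


def route_action(
    action: str,
    match_custom_rule: Callable[[str], Optional[str]],
    is_safe_local: Callable[[str], bool],
    classifier_says_safe: Callable[[str], bool],
) -> Outcome:
    """Route one action through the three checks, in flowchart order."""
    # 1. A matching custom rule decides the outcome on its own.
    if match_custom_rule(action) is not None:
        return Outcome.FOLLOW_RULES
    # 2. Safe local actions run without consulting the classifier.
    if is_safe_local(action):
        return Outcome.EXECUTE
    # 3. Everything else is judged by the safety classifier.
    if classifier_says_safe(action):
        return Outcome.EXECUTE
    return Outcome.BLOCK


# Toy predicates (assumptions for the demo only).
def demo_rule(action: str) -> Optional[str]:
    return "deny" if action == "git push origin main" else None


def demo_local(action: str) -> bool:
    return action.startswith("edit ")


def demo_classifier(action: str) -> bool:
    return "deploy" not in action


for a in ["git push origin main", "edit src/app.py", "deploy to production"]:
    print(a, "->", route_action(a, demo_rule, demo_local, demo_classifier).value)
```

Passing the three checks in as functions keeps the sketch independent of how rules and the classifier are actually evaluated; only the ordering is taken from the flowchart.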

Key Protections

  • Blocked by Default: Production deployments, sending sensitive data to external endpoints, bulk cloud storage deletion, and irreversible file destruction.
  • Allowed by Default: Workspace file edits, installing dependencies declared in a manifest, and reading .env files (when the values are used only locally).

The Human Factor

The classifier also respects explicit instructions you give in the conversation. If you say "don't push to main", the classifier blocks any push to main, even if the default rules would allow it.

Warning: Context compaction can remove your instruction from the conversation history, causing the classifier to "forget" it. For a restriction that must persist, use permissions.deny.
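
A toy model may help show why a conversational instruction is weaker than a persistent deny rule. Everything here is an assumption made for illustration: Session, compact, and is_blocked are hypothetical names, compaction is modeled as simple truncation (real compaction may summarize instead), and instructions are matched by plain substring.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical stand-in for persistent settings such as permissions.deny.
    deny_rules: list[str] = field(default_factory=list)
    # Conversation history visible to the classifier; subject to compaction.
    history: list[str] = field(default_factory=list)

    def compact(self, keep_last: int) -> None:
        """Toy compaction: keep only the most recent messages."""
        self.history = self.history[-keep_last:] if keep_last > 0 else []

    def is_blocked(self, action: str) -> bool:
        # Persistent deny rules apply regardless of history.
        if any(rule in action for rule in self.deny_rules):
            return True
        # A "don't ..." instruction applies only while it is still in history.
        prefix = "don't "
        return any(
            msg.startswith(prefix) and msg[len(prefix):] in action
            for msg in self.history
        )


session = Session(history=["don't push to main", "fix the tests", "rename the module"])
print(session.is_blocked("push to main"))  # True: instruction is in history

session.compact(keep_last=2)
print(session.is_blocked("push to main"))  # False: instruction was compacted away

session.deny_rules.append("push to main")
print(session.is_blocked("push to main"))  # True: deny rule does not depend on history
```

The point of the sketch is the second print: once the instruction drops out of history, nothing remains to enforce it, whereas the deny rule lives outside the conversation.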