⚡ AgentUpdate Blog

From 'Enter-Key Babysitter' to Goal-Driven Architecture: A Deep Dive into Claude Code and Codex /goal Evolution

From 'Enter-Key Babysitter' to Goal-Driven Architecture: A Deep Dive into Claude Code and Codex /goal Evolution

In the early stages of AI-assisted programming, developers often found themselves trapped in the dilemma of "Enter-key Babysitting": you input a command, the AI executes a small snippet of code, and then pauses, waiting for you to review and type "continue." Under this interaction model, humans remained the core dispatcher (Human-in-the-loop), and the AI was merely a smarter input method.

In mid-2026, as Claude Code and Codex successively built in the /goal command, AI programming tools officially evolved from "continuous execution" to "Convergence by Acceptance Criteria". This is not merely the addition of a command, but a fundamental restructuring of the software engineering paradigm.

I. Paradigm Shift: What is True "Goal-Driven"?

Traditional AI programming is Instruction-Driven, whereas /goal introduces a Goal-Driven approach. Its core lies not in making the AI run a few more iterations, but in defining a Verifiable End-state.

Core Logic: User sets the goal $\rightarrow$ Agent executes automatically across multiple turns $\rightarrow$ Independent model/mechanism verifies $\rightarrow$ Loops until achieved or budget is exhausted.

In this mode, the Agent acquires Persistent Execution capabilities. It will autonomously read code, execute Shell commands, run tests, fix errors, and continuously retry until the criteria you set are met.

II. Architectural Confrontation: Independent Judge vs. Environmental Self-Audit

Although both Claude Code and Codex have introduced /goal, their underlying implementation philosophies are entirely different, which determines their reliability boundaries when handling complex tasks.

1. Claude Code: Judge Mode

Claude Code adopts a "restrained and decoupled" design. It completely separates the executor from the acceptor:

  • Main Agent: Responsible for the concrete dirty work, such as writing code and running tests.
  • Evaluator: Defaults to the lighter, faster-responding Haiku model. It does not have tool-calling permissions and determines whether the goal has been achieved solely by reading the Transcript Evidence.

Technical Implementation: Claude utilizes a session-level Stop Hook. When the Main Agent attempts to end a task, the hook intercepts the action and triggers the Evaluator. The Evaluator must output JSON conforming to the following structure:

{
  "ok": true, 
  "reason": "Specific evidence cited from the transcript proving the conditions have been met"
}

### 2. Codex: Self-audit Mode
Codex, on the other hand, leans towards an **"environmental evidence-oriented"** approach. It does not rely on an independent judge but constrains the Main Agent through a mandatory **Completion Audit**.

*   **Runtime Continuation**: Codex persists the goal state in a local database (e.g., SQLite). Before each turn begins, the system automatically injects a Continuation Prompt, forcing the model to verify against the actual system Artifacts.
*   **Tool Integration**: Codex encourages the model to directly utilize environmental tools (e.g., `grep`, `test-runner`) to find evidence, rather than relying solely on textual descriptions.

| Dimension | Claude Code (`/goal`) | Codex (`/goal`) |
| :--- | :--- | :--- |
| **Decision Authority** | Independent Evaluator Model (Haiku) | Main Agent Itself (GPT-4.5/5.5) |
| **Evidence Source** | Transcript text only | Real file system and tool outputs |
| **State Management** | Session-scoped | Thread-level persistence (Persisted State-DB) |
| **Core Philosophy** | Prevent model self-deception | Reinforce environmental fact-checking |

## III. Practical Guide: How to Write "Bulletproof" Goal Conditions?

The effectiveness of `/goal` highly depends on the quality of the conditions. A vague goal (e.g., "write the code well") will cause the AI to enter an infinite loop or hallucinate. Excellent `/goal` conditions should follow this **Golden Formula**:

> **Goal = Task Scope + Acceptance Criteria + Verification Command + Boundary Constraints**

### Recommended Template:
/goal Complete the migration of [Feature Module], requirements:
1. Implement the three core logics: A, B, and C;
2. Ultimately, `npm test` must be run with no FAIL in the output;
3. Verification method: Paste the complete test coverage report;
4. Constraints: Do not modify any public interface signatures under the src/api/ directory;
5. Fallback: If not completed within 20 turns, stop and output the current blocking points.

Pitfall Avoidance Guide:

  • Unverifiable Vocabulary: Eliminate the use of subjective terms like "more elegant," "more efficient," or "production-ready." The Evaluator cannot see your aesthetics; it can only see Exit Codes and string matches.
  • Missing Evidence Requirements: Since Claude's Evaluator only looks at the transcript, you must explicitly require the AI to "paste the test results"; otherwise, the Evaluator will reject it due to "insufficient evidence."

IV. Advanced Configuration: The Combo of Permissions and Automation

To achieve true "unattended operation," /goal usually needs to be used in conjunction with Auto Mode:

  1. Auto Mode + /goal: Auto Mode automatically approves tool calls (e.g., writing files, running Shell commands), and /goal automatically drives the turn loops. Only by combining the two can you confidently step away for a cup of coffee.

  2. Pre-approved Safe Commands: Configure a whitelist in .claude/settings.json to reduce unnecessary confirmation pop-ups:

    {
      "permissions": {
        "allow": ["Bash(npm test *)", "Bash(git status)"]
      }
    }
    
  3. Non-interactive Mode: In CI/CD or scripts, using claude -p "/goal ..." can directly complete long-running tasks and exit.

V. Conclusion: The Future of Software Engineering is "Feedback Engineering"

The popularization of the /goal command marks a shift in the role of programmers. Code is becoming cheap, while Criteria are becoming expensive. In the future, the core competency of senior developers will no longer be the speed of hand-writing code, but the ability to define goals, design Feedback Loops, and build automated agent workflows.

As Norbert Wiener, the father of cybernetics, implied: do not try to chat with machines; give them precise feedback. In the goal-driven era, remember: Clear criteria beat ten thousand Prompts.