Lesson 17: Q&A (Part 2) Permissions & Troubleshooting
Scenario: After introducing parallelism, the system becomes far less predictable. This lesson addresses hardcore troubleshooting questions to ensure stable operation of Teams mode, including infinite loops, blocking commands, and CI/CD automation integration.
Q8: [Mermaid Fault Tree] Recursive Loop Escape
Q: Why does my Sub-Agent fall into a recursive loop of "repeatedly executing wrong Bash commands and apologizing"? How do I break it?
A: This is a common failure mode of LLM Agents. It usually occurs because the context window fills up with repeated error stacks, so the Agent loses the "vision" to find new solutions.
graph TD
Start(Detected Agent caught in retry loop) --> A{Consecutive errors > 3?}
A -->|Yes| B[Team Lead sends intervention message]
B --> C{Involves package management dependencies?}
C -->|Yes| C1[Instruct to downgrade tool version or clear cache]
C -->|No| D{Is it a syntax/compilation error?}
D -->|Yes| D1[Instruct to stop modifying, force 'search' call for past solutions]
D -->|No| E[Terminate Agent, Respawn and revoke tool permissions causing errors]
style Start fill:#f43f5e,color:#fff
style E fill:#10b981,color:#fff
Hardcore Circuit Breaker Mechanism: Configure max_retries for the Agent or monitor the Bash return exit code. 3 consecutive non-zero returns trigger TaskUpdate({ status: "blocked" }), kicking the ball back to humans or a senior Agent decision-maker; do not let it burn money blindly.
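The circuit breaker described above can be sketched as a small counter over Bash exit codes. This is a minimal illustration, not Claude Code's own mechanism: the `CircuitBreaker` class and the `"in_progress"` status string are hypothetical names introduced here, and only the `"blocked"` status and the threshold of 3 come from the text.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    """Trips after `max_failures` consecutive non-zero exit codes (hypothetical helper)."""

    max_failures: int = 3
    consecutive_failures: int = 0

    def record(self, exit_code: int) -> str:
        """Record one Bash exit code and return the task status to report."""
        if exit_code == 0:
            # Any success resets the streak: only *consecutive* failures count.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            return "blocked"  # hand the task back to a human or senior Agent
        return "in_progress"


breaker = CircuitBreaker()
statuses = [breaker.record(code) for code in (1, 0, 1, 127, 2)]
print(statuses)
```

The single success in the middle resets the streak, so only the final three failures in a row trip the breaker. Whatever orchestrates the Agent would then translate `"blocked"` into the TaskUpdate call.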
Q9: Granular Permissions
Q: How do I set different file read/write permissions for different Agent roles (e.g., forbidding the Tester role from modifying core code under src/)?
A: Claude Code natively does not provide path-level read/write lock control; you need to combine it with system-level command interception:
In the project's .claude/settings.json, register a custom Hook that intercepts tool calls and validates their paths with a regex. For instance, intercept the parameters of the Edit tool: if the call is initiated by the qa-engineer role and the file path matches ^src/, the Hook script exits with code 2 (the exit code that hooks treat as blocking; other non-zero codes are reported as non-blocking errors), which forcefully blocks the modification.
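A minimal sketch of the decision logic such a hook script might contain is shown below. It assumes the hook receives a JSON payload with `tool_name`, `tool_input.file_path`, and `cwd` fields, and that the Agent's role reaches the script through some channel of your own choosing (here a plain `role` argument, which could be fed from a hypothetical `AGENT_ROLE` environment variable). The helper names `should_block` and `decide` are illustrative only.

```python
import os
import re

PROTECTED = re.compile(r"^src/")       # paths the restricted roles may not modify
RESTRICTED_ROLES = {"qa-engineer"}     # roles subject to the rule
WRITE_TOOLS = {"Edit", "Write"}        # tools that modify files


def should_block(role: str, tool_name: str, file_path: str, project_root: str) -> bool:
    """Return True when a restricted role tries to modify a protected path."""
    if role not in RESTRICTED_ROLES or tool_name not in WRITE_TOOLS:
        return False
    if os.path.isabs(file_path):
        rel = os.path.relpath(file_path, project_root)
    else:
        rel = os.path.normpath(file_path)
    # Normalize separators so the regex sees a POSIX-style relative path.
    return bool(PROTECTED.match(rel.replace(os.sep, "/")))


def decide(payload: dict, role: str) -> int:
    """Map a hook payload to an exit code: 2 blocks the call, 0 lets it pass."""
    tool_input = payload.get("tool_input", {})
    blocked = should_block(
        role,
        payload.get("tool_name", ""),
        tool_input.get("file_path", ""),
        payload.get("cwd", "."),
    )
    return 2 if blocked else 0


payload = {"tool_name": "Edit", "cwd": "/repo", "tool_input": {"file_path": "/repo/src/core.py"}}
print(decide(payload, "qa-engineer"))
```

A real hook script would parse the payload from stdin with `json.load(sys.stdin)`, write a short reason to stderr, and end with `sys.exit(decide(...))`.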
Q10: Blocking Commands
Q: When an Agent executes a Bash command and gets blocked (e.g., npm init waiting for Y/n input), will the entire parallel pipeline freeze? How do we prevent this?
A: This blockage will cause the specific Sub-Agent executing the command to stall forever, but it will not immediately freeze other fully parallel Agents. However, if other Agents are blockedBy it, the entire production line will eventually deadlock.
Prevention Rules:
- Force Non-interactive Mode: Emphasize in CLAUDE.md: "All installations or scripts must append non-interactive parameters (e.g., npm init -y, git clean -f, etc.)."
- Hook-layer Prefixing/Replacement: In the PreToolUse(Bash) hook, detect whether the command contains high-risk interactive CLI keywords; under the hood, directly append a --yes flag or forcefully prefix the DEBIAN_FRONTEND=noninteractive environment variable.
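The hook-layer rewrite could look roughly like the sketch below. It only computes the rewritten command string; how that string is applied depends on what your hook setup allows, which is not covered here. The rule table, the `make_noninteractive` name, and the two example commands are assumptions for illustration, and the matching is deliberately naive (simple single commands only, no pipelines or `sudo`).

```python
import re

# (pattern, environment prefix, confirmation flag): illustrative rules only.
RULES = [
    (re.compile(r"^npm\s+init\b"), "", "-y"),
    (re.compile(r"^apt(-get)?\s+install\b"), "DEBIAN_FRONTEND=noninteractive", "-y"),
]
YES_FLAGS = {"-y", "--yes"}


def make_noninteractive(command: str) -> str:
    """Return the command with a confirmation flag and env prefix added if a rule matches."""
    stripped = command.strip()
    for pattern, env_prefix, flag in RULES:
        if not pattern.search(stripped):
            continue
        # Append the flag only if the command does not already auto-confirm.
        if not YES_FLAGS & set(stripped.split()):
            stripped = f"{stripped} {flag}"
        if env_prefix:
            stripped = f"{env_prefix} {stripped}"
        return stripped
    return command  # no rule matched: leave the command untouched


print(make_noninteractive("npm init"))
print(make_noninteractive("apt-get install curl"))
```

Keeping the rules in a table makes it easy to add further CLIs that are known to prompt, without touching the rewrite logic.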
Q11: CI/CD Automation Integration
Q: How can Agents in Teams mode seamlessly integrate into CI/CD environments like GitHub Actions and run safely in a pure Headless mode?
A: Three key supports must be provided in a headless environment:
- Terminal Emulation (PTY): CI must support emulating a real TTY (using node-pty or similar containers) because some Agent tools rely on pseudo-terminal outputs under the hood.
- No Human Intervention Flag: Add --yes upon startup or pre-configure all necessary permissions.
- Output Export: Since you cannot interact within CI, the Team Lead must be programmed to "write the final results to output/final_report.md when the task reaches 100% completion or encounters a full crash, print this file at the end of the CI step, and exit 0 or exit 1."
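The Output Export step can be sketched as a small function that the last CI step runs. The source does not say how success is encoded in the report, so this sketch assumes a hypothetical convention: the first line of output/final_report.md reads `STATUS: SUCCESS` on success. A missing report is treated as a failure.

```python
from pathlib import Path


def ci_exit_code(report_path: Path) -> int:
    """Print the final report and return the exit code for the CI step."""
    if not report_path.is_file():
        # No report at all means the Team Lead never finished: fail the step.
        print(f"report not found: {report_path}")
        return 1
    text = report_path.read_text(encoding="utf-8")
    print(text)
    lines = text.splitlines()
    first_line = lines[0].strip() if lines else ""
    # Assumed convention: the first line carries the overall status.
    return 0 if first_line == "STATUS: SUCCESS" else 1


print(ci_exit_code(Path("output/final_report.md")))
```

In a workflow, the step would end with `sys.exit(ci_exit_code(Path("output/final_report.md")))` so that the job status mirrors the report.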
Q12: Timeout Tuning
Q: Why do complex compilation tasks sometimes report "Tool Execution Timeout"? How do I adjust timeout parameters for specific Sub-Agents?
A: Claude Code has a default maximum wait time fail-safe for tool calls (like Bash); once compilation time exceeds the limit, it's forcefully killed.
Tuning Suggestions:
- Avoid letting the Agent run E2E tests via Bash that take tens of minutes.
- Turn long-running tasks into background tasks. Tell the Agent to execute npm run build > build.log 2>&1 & (pushing it to the background), then have the Agent poll build.log in subsequent steps looking for "Compiled successfully", entirely circumventing tool timeouts.
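The polling half of this pattern can be sketched as follows. The `wait_for_marker` helper and its default timings are assumptions for illustration; the log file name and the "Compiled successfully" marker come from the text. Each individual poll should stay well below the tool timeout, otherwise the poll itself would be killed.

```python
import time
from pathlib import Path


def wait_for_marker(
    log_path: Path,
    marker: str = "Compiled successfully",
    timeout_s: float = 60.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll a log file until `marker` appears or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        # The background build may not have created the log yet.
        if log_path.is_file():
            text = log_path.read_text(encoding="utf-8", errors="replace")
            if marker in text:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# A short poll: returns False quickly if build.log is absent or not finished.
print(wait_for_marker(Path("build.log"), timeout_s=0.1, interval_s=0.05))
```

If a poll returns False, the Agent can simply issue another short poll in its next step instead of holding one tool call open for the whole build.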
Q13: Feedback Loop Design
Q: If the Agent responsible for testing (qa-engineer) finds a Bug, should it overstep authority to modify the code directly, or wrap the error stack into a message and return it to the dev Agent?
A: This depends on your risk tolerance and system complexity.
Recommended Approach: Separation of Duties + Return for Rework.
qa-engineer should maintain the purity of "read-only and test." After finding the error stack, it should execute SendMessage to the Team Lead:
"Test suite failed, issue is in module X. Here is the error log [...], please assign a rewrite task."
Upon receiving the message, the Team Lead will reactivate logic-dev to dispatch the repair task. This avoids architectural collapse caused by "amateurs blindly modifying professional code."
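The rework message itself can be kept compact so that it does not flood the Team Lead's context with a full error stack. The sketch below builds the message text only; the `build_failure_report` helper and the 2000-character cap are assumptions for illustration, and the actual SendMessage call is left to the Agent.

```python
def build_failure_report(module: str, error_log: str, max_log_chars: int = 2000) -> str:
    """Build the rework request, keeping only the tail of a long error log."""
    # The tail usually holds the final exception, so trim from the front.
    start = max(0, len(error_log) - max_log_chars)
    tail = error_log[start:]
    note = " (truncated to the last lines)" if start > 0 else ""
    return (
        f"Test suite failed, issue is in module {module}. "
        f"Here is the error log{note}:\n{tail}\n"
        "Please assign a rewrite task."
    )


print(build_failure_report("parser", "AssertionError: expected 3, got 4"))
```

Because the report names the module and carries the log tail, the Team Lead can dispatch the repair to logic-dev without asking qa-engineer follow-up questions.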
Q14: Budget Limits
Q: Is it possible to restrict the maximum Token usage or thinking steps of a "Junior Agent" to prevent it from burning API balance upon failure?
A: This can be achieved through a Proxy Gateway (API Gateway) or Hook-level Interception.
A more native way is to design an internal step counter combining CLAUDE.md and the PostToolUse hook:
Every time the Agent initiates a Bash or Write action, the Hook increments a local counter by 1. If the counter exceeds 15 steps (indicating a blind retry loop or design flaw), the Hook script forcefully inserts into the return value of that call:
"CRITICAL WARNING: You have reached your execution step limit. Forcefully stopping operation immediately. Report failure reason to Team Lead." This will force the model to exit the current infinite loop.
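The counter logic can be sketched as below. It persists the count in a small file so that each hook invocation (a separate process) sees the previous value. The `record_step` helper and the counter file location are assumptions for illustration; the limit of 15 and the warning text come from the description above. How the warning is surfaced to the model depends on your hook configuration.

```python
from pathlib import Path

STEP_LIMIT = 15
WARNING = (
    "CRITICAL WARNING: You have reached your execution step limit. "
    "Forcefully stopping operation immediately. Report failure reason to Team Lead."
)


def record_step(counter_file: Path, limit: int = STEP_LIMIT):
    """Increment the persisted step counter; return (count, warning or None)."""
    try:
        count = int(counter_file.read_text(encoding="utf-8").strip())
    except (FileNotFoundError, ValueError):
        count = 0  # first call, or a corrupted counter file
    count += 1
    counter_file.write_text(str(count), encoding="utf-8")
    # The warning fires only once the count *exceeds* the limit.
    return count, (WARNING if count > limit else None)
```

A hook script would call `record_step` once per Bash or Write call and, when a warning comes back, emit it through whatever feedback channel the hook provides. Resetting the counter file when a task is reassigned keeps the budget per task rather than per session.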