We wanted Claude Code to run itself. Start a task, repeat it autonomously across dozens of iterations, handle its own context, and keep going for hours without human babysitting. Simple concept. Three weeks of infrastructure hell to make it actually work.
This is the full timeline of every bug, race condition, and architectural failure we hit while building SynaBun's autonomous loop and agent systems. Every fix uncovered a worse problem underneath. We are not exaggerating.
The Pitch: Autonomous Loops
The idea was straightforward. You give Claude a repeating task — engage with AI communities on Twitter, monitor a deployment, process a batch of data — and it runs that task in a loop. Each iteration gets fresh context, the Stop hook drives the next cycle, and the whole thing sustains itself without you touching the keyboard.
We built the first version in a day. MCP tool with start, stop, and status actions. State file on disk. Stop hook reads the state, increments the iteration counter, injects the task prompt, and blocks Claude's response to force the next cycle. Clean design. Worked on the first test.
Then we tried running it for real.
Week One: Everything Dies After 3 Iterations
The loop would start fine. Iteration 1, great. Iteration 2, fine. Iteration 3 — silence. Claude just stopped responding. No error, no crash, no timeout message. Just... nothing.
Root cause: prompt detection failure. The loop driver on the server side needed to detect when Claude was ready for input by watching for the > prompt character in the terminal output. The regex we used to detect this prompt had gaps. Claude Code's terminal output is wrapped in ANSI escape sequences — color codes, cursor positioning, DEC mode flags — and our regex only matched clean ASCII prompts. Any ANSI sequence between the newline and the > character broke detection entirely.
But it got worse. The output buffer was 2KB. Claude's responses routinely exceeded that, which meant the prompt character scrolled past the buffer boundary. The driver was looking for a prompt that had already been flushed from memory.
Fix: Expanded the buffer from 2KB to 8KB. Rewrote the prompt detection regex to strip ANSI sequences before matching. Added a stopReason field to the loop state so we could distinguish between iteration caps, time caps, and prompt detection failures. But we were just getting started.
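In sketch form, the fixed detection looks something like this. The names stripAnsi and isAtPrompt are illustrative, and the regex is a simplified stand-in for the real driver's pattern:

```typescript
// Broad (not exhaustive) match for ANSI escape sequences: CSI sequences
// (colors, cursor positioning, DEC private modes like ESC[?25h) and
// OSC sequences (terminated by BEL).
const ANSI_RE = /\u001b\[[0-9;?]*[a-zA-Z]|\u001b\][^\u0007]*\u0007/g;

function stripAnsi(chunk: string): string {
  return chunk.replace(ANSI_RE, "");
}

// The prompt is a lone ">" on the last non-empty line, once ANSI noise
// is removed. A simplified heuristic for illustration only.
function isAtPrompt(buffer: string): boolean {
  const clean = stripAnsi(buffer);
  const lines = clean.split("\n").filter((l) => l.trim().length > 0);
  const last = lines[lines.length - 1] ?? "";
  return /^\s*>\s*$/.test(last);
}
```

The key property is that stripping happens before matching, so escape sequences between the newline and the > can no longer break detection.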
The Session ID Orphaning Problem
When the loop driver sends /clear between iterations to give Claude fresh context, Claude Code generates a new session ID. Our loop state file was keyed by the original session ID. After /clear, the state file's name no longer matched the active session. The Stop hook looked up the state file by session ID, found nothing, and concluded there was no active loop.
The loop was still technically alive on disk. It just became invisible to every hook in the chain.
Fix attempt 1: Add a fallback scan. If direct session ID lookup fails, scan all loop state files for any with active: true. This worked in single-user scenarios but was catastrophically wrong in parallel contexts. When tests ran concurrently, one test's Stop hook would steal another test's loop file via the fallback scan.
Fix attempt 2: Remove the fallback scan from the Stop hook entirely. Instead, make the prompt-submit hook handle the rename. When the [SynaBun Loop] marker arrives after /clear, the prompt-submit hook renames the state file from the old session ID to the new one. By the time the Stop hook fires, the file is already at the correct path.
Fix attempt 3 (yes, there was a third): Add a recency check and statSync to the prompt-submit fallback scan. Only fresh files (under 5 minutes old) qualify. Stale loop files from finished or crashed loops get ignored instead of hijacking the next session.
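The qualifying check from attempt 3 reduces to something like this sketch, where LoopState and isClaimable are illustrative names and mtimeMs comes from statSync in the real hook:

```typescript
// Fallback-scan guard (sketch): a loop state file qualifies only if it is
// both active and fresh. Stale files from finished or crashed loops are
// ignored instead of hijacking the next session.
interface LoopState {
  active: boolean;
  mtimeMs: number; // from fs.statSync(path).mtimeMs in the real hook
}

const FRESHNESS_WINDOW_MS = 5 * 60 * 1000; // "under 5 minutes old"

function isClaimable(state: LoopState, nowMs: number): boolean {
  if (!state.active) return false; // finished loops never qualify
  return nowMs - state.mtimeMs < FRESHNESS_WINDOW_MS;
}
```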
The Time Cap That Killed Browser Loops
We set MAX_MINUTES to 60 and DEFAULT_MINUTES to 30. Reasonable for quick automation tasks. Completely inadequate for browser automation loops that need to navigate pages, wait for loads, take snapshots, click elements, and type into forms. A single Twitter engagement iteration could take 3-4 minutes. At 30 minutes default, you get maybe 8 iterations before the time cap kills the loop.
Fix: MAX_MINUTES raised to 480. DEFAULT_MINUTES raised to 60. The session ID fallback scan window was also changed from a fixed 5-minute window to the loop's own maxMinutes value, so long-running loops don't orphan their own state files.
The Login Wall Trap
Browser loops hit a Twitter login page. The agent, following its autonomy rules ("if something fails, try an alternative approach, do not stop to ask"), abandoned the SynaBun browser entirely and fell back to WebSearch and WebFetch. It started making raw HTTP requests to Twitter instead of using the browser it was supposed to interact with. The results were useless but the loop kept iterating, burning API credits on garbage.
Root cause: The autonomy instructions were too broad. "Try an alternative" is correct for technical failures — a 500 error, a timeout, a DOM element not found. But a login wall is not a technical failure. It is a human-action blocker. Only the user can log in, solve a CAPTCHA, or complete 2FA.
Fix: Split the autonomy rule into two categories. Technical failures: retry, try alternatives, keep going. Human-action blockers (login pages, CAPTCHAs, 2FA, payment walls): STOP immediately, report to the user, WAIT for resolution. Do not fall back to WebSearch. The browser panel is interactive — the user can handle it, but only if the agent pauses long enough to let them.
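The split can be sketched as a classifier, though in practice the rules live in prompt instructions rather than code. The trigger phrases and function names here are illustrative:

```typescript
// Two-category failure split (sketch).
type FailureKind = "technical" | "human-action";

// Illustrative signals for human-action blockers.
const HUMAN_BLOCKERS = [/log\s*in/i, /sign\s*in/i, /captcha/i, /two-factor|2fa/i, /payment/i];

function classifyFailure(pageText: string): FailureKind {
  return HUMAN_BLOCKERS.some((re) => re.test(pageText)) ? "human-action" : "technical";
}

function nextAction(kind: FailureKind): string {
  // Technical failures: retry or try an alternative approach.
  // Human-action blockers: stop, report, and wait for the user.
  // Never fall back to WebSearch/raw HTTP.
  return kind === "technical" ? "retry-or-alternative" : "stop-and-wait";
}
```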
Week Two: Brain Rot
The loop ran for 50 iterations. For the first 15, quality was excellent. Varied engagement, good tone, no duplicate targets. Then around iteration 20, things started drifting. Repetitive strategies. Same opening lines. By iteration 30, it was double-posting on accounts it had already engaged with. By iteration 40, it had completely forgotten the tone rules and was using dashes in every response — the exact formatting we had explicitly prohibited.
Root cause: compaction amnesia. Claude Code has a context window. When it fills up, the system compacts the conversation — summarizing old messages to free space. After compaction, our Stop hook re-injected the task prompt. But the task prompt was static. It contained the original instructions and nothing else. All the dynamic context Claude had built up — which accounts it engaged with, what strategies worked, what patterns it discovered — was gone. Compacted into a summary that did not include any of it.
This was not a bug in the traditional sense. It was an architectural gap. The loop system was designed for within-session autonomy only. It had zero cross-compaction resilience. No memory integration, no progress tracking, no state that survived the context reset.
The Anti-Brain-Rot System
We built a three-layer solution:
Layer 1 — Loop Journal. A new update action on the loop MCP tool lets Claude write a brief summary after each iteration. These summaries get stored in the state file as a rolling journal (last 10 entries, 200 chars each). The Stop hook injects the last 3 journal entries into every iteration prompt, plus a rolling progress summary. Even after compaction, Claude gets a condensed view of recent history.
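The rolling-journal mechanics are simple enough to sketch directly (appendJournal and recentEntries are illustrative names):

```typescript
// Rolling journal (sketch): last 10 entries, 200 chars each, newest last.
const MAX_ENTRIES = 10;
const MAX_ENTRY_CHARS = 200;

function appendJournal(journal: string[], entry: string): string[] {
  const next = [...journal, entry.slice(0, MAX_ENTRY_CHARS)];
  return next.slice(-MAX_ENTRIES); // drop the oldest entries beyond the cap
}

// The Stop hook injects only the most recent entries into each prompt.
function recentEntries(journal: string[], n = 3): string[] {
  return journal.slice(-n);
}
```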
Layer 2 — Memory Enforcement. Every 5 iterations, the Stop hook blocks the loop until Claude calls remember to persist its accumulated progress into SynaBun's vector memory. This is not optional. The hook literally refuses to continue the loop until the memory call is made. If Claude ignores the directive, the hook retries with increasing urgency. The post-remember hook then updates the loop state to track when the last memory save happened.
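The gate reduces to a small predicate. Field names here are illustrative, not the real state schema:

```typescript
// Memory-enforcement gate (sketch). The Stop hook refuses to continue the
// loop while this returns true.
interface EnforcementState {
  iteration: number;
  lastMemorySaveIteration: number; // updated by the post-remember hook
}

const MEMORY_INTERVAL = 5;

function mustSaveMemory(s: EnforcementState): boolean {
  return (
    s.iteration > 0 &&
    s.iteration % MEMORY_INTERVAL === 0 &&
    s.lastMemorySaveIteration < s.iteration // remember not yet called this window
  );
}
```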
Layer 3 — Compaction Bridge. The pre-compact hook captures the active loop state before compaction starts. The session-start hook checks for this captured state after compaction completes and injects an ACTIVE LOOP RECOVERY block with the full task, journal, progress summary, and a directive to recall stored progress memories. Plus style anchoring at key checkpoints (iterations 1, 10, 20, 30, 40) to re-ground Claude in the original tone and formatting rules.
We also learned a painful lesson about hooks: all three hooks in the chain need to be aware of every session mode. The session-start hook, the prompt-submit hook, and the stop hook each independently inject instructions. When we added loop awareness to only one hook, the other two kept injecting conflicting context. Every new "mode" of operation needs to be wired through all three hooks or things break in subtle, maddening ways.
The /clear Race Condition
This one was invisible for two weeks. The loop driver sends /clear between iterations, waits for the prompt, then sends the next iteration message. But the driver polls on a 2-second interval. By the time it polls, Claude has already finished responding and printed the > prompt. The prompt is sitting in the buffer.
Step 1 of the driver: reset the buffer to empty. Then wait for new output containing the prompt.
See the problem? The prompt was already in the buffer. Resetting it first meant waiting for a new prompt that would never arrive. The driver waited 75 seconds, then declared a prompt detection failure. After 10 consecutive failures, it killed the loop.
Fix: Check the existing buffer for the prompt BEFORE resetting it. If Claude is already at the prompt, skip the wait entirely and proceed to /clear. The buffer reset after sending /clear was left intact — that one correctly needs to wait for post-clear output.
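The order-of-operations fix is tiny but load-bearing. A sketch, with illustrative names (the real driver passes its own prompt detector):

```typescript
// Check the existing buffer BEFORE resetting it. If the prompt already
// arrived between polls, resetting first would wipe it and the driver
// would wait 75 seconds for output that never comes.
function promptCheckBeforeReset(
  buffer: string,
  isAtPrompt: (b: string) => boolean,
): "skip-wait-send-clear" | "reset-and-wait" {
  return isAtPrompt(buffer) ? "skip-wait-send-clear" : "reset-and-wait";
}
```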
The Loop Cross-Session Leak
Three separate code paths in the loop driver and hook fallback scans iterated over ALL loop state files in the data directory without checking terminalSessionId. If you had two terminal sessions open — one running a loop, one doing interactive work — the loop state from one session would leak into the other. The interactive session's Stop hook would pick up the loop state file and start injecting loop iteration prompts into a conversation that had nothing to do with the loop.
Fix: Added ownership guards to all three code paths. The loop driver only attaches to loops whose terminalSessionId matches the current terminal. The prompt-submit hook preserves terminalSessionId through renames. The fallback scan checks ownership before claiming a state file.
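The guard shared by all three paths can be sketched as a single filter (field and function names are illustrative):

```typescript
// Ownership guard (sketch): a loop state file is claimable only by the
// terminal session that owns it.
interface LoopFile {
  terminalSessionId: string;
  active: boolean;
}

function ownsLoop(file: LoopFile, currentTerminalSessionId: string): boolean {
  return file.active && file.terminalSessionId === currentTerminalSessionId;
}

// Applied wherever the code iterates over ALL loop state files: the driver
// attach path, the prompt-submit rename, and the fallback scan.
function claimableLoops(files: LoopFile[], currentTerminalSessionId: string): LoopFile[] {
  return files.filter((f) => ownsLoop(f, currentTerminalSessionId));
}
```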
Week Three: Agents Join the Party
Loops were finally stable. Time to add agents — background Claude Code processes launched from the Neural Interface's Automation Studio. You configure a task, pick a model, enable SynaBun tools, and launch. The agent runs autonomously in the background.
Nothing worked.
The DOM Timing Bug
Agents launched from Automation Studio always had synabun=false and browser=none. No MCP tools, no browser. Every agent was useless for its intended purpose.
Root cause: In confirmLaunch(), the function called closeLaunchDialog() — which calls .remove() on the dialog element — BEFORE reading DOM values. Five document.getElementById() calls were reading from a destroyed DOM tree. withSynabun returned undefined (element gone), which became false. The submode selector returned null, falling back to "single" mode. Iterations defaulted to 1. Max minutes defaulted to 30.
Every single configuration option was silently lost.
Fix: Move all 5 DOM reads before closeLaunchDialog(). Capture values while the dialog exists, then close it.
The CLAUDE.md Boot Sequence Hijack
With the DOM fix in place, agents finally launched with SynaBun tools. But they spent their first 4 turns executing the CLAUDE.md boot sequence — greeting the user, recalling recent sessions, recalling communication style preferences — instead of doing the task. The agent was launched as a background automation. There is no user to greet. There are no communication preferences to load. It was burning expensive API turns on ceremony.
Then it would output text like "Let me navigate to Twitter..." without actually calling any tools. In --print mode, a text-only response (no tool_use blocks) terminates the session. The agent said what it was going to do, then died before doing it.
Fix: Added a defaultAgentPrompt system prompt that overrides CLAUDE.md boot behavior. Explicit instructions: skip the greeting, skip the recalls, go straight to the task. Chain ToolSearch with actual tool calls in the same response — never output text without a tool call. Bumped maxTurns from 25 to 200 for browser automation headroom.
Session Continuity Caused Compaction
Agents used --resume to maintain context across iterations, with a SESSION_CONTINUITY_WINDOW of 5 (resume the same session for 5 consecutive iterations). This meant context accumulated over 5 iterations before resetting. For browser automation agents making 10-15 tool calls per iteration, this filled the context window fast. Compaction kicked in. The agent hung.
Fix: Fresh session EVERY iteration. No --resume ever. Generate a new randomUUID() for each iteration. Each iteration gets the full task re-injected plus journal entries from previous iterations. Slightly more ToolSearch overhead per iteration, but eliminates compaction risk entirely. The journal provides sufficient continuity.
Zero-Tool-Call Iterations
In --print mode, text-only responses terminate the session with exit code 0. The agent controller saw exit code 0 (success) and moved to the next iteration. But the iteration had accomplished nothing — the model talked about what it would do without doing it.
Fix: Track tool call count per iteration. If an iteration produces 0 tool calls with exit code 0, retry once with a forceful prompt: "You MUST call tools. Do not describe what you will do. Do it."
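A sketch of the guard, with illustrative names (the real controller drives a --print subprocess and counts tool_use blocks from its output):

```typescript
// Zero-tool-call guard (sketch).
interface IterationResult {
  exitCode: number;
  toolCalls: number;
}

// Exit 0 with zero tool calls means the model narrated instead of acting.
// Retry once with a forceful prompt; after that, give up on the iteration.
function shouldRetryForcefully(r: IterationResult, alreadyRetried: boolean): boolean {
  return r.exitCode === 0 && r.toolCalls === 0 && !alreadyRetried;
}

const FORCE_PROMPT = "You MUST call tools. Do not describe what you will do. Do it.";
```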
The Compaction Priority War
Even with fresh sessions per iteration, regular loops (driven by the Stop hook) still hit compaction issues. The Stop hook had a priority system: CHECK 1 was compaction (highest priority), CHECK 1.5 was loop continuation. When a loop was active and compaction was pending, the compaction check won every time. The loop stalled while compaction ran, and sometimes never recovered.
Fix: Added a PRE-CHECK that runs before all other checks. Quick scan for active loops. If a loop is active, skip the compaction check entirely and clean up any pending-compact flags. The loop takes priority over compaction because fresh-per-iteration already prevents context bloat — compaction is not needed.
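The resulting priority order can be sketched as one dispatch function (names are illustrative, not the real hook's API):

```typescript
// Stop-hook priority (sketch): the PRE-CHECK for active loops runs before
// CHECK 1 (compaction) and CHECK 1.5 (loop continuation).
interface HookContext {
  activeLoop: boolean;
  pendingCompact: boolean;
}

function nextCheck(ctx: HookContext): "continue-loop" | "compact" | "idle" {
  if (ctx.activeLoop) {
    // Fresh-per-iteration sessions already bound context growth, so an
    // active loop skips compaction (and clears any pending-compact flag).
    return "continue-loop";
  }
  if (ctx.pendingCompact) return "compact";
  return "idle";
}
```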
What We Learned
Building autonomous AI systems is not a software engineering problem. It is a state management problem across an unreliable substrate. Claude Code is a black box with a terminal interface. You cannot inspect its internal state. You cannot predict when compaction will fire. You cannot control the model's behavior deterministically. The only tools you have are hooks (which fire at specific lifecycle points), state files (which you manage yourself), and prompt engineering (which works until it does not).
Every assumption we made about reliability was wrong:
- "The prompt will always be detectable" — wrong. ANSI escape sequences, buffer sizes, and timing all conspire to hide it.
- "Session IDs are stable" — wrong. /clear generates a new one. So does compaction recovery.
- "Context survives compaction" — wrong. Compaction is lossy. Dynamic state is the first thing to go.
- "The model follows instructions" — wrong. It follows instructions until the context gets crowded, then it drifts. Style rules are the first casualty.
- "A working prototype means the system works" — catastrophically wrong. Every bug only manifested after iteration 3, or after 20 minutes, or after compaction, or under concurrent load.
The final architecture has five hook files, a server-side loop driver, an agent controller with retry logic, a journal system, memory enforcement, compaction bridging, ownership guards, and force-send escalation for when prompt detection fails three times in a row. It is complex because the problem is complex. We tried simpler approaches first. They all died after 3 iterations.
The loop system has been running production social media engagement campaigns for over a week now. 50-iteration browser automation loops that survive compaction, handle login walls gracefully, maintain context across hours of execution, and produce consistent quality from iteration 1 to iteration 50.
It works. It just took three weeks of hell to get here.
All the bugs, fixes, and architectural decisions documented above are stored in SynaBun's persistent memory — 21+ entries across multiple categories. Every time we touch the loop system, the AI that built it remembers exactly what went wrong last time. That is the whole point of persistent memory. Your AI should learn from its mistakes so you do not have to re-explain them.
We certainly had to explain them enough times the first time around.