We Skinned Claude Code. It Took 6 Days and Our Sanity.

Claude Code lives in the terminal. It renders with React and Ink, a framework that draws UI components to stdout using ANSI escape codes and a WASM-compiled Yoga layout engine. It is beautiful in the way a brutalist building is beautiful: functional, opinionated, and absolutely not interested in your preferences.

We wanted to put it in a browser. A real GUI. Chat bubbles, tool cards, permission prompts, a thinking indicator that does not vanish mid-thought. What followed was six days of debugging undocumented flags, fighting platform-specific spawn behavior, watching the same bug come back from the dead three times, and learning that the distance between "working prototype" and "actually works" is measured in profanity.

Day 1: Research, or How We Found Out This Was a Bad Idea

The first question was obvious: can you skin Claude Code? The answer was no. The terminal UI is not a theme layer you can swap out. It is the application. React components rendering to an Ink instance that writes directly to your terminal buffer. There is no "just change the CSS." There is no CSS.

We launched six research agents in parallel. They dissected Claude Code's internals, analyzed the Agent SDK streaming protocol, and reviewed nine community GUI wrappers that had already attempted some version of this: Claudia (Tauri, YC-backed), CodePilot (Electron/Next.js), Claudex (FastAPI/React), and six others. Every single one had taken a different approach. None of them had solved the permission round-trip problem cleanly.

Three options emerged. Build a new Tauri or Electron app from scratch. Fork Claude Code and rip out the Ink renderer. Or extend our existing Neural Interface, which already had an Express server, WebSocket infrastructure, terminal sessions, and a file explorer running on port 3344.

We went with option three. Not because it was the best architecture. Because it was the one where we did not have to start from zero.

Day 2: "It Should Just Spawn a Process"

The plan was simple. User sends a message through a WebSocket. Server spawns claude as a subprocess with --output-format stream-json. Parse the NDJSON events. Forward them to the browser. Render chat bubbles.
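
The parsing step is where a naive version of that plan tends to break, because stdout chunks do not line up with line boundaries. Here is a minimal sketch of a line-buffered NDJSON reader, assuming the subprocess emits one JSON object per line. The names createNdjsonParser, onEvent, and onBadLine are hypothetical, not taken from the SynaBun code.

```javascript
// Hypothetical helper: turns arbitrary stdout chunks into parsed NDJSON events.
// Assumes the stream was switched to strings first, e.g.
// child.stdout.setEncoding("utf8"), so multi-byte characters are not split.
function createNdjsonParser(onEvent, onBadLine = () => {}) {
  let buffer = ""; // holds the trailing partial line between chunks

  return function feed(chunk) {
    buffer += chunk;
    let newlineIndex;
    while ((newlineIndex = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newlineIndex).trim();
      buffer = buffer.slice(newlineIndex + 1);
      if (line === "") continue; // skip blank lines

      let event;
      try {
        event = JSON.parse(line);
      } catch (err) {
        onBadLine(line, err); // report instead of crashing the stream
        continue;
      }
      onEvent(event);
    }
  };
}
```

In use, something like child.stdout.on("data", feed) would forward each parsed event to the WebSocket. Reporting bad lines instead of dropping them is a deliberate choice here, given how many failures in this project were silent.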

The process did not spawn.

On Windows, claude is not a binary. It is a .cmd wrapper that Node's spawn() cannot execute without shell: true. The error was a bare ENOENT. No readable "file not found." No "wrong platform." Nothing that pointed at the wrapper. The spawn failed quietly and the WebSocket sat there waiting for events that would never arrive.

We added shell: true. Then we discovered the environment variable problem. Claude Code checks for VSCODE_* and TERM_PROGRAM vars to detect if it is running inside an IDE. Our server process inherited all of them. The nested Claude instance thought it was inside VS Code, which changed its behavior in ways we could not predict. We had to strip every interfering env var before spawning, matching the exact filtering pattern our terminal spawner already used.
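
A rough sketch of those two fixes together, under a few assumptions: the strip list covers only the VSCODE_* and TERM_PROGRAM* variables named above (the real filter may be longer), and shell is enabled only on Windows, whereas the post just says shell: true. buildChildEnv and spawnClaude are hypothetical names.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical filter: copies the parent environment minus IDE-detection vars.
// Keys are compared upper-cased because Windows env names are case-insensitive.
function buildChildEnv(parentEnv) {
  const env = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    const upper = key.toUpperCase();
    if (upper.startsWith("VSCODE_") || upper.startsWith("TERM_PROGRAM")) continue;
    env[key] = value;
  }
  return env;
}

// Hypothetical spawner: the .cmd wrapper needs a shell on Windows.
function spawnClaude(args, parentEnv = process.env, platform = process.platform) {
  const child = spawn("claude", args, {
    env: buildChildEnv(parentEnv),
    shell: platform === "win32",
    stdio: ["pipe", "pipe", "pipe"],
  });
  // Node reports spawn failures such as ENOENT through the "error" event.
  child.on("error", (err) => {
    console.error(`claude failed to start: ${err.code || err.message}`);
  });
  return child;
}
```

Logging the error event is what turns a quiet hang into a visible message. Note that with a shell enabled, Node passes the arguments to the shell without escaping them, so anything user-controlled should stay out of the args array.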

Then the subprocess hung. It spawned, it connected, it loaded its MCP server config. And then it sat there doing nothing, because --strict-mcp-config was not set and the MCP server initialization was blocking the process. Another silent failure. Another hour gone.

We also discovered, by accident, that claude -p --output-format stream-json silently requires the --verbose flag if you want tool call events. Without it, you get the final result and nothing in between. This is not documented anywhere. We found it by reading the source.

Day 3: Making It Look Like Something

Once events were actually flowing, we had to render them. The stream-json format emits events like system.init, assistant (with text and tool_use content blocks), tool_result, and result. Each one needed its own renderer.
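
One way to wire "each event gets its own renderer" is a lookup table. This sketch assumes each event carries a string type and sometimes a subtype, so "system.init" is modeled as type "system" plus subtype "init". That shape, and the createDispatcher name, are illustrative assumptions.

```javascript
// Hypothetical dispatcher: routes each parsed event to a renderer by type.
// A "type.subtype" key wins over a plain "type" key when both exist.
function createDispatcher(renderers, onUnknown = () => {}) {
  const table = new Map(Object.entries(renderers));

  return function dispatch(event) {
    const specific = event.subtype ? `${event.type}.${event.subtype}` : null;
    const render = (specific && table.get(specific)) || table.get(event.type);
    if (typeof render !== "function") {
      onUnknown(event); // surface unhandled events instead of dropping them
      return false;
    }
    render(event);
    return true;
  };
}
```

The onUnknown hook matters more than it looks. An event type nobody wrote a renderer for should show up in a log, not vanish.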

The first version looked like a terminal pretending to be a chat app. Monospace everything, no visual hierarchy, tool calls dumped as raw JSON. So we rewrote the CSS from scratch. ChatGPT-style layout: centered column, max-width 780px, avatar plus content for assistant messages, pill-shaped user bubbles. Gold accent because we were building SynaBun and gold is the brand color.

The user hated it.

Not just the gold. The whole palette. The gold gradients on the send button, the gold-tinted scrollbar, the gold borders on tool cards, the gold box-shadow on input focus. All of it. Then we tried blue accents. Hated that too.

Strip the color. Make it neutral. The panel should blend with the Neural Interface, not scream its own identity.

So we ripped out every accent color and rewired the entire panel to use the Neural Interface's own CSS variable system. --t-faint, --s-subtle, --b-hover. Neutral grays everywhere. The only colored element left was the brand "S" icon: a single gold gradient in a sea of monochrome. That was the right call. It took three complete palette rewrites to get there.

Day 4: The Binary That Ate the JSON Stream

Someone attached a JPEG through the paperclip button.

The file attachment system used file.text() to read the file content and injected it into the prompt via <file> tags. This works fine for code files, markdown, plain text. For a JPEG, file.text() returns garbled binary garbage that gets concatenated into the prompt string, piped into stdin, and immediately corrupts the NDJSON stream. The subprocess chokes. The WebSocket dies. The UI freezes.

The fix was a whitelist. About 70 text and code file extensions that are safe to read as text. Everything else gets rejected with a status message. We also added a 100KB size limit and moved file content before the user's text in the prompt, so Claude sees the context first. Nine new tests. The kind of bug that makes you wonder how it ever worked in the first place, until you realize nobody had tried attaching an image yet.
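
The shape of that fix, roughly. This is a sketch with a short illustrative allowlist standing in for the real list of about 70 extensions. It reads 100KB as 100 * 1024 bytes and uses a guessed name attribute on the file tag. All three are assumptions, as are the function names.

```javascript
// Assumption: a small sample allowlist; the real one has about 70 entries.
const TEXT_EXTENSIONS = new Set([
  "txt", "md", "json", "js", "ts", "css", "html", "py", "yml", "yaml",
]);
const MAX_ATTACHMENT_BYTES = 100 * 1024; // assumption: "100KB" means 102400 bytes

// Decide from name and size alone, before any content is read as text.
function validateAttachment(fileName, sizeBytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!TEXT_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported file type: ${ext ? "." + ext : "(none)"}` };
  }
  if (sizeBytes > MAX_ATTACHMENT_BYTES) {
    return { ok: false, reason: "File exceeds the 100KB limit" };
  }
  return { ok: true, reason: "" };
}

// File content goes before the user's text so the model sees the context first.
// The <file name="..."> shape is hypothetical; the post only says "<file> tags".
function buildPrompt(files, userText) {
  const blocks = files.map((f) => `<file name="${f.name}">\n${f.text}\n</file>`);
  return [...blocks, userText].join("\n\n");
}
```

In the browser this would run as validateAttachment(file.name, file.size) before file.text() is ever called, so a JPEG is rejected without its bytes touching the prompt.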

Day 5: The --print Flag Haunting

This one deserves its own section because it happened three times.

Claude Code has a --print flag (short form: -p). It runs Claude in non-interactive mode: send one prompt, get one response, exit. It also auto-approves every tool call. No permission prompts. No control_request events. The CLI just does whatever it wants.

For a skin that needs to show "Allow / Deny / Always" permission cards, this is fatal. If --print is in the spawn args, the permission UI is dead code. Fully implemented, fully styled, never triggered.

March 11: We noticed permission prompts were not appearing. Found -p in the args array. Removed it. Permissions worked. Celebrated.

March 12: Permissions stopped working again. Someone had added --print back with a comment that said // --print is REQUIRED for stream-json. This comment was wrong. --input-format stream-json handles stdin piping on its own. The comment was written with absolute confidence and was absolutely incorrect.

The "someone" was the AI. In a different session, working from a different plan, it had concluded that --print was necessary based on the CLI docs saying stream-json "only works with --print." That documentation was describing a different use case. The AI did not verify. It just added the flag and wrote a convincing comment.

March 13: Same bug. Third time. The definitive fix was removing --print and adding a comment so aggressive it could not be misunderstood:

DO NOT add --print / -p here. It forces non-interactive mode which auto-approves ALL tool uses and suppresses control_request events. This bug was fixed 3 times. See CHANGELOG.

Three sessions. Three plans. Two of them contradicted each other. One said "add --print, it is required." The other said "remove --print, it breaks permissions." Both were written by the same AI, days apart, with equal confidence. This is what building with AI actually looks like. Not the polished demo. The part where your own tool gaslights you across sessions.
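
A comment can be ignored. A thrown error cannot. Here is a sketch of turning that warning into a guard, assuming the args are built in one place. The flag list is pieced together from the flags mentioned in this post, not copied from the real spawn call, and buildClaudeArgs is a hypothetical name.

```javascript
// Flags that must never reach the spawn call (see the warning comment above).
const FORBIDDEN_FLAGS = new Set(["--print", "-p"]);

// Hypothetical args builder. The base list is an assumption assembled from
// the flags this post mentions; verify it against your CLI version.
function buildClaudeArgs(extraArgs = []) {
  const args = [
    "--output-format", "stream-json",
    "--input-format", "stream-json",
    "--verbose",
    "--strict-mcp-config",
    ...extraArgs,
  ];
  const offender = args.find(
    (arg) => FORBIDDEN_FLAGS.has(arg) || arg.startsWith("--print=")
  );
  if (offender) {
    throw new Error(
      `Refusing to spawn with ${offender}: it disables permission prompts. See CHANGELOG.`
    );
  }
  return args;
}
```

A unit test that calls buildClaudeArgs(["-p"]) and expects a throw would have failed the moment another session added the flag back.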

The Control Response Format War

Even after fixing the --print flag, permission responses did not work. The user clicked "Allow." Nothing happened. Claude sat there waiting.

The server was normalizing the control_response message into a flat format: {"type": "control_response", "request_id": "...", "response": {"behavior": "allow"}}. Clean. Logical. Wrong.

The CLI reads q.response.request_id and q.response.response. It expects a nested format where everything lives inside a response wrapper with a subtype: "success" field. With the flat format, q.response.request_id resolved to undefined, the pending request lookup failed, and the response was silently dropped.
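
Side by side, the difference is one level of nesting. This sketch builds the nested shape as described above. It is inferred from that description, not verified against any particular CLI version, and buildControlResponse and toStdinLine are hypothetical names.

```javascript
// Hypothetical builder for the nested control_response described in the post.
// Everything the CLI looks up lives inside the outer `response` wrapper.
function buildControlResponse(requestId, behavior) {
  return {
    type: "control_response",
    response: {
      subtype: "success",
      request_id: requestId,
      response: { behavior }, // e.g. "allow", as in the post's example
    },
  };
}

// NDJSON framing: one JSON object per line on the subprocess's stdin.
function toStdinLine(message) {
  return JSON.stringify(message) + "\n";
}
```

With this shape, a lookup like q.response.request_id finds the id. With the flat shape it reads undefined, which is exactly the failure described.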

We found this by reading the actual SDK source code. Line 13436 of cli.js. Not the docs. Not the examples. The minified source. That is the level of archaeology this project required.

Day 6: Thinking About Thinking

Claude Code has extended thinking. You can set effort levels: low, medium, high, max. We built a toggle button in the toolbar. A little lightning bolt that cycles through levels with progressively lit dots. The first version used an emoji bolt with filter: saturate() CSS hacks. Ugly. Replaced it with an SVG that inherits currentColor like a civilized icon.

The colors were too saturated. Purple glow on max level. Pulsing animation. Scale transforms on the icon. One look and it was clear: tone it down. So we muted everything. Removed the glow, the pulse, the scale. Soft blue to muted violet across the levels. Understated.

Then we discovered that thinking blocks were invisible. The renderAssistant() function only extracted text and tool_use content blocks. Thinking blocks were silently dropped. When effort was set to high or max, Claude was thinking extensively and the user saw nothing.

We added thinking block extraction and rendered them as collapsible <details> elements. The first design used a purple left-border accent. It looked bad. We redesigned it to match the existing tool-card pattern exactly: same CSS tokens, same border-radius, same hover transitions, same chevron rotation. If it looks like it belongs, it belongs.
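
A stripped-down sketch of that extraction, assuming content blocks shaped like { type: "text", text }, { type: "thinking", thinking }, and { type: "tool_use", name }. The class names and the renderAssistantContent function are made up for illustration. The real renderAssistant() does more.

```javascript
// Escape text before it is placed inside HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical renderer: one HTML fragment per content block.
// Thinking blocks become collapsible <details> cards instead of being dropped.
function renderAssistantContent(blocks) {
  return blocks
    .map((block) => {
      switch (block.type) {
        case "text":
          return `<p>${escapeHtml(block.text)}</p>`;
        case "thinking":
          return (
            `<details class="thinking-card"><summary>Thinking</summary>` +
            `<pre>${escapeHtml(block.thinking)}</pre></details>`
          );
        case "tool_use":
          return `<div class="tool-card">${escapeHtml(block.name)}</div>`;
        default:
          return ""; // unknown block types render nothing in this sketch
      }
    })
    .filter((html) => html !== "")
    .join("\n");
}
```

The explicit switch is the point. The original bug was a renderer that only knew two block types and quietly ignored the third.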

The thinking indicator had its own problem. hideThinking() was called on the first streaming assistant event, which removed the activity dots while Claude was still working. In the real CLI, the spinner persists throughout the entire turn. We had to invent a repositionThinking() pattern that moves the indicator below new content as messages stream in, with a persistent timer counting seconds. Only finish() kills it.
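
The reposition trick leans on a DOM detail: appendChild on a node that is already attached moves it instead of cloning it. Here is a minimal sketch of the lifecycle, with the node factory and clock injected so it can run outside a browser. createThinkingIndicator and its method names are hypothetical, and the real version also updates a visible seconds counter.

```javascript
// Hypothetical indicator that survives for the whole turn.
// `container` needs appendChild/removeChild; `makeNode` creates the indicator element.
function createThinkingIndicator(container, makeNode, now = Date.now) {
  let node = null;
  let startedAt = 0;

  return {
    show() {
      if (node) return; // already visible for this turn
      startedAt = now();
      node = makeNode();
      container.appendChild(node);
    },
    // Call after each streamed message is appended to the container.
    reposition() {
      if (node) container.appendChild(node); // moves the node to the end
    },
    elapsedSeconds() {
      return node ? Math.floor((now() - startedAt) / 1000) : 0;
    },
    // The only method that removes the indicator.
    finish() {
      if (!node) return;
      container.removeChild(node);
      node = null;
    },
  };
}
```

In the browser, makeNode could be as small as () => document.createElement("div"). Streaming handlers only ever call reposition(), so no early event can hide the indicator by accident.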

The Actual Scorecard

Six days. Fourteen files modified across two complete UI surfaces. Two full CSS rewrites. Three color palettes rejected. A file attachment system that had to learn what a binary file is. A permission flow that broke and was fixed three separate times because the AI kept contradicting its own previous fixes. A thinking toggle that went through four visual iterations. A control_response format that required reading minified SDK source to get right. A subprocess spawn that silently failed on Windows for three different reasons.

The final skin has: a standalone chat page with project and branch selectors, a slide-in side panel for the 3D view, model switching, cost tracking, file attachments with validation, permission prompt cards with Allow/Deny/Always, extended thinking with effort control, slash command support, partial message streaming, keyboard shortcuts, and a resizable layout. It uses zero frameworks. Vanilla JS, no build step, no bundler. Every line of CSS was handwritten and argued about.

What We Actually Learned

Undocumented behavior is the norm, not the exception. The --verbose flag requirement, the way --print suppresses permission prompts, the nested response format, the Windows .cmd wrapper, the MCP config blocking. None of this was in any guide. It was all discovered by spawning the process and watching it fail.

AI-generated plans contradict each other across sessions. The same model will write two plans for the same problem that recommend opposite solutions, both with total confidence. If you are using AI to build, you need a memory system that tracks decisions and their reasoning. We had one. It still was not enough to prevent the --print flag from being re-added twice.

Color is a minefield. What looks right in isolation looks wrong in context. What looks right at a 420px panel width looks wrong at a 22% width. What looks right with one message looks wrong with twenty. Every visual decision was made, unmade, and remade in the browser with real content.

And the gap between "the feature exists in code" and "the feature actually works end-to-end" is where most of the time goes. The permission UI was fully implemented on Day 3. It did not actually work until Day 6. Every component was correct in isolation. The integration was where everything fell apart.

The Claude Code skin ships with SynaBun's Neural Interface. It is not a fork, not a wrapper, not a separate app. It is the same Express server, the same WebSocket, the same port. Just a different way to look at the same conversation. Whether that was worth six days and a measurable decline in mental health is a question we are not ready to answer yet.