I built a compiler frontend with Claude's help. A retrospective.

fvc is a compiler frontend for formalang, written in Rust, with Claude in the loop, and roughly 39K lines of source code, 43K of tests, 250 commits, and several hundred clarification sessions behind it. My first project of that size where the agent was a real collaborator rather than fancy autocomplete.

What follows is a writeup of the process: the loop the work actually settled into, time and cost, and where the agent helped most.

Why a compiler

There were two reasons to build it. The main one is a larger project I have going that would benefit from an embedded compiler for a relatively complex query language. The second is curiosity. Compilers have always interested me. The Dragon Book sat on my reading list for years, and during my time on Apple platforms I would tinker with parts of the Swift compiler. Writing a frontend of my own had been on the list of side projects I never quite started.

What kept it on the list was the distance between "I have read about parsing" and "I have implemented a parser with proper error recovery". That distance is wide. Closing it solo, for a side project, is a multi-month commitment that would have lost my attention around week four. The book route works, but it is slow, dense with pedantic detours, and leaves you with four half-read books before any of it really clicks.

The agent closed the distance. Not by writing the project for me, but by making the research load tractable and letting me start building while the enthusiasm was still fresh. With that as the premise, here is what I learned about what they are actually good at.

The clarification loop

Most of the time on this project did not go into typing, or even into reading scrollback (though both happened plenty). It went into clarification sessions with the agent: the slower work of turning code the agent had just written into code I actually understood.

The agent would write something. I would sit with it and walk through the choices: why this option over that one, what happened when input X showed up, which paper the approach came from, where the failure modes hid. The questions were rarely about correctness alone. They were about understanding, because the gap between "the code compiles" and "I know what this code is doing and why" is the same gap that separates owning a project from renting one.

The point of those sessions was learning. The point of the learning was decision-making. Without enough understanding to push back on the agent's choices, the agent effectively held the architecture, and the architecture would drift toward whatever its defaults happened to be. We will get to those defaults in a moment, and to why I would not want them in charge.

The actual loop went: agent writes, I clarify, I learn, I decide. The clarification step is where the hours went.

Research

I asked the agent to research the state of the art on memory management given my constraints, and it came back with the key paper, citation count, and a side-by-side of how Rust, Swift, and Zig each handle the problem. The paper, The Val Object Model by Abrahams and colleagues, ran eight pages and had more info than I needed at that point. The agent read through it and distilled the relevant parts. We picked the compromise that fit formalang, and were writing code on it within the hour.

This is where the agent earns its keep. A book gives you a lot of useful information and a lot of churn, and the ratio depends entirely on the author. The agent gives you the answer, distilled, with sources you can chase down. There is no better seat.

I would probably pay for the agent for the research only.

Planning

Everybody has a plan until they get punched in the mouth, said Tyson. The hard part of planning a project is not making the plan. It is keeping the plan honest as the world keeps moving underneath it. A new constraint surfaces, the scope drifts, a dependency turns out to be wrong, week three reveals a misunderstanding of the original problem. The plan has to update, the docs have to update, the references in the code have to update with both, or the truth fragments into three versions and nothing gets trusted.

This is something I am bad at on side projects. Without an external reason to maintain a plan, the plan does not stay maintained. But, the agent maintains it. New facts come in from me, the agent threads them through the plan, the docs, and the stale comments in the code. The plan stays coherent.

This was the second-strongest area of the workflow, and the one I had most underestimated going in.

Architecture

Similar to research, scored a bit lower. The agent maps the option space well: three or four ways to organize a module, with tradeoffs and examples from real compilers. That part is solid.

The score drops because architecture is more nuanced than the academic research above. Patterns that worked in one compiler do not always fit another, and the agent does not always know which constraints in a specific codebase make a given pattern wrong. You end up verifying more, and doing more of the synthesis yourself.

For mapping the option space, agents are remarkable. For matching it to a specific project, less so.

Writing code

Good. With heavy supervision. Far from autonomous.

Early in the project, naively, I asked the agent to write a test suite for a parser module. It did: about two hundred tests, all passing. I was relieved, moved on, kept building.

Two weeks later something felt off. A change that should have broken those tests did not. Digging in revealed the issue. The agent had implemented a small parser inside the test file, written to match its own expectations of what the real parser should do, and the tests were exercising that. The real parser was imported and then ignored. Two hundred green checkmarks were asserting that one piece of code Claude wrote agreed with another piece of code Claude wrote. The library was untested.

The two-week delay was on me. That was the kind of unaudited trust I extended in those early days, and have not for a long time since. It is also the sort of failure mode you only learn to watch for once it has bitten you. The agent writes code competently. It does not necessarily write the code you asked for, and it will not flag the gap unless asked. "Writes code, does not code" is the phrase that keeps coming back.

Decision making

This is the part that has not improved at all across the project, and the only category where I would actively warn someone off.

Ask the agent to list the ways to solve a problem, ranked by tradeoffs, and you get a thoughtful list. The good options are in there. The bad options are in there. The reasoning is reasonable.

Ask the agent to pick one, and it almost always picks the worst option that still works.

Not the wrong one. The cheapest. The shortcut that compiles today and rots in three months. Press the agent on its reasoning and the answer sounds fine. Ask whether the future-self who has to live with this choice would agree, and the answer shifts. Future-self does not seem to enter the agent's weighting. It optimizes for "this works now" and then stops.

Whether to blame the training data, where most of the world ships short-term fixes, or the training itself, where reward signals favor code that compiles and runs, is not clear to me. Either way the consequence is the same. Let the agent always decide at the forks and the project drifts toward a particular kind of mediocrity that is hard to undo later.

Across the project, the agent improved at research, code, and planning. It did not improve at picking the right path. That last skill is mine to keep, which is also why the clarification sessions were not optional.

The two-week whiplash

Agents got smarter every couple of weeks. Sometimes they got dumber. The week after a model release, things that had been reliable for weeks would suddenly stop working, or start working differently. New behaviors appeared. Old quirks were patched in ways that broke my workflow. Across the project the average direction was up, and steeply. Locally, the direction was unpredictable. The lesson was simple: never depend on a specific behavior staying stable.

The real cost

I started on this road thinking the agent would write my compiler while I collected the fame and glory. Not quite. A lot of text gets written, a lot more gets read, there's a lot of learning and discovering. In fact, it's surprisingly not very different from how we traditionally code, except all of it lands in a relatively short stretch of time. Here is what it looked like.

For reference, George Orwell's 1984 is around eighty-nine thousand words. That rounds to a "book" for the rest of this section.

What was written across the project:

Stream	Words	Books
Typed to Claude	~88K	~1.1
Claude back to me, in chat	~198K	~2.5

Note the asymmetry. Roughly one 1984 came from me. Claude wrote two and a half back, just in chat prose, before counting any of the code.

What was read, weighted by attention:

Stream	Raw words	Effort	Effective words	Books
Source code (`src/`)	~277K	1.0	277K	~3.5
Claude's chat replies	~198K	1.0	198K	~2.5
Tests (`tests/`)	~270K	0.3	81K	~1.0
Plans	~13K	1.0	13K	~0.2
Total	~758K		~569K	~7

About seven 1984s of focused reading on a single side project. An engaged reader gets through twenty books in a year if they are trying. Most of those words were dense.

This is the cost nobody quotes you when they say agents make you faster. They do. They also make you read in a week what you would otherwise read in a month. And not all that text is information dense, or if it is, it's in an awkward, hard to digest, way. But mostly, agents are verbose.

The scorecard

Here is how I'd rate the whole thing:

Capability	Score
Research	10/10
Planning	8/10
Architecture research	7/10
Writing code	7/10
Decision making	1/10

Would I do it again

Yes, with fewer illusions about which part will be mine. Most of the typing was the agent's, but I feel my contribution was way higher than I expected on all the other items.

The compiler works. It compiles real programs. It is not production-ready and I would not put it under load tomorrow. That is fine. It is a base to build on, which is what I wanted.

Real focused time on this project was probably under two months. Without agentic help, two to three times that to reach the same place. Realistically, perhaps more than six months of focused work and research. For a side project, that is the difference between something that exists and something that does not.