The Last Carpenter

It’s late, I’m on my second decaf, and I’ve been putting off adding a terminal to Kale for days, convinced it’ll be the usual slog of incompatible libraries, Stack Overflow spelunking, and documentation rabbit holes. Then I remember: none of that applies anymore. I make a branch, chat with Claude Code for five minutes, and alt-tab away. Thirty minutes later, the agent dings. I open the app expecting wreckage. Instead: a fully working terminal, custom system prompt loaded, shift-return handling correct, graceful exit, the whole thing. All I had to do was describe what I wanted.

That was… uneventful? My previous conditioning was wrong. The correct answer now is always “sure, why don’t we try that, it’s literally a freeroll.”

I studied computer science for four years in college, then studied software engineering in graduate school. I’ve read hundreds of books. Thousands upon thousands of hours of practice. Nearly two decades of carving ornate tables and chairs by hand.

Now I press a button and the table appears.

Five minutes of talking and thirty minutes of waiting. That’s what two decades of expertise has been compressed into. The terminal worked because I knew exactly what to ask for. The builder has become the boss.

I built Kale — a writing tool, designed around how I actually write — and it rewired how I think about building. Not because it failed. Because it worked so well, it made me confront the question I’d been avoiding: now that the craft I spent my career learning is transforming, what’s the new craft?

Software developers have been through this before. We’ve been unlearning things we prided ourselves on since the ’70s. We laugh at the “skill” of sequencing punch cards. But when it’s your turn, it hits different. I am the carpenter who has lost his craft. The question is whether I can also be the architect who directs the machines.

Kale was my attempt to find out.

On Kale, by Kale, in Kale

This article is a true dogfood. I composed it in Kale. Kale prompted this article, Kale helped me structure it, and Kale enabled the finishing touches.

And through that process, a beautiful cycle emerged: The more I used Kale -> the more I iterated on Kale -> the more I used Kale. The software iterated itself.

A screenshot of Kale

Functionally, Kale is a document editor with a built-in comment system, designed around my personal workflow for writing articles. Technologically, it’s what you’d get if you threw Cursor, Claude Code, Google Docs, and Git for desktop into a blender and optimized the smoothie to my particular taste.

Everything I’ve worked on for the last fourteen years has been zero-to-one. Blank canvas, figure out what the experience should feel like, ship it. At Freckle, we helped teachers differentiate math lessons without drowning in the volume of customized lessons. At Double Dusk, we built co-op gaming from first principles. Caltsar was meeting accountability for Google Calendar. With Tutorbox, we launched AI-powered English tutoring before most people had even heard of ChatGPT. At ConceptualHQ we launched early LLM experiments around SEM campaign management. Every time, the same loop: understand what someone actually needs, find the right shape for it, build the thing, gather feedback, iterate relentlessly.

Inspired by Peter Steinberger of OpenClaw and Boris Cherny of Claude Code, I built Kale to feel (in my hands, not in the abstract) what it’s like when an agent builds your software for you. I wrote every line of code for my first startup’s MVP. Artisanal. Handmade. My hands. I have now written zero lines of code in almost a year and I’m over 100x faster than I was three years ago. Let’s see how far the most expensive tools on the market can take us.

The Genie and the Rule

When creating Kale, I held myself to the YouTube Rule, coined by my friend Finbarr Taylor: “At all times the agent is coding, it must require so little attention from me that I can be watching YouTube.” No peeking. No fidgeting with something it’s working on. Full delegation.

Years of managing engineers taught me: Don’t delegate by looking over someone’s shoulder. Set context, define success, and get out of the way. The YouTube Rule is that instinct, formalized.

As an added benefit, I get to watch YouTube.

The New Craft

The YouTube Rule required a completely different relationship to building. I started by going after the scariest assumptions first – the stuff without correct answers, stuff which I had no idea if the agent could even pull off – and validated them as fast as possible. Experiment, play, don’t lock anything in.

You stop thinking about code and start noticing where the thing feels wrong. Where the UX is clunky. Where the architecture is getting in the agent’s way. It’s more like sculpting than engineering.

One thing I carried over from game development was the obsession with feel. In games, you care about the emotional impact of every interaction. Most general software doesn’t bother: it’s all function, zero feeling. For Kale, I wanted the canvas to look pretty, to make me want to write, and to feel good when the agent reviewed my work or workshopped a passage.

At Double Dusk, we spent weeks tuning the weight of a character’s jump. Not the physics. The feel. How heavy should the landing be? How floaty the peak? We’re not aiming for accuracy. We’re shooting for experience. That obsession carried straight into Kale. How does it feel when a comment appears? When Claude finishes a review? These micro-moments are where users fall in love or bounce. Most software teams never think about this stuff. But experience is everything.

My goal with Kale: a writing app that both performs well and feels right for the N of 1 of me.

The Comment System

Kale’s comment system does three things, and each one emerged from a real pattern in how I actually write.

First, comments are reminders for myself. Stuff I want to come back to but don’t want to deal with right now. This mirrors the two phases of writing music: the generative phase and the editorial phase. You can’t do both at once; they use different parts of your brain. Comments let me punt the inner critic to a second pass so the first pass flows instead of stuttering.

Second, comments are a to-do list for Claude. Standard operations against specific chunks of text: “This passage flows poorly. Fix it.” “This turns into word salad. Fix it.” “I’m not sure this is actually true. Research it.” “I want to rewrite this. Give me a draft.”

Third – and this is the most interesting one – Claude uses comments to act as an editor and writing coach. This came from my own experience working with human writing coaches. I’d send them a polished article, and days later get back a file plastered with edits, critiques, ideas, and feedback. Hugely useful for getting better at the craft. Now I can get that experience in a few minutes instead of waiting a week. Is it as good as a pro? Probably not. Is it vastly more convenient? For blog posts and essays, where the gap between “pretty good” and “perfect” barely matters? Yeah, absolutely. Iterate faster, avoid getting stuck.

Choosing the Comment Format

Comment systems for the written word are all over the map. My constraint was specific: I wanted to edit files directly on the file system. What’s on disk is what you’re working with. No translating Markdown into some internal format and back.

Option one: build a custom internal representation, inject comments into that, export to Markdown when saving. Clean, but way heavier than what I needed.

Option two: a sidecar file tracking comments next to the content. Sounds fine until the writer starts deleting lines, adding paragraphs, and moving things around. Keeping the sidecar in sync is doable from the editor’s perspective, but once Claude Code enters the picture, you’re asking the agent to understand your custom sidecar format. That eats context and adds fragility, all for a problem that has simpler solutions.

I went with option three: inject Markdown comments directly into the source file. Everything lives in one place. Comments are invisible when published. Both the writer and Claude can edit freely without breaking anything. The main cost: making sure the editor doesn’t let you accidentally half-delete a comment boundary. A few quirks, all manageable. Simplest solution that met every constraint.
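
As a rough sketch of what inline comments could look like in practice: the post does not say which syntax Kale uses, so this assumes HTML comments with a hypothetical `kale:` prefix, which Markdown renderers generally leave out of the visible output. Every name here (`extractComments`, `stripComments`, `hasBrokenBoundary`) is invented for illustration and is not Kale's actual code.

```typescript
// Assumed marker format: <!-- kale: comment text -->
// The "kale:" prefix and all function names are hypothetical.
interface InlineComment {
  text: string;  // comment body without the marker
  start: number; // offset of "<!--" in the source
  end: number;   // offset just past "-->"
}

const COMMENT_SOURCE = "<!--\\s*kale:\\s*([\\s\\S]*?)\\s*-->";

// Find every inline comment and where it sits in the file.
function extractComments(source: string): InlineComment[] {
  const pattern = new RegExp(COMMENT_SOURCE, "g");
  const found: InlineComment[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push({
      text: match[1],
      start: match.index,
      end: match.index + match[0].length,
    });
  }
  return found;
}

// Remove the comments, for example before publishing.
function stripComments(source: string): string {
  return source.replace(new RegExp(COMMENT_SOURCE, "g"), "");
}

// Crude guard against a half-deleted boundary: openers and closers must pair up.
function hasBrokenBoundary(source: string): boolean {
  const opens = source.split("<!--").length - 1;
  const closes = source.split("-->").length - 1;
  return opens !== closes;
}
```

Because the comment lives in the text itself, moving or deleting a paragraph carries its comments along, which is the property the sidecar option lacked.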

Saving, Git, and Knowing When to Stop

“A painting is never finished - it simply stops in interesting places.” -Paul Gardner

Saving to disk sounds trivial until you realize the user might be writing at the same time as Claude. Both of you, editing the same file, side by side. Claude’s update tool fails if the file on disk has changed since it last read it, but it can’t see keystrokes that haven’t been saved yet. So there’s a dangerous window between your keystrokes and the background save where Claude could write to the file and one of you blows the other’s changes away.

Fix: autosave every three seconds. Fast enough that a thinking model is too slow to sneak changes in between. A couple hours later I’d forgotten it had ever been a problem.
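
A minimal sketch of that autosave loop, under stated assumptions: the post only says "every three seconds", so the class, the dirty-tracking, and the injected `write` callback are all hypothetical stand-ins for whatever actually persists the file.

```typescript
// All names are hypothetical; `write` stands in for the real save-to-disk call.
type WriteFn = (contents: string) => void;

const AUTOSAVE_INTERVAL_MS = 3000; // the three-second interval from the text

class AutoSaver {
  private pending: string | null = null;
  private write: WriteFn;

  constructor(write: WriteFn) {
    this.write = write;
  }

  // Call on every edit with the full editor contents.
  markDirty(contents: string): void {
    this.pending = contents;
  }

  // Writes only if something changed since the last flush.
  flush(): boolean {
    if (this.pending === null) return false;
    this.write(this.pending);
    this.pending = null;
    return true;
  }
}

// Starts the periodic flush and returns a function that stops it.
function startAutosave(saver: AutoSaver, intervalMs: number = AUTOSAVE_INTERVAL_MS): () => void {
  const timer = setInterval(() => saver.flush(), intervalMs);
  return () => clearInterval(timer);
}
```

This shrinks the unsaved window to at most one interval. It does not remove the window; it relies on the agent being slower than the timer.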

Git integration followed the same principle: do the least possible thing. Keeping manual version control in the writer’s hands felt powerful: you can see the article evolve, and if you let the agent loose on your writing, there’s no way to lose your work. But the temptation to bloat was constant. Am I trying to recreate Cursor? Do I need a file explorer? A diff viewer? Branch management?

Your brain is incredible at making things more complex than they need to be. You can always add more – more options, more workflows, more buttons, more knobs – and in an AI writing tool, you could have the AI nag you about a thousand different things, distracting you from the actual writing. I don’t want that. I want to be immersed. To write. To flow. So: what’s the absolute minimum? Save button. Open file button. Restore button. Those were the only ones I actually reached for. Everything else? Overkill.

How Much Leash?

The meatiest non-obvious decision was safety. Should this app be completely uncontrolled – able to execute anything on the hard drive – or highly constrained, where the developer picks the tools and content lives only in memory?

I let the agent loose. Kale runs a completely uncontrolled coding agent with total access to everything on your machine. I baked guidelines into its instructions – stick to this one document, don’t edit my machine, don’t connect to other computers, don’t do anything funny – but these are suggestions, not walls. The machine can talk itself out of following your rules if it decides it has a better idea.

I was generally impressed by how well Claude respected the guardrails. But this approach would NOT be acceptable for anything public-facing. Right now, it’s a bazooka. I wouldn’t feel comfortable handing it to someone else, even with detailed instructions on how to avoid blowing yourself up.

In a real product, Kale would need sandboxing: a safe container so users don’t explode their files or have their data exfiltrated. And here’s something I had known intellectually (from people like Sander Schulhoff) that building Kale made visceral: despite the safety training that companies like Anthropic invest in their models, it’s essentially impossible to make a perfectly safe AI. There will always exist some perfect string of inputs that changes its goals.

Scale changes everything, too. My N of 1 went well. That tells you nothing about N of 10 million. If you’re worried about a one-in-a-million event, your wariness should increase dramatically when you’re actually dealing with millions of instances.
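
The arithmetic behind that intuition can be sketched in a few lines. This assumes each instance fails independently with the same per-instance probability, which is a simplification, and the function name is made up for illustration.

```typescript
// Probability that a rare event happens at least once across many runs,
// assuming runs are independent and share the same per-run probability.
function chanceOfAtLeastOne(perRunProbability: number, runs: number): number {
  // 1 - (1 - p)^n
  return 1 - Math.pow(1 - perRunProbability, runs);
}

const ONE_IN_A_MILLION = 1e-6;

// A single user: effectively never.
const single = chanceOfAtLeastOne(ONE_IN_A_MILLION, 1);

// One million instances: roughly 63%.
const million = chanceOfAtLeastOne(ONE_IN_A_MILLION, 1_000_000);

// Ten million instances: above 99.99%.
const tenMillion = chanceOfAtLeastOne(ONE_IN_A_MILLION, 10_000_000);
```

At N of 1 the event is a curiosity. At N of 10 million it is close to a certainty that it happens to someone.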

Closing the Loop

The YouTube Rule is a nice idea. Making it actually work inside a tactile Electron app was a whole different beast. You have to set the agent up for success before you walk away: prepare the plan together, keep the architecture minimal so it doesn’t get confused, have tests in place, and – this is the big one – give it a way to see and interact with what it’s building.

In a document editor, everything is a bug hotspot. Moving the cursor with arrow keys, pressing backspace, selecting text to create a comment – all of it. The agent needed to operate the app without calling me over. I landed on dynamically generating Playwright JavaScript to drive the app directly (screenshots, mouse movements, selections, zooming in and out) instead of trying to wrestle with an MCP server connection that never quite worked out of the box.

I had screenshotting scripts I’d built while working on Cui and Rui — native apps with custom GPU-driven UI systems, no browser under the hood, so I’d had to roll my own visual testing from scratch. Those scripts became part of a ragtag toolkit I’ve been carrying from project to project, evolving with each one. In Kale, they served as scaffolding until the agent figured out how to drive the Electron app directly through Playwright. Bringing old ingredients into new recipes always fills me with delight. I feel a comfortable sense of continuity, even when it’s only at this microscopic scale of an individual script.

The testing approach that emerged:

1. Create a test Markdown file in a temp folder for whatever corner case you’re chasing.
2. Spin up a fresh instance of the app against that file, then reproduce or confirm the issue directly.
3. If the issue isn’t niche, turn that confirmation into a test case that runs as part of your test suite to prevent future regressions.
4. Define what “done” looks like, the same way you would with a human dev.
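
The first and last steps can be sketched with nothing but Node's standard library. The `CornerCase` shape and both helper names are hypothetical, and launching the app against the file (for example through Playwright) is left out, because the post does not show how Kale is started.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical description of one corner case handed to the agent.
interface CornerCase {
  name: string;     // short slug, used as the file name
  markdown: string; // document contents that trigger the issue
  doneWhen: string; // plain-language success criterion
}

// Writes the case to a throwaway temp folder and returns the file path,
// so a fresh app instance can be pointed at it.
function createFixture(testCase: CornerCase): string {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "kale-case-"));
  const file = path.join(dir, testCase.name + ".md");
  fs.writeFileSync(file, testCase.markdown, "utf8");
  return file;
}

// Deletes the temp folder once the case is confirmed or promoted to a test.
function removeFixture(file: string): void {
  fs.rmSync(path.dirname(file), { recursive: true, force: true });
}
```

Keeping `doneWhen` next to the fixture is one way to make the success criterion travel with the reproduction instead of living only in the chat.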

Once this process was in place, 90–95% of issues got reproduced and fixed without me touching anything.

Not everything landed. I asked the agent to build a diff view — the kind you get in Cursor or a GitHub PR. Trivial, solved problem, I thought. Instead, I kept returning to find paragraphs jumbled, inline edits misread as entirely new sections, side-by-side alignment broken. Each fix regressed something else. The agent thought it was done, but my eyes told me it wasn’t. I ultimately abandoned the feature — not because it was impossible, but because I’d been vague about what “correct” looked like, and without crisp success criteria the agent spiraled. Lesson learned: the bottleneck wasn’t the machine’s capability. It was my specification.

I experienced an engineering “team” faster than any of the dozens I’ve led in my career. It lacks some of the context-questioning habits of humans. It also never needs a bathroom break.

But that failure moment stuck with me: the speed is real, and so is the gap that only taste can close.

Hats Off to the Machine, New Hats on My Head

The thing that fascinated me most: what it actually feels like to be the PM, the UX person, the user researcher, and the software engineer all at once. Not in theory. In practice, in one afternoon.

As a founder, wearing many hats is nothing new. But the speed – making real progress without needing to go find an expert, without waiting on anyone – that was something else. At Freckle, we couldn’t afford a dedicated UX person for the longest time, and I dreaded doing interface work myself — it never quite lived up to my aesthetic aspirations. Now I can get polished, ergonomic visuals without months of recruiting a UX specialist. And even when the first attempt misses, tries are practically free. Having poor UX is inexcusable in 2026. With Kale, the old tension between “build it” and “make it beautiful” just dissolved. The machine handled the mechanical translation; I got to think about the product all day.

When you’re this fast, you can experiment with interfaces and approaches like you’re sculpting a hundred iPhones out of clay to feel which ones sit right in your hand. It used to take two weeks to roll out a single prototype. Each try was expensive, so we spent so much time guessing in advance. Now it’s “Let’s try this. Nope. Let’s try this. Better.”

This is how we built Whisk (the two-player co-op at Double Dusk). What’s the minimum set of mechanics that actually delivers fun? You can’t spec “delight” in a design doc. You have to feel it. And now you can feel a dozen variants in an afternoon instead of a quarter.

This raises a real question about scale. Once a zero-to-one gets real traction, do you bring on deep domain experts? Is the improvement marginal or transformational? Maybe the tools keep getting better and the value-add of specialists shrinks. Maybe the pros with the same tools stay ahead. Maybe the gap shifts – as it has in software development – toward taste and context and understanding the problem instead of executing the solution.

Either way: we’re all going to have much better software because builders can finally play and experiment at the speed of their intuition.

I’ve done it.

It’s magic.

Iterating on Myself

The models (Claude Opus 4.6, Codex Extra High) are astounding. With the right guidelines, upfront research, an agreed-upon plan, and real tests plus “manual” agentic QA, the results are remarkable. Most things got implemented as envisioned. Some took multiple tries. A few took too long to be worth hammering on any further.

The genie is truly extraordinary. But the gap between what it can do and what you actually need – that’s where you earn your keep.

Here’s the core tension: Independence and correctness pull in opposite directions. Independence means you’re not checking the agent’s work, which means stuff slips through. How do you maximize both at the same time? That’s the unsolved meta-problem of agentic development, and I don’t think anybody has cracked it yet. How do I tell it to take into account all the things a highly capable software expert should (security, robustness, maybe its own taste)?

Next time I build something like Kale, I’ll be way more disciplined about testing from the start. Make sure the agent knows what tests to write, review them before it runs off, agree upfront on what “done” looks like. I don’t want to be the one mashing buttons to double-check everything. I want the agent to be independent. But right now, that independence comes at a cost, and you have to decide how much you’re willing to pay.

My Reckoning

The first few months of serious agentic coding hit hard. Four years of school. Hundreds of books. Nearly two decades of practice — and suddenly the market doesn’t value the mechanical execution nearly as much. But here’s what I didn’t expect: Once I stopped clutching the old tools, the higher level of abstraction felt like a promotion, not a demotion. The work got more interesting, not less.

It’s like being a language carrier for a tradition that’s fading. Building Kale, I finally realized: it’s my turn to set twenty years of practice on fire.

I sit with a real question: how are other people – people who haven’t been doing this for twenty years – going to navigate this new paradigm? Is my experience holding me back? Am I a dinosaur, hooked on a past that’s no longer relevant? Or is my perspective an advantage, because I actually understand how things work at the low level? Every new generation misses out on details that its predecessors found fundamental. Yet each new generation moves faster than the last.

On the individual level, people find something they’re good at, something the world values – “I’m the X guy” – and it becomes their whole deal. Power, respect, authority, compensation. When that disappears, a lot of people get really upset and want to revolt. These are the Luddites.

Part of my identity was “the solid low-level guy” or “the functional programming expert.” That specific facet is losing market value fast. But my broader identity — the one built on product strategy, team leadership, and taste — that’s more relevant than ever. The innovator’s dilemma, playing out in real time: Can you let go of the skill you were exploiting without losing the judgment it gave you?

Retreat or Rebirth

The people who stay relevant are the ones excited to get more out of their tools. They are not doing a Renaissance fair LARP of their glory days.

This is as true for musicians and artists as it is for developers. Musicians are struggling against an artistic machine that spits out twenty variants of the punk passion they used to rage. Developers are, well, no longer writing code.

The great artists reinvent themselves. The David Bowie types go through era after era, some brilliant, some flops. They keep asking: “Now that all of this is possible, what will I become?”

I am the carpenter who has lost his craft. But I’m equally the architect who directs the machines. I appreciate the past. I choose my future.

Coda

Kale isn’t a product I’m shipping — it’s a concept prototype, the way automakers build concept cars to feel what the future might drive like. It’s nearly twenty years of expertise poured into a new paradigm to see what comes out the other side.

There’s more to build. The comment UX is bare bones, I’m curious about porting it to the web, and the tension between independence and correctness is nowhere near solved. But these are problems I’m excited to have. They’re the problems of someone who chose to architect a future working alongside computers, not someone mourning their old tools.

We’re at the very beginning of building at a speed that would have been inconceivable even three years ago. The early adopters are already on the wave. The tools will only get sharper, the loops tighter, the prototypes faster. I don’t know exactly where this takes us — but I know I want to help steer this new future.

26 Mar, 2026 | #ai #coding-agents #software-engineering #developer-experience #future-of-work #workflow
