The AI Engineering Baseline

Monday, 8 June 2026

As global AI adoption has accelerated through the first quarter of 2026, it’s become clearer where a modern engineering organisation needs to place its baseline bets on the curve of AI adoption in day-to-day workflows.

It’s no longer a question of “can these tools work” (they can and do) – software professionals using AI tooling is the new normal. The question now becomes “what are the most effective ways we can use this new category of tooling to get where we need to go faster”.

Through a purely technical lens, our goals are clear – we want to build increasingly stable, resilient systems, reduce complexity at every opportunity, and ensure that the platforms we build can operate at a competitive cost in the market.

These are not our only software goals (they exclude product direction; they don’t speak to meeting the future of the technology landscape) but as we consider what our baseline set of AI capabilities should be across our systems they are the bedrock on which we build.

The current state of the art in AI models and tooling is accelerating our ability to reason about complicated distributed systems as one cohesive whole, and this document outlines techniques, and subsequently highlights a direction of travel to take advantage of this change in the technology landscape.

The Four Patterns

There are four patterns of behaviour that make sense in the building of software:

AI in Development Workflows
AI Introspection
Agent Assisted Synchronous Change
Observing Super-Agent

Each has a distinct set of trade-offs associated with its use.

AI in Development Workflows

AI in Development Workflows is the most familiar AI adoption path. Almost everyone is familiar with this by Q2 2026 – it involves a technical user, using tools that are either embedded in their IDE, or called via their Terminal to generate source code that is integrated into the software that they build. This practice itself is both rapidly evolving and encompasses a set of ancillary tools – things like MCP (Model Context Protocol) servers, agent skills, agent teams, using agents to do analysis, refinement, and iteration, among other co-design and co-development practices.

Almost all developers are familiar to some extent with this application of AI, and with the rapid pace at which it has been changing in recent months. It is characterised by being centred on the developer experience, and it almost always involves a human in the loop.

AI Introspection

AI Introspection is an emerging category of tooling both in the marketplace, and in our own novel experimentation where models are used to augment traditional static analysis of software. Where traditional static analysis focused solely on using programming tools like type checkers and linters to provide insights into software quality, generative models are allowing us to build on software introspection to also check for adherence to patterns and standards, evaluate quality, produce threat models, and perform the kind of analysis that was previously too difficult with more rigid tools.

Examples of good AI Introspection include answering questions like “do our APIs meet our API standard documentation”, “does this service follow any of our catalogued design patterns”, or “what potential attack vectors exist in the source code for this service”.

Layered on top of this, and combined with traditional static analysis, we’re able to use AI models to map out dependencies between distributed components, understand how changes are made using signals like git commit history, and generally reason about an extensive distributed system as a whole, producing meaningful reports that explain how a large platform changes in near real-time.

This blend of techniques is a generational leap over previous types of tooling in this space like SonarQube.

Agent Assisted Synchronous Change

Over the last two decades, there has been a drift towards increasingly “decomposed” and “decoupled” system designs. This is most visible in the way organisations trended first towards “Service Oriented Architecture”, and subsequently (via Guerrilla SOA) towards “Microservices”.

This drift was driven by two factors – the increasing complexity of systems causing systems to grow large, and the appetite of organisations to parallelise work across these increasingly large systems. Software has a fixed “surface area” – a maximum number of people that can meaningfully contribute to it before human coordination becomes unwieldy – so the drive towards microservices accelerated the amount of decomposition present in most systems.

Unfortunately, if you take this design philosophy to its natural extreme, whilst the software now has more “seams” in it, more subdivisions where ownership can be established between different groups of people, the total cognitive overhead required to comprehend the system and its behaviours as a whole increases exponentially, and the cost of servicing and maintaining individual parts of the system increases to match. This has taken us to a place where lots of organisations now author vast distributed systems that are hard to comprehend from any one point, and where making changes in one place often has adverse effects on several other components.

This antipattern is commonly referred to as a “distributed monolith”. It is the most common accidental design in distributed systems, where changes are coupled across team and ownership boundaries.

Modern AI tooling is a salve to this problem – working over multiple iterations, models can comprehend all the components of a system and help synchronise change. To bring this to life, “Agent Assisted Synchronous Change” refers to building tools that can reason about the whole system and sequence complementary changes that are authored and committed across an estate, removing the traditional human overheads in doing this kind of split-ownership, joined-up work.

Examples include “upgrade all of our systems to the latest version of Framework X” and “change this contract in API A, and update all the other systems that call it to understand the contract change”.
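To make the sequencing part concrete, here is a minimal sketch of ordering a contract change so that a provider is always changed before its callers. The component names and the `callsInto` map are hypothetical, and a real harness would discover the graph rather than hard-code it. The ordering itself is a plain topological sort (Kahn's algorithm), not a description of any particular tool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical estate: each key is a component, each value lists the
// components it calls (its upstream dependencies).
var callsInto = new Dictionary<string, string[]>
{
    ["api-a"]        = Array.Empty<string>(),
    ["billing"]      = new[] { "api-a" },
    ["web-frontend"] = new[] { "api-a", "billing" },
    ["reporting"]    = new[] { "billing" },
};

// prints "api-a -> billing -> reporting -> web-frontend"
Console.WriteLine(string.Join(" -> ", ChangeOrder(callsInto)));

// Kahn's algorithm: repeatedly pick components whose dependencies have
// all been changed already, so providers are always updated before callers.
static List<string> ChangeOrder(Dictionary<string, string[]> callsInto)
{
    // Number of not-yet-changed dependencies for each component.
    var remaining = callsInto.ToDictionary(kv => kv.Key, kv => kv.Value.Distinct().Count());
    var ready = new Queue<string>(
        remaining.Where(kv => kv.Value == 0).Select(kv => kv.Key).OrderBy(n => n));
    var order = new List<string>();

    while (ready.Count > 0)
    {
        var done = ready.Dequeue();
        order.Add(done);

        // Every caller of the component we just changed has one fewer
        // outstanding dependency.
        foreach (var caller in callsInto.Where(kv => kv.Value.Contains(done))
                                        .Select(kv => kv.Key)
                                        .OrderBy(n => n))
        {
            if (--remaining[caller] == 0) ready.Enqueue(caller);
        }
    }

    // Anything left over is part of a cycle and needs a human decision.
    if (order.Count != callsInto.Count)
        throw new InvalidOperationException("Dependency cycle: changes cannot be sequenced.");

    return order;
}
```

The cycle check matters in practice: a distributed monolith often contains circular dependencies, and those are exactly the places where a change cannot simply be sequenced.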

Observing Super-Agent

The most future-looking of the patterns is the Observing Super-Agent. “Super-Agent” is an emerging term in the space to describe an AI “director” that observes regular SDLC behaviours like story refinement, spec writing, and ticket authoring in tools like Jira, and automatically dispatches AI agents to participate in the work, unsupervised until they return for feedback.

Tools have been trending in this direction for the last 3-6 months with features like “Assign to Copilot” on GitHub.com, and Linear’s work tracking system, whose sales pitch is “write tickets for agents to work on here”. The final form of this kind of solution is currently vague, but there are several projects already pitching to own this space, such as OpenAI’s Symphony project (https://openai.com/index/open-source-codex-orchestration-symphony/), ByteDance’s DeerFlow (https://github.com/bytedance/deer-flow), which pitches itself as “an open-source super-agent harness”, and other similar projects.

It’s safe to speculate that whatever emerges in this space is going to be an orchestrating process that observes the tools you already use today, to dispatch jobs and have agents interact with the workflows that already happen in the organisation.

These super-agent harnesses are likely the engine that allows traditionally non-technical members of staff to be more involved in the process of building products.

Making bets in this space

AI in Development Workflows

Teams should probably not look to build one thing in this area. Instead, people should be incentivised and encouraged to use model-assisted development and AI in their day-to-day tasks via GitHub Copilot subscriptions, Claude Code, and other commodity harnesses.

The state of the art in tooling here changes rapidly, so people should just focus on the effective use of tools as they’re available today – this includes things like custom skills, custom agents, and MCP servers.

You should accept that there may be pockets of re-work in this space due to the pace of change. The cost of generating tools on an ad-hoc basis is currently very low, so it makes sense to encourage teams to experiment.

It’s very likely that the vast majority of developers will end up building their own tools, in the field that’s evolving to be called “harness engineering”.

AI Introspection

There are emerging products in the space that support automated AI Introspection.

I’ve had positive experiences with SPAN (https://www.span.app/) – which provides development insights based on the flow of commits and work across our systems. It’s intended to be a single pane of glass for analysing “developer experience” and understanding where we spend our time and money with a product focus.

Similarly, CodeScene (https://codescene.com/) is a well-established product in this space that is augmenting its traditional static analysis tooling with MCP for AI workflows.

That said, I think there’s a large, untapped potential, which I’m investing in with the teams I work with, to build custom introspection tools that blend model-driven assessment and traditional static analysis techniques to provide actionable quality insights across a software estate.

We’ve been investing in tooling that clones all the source code in our estate into a central location then runs analysis jobs and reports over the gestalt (gestalt: “a theory of perception that emphasizes the processing of entire patterns and configurations, and not merely individual components.”) – the sum of all the software together.

We’ve been using this technique to generate service maps of connection edges, PCI compliance reports, and generally to “analyse all the software, then analyse the connections between the software, and then report on the software as a whole”.

What’s interesting about this category of techniques, is that due to every estate being different, the best tool might not be one you buy, but one you build that can connect to the kinds of internal data source you’d never share with a third party.

I published the specification for our “System Map Builder” here - https://gist.github.com/davidwhitney/b278658398c8f54527815f79944ab4ef - which was part of this initiative.

Agent Assisted Synchronous Change

I am convinced you should be betting on this now.

One of the step changes in removing and reducing toil across the organisation I look after has been our slow and methodical exploration of allowing small quorums of people to make changes to disparate areas of our platform autonomously.

We’ve built our own harness – a coordinating agent – that produces changesets, then opens and tracks pull requests once it’s completed work. We’re piloting it, with good first signs. I think this is possibly the future of a lot of toil-based engineering work that frequently bogs down teams or dies in coordination and planning work.

I think it’s objectively much more interesting to use agent assisted tooling to reduce the complexity of code, and remove toil, than just accelerating the creation of new software.

Observing Super-Agent

The market is less clear on where things are going for Observing Super-Agents. It’s clearly GitHub’s strategy – the confluence of tools like Codespaces and Copilot as assignee is heading in the direction of auto-dispatched and auto-reviewed tickets.

I suspect this is the place where a pre-defined product will most likely emerge. You can make a bet on something like DeerFlow, or one of the OpenClaw (and clones) family of observing agents, but the space feels like it’s still building out its primitives for the safe execution of agents inside trusted contexts.

If you look on the internet, lots of people will talk about how they’re “doing this now”, but I think it’s probably only very risk-comfortable organisations that are betting real-world, autonomous features on agents – and there are more credible signs that the early adopters are pulling back a little because they moved a little too fast.

It’s my suspicion that the most successful products that emerge here will, at least in the interim, be built on or around existing GitHub / Jira style workflows, or built around a tool competing in that space like Linear.

This does appear to be the safest path to allow non-technical users to contribute to software at scale and at pace, without relying on giving “just anyone” access to programming agents and hoping they get it right.

The Cost Apocalypse

It’s safe to say that all of this is set against the backdrop of technical innovation and extreme price rises of frontier LLMs at the point of consumption.

The picture is particularly bad at the moment, with all the key vendors raising prices (5x-10x) as they wrestle with their huge initial investments. Simultaneously, work debuted for Windows at Build, like MXC (Microsoft eXecution Containers, https://github.com/microsoft/mxc), is all about allowing agents to work in a controlled context on user machines.

I think this points to an emerging trend of a lot of AI development swinging back towards local hardware. Nvidia pushing its DGX and RTX Spark reference architectures for Windows machines, and Apple Silicon, both point to a future where a lot of the developer infrastructure for this tooling runs on device, at a non-trivial startup cost.

With Chinese models becoming increasingly capable, I’m relatively convinced that normal usage is going to face a different shape of supply chain problem within a few months. But that shift should also go a long way towards reducing the cost of hardware for relatively capable local models to commodity levels, and that’s why we’re seeing a lot of work being put into safe local execution contexts.

If you’re making bets here? Wait and see what is available by Q3-4 2026.

What we need to change to enhance this adoption

There are a few complementary behaviours that need to change to help this shift to AI-accelerated development in a lot of organisations.

Permissive Contribution Models

The ownership models of our software components in many software teams are protective.

Teams are asked to “own” their components, and that incentivises being closed to changes from outside of the team. As we anticipate a world where wide-ranging changes are coordinated by agents, and contributions are driven by people who may not have traditionally been engineers, a more collaborative, open source inspired, “custodianship” model will need to be adopted.

Organisations will often talk about this as an “inner sourced” model, inspired by open-source software. This shift in landscape makes it a required change – likely alongside SLAs for responding to pull requests, and better verification processes.

Increase In Defined Architectural Styles

We have long accepted that software implementations tend to differ across systems because it’s been traditionally very difficult to write a linter for “does this design fit” without resorting to self-certification and manual review.

With AI introspection, it’s now much easier to ensure that any pocket of software matches agreed-upon default styles and patterns. Whilst we don’t expect all software to become uniform (and in fact, that should be an explicit non-goal – innovation comes from a lack of uniformity), we should expect that the average piece of software sticks to our average house patterns and styles.

We need to step up documentation on our expected defaults and rely on introspection to provide a rating for each area of the system that shows how far away from standard that area is. Not being “standard” doesn’t mean “bad”, but it’s a leading indicator for us to understand where complexity exists in our estate.
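As a rough illustration of what a per-area rating could look like, the sketch below scores each area by the fraction of house standards it does not follow. The area names, standard names and pass/fail results are all hypothetical; in practice the results would come from a mix of static analysis and model-driven assessment, and a real rating would probably weight standards differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical check results for each area of the estate:
// standard name -> whether the area follows it.
var results = new Dictionary<string, Dictionary<string, bool>>
{
    ["payments"] = new() { ["api-standard"] = true,  ["logging"] = true,  ["retry-policy"] = true  },
    ["search"]   = new() { ["api-standard"] = true,  ["logging"] = false, ["retry-policy"] = false },
    ["legacy"]   = new() { ["api-standard"] = false, ["logging"] = false, ["retry-policy"] = false },
};

// List the areas furthest from the house defaults first.
foreach (var (area, checks) in results.OrderByDescending(kv => Deviation(kv.Value)))
{
    Console.WriteLine($"{area}: {Deviation(checks) * 100:F0}% away from standard");
}

// Deviation is the fraction of house standards an area does not follow:
// 0.0 means fully standard, 1.0 means nothing matches.
static double Deviation(IReadOnlyDictionary<string, bool> checks) =>
    checks.Count == 0
        ? 0.0
        : checks.Values.Count(passed => !passed) / (double)checks.Count;
```

The number is deliberately a distance rather than a grade: a high score says "look here for complexity", not "this is bad".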

With more work being contributed by agents, these written standards now have a second purpose – for guiding the implementations that the agents produce. It’s expected that if we embrace contributions from traditionally non-technical members of staff, the agents should follow the rules outlined in our standards to ensure a reasonable, sane implementation.

These patterns and standards will all be owned by experts in the discipline being standardised, and changes will go through those authoritative single points of control.

The Role of Teams in the Future

If the tools are now good enough that smaller teams can reason about systems-as-a-whole, and the teams themselves are more effective without organisational communication burden, a natural shape starts to emerge.

It’s also critically important to make sure we have a through line of knowledge – it’s our responsibility to raise the programmers of the future and make sure they’re equipped with the skills to do the work and do it well.

I suspect we’ll see a movement from the “two pizza teams” we have today (of about 6-10 people), to smaller quorums of people – a staff engineer shaped person, and a couple of understudies in a “master and apprentice” shaped model. To be successful, this new shape of team will probably be working in a pattern that looks more like mob programming, where they use tool assistance to change entire systems and platforms at once. The natural consequence of this is that site reliability engineering will enter a second era of prominence, as operating software effectively will become more important than ever.

This is an answer to the question “how do we train our juniors” in a world where programming knowledge is at risk of being lost, and companies will neglect this due to short term thinking, to their detriment.

Think about where you’re going

It’s a fool’s errand to try to focus on technology in search of a problem, and this paper looks at the work we do purely through a technology lens. To understand which bets you need to make for your business requires an entirely different perspective, with different inputs.

General advice that doesn’t fit where you are, or what you’re doing, is useless to you.

Never forget the people

Regardless of the hubris and marketing around modern AI, it acts as an amplifier for whatever you’re doing at the moment. If you have good practices, it’ll amplify them. If you have bad practices, it’ll make everything a lot worse, a lot faster. Most of the best software I ever saw built was built with careful consideration, quiet reflection, and a lot of care. The best software is built by people who care about the software they’re building, and the people they’re building it for.

Going fast is useless if you’re moving in the wrong direction. Going fast is useless, if you cease to be able to navigate.

To build software for everyone, it has to be built by everyone.

Sidemark: Active Telemetry Comments for C#

Friday, 29 May 2026

OpenTelemetry has quietly become table stakes. That’s a good thing, but if you’ve instrumented a real codebase, you know the tax. A method that does one obvious thing slowly fills up with StartActivity, SetTag, AddEvent, SetStatus. The bookkeeping of telemetry starts to drown out the intent of the code, and in review you spend half your time mentally separating “what this code does” from “what we report about what it does.”

It’s easy to think “oh, but the framework takes care of this with auto-instrumentation”, but if you talk to the experts in OTel, they’ll go to great lengths to explain that auto-instrumentation is a floor, not a ceiling. Most of the value in telemetry comes from the custom instrumentation you add to your code that adds business context to your traces. And that custom instrumentation is the stuff that clutters up your code.

Here’s the kind of thing I mean:

// before - ugly, obtuse, who put that there
var orderId = order.Id;
Activity.Current?.SetTag("orderId", orderId);

Sidemark is my answer to that code-obfuscation problem: non-invasive instrumentation via what I’m calling Active Comments.

// after - glorious, beautiful, basking in the light of the sun, closer to god, happy, satiated
var orderId = order.Id; //?

The idea

A small set of comment syntaxes - //?, //!, //?! - become ride-along annotations. They travel next to the code, get read at build time, and turn into the equivalent Activity calls in the compiled output. The code you read stays the code that does the work. The telemetry rides along instead of competing with your logic for attention.

The framing is loosely inspired by Wallaby.js’s Live Annotations, which project runtime values inline next to the code that produced them. Sidemark takes the same instinct in the other direction: comments as a write surface for instrumentation, rather than a read surface for debug values. Comments are an under-used channel for information about code that isn’t itself code - and surfacing it there keeps the underlying logic legible.

Yes, this means making comments load-bearing, which is a bit of a heresy. Comments have a reputation for bit-rot and lies. But used this way they move back toward their original purpose: a place for the things a programmer needs to understand the code - reimagined for an era where a lot of that understanding happens while staring at production traces.

What it looks like

Tags - drop //? on a local and it becomes a SetTag:

var orderId    = order.Id;          //?
var totalCents = order.Total * 100; //? order.total_cents

Events - //! emits an AddEvent, before a statement or at method entry:

await CallExternalApi(); //! ApiCalled

Activities - //? on a method signature wraps the body in a span:

public async Task<Order> Checkout(Cart cart) //?
{
    // ...
}

Exceptions - //? on a catch records the error status:

catch (Exception ex) //?
{
    throw;
}

A worked example with a couple of tags, a span and an event comes out to roughly thirty lines of hand-written OpenTelemetry - written in five.
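For comparison, here is a rough sketch of what the hand-written equivalent of those markers might look like, using only the System.Diagnostics APIs. It is an assumption about the shape of the code, not Sidemark's actual generated output: the span name, the placement of the event before the call, and the use of the exception message as the status description are all guesses based on the descriptions above. The `Order` record, the `CheckoutService` class and `CallExternalApi` are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// A listener so the ActivitySource actually creates activities; a real
// app would normally configure the OpenTelemetry SDK instead.
using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == "MyApp",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
        Console.WriteLine($"{activity.OperationName}: {activity.Status}"),
};
ActivitySource.AddActivityListener(listener);

await CheckoutService.Checkout(new Order(42, 19.99m));

public record Order(int Id, decimal Total);

public static class CheckoutService
{
    private static readonly ActivitySource Source = new("MyApp", "1.0.0");

    public static async Task Checkout(Order order)
    {
        // Span: what "//?" on the method signature stands in for.
        using var activity = Source.StartActivity("Checkout");
        try
        {
            // Tags: what "//?" on the two locals stands in for.
            var orderId = order.Id;
            activity?.SetTag("orderId", orderId);
            var totalCents = order.Total * 100;
            activity?.SetTag("order.total_cents", totalCents);

            // Event: what "//! ApiCalled" stands in for (assumed to be
            // emitted just before the annotated statement runs).
            activity?.AddEvent(new ActivityEvent("ApiCalled"));
            await CallExternalApi();
        }
        catch (Exception ex)
        {
            // Error status: what "//?" on the catch stands in for.
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }

    // Hypothetical stand-in for a real outbound call.
    private static Task CallExternalApi() => Task.CompletedTask;
}
```

Even in this small case, the telemetry calls outnumber the lines of logic, which is the visual weight problem the markers are meant to remove.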

How it works

There’s no runtime magic. Sidemark runs as an MSBuild task that hooks in just before CoreCompile. For each source file it parses the syntax tree with Roslyn, rewrites the annotated bits into the equivalent Activity calls, and writes the rewritten copy into obj/. The compiler only ever sees the rewritten files - your source files on disk are never touched, and the emitted IL is identical to what you’d have written by hand. Comments are trivia in the syntax tree, which is exactly why this has to happen at build time rather than at runtime.
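To show what "comments are trivia" means in practice, here is a minimal sketch of just the discovery step: parsing a snippet with Roslyn and finding the marker comments hanging off tokens. It assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package and a compiler that supports raw string literals (C# 11). It is not Sidemark's implementation, and it does not attempt the rewrite itself.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// A small snippet to scan; only the syntax is parsed, so the undefined
// names in it do not matter.
const string source = """
    var orderId = order.Id; //?
    await CallExternalApi(); //! ApiCalled
    // an ordinary comment
    """;

var tree = CSharpSyntaxTree.ParseText(source);

foreach (var trivia in tree.GetRoot().DescendantTrivia())
{
    // Comments never appear as syntax nodes; they are attached to tokens
    // as leading or trailing trivia.
    if (!trivia.IsKind(SyntaxKind.SingleLineCommentTrivia)) continue;

    var text = trivia.ToString();
    var isMarker = text.StartsWith("//?", StringComparison.Ordinal)
                || text.StartsWith("//!", StringComparison.Ordinal);
    if (!isMarker) continue;

    // Roslyn line numbers are zero-based, so add one for display.
    var line = tree.GetLineSpan(trivia.Span).StartLinePosition.Line + 1;
    Console.WriteLine($"line {line}: marker '{text}' attached to token '{trivia.Token}'");
}
```

Because the compiler discards trivia before emitting IL, nothing about these comments survives into the compiled assembly unless a build step like this one reads them first.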

A nice consequence: the markers are just comments. Turn Sidemark off with [assembly: DisableSidemark] or <SidemarkDisable>true</SidemarkDisable> and they pass straight through to the compiler as the plain comments they always were.

What’s in the box

It’s a single NuGet package - dotnet add package Sidemark - that ships three things:

The runtime attributes you use to point Sidemark at your ActivitySource (and to disable it).
A Roslyn analyzer that flags misused markers (SDM001–SDM006) right in your IDE and your CI logs. It’s wired in automatically by the package - nothing to switch on.
The MSBuild rewriter that does the actual work at build time.

The package is deliberately lean: it borrows the Roslyn that already ships with your .NET SDK instead of bundling its own copy, so it adds no transitive NuGet dependencies to your app - just the marker attributes and some build-time tooling.

Requirements are modest: a .NET SDK of 8.0.200 or newer to build, and any target framework compatible with netstandard2.0 - so modern .NET, .NET Framework 4.6.1+, Mono and Unity are all fine.

Setup is one assembly attribute pointing at a config class:

[assembly: Sidemark(typeof(OTelConfig))]

public static class OTelConfig
{
    public static readonly ActivitySource ActivitySource = new("MyApp", "1.0.0");
}

…and then your //? and //! comments take effect on the next build.

Is this a good idea?

If the idea makes you recoil slightly, I understand. Try it on a service or two and see whether your code reads lighter afterwards. That’s the point - the lines that matter and the lines that observe them should not sit at the same visual weight, but appear as legitimate annotations scoped to individual lines of code. Comments are the most natural way to do that.

Source: https://github.com/davidwhitney/Sidemark
NuGet: https://www.nuget.org/packages/Sidemark

Seams in Software

Friday, 20 March 2026

One of the common themes in the talks I’ve been doing over the last 5 years is about how we have carelessly over-decomposed and damaged software - and I think this is because of a fundamental conflict between the surface area - the observable “size” of a system - and the desire of organisations to parallelise and accelerate work.

Software has a fixed surface area - a maximum number of humans you can fit around it. Capitalism’s voracious desire for parallelism directly conflicts with this, because there’s always an insatiable desire for more. More teams, more concurrent work, more progress, more everything. The only way to satisfy this hunger is to make the surface area of the software bigger, and we do this by introducing seams. You can imagine this like cutting a dining table in half to introduce two more edges to sit people around - the area doesn’t get bigger, but you can fit a few more people at the table.

Seams in software come in lots of different disguises - modules, microservices, packages, applications, outsourced SaaS solutions - but all of those things have the same intent to a business: to subdivide the working set into parts, to fit more people around those parts, to parallelise the work. Introducing seams comes with technical overheads: repository management, documentation, and assorted other complexities.

Sometimes, often accidentally, those seams are “good seams” - they align with fault boundaries in your software (things that change together, fail together), or they align with domains (“Customer”, “Payments”), or they align with specific concerns (“File system access”, “I/O”). Unfortunately, they are often “bad seams”, especially in Enterprise software. The seams are synthetic, subdividing a logical module into unnatural boundaries. This most frequently manifests as nanoservices, single functions extracted from larger applications without semantic changes and packages that should be parts of their containing app.

The appetite for business productivity introduces seams in the software with the goal of parallelisation of work - and in a majority of cases damages the design, legibility and performance of the software directly. This is the root of Conway’s law - that your organisational structure and your software always reflect one another, and either you actively engage in software design, or you’re subjected to (often low quality) design by side-effect.

The cruel joke in all of this, though, is that the perceived productivity gains of “bad seams” never happen. This was the central thesis of The Mythical Man-Month - “it takes 9 months to make a baby? shit, better get 9 women on it, I need it done next month!”. Even when the seams in the software are good (domain bounded, modular, fault boundary respecting subsystems and libraries), organisational entropy normally kills the effectiveness of parallelism in the work - you replace the “I can’t fit enough programmers around this problem!” problem, with the “I can’t coordinate all of these teams and divide the work in a sufficiently non-blocking way” problem.

Most of the parallelism really just resulted in technical debt - frequently dragging organisations over the line owning brittle, broken systems where the cost of maintenance was 100x the cost of initial creation, and directly contributing to that programmer’s sense of “if only I’d got this right the first time we wouldn’t be doing this subsequent half-decade’s worth of work”.

There’s a very famous misquote that was never said by Shigeru Miyamoto - that “a delayed game is eventually good. A bad game is bad forever.” - he might never have said it, but I think it’s pretty true in design, that slower progress and more competent design is more evergreen than rushed haphazard systems, and central to my belief that the only good systems designs are the ones where their ability to change trivially is their central design philosophy.

One of the most liberating things about the current capabilities of model-assisted software development is that it brings with it the ability to reason about systems as a whole, rather than isolated pockets of software. Thinking about the system as a whole over the application isn’t a new discipline, but with the industry having spent nearly three decades subdividing and introducing seams, it’s been increasingly difficult to even perceive a whole system, let alone reason about it. Model-assisted development is now in a place where we can take these tools - that are good at processing large volumes of data - and use them to pull back together the unnatural seams in a system, because with the assistance of these tools we’re able to reason about the whole, rather than be damned into the artificial subdivisions of large software enforced by organisational structure and entropy.

I’ve always felt that the correct amount of software to build is “nothing” if you can get away with it, because the cost of maintaining software is so high, so it’s interesting to me that we might have a technique for undoing some of the damage done to system design by having tooling that can bring all these wayward parts back together.

A Practical Tool

I’ve recently experimented with mapping out the entire constellation of software I care-take, trying to map and comprehend the seemingly infinite seams in both the software and the processes around them, and I ended up building software that instrumented all of our tooling and systems to build a graph of HTTP, Event and package dependencies between components.
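As a small illustration of why a graph like that is useful, the sketch below models the three kinds of dependency as typed edges and answers "what is affected if this component changes?" by walking the edges backwards. The component names and edges are hypothetical, and this is not the tool described here, just the simplest version of the data structure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical edges; a real map would be discovered from source code,
// configuration and tooling rather than written by hand.
var edges = new List<Edge>
{
    new("checkout", "payments", EdgeKind.Http),
    new("payments", "ledger", EdgeKind.Event),
    new("reporting", "ledger", EdgeKind.Http),
    new("checkout", "shared-models", EdgeKind.Package),
};

// prints "checkout, payments, reporting"
Console.WriteLine(string.Join(", ", Impacted("ledger", edges)));

// Walk edges backwards from a component to find everything that depends
// on it, directly or transitively.
static SortedSet<string> Impacted(string target, IReadOnlyCollection<Edge> edges)
{
    var seen = new SortedSet<string>(StringComparer.Ordinal);
    var pending = new Stack<string>();
    pending.Push(target);

    while (pending.Count > 0)
    {
        var current = pending.Pop();
        foreach (var edge in edges.Where(e => e.To == current))
        {
            // Add returns false for components already visited, which
            // also stops the walk from looping on cycles.
            if (seen.Add(edge.From)) pending.Push(edge.From);
        }
    }

    return seen;
}

enum EdgeKind { Http, Event, Package }

// "From depends on To" via the given kind of connection.
record Edge(string From, string To, EdgeKind Kind);
```

Keeping the edge kind on each record means the same walk could later be restricted to, say, only package dependencies when planning a library upgrade.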

It’s an obvious and sound technique, but I’ve always struggled with finding a good tool to do this kind of thing because each organisation’s software ecosystem is different. I built a thing, a bunch of scripts that talked to the systems we use and produced a graph for me, but it’s not really a tool that can be generalised or shared because it’s specific to our organisation.

Inspired by OpenAI’s Symphony, I instead realised that, with the current generation of tooling, this software didn’t need to be generalised, and could instead be released as a specification. So I’ve written a specification of a tool that I built, so you can build your own that’s specific to you and that might get the same outcomes in your environment.

It’s wild to me that we’re in a place where we can release “software as a design”, but I think it’s a novel application of the current generation of tooling: actual software would be too difficult to generalise, but a model-assisted tool could trivially build something for your exact setup.

You can see the spec for my System Map generator here.

The Programmer's Guide to Co-Designing with Agents

Wednesday, 11 March 2026

In this piece I’m going to lay out my current patterns for working with agents in software development - there’s a bunch of preamble about why I think this is important, so if you’re just here for the what, feel free to skip to the “Co-Design workflows” section.

More mulch faster was never the goal.

I’ve watched a lot of people put their foot on the gas over the last few months and steamroll out a mountain of code using the latest generation of model-assisted tools. I’ve done it myself.

I wrote recently about the burnout that comes from indulging in extreme concurrency - running a swarm of agents, producing at a pace that outstrips your capacity for comprehension - and I think it’s worth unpacking why that approach, while intoxicating, is probably a trap. It’s something I’ve changed in myself over the last month or so to try to stem the flow of blood and find new, good, working patterns.

The instinct to parallelise everything is the wrong instinct. I think it’s a fool’s errand to focus on concurrency as your primary workflow. You’ll still end up with unfinished projects, but this time they’ll be unfinished projects that you don’t understand. This isn’t really a new thought - we’ve long understood that focus time for software teams always wins. Because of this, over the last couple of weeks I’ve taken to preferring what I’m going to call Co-Design with agents over raw parallelism. I think I’ve probably been stumbling towards these working patterns since the end of last year, but I’ve only recently started to articulate them and understand them as a set of workflows.

This isn’t the same thing as what most people seem to be describing as “human in the loop”. Whenever I see people talk about human in the loop, I see a pattern focused on after-the-fact PR style review of machine generated code.

I suspect that model is already dying under the weight of its own volume constraints. After-the-fact review will become arduous, long-winded, and ineffective as the pace of code generation accelerates. PR workflows in organisations will probably take longer to die than we expect because people will cling to their existing, familiar illusion of safety. Pull requests - an adversarial technique for untrusted authors to contribute to critical codebases - were never designed for the kind of workplace collaboration they’re normally used for. They were always worse than code review and pair programming, and we shouldn’t lament their death.

With some sense of symmetry, traditional pair programming with the machine is also better than after-the-fact adversarial review.

Focusing on raw output and concurrency is the same mistake it ever was, because quality subsides underneath it. Even if you personally don’t care about the quality of the output, even in a world where models are generating most of the glue code, quality still matters for software that has to operate reliably in production.

Many of these assurances on quality don’t map to one-shotting consumer grade “apps”, but they absolutely matter when you operate systems.

Context Matters with Regards to Quality

It’s important to realise that much of what good looks like when adapting to model assisted development in the enterprise echoes the lessons we learnt twenty-five years ago in the extreme programming movement. This is an adaptation of technique to new tooling. The people that were sceptical or ineffective at writing tests, doing TDD, verifying their code in automated, system-driven ways will continue to be resistant to these techniques and will end up with very poor, low quality outcomes.

The context of the kinds of change you’re working on - especially in business software - changes how effective these practices are. Models are mostly good at remixing existing ideas - which might sound limiting, but it’s largely fine in business software where the vast majority of programming and systems integration work is remixed work to start with.

The inverse is also true. People just saying “give me code that does X” are going to receive poor quality results, because quality of specification always begets quality of implementation.

I wrote a talk about fifteen years ago about how the gulf of understanding between specification and implementer was the quality ceiling of all software. That gap defined how good the software could possibly be. This will play out en masse with low quality tool usage - the specification problem doesn’t go away just because the implementer is a machine. If anything, it gets worse, because models lack the social context and domain intuition to fill in the gaps that a human colleague would.

This isn’t a new problem. It’s common to all code generation, low-code solutions and other boilerplate-centric techniques.

Despite all this, it seems to me today that anyone that can’t get roughly 80% good outcomes from the current early 2026 frontier models is experiencing operator error. The tools are good enough. The question is the same one of technique.

The reality is that most people have never really cared about technique or code quality - this isn’t new either. The same people that achieved poor results before will continue to do so using new tooling. The accelerant doesn’t change the trajectory, it just gets you there faster.

If our quality goals aren’t changing around the software we build, we need patterns of work to support them.

Quality Begets Reliability

Reliable systems are readable systems, because readable systems can be understood and diagnosed. This has been true since the first line of code was written and it remains true when the code is written by a machine.

Context windows and managing them are currently the only tools we have to keep AIs grounded. Context windows and their token usage cost money - if you let your software design complexity explode with rote repetitive code, these tools that provide you an accelerant will diminish in effectiveness over time. The model needs to reason about your software to change it well, and if it can’t fit the relevant context into its window, or the signal is buried in noise, the quality of its contributions degrades. Your code quality is now directly proportional to how useful your tools are.
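To make that pressure concrete, here is a minimal sketch that estimates how much of a context window a set of source files would consume. The four-characters-per-token ratio and the 200,000-token default window are assumptions for illustration only, not properties of any particular model, and a real tokenizer will give different numbers.

```python
from pathlib import Path

# Assumption: roughly four characters per token. Real tokenizers vary by
# model and by language, so treat the result as an order-of-magnitude guide.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a crude token estimate for a piece of text (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def context_share(paths: list[Path], window_tokens: int = 200_000) -> float:
    """Return the fraction of a hypothetical context window the files would fill."""
    total = sum(
        estimate_tokens(p.read_text(encoding="utf-8", errors="ignore"))
        for p in paths
    )
    return total / window_tokens
```

Running something like this over the files a change is likely to touch gives an early warning when rote, repetitive code is starting to crowd out the signal.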

This emphasis on design is reinforced by the reality that the worst time to learn what the design of your system is is when you’re diagnosing it in a production outage at 3am. Operating production systems requires excellent telemetry and verification, and usually a reasonable working model of what the software should have been doing. Without that you’re relying solely on navigational aids when you need a map.

The reason why technical managers are often more comfortable with these tools than engineers is that they’ve already outsourced their understanding of what is real and concrete to their teams. This shift is functionally no different for them - they were already operating at a level of abstraction above the code. For engineers, it’s a much more visceral change. You’re being asked to let go of something you’ve spent years learning to wield.

Greenfield vs Brownfield

Here’s a counter-intuitive observation: greenfield projects tend to be more susceptible to agent slop than brownfield ones.

AI is often maximalist in its application of patterns and enterprise bloatware. Ask a model to build you a new service from scratch and you’ll get the most enterprisey, over-architected, pattern-laden monstrosity you’ve ever seen. It doesn’t make that implicit temporal trade-off, that the best software designs should be smaller than the problem space they inhabit to be effective.

Over-design still carries the same cost burden of maintenance - and you probably don’t want to start designing software that’s fit for organisations orders of magnitude larger than you are just because the training data says so. It learned from the internet, and the internet is full of bad software written by people that confused complexity with quality.

Brownfield contains established examples that help ground the AI. The models are good at mimicry, at following established patterns and being constrained by the things in their context window. It makes them unusually effective at performing localised refactors that look more or less like what your own teams would write.

This is why I continue to believe that using AI to mutate and reduce the burden of existing code is a much more fruitful use of this category of tooling - AI assisted refactoring, minification, optimisation, error and vulnerability scanning.

This is because the cost of maintaining and operating software has always been the vast majority of the cost of software over its lifetime. Creating software has always been effectively “free” for large categories of programs and systems, whilst mutating it has not.

If you’re looking for the highest leverage use of these tools, it’s not in writing new code, it’s in making your existing code better. Shipping features that nobody wants or uses at a high velocity is intrinsically zero-value work.

Co-Design workflows

So what does good look like?

The best outcomes I’m seeing in early 2026 come from engaging in what I’m calling Co-Design - a set of workflows where you’re designing with model assistance, not reviewing its output after the fact. This is in part a reaction to the trade-offs that the models aren’t currently good at making without steering. Current frontier models without guardrails often will err on the side of repetition, and performance optimisation, over legibility. This sometimes leads to software designs where modularity and internal boundaries are extremely coupled in sometimes obtuse ways because there is no incentive for the models to optimise for human legibility.

Many of the following patterns are compensating practices that keep the model “on the straight and narrow” as it iterates. These are patterns to coerce the agents to write human friendly, quality code. This may not be your goal, but it frequently is mine.

With that in mind, here are the working patterns I’ve found most effective.

Specify Ahead

Problem: It’s complicated and exhausting to perpetually context switch between different streams of work while agents compute solutions. The experience is similar to repeated context switching that prevents programmers from entering a “flow state”.

Solution: Instead of trying to multi-task different workstreams, focus on the immediate next change to keep in flow.

This is my most frequent behavioural pattern. Rather than continuous task switching, I “specify ahead” of the currently monitored agent’s task. This works best when I have a list of small, incremental changes to a system or group of systems.

While the agent is implementing the first step in the sequence, I’ll be human-refining its next task.

Review While Iterating

Problem: The model is often good at making progress on a task, but it can go off the rails if left unchecked for too long. Waiting until the end of a task to review can lead to more significant course corrections later on.

Solution: A mixture of live observation with steering, and code-review per increment.

Review while iterating rather than review on completion is a more involved co-design process than “delegate work to an agent”. This reduces the cognitive load of context switching because as the agent returns for review and feedback, you’re only a single increment away from its current workload.

This feels closer to traditional pair-programming than async PR review - and involves a mixture of live agent monitoring with steering and interruption, and taking live notes to feed back as you watch the agent implement.

You are effectively the navigator in a driver/navigator pair with the machine taking live instruction. It’s vital that you focus not just on output but on structure and design during this process - if you don’t, it loses all value and you may as well one-shot and review after.

Human Directed Refactoring

Problem: Agents will frequently over-design, under-design, or make assumptions of context that are present in their training data.

Solution: After every successful change the agent makes, you inspect the design (modules, abstractions and organisation) and do interactive refactoring with the agent driven by your own taste.

This is similar to traditional code review, but without the ceremony of “PR and wait”. Never be satisfied with the first shot. Focus on what could be simplified in the design, what could be removed from the design, and what could be done to drive up cohesion in the design.

This is similar to a traditional TDD “red, green, refactor” incremental improvement cycle with the agent as a “ping-pong-pair”.

Another variant of this pattern that’s most prevalent in brownfield refactoring involves hand-executing the kind of transformation you expect the agent to follow and directing it to those examples.

A good example of this is migrating between test frameworks, or re-writing tests to conform to a pattern en masse. The agents will be much more successful if you hand convert one or more files, then indicate that it should mimic the patterns and conventions in your example, rather than trying to describe the patterns in natural language - which will inherently be more imprecise than coded examples.
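A minimal sketch of how that instruction might be assembled, assuming you keep a list of the files you converted by hand alongside the files still to do. The function name, the file paths and the prompt wording are all hypothetical; the point is only that the prompt points at coded examples instead of describing the pattern in prose.

```python
def build_mimic_prompt(examples: list[tuple[str, str]], targets: list[str]) -> str:
    """Assemble a prompt that points the agent at hand-converted examples.

    `examples` holds (original_path, converted_path) pairs that a human has
    already migrated; `targets` are the files still to convert.
    """
    lines = ["The following files were converted by hand. Treat them as the reference:"]
    for original, converted in examples:
        lines.append(f"- {original} -> {converted}")
    lines.append("")
    lines.append("Convert each of these files the same way, mimicking the patterns")
    lines.append("and conventions in the reference conversions:")
    lines.extend(f"- {target}" for target in targets)
    lines.append("")
    lines.append("Do not introduce conventions that the reference files do not use.")
    return "\n".join(lines)


# Hypothetical paths, purely for illustration.
prompt = build_mimic_prompt(
    examples=[("tests/old/test_orders.py", "tests/new/test_orders.py")],
    targets=["tests/old/test_billing.py", "tests/old/test_stock.py"],
)
```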

Agent Directed Refactoring

Problem: The agent paves the road with large volumes of code that contain obvious duplication and module boundary problems.

Solution: Agent self-code review and iteration.

One of the easiest and cheapest tricks is to ask the model to check its own work for human legibility and maintainability. You can design specific agents or skills to do this work, but the effectiveness isn’t significantly different than just asking “can you see any opportunities to refactor your latest change to make it more coherent and human readable”.

As a general rule the agents will come back with a list of reasonable changes that will improve their output with a second pass. This is an essential step whenever an agent produces work of any volume before investing time in human directed refactoring, if only to make the work easier to navigate.

Scaffold, Tweak, Iterate

Problem: Greenfield projects are too blank page for most agents, and often you end up with designs far away from the complexity of their problem space.

Solution: AI Scaffolding, Human Tweaking, AI Iterates

I’ve long been a fan of Alistair Cockburn’s “walking skeleton” metaphor for agile system evolution, and the associated Pragmatic Programmer “tracer bullet” system design technique where you provide the feature-free skeleton of the moving parts of your system then incrementally “flesh out” the capabilities.

It’s valuable to engage in this process with an AI, especially in greenfield projects where you need to ensure the structural stability of the design before you let a model rapidly iterate out the details.

Directing an AI to scaffold a project (or using traditional code generation and scaffolding) followed by human intervention to tweak the modules and concepts often ensures a model sticks to the patterns presented in the emerging codebase. This is a powerful way to steer the model towards the design you want, and then have it iterate outwards from there.

Hand Scaffold, AI Expand

The complementary pattern, for when you know exactly what you want the skeleton to look like, is to human scaffold, and use the AI to expand your target design before it implements features.

You establish the patterns in code or natural language, and have the AI expand from there. This provides a greenfield project with the same basis for mimicry that it would get from a brownfield project by effectively populating the context with examples before any significant unsupervised work happens.

Surgical Preparation

Problem: You’re working in an ugly, complicated, brownfield application that the AI cannot operate inside effectively.

Solution: Fix the edges before touching the core.

Repositories are often in poor states, so it is essential to ask AIs to review for obvious problems first, letting you iterate on expanding test coverage and guard rails before other work begins. Think of this as preparing the ground. You want the codebase to be in a state where the model can reason about it effectively before you start asking it to make significant changes.

There are a number of forms of this - you can go through several iterations with a model to construct documentation, indexes or pointers around the codebase in a /docs folder, you can use the model to refactor and normalise tests, to improve build scripts, to address toil in verification of changes, before asking it to modify the code proper.
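As one small illustration of the “indexes or pointers” idea, the sketch below walks a repository and writes a plain-text index of its directories into the docs folder. The skip list, the output file name `INDEX.txt` and the one-line-per-directory format are assumptions of mine; in practice you would more likely have the model draft something richer and then trim it by hand.

```python
from collections import Counter
from pathlib import Path

# Assumption: directories that are noise for a reader of the index.
SKIP_DIRS = {".git", "node_modules", "bin", "obj", "__pycache__"}


def build_index(repo_root: Path) -> str:
    """Return a plain-text index: one line per directory with file counts by extension."""
    lines = [f"Index of {repo_root.name}", ""]
    for directory in sorted(p for p in repo_root.rglob("*") if p.is_dir()):
        relative = directory.relative_to(repo_root)
        if SKIP_DIRS.intersection(relative.parts):
            continue
        counts = Counter(f.suffix or "(none)" for f in directory.iterdir() if f.is_file())
        if not counts:
            continue
        summary = ", ".join(f"{n} {ext}" for ext, n in sorted(counts.items()))
        lines.append(f"{relative.as_posix()}/: {summary}")
    return "\n".join(lines)


def write_index(repo_root: Path) -> Path:
    """Write the index to docs/INDEX.txt under the repository root."""
    docs = repo_root / "docs"
    docs.mkdir(exist_ok=True)
    target = docs / "INDEX.txt"
    target.write_text(build_index(repo_root), encoding="utf-8")
    return target
```

Even an index this crude gives the model a cheap first place to look before it starts opening files.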

Usually this is all the toil that already existed in your codebases, but the models are uniquely useful at being able to provide quick remediation to make later changes safer. As a Senior IC, a lot of my work involved these categories of cleanup. Fixing builds, fixing tests, making sure everything can run in memory, in isolation, on a local machine. Having a model do this category of work to “prepare the ground” for future changes is one of the most valuable applications of the technology because you are increasing verifiability.

AI Safety Checks

A partner to the above. Embedding a safety check in your prompts to ask the AI for its confidence level on changes before it continues can be a good hook for human review. Something as simple as:

Before making changes to this module, assess:
1. What is your confidence level (high/medium/low) that these changes won't break existing behaviour?
2. What assumptions are you making about the codebase that you haven't verified?
3. What tests would you want to see passing before you'd be confident in this change?

If confidence is medium or low, stop and explain what additional context you need.

This acts as a circuit breaker - a point where the machine pauses and lets you decide if it should continue. It won’t catch everything, but it catches a surprising amount.
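If you are driving an agent from a script, the same circuit breaker can be checked mechanically. This is a minimal sketch that assumes you have also asked the agent to state its answer on a line of the form `Confidence: high`; that reply format, the function name and the regular expression are my assumptions, not something the tools guarantee.

```python
import re

# Assumption: the agent was asked to answer with a line such as
# "Confidence: high" or "Confidence level: medium".
CONFIDENCE_LINE = re.compile(
    r"^\s*confidence(?:\s+level)?\s*[:=-]\s*(high|medium|low)\b",
    re.IGNORECASE | re.MULTILINE,
)


def should_continue(agent_reply: str) -> bool:
    """Return True only when the reply explicitly states high confidence.

    A missing or unparseable confidence line is treated as a stop, so the
    breaker fails safe and hands control back to the human.
    """
    match = CONFIDENCE_LINE.search(agent_reply)
    if match is None:
        return False
    return match.group(1).lower() == "high"
```

Treating silence as a stop is the important design choice here: the breaker should trip whenever the agent has not clearly answered the question.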

When you’re working in more fragile systems, performing these kinds of sanity check first pass investigations can save you a huge amount of rework later.

Cross-System Change

One of the step changes for me in recent months has been stepping up a level when reasoning about code so I can move “top to bottom” quicker - from system design to software design in a single session.

To do this, I built a script to git clone the entire hundreds-of-component distributed system that I attend to, so that I can have the models reason about disparate parts of the system together. This is a step change, because I can modify multiple systems at once where previously I would have had to do expensive coordination work with different teams of people to orchestrate change.
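The script itself doesn’t need to be clever. A minimal sketch, assuming you already have a list of clone URLs from somewhere (the example URL in the usage note is hypothetical) and that a shallow clone is good enough for reasoning about current code:

```python
import subprocess
from pathlib import Path


def repo_name(url: str) -> str:
    """Derive a directory name from a clone URL."""
    return url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")


def plan_clones(urls: list[str], workspace: Path) -> list[list[str]]:
    """Return the git commands needed to fetch repositories not yet in the workspace."""
    commands = []
    for url in urls:
        target = workspace / repo_name(url)
        if not target.exists():
            # Assumption: a shallow clone is enough, because the model mostly
            # needs the current code and not the full history.
            commands.append(["git", "clone", "--depth", "1", url, str(target)])
    return commands


def clone_all(urls: list[str], workspace: Path) -> None:
    """Clone every missing repository into one workspace directory."""
    workspace.mkdir(parents=True, exist_ok=True)
    for command in plan_clones(urls, workspace):
        subprocess.run(command, check=True)
```

Calling `clone_all(["https://example.com/org/billing.git"], Path("workspace"))` would leave a `workspace/billing` directory behind, and running it again skips anything already present, so it is cheap to re-run as the estate grows.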

I think this is the single biggest accelerant in software development from these tools because it addresses a foundational “surface area” problem in team topologies.

Why is this important?

Over the last twenty years we’ve exploded the edges in software to satiate the corporate world’s desire for feature parallelism. I wrote about this before - about how every subdivision of a system has a cost.

Some of these edges we introduced were “good edges” - fault boundaries, scalability boundaries, async boundaries - but many of them trended our designs towards terrible nanoservice over-complexity, and distributed monolithic design. We made our systems actively worse because we wanted to expand the surface area so we could fit more people around it.

This was good for fitting more people around the problem, but often didn’t actually lead to any real-world advantage because the software became worse and more complex, and also incurred the cost of team coordination. So for each subdivision, the returns diminished, and the toil increased.

Reasoning about the system as a whole is a salve and partial solution to this scale-created problem. But it runs against the edges of the capabilities of this technology because it quickly exhausts context windows.

Building Maps

The solution?

Build a map.

To help work at system scale more effectively, I built a piece of software that worked through our Infrastructure as Code, did deep code scanning for service connections, and ingested other information about service interconnectivity to build a graph of the relationships between systems.

This map is presented as reference to model assisted tools so it can more effectively answer the question “if I make a change here, what else needs verifying or is in scope of this change”.
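Once the relationships are in a graph, that question reduces to a walk over reversed edges. A minimal sketch, assuming the map has been loaded as a dictionary from each service to the services it depends on; the service names in the usage example are made up:

```python
from collections import deque


def affected_by(change: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `change`."""
    # Invert the edges so we can walk from a service to its callers.
    callers: dict[str, set[str]] = {}
    for service, dependencies in depends_on.items():
        for dependency in dependencies:
            callers.setdefault(dependency, set()).add(service)

    seen: set[str] = set()
    queue = deque([change])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            # The `seen` check also keeps cyclic dependencies from looping forever.
            if caller not in seen and caller != change:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical system: web calls orders, orders calls billing and inventory.
system = {
    "web": {"orders"},
    "orders": {"billing", "inventory"},
    "reports": {"billing"},
    "billing": set(),
}
in_scope = affected_by("billing", system)
```

Here `in_scope` contains `orders`, `web` and `reports`, which is the list of directories you would point the model at when changing `billing`.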

There are many ways to answer this question depending on your ecosystem - mine involved walking back from our Azure ARM API, through our deployment tool configuration and scanning code configurations to construct a text based map.

Of course, model assisted tooling can easily code-generate the kind of glue you need to build something like this for yourself. These maps are slow moving so they don’t need to be perfect, just mostly good enough to signpost which directories your model tooling should analyse while containing the “context sprawl”.
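To give a flavour of how little glue is needed, here is a deliberately crude sketch. It is not the approach I described above: it assumes every service lives in its own directory under one workspace, and it treats any mention of a sibling directory’s name inside a config file as a connection. The file extensions and the substring matching are assumptions, and they will produce false positives.

```python
from pathlib import Path

# Assumption: connection details live in files with these extensions.
CONFIG_SUFFIXES = {".json", ".yaml", ".yml", ".config", ".bicep"}


def build_map(workspace: Path) -> dict[str, set[str]]:
    """Map each service directory to the sibling services its config files mention."""
    services = sorted(p.name for p in workspace.iterdir() if p.is_dir())
    graph: dict[str, set[str]] = {name: set() for name in services}
    for name in services:
        for path in (workspace / name).rglob("*"):
            if not path.is_file() or path.suffix.lower() not in CONFIG_SUFFIXES:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            for other in services:
                # Crude heuristic: a sibling's name appearing anywhere counts as a link.
                if other != name and other.lower() in text:
                    graph[name].add(other)
    return graph


def render_map(graph: dict[str, set[str]]) -> str:
    """Render the graph as one 'service -> dependencies' line per service."""
    return "\n".join(
        f"{name} -> {', '.join(sorted(deps)) or '(none found)'}"
        for name, deps in sorted(graph.items())
    )
```

The rendered text is small enough to hand to a model as reference material, which is the whole point: it signposts where to look without spending the context window on the code itself.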

Can these workflows be automated as agents and skills?

Yes, it seems like they can.

Consider the skills as your canned prompts and your agents as the guard-rails for how they interact.

The challenge in “making everything a skill” is that you fill up your context window with a lot of low-level instructions that are situational, so the agent needs yet more instructions to do lots of narrowly scoped repetitions coordinating between different subsets of skills.

The agents are quite good at working out when to apply skills, but given the absence of taste, they often won’t highlight things that hamper human readability.

I’ve not yet witnessed an agent writing code that “looks and feels good” in the way that well-crafted human code does - they’re very good at paving rote repetition and procedural code, but the design of the thing, the form, is usually absent.

This is fine, because the Co-Design workflows above are designed to compensate for exactly this. You provide the authorial intent, the model provides the throughput.

Pair and Mob Co-Design

I suspect the future of software teams might look closer to the historical “masters and apprentices” models, where one experienced practitioner works with a small quorum of understudies who change systems together. I wrote about how teams could stay ahead recently, and I think this is where that thinking lands practically.

I suspect this might beget mob-programming style co-design sessions as teams engage in continual code review - which is really just design - refinement and specification. This is probably the “XP” of model-assisted programming in a team context. The same way that extreme programming took the best practices of the time and said “what if we did these things all the time”, focusing on co-design asks “what if we paired and mobbed all the time, but with machines as well as each other”.

The mobs will likely be smaller than the traditional “two pizza team” standard that has emerged over the last decade, and be closer to a “one pizza team”. Navigator-driver techniques remain from pair and mob programming, and they translate naturally when your “driver” is sometimes a model.

This is an effective model to keep what is essential in software - reliability of operators, shared understanding, and good design - while leveraging the accelerant of the new tools.

The through line

None of this is new - it’s all incremental. The XP movement told us twenty-five years ago that the answer to better software was tight feedback loops, continuous testing, pair programming, and a relentless focus on simplicity. The tools have changed but the direction should persist.

Quality of specification begets quality of implementation. Readability begets reliability. Focus begets understanding. These were true when we were writing C in the 90s and they’re true when driving agents in 2026.

Same as it ever was.

Existential Dread and the End of Programming

Tuesday, 17 February 2026

Dear diary - the temperature has changed.

Everything feels like it’s changed really and I’m not 100% sure how to feel about it.

I was speaking to my manager last week and he asked me about why the temperature seemed to have changed and I think it’s a really good way of speaking about this moment.

Last year I wrote a post with some general predictions about the effects of generative AI on computer programming, and in the time since we’ve watched the gradual improvement of models, open-source models start to “get good” and more importantly an entire category of tooling evolution that powers what is currently the state of the art in model assisted programming. And oh boy does it make me feel a lot of things about programming.

I’ll start with the clickbait:

I think programming might be over?

And I’m kinda sad and angry and excited about it in equal measure.

There have been plenty of headlines like this over the last couple of years since Copilot launched, and the tools that followed. I talk to a lot of my peers about this almost daily - about the hype cycle, and what it means for people who write code, and what it means for the industry, and the world. And I’ll be honest - for the past couple of years the sense has been “yes, these are useful tools, we’re not quite sure if the juice is worth the squeeze, but the tools are clearly moment-to-moment useful and they sometimes are so incredibly dumb”.

Nobody outside of the hyperbolic vendor headlines was really thinking “hey, this is an existential threat actually”, but I’m starting to think that maybe it is, and that we’ve marched down the path from the piece I wrote last year much quicker than even I expected us to.

You know the thing that the marketing said last year, about Copilot being able to automate most of your programming job, and then you tried it, and it didn’t really work and you just got on with your day?

The temperature changed. It works now. The slow, onwards march of technology took hold, and it works now, and it wasn’t really model innovation that did it, it was boring old programming, and model assisted systems.

In late 2025, just as everyone was signing off for Christmas, the latest generation of frontier models hit the market (Anthropic’s category leading Opus models, OpenAI’s Codex models) and in conjunction with a significant uplift in the tooling around them, the promise of effectively autonomous developers is now reality.

Anthropic’s Claude Code, OpenCode and GitHubCLI have effectively commoditised the market in software creation almost overnight. You can now ask your computer to build a thing (at least in existing software) and basically leave it to iterate until it’s done. It’s not particularly expensive, and the changes are generally at least of the same average quality as a blended skill Enterprise development team.

It kinda feels like it happened overnight - that moment when it all suddenly got good. But it’s here, and it’s real, and if you tried these tools last year and thought “huh, neat trick, but not good enough” you owe it to yourself to just give opencode a go. It’s night and day. Category changing workflow stuff.

So when we’ve rolled around to February 2026 and I start seeing hyperbolic sounding pieces by the FT, or quotes from Microsoft’s AI head saying words to the effect of “all blue collar jobs will be gone in 24 months”, I think I’ve gone from being an AI moderate who thinks “hey, these are useful tools but the human in the middle is central” to maybe thinking that there’s a non-trivial chance that that’s actually a reality now because I have seen it with my own eyes.

If you could see what I’ve seen with your eyes

So I think I’m about 6 weeks late to the party here - I was busy, I took mid-December to mid-January off work. I didn’t really touch a computer for anything other than playing games, and I tried to keep my head away from “the discourse”. One of the most troubling things about considering AI is the number of agendas, snake oil salesmen, and people that get filled with incandescent rage at the mention of AI that are already in the room. It’s exhausting trying to just work out what the fuck is going on in the cacophony of it all. But while I was away, everything changed and the tools got good.

So when I got back to my desk and decided to really take these tools to task in anger in their latest incarnations, with the latest models, I was astonished how far the goalposts have moved. It’s just so obvious when you listen to the kinds of vox pops coming from the organisations now. They’ve stopped talking about AGI, because I think everyone has collectively realised that they don’t need to chase that unattainable beast to be category defining anymore - in fact - the models don’t even really need to be any better than they are now.

This new category of tools appears to outperform good-to-average engineering teams at almost all rote tasks, and excels at deep algorithmic, bounded, or deeply technical work. It didn’t require any real trickery, the current workflow just requires a bunch of iteration and increasing context management between the user’s computer and its ancillary tools via MCP, and a capable model. I’ve spent the last month immersed in real-world experimentation solving problems with these tools that would have taken me weeks to months to solve by hand, in minutes-to-hours-to-days. It’s real. I’ve seen it. I’ve shipped it to production.

Here’s a real world example - I was shipped an extract of some mainframe data for a project. No schema, no specs, just an encrypted and compressed backup and a Win32 data viewer tool designed for computers with a minimum spec of a 233MHz Pentium. I set an opencode agent using Anthropic’s Opus 4.6 model the task of working out:

What the format of the data was
How to decrypt and decode it
To build a C# program to stream decoded data
To render the contained files to PDFs

In absence of any schemas, file format names, or information of any real sort, I handed it a single PDF file exported from the tool that was supplied by way of comparison.

Within the first twenty minutes or so, it had run a bunch of heuristics to detect that the file was compressed by the presence of repeating block patterns, and as a result had a lead on the compression algorithm.

I suggested it investigate the Win32 tool.

It spent about 30 minutes writing Python scripts over a disassembler to decompile the executable and annotate the assembly code to isolate where compression and encryption occurred. It identified not only which compression scheme was in use but, by extracting strings from the executable, the 1990s era libraries that were used to encrypt and compress the data in the first place, allowing it to instrument, extract hard-coded keys, and unpack the file.

Over the following two days of occasional prompting, using visual diffs with the one sample file, it constructed its own renderer to build up output files that interpreted ancillary data shipped with the tool and produced compatible rasterised files, snapshotting diffs as it went.

I’m pretty convinced that if I’d handed this problem to any engineering team I know, they’d still be sifting through bytes trying to work out what kind of file they were looking at. I know, because I’ve done this kind of work by hand before - it’s all possible, thankless, detailed work that takes a huge amount of time, not a couple of hours while I was doing my emails.

So I think programming in the traditional sense might be kind of over?

Not a good enough example?

While it was doing that, I had a couple of other agents finish off a few languishing side projects that I’ve been working on for fun. I’m much happier with the code in these projects than I was expecting because the model can trivially mimic my own style based on the context of the rest of the program. It followed my design without me even asking it to.

Still not good enough?

At the same time as both of those things were happening, I was also running another session to build an interactive architectural map of the systems I oversee, using metadata and traditional static analysis to get there. I’ve wanted to do this for years but never had the time. It was finished by the afternoon.

I’ve never been more productive.

I’ve never been more exhausted.

I’ve never been more addicted to building now that it feels like all the constraints are gone.

So I open LinkedIn and see another rote headline written by someone that hasn’t bothered to even try anything like this, about how the AI bubble is going to burst, and it’s going to all melt down, and this will all never work and I just… don’t believe. This is category changing stuff. It’s here now. It works.

And you know what, even if the back fell out of all these organisations, the union of tools like opencode, llama.cpp, LM Studio and Alibaba’s open weights models means that the genie is out of the bottle and it’s never going away. You can run this stuff on a £2,000 MacBook locally at about 90% of the quality of the frontier models. Maybe the big vendors won’t survive but what we have is already category changing enough.

The software doesn’t even have to work perfectly, or be one-shotted for there to be a drastically reduced need for programmers to fix up what’s left. Most software is buggy and messy, it’s unlikely to be worse. The vast majority of software is a remix of a concept or a well worn idea, so it probably doesn’t even need to be original to be good enough - most software isn’t original works.

This is a heartbreaking thought for me

I’m simultaneously having the time of my life building and am sat at my computer with the dread of looking at the end-times. Writing a hubristic AI post would be all the rage if I had an AI to sell you, or if this was good for me in any way, but it probably isn’t?

I’ve been writing code since I was 11. I can comfortably say that it’s a foundational pillar in my identity. I am a programmer. I am an artist. I use code to effect change in the world. I think in terms of systems. This is literally who I am.

I’m not being hyperbolic - you can go to YouTube and find hours of talks, podcasts and writing I’ve done about finding myself through creating things with software. I wrote a talk called “Decades in the Machines” about finding meaning and purpose in the work. I wrote a talk about “Intentional Code” about treating code as literature. I’m travelling the world this year with a talk called “Meditations on Code as Art” about seeing the humanity in code written by people which captures the political context it was written in.

I love programming.

And I think programming might be over and I don’t know what to do with those feelings.

There are plenty of other people who are reaching the same kinds of conclusions that I am, and over the last few weeks I’ve seen a bunch of “oh code was never the point, it was all about how we could make things or add value to businesses”.

I love making beautiful things with computers and the mechanical pleasure of doing it, so I think this shift in the commercial dynamics of software might well be the end of something I want to be doing for the rest of my life.

I’m also kind of, tired? I’m addicted to the feedback loop, but working as a conductor in the middle of a flurry of agents is a different kind of mentally taxing. When you first start programming, everything feels bewildering and unknowable; every rock is an infinite black hole of complexity as you lift it. It’s disorientating.

This last month deep in with the agents I feel like I’m a baby dev again - because the pace at which the teams of agents swirl around me producing new code outpaces my capacity for comprehension. I literally cannot keep up. I can’t understand the work at the pace it’s being created. It’s breakneck, and I feel like a child again trying to fit it in my head.

I’m pretty sure that pace hampers my ability to make good design decisions and find the form in the thing as I’d like it. But in the war of objective, not method, pace wins. When I first drank the agile kool-aid in 2005 I absolutely believed that pace of iteration was the thing that made great software, and I absolutely believe to this day that the best software designs are the ones that you can change the easiest. But man it hurts.

And what if we’re now in a place where how well the software is designed has literally no impact on how easy it is to change or not? What if it doesn’t matter - this thing I care about - this thing I spent so long obsessively trying to be good at. What if it just doesn’t matter at all?

I feel like a painter at the dawn of the camera. Not yet irrelevant, but my interests are probably now a niche, even if the rest of the world hasn’t quite caught up to that thought yet. I will still do this as long as I live, but what I do to live is probably going to have to change.

If I can’t paint for my supper, perhaps I can still compose a great photo.

An existential threat to all business

I think businesses should probably feel more threatened by this than they are.

If the bubble doesn’t burst - and I don’t think it will - and the frontier models stay about 10-15% ahead of the current open-source and open-weight models, this will represent the largest shift of knowledge work and the associated profits towards the hyper-scalers we’ve ever seen. We might be looking down the barrel of the gun of outsourcing all programming to a small handful of companies - and that sucks for everybody.

And really all of this basically calls into question if software itself really has any intrinsic value.

You might have noticed the existential dread all the venture capitalist firms are currently experiencing, because the thing that’s worked for them predictably since about 2005 - building SaaS products and capturing market share - is basically a broken model if anyone can generate low stakes business software to solve the mostly trivial automation problems their solutions solve.

When code is almost free to write, it has no value at all. And subsequently, all code written with a tool that literally everyone has access to is intrinsically worth nothing. It challenges traditional software economics where the conventional wisdom is you should buy anything that’s outside of your core expertise, and build what is, because building entire categories of traditional enterprise software might actually be cheaper to own and maintain than licensing it.

Conversely, operating software reliably, being a good custodian of the data that it trades in, and cultivating organisational knowledge that can be “sold” via integration to provide context or operations for the models that are eating all the low-stakes software is probably priceless in this emergent economy.

We’re perhaps seeing the death of “algorithmic programming” in all but niche and specialist cases - because we finally found a general purpose algorithm for data processing and it’s simply a “good enough” statistical model of the world. I’d be remiss not to highlight that “the map is not the territory” - a statistical model of the world is just a model, and will not conjure accurately all the time, but the reason it works so well in software is that software is a constrained, well documented problem space, with almost infinite training material.

While trying to solve for general purpose data processing we accidentally solved programming by mistake.

This leads me to think that the future of traditionally slow moving enterprise software is to be the fastest to adopt new practices, while keeping data sovereignty, correctness, and high availability at its core - some very traditional programming disciplines.

It’s quite likely that the future users of the software, APIs and data that you produce will be agent driven integrations, and when that’s the case, whoever is fastest and cheapest will win the business. Your software is effectively operating in a “price comparison site” style ecosystem, where the best ranked, most available and most compatible API or skill wins the business.

With some irony, all categories of products that are about web discovery (like price comparison websites) are probably dead - there’s no purpose to algorithmic curation at the behest of a third party when your own agent can do that job for you trivially from available public data sources.

A year ago I was hopeful that this shift in the landscape would see teams expanding not by working harder (exponentially working with more people) but by working smarter (better tool use by existing teams). The more I watch the kind of productivity increases that talented individual contributors can manage with these accelerants, the more I think we’ll be doing “more with less”.

Fewer people, less overhead, less money - but it’s important to remember that the code was only ever part of operating a successful technology business. Keeping systems online, on-call rotas, support, escalation, outage support, troubleshooting - that stuff all requires people regardless of how good and self-healing your tools are.

Code output was only ever part of the problem.

For myself I worry that the shift in focus might result in the job of a software professional being more… boring? Us generally doing the less interesting work and coordinating tools. But I think there’s also a great opportunity for small teams to have much wider impact in organisations if they change the way they interact with software. I’ll come back to this idea later.

I think business software will be able to adapt to survive this due to the accountability wrapper that business support provides. Much of the value in business software is the liability shift between the consumer of the software and the provider. If they make a mistake, it’s their fault, regardless of whether it’s your outcomes. Consumer software on the other hand?

I think it’s probably going to be a bloodbath.

My not-so-outlandish prediction is that within 12 months, Google will make a call on putting “app generation” into Android. The Android store is notoriously not quite the cash cow that Apple’s high value alternative is. With the quality of models available today, for every single trivial app you might want (“hey can I track my shopping list”, “hey track my workout”), well, why not let consumers generate their own apps, with their own features? It’s low stakes, they don’t even have to be reliable, and they can probably put ads in them.

Once that happens, whoever is first to get there breaks the back of any non-significant consumer software. There will always be a niche. There will always be apps. But the days of launching a small little app that does some neat integrations and makes some money to start up a business with are probably gone. Your agent will do it for you. Hell, it might just use its own memory to perform the task and keep a log without you needing any new software at all.

Where does that leave enterprise

I think there’s a truism - people often think of software as an asset when the real value of software comes with the extreme cost of maintenance.

Software costs more to maintain than it does to author - this has always been true - and having more software is not better than having less software. Much of the bloat in the enterprise IT stack comes from trying to solve for second order human factors - we subdivide systems, to fit more people around them, to relieve the maintenance burden, and parallelise the work. Then the larger the systems get, the harder the integration becomes and actually the initial burst of parallel productivity doesn’t really hold.

The biggest opportunity in Enterprise is to be able to reason about these large, distributed, maintainer hungry systems as a cohesive whole, powered by tools that let smaller groups or single individuals reason about vast systems. I think if one of the side effects of this is we have denser, more feature rich software that doesn’t quite require unnatural subdivisions to allow humans to reckon with it then perhaps that’s actually a good software outcome.

The most nimble organisations are going to work out how to use these next generation model assisted development techniques to build faster, surrounded with human guard rails and speed limits.

The human cost

My profession is about 55 years old and change in its current form, and this is the first time I’ve thought “wow, maybe the way we do this is going to be really different soon”. But I’m worried. I’m worried about the extractivist nature of the technology. I don’t know where the next generation of software experts are going to come from that can better guide these tools when they inevitably make mistakes. I don’t know how, or even if, the programmers of the future are going to have enough time under the desk, to do enough repetitions, to learn what good taste in software development looks like.

The thing that’s been noticeable to me over the last two years during the nascent phases of model assisted development is that the people that get good results from these tools already understood the “lingua franca” of software. If you know how to form language to reason about software, it stands to reason that the outputs you get from a human-language interface (which to be clear, is less precise, but more expressive, than a programming language) are going to be better than someone that doesn’t understand the “meta-language” around the profession.

I think this shift away from low level software competence will have to be compensated by a renewed focus on the large scale design of systems - because without correct instruction, or conducting, or producing, the systems generated by these models will atrophy into relatively low quality.

It seems to me that what “counts” as low quality will probably change over time, shifting from “code that can be comprehended by humans” to “code that fits in increasingly small context windows, or can be subdivided to do so” as that will enable a more rapid tool driven iteration. I’m not really sure I like this, but it seems like an inevitable second order effect - it doesn’t really matter what the code looks like if vanishingly few people will ever read it. If you don’t like it, you just throw it away. It’s the microservices dream writ literally large.

Regardless, I think it’s going to be a hard shift for people to deal with the deluge of software change. Programmers are going to struggle if my own experience of extreme context switching is anything to go by.

I hope organisations still understand the value of experts in programming, because those that invest in and grow great engineers are going to be the organisations that operate software the best, and optimise it the best, and can correct it when it strays from the path. Even if “code might be over” I’m not sure programmers are just yet.

While so much of what we do in software is remixing existing concepts, innovation isn’t going to come from an existing corpus of information, but business innovation might. You’ll still need those experts if you want to do something actually unique.

Is This The End of Programming Languages?

One of the weirder (and personally heartbreaking) second order effects of a shift towards machine generated code is probably that programming language innovation will dry up over time.

Don’t get me wrong, people will still make new programming languages, but I’m not convinced anything will ever reach mainstream adoption when the criteria for mainstream adoption becomes “can the model write effective code in this”. We’re probably generationally stuck with the languages we already have, and incremental and backwards compatible improvements on those languages.

The models work best with JavaScript, Go, Python, C#, Java - mostly as a direct side effect of the amount of training data available at the point of training. I’ll be surprised if other languages ever manage to cross the chasm, because there just won’t be enough data to encourage widespread adoption.

This really means that the only real chance for a new programming language to “make it” is for it to be attached to a notable, important, human first project - everything else will probably be damned to enthusiast niches.

Maybe that’s a good thing - it’s felt for a while like we’re at this wonderful point of programming language convergence where everything is basically good enough now, and everything works, so maybe we did it, we solved programming in its current form. There is no next generation language, there is no replacement, just steady evolution. Nothing will kill React.

How to stay ahead

It feels like the software teams of the next 5 years aren’t going to be doing the same things as the software teams of the last twenty, and this is probably an inflection point for re-invention.

As strong individual contributors learn to reason about distributed systems as a whole, I suspect the days of surrounding components with teams to “look after them” are going to be a thing of the past. We’ll need enough people to be on-call and reason about a system but we’ll finally be able to pull apart the biggest resource sink in modern development - cross team coordination, planning and orchestration.

I suspect we’ll see small teams of very technical people with business context doing wide-ranging change across large systems, reasoning about the systems in a tool assisted manner. The tools will help engineers understand the context as they direct and orchestrate changes that range from front-end to back-end.

These engineers will need to have taste, and they’ll probably be involved in early hand writing of some categories of code to establish patterns for the machines to follow in the first instance, but likely will accelerate to the point where traditional workflows of pull-requests and reviews don’t make sense when faced with the pace at which change can be made.

This will logically lead to more “continuous delivery” and “continuous testing” style systems that get piloted and automatically promoted and rolled back, rather than reviewed. This will scare a lot of organisations that already suffer with anxiety associated with continuous change, but the pace that model assisted development enables means that if organisations don’t get on-board with this approach, then their competitors will.
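To make “promoted and rolled back, rather than reviewed” a little more concrete, here is a minimal sketch in Python of the kind of automated gate that would replace a human approval. Everything in it is an assumption for illustration: the Metrics shape, the decide function and the thresholds are hypothetical, and a real pipeline would look at far more than error rate.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Request and error counts observed for one version of a service."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def decide(baseline: Metrics, canary: Metrics,
           min_requests: int = 500, tolerance: float = 0.005) -> str:
    """Return 'wait', 'rollback' or 'promote' for a piloted change.

    The thresholds are illustrative assumptions, not recommendations.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic yet to judge the pilot
    if canary.error_rate > baseline.error_rate + tolerance:
        return "rollback"  # the pilot is measurably worse than what is live
    return "promote"
```

The point of the sketch is that the decision is made from observed behaviour in production rather than from someone reading a diff, which is the only kind of gate that keeps up with the pace of change.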

Good, supportable, scalable software still matters - because organisations will be judged by how well they perform when interfacing with automations, that instead of being a side-line for a business, will be the default way many people operate with their products.

What are the bets

Any platform that doesn’t have cohesive predictable APIs will die
Adopt Agent-to-Agent Protocol for interop (A2A)
Adopt Payments Protocol (AP2)
Invest in making web-content more agent friendly (serving Markdown as alternative content types)
Equip small teams with frontier model tooling and the entire source code of an organisation to reason about as one cohesive unit - insist on permissive support teams shipping code changes at pace during the transitionary period
Invest heavily in automated testing - even if it’s agent generated, because it’ll probably be the only testing that exists
There will likely be a new role - the “full stack engineer” of this next leap. Maybe it’s “Software Producer”, maybe it’s “Product Engineer”, maybe it’s “Software Designer” - but it’ll likely be a job where the expectations are much more broad than they previously were.
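As a rough illustration of the “serving Markdown as alternative content types” bet, here is a minimal sketch of HTTP content negotiation in Python. The page content and function names are hypothetical, and the Accept parsing is deliberately simplified (it handles q-values but not every corner of the HTTP specification).

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    parsed = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        parsed.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's own order.
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


# Hypothetical representations of the same page, keyed by media type.
REPRESENTATIONS = {
    "text/html": "<h1>Opening hours</h1><p>Mon-Fri, 9 to 5</p>",
    "text/markdown": "# Opening hours\n\nMon-Fri, 9 to 5\n",
}


def negotiate(accept_header: str, default: str = "text/html") -> tuple[str, str]:
    """Return (content_type, body) for the best match, falling back to HTML."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
        if media_type in ("*/*", "text/*"):
            break  # the client will take anything, so serve the default
    return default, REPRESENTATIONS[default]
```

An agent that sends "Accept: text/markdown" gets the Markdown body, while a browser sending its usual HTML-first header gets the same page as HTML. A server doing this would normally also send "Vary: Accept" so that caches keep the two representations apart.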

I expect I’ll still be programming for the rest of my life, but I’m not sure what the industry around me will look like as I do it.

Context Transference

Friday, 30 May 2025

One of the hardest refactorings to get right is the balance between extracting and inlining functions.

Often people struggle to understand why sometimes when they “do the right thing” and extract a function, their code quality feels like it decreases - this is actually one of my biggest criticisms of Clean Code as a book - it’s rife with low quality method extraction.

I think that feeling comes from the weight of “context transference”.

Context transference is the amount of information you have to pass between boundaries, and method extraction where 100% of the caller’s context needs to be transferred for the new function to be meaningful is a poor refactor. You don’t reduce any cognitive load - it’s a failure of encapsulation.

In object oriented languages this is a more common problem because the context is often scoped to a class, so it’s implicit which parts of state are used in any given function, and as a result this transfer of context is equally implicit.

While it might sound like object scoped state is actually a solution to this problem, it isn’t, because the transfer of context is actually the amount of information the programmer has to mentally track during code reading, not the information a computer has to track (because functions exist purely for the programmer) - it’s a design time concern and rarely a significant runtime one.

Extraction refactorings should operate on a strict subset of the context of the parent function to be meaningful and not detrimental to reading code, as a general rule.
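Here’s a small sketch of the difference in Python. The invoice example and every name in it are made up for illustration. The first helper needs the caller’s whole context to mean anything; the second pair each operate on a strict subset of it.

```python
from dataclasses import dataclass


@dataclass
class Order:
    customer_name: str
    country: str
    net_total: float
    is_trade_customer: bool


# Poor extraction: the helper needs the caller's entire context.
# A reader has to carry all of `order` and `vat_rates` across the boundary,
# so nothing has been hidden and no cognitive load has been removed.
def apply_pricing_rules(order: Order, vat_rates: dict[str, float]) -> float:
    discount = 0.10 if order.is_trade_customer else 0.0
    discounted = order.net_total * (1 - discount)
    return discounted * (1 + vat_rates[order.country])


# Better extraction: each helper takes only the values it actually uses.
def trade_discount(net_total: float, is_trade_customer: bool) -> float:
    return net_total * (0.90 if is_trade_customer else 1.0)


def add_vat(amount: float, vat_rate: float) -> float:
    return amount * (1 + vat_rate)


def invoice_total(order: Order, vat_rates: dict[str, float]) -> float:
    discounted = trade_discount(order.net_total, order.is_trade_customer)
    return add_vat(discounted, vat_rates[order.country])
```

Both versions calculate the same total. The difference is only in how much you have to hold in your head to read trade_discount or add_vat on their own, which is the whole point of extracting them.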

(Standard disclosure: generalisations include exceptions by default, I’m sure there are many cases where this doesn’t apply)

7 AI Predictions (AI, We Really Need To Talk: Part 2)

Thursday, 29 May 2025

Before we get started - I’m working on a large paper on the current hype-state of AI, what’s actually real, and what’s hubris along with the slow steady march of progress that is actually happening. This is an extract from that work.

It’s also a bit of a “Star Wars: Episode 4” moment - I’m publishing the middle first - 7 predictions (that might age badly), but I think they’re interesting enough to publish before I’ve finished the rest of the paper. The full thing takes a deep and introspective look at both the state, and the ethical problems we have with the current wave of AI. So if you’re about to “well actually” about some ethics thing or another, it’s cool, hold your breath, that’s part 3.

AI people, we really need to talk

I am not an “AI builder”, I’m a systems builder, and a programmer first and foremost – and I’m explicitly an “AI moderate”.

What I mean by “AI moderate” is that I think AI is simultaneously one of the most interesting and exciting things that’s happened in my technical career, but equally one of the most overhyped, misdescribed, poorly marketed and generally misunderstood pieces of technology.

AI is difficult to talk about because it’s become such a poorly debated, thoroughly misunderstood and quickly changing space that it makes understanding what is real and what is hubris hard to grasp.

It’s in the best interest of the people trying to sell you AI to over-hype its real capabilities, but dismissing it outright is a foolish thing because, despite the ghouls that chase the tail of technology trends (Blockchain, NFTs, et al) being a large cohort of the people that are chasing this trend, it’s truly not the same thing.

That’s easy to see when you look at the people really involved in building out and betting big on this technology. Neural Nets are not new, ML models are not new, transformer models are a little bit newer. Exceptionally smart people have been working to this point for a long time – this isn’t a get rich quick scheme – it’s decades of research and a huge amount of investment that’s been slowly rolling onwards for the last two decades in its current form.

This is all set against the controversial backdrop of the training data that’s processed by large organisations to produce vast general-purpose models that pushes against the edges of existing laws around fair use, people’s personal ethics and challenges human exceptionalism. I’m explicitly omitting the ethical discussion from the first half of this piece so I can dedicate the whole second half to it. Discussion of AI cannot exist in a social vacuum, but it also shouldn’t overshadow factual discussion.

This is my attempt to try summarising where we are, where we’re going next, and hopefully sift through what is real and what isn’t about the AI-hype.

There’s plenty of bad faith critique and evangelism in this space, so I’m going to try and neatly side-step both of those things in the following ways:

I have nothing to sell you.
I have no vested interest in AI other than having to hold the pen on a platform strategy that has to exist in the same universe as it.

But here’s what I think is going on in the industry and why.

7 Predictions - What’s next?

Let’s start with the big claim – there’s not going to be “AGI” – artificial general intelligence – in my lifetime.

The Star Trek myth of the sentient computer with the personality, the ghost in the machine, the AI of sci-fi which we have ethical quandaries about because it might be alive? That doesn’t exist.

It’s science fiction given what we’re currently working with, and each time you see another click-baiting post towards that thing it’s people clowning themselves. Obviously “never say never”, but that’s not where we are, and not what we’re building right now.

On the other hand, it’s exceptionally likely that we’ll have a collection of technologies and systems that integrate in such a way that if you squint at them might look, to the amateur, like we’re progressing towards that thing. People will absolutely be selling you “AGI” sooner than you think, but it’s going to be far more traditional than you expect.

I want to share my 7 predictions about what the next 5 years of “AI” looks like in practice:

The Future of AI is LLMs on the Edge, blended with traditional systems integration
The Future of Language Models is “Small Language Model Expert Systems”
This approach will lead to a renaissance in standards-based RESTful online services and model integration technologies
Websites and apps will decline in lieu of “Assistant Computing”
The building of those traditional systems will be AI assisted
Large models either have plateaued, or soon will, and won’t get drastically better
Software development jobs will change, but aren’t going anywhere

I think that until we start to have something “more” than three LLMs in a trench-coat, a more honest name for what’s currently happening is “Model-Assisted Computing”, and we should probably have used that rather than the hubristic “AI” and “AGI” naming. Hell, we could have even called it “MAC” for short.

I think if you start to draw a through line from Web 2.0 through smartphones and to the current rising tide of model-assisted computing, then you’ll realise that this is where we’ve been going the entire time and it’s mostly just a continuation of the vision of the web.

Here’s how it’ll happen

The existing wave of LLMs has shown that they’re exceptionally good at fuzzy matching human input. They’re statistical transformer models that predict output given an input, making them pretty good at what you’d traditionally associate with “Q&A” based on a training set. They’re really the next evolution of Google Assistant, Cortana, Siri and Alexa – the thing on the edge that can turn natural language questions into commands that need to be fulfilled.

As an industry, assistant-led compute has been a thing since 2011 when Siri launched, but was arguably popularised by Amazon’s Alexa in 2014. We’re 15 years into this and LLMs have given us a fuzzy-matching technique that’s just plain better than defining platform-specific “skills” (to use Amazon’s terminology) for systems integration.

These large models will marginally improve over time, but they won’t be good at doing any hard or detailed work at all because they are purely statistical models. Most of the ignorant critique of LLMs focuses heavily on this point – that the models are “wrong” – because they’re not even trying to be right. Over the past 16 months we’ve seen the rise of Retrieval-Augmented Generation – a technique that retrieves data from data sources (frequently vector databases) and feeds it to the LLM alongside the question, so that its output can draw on factual data.
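To show the shape of Retrieval-Augmented Generation without any infrastructure, here is a toy sketch in Python. It is built on assumptions: a bag-of-words count stands in for a real embedding model, a three-item list stands in for the vector database, and the documents themselves are invented. The model call is left out entirely; the sketch stops at the prompt you would send.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; in practice this would live in a vector database.
DOCUMENTS = [
    "Refunds are processed within 5 working days of the return being received.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Orders over 50 pounds qualify for free delivery.",
]


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(DOCUMENTS, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Put the retrieved facts in front of the question before calling a model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
```

Calling build_prompt("How long do refunds take?") produces a prompt that carries the refund policy along with the question, so the model is asked to phrase an answer from supplied facts rather than recall one from its training data.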

RAG was the first step, followed swiftly by plugin models in GPT, but both of those things are rapidly giving way to two standard protocols – MCP – the Model Context Protocol, and A2A – the Agent-to-Agent protocol. Both protocols go some way to systemising RAG, exposing tools and resources for models to call out to, and wrapping models in web-standards for authentication and discovery.

When we get this right, the accuracy problem is solved – language models are used to interface with humans, and protocols revert to traditional systems integration techniques to perform operations and source facts, effectively giving us the best of both worlds. This isn’t speculative, both protocols have been in rapid development and adoption over the last 6 months and are probably the future of “agentic computing on the open web”.
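The division of labour here can be sketched in a few lines of Python. This is a simplified illustration and not the MCP SDK: the request is only loosely modelled on the JSON-RPC 2.0 style “tools/call” message that MCP uses, and the check_balance tool, the account data and the handle_request function are all hypothetical. The language model’s only job is to turn “check my bank balance” into the request; the answer comes from ordinary, deterministic code.

```python
import json

# Hypothetical account data; a real tool would call an existing internal API.
BALANCES = {"current": 1250.40, "savings": 8000.00}


def check_balance(account: str) -> str:
    """Traditional, deterministic code: this is where the facts come from."""
    return f"{account}: {BALANCES[account]:.2f} GBP"


# Registry of plain functions the model is allowed to call.
TOOLS = {"check_balance": check_balance}


def error(request_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    body = {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}
    return json.dumps(body)


def handle_request(raw: str) -> str:
    """Dispatch a JSON-RPC 2.0 style 'tools/call' request to a plain function."""
    request = json.loads(raw)
    request_id = request.get("id")
    if request.get("method") != "tools/call":
        return error(request_id, -32601, "Method not found")
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return error(request_id, -32602, "Unknown tool")
    try:
        text = tool(**params.get("arguments", {}))
    except (KeyError, TypeError):
        return error(request_id, -32602, "Invalid arguments")
    result = {"content": [{"type": "text", "text": text}]}
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

Nothing in the fulfilment path is statistical. If the model asks for an account that doesn’t exist it gets an error back, not a plausible-sounding guess.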

What does this mean for builders? Back to web standards we go. The easier it is for the thing you do to be described as APIs, and metadata, and commands, the easier it will be to context shift into MCP and A2A workflows that interact with large models that by default will be enabled on everyone’s pocket devices. We’re going to be here in the next 6 months.

Layered on top of that, A2A offers some /.well-known style service discovery for agents – a place for you to define the operations that your top-level domains can provide. It’s an incredibly small leap from here to realise that this will eventually evolve into something close to a DNS registry of things that the runtimes that host large models have access to, which in the most open-minded place we can be is a great thing for services on the web (“hey Siri, check my bank balance for me”) and in the darkest places provides the platform operators of those large models a vehicle to deeply integrate, but also probably levy an app-store style tax on your systems from the outer edge. Still, a globally discoverable, automatically integrate-able set of commands across the whole internet is a wonderful enabling technology.

When we get there, people will start trying to sell you this as “AGI”. Probably. Because to the amateur eye, it might kind-of look like it.

The second order effects? The read-only portion of most webapps will sink under the substrate of agent-computing. Within a few years, it’s unlikely people will be opening your app or webpage to check on data. They’ll obviously still come to rich experiences for content (the web or apps aren’t going anywhere) but purely transactional things – “buy me that cinema ticket”, “check my bank balance”, “do the simple X” – anything that can be wrapped in a one-time step up auth flow, probably will diminish in importance and only rich interactive content will survive on the screens.

That’s going to be the consumer experience. It’s easy to doubt this now because plenty of the existing implementations of these things are rough approximations that absolutely suck (everyone can make fun of Google’s bad AI search, for example), but this isn’t that thing. This isn’t “can you work out how to sift through this data”, this is “I’ve told you exactly how to do this, follow this well-known protocol to do this well-known integration”. It’s how all your apps work today, wrapped in a thin veneer of language models to protocol shift your requests.

But it’s coming, because the technology mostly works now and the user-interaction is the place we’ve been trying to get for decades. Frankly, also, once you step back, it’s also good computing – computing that slips under the substrate of all the technology and returns to a place of magic. It’s good UI design – no UI.

What does this mean for technology vendors?

Just keep on keeping on, in a sense. Since Web 2.0 and the mobile app revolution, services that don’t provide APIs to deeply integrate have been F-tier ghetto services that people hate using. Unless you get better at systemising your… systems and meeting the market where it’s at with good machine-to-machine APIs, your business will eventually die.

But the reality is that most technology businesses are already living that experience in real-time. This isn’t new. What is new is that we’re going to see a cottage industry of language models and agents trained on proprietary in-house data sets wrapped up in APIs and sold as agents that the assistant-driven compute services can interact with like a marketplace (you can hear the mega corps salivating at taking their 15% already).

Organisations rich in data will realise that the same expertise they used to sell with humans can be synthesized and sold as commands and tools for these models to interact with, scaling their business.

From an engineering perspective? There are tonnes of APIs to be built. You’ll probably be using Copilots to help build them quickly, but you’ll still require more engineers than ever to operationalise them and make them work. This mirrors the last decade of real-world challenges in operationalising data engineering and machine learning. It’s not easy, it’s buggy and requires a lot of focus. The AI revolution won’t replace programming jobs, but it might take some of the repetitive work around the edges away.

We’ll see a rise in organisations operationalising “Small Models”, trained on their own data, and exposed as agents. We’re also going to see the platform vendors of LLMs training specialised small language models that can be run on local compute for domain specific tasks. This will partly be in response to the market asks (more secure, domain specific), but also as a loss leader into their platforms.

This is based on the fact that today, large language model vendors have working proof that they can train smaller targeted models from synthesised data that are as effective as LLMs in a lot of domains.

The large models will plateau – we’re already seeing this now. Compared to traditional software development, model development is a lot less deterministic. Successor models aren’t always strictly “what we had before but more”, and the steeper the climb becomes, the more we will rely on LLMs connecting to constellations of specialised models and tools to do detailed work. There’s a good chance we’re already near the ceiling here, given the current struggles to get the next-generation models out of the door. The cynics all seem to think this is “the end of GenAI”, but what it actually is, is the point of general utility where the models we have today become operationalised in more interesting ways.

Finally?

Well, engineering jobs will change. People are going to get over the hiring malaise (“surely we don’t need these noisy nerds anymore!”) and realise that the biggest challenge in software isn’t writing it, it’s operating and maintaining it. Software organisations will reach the conclusion that using the models to help maintain, remove, reduce and optimise code is a saner and more sustainable path than just pouring more code onto the tyre fire.

Just having more code doesn’t help anyone.

I think this is the obvious through-line, the predictable end of the path that draws together what we dreamt of for the internet (connected services exchanging structured data), the smartphone era (everything is an app), and the frontier AI fever dreams (we’ll make a sentient machine!) into something that’s both obvious and almost real today.

And everyone will say we’ve got AGI, and it’ll still be bullshit.

Footnote

For those about to reach for the comments section, the full paper looks like this:

The State of “AI”

I am an AI Moderate
Bad Faith Critique
The Precision Problem
Current AI tools are a better hammer
7 AI Predictions – what’s next?
What does this mean for programming?

Ethics and the Adoption of AI

Technology is a labour concern
AI is incapable of art
Reactions to your understanding of “value”
Capitalism and AI
Open-Source and AI
The real-world costs of training and operating models
A model trained on the web should be given to the web

Thinking Machines

What if intelligence is a latent quality of data?
Discovering a general-purpose computing approach

So do wait for the rest :)

Notes on the Synthesis of Form

Monday, 6 January 2025

Form is one of the hardest things to understand in software, mostly because it gets conflated with style and formatting. Style and formatting influence how people read and understand your code - you can bury good design in bad style, and you can make bad design look good with good style, but form is fundamentally an expression of design.

We struggle with this because there are many decent working programmers that have never actively engaged with the design of their software, so mistake cosmetic decisions for design decisions. The reason that form is so important in software is that it’s the thing that directly influences how people understand your code, and the tool you have to articulate your design goals, and the trade-offs you’ve made to achieve them.

Form is the way you name your variables, the interaction style you outline with your APIs, the kinds of data structures you pass around your software. The intent of your software design is encoded in its form.

One of the more interesting aspects of form is the tension between regular and irregular form. Regular form is the default form of a language - its collection of idioms, defaults, “standard ways of doing things” - compared to an irregular form, where you diverge from regular form to achieve some other goal.

While people might not realise they’re responding to the form of a piece of software, sometimes I’ll see people talk about things being idiomatic / non-idiomatic, or “feeling right” / “feeling wrong” - this is a response to the form of the software, and how it’s articulated. You see this everywhere in different programming languages - Python developers often talk about “pythonic” code, Go has a philosophy of emphasising “simple and minimal” code that trends towards repetitive, and multi-paradigm languages like C# will often face backlash when language features are introduced that bring in a new form of doing something that exists elsewhere in the ecosystem. All of these behaviours are people internalising regular form and responding to it.

Regular form, and regular style are a good way of helping people feel comfortable and oriented in software, but some of the most interesting software design comes from subverting regular form to achieve some other goal.

For example, ASP.NET middleware looks like this - taken from the official documentation:

public class RequestSetOptionsMiddleware
{
    private readonly RequestDelegate _next;

    public RequestSetOptionsMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task Invoke(HttpContext httpContext)
    {
        var option = httpContext.Request.Query["option"];

        if (!string.IsNullOrWhiteSpace(option))
        {
            httpContext.Items["option"] = WebUtility.HtmlEncode(option);
        }

        await _next(httpContext);
    }
}

This class can be discovered or passed to the framework at startup, and implements the “middleware” pattern common in web frameworks. If you’ve used ASP.NET Core, you’ll be familiar with these classes, but as a class, it’s actually pretty weird. The default, conventional way of thinking about classes in C# is that they should be stateful, and have methods that operate on that state. The middleware construct in ASP.NET Core neither follows that pattern nor is used in that manner - really just acting as a container for a function that gets invoked as part of the request pipeline. This is a subversion of regular form in C# to achieve a different goal - to make it easier to compose and reason about the request pipeline in a web application. ASP.NET Core and MVC in general are full of these subversions of regular form - so much so that they’ve become expected, regular forms of their own (Controllers are another example of a class behaving in a way you would not traditionally expect a class to behave, with lifecycles managed invisibly by the framework).

Framework authors often subvert regular form to express the design and intent of the frameworks they’re building and their features - despite using the same syntactic constructs as the code that lives in the “application space”.

There’s a wrinkle in this though - subverting regular form is a high-risk, high-reward strategy. It’s easy to subvert regular form and make your code harder to understand, or to subvert regular form and make your code harder to maintain. It’s a tool that should be used sparingly, and with intent. It also, by definition, requires a mastery of regular form to correctly subvert.

I went to see a great exhibition of famous film director and animator Tim Burton’s work recently, and one of the more interesting things was looking at the sketches that he did while training as an animator. Not because they exemplified his signature style (his own subversion of form) but because they were so textbook and high-quality. He had to learn what correct form was in traditional illustration, and why it existed, to be able to develop his own style that explicitly had something to say.

Tim Burton Training Sketches

Burton is famous for esoteric stylised characters and settings, but he had to learn the rules before he could break them. All of the best people at a discipline occasionally subvert conventional form in design because regular form is the “best worst” and least contentious path. But to subvert form, you must understand what the most common form communicates, and find a way to do it better.

Tim Burton Style

This approach to learning mastery before reform reminds me of the thought experiment of “Chesterton’s fence”:


    "In the matter of reforming things, as distinct from deforming them,
    there is one plain and simple principle; a principle which will probably
    be called a paradox. 

    There exists in such a case a certain institution or law; 
    let us say, for the sake of simplicity, a fence or gate erected
    across a road. The more modern type of reformer goes gaily up to it and
    says, "I don't see the use of this; let us clear it away."

    To which the more intelligent type of reformer will do well to answer:
    "If you don't see the use of it, I certainly won't let you clear it away.
    Go away and think. Then, when you can come back and tell me that you 
    do see the use of it, I may allow you to destroy it."

You shouldn’t change or subvert something until you know why it is the way it is.

Regular form is what drives conversations (that are mostly boring) around design patterns. Design patterns, popularised by the “Gang of Four” book of the same name, were intended as model answers to well known questions in software design of the time. They gave language and shapes to regular problem spaces. Unfortunately, as a side effect, people ended up fixated on their regular form, and “best practice” to their detriment. Many of the lessons dissolved into cross-generational knowledge held by people that never read the original works, who didn’t realise that they were presented as “things that we have observed working several times in specific contexts” and instead ended up as accidentally canonised designs.

It’s easy to think that I am advocating for “do whatever” design, but I’m not. Regular forms - patterns and defaults - are valuable because they’re explicable to all. You are absolutely allowed to, and with maturity as a designer should, subvert regular form, but only when in your context you have something that achieves the same goals as the regular form, with better outcomes, and that says something about the design of your software, APIs, or modules. Making random non-default decisions is probably not smart, unless you’re an expert doing it knowingly and making an explicit trade-off that elevates or better solves for part of your design.

Footnote

Notes on the Synthesis of Form is a book by Christopher Alexander, a famous architect and urban planner. It’s a book about design, and how to think about design in a way that’s not just about making things look good, but about making things that work well. It was the original inspiration for the term “design patterns” in software. I invoke its name half in homage, and half as a knowing nod.

Coverage is not correctness - but it helps!

Tuesday, 12 November 2024

“Test coverage is not correctness” and “test coverage is not a quality metric” are two of the lesser understood common phrases you’ll hear people parrot. I agree with the first, but strongly disagree with the second.

I love test coverage because it’s a strong, negative, leading indicator towards quality.

The presence of test coverage (“we ran this code”) is not the same as the test being correct (“I verified the correct behaviours”), but a lack of it tells me a lot.

It tells me to expect low quality and poor automatic verification. It tells me where the worst parts of your codebase probably are.

This is because coverage comes for free in a well-tested system - it’s a measure, not an attribute, of a codebase. It’s a torch in the dark of working effectively with legacy code.

What else does a lack of coverage say?

It tells me that a particular codepath is either so well trusted that its owners perceive it cannot fail (or that a failure would immediately trip another signal), or it’s so complicated to exercise that it’s not executed often at all. All code that is infrequently executed is bound to fail eventually as the world changes around you.

Coverage conveys a lot of information, so don’t ignore it. It might not be a proxy for correctness, but that doesn’t make it useless.

Footnote: Can we fix the “coverage is not correctness” problem? Actually, yes. Mutation testing is a technique where you “automatically test your tests” - notably implemented across languages by the tool Stryker, mutation testing creates subtly modified versions of your codebase tens to hundreds of times, and executes your tests over these “mutants”.

If your tests don’t fail after they’ve been run against mutants? That proves you’re not correctly verifying some behaviour. Mutation testing is awesome and infrequently seen in the wild.
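To make the idea concrete, here is a minimal hand-rolled sketch in JavaScript. It is not how Stryker works internally; the `shippingCost` function, the hand-written mutant, and both tests are hypothetical and exist only to show a test suite with full coverage that still lets a mutant survive.

```javascript
// Code under test (hypothetical): free shipping for orders of 50 or more.
function shippingCost(orderTotal) {
  return orderTotal >= 50 ? 0 : 5;
}

// A hand-written "mutant": the kind of small change a mutation tool
// generates automatically (here, >= has become >).
function shippingCostMutant(orderTotal) {
  return orderTotal > 50 ? 0 : 5;
}

// A weak test: it executes both branches, so coverage is 100%,
// but it never checks the boundary value.
function weakTest(fn) {
  return fn(10) === 5 && fn(100) === 0;
}

// A stronger test: the same checks, plus the boundary itself.
function strongTest(fn) {
  return weakTest(fn) && fn(50) === 0;
}

// A mutant is "killed" when a test that passes on the original fails on it.
const results = {
  weakPassesOriginal: weakTest(shippingCost),
  weakKillsMutant: !weakTest(shippingCostMutant),
  strongPassesOriginal: strongTest(shippingCost),
  strongKillsMutant: !strongTest(shippingCostMutant),
};

console.log(results);
```

The weak test passes against both the original and the mutant, so the mutant survives despite full coverage. The stronger test kills it, which is exactly the signal a mutation testing tool reports back to you.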

Lo-Fi Service Discovery in .NET8

Tuesday, 21 November 2023

The vast majority of systems that you build will inevitably call an HTTP API at some point, whether it’s a microservice, a third party API, or a legacy system. Because of this, it’s not uncommon to see applications with reams of configuration variables defining where their downstream dependencies live.

This configuration is frequently a source of pain and duplication, especially in larger systems where tens or hundreds of components need to keep track of the location of downstream dependencies, many of which are shared, and almost all of which change depending on deployment environment.

These configuration values get everywhere in your codebases, and changes to them are often very difficult to coordinate when something changes in your deployed infrastructure.

Service discovery to the rescue

Service Discovery is a pattern that aims to solve this problem by providing a centralised location for services to register themselves, and for clients to query to find out where they are. This is a common pattern in distributed systems, and is used by many large scale systems, including Netflix, Google, and Amazon.

Service registries are often implemented as an HTTP API, or via DNS records on platforms like Kubernetes.

Service Discovery Diagram

Service discovery is a very simple pattern consisting of:

A service registry, which is a database of services and their locations
A client, which queries the registry to find out where a service is
Optionally, a push mechanism, which allows services to notify clients of changes
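The three parts listed above can be sketched in a few lines of JavaScript. This is an illustrative in-memory toy, not a production registry: the `ServiceRegistry` class and its `register`, `lookup`, and `subscribe` methods are hypothetical names invented for this example, and a real registry would sit behind an HTTP API or DNS.

```javascript
// Minimal in-memory service registry (illustrative only).
class ServiceRegistry {
  constructor() {
    this.services = new Map();    // service name -> array of "host:port" strings
    this.subscribers = new Map(); // service name -> array of callbacks
  }

  // Called by a service (or a deployment script) to record where it lives.
  register(name, endpoints) {
    this.services.set(name, [...endpoints]);
    // Push mechanism: tell anyone watching this service about the change.
    for (const notify of this.subscribers.get(name) ?? []) {
      notify([...endpoints]);
    }
  }

  // Called by a client to find a service's current endpoints.
  lookup(name) {
    const endpoints = this.services.get(name);
    if (!endpoints) throw new Error(`Unknown service: ${name}`);
    return [...endpoints];
  }

  // Optional: clients register a callback to hear about future changes.
  subscribe(name, callback) {
    const callbacks = this.subscribers.get(name) ?? [];
    callbacks.push(callback);
    this.subscribers.set(name, callbacks);
  }
}

const registry = new ServiceRegistry();
registry.register("foo", ["127.0.0.1:8080"]);

// A client subscribes, then the service moves to a new (made-up) address.
const seen = [];
registry.subscribe("foo", (endpoints) => seen.push(endpoints));
registry.register("foo", ["10.0.0.7:8080"]);

console.log(registry.lookup("foo"));
console.log(seen);
```

The client never hard codes an address: it asks the registry by name, and the subscription lets it react when the service is redeployed somewhere else.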

In most distributed systems, teams tend to use infrastructure as code to manage their deployments. This gives us a useful hook, because we can use the same infrastructure as code to register services with the registry as we deploy the infrastructure to run them.

Service discovery in .NET8 and .NET Aspire

.NET 8 introduces a new extensions package - Microsoft.Extensions.ServiceDiscovery - which is designed to interoperate with .NET Aspire, Kubernetes DNS, and App Config driven service discovery.

This package provides a hook to load service URIs from App Configuration json files, and subsequently to auto-configure HttpClient instances to use these service URIs. This allows you to use service names in the HTTP calls in your code, and have them automatically resolved to the correct URI at runtime.

This means that if you’re trying to call your foo API, instead of calling

var response = await client.GetAsync("http://192.168.0.45/some-api");

You can call

var response = await client.GetAsync("http://foo/some-api");

And the runtime will automatically resolve the service name foo to the correct IP address and port.
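Conceptually, that resolution step is a lookup from a logical name to a real endpoint. The sketch below illustrates the idea in JavaScript and is not the implementation inside Microsoft.Extensions.ServiceDiscovery: the `resolveServiceUrl` function is hypothetical, and the `settings` object is an assumed mapping of service names to "host:port" endpoints.

```javascript
// Assumed settings shape: logical service names mapped to "host:port" endpoints.
const settings = { Services: { foo: ["127.0.0.1:8080"] } };

// Swap the logical service name in a URL for the first configured endpoint.
function resolveServiceUrl(url, config) {
  const parsed = new URL(url);
  const endpoints = config.Services[parsed.hostname];
  if (!endpoints || endpoints.length === 0) {
    return url; // not a known service name, so leave the URL untouched
  }
  parsed.host = endpoints[0]; // assigning host sets hostname and port together
  return parsed.toString();
}

const resolved = resolveServiceUrl("http://foo/some-api", settings);
const untouched = resolveServiceUrl("http://example.com/some-api", settings);

console.log(resolved);
console.log(untouched);
```

A real resolver would also deal with multiple endpoints, schemes, and caching, but the core idea is the same: application code only ever mentions `foo`.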

This runtime resolution is designed to work with the new Aspire stack, which manages references between different running applications to make them easier to debug, but because it has fallback hooks to App Configuration, it can be used with anything that can load configuration settings.

Here’s an example of a console application in .NET 8 that uses these new service discovery features:

using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Register your appsettings.json config file
var configuration = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true)
    .Build();

// Create a service provider registering the service discovery and HttpClient extensions
var provider = new ServiceCollection()
    .AddServiceDiscovery()
    .AddHttpClient()
    .AddSingleton<IConfiguration>(configuration)
    .ConfigureHttpClientDefaults(static http =>
    {
        // Configure the HttpClient to use service discovery
        http.UseServiceDiscovery();
    })
    .BuildServiceProvider();

// Grab a new client from the service provider
var client = provider.GetService<HttpClient>()!;

// Call an API called `foo` using service discovery
var response = await client.GetAsync("http://foo/some-api");
var body = await response.Content.ReadAsStringAsync();

Console.WriteLine(body);

If we pair this with a configuration file that looks like this:

{
  "Services": {
    "foo": [
      "127.0.0.1:8080"
    ]
  }
}

At runtime, when we make our API call to http://foo/some-api, the HttpClient will automatically resolve the service name foo to 127.0.0.1:8080. For the sake of this example, we’ve stood up a Node/Express API on port 8080. Its code looks like this:

const express = require('express');
const app = express();
const port = 8080;

app.get('/some-api', (req, res) => res.send('Hello API World!'));
app.listen(port, () => console.log(`Example app listening on port ${port}!`));

So now, when we run our application, we get the following output:

$ dotnet run
Hello API World!

That alone is pretty neat - it gives us a single well known location to keep track of our services, and allows us to use service names in our code, rather than having to hard code IP addresses and ports. But this gets even more powerful when we combine it with a mechanism to update the configuration settings the application reads from at runtime.

Using Azure App Configuration Services as a service registry

Azure App Configuration Services provides a centralised location for configuration data. It’s a fully managed service, and consists of Containers - key/value stores that can be used to store configuration data.

App Configuration provides a REST API that can be used to read and write configuration data, along with SDKs and command line tools to update values in the store.

When you’re using .NET to build services, you can use the Microsoft.Extensions.Configuration.AzureAppConfiguration package to read configuration data from App Configuration. This package provides a way to read configuration data from App Configuration Services, integrating neatly with the IConfiguration API and ConfigurationManager class.

If you’re following the thread, this means that if we enable service discovery using the new Microsoft.Extensions.ServiceDiscovery package, we can use our app config files as a service registry. If we combine this extension with Azure App Configuration Services and its SDK, we can change one centralised configuration store and push updates to all of our services whenever changes are made.

This is really awesome, because it means if you’re running large distributed teams, so long as all the applications have access to the configuration container, they can address each other by service name, and the service discovery will automatically resolve the correct IP address and port, regardless of environment.

Setting up Azure App Configuration Services

You’ll need to create an App Configuration Service. You can do this by going to the Azure Portal, and clicking the “Create a resource” button. Search for “App Configuration” and click “Create”.

Create App Configuration Service

For the sake of this example, we’re going to grab a connection string from the portal, and use it to connect to the service. You can do this by clicking on the “Access Keys” button in the left hand menu, and copying the “Primary Connection String”. You’d want to use RBAC in a real system.

We’re going to add an override by clicking “Configuration Explorer” in the left hand menu, and adding a new key called Services:foo with a value of:

[
	"value-from-app-config:8080"
]

and a content type of application/json.

Setting up the Azure App Configuration SDK

We need to add a reference to the Microsoft.Extensions.Configuration.AzureAppConfiguration package to access this new override. You can do this by running the following command in your project directory:

dotnet add package Microsoft.Extensions.Configuration.AzureAppConfiguration

Next, we modify the configuration bootstrapping code in our command line app.

using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var appConfigConnectionString = "YOUR APP CONFIG CONNECTION STRING HERE";

var configuration = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true)
    .AddAzureAppConfiguration(appConfigConnectionString, false) // THIS LINE HAS BEEN ADDED
    .Build();

This adds our Azure App Configuration as a configuration provider.

Nothing else in our calling code needs to change - so when we execute our application, we’ll notice that the call now fails, because foo now resolves to the placeholder host value-from-app-config, which doesn’t exist:

$ dotnet run
Unhandled exception. System.Net.Http.HttpRequestException: No such host is known. (value-from-app-config:8080)
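The failure is the override working as intended: .NET configuration flattens settings into colon-delimited keys, and providers added later win over earlier ones. The JavaScript sketch below mimics that layering under those assumptions. The `flatten` and `layer` helpers are hypothetical and only model the behaviour, they are not part of any .NET or Azure SDK.

```javascript
// Flatten nested settings into colon-delimited keys, e.g. "Services:foo:0".
function flatten(value, prefix = "", out = {}) {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}:${key}` : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Later sources win when the same flattened key appears more than once.
function layer(...sources) {
  return Object.assign({}, ...sources.map((source) => flatten(source)));
}

// First source: the local appsettings.json from this post.
const fromJsonFile = { Services: { foo: ["127.0.0.1:8080"] } };
// Second source: the Services:foo override stored in App Configuration.
const fromAppConfig = { Services: { foo: ["value-from-app-config:8080"] } };

const effective = layer(fromJsonFile, fromAppConfig);

console.log(effective["Services:foo:0"]);
```

Because the App Configuration source is layered last, its value replaces the local one, which is why the client now tries (and fails) to reach `value-from-app-config:8080`.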
