There’s no shortage of reports on how AI coding assistants, agents, and fleets of agents have written vast amounts of code in a short time, code that reportedly implements the features desired. It’s rare that people talk about non-functional requirements like performance or security in that context, maybe because that’s not a concern in many of the use cases the authors have. And it’s even rarer that people assess the quality of the code generated by the agent. I’d argue, though, that internal quality is crucial for development to continue at a sustainable pace over years, rather than collapse under its own weight.
So, let’s take a closer look at how the AI tooling performs when it comes to internal code quality. We’ll add a feature to an existing application with the help of an agent and look at what’s happening along the way. Of course, this makes it “just” an anecdote. This memo is by no means a study. At the same time, much of what we’ll see falls into patterns and can be extrapolated, at least in my experience.
The feature we’re implementing
We’ll be working with the codebase for CCMenu, a Mac application that shows the status of CI/CD builds in the Mac menu bar. This adds a degree of difficulty to the task because Mac applications are written in Swift, which is a common language, but not quite as common as JavaScript or Python. It’s also a modern programming language with a complex syntax and type system that requires more precision than, again, JavaScript or Python.

CCMenu periodically retrieves the status from the build servers with calls to their APIs. It currently supports servers using a legacy protocol implemented by the likes of Jenkins, and it supports GitHub Actions workflows. The most requested server that’s not currently supported is GitLab. So, that’s our feature: we’ll implement support for GitLab in CCMenu.
The API wrapper
GitHub provides the GitHub Actions API, which is stable and well documented. GitLab has the GitLab API, which is also well documented. Given the nature of the problem space, they’re semantically quite similar. They’re not the same, though, and we’ll see how that affects the task later.
Internally, CCMenu has three GitHub-specific files to retrieve the build status from the API: a feed reader, a response parser, and a file that contains Swift functions that wrap the GitHub API, including functions like the following:
func requestForAllPublicRepositories(user: String, token: String?) -> URLRequest
func requestForAllPrivateRepositories(token: String) -> URLRequest
func requestForWorkflows(owner: String, repository: String, token: String?) -> URLRequest
The functions return URLRequest objects, which are part of the Swift SDK and are used to make the actual network request. Because these functions are structurally quite similar, they delegate the construction of the URLRequest object to one shared, internal function:
func makeRequest(method: String = "GET", baseUrl: URL, path: String,
params: Dictionary<String, String> = [:], token: String? = nil) -> URLRequest
Don’t worry if you’re not familiar with Swift; as long as you recognise the arguments and their types you’re fine.
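For readers who want to picture the delegation, here is a minimal sketch of how such a shared function and one wrapper could fit together. It is not CCMenu’s actual code: the function bodies, the base URL, the path, and the Bearer header format are assumptions made for illustration, and only the shape of the signatures follows the ones shown above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Hypothetical body for the shared function (assumption, not CCMenu's code).
func makeRequest(method: String = "GET", baseUrl: URL, path: String,
                 params: Dictionary<String, String> = [:], token: String? = nil) -> URLRequest {
    var components = URLComponents(url: baseUrl, resolvingAgainstBaseURL: false)!
    components.path = path
    if !params.isEmpty {
        // Sort the keys so the resulting URL is deterministic.
        components.queryItems = params.keys.sorted().map {
            URLQueryItem(name: $0, value: params[$0])
        }
    }
    var request = URLRequest(url: components.url!)
    request.httpMethod = method
    // Only add the header when a token is actually present.
    if let token = token {
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}

// A wrapper only knows the endpoint; everything else is delegated.
func requestForWorkflows(owner: String, repository: String, token: String?) -> URLRequest {
    let baseUrl = URL(string: "https://api.github.com")!  // assumed base URL
    let path = "/repos/\(owner)/\(repository)/actions/workflows"  // assumed path
    return makeRequest(baseUrl: baseUrl, path: path, token: token)
}
```

With this shape, a new wrapper is little more than a path and a list of arguments, which is why the wrappers end up structurally so similar.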
Optional tokens
Next, we should look at the token argument in a little more detail. Requests to the APIs can be authenticated. They don’t have to be authenticated but they can be authenticated. This allows applications like CCMenu to access information that’s restricted to certain users. For most APIs, GitHub and GitLab included, the token is simply a long string that needs to be passed in an HTTP header.
In its implementation CCMenu uses an optional string for the token, which in Swift is denoted by a question mark following the type, String? in this case. This is idiomatic use, and Swift forces recipients of such optional values to deal with the optionality in a safe way, avoiding the classic null pointer problems. There are also special language features to make this easier.
Some functions are nonsensical in an unauthenticated context, like requestForAllPrivateRepositories. These declare the token as non-optional, signalling to the caller that a token must be provided.
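As a small illustration of those language features, the sketch below shows optional binding with if let, optional chaining, and the nil-coalescing operator ??. The function names and the header format are hypothetical and only serve the example.

```swift
// Returns the header value for a token, or nil when no token is set.
func authorizationHeader(token: String?) -> String? {
    // Optional binding: the body only runs when a value is present.
    if let token = token {
        return "Bearer \(token)"
    }
    return nil
}

// Non-optional parameter: the compiler rejects calls without a definite String.
func describePrivateAccess(token: String) -> String {
    return "authenticated with a token of \(token.count) characters"
}

let missing: String? = nil
let present: String? = "abc123"

let headerWithout = authorizationHeader(token: missing)  // no header value
let headerWith = authorizationHeader(token: present)     // a header value

// Optional chaining plus ?? to supply a fallback, here for display purposes only.
let length = present?.count ?? 0
```

The signature alone tells the caller whether a token may be absent, so no comment or naming convention is needed to convey it.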
Let’s go
I’ve tried this experiment a couple of times, during the summer using Windsurf and Sonnet 3.5, and now, recently, with Claude Code and Sonnet 4.5. The approach remained similar: break down the task into smaller chunks. For each of the chunks I asked Windsurf to come up with a plan first before asking for an implementation. With Claude Code I went straight for the implementation, relying on its internal planning, and on Git when something ended up going in the wrong direction.
As a first step I asked the agent, more or less verbatim: “Based on the GitHub files for API, feed reader, and response parser, implement the same functionality for GitLab. Only write the equivalent for these three files. Do not make changes to the UI.”
This seemed like a reasonable request, and by and large it was. Even Windsurf, with the less capable model, picked up on key differences and handled them, e.g. it recognised that what GitHub calls a repository is a project in GitLab; it saw the difference in the JSON response, where GitLab returns the array of runs at the top level while GitHub has this array as a property in a top-level object.
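The difference in the response shape can be sketched with Codable. This is a heavily trimmed illustration, not CCMenu’s parser: the Run type is hypothetical, the JSON literals are made up, and the field names (workflow_runs, id, status) are taken from the public API documentation as I recall it.

```swift
import Foundation

// Minimal run model; real responses carry many more fields.
struct Run: Decodable {
    let id: Int
    let status: String
}

// GitHub wraps the runs in a top-level object.
struct GitHubRunsResponse: Decodable {
    let workflow_runs: [Run]
}

let gitHubJSON = """
{ "total_count": 1, "workflow_runs": [ { "id": 1, "status": "completed" } ] }
""".data(using: .utf8)!

// GitLab returns the array of runs directly at the top level.
let gitLabJSON = """
[ { "id": 2, "status": "success" } ]
""".data(using: .utf8)!

let decoder = JSONDecoder()
// try! is acceptable here only because the JSON literals are known to be valid.
let gitHubRuns = try! decoder.decode(GitHubRunsResponse.self, from: gitHubJSON).workflow_runs
let gitLabRuns = try! decoder.decode([Run].self, from: gitLabJSON)
```

Both calls end up with an array of runs; only the type passed to the decoder differs.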
I hadn’t looked at the GitLab API docs myself at this stage, and just from a cursory scan of the generated code everything looked pretty okay. The code compiled and even the complex function types were generated correctly, or were they?
First surprise
In the next step, I asked the agent to implement the UI to add new pipelines/workflows. I deliberately asked it not to worry about authentication yet, to just implement the flow for publicly accessible information. The discussion of that step is maybe for another memo, but the new code somehow needs to acknowledge that a token might be present in the future:
var apiToken: String? = nil
and then it can use the variable in the call to the wrapper function:
let req = GitLabAPI.requestForGroupProjects(group: title, token: apiToken)
var tasks = await fetchProjects(request: req)
The apiToken variable is correctly declared as an optional String, initialised to nil for now. Later, some code could retrieve the token from another place depending on whether the user has decided to sign in. This code led to the first compiler error:
[Image: the compiler error reported for this call]
What’s going on here? Well, it turns out that the code for the API wrapper in the first step had a bit of a subtle problem: it declared the tokens as non-optional in all the wrapper functions, e.g.
func requestForGroupProjects(group: String, token: String) -> URLRequest
The underlying makeRequest function, for one reason or another, was created correctly, with the token declared as optional.
The code compiled because, in the way the functions were written, the wrapper functions definitely have a string, and that can of course be passed to a function that takes an optional string, an argument that may be a string or nothing (nil). But now, in the code above, we have an optional string, and that can’t be passed to a function that needs a (definite) string.
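The asymmetry can be reproduced in a few lines. The two functions below are hypothetical stand-ins for the shared function and a generated wrapper; the offending call is commented out because it would not compile.

```swift
// Takes an optional string, like the shared makeRequest function.
func inner(token: String?) -> String {
    return token == nil ? "anonymous" : "authenticated"
}

// Takes a definite string, like the generated wrapper functions.
func wrapper(token: String) -> String {
    // A String is implicitly promoted to String?, so this compiles.
    return inner(token: token)
}

let apiToken: String? = nil

// wrapper(token: apiToken)  // rejected by the compiler: a String? is not a String
let viaInner = inner(token: apiToken)
let viaWrapper = wrapper(token: "abc123")
```

Going from String to String? is always safe, which is why the first step compiled; going the other way requires the caller to decide what happens when the value is nil.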
The vibe fix
Being lazy I simply copy-pasted the error message back to Windsurf. (Building a Swift app in anything but Xcode is a whole different story, and I remember an experiment with Cline where it alternated between adding and removing explicit imports, at about 20¢ per iteration.) The fix proposed by the AI for this problem worked: it changed the call-site and inserted an empty string as a default value for when no token was present, using Swift’s ?? operator.
let req = GitLabAPI.requestForGroupProjects(group: title, token: apiToken ?? "")
var tasks = await fetchProjects(request: req)
This compiles, and it kinda works: if there’s no token an empty string is substituted, which means that the argument passed to the function is either the token or the empty string; it’s always a string and never nil.
So, what’s wrong? The whole point of declaring the token as optional was to signal that the token is optional. The AI ignored this and introduced new semantics: an empty string now signals that no token is available. This is
- not idiomatic,
- not self-documenting,
- unsupported by Swift’s type system.
It also required changes in every place where this function is called.
The real fix
Of course, what the agent should’ve done is to simply change the function declaration of the wrapper function to make the token optional. With that change everything works as expected, the semantics remain intact, and the change is limited to adding a single ? to the function argument’s type, rather than spraying ?? "" all over the code.
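To make the difference concrete, here is a hedged sketch. The buildRequest helper is a hypothetical stand-in for the shared function, and the GitLab URL and the Bearer header format are assumptions; the point is only what happens at the call site when the token is nil versus an empty string.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Hypothetical stand-in for the shared request function.
func buildRequest(url: URL, token: String? = nil) -> URLRequest {
    var request = URLRequest(url: url)
    // The header is added whenever a token value exists, even an empty one.
    if let token = token {
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}

// The fix: a single "?" added to the wrapper's signature.
func requestForGroupProjects(group: String, token: String?) -> URLRequest {
    let url = URL(string: "https://gitlab.com/api/v4/groups/\(group)/projects")!  // assumed URL
    return buildRequest(url: url, token: token)
}

let apiToken: String? = nil

// Optional passed straight through: no Authorization header is set.
let fixed = requestForGroupProjects(group: "example", token: apiToken)

// Empty-string substitute: the header is set although no token exists.
let vibeFixed = requestForGroupProjects(group: "example", token: apiToken ?? "")
```

Under these assumptions the empty string is not just unidiomatic; it would also have to be special-cased further down to avoid sending a header without a credential.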
Does it really matter?
You might ask whether I’m splitting hairs here. I don’t think I am. I think this is a clear example where an AI agent left to its own devices would have changed the codebase for the worse, and it took a developer with experience to notice the issue and to direct the agent to the correct implementation.
Also, this is just one of many examples I encountered. At some point the agent wanted to introduce a completely unnecessary cache, and, of course, couldn’t explain why it had even suggested the cache.
It also failed to realise that the user/org overlap in GitHub doesn’t exist in GitLab, and went on to implement some complicated logic to handle a non-existent problem. It took more than nudging the agent towards the correct places in the documentation to talk it down from insisting that the logic was needed.
It also “forgot” to use existing functions to construct URLs, replicating such logic in multiple places, often without implementing all functionality, e.g. the option to overwrite the base URL for testing purposes using the defaults system on macOS.
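A base URL override of that kind could look roughly like the sketch below. The defaults key, the default URL, and the function name are hypothetical and not taken from CCMenu; the sketch only shows why the lookup belongs in one shared function.

```swift
import Foundation

// Hypothetical defaults key and default URL, chosen for this example.
let baseURLDefaultsKey = "GitLabBaseURL"
let defaultBaseURL = URL(string: "https://gitlab.com")!

// Single place that decides which base URL to use.
func gitLabBaseURL(defaults: UserDefaults = .standard) -> URL {
    // Use the override from the defaults system if it is present and a valid URL.
    if let override = defaults.string(forKey: baseURLDefaultsKey),
       let url = URL(string: override) {
        return url
    }
    return defaultBaseURL
}
```

Code that assembles URLs by hand in several places bypasses this lookup, so the override silently stops working for those requests.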
So, in these cases, and there were more, the generated code worked. It implemented the functionality required. But the new code also would’ve added completely unnecessary complexity and it missed non-obvious functionality, lowering the quality of the codebase and introducing subtle issues.
If working on large software systems has taught me one thing it’s that investing in the internal quality of the software, the quality of the codebase, is a worthwhile investment. Don’t get overwhelmed by technical debt. Humans and agents find it more difficult to work with a complicated codebase. Without careful oversight, though, the AI agents seem to have a strong tendency to introduce technical debt, making future development harder, for humans and agents.
One more thing
If possible, CCMenu shows the avatar of the person/actor that triggered the build. In GitHub the avatar URL is part of the response to the build status API call. GitLab has a “cleaner”, more RESTful design and keeps additional user information out of the build response. The avatar URL must be retrieved with a separate API call to a /user endpoint.
Both Windsurf and Claude Code stumbled over this in a major way. I remember a longish conversation where Claude Code wanted to convince me that the URL was in the response. (It probably got mixed up because multiple endpoints were described on the same page of the documentation.) In the end I found it easier to implement that functionality without agent support.
My conclusions
During the experiments in the summer I was on the fence. The Windsurf / Sonnet 3.5 combo did speed up writing code, but it required careful planning with prompts, and I had to switch back and forth between Windsurf and Xcode (for building, running tests, and debugging), which always felt somewhat disorientating and got tiring quickly. The quality of the generated code had significant issues, and the agent had a tendency to get stuck trying to fix a problem. So, on the whole it felt like I wasn’t getting much out of using the agent. And I traded doing what I like, writing code, for overseeing an AI with a tendency to write sloppy code.
With Claude Code and Sonnet 4.5 the story is somewhat different. It needs less prompting, and the code has better quality. It’s by no means high quality code, but it’s better, requiring less rework and less prompting to improve quality. Also, running a conversation with Claude Code in a terminal window alongside Xcode felt more natural than switching between two IDEs. For me this has tilted the scales enough to use Claude Code regularly.
