Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl


I’ve been trying to understand one of the latest AI coding buzzwords: spec-driven development (SDD). I looked at three of the tools that label themselves as SDD tools and tried to untangle what the term means, as of now.

Definition

As with many emerging terms in this fast-paced space, the definition of “spec-driven development” (SDD) is still in flux. Here’s what I can gather from how I’ve seen it used so far: Spec-driven development means writing a “spec” before writing code with AI (“documentation first”). The spec becomes the source of truth for the human and the AI.

GitHub: “In this new world, maintaining software means evolving specifications. […] The lingua franca of development moves to a higher level, and code is the last-mile approach.”

Tessl: “A development approach where specs — not code — are the primary artifact. Specs describe intent in structured, testable language, and agents generate code to match them.”

After looking over the usages of the term, and some of the tools that claim to implement SDD, it seems to me that in reality there are several implementation levels to it:

  1. Spec-first: A well thought-out spec is written first, and then used in the AI-assisted development workflow for the task at hand.
  2. Spec-anchored: The spec is kept even after the task is complete, so that it can continue to be used for evolution and maintenance of the respective feature.
  3. Spec-as-source: The spec is the main source file over time; only the spec is edited by the human, and the human never touches the code.

All SDD approaches and definitions I’ve found are spec-first, but not all strive to be spec-anchored or spec-as-source. And often it’s left vague or totally open what the spec maintenance strategy over time is meant to be.

What is a spec?

The key question in terms of definitions is of course: What is a spec? There doesn’t seem to be a general definition; the closest I’ve seen to a consistent definition is the comparison of a spec to a “Product Requirements Document”.

The term is quite overloaded at the moment. Here is my attempt at defining what a spec is:

A spec is a structured, behavior-oriented artifact – or a set of related artifacts – written in natural language that expresses software functionality and serves as guidance to AI coding agents. Each variant of spec-driven development defines its own approach to a spec’s structure, level of detail, and how these artifacts are organized within a project.

I think there is a useful distinction to be made between specs and the more general context documents for a codebase. That general context consists of things like rules files, or high-level descriptions of the product and the codebase. Some tools call this context a memory bank, so that’s the term I’ll use here. These files are relevant across all AI coding sessions in the codebase, whereas specs are only relevant to the tasks that actually create or change that particular functionality.
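To make the distinction concrete, here is a minimal Python sketch of how an agent session might pick its context: memory bank files are always included, specs only when they cover the feature being worked on. The `ContextFile` type, the `features` tagging, and the `select_context` function are hypothetical names for illustration only; none of the tools discussed here is known to work exactly this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextFile:
    path: str
    kind: str  # "memory_bank" or "spec"
    features: frozenset = frozenset()  # hypothetical tagging: features a spec covers


def select_context(files, feature):
    """Return the paths a session working on `feature` would load."""
    selected = []
    for f in files:
        if f.kind == "memory_bank":
            selected.append(f.path)  # relevant to every session
        elif f.kind == "spec" and feature in f.features:
            selected.append(f.path)  # relevant only to matching work
    return selected


# Example file names are illustrative, not taken from a real project.
FILES = [
    ContextFile("AGENTS.md", "memory_bank"),
    ContextFile("architecture.md", "memory_bank"),
    ContextFile("product-search.md", "spec", frozenset({"product-search"})),
    ContextFile("feature-x/plan.md", "spec", frozenset({"feature-x"})),
]

print(select_context(FILES, "feature-x"))
# ['AGENTS.md', 'architecture.md', 'feature-x/plan.md']
```

A session for an unrelated feature would get only the two memory bank files, which is the whole point of keeping the two categories apart.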

An overview diagram showing agent context files in two categories: Memory Bank (AGENTS.md, project.md, architecture.md as examples), and Specs (Story-324.md, product-search.md, a folder feature-x with files like data-model.md, plan.md as example files).

It turns out to be quite time-consuming to evaluate SDD tools and approaches in a way that gets close to real usage. You would have to try them out with different sizes of problems, greenfield and brownfield, and really take the time to review and revise the intermediate artifacts with more than just a cursory glance. Because as GitHub’s blog post about spec-kit says: “Crucially, your role isn’t just to steer. It’s to verify. At each phase, you reflect and refine.”

For two of the three tools I tried, it also seems to be even more work to introduce them into an existing codebase, which makes it even harder to evaluate their usefulness for brownfield codebases. Until I hear usage reports from people who have used them for a period of time on a “real” codebase, I still have a lot of open questions about how this works in real life.

That being said – let’s get into three of these tools. I’ll share a description of how they work first (or rather how I think they work), and will keep my observations and questions for the end. Note that these tools are evolving very fast, so they might have already changed since I used them in September.

Kiro

Kiro is the simplest (or most lightweight) of the three I tried. It seems to be mostly spec-first: all the examples I’ve found use it for a task or a user story, with no mention of how to use the requirements document in a spec-anchored way over time, across multiple tasks.

Workflow: Requirements → Design → Tasks

Each workflow step is represented by one markdown document, and Kiro guides you through these 3 workflow steps inside its VS Code based distribution.

Requirements: Structured as a list of requirements, where each requirement represents a “User Story” (in “As a…” format) with acceptance criteria (in “GIVEN… WHEN… THEN…” format)
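As a rough illustration of that structure, the following Python sketch models a requirement as a user story plus a list of GIVEN/WHEN/THEN criteria. The class and field names are my own, the example content is invented, and the `<requirement>.<index>` numbering is an assumption inferred from the requirement numbers (1.1, 1.2, 1.3) that the tasks document refers back to; this is not Kiro's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    given: str
    when: str
    then: str

    def render(self) -> str:
        return f"GIVEN {self.given} WHEN {self.when} THEN {self.then}"


@dataclass
class Requirement:
    number: int
    role: str
    goal: str
    benefit: str
    criteria: list = field(default_factory=list)

    def user_story(self) -> str:
        return f"As a {self.role}, I want {self.goal}, so that {self.benefit}."

    def criterion_ids(self) -> list:
        # Assumed numbering scheme: <requirement>.<index>, e.g. 1.1, 1.2
        return [f"{self.number}.{i}" for i in range(1, len(self.criteria) + 1)]


# Invented example content, for illustration only.
req = Requirement(
    number=1,
    role="shopper",
    goal="to filter products by category",
    benefit="I find products faster",
    criteria=[
        AcceptanceCriterion(
            "a product list", "I pick a category", "only matching products are shown"
        ),
        AcceptanceCriterion(
            "an active filter", "I clear it", "all products are shown again"
        ),
    ],
)

print(req.user_story())
print(req.criterion_ids())  # ['1.1', '1.2']
```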

A screenshot of a Kiro requirements document

Design: In my attempt, the design document consisted of the sections seen in the screenshot below. I only have the results of one of my attempts still, so I’m not sure if this is a consistent structure, or if it changes depending on the task.

A screenshot of a Kiro design document, showing a component architecture diagram, and then collapsed sections titled Data Flow, Data Models, Error Handling, Testing Strategy, Implementation Approach, Migration Strategy

Tasks: A list of tasks that trace back to the requirement numbers, and that get some extra UI elements to run tasks one by one, and review changes per task.

A screenshot of a Kiro tasks document, showing a task with UI elements “Task in progress”, “View changes” next to them. Each task is a bullet list of TODOs, and ends with a list of requirement numbers (1.1, 1.2, 1.3)
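The trace from tasks back to requirement numbers lends itself to a simple mechanical check. The Python sketch below lists requirement numbers that no task refers to. It assumes, hypothetically, that each task carries a line starting with `Requirements:` followed by the numbers; the real file format may differ, so treat the parsing as a placeholder.

```python
import re

REQ_REF = re.compile(r"\b\d+\.\d+\b")


def referenced_ids(task_lines):
    """Collect requirement numbers from lines containing 'Requirements:' (assumed format)."""
    found = set()
    for line in task_lines:
        _, marker, tail = line.partition("Requirements:")
        if marker:
            found.update(REQ_REF.findall(tail))
    return found


def uncovered(known_ids, task_lines):
    """Return the known requirement numbers that no task refers to, sorted."""
    return sorted(set(known_ids) - referenced_ids(task_lines))


# Invented task list, for illustration only.
TASKS = [
    "- [ ] 1. Update the transformation function",
    "  - Requirements: 1.1, 1.2",
    "- [ ] 2. Add unit tests",
    "  - Requirements: 1.3",
]

print(uncovered(["1.1", "1.2", "1.3", "2.1"], TASKS))  # ['2.1']
```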

Kiro also has the concept of a memory bank; they call it “steering”. Its contents are flexible, and the workflow doesn’t seem to rely on any specific files being there (I made my usage attempts before I even discovered the steering section). The default topology created by Kiro when you ask it to generate steering documents is product.md, structure.md, tech.md.

A version of the earlier overview diagram, this time specific to Kiro: The memory bank has 3 files in a steering folder called product.md, tech.md, structure.md, and the specs box shows a folder called category-label-enhancement (the name of my test feature) that contains requirements.md, design.md, tasks.md

Spec-kit

Spec-kit is GitHub’s version of SDD. It is distributed as a CLI that can create workspace setups for a wide range of common coding assistants. Once that structure is set up, you interact with spec-kit via slash commands in your coding assistant. Because all of its artifacts are put right into your workspace, this is the most customizable of the three tools discussed here.

Screenshot of VS Code showing the folder structure that spec-kit set up on the left (command files in .github/prompts, a .specify folder with subfolders memory, scripts, templates); and GitHub Copilot open on the right, where the user is in the process of typing /specify as a command

Workflow: Constitution → 𝄆 Specify → Plan → Tasks 𝄇

Spec-kit’s memory bank concept is a prerequisite for the spec-driven approach. They call it a constitution. The constitution is supposed to contain the high-level principles that are “immutable” and should always be applied, to every change. It’s basically a very powerful rules file that is heavily used by the workflow.

In each of the workflow steps (specify, plan, tasks), spec-kit instantiates a set of files and prompts with the help of a bash script and some templates. The workflow then makes heavy use of checklists inside the files, to track necessary user clarifications, constitution violations, research tasks, and so on. They are like a “definition of done” for each workflow step (though interpreted by AI, so there is no 100% guarantee that they will be respected).
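Since these checklists are only interpreted by the AI, one option is to verify them deterministically as well. The Python sketch below is my own illustration, not part of spec-kit: it scans a markdown document for standard task-list checkboxes and returns the items that are still unchecked. The checklist text in the example is invented.

```python
import re

# Matches markdown task-list items such as "- [ ] text" or "* [x] text".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def open_items(markdown_text):
    """Return the text of every unchecked checklist item in the document."""
    items = []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


# Invented checklist content, for illustration only.
SPEC = """\
## Review checklist
- [x] No implementation details in the spec
- [ ] All requirements are testable
- [ ] Open questions resolved
"""

print(open_items(SPEC))
# ['All requirements are testable', 'Open questions resolved']
```

A check like this could gate the move to the next workflow step, so that the “definition of done” no longer depends solely on the agent respecting it.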

A partial screenshot of the very end of the spec.md file, showing a bunch of checklists for content quality, requirement completeness, execution status.

Below is an overview to illustrate the file topology I saw in spec-kit. Note how one spec is made up of many files.

A version of the earlier overview diagram, this time specific to spec-kit: The memory bank has a constitution.md file. There is an extra box labelled “templates” which is an additional concept in spec-kit, with template files for plan, spec, and tasks. The specs box shows a folder called “specs/001-when-a-user” (yes, that’s what spec-kit called it in my test) that contains 8 files, data-model, plan, tasks, spec, research, api, component.

At first glance, GitHub seems to be aspiring to a spec-anchored approach (“That’s why we’re rethinking specifications — not as static documents, but as living, executable artifacts that evolve with the project. Specs become the shared source of truth. When something doesn’t make sense, you go back to the spec; when a project grows complex, you refine it; when tasks feel too large, you break them down.”) However, spec-kit creates a branch for every spec that gets created, which seems to indicate that they see a spec as a living artifact for the lifetime of a change request, not the lifetime of a feature. There is a community discussion about this confusion. It makes me think that spec-kit is still what I’d call spec-first only, not spec-anchored over time.

Tessl Framework

(Still in private beta)

Like spec-kit, the Tessl Framework is distributed as a CLI that can create all the workspace and config structure for a variety of coding assistants. The CLI command also doubles as an MCP server.

Screenshot of Cursor, showing the files Tessl created in the file tree (.tessl/framework folder), and the open MCP configuration on the right, which starts the tessl command in MCP mode

Tessl is the only one of these three tools that explicitly aspires to a spec-anchored approach, and is even exploring the spec-as-source level of SDD. A Tessl spec can serve as the main artifact that is being maintained and edited, with the code even marked with a comment at the top saying // GENERATED FROM SPEC - DO NOT EDIT. This is currently a 1:1 mapping between spec and code files, i.e. one spec translates into one file in the codebase. But Tessl is still in beta and they are experimenting with different versions of this, so I can imagine that this approach could also be taken at a level where one spec maps to a code component with multiple files. It remains to be seen what the alpha product will support. (The Tessl team themselves see their framework as something that is more in the future than their current public product, the Tessl Registry.)

Here is an example of a spec that I had the Tessl CLI reverse engineer (tessl doc --code ...js) from a JavaScript file in an existing codebase:

A screenshot of a Tessl spec file

Tags like @generate or @test seem to tell Tessl what to generate. The API section shows the idea of defining at least the interfaces that get exposed to other parts of the codebase in the spec, presumably to make sure that these more crucial parts of the generated component are fully under the control of the maintainer. Running tessl build for this spec generates the corresponding JavaScript code file.
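If generated files carry the // GENERATED FROM SPEC - DO NOT EDIT comment at the top, a team practising spec-as-source could guard against accidental hand edits with a small check, for example in a pre-commit hook. The Python sketch below is my own illustration and not a Tessl feature; it assumes the marker is the first non-blank line of the file, and the file contents in the example are invented.

```python
MARKER = "// GENERATED FROM SPEC - DO NOT EDIT"


def is_generated(source_text):
    """True if the first non-blank line is the generated-code marker (assumed position)."""
    for line in source_text.splitlines():
        if line.strip():
            return line.strip() == MARKER
    return False


def hand_edit_candidates(changed_files):
    """Given {path: content} of changed files, list the generated ones that were touched."""
    return sorted(path for path, text in changed_files.items() if is_generated(text))


# Invented file contents, for illustration only.
changed = {
    "dynamic-data-renderer.js": MARKER + "\nexport function render() {}\n",
    "util.js": "export const x = 1;\n",
}

print(hand_edit_candidates(changed))  # ['dynamic-data-renderer.js']
```

Such a check cannot tell a regeneration from a manual edit on its own; it only flags files where the change should have gone through the spec.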

Putting the specs for spec-as-source at a quite low abstraction level, per code file, probably reduces the number of steps and interpretations the LLM has to do, and therefore the chance of errors. Even at this low abstraction level I have seen the non-determinism in action though, when I generated code multiple times from the same spec. It was an interesting exercise to iterate on the spec and make it more and more specific to increase the repeatability of the code generation. That process reminded me of some of the pitfalls and challenges of writing an unambiguous and complete specification.
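One way to put a number on that repeatability is to generate code from the same spec several times and measure how often the runs agree. The Python sketch below is a crude illustration under my own assumptions: it compares outputs textually after stripping whitespace noise, so two semantically equivalent but differently written results count as different. The sample outputs are invented.

```python
import hashlib
from collections import Counter


def normalise(code):
    """Strip trailing whitespace and blank lines so trivial differences don't count."""
    lines = [line.rstrip() for line in code.splitlines()]
    return "\n".join(line for line in lines if line)


def repeatability(outputs):
    """Fraction of runs that produced the most common (normalised) output."""
    if not outputs:
        raise ValueError("need at least one generated output")
    digests = [
        hashlib.sha256(normalise(o).encode("utf-8")).hexdigest() for o in outputs
    ]
    most_common_count = Counter(digests).most_common(1)[0][1]
    return most_common_count / len(outputs)


# Invented outputs of four generation runs from the same spec.
RUNS = [
    "function render(d) { return d.name; }\n",
    "function render(d) { return d.name; }   \n\n",
    "function render(data) { return data.name; }\n",
    "function render(d) { return d.name; }\n",
]

print(repeatability(RUNS))  # 0.75
```

Tracking this value while iterating on a spec would show whether making the spec more specific actually narrows the range of generated results.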

A version of our earlier overview diagram, this time specific to Tessl: The memory bank box has a folder .tessl/framework with 4 files, plus KNOWLEDGE.md and AGENTS.md. The specs box shows a file dynamic-data-renderer.spec.md, a spec file. This diagram also has a box for Code, including a file dynamic-data-renderer.js. There is a bidirectional arrow between the Specs and the Code box, as in the Tessl case, those two are synced with each other.

Observations and questions

These three tools all label themselves as implementations of spec-driven development, but they are quite different from each other. So that’s the first thing to keep in mind when talking about SDD: it is not just one thing.

One workflow to fit all sizes?

Kiro and spec-kit provide one opinionated workflow each, but I’m quite sure that neither of them is suitable for the majority of real-life coding problems. In particular, it’s not quite clear to me how they would cater to enough different problem sizes to be generally applicable.

When I asked Kiro to fix a small bug (it was the same one I used in the past to try Codex), it quickly became clear that the workflow was like using a sledgehammer to crack a nut. The requirements document turned this small bug into 4 “user stories” with a total of 16 acceptance criteria, including gems like “User story: As a developer, I want the transformation function to handle edge cases gracefully, so that the system remains robust when new category formats are introduced.”

I had a similar problem when I used spec-kit: I wasn’t quite sure what size of problem to use it for. Available tutorials are usually based on creating an application from scratch, because that’s easiest for a tutorial. One of the use cases I ended up trying was a feature that would be a 3-5 point story on one of my past teams. The feature depended on a lot of code that was already there; it was supposed to build an overview modal that summarised a bunch of data from an existing dashboard. With the number of steps spec-kit took, and the number of markdown files it created for me to review, this again felt like overkill for the size of the problem. It was a bigger problem than the one I used with Kiro, but also a much more elaborate workflow. I never even finished the full implementation, but I think in the same time it took me to run and review the spec-kit results I could have implemented the feature with “plain” AI-assisted coding, and I would have felt much more in control.

An effective SDD tool would at the very least have to provide flexibility for a few different core workflows, for different sizes and types of changes.

Reviewing markdown over reviewing code?

As just mentioned, and as you can see in the description of the tool above, spec-kit created a LOT of markdown files for me to review. They were repetitive, both with each other and with the code that already existed. Some contained code already. Overall they were just very verbose and tedious to review. In Kiro it was a little easier, as you only get 3 files, and it’s more intuitive to understand the mental model of “requirements > design > tasks”. However, as mentioned, Kiro was also way too verbose for the small bug I was asking it to fix.

To be honest, I’d rather review code than all these markdown files. An effective SDD tool would have to provide a very good spec review experience.

False sense of control?

Even with all of these files and templates and prompts and workflows and checklists, I frequently saw the agent ultimately not follow all the instructions. Yes, the context windows are now larger, which is often mentioned as one of the enablers of spec-driven development. But just because the windows are larger doesn’t mean that AI will properly pick up on everything that’s in there.

For example: Spec-kit has a research step somewhere during planning, and it did a lot of research on the existing code and what’s already there, which was great because I asked it to add a feature that built on top of existing code. But ultimately the agent ignored the notes that these were descriptions of existing classes; it just took them as a new specification and generated them all over again, creating duplicates. But I didn’t only see examples of ignoring instructions, I also saw the agent go way overboard because it was too eagerly following instructions (e.g. one of the constitution articles).

The past has shown that the best way for us to stay in control of what we’re building is small, iterative steps, so I’m very skeptical that lots of up-front spec design is a good idea, especially when it’s overly verbose. An effective SDD tool would have to cater to an iterative approach, but small work packages almost seem counter to the idea of SDD.

How to effectively separate functional from technical spec?

It is a common idea in SDD to be intentional about the separation between functional spec and technical implementation. The underlying aspiration, I guess, is that ultimately we could have AI fill in all the solutioning and details, and switch to different tech stacks with the same spec.

In reality, when I was trying spec-kit, I frequently got confused about when to stay on the functional level, and when it was time to add technical details. The tutorial and documentation also weren’t quite consistent about it; there seem to be different interpretations of what “purely functional” really means. And when I think back on the many, many user stories I’ve read in my career that weren’t properly separating requirements from implementation, I don’t think we have a good track record as a profession of doing this well.

Who is the target user?

Many of the demos and tutorials for spec-driven development tools include things like defining product and feature goals; they even incorporate terms like “user story”. The idea here might be to use AI as an enabler for cross-skilling, and have developers participate more heavily in requirements analysis? Or have developers pair with product people when they work on this workflow? None of this is made explicit though; it’s presented as a given that a developer would do all this analysis.

In which case I would ask myself again, what problem size and type is SDD meant for? Probably not for large features that are still very unclear, as surely that would require more specialist product and requirements skills, and lots of other steps like research and stakeholder involvement?

A 2x2 matrix, x-axis “Clarity of problem”, y-axis “Size of problem”. Each quadrant has a box with a question mark, and there is a label in the middle that says “Where does SDD sit?”

Spec-anchored and spec-as-source: Are we learning from the past?

While many people draw analogies between SDD and TDD or BDD, I think another important parallel to look at for spec-as-source in particular is MDD (model-driven development). I worked on a few projects at the beginning of my career that heavily used MDD, and I kept being reminded of that when I was trying out the Tessl Framework. The models in MDD were basically the specs, albeit not in natural language, but expressed in e.g. custom UML or a textual DSL. We built custom code generators to turn these specs into code.

Example of a structured, parseable specification DSL from my past experience, mostly recreated from memory. Screen “Write Message” instantiates InputScreen { … } Illustrates things like references to domain model fields, inheritance from other screens for reusability of patterns, navigation logic.

Ultimately, MDD never took off for business applications; it sits at an awkward abstraction level and just creates too much overhead and constraints. But LLMs take some of the overhead and constraints of MDD away, so there is a new hope that we can now finally focus on writing specs and just generate code from them. With LLMs, we are not constrained by a predefined and parseable spec language anymore, and we don’t have to build elaborate code generators. The price for that is LLMs’ non-determinism of course. And the parseable structure also had upsides that we are losing now: We could provide the spec author with a lot of tool support to write valid, complete and consistent specs. I wonder if spec-as-source, and even spec-anchoring, might end up with the downsides of both MDD and LLMs: Inflexibility and non-determinism.

To be clear, I’m not nostalgic about my MDD experience in the past and saying “we might as well bring that back”. But we should look to code-from-spec attempts in the past to learn from them when we explore spec-driven today.

Conclusions

In my personal usage of AI-assisted coding, I also often spend time on carefully crafting some form of spec first to give to the coding agent. So the general principle of spec-first is definitely valuable in many situations, and the different approaches to structuring that spec are very sought after. They are among the most frequently asked questions I hear at the moment from practitioners: “How do I structure my memory bank?”, “How do I write a good specification and design document for AI?”.

But the term “spec-driven development” isn’t very well defined yet, and it’s already semantically diffused. I’ve even recently heard people use “spec” basically as a synonym for “detailed prompt”.

Regarding the tools I’ve tried, I’ve listed many of my questions about their real world usefulness here. I wonder if some of them are trying to feed AI agents with our existing workflows too literally, ultimately amplifying existing challenges like review overload and hallucinations. Especially with the more elaborate approaches that create lots of files, I can’t help but think of the German compound word “Verschlimmbesserung”: Are we making something worse in the attempt of making it better?
