

Flaky tests have long been a source of wasted engineering time for mobile development teams, but recent data shows they are becoming something more serious: a growing drag on delivery speed. As AI-driven code generation accelerates and pipelines absorb far greater volumes of output, test instability is no longer an occasional nuisance.
This constant rise has been recorded by all manner of developers, from small teams to Google and Microsoft. The recently released Bitrise Mobile Insights report backs up the shift with hard numbers: the likelihood of encountering a flaky test rose from 10% in 2022 to 26% in 2025. In practical terms, that means the average mobile development team now encounters unreliable test results during a typical workflow run. That level of unpredictability has real consequences for organizations that depend on fast, confident release cycles. Flaky tests undermine trust in CI/CD infrastructure, force developers to repeat work and introduce friction at the point where stability matters most.
This rise in flakiness isn’t happening in a vacuum. Mobile pipelines are expanding rapidly. Over the past three years, workflow complexity grew by more than 20%, with mobile development teams running broader suites of unit tests, integration tests and end-to-end tests earlier and more often. In principle, this strengthens quality. In practice, it also increases exposure to non-deterministic behaviours: timing issues, environmental drift, brittle mocks, concurrency problems and interactions with third-party dependencies. As test coverage grows, so does the surface area for failures that have nothing to do with the code being tested.
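To make the first of those failure modes concrete, here is a minimal Kotlin/JUnit 5 sketch of a timing-dependent test next to a deterministic rewrite that waits on an explicit completion signal instead of a fixed sleep. The class, test names and latency values are invented for illustration and are not drawn from the report.

```kotlin
import org.junit.jupiter.api.Assertions.assertTrue
import org.junit.jupiter.api.Test
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import kotlin.concurrent.thread

// Hypothetical component that refreshes a cache on a background thread.
class CacheRefresher {
    @Volatile var refreshed = false

    fun refreshAsync(onDone: () -> Unit = {}) {
        thread {
            Thread.sleep((50..250).random().toLong()) // variable latency, e.g. disk or network
            refreshed = true
            onDone()
        }
    }
}

class CacheRefresherTest {

    // Flaky: passes only when the background work happens to finish within 100 ms,
    // so the outcome depends on scheduling rather than on the code under test.
    @Test
    fun `refresh - timing-dependent version`() {
        val cache = CacheRefresher()
        cache.refreshAsync()
        Thread.sleep(100)            // races the worker thread
        assertTrue(cache.refreshed)  // fails intermittently
    }

    // Deterministic: waits for an explicit completion signal with a generous timeout.
    @Test
    fun `refresh - deterministic version`() {
        val cache = CacheRefresher()
        val done = CountDownLatch(1)
        cache.refreshAsync { done.countDown() }
        assertTrue(done.await(5, TimeUnit.SECONDS), "refresh did not complete in time")
        assertTrue(cache.refreshed)
    }
}
```

The first variant can pass hundreds of times locally and still fail on a loaded CI runner, which is exactly how growing coverage widens the surface for failures unrelated to the change being tested.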
At the same time, organizations are under pressure to move faster. The median mobile team is shipping more frequently than ever, with the most advanced teams shipping at twice the average speed of the top 100 apps. Against this backdrop, any friction in CI becomes a material risk. Engineers forced to rerun jobs or triage false failures lose hours that could have gone towards work on new features. Build costs rise as pipelines repeat the same work merely to prove a failure was not real. Over the course of a week, a few unstable tests can cascade into significant delays.
Tracking Down the Flakiness
One of the most persistent challenges is the lack of visibility into where flakiness originates. As build complexity rises, false positives and flaky tests tend to rise in tandem. In many organizations, CI remains a black box stitched together from multiple tools, even as artifact sizes increase. Failures may stem from unstable test code, misconfigured runners, dependency conflicts or resource contention, yet teams often lack the observability needed to pinpoint causes with confidence. Without clear visibility, debugging becomes guesswork and recurring failures become accepted as part of the process rather than issues to be resolved.
The encouraging news is that high-performing teams are addressing this pattern directly. They treat CI quality as a top engineering priority and invest in monitoring that reveals how tests behave over time. The Bitrise Mobile Insights report shows a clear correlation: teams using observability tools saw measurable improvements in reliability and experienced fewer wasted runs. Improving visibility can have as much impact as improving the tests themselves; when engineers can see which cases fail intermittently, how often they fail and under what conditions, they can target fixes instead of chasing symptoms.
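As one illustration of what that visibility can look like, the following Kotlin sketch derives a per-test flakiness signal from raw CI run history, treating a test as flaky when it both passes and fails against the same commit. The data shapes, field names and heuristic are assumptions made for the example, not Bitrise’s methodology.

```kotlin
// Hypothetical record of one test-case execution in CI; the field names are assumptions.
data class TestRun(val testId: String, val commit: String, val passed: Boolean)

data class FlakinessReport(val testId: String, val runs: Int, val failureRate: Double)

// A test is treated as flaky when its outcome flipped on the same commit,
// i.e. it changed result with no code change.
fun flakyTests(history: List<TestRun>): List<FlakinessReport> =
    history.groupBy { it.testId }
        .filter { (_, runs) ->
            runs.groupBy { it.commit }.values.any { sameCommit ->
                sameCommit.any { it.passed } && sameCommit.any { !it.passed }
            }
        }
        .map { (testId, runs) ->
            FlakinessReport(testId, runs.size, runs.count { !it.passed }.toDouble() / runs.size)
        }
        .sortedByDescending { it.failureRate }

fun main() {
    val history = listOf(
        TestRun("CheckoutTest.applyCoupon", commit = "abc123", passed = true),
        TestRun("CheckoutTest.applyCoupon", commit = "abc123", passed = false), // flipped
        TestRun("ProfileTest.rename", commit = "abc123", passed = true),
    )
    flakyTests(history).forEach {
        println("${it.testId}: ${it.runs} runs, ${"%.0f".format(it.failureRate * 100)}% failing")
    }
}
```

Sorting by failure rate turns the output into a simple triage queue: the tests that flip most often get investigated first.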
Increasing Observability Boosts Build Success
Better tooling alone won’t solve the problem. Organizations need to adopt a mindset that treats CI like production infrastructure. That means defining performance and reliability targets for test suites, setting alerts when flakiness rises above a threshold and reviewing pipeline health alongside feature metrics. It also means creating clear ownership over CI configuration and test stability so that flaky behaviour isn’t allowed to accumulate unchecked. Teams that succeed here often have lightweight processes for quarantining unstable tests, time-boxing investigations and ensuring that fixes are prioritised before the next release cycle.
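A lightweight version of that policy can be expressed as code as well. The sketch below, again in Kotlin, maps each test’s observed flakiness rate to a keep, alert or quarantine decision; the thresholds and test names are placeholders, and a real setup would feed this from CI history and route alerts into whatever channel the team already watches.

```kotlin
// Assumed policy values; real thresholds would be tuned per team and per suite.
const val ALERT_THRESHOLD = 0.05      // alert when at least 5% of a test's runs flip outcome
const val QUARANTINE_THRESHOLD = 0.20 // quarantine when at least 20% do

enum class Action { KEEP, ALERT, QUARANTINE }

// Map each test's flakiness rate (flipped runs / total runs) to a policy action.
fun triage(flakinessByTest: Map<String, Double>): Map<String, Action> =
    flakinessByTest.mapValues { (_, rate) ->
        when {
            rate >= QUARANTINE_THRESHOLD -> Action.QUARANTINE
            rate >= ALERT_THRESHOLD -> Action.ALERT
            else -> Action.KEEP
        }
    }

fun main() {
    val report = mapOf(
        "LoginFlowTest.refreshToken" to 0.31,
        "CheckoutTest.applyCoupon" to 0.07,
        "ProfileTest.rename" to 0.01,
    )
    triage(report).forEach { (test, action) -> println("$test -> $action") }
}
```

One common design choice is to keep quarantined tests running outside the blocking path, so the signal is preserved without letting known-unstable cases gate a release.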
As automation continues to expand across the software development lifecycle, the cost of poor test reliability will only increase. AI-assisted coding tools and agent-driven workflows are producing more code and more iterations than ever before. This increases the load on CI and amplifies the effects of instability. Without a stable foundation, the throughput gains promised by AI evaporate as pipelines slow down and engineers drown in noise.
Flaky tests may feel like a quality concern, but they are also a performance problem and a cultural one. They shape how developers perceive the reliability of their tools. They influence how quickly teams can ship. Most importantly, they determine whether CI/CD remains a source of confidence or becomes a source of drag.
Stability won’t improve on its own. Engineering leaders who want to protect release velocity and maintain confidence in their pipelines need clear ways to diagnose and reduce flaky behaviour. Start with visibility, understanding when and where instability emerges. Treat your CI/CD infrastructure with the same discipline as production systems, and address small failures before they become systemic ones. Once development teams are on top of flaky testing, they build a competitive advantage, improving release velocity and quality, and focusing on what matters most: the mobile user experience.
