“Vibe coding” is the new caffeine: intoxicating, productivity-boosting, and capable of ruining your week if you treat it like water.
In a hackathon, vibe coding is a flex. In production at scale—especially in regulated industries—it’s either a competitive advantage or a creative way to summon auditors, outages, and reputational damage in one elegant PR.
Here’s the thing founders tend to miss (because founders are built out of optimism and cortisol):
Regulated doesn’t mean slow. Regulated means you can’t blag it.
You can absolutely move fast. You just need to move fast with evidence, not vibes.
This article is a founder-facing playbook for how to do exactly that.
The real problem isn’t AI-written code. It’s unverified change.
LLM-generated code has a specific failure mode: plausibility.
It reads clean. It compiles. It passes the tests you thought to write. It “looks right”.
And then it detonates in production because:
- the edge case you didn’t anticipate is now happening 10,000 times per minute,
- a “harmless” query turns into a database lock festival,
- concurrency turns your lovely logic into a probabilistic crime scene,
- a rollout quietly changes behaviour in a way you don’t notice until revenue drops or complaints spike.
Your job is to satisfy three audiences (and none of them care about your vibes)
When you ship production systems in regulated environments, you’re always answering to:
Founders often optimise for #1, sometimes #2, and treat #3 like a box-ticking exercise.
That’s backwards.
If you solve #3 properly, you often get #2 “for free”, and #1 becomes easier because you can ship more aggressively without fearing your own shadow.
The trick is to turn “compliance” into an engineering artefact, not a meeting series.
Compliance isn’t paperwork. It’s traceability + control
People hear “regulated” and imagine some Dickensian clerk demanding a Word document with 17 headings.
In reality, what you need is boring, mechanical, and automatable:
None of that requires heavy bureaucracy. It requires systems.
If you want to vibe code in production, you need to accept one brutal truth:

The Evidence Bundle
core patterns
If you do one thing after reading this, do this:
Every production-impacting change produces an Evidence Bundle automatically.
Not a wiki page. Not a Slack message. A generated artefact tied to the deploy.
Your Evidence Bundle should answer:
This transforms “we move fast” from a vibe into a machine.
And—this is the part founders should care about—it makes your org more fearless, which makes you faster.
Not all changes deserve the same brakes
The single most common operational mistake I see in “fast” teams is treating every commit like it has the same blast radius.
Regulated environments don’t punish speed. They punish reckless uniformity.
So risk-tier changes and gate accordingly. Keep it simple:
Then enforce different gates per tier.
The goal isn’t to slow down. The goal is to apply friction only where irreversible damage is possible.
That’s how you keep velocity without turning your production environment into a casino.

The dirty secret of AI products: behaviour is part of your API
In traditional systems, “regression testing” is mostly deterministic: code in → output out.
In AI-heavy systems, you are shipping stochastic behaviour:
- prompts change outcomes
- routing policies change outcomes
- model updates change outcomes
- safety filters change outcomes
- tool calls fail weirdly
- distribution shifts over time
So if you’re vibe coding an AI product and you don’t have behavioural regression tests, you’re basically doing surgery with a blindfold and calling it “bold”.
What works:
I’ve used multi-agent evaluation patterns where the system has to argue for correctness—e.g., native-speaker debate panels for translation quality and cultural bias detection—because “sounds plausible” is a trap at scale.
This is the point: you don’t need more “testing”. You need the right kind of testing.
The “highly regulated” bit: audits want replay, not opinions
So build systems that make those questions boring:
- version everything that affects behaviour (code, config, prompts, model IDs, routing)
- correlate user-visible outcomes with deployed versions (trace IDs + release IDs)
- immutable deploy records (who/what/when/why)
- reproducible builds
- data lineage for training/fine-tuning/evaluation sets
This is not “governance theatre”. It’s an operational superpower.
In practice, this is how you keep agentic automation safe in environments where “oops” is not an acceptable incident classification. I’ve shipped multi-agent systems in insurance contexts where mistakes are expensive and scrutiny is real—what makes it sustainable is converting risk into controls that are executable, observable, and reviewable.
Migrations and backfills are where speed goes to die (unless you treat them like explosives)
Here’s an unsexy truth: your fanciest AI architecture is irrelevant if your data layer becomes a haunted house.
Stateful changes deserve their own discipline:
- expand/contract schema migrations
- dual-write / dual-read transitions where needed
- backfills with throttling, checkpointing, and progress visibility
- idempotency everywhere
- the ability to pause, resume, and roll back without improvisation
This is also where agentic coding is most dangerous: an LLM can “help” you write a migration in seconds… and destroy weeks of data integrity in one deploy.
You’re not distrusting AI. You’re respecting entropy.
The founder’s cheat code: make rollback a muscle, not a myth
Most teams have rollback. Few teams can rollback under pressure without making it worse.
At scale, rollback must be:
- fast (minutes)
- safe (idempotent, no half-migrated state)
- rehearsed (you’ve done it when nothing was on fire)
This is the paradox founders love:
Because they’re not paralysed by fear. They know failure is recoverable.
If you want to vibe code, you must make failure cheap.
A practical “vibes → production” pipeline that actually works

Vibe Coding is fine.
Shipping vibes is not.
The closing punch.
Founders love speed because it wins markets. Regulators love evidence because it prevents harm. Production loves to fail because physics is petty.
You can satisfy all three.
Vibe code your prototypes. Vibe code your drafts. Vibe code your internal tools.
But when you ship production at massive scale—especially in regulated industries—make your org allergic to “trust me”.
That’s the game.
Subscribe to my newsletter to get the latest updates and news
Member discussion