Vibe Coding Crisis: Why AI Code Breaks

Your engineer joined the company on a Monday. By Friday she wants a meeting. She’s read the codebase, and her verdict is rebuild.
You wrote a $90K check to that dev shop nine months ago. The product is live, customers are paying. Now someone you trust is telling you to throw it all away.
The reason isn’t a security breach or a missed deadline. Vibe-coded code can’t be changed cleanly. Each fix your team writes touches one of several scattered duplicates of the same logic, so the bug you patched on Monday turns up again on Wednesday from a different file. New features take longer than the last ones. Your engineer can’t always tell you why, only that the codebase isn’t structured for change.
This post is the diagnostic. Five signals you can check on your own codebase before your engineer’s first Friday meeting. Once you’ve got an answer, our companion post covers what to do about it - the cost math and the workflow that prevents the rebuild bill.
How to tell if your codebase is vibe-coded #
You don’t have to read code. Five signals an outsider can check, and most vibe-coded codebases miss three or more of them.
| Signal | Healthy | Vibe-coded |
|---|---|---|
| Test coverage | 70%+ | 0-5% |
| Commit size | 50-200 lines | 2,000+ lines |
| Error handling | Graceful failures | White screens, crashes |
| Monitoring | Alerts within 90 seconds | “A user emailed us” |
| Code duplication | Reused functions | Same logic in 8 files |
Test coverage is the fastest signal. If your team can’t tell you a percentage, treat that as zero - AI doesn’t write tests by default. Healthy production codebases run 70% coverage or higher. Anything below 30% means zero in practice, and TDD is the discipline that fixes it.
Open the commit log next. Real developers ship dozens of small commits per week, each one reviewed by another human
before it lands. A commit titled Add user module that touched 2,000 lines, followed by Add payment module that touched 3,000 more, is AI paste. Big dumps mean nobody read the code before it went in.
To check error handling, ask QA to break a form on purpose. Paste 10,000 characters where a name should go, then drop emoji into a phone field. If the app crashes or shows a white screen, the developer only built for the happy path.
Monitoring shows up in one question: what happens at 3am if the app crashes? A healthy answer is an automated alert paging an on-call person within ninety seconds. An unhealthy one sounds like “a user usually emails us.”
Code duplication is the hardest signal to verify directly. Ask your team a simple question: “How many places in the code handle login?” If the answer takes longer than ten seconds to come back, it’s almost always “more than one.”
A failure that doesn’t make the news #
We worked with a fintech founder last year who couldn’t figure out why customers kept complaining about transaction amounts that were a few cents off. His team hadn’t noticed because the differences were tiny - fractions of a penny per transaction. The vibe-coded payment service stored dollar amounts as floats, which produces small rounding errors on every calculation.
Over four months those errors compounded across thousands of transactions. The founder found out when one of his users filed a complaint with their bank.
Nothing about it made the news. There was no breach to report and no money technically stolen, just a quiet drift in a number that nobody had bothered to test. That’s the more typical way vibe-coded apps fail in production - slowly and expensively, over enough time that the original developer is long gone by the time anyone notices.
How we ship AI-assisted code now #
We’ve shipped vibe-coded prototypes ourselves. The first time we used Lovable for a client demo, we forgot to flag it as throwaway and the founder kept building on it. That cost us a week of rewrites and a clearer rule: any AI-assisted code that touches paying users gets the test-first treatment from day one. Write the test first and make it pass; the AI can help with the implementation, but a human writes the test that defines what working actually means. That’s test-driven development , and it catches code that looks good but doesn’t work.
A second person should read every pull request before it lands. Two developers working together on the same problem catch errors the moment they’re introduced, not days later when the context is gone.
Your team should send you a weekly report showing what shipped and whether test coverage went up or down. Watch that ratio. If features climb while coverage drops, somebody is cutting corners. Ship small features often, not three-week sprints that end in a big reveal - the big reveal is where vibe-coded apps look impressive, because nobody has tested them with real users yet, and where technical debt compounds fastest .
You’ve already got one. What’s next? #
If three or more of those signals describe your codebase, the next question isn’t whether you have a vibe-coded app. Our companion post on the generate-validate-kill workflow walks through how to use vibe coding for what it’s good at, plus the cost math on why founders who try to skip the rebuild end up paying twice.
Want to skip the self-audit? We’ll read your codebase and send back a one-page assessment - test coverage and security risks reviewed, plus a clear salvage-vs-rebuild verdict. No pitch on the call. We’ve done this on 40+ rescue engagements and the answer is usually clearer than founders expect.
If you’d rather run the questions on your team yourself, the red flags checklist for dev shops gives you the specific things to ask this week.
Further reading #
- Vibe Coding Is Disposable. Stop Shipping It. - companion post covering the generate-validate-kill workflow and rebuild cost math
- Andrej Karpathy’s original “vibe coding” post - February 2025
- Qodo 2025 State of AI Code Quality Report - AI code produces 1.7x more issues
- OWASP Code Review Guide - what a real security review looks like