Vibe Coding Is Real. Here Is Where It Breaks Down.
March 28, 2026
In February 2025, Andrej Karpathy posted a casual observation that named something millions of developers were already doing: describing what they wanted to an AI, accepting the code that came back, and iterating by feel rather than by deep understanding. He called it "vibe coding." Within weeks, the term was everywhere. Within months, over 70% of professional developers were using AI coding tools weekly.
The debate that followed was fierce and mostly unproductive. Enthusiasts accused skeptics of gatekeeping. Skeptics accused enthusiasts of recklessness. Both were right. Neither had the complete picture.
I built a production SaaS product almost entirely through AI-assisted coding. I use Claude Code, Cursor, and other AI development tools daily. I have seen vibe coding work remarkably well and I have seen it produce subtle disasters. The reality is more interesting than either the hype or the backlash suggests.
Here is what I have learned about where it breaks down -- and why understanding the failure modes is what separates a competent practitioner from someone who got lucky for a while.
The AI's Blind Spot: Your Business Logic
AI coding tools are trained on the internet's code. The internet's code is overwhelmingly generic -- standard patterns, common frameworks, typical CRUD operations. Your business logic is, by definition, not generic.
When you ask an AI to implement "user authentication with JWT tokens," it has seen ten thousand implementations and can synthesize an excellent one. When you ask it to implement "the regulatory calculation for blended interest rates on variable-term credit products with early payoff provisions under the current NCUA guidelines," it is in uncharted territory.
The dangerous part is that the AI will still generate code. It will not say "I do not know how to do this." It will produce something that looks like a complete implementation, uses the right variable names, follows reasonable patterns, and might even include comments explaining the logic. But the actual calculations, the edge cases, the regulatory nuances -- these will be approximations at best, and plausible fabrications at worst.
I hit this repeatedly while building features with domain-specific logic. The AI would generate a pricing calculation that looked correct and produced reasonable numbers for the test cases I tried. But when I ran it against the full test suite -- the weird edge cases that only exist because of specific business rules -- it failed. Not catastrophically. Subtly. The kind of failure where the numbers are close enough that you might not notice unless you are specifically looking.
The pattern is consistent: the more domain-specific your logic, the less you can trust AI-generated implementations. Let the AI generate the scaffolding -- route handlers, database queries, response formatting, error handling -- and write the core logic by hand, because that logic needs to come from someone who understands the domain.
A useful rule of thumb: if you cannot explain the expected output for every edge case without consulting documentation, the AI definitely cannot implement it correctly.
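One way to apply that rule of thumb is to write the expected outputs down as data before any implementation exists. The sketch below is only an illustration: the pricing rule (10% off at 100 or more units, totals rounded half-up to cents), the function `order_total`, and every value in `EDGE_CASES` are invented for this example and do not come from a real product.

```python
from decimal import Decimal, ROUND_HALF_UP


def order_total(unit_price: Decimal, quantity: int) -> Decimal:
    """Hypothetical rule: 10% off at 100+ units, rounded half-up to cents."""
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    total = unit_price * quantity
    if quantity >= 100:
        total *= Decimal("0.90")
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Expected values worked out by hand, independently of the implementation.
# Each row is (unit_price, quantity, expected_total).
EDGE_CASES = [
    (Decimal("1.00"), 0, Decimal("0.00")),     # empty order
    (Decimal("1.00"), 99, Decimal("99.00")),   # just below the discount threshold
    (Decimal("1.00"), 100, Decimal("90.00")),  # exactly at the threshold
    (Decimal("0.05"), 101, Decimal("4.55")),   # 5.05 * 0.90 = 4.545, a rounding tie
]


def check_edge_cases() -> list[str]:
    """Return a description of every edge case whose result differs."""
    failures = []
    for price, quantity, expected in EDGE_CASES:
        actual = order_total(price, quantity)
        if actual != expected:
            failures.append(f"{price} x {quantity}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    print(check_edge_cases() or "all edge cases match")
```

If you cannot fill in the third column of that table from memory, that is a signal the core logic is not yet understood well enough to review an AI-generated version of it.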
Security: The Gap Between "Works" and "Secure"
This is the failure mode that keeps me up at night.
AI-generated code frequently works -- in the sense that it produces the correct output for valid inputs -- without being secure. And the gap between "functional" and "secure" is where real damage happens.
Consider a common scenario. You ask the AI to generate an API endpoint that accepts user input, queries a database, and returns results. The AI produces code that does exactly that. It runs, it returns correct data, your tests pass. But: Is the input properly sanitized? Maybe -- the AI usually includes parameterized queries, but not always, and not always correctly for every input path. Does it check authorization, not just authentication? Are error messages leaking system internals? Is rate limiting in place? Almost never unless you specifically ask. Are sensitive fields being filtered from the response?
Each of these issues is individually fixable. The problem is that they are individually easy to miss. AI-generated code looks complete. It follows good patterns. The security gaps are not obvious flaws -- they are omissions that require security expertise to notice.
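To make a few of those omissions concrete, here is a minimal sketch of a read path that binds input as a query parameter, checks ownership rather than just login state, returns one generic error, and filters the response through an allow-list. The `documents` schema, the `PUBLIC_FIELDS` list, and the `Forbidden` exception are assumptions made up for this example; rate limiting is not shown because it normally lives outside the handler.

```python
import sqlite3

# Hypothetical allow-list: internal_notes is deliberately left out.
PUBLIC_FIELDS = ("id", "owner_id", "title")


class Forbidden(Exception):
    """Raised for both missing and not-owned documents."""


def make_documents_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, owner_id INTEGER, title TEXT, internal_notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?, ?)",
        [(1, 10, "Q1 plan", "secret"), (2, 20, "Budget", "secret")],
    )
    return conn


def get_document(conn: sqlite3.Connection, current_user_id: int, document_id) -> dict:
    # The id is bound as a parameter, never formatted into the SQL string.
    row = conn.execute(
        "SELECT * FROM documents WHERE id = ?", (document_id,)
    ).fetchone()
    # Authorization, not just authentication: the caller must own the row.
    # The same generic error covers "missing" and "not yours", so the
    # response does not reveal which ids exist.
    if row is None or row["owner_id"] != current_user_id:
        raise Forbidden("document not available")
    # Only allow-listed fields leave the function.
    return {field: row[field] for field in PUBLIC_FIELDS}
```

None of these lines is difficult. The point is that each one is easy to leave out without the code looking any less finished.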
The worst version of this involves cryptographic code. Do not let AI generate cryptographic implementations. Period. The AI will produce code that looks like encryption, uses real cryptographic functions, and might even pass basic testing. But cryptographic security depends on details that LLMs are not equipped to guarantee: timing-safe comparisons, proper initialization vectors, correct mode selection, appropriate key derivation. A cryptographic implementation that is 99% correct is 100% insecure.
Use established libraries. Let the AI wire them up. But the security decisions need to come from vetted, audited, purpose-built code.
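As an illustration of "wiring up" rather than implementing, the sketch below stores and verifies a password using only Python standard-library primitives: `hashlib.pbkdf2_hmac` for key derivation, `secrets.token_bytes` for the salt, and `hmac.compare_digest` for the timing-safe comparison. The function names and the iteration count are assumptions for this example; check current guidance for your environment before settling on parameters, and prefer a maintained password-hashing library where one is available.

```python
import hashlib
import hmac
import secrets

# Assumption: an illustrative work factor, not a recommendation for your system.
ITERATIONS = 600_000


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Derive a key from the password with a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(
    password: str, salt: bytes, expected: bytes, iterations: int = ITERATIONS
) -> bool:
    """Recompute the derived key and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # compare_digest avoids leaking, through timing, how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

The code contains no cryptographic logic of its own, which is the design goal: every security-relevant decision is delegated to a vetted primitive, and the remaining choices (salt length, iteration count) are visible constants that a reviewer can challenge.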
Code That Runs vs. Code That Runs Fast
AI generates code that runs. It does not reliably generate code that performs well under load.
LLMs optimize for correctness and readability, not for performance. When the AI generates a database query, it produces one that returns correct results. Whether that query uses an index, scans the entire table, or could be restructured to avoid a join -- these are secondary considerations the AI rarely addresses.
The common performance problems in AI-generated code form a predictable list: N+1 query patterns that work for ten items and bring the database to its knees for ten thousand; missing indexes; linear searches where a hash map lookup would be O(1) on average; no caching strategy; and loading entire datasets into memory when streaming would be appropriate.
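The N+1 pattern is the easiest of these to show in a few lines. The sketch below uses an in-memory SQLite database with a made-up `authors` and `posts` schema (both hypothetical) and returns the same data two ways, along with the number of queries each approach issued.

```python
import sqlite3


def make_blog_db(n_authors: int = 50) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(i, f"author{i}") for i in range(n_authors)],
    )
    # Two posts per author.
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(i, i % n_authors, f"post{i}") for i in range(n_authors * 2)],
    )
    conn.commit()
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> tuple[dict, int]:
    """One query for the authors, then one more query per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_query(conn: sqlite3.Connection) -> tuple[dict, int]:
    """The same data fetched with a single LEFT JOIN."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "LEFT JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # authors with no posts still get an empty list
            titles.append(title)
    return result, 1
```

Both functions return identical dictionaries. The first issues one query per author plus one, so its cost grows with the data; the second stays at a single query regardless of how many authors exist.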
I experienced this firsthand. AI-generated dashboard queries worked perfectly during development with a few hundred records. When I loaded realistic data volumes -- tens of thousands of records -- the dashboard took fifteen seconds to load. The fix required rewriting three queries, adding two indexes, and implementing a caching layer. The AI could not have produced this optimization without understanding the specific data access patterns and volume expectations.
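A cheap way to catch this class of problem before loading production-sized data is to ask the database how it plans to run a query. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` on a hypothetical `events` table; the schema, the query, and the index name `idx_events_account` are invented for illustration and are not the dashboard queries described above. The exact wording of the plan varies between SQLite versions, so the code only records it rather than parsing it.

```python
import sqlite3

# Hypothetical dashboard-style query: count events for one account.
DASHBOARD_QUERY = "SELECT COUNT(*) FROM events WHERE account_id = ?"


def make_events_db(rows: int = 20_000) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, account_id INTEGER, kind TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (account_id, kind) VALUES (?, ?)",
        ((i % 500, "view") for i in range(rows)),
    )
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = make_events_db()
plan_before = query_plan(conn, DASHBOARD_QUERY, (42,))  # no index yet: a table scan
conn.execute("CREATE INDEX idx_events_account ON events (account_id)")
plan_after = query_plan(conn, DASHBOARD_QUERY, (42,))   # should now mention the index

print("before:", plan_before)
print("after: ", plan_after)
```

The query returns the same count in both cases. Only the plan changes, which is why this kind of problem stays invisible until the data volume is realistic.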
Then there is the debugging problem. Something breaks. You look at the code. You did not write it. You do not fully understand it. Traditional debugging relies on the developer's mental model -- you wrote it, so you know what you intended. With AI-generated code, that mental model does not exist.
Worse, AI-generated code looks like code written by someone who knew what they were doing. This creates a false sense of understanding. You read it and think you understand it because the structure is familiar, even when the specific logic is more complex than it appears.
The debugging difficulty scales with project complexity. In a small project, you can read the entire codebase. In a larger project with thousands of lines of AI-generated code, building that mental model becomes a significant effort at the worst possible time.
The honest summary: vibe coding is a legitimate and powerful way of working. It dramatically accelerates prototyping, internal tools, CRUD apps, and the first 90% of many projects. It is not a replacement for understanding your domain, your security requirements, or your performance characteristics. The developers who thrive with these tools are the ones who know when to trust the AI and when to take the wheel.
A pilot who only knows how to fly in clear weather is not a pilot. A developer who only knows where AI succeeds is not ready to use it on anything that matters.




