I originally wanted to write 2 distinct articles, but realized that they shared the same underlying idea: "Vibe coding", the act of talking to an LLM / LLM-based AI agent to code a full project without writing a line of code manually, is no more than a metaprogramming technique, where a program (the AI agent) produces another program (the code that gets committed). As such, it shares some characteristic traits of metaprogramming, notably: 1) metaprogramming gets reached for when an abstraction is missing, and 2) maintaining metaprogramming output is a poor experience when you can't maintain the generator level instead.
Let me explain.
The missing abstraction
The C language doesn't have generics. So what do you do when you need to implement some data structure working with 2 different types (say, a `Stack` struct that can work with both `int` and `float`)? Well, if you're lazy, you copy-paste the `int` version and change occurrences of `int` to `float`.
If you're sophisticated-lazy, you write a macro that takes a type as an argument and defines the struct for that type. The macro will literally produce the code of both structs before feeding it to the compiler, and you sort of have a generic struct, thanks to the macro doing the metaprogramming for you.
The metaprogramming (macro) helps you plug the gap left by a missing abstraction (lack of generics).
Now, let's teleport to the higher level of web development. Web developers are often in a position where they need to implement CRUD (Create, Read, Update, Delete) for some resource, over HTTP. They're going to implement one HTTP endpoint for each action, each doing validation, serialization, some database query, returning some data with a status code, and so on.
Doing that once for one resource is fine. Then you need to do the same thing for another resource, and then a third, etc. It's basically always the same, except that your endpoints have a different path, the database table has a different name, and a few tiny details change here and there.
You ideally need some sort of "generic" CRUD handling over HTTP.
Well, you can use a web framework that does the magic for you, such as Django in Python or Rails in Ruby.
But if you use what they call a "micro-framework", such as Flask, the joke's on you: you have to manually implement all the required endpoints. What are you gonna do?
Well, if you have heard about vibe coding, you're gonna ask an AI agent to code the endpoints. And it's gonna succeed like a pro: it has been trained on gazillions of examples of such code, and what it's doing for you is basically sophisticated copy-pasting.
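To make this concrete, here is a minimal sketch of the kind of code the agent happily churns out for you (the `books` resource and the in-memory store are hypothetical stand-ins for a real database table):

```python
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

# Hypothetical in-memory store standing in for a real database table.
books = {}
next_id = 1

@app.post("/books")
def create_book():
    global next_id
    data = request.get_json(silent=True)
    if not data or "title" not in data:
        abort(400)
    book = {"id": next_id, "title": data["title"]}
    books[next_id] = book
    next_id += 1
    return jsonify(book), 201

@app.get("/books/<int:book_id>")
def read_book(book_id):
    if book_id not in books:
        abort(404)
    return jsonify(books[book_id])

# ...plus update and delete, and then the same four endpoints all over
# again for /authors, /orders, and every other resource, with only the
# names and a few details changing.
```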
The metaprogramming (AI agent) helps you plug the gap left by a missing abstraction (lack of generic CRUD HTTP-handling logic).
But there is a significant difference from the metaprogramming done by the C macro in the previous example: this time, you end up with the responsibility of maintaining the output code of the metaprogramming. With the C macro, you never even see the output (it's an intermediate step between your code and the compiler); you maintain the macro itself.
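For contrast, here is a minimal sketch of what maintaining the generator level could look like in Flask (`register_crud` is a hypothetical helper; update and delete are omitted). Like the C macro, the helper is what you maintain; its "output", the registered endpoints, is never something you edit by hand:

```python
from flask import Flask, jsonify, request, abort

def register_crud(app, name):
    # Hypothetical in-memory store standing in for a real database table.
    store = {}
    counter = [0]

    def create():
        data = request.get_json(silent=True)
        if not data:
            abort(400)
        counter[0] += 1
        store[counter[0]] = {"id": counter[0], **data}
        return jsonify(store[counter[0]]), 201

    def read(item_id):
        if item_id not in store:
            abort(404)
        return jsonify(store[item_id])

    # Register the same two routes for whatever resource name we're given.
    app.add_url_rule(f"/{name}", f"{name}_create", create, methods=["POST"])
    app.add_url_rule(f"/{name}/<int:item_id>", f"{name}_read", read,
                     methods=["GET"])

app = Flask(__name__)
for resource in ("books", "authors", "orders"):
    register_crud(app, resource)
```

Whether you would want exactly this in production is beside the point; the point is which level you get to maintain.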
Maintaining metaprogramming output
In desktop programming, folks use GUI designer tools. The various GUI elements are ultimately coded with primitives in a low-level language (e.g. C++), but because it's not intuitive to program something "visual" by instantiating C++ objects, a GUI editor allows doing it visually. The GUI editor actually produces an XML representation of the interface, which then gets converted into C++ code.
It works because the programmer never has to maintain the generated C++ code, which can obviously be quirky and isn't meant to be maintained by hand. Programmers have a higher-level interface (the GUI editor), which is all they need to care about.
There are examples of metaprogramming where programmers are expected to maintain the output itself, such as the scaffolding commands of many frameworks (e.g. Django's `startapp`, which instantiates a bunch of files, folders, and some boilerplate). But those are usually limited to basic boilerplate.
There aren't many examples in the industry where you need to maintain metaprogramming output of significant size yourself. Or, at least, not many where doing so isn't painful.
This is, in my opinion, the biggest problem of "vibe coding": the AI metaprograms for you, but then you need to actually maintain the generated code. There is no "higher interface" to it. Imagine being happy that you can write C instead of assembly because you have a compiler, but then being forced to maintain the generated assembly.
You could argue that this is the differentiator of "AI": the magic intelligence sauce is precisely what should make the generated code as maintainable as code written directly by a human. But, so far, this is not what seems to be happening in practice: eventually your agent hits a wall where it fails at implementing what you want, and the only way to iterate forward is to dig into the code it has previously generated and implement the change yourself (or to give more fine-grained instructions). You're looking at code you did not write, that is probably low-quality, and you're now thinking about just rewriting it from scratch your own way.
The closest thing to something you could maintain at a "higher level" than the produced code would be the prompt you gave to the AI agent. The problem is that the prompt (at least currently) is a one-time, evanescent thing. It's not something you're gonna check into version control and maintain. There are various, hard-to-solve reasons for that:
(Note: Let's assume we are using a deterministic LLM.)
Firstly, LLM inference is so slow and expensive that it is not something you want to run more than once. Even assuming you found a way to properly represent your codebase in the form of "prompts", it would be insane to have a build step in your toolchain consisting of re-inferring the codebase from your prompts every time.
Secondly, LLM inference is a chaotic function of the prompt. Imagine having a full codebase that you generate from a sequence of prompts. You now change just one word in one of the prompts. You have no idea how this will affect your codebase, nor a clear idea of how it affects the resulting behavior of the software (with possible regressions in random places).
We are a long way from not having to maintain the generated code.
Conclusion
To summarize the programmer experience around programming and metaprogramming, we have seen 3 different levels, from best to worst:
- Using a tool offering the proper abstraction for our use-case (e.g. support of generics by Rust; support of RESTful endpoints by Rails).
- Using a tool lacking an abstraction, and plugging the gap with a metaprogramming technique where we maintain the metaprogramming source (e.g. C macros).
- Using a tool lacking an abstraction, and plugging the gap with a metaprogramming technique where we maintain the metaprogramming output (e.g. AI-produced code).
If (and that's a big "if") we ever manage to find a way to make prompts an actual programming base rather than an evanescent code-generation artifact, AI-produced code would be promoted from level 3 to level 2. But it still wouldn't match level 1. For example, instead of marveling at how good AI agents are at writing unit tests, we should be thinking about the abstractions we lack around testing.
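To give one hedged illustration of what such an abstraction can already look like (the `slugify` function and its cases are made up for the example): a parametrized test keeps the generator, the list of cases, under maintenance, instead of a pile of copy-pasted test functions.

```python
import pytest

def slugify(title: str) -> str:
    # Hypothetical function under test.
    return title.strip().lower().replace(" ", "-")

# One parametrized test replaces many near-identical unit tests:
# you maintain the case list (the generator), not its expansion.
@pytest.mark.parametrize("title, expected", [
    ("Hello World", "hello-world"),
    ("  Trimmed Title  ", "trimmed-title"),
    ("already-slugged", "already-slugged"),
])
def test_slugify(title, expected):
    assert slugify(title) == expected
```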
"Vibe coding" is not only the worst developer experience, it's a bad idea that emerges to plug gaps left by missing abstractions.