22 bugs Claude couldn't find in 5,000 lines of C++

Before we wrote a single line of HyperAnalyzer, we wanted to know whether the problem we suspected actually existed. So we took a real production Windows C++ codebase that an LLM had been editing for months and pointed a top-tier commercial static analyzer at it.

The codebase was around 5,000 lines, all human-reviewed, all passing CI, all running in production. The analyzer surfaced 22 real bugs. Not style nags. Not opinionated nits. Bugs that compile cleanly, pass code review, and would only manifest under specific runtime conditions.

What kinds of bugs

The interesting part was the distribution. We expected the usual suspects (off-by-one, null deref, leaks). What we actually found:

6 Windows-specific API misuses: CreateThread instead of _beginthreadex, LoadLibrary inside DllMain, COM init on the wrong thread. Things that only break when the code is loaded as a DLL into a host process that uses the CRT in a particular way.
5 unsigned arithmetic traps: subtractions that wrap below zero and silently become huge values, then get compared with < and defeat a guard clause.
4 dead stores: a variable assigned twice with no read in between. Almost always the symptom of a half-finished refactor.
3 sizeof(pointer) bugs: the size of a pointer passed to memcpy and friends where the size of the buffer was intended. The classic confusion between an array and the pointer it decays to.
2 cases of secret material left on the stack because the buffer was never zeroed on the early-return path.
2 performance footguns: push_back(Type{...}) where emplace_back would construct the element in place and skip a move (or a copy, for types without a move constructor), and a struct whose padding doubled its size.
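
To make three of these categories concrete, here is a minimal, portable sketch of an unsigned wrap-around guard, a sizeof(pointer) copy, and a padding-heavy struct. These are illustrations, not excerpts from the audited codebase: every function and type name below is hypothetical, and the struct sizes in the comments are what common ABIs produce rather than anything the language guarantees.

```cpp
#include <cstddef>
#include <cstring>

// --- Unsigned arithmetic trap ---------------------------------------------
// If used > capacity, capacity - used wraps to a huge value, so the
// "is there room?" guard passes exactly when it should fail.
bool has_room_buggy(std::size_t capacity, std::size_t used, std::size_t need) {
    return need < capacity - used;  // wraps when used > capacity
}

// Check the ordering first, so the subtraction can never wrap.
bool has_room_fixed(std::size_t capacity, std::size_t used, std::size_t need) {
    return used <= capacity && need < capacity - used;
}

// --- sizeof(pointer) passed to memcpy --------------------------------------
constexpr std::size_t kKeySize = 16;  // hypothetical key length

// The array has decayed to a pointer, so sizeof yields the pointer size
// (typically 8 or 4 bytes) and only part of the key is copied.
std::size_t copy_key_buggy(unsigned char* dst, const unsigned char* src) {
    std::size_t n = sizeof(src);  // size of the pointer, not of the key
    std::memcpy(dst, src, n);
    return n;
}

// Taking the arrays by reference preserves their extent, so sizeof is
// the full array size.
std::size_t copy_key_fixed(unsigned char (&dst)[kKeySize],
                           const unsigned char (&src)[kKeySize]) {
    std::memcpy(dst, src, sizeof(src));
    return sizeof(src);
}

// --- Padding ----------------------------------------------------------------
// Same three members in two orders. On common 64-bit ABIs the first layout
// is 24 bytes and the second is 16, because each char placed before the
// double forces alignment padding.
struct Padded    { char tag; double value; char flag; };
struct Reordered { double value; char tag; char flag; };
```

Each buggy variant compiles without errors under default settings, which is the point: nothing in the build or in a typical unit test forces the failing input (used > capacity, or a key longer than a pointer) to appear.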

Why Claude missed all of them

Every single one of these bugs would have been caught by Claude if you had asked the right question. The problem is that nobody asks. The model writes the code, sees that it compiles, runs the tests it can think of, and moves on. There is no step in the workflow that says “now look at this code with the eyes of an analyzer”.

That is the gap HyperAnalyzer fills. Not by being smarter than Claude, but by being a tool Claude can call on every diff, automatically, before anything is committed.

These 22 bugs are now our regression test set. If a build of HyperAnalyzer cannot reproduce all of them on the original codebase, the build does not ship.
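
The gate itself amounts to a subset check: every baseline finding must appear in what the current build reports. A minimal sketch of that idea follows. The "file:line:rule" identifier format and the function names are assumptions made for illustration, not HyperAnalyzer's actual interface.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical finding identifier, e.g. "net.cpp:120:unsigned-wrap".
using FindingId = std::string;

// Returns the baseline findings that the current build failed to report.
std::vector<FindingId> missing_findings(const std::set<FindingId>& baseline,
                                        const std::set<FindingId>& reported) {
    std::vector<FindingId> missing;
    // std::set iterates in sorted order, which set_difference requires.
    std::set_difference(baseline.begin(), baseline.end(),
                        reported.begin(), reported.end(),
                        std::back_inserter(missing));
    return missing;
}

// The build may ship only if nothing from the baseline went missing.
// Extra findings beyond the baseline do not block the build.
bool build_may_ship(const std::set<FindingId>& baseline,
                    const std::set<FindingId>& reported) {
    return missing_findings(baseline, reported).empty();
}
```

Returning the list of missing findings, and not just a boolean, means a failed gate says which regression case was lost.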