From Local Failure to Cloud Clarity: Why My 9B LLMs Couldn’t Code a Simple Snake Game
Why did my local AI build a Snake game that couldn't collide and a Dino that couldn't jump? I pitted 9B local models against DeepSeek V4 Flash to see if "local-first" is ready for prime time. The result: a masterclass in why reasoning depth still trumps the privacy perks of running locally, every single time.
We’ve all seen the benchmarks claiming that local, "small" models are nipping at the heels of the giants. As a firm believer in the local-first movement, I decided to put Qwen3.5:9B and Gemma 4-e4b to the test. My goal? A simple retro gaming portal featuring Snake and Dino Runner.
The result was a sobering reality check. Despite dozens of iterations, the local models couldn't produce a playable product. Here is why size still matters in the world of AI-assisted coding.
The Comparison: Weight Classes in Action
| Metric | Local (Qwen 9B / Gemma 4B) | Cloud (DeepSeek V4 Flash) |
|---|---|---|
| Architectural Depth | Dense / Small MoE | High-Scale Mixture-of-Experts |
| Tool Calling | Brittle; frequent JSON syntax errors | Robust; 100% success rate in OpenCode |
| Logic Retention | "Goldfish memory" regarding variables | High global context awareness |
| End Result | Broken physics & 140+ lines of dead code | Production-ready in one attempt |
The "Anatomy of Failure": What Went Wrong Locally?
When I finally switched to DeepSeek V4 Flash, the model provided a summary of everything it had to fix. Reading through that log, it was clear the local models' errors weren't just typos: they were fundamental logic collapses.
1. The Coordinate Mismatch (The Ghost Snake)
The local models struggled with "State." They tracked the snake's head using pixel values (e.g., 160px) but stored the body segments in grid coordinates (e.g., 0–39).
- The Bug: A pixel value like 160 can never equal a grid index in the 0–39 range, so collision detection never triggered.
- The Fix: DeepSeek synchronized the math, ensuring the snake lived in a single, logically consistent universe.
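To make the mismatch concrete, here is a minimal sketch of the "single universe" approach: everything lives in grid coordinates, and pixels only appear at draw time. (This is my reconstruction, not the generated code; the `CELL` size and segment shape are assumptions.)

```javascript
const CELL = 20; // assumed grid cell size in pixels

// Buggy pattern: comparing a pixel position (e.g. 160) against grid
// indices (0-39). The comparison can never succeed, so the snake is
// a "ghost" that passes through itself.

// Fix: keep head and body in the same grid coordinate space...
function collides(head, body) {
  return body.some(seg => seg.x === head.x && seg.y === head.y);
}

// ...and convert to pixels only when rendering.
function drawSegment(ctx, seg) {
  ctx.fillRect(seg.x * CELL, seg.y * CELL, CELL, CELL);
}
```

With both sides of the comparison in grid units, self-collision fires the moment the head occupies a body cell.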
2. Broken Physics & Static Obstacles
In the Dino Runner game, the local models treated each frame as an isolated event rather than a continuous simulation.
- Physics: Instead of accumulating velocity (gravity), the model recalculated the Dino's height from scratch every frame, leading to "teleporting" rather than jumping.
- Logic: It spawned obstacles at the edge of the screen but never wrote the logic to move them (decrementing `x`). They just sat there, watching the Dino jump in place.
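The fix is to treat the game as a running simulation: carry velocity across frames and scroll every obstacle on every tick. A minimal sketch of that update loop (the constants `GRAVITY`, `JUMP_VELOCITY`, and `SPEED` are illustrative assumptions, not the model's actual values):

```javascript
const GRAVITY = 0.6;        // downward acceleration per frame (assumed)
const JUMP_VELOCITY = -10;  // negative = upward (assumed)
const SPEED = 5;            // obstacle scroll speed per frame (assumed)

const dino = { y: 0, vy: 0 }; // y = 0 is the ground; negative y is airborne
let obstacles = [{ x: 400 }];

function jump() {
  if (dino.y === 0) dino.vy = JUMP_VELOCITY; // only jump from the ground
}

function update() {
  // Accumulate velocity instead of recomputing height from scratch:
  // this is what turns "teleporting" into an actual arc.
  dino.vy += GRAVITY;
  dino.y = Math.min(dino.y + dino.vy, 0); // clamp at the ground
  if (dino.y === 0) dino.vy = 0;

  // Move obstacles left every frame -- the step the local models omitted.
  obstacles.forEach(o => { o.x -= SPEED; });
  obstacles = obstacles.filter(o => o.x > -50); // drop off-screen obstacles
}
```

Because `vy` persists between frames, gravity bends the jump into a parabola, and because every obstacle loses `SPEED` pixels per tick, the Dino actually has something to dodge.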
The "Tool-Calling" Bottleneck
This was the most frustrating part of the local experience. Using OpenCode, the AI needs to use "tools" to write files and run the browser.
Small models (under 20B parameters) have a high "cognitive load." When they are busy trying to figure out JavaScript logic, they often "forget" how to format the tool-call correctly. They output malformed JSON or ignore the system instructions entirely. DeepSeek V4, with its massive parameter count, handles the tool-call syntax and the coding logic simultaneously without breaking a sweat.
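To illustrate the failure mode, here is a hypothetical OpenAI-style tool-call payload and the kind of defensive parsing an agent harness has to do when a small model emits broken JSON. (The `write_file` tool name and payload shape are invented for this example; they are not OpenCode's actual schema.)

```javascript
// A well-formed call and the kind a 9B model emits under cognitive load.
const goodCall = '{"name": "write_file", "arguments": {"path": "snake.js", "content": "..."}}';
const badCall  = '{"name": "write_file", "arguments": {"path": "snake.js", "content": "...}'; // unclosed string/brace

function parseToolCall(raw) {
  try {
    const call = JSON.parse(raw);
    // Reject structurally valid JSON that still breaks the tool contract.
    if (typeof call.name !== "string") return null;
    if (typeof call.arguments !== "object" || call.arguments === null) return null;
    return call;
  } catch {
    return null; // malformed JSON: the harness must retry or re-prompt
  }
}
```

Every `null` here means a wasted round trip: the harness re-prompts, the model burns context re-reading instructions, and the coding task drifts further off the rails.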
Lessons Learned: When to Go Local?
This experience doesn't mean local models are useless. They are fantastic for:
- Refactoring small, isolated functions.
- Explaining snippets of code.
- Writing unit tests for simple logic.
However, for stateful applications where the HTML, CSS, and JS must all "talk" to each other flawlessly, the reasoning gap is still wide. DeepSeek V4 didn't just fix the bugs; it performed a "house cleaning," removing 140+ lines of orphaned code and fixing memory leaks that the smaller models had introduced.
The Verdict: If you want a "one-shot" working solution for complex projects, the cloud is still king. If you're going local, don't settle for 9B—aim for at least a 32B or 70B model if your VRAM can handle it.
Have you hit a "logic wall" with local models? Let’s talk about it in the comments.