Ollama - Talking Tech

Local LLM Inference: Ollama vs vLLM

A practical comparison of Ollama and vLLM for local LLM inference in 2026, covering continuous batching, PagedAttention, throughput benchmarks, and a decision framework for picking the right engine for your workload.

Jun 04, 2026 4 min read

AI

Running LLMs Locally with Ollama in 2026

In 2026, running LLMs on your own machine is no longer a niche pursuit; it's a practical, cost-effective alternative to cloud APIs. This guide covers hardware requirements, Ollama setup, model selection, and development integrations for developers who want full control over their AI stack.

May 20, 2026 3 min read

AI coding

From Local Failure to Cloud Clarity: Why My 9B LLMs Couldn’t Code a Simple Snake Game

Why did my local AI build a Snake game that couldn't collide and a Dino that couldn't jump? I pitted 9B local models against DeepSeek V4 Flash to see if "local-first" is ready for prime time. The result: a masterclass in why reasoning depth beats privacy every single time.

May 09, 2026 2 min read

Artificial Intelligence

Setting Up OpenClaw Locally with Ollama (and What I Learned Along the Way)

The idea was simple: run a capable, private AI assistant with GPU acceleration and a clean web interface. In reality, it turned into a deep dive into agent systems, model limitations, and performance tuning.

Apr 03, 2026 3 min read