What I'm building, what I'm learning, what broke today. Less polish, more signal.
10 million downloads and counting
Gemma 4 crossed 10 million downloads today. My mentions are filling up with people asking about local performance. I posted some of our benchmark numbers and the response has been wild: it turns out almost nobody else is publishing consumer hardware results. The gap between 'runs on an H100' and 'runs on your machine' is real, and people are hungry for honest data. Feels good to be filling that gap, even if our testing is still in progress.
Gemma 4 · Community · Local AI
Think Mode: the counterintuitive results
Ran Think Mode A/B tests today and the results are not what I expected. On the E2B model, enabling Think Mode on a math task dropped the score from 100% to 20%. On the 26B, it dropped the logic score from 60% to 20%. My hypothesis: on memory-constrained hardware, where the token budget is tight, the extra tokens consumed by 'thinking' crowd out the tokens needed for a good answer. The model literally thinks itself into a worse response. This is the kind of finding you only get from testing on real hardware with real constraints.
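To make the crowding-out hypothesis concrete, here is a rough sketch of the arithmetic. Every number and name in it (`contextWindow`, `answerBudget`, the token counts) is made up for illustration; none of it is measured data or the actual benchmark harness.

```typescript
// Hypothetical illustration of the "crowding out" hypothesis.
// All numbers are invented for the example; none are benchmark results.

interface Budget {
  contextWindow: number;  // total tokens the runtime allows (prompt + output)
  promptTokens: number;   // tokens used by the task prompt
  thinkingTokens: number; // tokens spent on reasoning before the answer
}

// Tokens left for the final answer once prompt and thinking are paid for.
function answerBudget(b: Budget): number {
  return Math.max(0, b.contextWindow - b.promptTokens - b.thinkingTokens);
}

const plain = answerBudget({ contextWindow: 2048, promptTokens: 300, thinkingTokens: 0 });
const thinking = answerBudget({ contextWindow: 2048, promptTokens: 300, thinkingTokens: 1600 });

console.log(plain, thinking); // 1748 148
```

If the hypothesis holds, the second case is where answers get truncated or rushed. Checking it would mean logging thinking and answer token counts per run, which this sketch does not do.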
Gemma 4 · Think Mode · Experiments
403 benchmarks, one iMac, zero cloud
Finished the full Gemma 4 benchmark suite today: 403 tests across all four models (E2B, E4B, 26B, and 31B), running entirely on a 2017 iMac (i7-7700K, 40 GB RAM). The 31B model is... ambitious on this hardware. Some reasoning tasks took over 3 minutes, but it actually solved AIME math problems correctly. The sweet spot is the 26B: fast enough to be useful, smart enough to be interesting.
Gemma 4 · Benchmarks · Local AI
Dark mode toggle with jelly physics
Added a theme toggle to the site today. It's got a spring animation on the knob, twinkling stars in the dark mode track, and sun rays that rotate in. The real engineering was underneath — ThemeProvider with localStorage persistence, anti-FOUC inline script, and a custom event system so the globe canvas can pick up theme changes without re-mounting. The animation is fun but the architecture is the part I'm proud of.
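A minimal sketch of how those three pieces could fit together, not the site's actual code. The names `ThemeStore`, `resolveInitialTheme`, `memoryStore`, and the `"theme"` storage key are all assumptions, and the tiny subscribe/notify set stands in for whatever custom event mechanism the real implementation uses (in a browser that might well be a `CustomEvent` on `window`).

```typescript
type Theme = "light" | "dark";
type Listener = (theme: Theme) => void;

// Minimal slice of the Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "theme"; // hypothetical key name

// The decision an anti-FOUC inline script has to make before first paint:
// a stored choice wins, otherwise fall back to the OS preference.
function resolveInitialTheme(store: KeyValueStore, prefersDark: boolean): Theme {
  const saved = store.getItem(STORAGE_KEY);
  if (saved === "light" || saved === "dark") return saved;
  return prefersDark ? "dark" : "light";
}

class ThemeStore {
  private listeners = new Set<Listener>();
  private store: KeyValueStore;
  private current: Theme;

  constructor(store: KeyValueStore, prefersDark: boolean) {
    this.store = store;
    this.current = resolveInitialTheme(store, prefersDark);
  }

  get theme(): Theme {
    return this.current;
  }

  // Register a listener; returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Flip the theme, persist it, then notify listeners (e.g. the globe canvas).
  toggle(): void {
    this.current = this.current === "dark" ? "light" : "dark";
    this.store.setItem(STORAGE_KEY, this.current);
    this.listeners.forEach((listener) => listener(this.current));
  }
}

// In-memory stand-in for localStorage, used only for this example.
function memoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}

const backing = memoryStore();
const themes = new ThemeStore(backing, false);
const seen: Theme[] = [];
themes.subscribe((theme) => {
  seen.push(theme);
});
themes.toggle();
console.log(themes.theme, seen, backing.getItem(STORAGE_KEY)); // dark [ 'dark' ] dark
```

In a browser, `window.localStorage` satisfies `KeyValueStore` as-is, and `prefersDark` would presumably come from `matchMedia("(prefers-color-scheme: dark)").matches`.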
Design · React · Animation
Globe as ambient background
Integrated a wireframe globe into the hero section as an ambient background element. Canvas-based rendering with d3-geo projections, Float32Array for the point grid, and gradient overlays to keep text readable. It's atmospheric, not interactive (well, you can drag it). The key decision was making it react to theme changes via a custom event rather than re-mounting — keeps the rotation smooth during dark mode transitions.
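For a feel of the point-grid idea, here is a self-contained sketch. It hand-rolls a simplified orthographic projection (rotation around the vertical axis only) instead of calling d3-geo, so it is a stand-in for the real rendering path; `buildPointGrid`, `projectOrthographic`, and the 30 degree step are assumptions for illustration.

```typescript
const DEG = Math.PI / 180;

// Build a lat/lon grid as one flat Float32Array: [lon0, lat0, lon1, lat1, ...],
// all values in degrees. A flat typed array avoids allocating an object per point.
function buildPointGrid(stepDeg: number): Float32Array {
  const latCount = Math.floor(180 / stepDeg) + 1; // include both poles
  const lonCount = Math.floor(360 / stepDeg);     // -180 and 180 are the same meridian
  const grid = new Float32Array(latCount * lonCount * 2);
  let i = 0;
  for (let a = 0; a < latCount; a++) {
    for (let o = 0; o < lonCount; o++) {
      grid[i++] = -180 + o * stepDeg;
      grid[i++] = -90 + a * stepDeg;
    }
  }
  return grid;
}

// Simplified orthographic projection centred on (centerLonDeg, 0).
// Returns canvas-style coordinates (y grows downward), or null when the
// point is on the far side of the globe.
function projectOrthographic(
  lonDeg: number,
  latDeg: number,
  centerLonDeg: number,
  radius: number
): [number, number] | null {
  const lambda = (lonDeg - centerLonDeg) * DEG;
  const phi = latDeg * DEG;
  const cosPhi = Math.cos(phi);
  if (cosPhi * Math.cos(lambda) < 0) return null; // facing away from the viewer
  return [radius * cosPhi * Math.sin(lambda), -radius * Math.sin(phi)];
}

const grid = buildPointGrid(30);
let visible = 0;
for (let i = 0; i < grid.length; i += 2) {
  if (projectOrthographic(grid[i], grid[i + 1], 0, 100) !== null) visible++;
}
console.log(grid.length / 2, "points,", visible, "on the near side");
```

Rotating the globe then only means changing `centerLonDeg` per frame and re-projecting the same array, which is why a theme change never needs to rebuild the grid.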
d3 · Canvas · Performance
Why I benchmark locally
Everyone benchmarks on H100s. Nobody tells you what a model actually feels like on the hardware you own. I wanted to know: can a 31B model run on consumer hardware? The answer is yes, with caveats. The latency is real, but the capability ceiling is higher than you'd expect. That's the gap I'm trying to fill — practical, honest numbers for people who build with what they have.
Philosophy · Local AI
The orchestration problem
Started building an autonomous benchmark orchestration pipeline. The goal: let the iMac run all 36 remaining tests overnight without me touching it. Sounds simple until you deal with model timeouts, OOM kills, partial results, and the machine going to sleep. SSH tunnels, caffeinate(1), and a lot of defensive scripting. This is the unglamorous side of 'local AI'.
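The "partial results" part is where most of the defensive logic ends up. Below is a sketch of one way to plan a resume after a crash or restart. It is not the actual pipeline: the `Outcome` categories, the `MAX_ATTEMPTS` cap, the test IDs, and the choice not to retry OOM kills are all assumptions.

```typescript
type Outcome = "ok" | "timeout" | "oom" | "error";

interface ResultEntry {
  testId: string;
  outcome: Outcome;
  attempts: number;
}

const MAX_ATTEMPTS = 2; // hypothetical retry cap

// Decide what still needs to run, given whatever results survived the last run.
function planRemaining(allTests: string[], results: ResultEntry[]): string[] {
  const byId = new Map(results.map((r) => [r.testId, r] as const));
  return allTests.filter((id) => {
    const r = byId.get(id);
    if (!r) return true;                   // never ran
    if (r.outcome === "ok") return false;  // done
    if (r.outcome === "oom") return false; // an unchanged retry would likely be killed again
    return r.attempts < MAX_ATTEMPTS;      // timeouts and other errors get a bounded retry
  });
}

// Made-up test IDs and outcomes, purely for illustration.
const allTests = ["math-1", "logic-1", "code-1", "aime-1"];
const saved: ResultEntry[] = [
  { testId: "math-1", outcome: "ok", attempts: 1 },
  { testId: "logic-1", outcome: "timeout", attempts: 1 },
  { testId: "code-1", outcome: "oom", attempts: 1 },
];

const queue = planRemaining(allTests, saved);
console.log(queue); // [ 'logic-1', 'aime-1' ]
```

Writing each result to disk as soon as it finishes keeps `saved` trustworthy after an OOM kill, and running the whole loop under `caffeinate -i` covers the machine-going-to-sleep case.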