Focus.AI Press
Cornwall, Connecticut
Issue No. 01
APRIL 2026
Local.
A Tract on Inference You Own · Est. 2026
Contents · APRIL 2026
In this issue.
No centralization. No data exfiltration. Cheap. The Linux moment for AI looks like this.
Three tiers of inference: hosted APIs, self-hosted clusters, and the GPU in your laptop. This issue argues that the third tier has been underrated — and that a consumer-grade Mac mini plus an open-weights model is already enough to unmake a chunk of the SaaS stack.
Hosted. Self-hosted. On-device. Three tiers, and we have spent too much time pretending the third one doesn’t matter.
- The economics — when electricity is your only marginal cost
- The hardware — Mac mini, Framework, Strix Halo, and the NVIDIA DGX Spark
- The models — gpt-oss, Qwen, Llama, and the open-weights bench
- The attack surface — local models as a payload, and why that’s coming
- The counterweight — what flips when inference is free
— Local Editorial
Cross-References
- State · № 01 The hallway story on local inference as an attack vector — and its positive inverse — is where this argument started.
- Harness · № 01 Gen-3 SDKs assume cheap inference. Local changes the economics.
- Wire · № 01 MCP / skills / code mode are transport-agnostic. Local-first changes which of them you pick.