Choosing Embedding Models and Dimensions: Why 1536 Isn't Always Better Than 384
Feb 10, 2026 · 11 min read
Anatomy of a Prompt — System, User, and Assistant Explained
Understand how system, user, and assistant messages shape LLM behavior.
Feb 15, 2026 · 8 min read
OpenAI Prompt Caching: Undocumented Cross-Model Behavior and Production Cost Implications
I'm building an AI agent from scratch, with no frameworks and no abstractions, specifically to understand where every token goes and how much it costs. This is Phase 3 of my token economics research. Phase 1 covered basic tool calling mechanics. Phase 2 reveal...
Dec 19, 2025 · 12 min read
Model Selection for AI Agents: Measuring Token Costs Across OpenAI's Model Family
I've been building an AI agent from scratch, with no frameworks and no abstractions, to understand where every token goes and what drives cost at scale. In a previous post, I measured how tool definitions and conversation depth impact token usage. The fi...
Dec 19, 2025 · 21 min read
Token Explosion in AI Agents: Why Your Costs Scale Exponentially
I built an AI agent from scratch. Not because frameworks aren't good. They are (and I suggest you use them). But because I needed to see where every token goes. When you're building production systems that could cost $150K+/year in LLM tokens alone, y...
Dec 10, 2025 · 15 min read
SOLID Principles for AI Systems: Why Your RAG Pipeline Needs Better Architecture
Your RAG pipeline works perfectly in staging. You deploy to production. 10,000 concurrent users hit it. Embeddings start timing out. Vector search fails silently. LLM calls retry infinitely because someone forgot to set a max. Your "AI-powered" featu...
Oct 20, 2025 · 12 min read
Thread Wars: Episode 3 – Rise of the Virtual Threads
We finally got our threads back. Now let's not burn the world down with them.
Jul 29, 2025 · 10 min read