Choosing Embedding Models and Dimensions: Why 1536 Isn't Always Better Than 384
Feb 10, 2026 · 11 min read
Anatomy of a Prompt — System, User, and Assistant Explained
Understand how system, user, and assistant messages shape LLM behavior.
Feb 15, 2026 · 8 min read
OpenAI Prompt Caching: Undocumented Cross-Model Behavior and Production Cost Implications
I'm building an AI agent from scratch, with no frameworks and no abstractions, specifically to understand where every token goes and how much it costs. This is Phase 3 of my token economics research. Phase 1 covered basic tool calling mechanics. Phase 2 reveal...
Dec 19, 2025 · 12 min read
Model Selection for AI Agents: Measuring Token Costs Across OpenAI's Model Family
I've been building an AI agent from scratch, with no frameworks and no abstractions, to understand where every token goes and what drives cost at scale. In a previous post, I measured how tool definitions and conversation depth impact token usage. The fi...
Dec 19, 2025 · 21 min read
Token Explosion in AI Agents: Why Your Costs Scale Exponentially
I built an AI agent from scratch. Not because frameworks aren't good. They are (and I suggest you use them). But because I needed to see where every token goes. When you're building production systems that could cost $150K+/year in LLM tokens alone, y...
Dec 10, 2025 · 15 min read
SOLID Principles for AI Systems: Why Your RAG Pipeline Needs Better Architecture
Your RAG pipeline works perfectly in staging. You deploy to production. 10,000 concurrent users hit it. Embeddings start timing out. Vector search fails silently. LLM calls retry infinitely because someone forgot to set a max. Your "AI-powered" featu...
Oct 20, 2025 · 12 min read
Thread Wars: Episode 3 – Rise of the Virtual Threads
We finally got our threads back. Now let's not burn the world down with them.
Jul 29, 2025 · 10 min read