Anatomy of a Prompt — System, User, and Assistant Explained
Understand how system, user, and assistant messages shape LLM behavior.
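A minimal sketch of what this looks like in practice: most chat APIs accept a list of role-tagged messages, where a `system` message sets standing instructions, `user` messages carry the human's turns, and `assistant` messages carry the model's prior replies. The role names follow the OpenAI-style convention; the helper function `build_prompt` here is illustrative, not a library API.

```python
def build_prompt(system: str, history: list[tuple[str, str]], user: str) -> list[dict]:
    """Assemble a messages list: one system message, prior turns, then the new user turn."""
    messages = [{"role": "system", "content": system}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user})
    return messages

prompt = build_prompt(
    system="You are a terse assistant. Answer in one sentence.",
    history=[("What is a token?", "A chunk of text the model reads as one unit.")],
    user="And what is a context window?",
)
print([m["role"] for m in prompt])
# The system message comes first, then alternating user/assistant turns,
# ending with the new user message the model is being asked to answer.
```

Note that the whole conversation, including the system message, is resent on every request: the model has no memory between calls, so "behavior shaping" is just what this list contains.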

Series: Breaking down how LLMs actually work — tokens, embeddings, context windows, and the fundamentals you need before building anything serious.

- Learn how to choose embedding models and dimensions for production RAG systems. Compare OpenAI, Voyage AI, and Google's free options for embeddings.
- Learn what embeddings are, how vector similarity works, and why understanding magnitude vs direction matters for semantic search and RAG systems.
- Learn how Byte Pair Encoding (BPE) actually works — the algorithm that powers GPT, Claude, and LLaMA tokenizers. Step-by-step with examples.
- Learn what tokens really are, why they're not words, and how understanding tokenization saves you money on LLM API costs.