What Are Embeddings and How Vector Similarity Actually Works
Learn what embeddings are, how vector similarity works, and why understanding magnitude vs direction matters for semantic search and RAG systems.

If you've ever wondered how AI "understands" that "king" is closer to "queen" than to "pizza," you're about to find out. And no, it's not magic, it's math. Specifically, it's embeddings and vector similarity.
This is the foundation that powers semantic search, RAG systems, recommendation engines, and pretty much every AI feature that involves "finding similar things." Get this wrong, and your AI retrieves garbage. Get it right, and suddenly your system feels intelligent.
Let's break it down.
The Problem
Traditional keyword search is broken.
You search for "how to reset password" in your company docs. The system looks for exact matches: "reset" AND "password." It misses the document titled "Account Recovery Procedures" even though that's exactly what you need. Different words, same meaning, and keyword search can't see it.
This is the core problem: computers don't naturally understand that "reset password" and "account recovery" mean the same thing. They see strings, not semantics.
Embeddings solve this. They convert text into a mathematical form that captures meaning. Once you have that, you can measure "how similar" two pieces of text are, even if they share zero words in common.
Core Concept: Embeddings as Meaning Coordinates
Think of embeddings like GPS coordinates for meaning.
If words were cities, embeddings would be their latitude and longitude. "King" and "queen" live close together in semantic space. "King" and "pizza"? Opposite sides of the continent. That's what embeddings do. They give every word, sentence, or document a precise location in a map of meaning.
Technically, embeddings are vectors: arrays of numbers that represent the semantic properties of an object. A vector is just a list of values, like [0.23, -0.41, 0.87, ..., 0.15], where each number indicates where that object sits along a specific dimension.
For example:
The word "dad" might be represented as:
[0.1548, 0.4848, ..., 1.864]
The word "mom" might be:
[0.8785, 0.8974, ..., 2.794]
These vectors capture relationships. Words with similar meanings have vectors that point in similar directions. The closer two vectors are in this multi-dimensional space, the more semantically similar the objects they represent.
Here's the key insight: embeddings don't just encode which words are present; they encode what the text is about. That's why they work for semantic search where keyword matching fails.
What Objects Can Be Embedded?
Embeddings aren't just for words. You can embed:
Words: Individual words mapped to semantic space (Word2Vec, GloVe, FastText)
Text: Entire sentences, paragraphs, or documents (BERT, USE, Doc2Vec)
Images: Visual features and semantic content (VGG, ResNet, Inception)
Audio: Speech patterns, music characteristics (RNNs, CNNs for audio)
Graphs: Network nodes and relationships (Node2Vec, GraphSAGE)
Each type uses specialized models, but the concept is the same: convert complex objects into dense numerical vectors that capture meaningful patterns.
For this post, we'll focus on text embeddings, the foundation of RAG systems and semantic search.
What Is a Vector? (The Building Block)
Before we go deeper, let's make sure we understand what a vector actually is, because this is the foundation everything else sits on.
In mathematics, a vector is simply an array of numbers that defines a point in space. In practical terms: it's a list of numbers, like {1989, 22, 9, 180}. Each number tells you where something sits along a specific dimension.
Real-world example: Location as a 2D vector
Think about latitude and longitude. These two numbers can pinpoint any place on Earth:
Vancouver, Canada: {49.26, -123.11} (latitude, longitude)
Burnaby, Canada: {49.27, -122.97}
This is a simple 2-dimensional vector. Want to find a city near Vancouver? Just look for vectors with similar numbers. Burnaby's coordinates are very close, so we know it's nearby.
Adding dimensions for more precision
Now let's say you want to find a city that's not just near Vancouver, but also similar in size. Add a third dimension: population.
Vancouver: {49.26, -123.11, 662248}
Burnaby: {49.27, -122.97, 249125}
Seattle: {47.61, -122.33, 749256}
Suddenly Burnaby isn't the closest match anymore. Seattle's population is far closer to Vancouver's, and on raw numbers that third dimension dominates, so Seattle wins even though Burnaby is right next door. That's what dimensions do: they add more ways to measure similarity (and, left unscaled, large-valued dimensions can outweigh small ones).
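Here's that comparison as a quick Python sketch, using the exact numbers above. Note that a real system would normalize the features first, since the raw population differences dwarf the coordinate differences:

```python
import math

# 3D city vectors: (latitude, longitude, population), from the example above
vancouver = (49.26, -123.11, 662248)
burnaby = (49.27, -122.97, 249125)
seattle = (47.61, -122.33, 749256)

def euclidean(a, b):
    """Straight-line distance between two points in n-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# With population included, the raw distance is dominated by that dimension,
# so Seattle ends up "closer" to Vancouver than neighboring Burnaby.
print(euclidean(vancouver, burnaby))  # ~413123
print(euclidean(vancouver, seattle))  # ~87008
```

This is exactly why feature scaling matters once you start adding dimensions with very different ranges.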
From cities to concepts
Text embeddings work the same way, just with way more dimensions. Instead of 3 numbers (lat, long, population), you might have 384 or 1536 numbers, each capturing a different aspect of meaning.
For example, imagine comparing TV shows. You could create vectors based on:
Genre (sitcom, drama, horror)
Year debuted
Episode length
Number of seasons
Number of episodes
So Seinfeld becomes: {[Sitcom], 1989, 22-24 min, 9 seasons, 180 episodes}
And Wednesday becomes: {[Horror], 2022, 46-57 min, 1 season, 8 episodes}
These vectors tell you: Seinfeld and Wednesday are very different shows. But Seinfeld and Cheers ({[Sitcom], 1982, 21-25 min, 11 seasons, 275 episodes}) are very similar.
The key insight: Instead of 5 dimensions (like our TV show example), text embeddings use hundreds or thousands. Each dimension captures some subtle aspect of meaning like tone, formality, topic, sentiment, time reference, and so on. The model figures out what these dimensions mean during training; you just get the numbers.
That's a vector: a point in multi-dimensional space where similar meanings cluster together.
Understanding Vector Dimensions
Every vector has dimensions. You can think of each dimension as a question that helps define meaning.
In our earlier examples, we showed vectors with just a few numbers. But real AI systems use hundreds or thousands of dimensions. For instance:
Some embedding models use 384 dimensions
Others use 768 or even 1536 dimensions
Each dimension captures a tiny part of meaning. One might represent tone (positive or negative). Another might reflect time (past or future). Others might represent gender, formality, object types, actions, or abstract ideas.
The more dimensions you have, the better the AI can understand nuance and context. But, and this is critical, more dimensions also mean higher costs, slower searches, and more storage.
We'll dig into the dimension trade-offs in Part 2. For now, just understand: dimensions are how we encode semantic complexity.
Vector Similarity: The Foundation
Once you have vectors, you need a way to measure how close they are. This is where similarity metrics come in.
There are three main methods: cosine similarity, dot product, and Euclidean distance. Each handles the two properties of vectors, direction and magnitude, differently.
Understanding Magnitude vs Direction
Every vector has two properties:
Direction: Where the vector points (the angle or orientation in space)
Magnitude: How long the vector is (the size or length)
Think of it like a compass bearing (direction) and distance traveled (magnitude).
Here's a simple 2D example with three vectors:
Vector A: [3, 4] — Points northeast, length = 5
Vector B: [6, 8] — Points northeast, length = 10 (2x longer than A)
Vector C: [4, 3] — Points east-northeast, length = 5
Notice:
A and B: Same direction, different lengths
A and C: Same length, different directions
This distinction matters because it determines which similarity metric you should use.
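You can check those properties with a few lines of Python, using the toy 2D vectors above:

```python
import math

A = [3, 4]  # length 5
B = [6, 8]  # same direction as A, twice the length
C = [4, 3]  # same length as A, different direction

def magnitude(v):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (magnitude(a) * magnitude(b))

print(magnitude(A), magnitude(B), magnitude(C))  # 5.0 10.0 5.0
print(cosine_similarity(A, B))  # 1.0  -> identical direction, length ignored
print(cosine_similarity(A, C))  # 0.96 -> same length, different direction
```

Notice that cosine similarity gives A and B a perfect score despite B being twice as long, while A and C (same length!) score lower because they point different ways.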
The Critical Question for Text Embeddings
When comparing text, should vector length matter?
Consider this:
Text A: "The weather is nice"
Text B: "The weather is nice. The weather is nice." (just A repeated)
These texts have identical meaning. B is just A repeated. If you embed both, they'll point in the same direction (same semantic content), but B's vector will be longer (more tokens).
The question: Should we treat them as identical (same direction) or different (different magnitude)?
For text semantics, direction is what matters, not magnitude.
Why? Because semantic meaning is encoded in the direction a vector points. Length is noise. It varies based on input length, model quirks, or randomness, but it doesn't change what the text is about.
This is why cosine similarity is the standard for text embeddings. It ignores magnitude and focuses purely on direction.
The Three Similarity Metrics
Let's walk through each metric with a concrete example.
Example Setup: Comparing Fruits
We'll measure similarity between strawberries and blueberries using these vectors:
Strawberry → [4, 0, 1]
Blueberry → [3, 0, 1]
(In reality, embeddings have hundreds of dimensions, but the math is the same.)
1. Cosine Similarity (Most Common for Text)
What it measures: The angle between vectors, ignoring their length.
Formula:
cos(A,B) = A·B / (||A|| * ||B||)
Where:
A·B = dot product (multiply corresponding values and sum)
||A|| = length (magnitude) of vector A
||B|| = length (magnitude) of vector B
Calculation:
A·B = (4 * 3) + (0 * 0) + (1 * 1) = 13
||A|| = √(4² + 0² + 1²) = √17 ≈ 4.123
||B|| = √(3² + 0² + 1²) = √10 ≈ 3.162
cos(A,B) = 13 / (4.123 * 3.162) = 13 / 13.038 ≈ 0.997
Cosine distance = 1 - 0.997 = 0.003
Interpretation:
Score of 1 = identical direction (perfect similarity)
Score of 0 = perpendicular (no similarity)
Score of -1 = opposite directions (complete dissimilarity)
Strawberries and blueberries score 0.997: very similar, which makes sense, since they're both small, sweet fruits.
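Here's the same calculation as a small, self-contained Python function. The vectors are the toy ones from the example; real embeddings just have more dimensions:

```python
import math

strawberry = [4, 0, 1]
blueberry = [3, 0, 1]

def cosine_similarity(a, b):
    """cos(A,B) = A·B / (||A|| * ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

sim = cosine_similarity(strawberry, blueberry)
print(round(sim, 3))      # 0.997
print(round(1 - sim, 3))  # 0.003 (cosine distance)
```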
When to use cosine similarity:
Text similarity and document comparison
Semantic search where document length varies
Any application where you care about meaning, not scale
RAG systems (this is the default)
Why it works for text: If one document says "climate change" 30 times and another says it 10 times, that's a difference in magnitude but the topic is the same. Cosine similarity correctly treats them as similar because it only looks at direction.
2. Dot Product
What it measures: Alignment of vectors, considering both direction AND magnitude.
Formula:
A·B = Σ(Aᵢ * Bᵢ)
Just multiply corresponding values and sum them.
Calculation:
A·B = (4 * 3) + (0 * 0) + (1 * 1) = 13
The dot product here is 13. Because it's positive, the strawberry and blueberry vectors point in a similar direction, indicating aligned features.
The relatively large value (13) reflects strong alignment combined with non-trivial magnitudes.
If the dot product were -13, it would indicate equally strong but opposite alignment: the vectors would actively disagree rather than represent similar items.
Interpretation:
Positive = vectors point in similar directions
Negative = vectors point in opposite directions
Higher absolute value = stronger alignment (considering magnitude)
When to use dot product:
Recommendation systems where magnitude represents importance (e.g., user engagement levels)
Collaborative filtering
Applications where scale matters (like activity frequency)
When your embedding model was specifically trained with dot product loss
Why magnitude matters here: In recommendations, a user who watched 100 action movies is different from one who watched 10, even if their taste (direction) is the same. The dot product captures this intensity.
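A toy sketch of that intuition (the user and movie vectors here are made up for illustration):

```python
def dot(a, b):
    """Dot product: both direction and magnitude contribute to the score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical user vectors: identical taste (same direction),
# but the heavy watcher's vector is 10x longer (10x the activity).
casual_fan = [1, 0, 2]
heavy_fan = [10, 0, 20]
new_movie = [4, 0, 1]  # hypothetical item vector

print(dot(casual_fan, new_movie))  # 6
print(dot(heavy_fan, new_movie))   # 60 -> magnitude amplifies the score
```

Cosine similarity would score both users identically against the movie; the dot product preserves the intensity signal, which is exactly what a recommender often wants.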
3. Euclidean Distance
What it measures: The straight-line distance between vectors in space, like measuring with a ruler.
Formula:
distance = √(Σ(xᵢ - yᵢ)²)
Take the difference between corresponding values, square each difference, sum them, and take the square root.
Calculation:
distance = √[(4-3)² + (0-0)² + (1-1)²]
= √[1 + 0 + 0]
= √1
= 1
The Euclidean distance is 1.
A Euclidean distance of 1 means the two vectors are very close in space. They differ in only one dimension, by a value of 1, while all other dimensions are identical.
Smaller Euclidean distance ⇒ higher similarity. Distance 0 would mean the vectors are identical.
Interpretation:
Distance of 0 = identical vectors
Larger distance = more different
Considers both direction and magnitude
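The same result in Python, with the toy fruit vectors again:

```python
import math

strawberry = [4, 0, 1]
blueberry = [3, 0, 1]

def euclidean_distance(a, b):
    """Straight-line distance: sqrt of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance(strawberry, blueberry))   # 1.0
print(euclidean_distance(strawberry, strawberry))  # 0.0 (identical vectors)
```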
When to use Euclidean distance:
Clustering and anomaly detection
Applications where absolute differences in feature values matter
Count-based features (e.g., frequency of events)
Spatial data
Why it's less common for text: Euclidean distance treats the "repeated text" example (A vs 2×A) as different, even though they mean the same thing. For text, this is usually wrong.
Why Cosine Similarity Is Standard for Text
Let's revisit our "repeated text" problem:
200-word essay about the moon
20-word paragraph about the moon
Same topic = same direction in semantic space
Different lengths = different magnitudes
If we use magnitude-sensitive metrics (dot product or Euclidean):
- Result: 200 vs 20 = far apart = "different" ❌ WRONG
If we ignore magnitude (cosine similarity):
- Result: Same direction = "similar" ✅ CORRECT
The rule: Use cosine similarity for text embeddings because length doesn't affect meaning.
This is why every RAG tutorial you'll see uses cosine similarity by default. It's the mathematically correct choice for semantic meaning.
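A quick demonstration with toy vectors, where doc_b is exactly doc_a scaled by 2, standing in for the "same text repeated" case:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

doc_a = [1, 2, 1]  # toy embedding of "The sky is blue"
doc_b = [2, 4, 2]  # toy embedding of the text repeated: exactly 2x doc_a

print(round(cosine(doc_a, doc_b), 6))     # 1.0   -> same meaning
print(round(euclidean(doc_a, doc_b), 3))  # 2.449 -> looks "different"
```

Cosine similarity correctly ignores the doubled magnitude; Euclidean distance penalizes it even though the meaning is unchanged.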
When to Use Each Metric
Here's the decision tree:
Cosine similarity:
Text similarity, document comparison, semantic search
When document length varies
When you care about meaning, not scale
Default choice for RAG systems
Dot product:
Recommendation systems
Collaborative filtering
When magnitude represents importance (e.g., user activity levels)
When your embedding model was trained with dot product loss
Euclidean distance:
Clustering
Anomaly detection
When absolute differences in feature values matter
Count-based features and spatial data
For 90% of text-based AI applications, cosine similarity is the answer.
How Embeddings Are Created
You don't usually train embedding models from scratch. You use pre-trained ones. But here's the general process:
Choose or train an embedding model: Pick a model suited for your data (Word2Vec, BERT, GloVe for text; VGG, ResNet for images)
Prepare your data: Format it for the model (tokenize text, resize images, etc.)
Load or train the model: Use pre-trained weights or train on your data
Generate embeddings: Input your data, get back vectors
Integrate into your application: Use embeddings for similarity search, clustering, recommendations, etc.
The key idea: embeddings learn by co-occurrence. If "king" and "queen" appear in similar contexts millions of times during training, their vectors end up close together. That's how the model learns semantic relationships.
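To make the co-occurrence idea concrete, here's a deliberately tiny count-based sketch. This is not how modern neural embedding models are trained, but it shows why words that share contexts end up with similar vectors:

```python
import math

# Toy corpus: "king" and "queen" appear in the same contexts; "pizza" doesn't.
sentences = [
    "the king sat on the throne",
    "the queen sat on the throne",
    "the king ruled the kingdom",
    "the queen ruled the kingdom",
    "i ate a pizza for dinner",
    "she ate a pizza for lunch",
]

context_words = ["throne", "ruled", "kingdom", "ate", "pizza"]

def count_vector(word):
    """Count how often `word` shares a sentence with each context word."""
    return [
        sum(1 for s in sentences if word in s.split() and ctx in s.split())
        for ctx in context_words
    ]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

king, queen, pizza = count_vector("king"), count_vector("queen"), count_vector("pizza")
print(king)                         # [1, 1, 1, 0, 0]
print(round(cosine(king, queen), 6))  # 1.0 -> identical contexts
print(round(cosine(king, pizza), 6))  # 0.0 -> no shared contexts
```

Real models replace these sparse counts with dense learned vectors, but the principle survives: shared context pulls vectors together.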
Real-World Example: Semantic Search in Action
Let's say you're building a support chatbot. A user asks:
"How do I recover my account?"
Your knowledge base has these documents:
"Account Recovery Procedures"
"Password Reset Instructions"
"Billing and Invoicing Guide"
With keyword search:
Looks for "recover" and "account"
Misses documents 1 and 2 (different words)
Returns nothing useful
With embeddings + cosine similarity:
Embed the query: "How do I recover my account?" → vector Q
Embed all documents → vectors D1, D2, D3
Calculate cosine similarity:
cos(Q, D1) = 0.82 ← High! "Recovery" captures the intent
cos(Q, D2) = 0.79 ← High! "Reset" is semantically close to "recover"
cos(Q, D3) = 0.23 ← Low, unrelated
Return documents 1 and 2
This works because: The embeddings learned that "recover," "reset," "restore," and "regain access" are semantically related, even though they're different words.
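Putting it together, here's a minimal ranking sketch. The vectors below are hand-made stand-ins (so the scores are the toy vectors', not a model's); a real system would get the query and document vectors from an embedding model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hand-made stand-in vectors; a real system would call an embedding model here.
query = [0.9, 0.8, 0.1]  # "How do I recover my account?"
docs = {
    "Account Recovery Procedures": [0.95, 0.75, 0.05],
    "Password Reset Instructions": [0.7, 0.9, 0.2],
    "Billing and Invoicing Guide": [0.1, 0.2, 0.95],
}

# Rank documents by cosine similarity to the query, highest first
ranked = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
for title, vec in ranked:
    print(f"{cosine(query, vec):.2f}  {title}")
```

The two account-related documents rank at the top and the billing guide falls to the bottom, mirroring the scores in the walkthrough above.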
Common Mistakes
Mistake 1: Using the wrong similarity metric
Don't use Euclidean distance for text just because it sounds familiar. Cosine similarity is almost always the right choice.
Mistake 2: Thinking embeddings are reversible
You cannot convert an embedding back into the original text. Embeddings are lossy representations. They preserve semantic meaning, not exact wording.
Mistake 3: Ignoring the magnitude vs direction distinction
If you're comparing text and magnitude keeps throwing off your results, switch to cosine similarity. If you're building recommendations and ignoring magnitude loses important information, use dot product.
Mistake 4: Assuming "similar" means 0.9+ scores
Real-world diverse content typically scores 0.4-0.6 for within-topic similarity. Only near-paraphrases hit 0.7-0.9. Unrelated content scores -0.1 to 0.2. Adjust your expectations.
Things to Ponder
Take a moment to think through these. They're designed to check if the core ideas stuck, and you'll find the answers in what we covered above.
Two documents: "The sky is blue" and "The sky is blue. The sky is blue." If you embed both and measure similarity, which metric will treat them as identical? Which will treat them as different? Why?
You're building a music recommendation system. User A listened to Song X 100 times. User B listened to it 10 times. Both users love the same genre. Should you use cosine similarity or dot product to compare them? What signal would you lose with the wrong choice?
A legal document has the sentence "grounds for eviction pursuant to lease violation." A user searches "can my landlord kick me out?" Using cosine similarity, would you expect a high or low score? What's missing that would improve the match?
You embed 1 million documents and store them in a vector database. Each embedding has 1536 dimensions (floats). Roughly how much storage do you need? What if you switch to 384 dimensions?
Two embeddings: [0.5, 0.5] and [0.7, 0.7]. They point in the exact same direction but have different magnitudes. What will their cosine similarity be? What will their Euclidean distance be?
Key Takeaways
Embeddings are GPS coordinates for meaning. They convert text, images, and other objects into vectors that capture semantic relationships.
Vectors have two properties: direction (semantic meaning) and magnitude (scale). For text, direction is what matters.
Cosine similarity measures direction only, making it ideal for text. Dot product considers magnitude too, useful for recommendations. Euclidean distance measures straight-line distance, best for clustering.
Use cosine similarity for semantic search and RAG systems. It's the standard for a reason.
Real-world similarity scores are lower than you'd expect: 0.4-0.6 is normal for related content, 0.7+ is for near-duplicates.
Embeddings can't be reversed into original text, but they preserve semantic intent. You can infer what something is about, not what it said word-for-word.
Think of embeddings as the translation layer between human meaning and machine math. Get this right, and your AI stops being a fancy keyword matcher and starts actually understanding what users want.
Want to discuss this further or have questions? Hit me up on LinkedIn.





