
Why CQRS Was Conceived: When Write-Optimized Databases Are Asked to Read

What happens when you ask a sprinter to run a marathon while juggling spreadsheets?


Most OLTP systems are designed to do one thing well: process fast, transactional writes with consistency.
And they’re brilliant at it.

But as systems scale, someone always says:

“Let’s run this report from prod — just once.”
And then it begins.


What Write-Optimized Systems Are Good At

Before we discuss how they break, let’s be clear on what these systems were actually built for.

Write-optimized systems (typically OLTP databases) are designed for:

  • Frequent, high-speed inserts and updates

  • Small, row-based transactions

  • Strong consistency and isolation guarantees

  • Quick point lookups via indexes

They’re used for:

  • User signups

  • Order placements

  • Session updates

  • Inventory tracking

  • Payment writes

Their architecture is tuned for:

  • Row-oriented storage (great for writing and fetching individual records)

  • Strict durability (ACID)

  • Indexes optimized for key-based access

  • Fast commit paths, often with write-ahead logs (WAL)

📌 Examples: PostgreSQL, MySQL, Oracle, MongoDB (in transactional mode), DynamoDB

These systems are the backbone of real-time apps. But that optimization comes with tradeoffs. The moment you start asking them to behave like an analytics engine — things get uncomfortable.
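That fast, key-based access path is easy to see in miniature. Here's a hedged sketch using Python's built-in sqlite3 as a stand-in for any OLTP store (the table and index names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,     -- fast key-based access path
    user_id INTEGER NOT NULL,
    payload TEXT NOT NULL)""")
conn.execute("CREATE INDEX idx_sessions_user ON sessions(user_id)")
conn.executemany("INSERT INTO sessions (user_id, payload) VALUES (?, ?)",
                 [(i % 500, "blob") for i in range(10_000)])
conn.commit()

# A point lookup: the planner answers through the index,
# touching a handful of rows out of 10,000.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM sessions WHERE user_id = 42"
).fetchall()
print([row[-1] for row in plan])  # typically: SEARCH ... USING INDEX idx_sessions_user ...
```

This is the shape of work these engines love: one index probe, a few rows, done in microseconds.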


A Quick Analogy That Actually Works

If OLAP systems are scholars trained to analyze mountains of data, OLTP systems are cashiers — optimized for rapid fire, transactional speed.

OLTP systems are built for thousands of tiny questions, not one massive one.
Ask them to “give me average revenue per user over the last 12 months across 6 dimensions,” and they panic.


What Goes Wrong When You Add Heavy Reads

Everything’s fine… until it isn’t.

Your write-optimized database is humming along — processing transactions, updating rows, handling concurrency. Then someone adds:

  • a dashboard

  • a complex join

  • a BI tool like Tableau

  • or a weekly report that crunches half the tables

And suddenly, things start to crack. But not all at once. These failures creep in silently.

Let’s break it down.


1. Joins and Aggregations Get Slower and Slower

OLTP systems can handle joins — that’s not the problem.
The problem is when you start doing large joins across multiple tables, with filters, group-bys, or date ranges spanning months.

  • Query planner starts picking suboptimal paths

  • Indexes become less useful

  • Temp tables overflow memory

  • Disk-based sorts kick in

📌 Example: A team tries to compute total order value per user from 6 normalized tables in Postgres. The query works — but now takes 28 seconds and locks rows under the hood.
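You can watch the access pattern flip from indexed search to full scan with a toy schema in SQLite (an illustration only; the 6-table Postgres case above behaves analogously, just at far greater cost):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL, created_at TEXT);
    CREATE INDEX idx_orders_user ON orders(user_id);
""")
conn.executemany("INSERT INTO users (region) VALUES (?)",
                 [("eu" if i % 2 else "us",) for i in range(1_000)])
conn.executemany(
    "INSERT INTO orders (user_id, total, created_at) VALUES (?, ?, ?)",
    [(i % 1_000 + 1, i * 0.5, "2024-01-01") for i in range(20_000)])

# The OLTP sweet spot: a point lookup, answered via the index (SEARCH).
point = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 42").fetchall()

# The analytical shape: join + aggregate over everything (full SCAN,
# often plus a temp b-tree for the GROUP BY -- the disk-based sort above).
rollup = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT u.region, AVG(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.region
""").fetchall()
print([row[-1] for row in point])
print([row[-1] for row in rollup])
```

Same database, same tables: one query shape is an index probe, the other forces a scan of everything it touches.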


2. Long-Running Queries Block Fast Transactions

Transactional DBs need to lock rows to guarantee consistency.
A long-running analytical query might:

  • trigger sequential scans

  • hold row or page locks for too long

  • delay short-lived inserts or updates

Now your login API is waiting on a BI tool’s export query.
Your app isn’t slow because writes are heavy — it’s slow because reads are greedy.

📌 Example: A SELECT with multiple joins runs for 45s in MySQL. Meanwhile, users trying to update their profiles get stuck waiting on locks from that read.


3. Buffer Pool Gets Trashed

OLTP databases use memory buffers to keep hot rows cached.
Heavy analytical reads:

  • Evict working sets

  • Blow past cache thresholds

  • Cause cold lookups for actual user requests

You’re effectively paying for read traffic with your write performance.

📌 Example: A product analytics dashboard queries 12 months of sales data in DynamoDB. Hot partitions get flooded, read capacity throttles, and write throughput tanks.
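A tiny LRU simulation shows how a single scan wipes out the hot working set (the `BufferPool` class here is a made-up stand-in for a real buffer pool, not any engine's actual implementation):

```python
from collections import OrderedDict

class BufferPool:
    """Tiny LRU cache standing in for a database buffer pool (toy model)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # hit: keep the page hot
            return
        self.misses += 1                     # miss: fetch from "disk"
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict least recently used

pool = BufferPool(capacity=100)

# Hot OLTP working set: the same 50 pages, read over and over -> all hits.
for _ in range(10):
    for page in range(50):
        pool.read(page)
print("misses before scan:", pool.misses)    # -> 50 (cold start only)

# One analytical scan touches 1,000 distinct pages and evicts everything.
for page in range(1_000, 2_000):
    pool.read(page)

pool.misses = 0
for page in range(50):
    pool.read(page)                          # the "fast" lookups are cold again
print("hot-set misses after scan:", pool.misses)  # -> 50
```

One pass of the scan, and every previously cached lookup goes back to disk.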


4. Index Bloat and Contention

To serve both reads and writes, teams often keep adding indexes:
“Oh, this query is slow? Let’s index created_at, status, and region.”

You keep doing this — until:

  • Writes slow down because every insert now updates 5 indexes

  • Index rebuilds take hours

  • Query plans conflict over which index to use

📌 Example: A Postgres table ends up with 9 indexes. Inserting 100K rows now takes 3× longer than before — and most queries still don’t use the right index due to planner confusion.
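The write penalty is easy to measure: time the same bulk insert with zero secondary indexes and with eight (a rough benchmark in SQLite; absolute numbers will vary by machine, the ratio is the point):

```python
import sqlite3
import time

def bulk_insert_seconds(index_count: int) -> float:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE orders (
        id INTEGER PRIMARY KEY, created_at TEXT,
        status TEXT, region TEXT, total REAL)""")
    for i in range(index_count):
        # one index per "slow query of the week"
        conn.execute(
            f"CREATE INDEX idx_{i} ON orders(created_at, status, region)")
    rows = [("2024-01-01", "paid", "eu", 9.99)] * 50_000
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO orders (created_at, status, region, total) "
        "VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return time.perf_counter() - start

lean = bulk_insert_seconds(0)
bloated = bulk_insert_seconds(8)
print(f"0 indexes: {lean:.3f}s  |  8 indexes: {bloated:.3f}s")
# every insert in the second run maintains 8 extra b-trees, so it is slower
```

Every index is a tax on every write, whether or not any query ever uses it.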


5. Replication Lag Creeps In

To “offload” reads, teams point BI tools to replicas.
But read load causes:

  • I/O spikes on replicas

  • Slower WAL replay

  • Delays in syncing new writes

Now dashboards are running on stale data, and no one knows it until reports show the wrong totals.

📌 Example: A read replica in MySQL lags 10 minutes behind during end-of-day reporting. Users start seeing incomplete financial data on the live dashboard.
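You can at least make the lag visible instead of discovering it through wrong totals. A hedged sketch, not tied to any specific database: assume the primary writes a heartbeat timestamp every second and the replica applies it with the rest of the stream (the names and SLA here are illustrative):

```python
from datetime import datetime, timedelta, timezone

LAG_SLA = timedelta(seconds=30)  # illustrative freshness budget

def replication_lag(primary_heartbeat: datetime,
                    replica_heartbeat: datetime) -> timedelta:
    """How far the replica's last applied heartbeat trails the primary's."""
    return primary_heartbeat - replica_heartbeat

def route_read(lag: timedelta) -> str:
    """Send reports to the replica only while it is within the SLA."""
    return "replica" if lag <= LAG_SLA else "primary"

now = datetime.now(timezone.utc)
lag = replication_lag(now, now - timedelta(minutes=10))  # the scenario above
print(lag, "->", route_read(lag))  # 0:10:00 -> primary
```

Note the trap: routing lagged reads back to the primary restores correctness by re-creating the original load problem.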


Real Scenarios That Trigger This Trap

No one sets out to misuse their transactional database.
It usually starts with a small ask.

Then one day, the system is slow, replicas are drifting, queries are locking up — and everyone’s confused.

Let’s look at how this unfolds in the real world.


1. “Let’s Just Run This Report From Prod”

It starts simple. Someone in ops or finance asks:

“Can we just get last month’s revenue by region?”

You run a query directly on prod. It works.
Then they ask for daily breakdowns, customer segments, product lines.

Soon:

  • The query runs for 12 minutes

  • It causes row locks on the orders table

  • Your insert latency doubles

  • Everyone blames the app team

📌 Seen in: E-commerce platforms running Postgres/MySQL for order tracking and trying to layer analytics on the same tables.


2. BI Tool Pointed at Primary DB

Someone connects Tableau, Metabase, or PowerBI to your main database.
It’s just for “exploration.” Then they publish a dashboard that:

  • auto-refreshes every 15 minutes

  • scans millions of rows

  • triggers unindexed joins

Now your OLTP box is getting shredded by analytics traffic.

📌 Seen in: SaaS startups exposing internal analytics directly from their core transactional DB. Metrics start breaking, and infra costs jump overnight.


3. Devs Run Weekend Batch Jobs on Prod

A backend engineer runs a data export job over the weekend.
It’s meant to be a one-time batch — joining users, activity logs, and subscriptions.

Monday morning:

  • The replica is 2 hours behind

  • Auto-scaling was triggered unnecessarily

  • Marketing notices inconsistencies in campaign data

📌 Seen in: Subscription billing systems where heavy reports affect invoicing accuracy or quota checks.


4. Real-Time Dashboards Run on OLTP

Product says:

“We want a live dashboard showing revenue, sessions, and user retention — in real time.”

You build it on top of the existing system.
It works for the first 10 users. Then:

  • Query latency rises with traffic

  • Write performance starts jittering

  • API endpoints hit cold cache frequently

  • You patch with Redis, but it’s already messy

📌 Seen in: Ad-tech, gaming, or media platforms that require real-time analytics layered over the same tables used for transactions.


5. OLTP Abuse Disguised as “One-Time Fixes”

You need to fix missing data or backfill a value.
Someone writes a script that scans a million rows, updates each one, and runs it on prod.

The script runs fine in staging.
On prod, it:

  • Blocks the row-level lock queue

  • Fills the write-ahead log

  • Brings down the replica

  • Triggers a full recovery job

📌 Seen in: Every company, at least once.
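A safer shape for that script is chunked commits, so no single transaction holds locks or write-ahead-log space for a million rows at once (a sketch in SQLite with invented table names; chunking shortens lock hold times, it does not make prod a batch environment):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT)")
conn.executemany("INSERT INTO users (id, plan) VALUES (?, NULL)",
                 [(i,) for i in range(1, 100_001)])
conn.commit()

BATCH = 5_000
while True:
    with conn:  # each chunk commits on its own -> locks are held briefly
        cur = conn.execute(
            "UPDATE users SET plan = 'free' WHERE id IN "
            "(SELECT id FROM users WHERE plan IS NULL LIMIT ?)", (BATCH,))
    if cur.rowcount == 0:
        break

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE plan IS NULL").fetchone()[0]
print("rows left to backfill:", remaining)  # -> 0
```

Twenty short transactions instead of one giant one: readers and writers get a chance to interleave between chunks.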


Pattern Summary

Trigger → What Happens

  • Reports on prod → Long-running reads lock up writes

  • BI tools on primary DB → Uncontrolled scans eat CPU and cache

  • Batch jobs on live tables → Replica lag, I/O spikes, inconsistent state

  • Dashboards on OLTP → Real-time load disrupts user-facing transactions

  • Manual scripts & fixes → Accidental downtime, missed writes, WAL bloat

I Know What You’re Thinking: “Can’t We Just Hack Around This?”

This is where smart teams get trapped.

You’ve seen the reads clogging your OLTP system. You know something’s off. But before touching architecture, you try to patch it.

Fair. That’s what engineers do.

But most of these “fixes” don’t solve the problem — they just delay the explosion.

Let’s go through the most common ones.


“Can’t we just add more replicas?”

At first glance, it makes sense: route reads to replicas, keep writes isolated.

What really happens:

  • BI and dashboard traffic hit the replica hard

  • Replication starts lagging

  • Now your reads are based on stale data

  • You introduce confusion: “Which replica is accurate?”

  • And if any write-dependent logic leaks into read queries — chaos

📌 Seen in: Teams running Postgres/MySQL with read replicas for dashboards, only to have lag exceed SLA during peak hours.


“We’ll just run reports at night!”

Sure, if your business doesn’t need real-time insights. But...

What really happens:

  • You delay insights that product, ops, or finance need now

  • People start running them anyway during the day

  • Your system becomes “read-heavy by surprise” — again

📌 Seen in: Subscription and B2B platforms where night-time aggregation broke because users still used the system after midnight.


“Let’s add a caching layer!”

Caching works great for repeated queries. But the moment you introduce:

  • Ad hoc filters

  • Joins across tables

  • Time-based rollups

  • Real-time updates

…your cache becomes stale, fragmented, or useless.

Also: caching isn’t free. It adds complexity, invalidation logic, and risk.

📌 Seen in: Teams using Redis or Memcached to serve dashboards, only to spend more time fixing stale cache bugs than solving real problems.
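The staleness failure mode takes about twenty lines to reproduce with a toy TTL cache (a deliberately minimal stand-in for Redis or Memcached, not their actual APIs):

```python
import time

class TTLCache:
    """Minimal cache with per-entry time-to-live (toy stand-in)."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.store.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if time.monotonic() > expires_at:
            del self.store[key]  # expired: behave like a miss
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

db = {"revenue": 100}            # the "primary database"
cache = TTLCache(ttl_seconds=60)

def revenue():
    cached = cache.get("revenue")
    if cached is not None:
        return cached            # served from cache, however old
    value = db["revenue"]
    cache.put("revenue", value)
    return value

print(revenue())        # -> 100 (miss, fills the cache)
db["revenue"] = 250     # a write lands on the primary...
print(revenue())        # -> 100 (...but the dashboard shows stale data)
```

Until the TTL expires or someone writes invalidation logic for every write path, the dashboard confidently reports a number the database no longer contains.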


“Can’t we just scale vertically?”

Yes. For a while.

Add more CPU. Add more RAM. Bump your RDS tier.

But this just buys time. Your joins are still locking rows. Your queries still fight for IO. The system is still built to serve transactions — not scans.

📌 Seen in: Teams upgrading Postgres from t3.medium → r6g.2xlarge → r6g.8xlarge… and still ending up with 2-minute queries and cold caches.


And finally: “Let’s just add more indexes.”

This one’s a classic. Query is slow? Add an index.

You do this enough times, and now:

  • Writes slow down

  • Index rebuilds take hours

  • Insert latency jumps

  • Query planner gets confused

  • Your table has 11 indexes, and no one knows which ones matter

📌 Seen in: Any long-lived OLTP system that’s been touched by multiple devs over the years.


The Common Denominator

Every fix above tries to make one system do two jobs.
But no amount of hardware, caching, or replicas will change the fact that:

A system optimized for fast, safe writes is fundamentally not designed to support complex, high-volume reads.

It’s not about tuning anymore. It’s about role clarity.


Closing Argument: Why Write-Optimized Databases Can’t Handle Reads

Write-optimized systems are built for speed, consistency, and isolation.
They excel at handling small, frequent, transactional updates with predictable structure and low latency.

But the moment you introduce read-heavy workloads — analytical joins, dashboards, aggregations, long-running scans — they start to break.

And not loudly. Silently.

You don’t get failures.
You get:

  • Contention between reads and writes

  • Memory churn from unbounded scans

  • Index bloat from trying to support both paths

  • Replication lag you can't fix

  • Latency spikes that show up only under pressure

You end up with a system that still works, but no longer works predictably.

It was never designed to answer complex questions — only to record decisions safely and quickly.

That’s the real lesson here.

If you keep asking a write-optimized database to behave like an analytics engine, you’ll either compromise the integrity of your transactions — or bury your reads under layers of caching, replicas, and retries.


📌 In the next post, we’ll stop patching around this mismatch — and start designing for it.

👉 Coming soon: “Why One System Cannot Serve Two Masters” → The architectural split that made CQRS inevitable — not as a pattern, but as a solution to workload conflict.

Why CQRS Was Conceived

Part 6 of 7

Not another “what is CQRS” series. This one shows why it became necessary — through real-world failures, overloaded systems, and architectural pressure that forced teams to split reads and writes just to keep systems alive.

Up next

Why CQRS Was Conceived: When Read-Optimized Databases Are Asked to Write

These systems are great at answering questions, but terrible at handling orders.