Why CQRS Was Conceived: When Write-Optimized Databases Are Asked to Read
What happens when you ask a sprinter to run a marathon while juggling spreadsheets?

Most OLTP systems are designed to do one thing well: process fast, transactional writes with consistency.
And they’re brilliant at it.
But as systems scale, someone always says:
“Let’s run this report from prod — just once.”
And then it begins.
What Write-Optimized Systems Are Good At
Before we discuss how they break, let’s be clear on what these systems were actually built for.
Write-optimized systems (typically OLTP databases) are designed for:
Frequent, high-speed inserts and updates
Small, row-based transactions
Strong consistency and isolation guarantees
Quick point lookups via indexes
They’re used for:
User signups
Order placements
Session updates
Inventory tracking
Payment writes
Their architecture is tuned for:
Row-oriented storage (great for writing and fetching individual records)
Strict durability (ACID)
Indexes optimized for key-based access
Fast commit paths, often with write-ahead logs (WAL)
📌 Examples: PostgreSQL, MySQL, Oracle, MongoDB (in transactional mode), DynamoDB
These systems are the backbone of real-time apps. But that optimization comes with tradeoffs. The moment you start asking them to behave like an analytics engine — things get uncomfortable.
A Quick Analogy That Actually Works
If OLAP systems are scholars trained to analyze mountains of data, OLTP systems are cashiers — optimized for rapid-fire transactional speed.
OLTP systems are built for thousands of tiny questions, not one massive one.
Ask them to “give me average revenue per user over the last 12 months across 6 dimensions,” and they panic.
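The contrast is easy to see in miniature. Here's a sketch using Python's sqlite3 as a stand-in for any row-store (the orders table and its columns are invented for illustration): the point lookup resolves through the primary-key index, while the aggregate has no choice but to visit every row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, amount REAL)")
con.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                [(i % 100, i * 0.5) for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in their 4th column
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# OLTP-style point lookup: resolved via the primary-key index
oltp = plan("SELECT amount FROM orders WHERE id = 42")

# OLAP-style aggregate: no index helps, every row gets visited
olap = plan("SELECT user_id, AVG(amount) FROM orders GROUP BY user_id")

print(oltp)  # a SEARCH using the integer primary key
print(olap)  # a full SCAN of the table
```

The same divide shows up in every engine's plan output, just under different names (Index Scan vs. Seq Scan in Postgres, for instance).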
What Goes Wrong When You Add Heavy Reads
Everything’s fine… until it isn’t.
Your write-optimized database is humming along — processing transactions, updating rows, handling concurrency. Then someone adds:
a dashboard
a complex join
a BI tool like Tableau
or a weekly report that crunches half the tables
And suddenly, things start to crack. But not all at once. These failures creep in silently.
Let’s break it down.
1. Joins and Aggregations Get Slower and Slower
OLTP systems can handle joins — that’s not the problem.
The problem is when you start doing large joins across multiple tables, with filters, group-bys, or date ranges spanning months.
Query planner starts picking suboptimal paths
Indexes become less useful
Temp tables overflow memory
Disk-based sorts kick in
📌 Example: A team tries to compute total order value per user from 6 normalized tables in Postgres. The query works — but now takes 28 seconds and locks rows under the hood.
2. Long-Running Queries Block Fast Transactions
Transactional DBs need to lock rows to guarantee consistency.
A long-running analytical query might:
trigger sequential scans
hold row or page locks for too long
delay short-lived inserts or updates
Now your login API is waiting on a BI tool’s export query.
Your app isn’t slow because writes are heavy — it’s slow because reads are greedy.
📌 Example: A SELECT with multiple joins runs for 45s in MySQL. Meanwhile, users trying to update their profiles get stuck waiting on locks from that read.
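You can reproduce the effect on your laptop. SQLite's locking is coarser than MySQL's row locks (it locks the whole database file), but the mechanism is the same shape: a reader holding its transaction open keeps the writer from committing. Everything below is a contrived setup for illustration.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO users VALUES (1, 'ada')")
con.commit()

# the "BI tool": starts a read transaction and keeps it open,
# holding a shared lock on the database
reader = sqlite3.connect(path, timeout=0)
reader.execute("BEGIN")
reader.execute("SELECT * FROM users").fetchall()

# the "app": tries to commit a profile update; timeout=0 means
# fail immediately instead of queueing behind the reader
writer = sqlite3.connect(path, timeout=0)
try:
    writer.execute("UPDATE users SET name = 'grace' WHERE id = 1")
    writer.commit()
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

print(blocked)  # True: the write could not get past the open read
```

In a production Postgres or MySQL setup the writer would usually wait rather than error, which is arguably worse — the latency just silently stacks up.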
3. Buffer Pool Gets Trashed
OLTP databases use memory buffers to keep hot rows cached.
Heavy analytical reads:
Evict working sets
Blow past cache thresholds
Cause cold lookups for actual user requests
You’re effectively letting read traffic erase your write performance.
📌 Example: A product analytics dashboard queries 12 months of sales data in DynamoDB. Hot partitions get flooded, read capacity throttles, and write throughput tanks.
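A toy LRU model makes the mechanism concrete (this is a simplification — real buffer pools have scan-resistance tricks, but the failure mode is the same). Fifty hot pages fit comfortably in a 100-page cache, until one analytical scan flushes them all out:

```python
from collections import OrderedDict

class BufferPool:
    """Toy LRU page cache — a stand-in for a real buffer pool."""
    def __init__(self, capacity):
        self.capacity, self.pages = capacity, OrderedDict()
        self.hits, self.misses = 0, 0

    def read(self, page):
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)      # mark most recently used
        else:
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used

pool = BufferPool(capacity=100)
hot = range(50)                    # the OLTP working set: 50 hot pages

for _ in range(10):                # steady state: everything is cached
    for p in hot:
        pool.read(p)
before = pool.misses               # only the initial 50 cold faults

for p in range(1000, 2000):        # one analytical scan: 1000 cold pages
    pool.read(p)

for p in hot:                      # the working set is gone — all misses
    pool.read(p)

print(pool.misses - before)        # 1050: the scan itself, plus 50 re-faults
```

One scan, and every "fast" user request behind it pays a cold read.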
4. Index Bloat and Contention
To serve both reads and writes, teams often keep adding indexes:
“Oh, this query is slow? Let’s index created_at, status, and region.”
You keep doing this — until:
Writes slow down because every insert now updates 5 indexes
Index rebuilds take hours
Query plans conflict over which index to use
📌 Example: A Postgres table ends up with 9 indexes. Inserting 100K rows now takes 3× longer than before — and most queries still don’t use the right index due to planner confusion.
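The write amplification is measurable even in a toy setup. Here sqlite3 stands in for Postgres (table and index definitions invented for illustration): inserting the same rows into a table carrying nine indexes writes several times as many pages, because every insert maintains every B-tree.

```python
import sqlite3

def pages_after_insert(n_indexes, rows=5000):
    """Insert `rows` rows, then report how many pages the database occupies."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, a INT, b INT, c INT)")
    cols = ["a", "b", "c", "a,b", "b,c", "a,c", "a,b,c", "c,b", "b,a"]
    for i in range(n_indexes):
        con.execute(f"CREATE INDEX idx{i} ON orders ({cols[i]})")
    con.executemany("INSERT INTO orders (a, b, c) VALUES (?, ?, ?)",
                    [(i, i % 97, i % 11) for i in range(rows)])
    return con.execute("PRAGMA page_count").fetchone()[0]

plain = pages_after_insert(0)
heavy = pages_after_insert(9)   # every insert now maintains 9 extra B-trees
print(plain, heavy)             # the indexed table is several times larger
```

More pages written per insert means more I/O, more WAL, and slower commits — before counting the planner confusion the extra indexes cause.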
5. Replication Lag Creeps In
To “offload” reads, teams point BI tools to replicas.
But read load causes:
I/O spikes on replicas
Slower WAL replay
Delays in syncing new writes
Now dashboards are running on stale data, and no one knows it until reports show the wrong totals.
📌 Example: A read replica in MySQL lags 10 minutes behind during end-of-day reporting. Users start seeing incomplete financial data on the live dashboard.
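A back-of-the-envelope model shows why lag appears suddenly rather than gradually. Assume (numbers invented for illustration) the replica can replay 120 WAL units per tick while the primary produces 100 — fine, until report traffic steals replica I/O and replay capacity drops below the write rate:

```python
def replication_lag(ticks, writes_per_tick, replay_capacity, read_load):
    """Toy model: replica I/O is shared between WAL replay and read queries."""
    backlog, history = 0, []
    for t in range(ticks):
        backlog += writes_per_tick                       # new WAL arrives
        replay = max(0, replay_capacity - read_load(t))  # reads steal I/O
        backlog = max(0, backlog - replay)               # replica catches up
        history.append(backlog)
    return history

# quiet day: replay (120) outpaces writes (100), lag stays at zero
quiet = replication_lag(60, 100, 120, lambda t: 0)

# end-of-day reporting kicks in at tick 30 and consumes 80 units of I/O,
# leaving only 40 for replay — the backlog now grows by 60 every tick
reporting = replication_lag(60, 100, 120, lambda t: 80 if t >= 30 else 0)

print(quiet[-1], reporting[-1])  # 0 vs. a backlog that keeps climbing
```

The lag isn't proportional to the read load — it's a cliff: the moment effective replay capacity dips below the write rate, backlog grows without bound.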
Real Scenarios That Trigger This Trap
No one sets out to misuse their transactional database.
It usually starts with a small ask.
Then one day, the system is slow, replicas are drifting, queries are locking up — and everyone’s confused.
Let’s look at how this unfolds in the real world.
1. “Let’s Just Run This Report From Prod”
It starts simple. Someone in ops or finance asks:
“Can we just get last month’s revenue by region?”
You run a query directly on prod. It works.
Then they ask for daily breakdowns, customer segments, product lines.
Soon:
The query runs for 12 minutes
It causes row locks on the orders table
Your insert latency doubles
Everyone blames the app team
📌 Seen in: E-commerce platforms running Postgres/MySQL for order tracking and trying to layer analytics on the same tables.
2. BI Tool Pointed at Primary DB
Someone connects Tableau, Metabase, or PowerBI to your main database.
It’s just for “exploration.” Then they publish a dashboard that:
auto-refreshes every 15 minutes
scans millions of rows
triggers unindexed joins
Now your OLTP box is getting shredded by analytics traffic.
📌 Seen in: SaaS startups exposing internal analytics directly from their core transactional DB. Metrics start breaking, and infra costs jump overnight.
3. Devs Run Weekend Batch Jobs on Prod
A backend engineer runs a data export job over the weekend.
It’s meant to be a one-time batch — joining users, activity logs, and subscriptions.
Monday morning:
The replica is 2 hours behind
Auto-scaling was triggered unnecessarily
Marketing notices inconsistencies in campaign data
📌 Seen in: Subscription billing systems where heavy reports affect invoicing accuracy or quota checks.
4. Real-Time Dashboards Run on OLTP
Product says:
“We want a live dashboard showing revenue, sessions, and user retention — in real time.”
You build it on top of the existing system.
It works for the first 10 users. Then:
Query latency rises with traffic
Write performance starts jittering
API endpoints hit cold cache frequently
You patch with Redis, but it’s already messy
📌 Seen in: Ad-tech, gaming, or media platforms that require real-time analytics layered over the same tables used for transactions.
5. OLTP Abuse Disguised as “One-Time Fixes”
You need to fix missing data or backfill a value.
Someone writes a script that scans a million rows, updates each one, and runs it on prod.
The script runs fine in staging.
On prod, it:
Blocks the row-level lock queue
Fills the write-ahead log
Brings down the replica
Triggers a full recovery job
📌 Seen in: Every company, at least once.
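The shape of the problem — and the batching that sidesteps it — fits in a few lines. This sketch uses sqlite3 and an invented users table for illustration: the naive fix is one giant transaction that holds locks and fills the log for its entire run; the batched variant commits between chunks so locks release and the log can checkpoint.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT)")
con.executemany("INSERT INTO users (plan) VALUES (?)", [(None,)] * 100_000)

# The "one-time fix" as usually written: a single statement, meaning one
# giant transaction that touches every row, holds locks for its full
# duration, and writes the entire change set to the WAL/journal at once:
#
#   con.execute("UPDATE users SET plan = 'free' WHERE plan IS NULL")

# A gentler shape: update in small batches, committing between chunks
BATCH = 10_000
while True:
    cur = con.execute(
        "UPDATE users SET plan = 'free' "
        "WHERE id IN (SELECT id FROM users WHERE plan IS NULL LIMIT ?)",
        (BATCH,))
    con.commit()          # locks released, log checkpointable
    if cur.rowcount == 0:
        break

remaining = con.execute(
    "SELECT COUNT(*) FROM users WHERE plan IS NULL").fetchone()[0]
print(remaining)  # 0 — same end state, without one monster transaction
```

Batching doesn't make the backfill free — it just stops one script from monopolizing the lock queue and the log.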
Pattern Summary
| Trigger | What Happens |
| --- | --- |
| Reports on prod | Long-running reads lock up writes |
| BI tools on primary DB | Uncontrolled scans eat CPU and cache |
| Batch jobs on live tables | Replica lag, I/O spikes, inconsistent state |
| Dashboards on OLTP | Real-time load disrupts user-facing transactions |
| Manual scripts & fixes | Accidental downtime, missed writes, WAL bloat |
I Know What You’re Thinking: “Can’t We Just Hack Around This?”
This is where smart teams get trapped.
You’ve seen the reads clogging your OLTP system. You know something’s off. But before touching architecture, you try to patch it.
Fair. That’s what engineers do.
But most of these “fixes” don’t solve the problem — they just delay the explosion.
Let’s go through the most common ones.
“Can’t we just add more replicas?”
At first glance, it makes sense: route reads to replicas, keep writes isolated.
What really happens:
BI and dashboard traffic hit the replica hard
Replication starts lagging
Now your reads are based on stale data
You introduce confusion: “Which replica is accurate?”
And if any write-dependent logic leaks into read queries — chaos
📌 Seen in: Teams running Postgres/MySQL with read replicas for dashboards, only to have lag exceed SLA during peak hours.
“We’ll just run reports at night!”
Sure, if your business doesn’t need real-time insights. But...
What really happens:
You delay insights that product, ops, or finance need now
People start running them anyway during the day
Your system becomes “read-heavy by surprise” — again
📌 Seen in: Subscription and B2B platforms where night-time aggregation broke because users still used the system after midnight.
“Let’s add a caching layer!”
Caching works great for repeated queries. But the moment you introduce:
Ad hoc filters
Joins across tables
Time-based rollups
Real-time updates
…your cache becomes stale, fragmented, or useless.
Also: caching isn’t free. It adds complexity, invalidation logic, and risk.
📌 Seen in: Teams using Redis or Memcached to serve dashboards, only to spend more time fixing stale cache bugs than solving real problems.
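The staleness problem is inherent to the approach, not a bug in any particular cache. A minimal TTL cache (names and numbers invented for illustration; time is passed in explicitly to keep the example deterministic) shows the window in which the dashboard confidently serves the wrong answer:

```python
class TTLCache:
    """Minimal time-to-live cache — a stand-in for Redis/Memcached usage."""
    def __init__(self, ttl):
        self.ttl, self.store = ttl, {}

    def get(self, key, compute, now):
        entry = self.store.get(key)
        if entry and now - entry[1] < self.ttl:
            return entry[0]            # served from cache — may be stale
        value = compute()              # cache miss or expired: hit the DB
        self.store[key] = (value, now)
        return value

db_revenue = 1000                      # what the database currently says
cache = TTLCache(ttl=60)

first = cache.get("revenue", lambda: db_revenue, now=0)    # 1000, cached
db_revenue = 1500                                          # new orders land
stale = cache.get("revenue", lambda: db_revenue, now=30)   # still 1000 (!)
fresh = cache.get("revenue", lambda: db_revenue, now=90)   # TTL expired: 1500

print(first, stale, fresh)  # 1000 1000 1500
```

Shrink the TTL and the cache stops absorbing load; grow it and the staleness window widens. Invalidation logic for ad hoc filters and joins is where the real complexity lands.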
“Can’t we just scale vertically?”
Yes. For a while.
Add more CPU. Add more RAM. Bump your RDS tier.
But this just buys time. Your joins are still locking rows. Your queries still fight for IO. The system is still built to serve transactions — not scans.
📌 Seen in: Teams upgrading Postgres from t3.medium → r6g.2xlarge → r6g.8xlarge… and still ending up with 2-minute queries and cold caches.
And finally: “Let’s just add more indexes.”
This one’s a classic. Query is slow? Add an index.
You do this enough times, and now:
Writes slow down
Index rebuilds take hours
Insert latency jumps
Query planner gets confused
Your table has 11 indexes, and no one knows which ones matter
📌 Seen in: Any long-lived OLTP system that’s been touched by multiple devs over the years.
The Common Denominator
Every fix above tries to make one system do two jobs.
But no amount of hardware, caching, or replicas will change the fact that:
A system optimized for fast, safe writes is fundamentally not designed to support complex, high-volume reads.
It’s not about tuning anymore. It’s about role clarity.
Closing Argument: Why Write-Optimized Databases Can’t Handle Reads
Write-optimized systems are built for speed, consistency, and isolation.
They excel at handling small, frequent, transactional updates with predictable structure and low latency.
But the moment you introduce read-heavy workloads — analytical joins, dashboards, aggregations, long-running scans — they start to break.
And not loudly. Silently.
You don’t get hard failures.
You get:
Contention between reads and writes
Memory churn from unbounded scans
Index bloat from trying to support both paths
Replication lag you can't fix
Latency spikes that show up only under pressure
You end up with a system that still works, but no longer works predictably.
It was never designed to answer complex questions — only to record decisions safely and quickly.
That’s the real lesson here.
If you keep asking a write-optimized database to behave like an analytics engine, you’ll either compromise the integrity of your transactions — or bury your reads under layers of caching, replicas, and retries.
📌 In the next post, we’ll stop patching around this mismatch — and start designing for it.
👉 Coming soon: “Why One System Cannot Serve Two Masters” → The architectural split that made CQRS inevitable — not as a pattern, but as a solution to workload conflict.






