Scaling Call Log Search Without Slowing Down: Why We’re Moving Away from RDS

When you handle millions of phone calls every day, you build up a lot of data. At Lynes, we process massive volumes of call logs — and that number is growing fast. Our customers rely on being able to access this data instantly, whether it’s to follow up on a missed call, replay a conversation, or search through past history.

But even in the cloud, scale eventually starts to bite.

The Problem: IOPS, Cost, and Scale

Until recently, we were running our call log database on Amazon RDS, backed by PostgreSQL. RDS is reliable and easy to manage — but as our dataset grew, so did our pain points.

The bottleneck wasn’t CPU or memory. It was storage IOPS.

Every search that touched older data triggered cold reads from disk, pushing us up against our IOPS limit (around 15K). Scaling IOPS on EBS volumes is possible, but it’s also expensive. And since this backend cost isn’t something we can charge customers for directly, it was hard to justify. We still wanted blazing-fast search across older call logs, just without racking up costs that don’t scale.
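
One way to confirm that storage reads, rather than CPU or memory, are the limiting factor is to sample PostgreSQL's cumulative block counters twice and compare them. The sketch below assumes two readings of the blks_hit and blks_read columns from pg_stat_database have already been fetched; the class and function names are hypothetical, and the numbers in the usage example are made up for illustration, not measurements from our cluster. Note that blks_read counts blocks that missed shared buffers, so some of them may still be served by the OS page cache; treat the rate as an upper bound on device reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCounters:
    """One sample of the cumulative counters in pg_stat_database."""
    blks_hit: int   # block reads served from shared buffers
    blks_read: int  # block reads that missed shared buffers


def cache_hit_ratio(before: BlockCounters, after: BlockCounters) -> float:
    """Fraction of block reads served from shared buffers between two samples."""
    hits = after.blks_hit - before.blks_hit
    reads = after.blks_read - before.blks_read
    total = hits + reads
    # With no activity in the window there is nothing to miss.
    return hits / total if total else 1.0


def read_rate_per_second(before: BlockCounters, after: BlockCounters,
                         seconds: float) -> float:
    """Average number of buffer misses per second over the sampling window."""
    if seconds <= 0:
        raise ValueError("sampling window must be positive")
    return (after.blks_read - before.blks_read) / seconds


# Illustrative numbers only: two samples taken 60 seconds apart.
before = BlockCounters(blks_hit=1_000_000, blks_read=200_000)
after = BlockCounters(blks_hit=1_900_000, blks_read=1_100_000)
print(cache_hit_ratio(before, after))           # 0.5
print(read_rate_per_second(before, after, 60))  # 15000.0
```

A low hit ratio combined with a miss rate close to the volume's provisioned IOPS is the pattern described above: searches over older data keep falling through to disk.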

It was time for a different approach.

The Solution: Self-Managed PostgreSQL on Kubernetes

We moved our call log storage to a Kubernetes-based PostgreSQL cluster using fast NVMe-backed EC2 instances — and the difference is dramatic.

Compared to our previous RDS setup, this new cluster gives us:

~17x the IOPS (from ~15K to ~250K)
2x the CPU and memory per node
All at ~10% lower cost overall

To be clear: RDS is still a great default choice in most cases. It’s reliable, low-maintenance, and integrates well with AWS tooling. We chose to move away only because we were already running the rest of our infrastructure in Kubernetes, and our team has the operational maturity to manage it effectively. Manual tuning wasn’t easy, but for our scale and performance goals it was worth it.

Evolving Our Architecture — Simply

Previously on RDS, we had a basic primary-replica setup — but all reads and writes still went to the primary node. The replica existed purely for failover.

With the new setup, we’ve leveled up:

One primary node handles writes (and some reads)
Two replica nodes handle all search queries
Any replica can take over as primary in a failover scenario
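
The routing rule above can be sketched in a few lines. This is a minimal illustration, not our actual connection layer: the QueryRouter class, the node names, and the "write"/"search" labels are all hypothetical, and in a real cluster promotion is performed by the failover tooling rather than by application code.

```python
class QueryRouter:
    """Sends writes to the primary and spreads searches across replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin position for search queries

    def target_for(self, kind: str) -> str:
        """Return the node that should serve a query of the given kind."""
        if kind == "write":
            return self.primary
        if kind != "search":
            raise ValueError(f"unknown query kind: {kind!r}")
        if not self.replicas:
            # No replica left: the primary also serves reads.
            return self.primary
        node = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return node

    def promote(self, replica: str) -> None:
        """Record that a replica has taken over as primary after a failover."""
        if replica not in self.replicas:
            raise ValueError(f"{replica!r} is not a known replica")
        self.replicas.remove(replica)
        self.primary = replica


router = QueryRouter("pg-0", ["pg-1", "pg-2"])
print(router.target_for("write"))   # pg-0
print(router.target_for("search"))  # pg-1
print(router.target_for("search"))  # pg-2
router.promote("pg-1")
print(router.target_for("write"))   # pg-1
```

Keeping search traffic off the primary means a heavy historical query competes only with other searches, not with incoming writes.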

This setup is still simple, but now it’s optimized — both for resilience and for high-throughput search.

A Smoother Transition Than Expected

Originally, we planned a gradual migration away from RDS. But in practice, the transition went more smoothly than we had hoped. We simply switched over to the new cluster and, once everything was verified to be working, deleted the old RDS setup.

Sometimes the cleanest migrations are the best ones.

Building for Resilience

All new call log writes are automatically streamed and backed up to S3 via Firehose, with tagging and versioning. That means we have a reliable safety net, even during infrastructure changes.

And if PostgreSQL ever goes down temporarily, we can still serve recent logs from S3 for simple lookups, so customers keep access to critical data even in edge cases. This multi-layered approach gives us a near-zero RPO and multiple independent restoration paths, covering data both before and after it is ingested into PostgreSQL, which significantly increases the system’s resilience.
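
The fallback path can be pictured as a lookup that tries PostgreSQL first and only consults the S3 archive when the database is unreachable. The sketch below is an assumption about the shape of such a lookup, not a description of our production code: lookup_call_log, DatabaseUnavailable, and the two lookup callables are hypothetical names, and the stand-in functions in the usage example replace real database and S3 clients.

```python
from typing import Callable, Optional

# A lookup takes a call ID and returns the log record, or None if not found.
Lookup = Callable[[str], Optional[dict]]


class DatabaseUnavailable(Exception):
    """Raised by the PostgreSQL lookup when the cluster cannot be reached."""


def lookup_call_log(call_id: str, from_postgres: Lookup,
                    from_archive: Lookup) -> Optional[dict]:
    """Serve from PostgreSQL, falling back to the archive only on an outage."""
    try:
        return from_postgres(call_id)
    except DatabaseUnavailable:
        # Only simple lookups by ID are expected to work against the archive.
        return from_archive(call_id)


# Stand-ins for real clients, used to show the behavior during an outage.
def postgres_down(call_id: str) -> Optional[dict]:
    raise DatabaseUnavailable("connection refused")


def archive(call_id: str) -> Optional[dict]:
    return {"call_id": call_id, "source": "s3"}


print(lookup_call_log("abc123", postgres_down, archive))
# {'call_id': 'abc123', 'source': 's3'}
```

A record that is simply missing from PostgreSQL returns None without touching the archive; the fallback is reserved for outages so that normal traffic never depends on S3 latency.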

Looking Ahead: Smarter Retention with TimescaleDB

We’ve already installed the TimescaleDB extension in our cluster — not for immediate use, but for what’s coming.

As our call log history deepens, we want a smarter solution than simply deleting older rows. TimescaleDB’s automatic partitioning, compression, and potential support for vector data will help us manage long-term retention gracefully and efficiently.
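
As a rough picture of what enabling this could look like, the sketch below builds the SQL statements for converting a table into a hypertable and compressing older data. We have not rolled this out, so everything specific here is an assumption: the table name call_logs, the column started_at, the intervals, and the retention_statements helper are hypothetical. The TimescaleDB calls used (create_hypertable, timescaledb.compress, add_compression_policy, add_retention_policy) exist in the classic API, but newer releases rename parts of it, so check the documentation for the installed version.

```python
import re
from typing import Optional

_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")
_INTERVAL = re.compile(r"^[1-9][0-9]* (day|days|month|months|year|years)$")


def _check(pattern: re.Pattern, value: str, what: str) -> str:
    """Reject values that are unsafe to interpolate into SQL text."""
    if not pattern.match(value):
        raise ValueError(f"invalid {what}: {value!r}")
    return value


def retention_statements(table: str, time_column: str, compress_after: str,
                         drop_after: Optional[str] = None) -> list[str]:
    """Build the SQL to partition a table by time and compress old chunks."""
    table = _check(_IDENTIFIER, table, "identifier")
    time_column = _check(_IDENTIFIER, time_column, "identifier")
    compress_after = _check(_INTERVAL, compress_after, "interval")
    statements = [
        # Partition the table into time-based chunks.
        f"SELECT create_hypertable('{table}', '{time_column}');",
        # Allow chunks of this table to be compressed.
        f"ALTER TABLE {table} SET (timescaledb.compress);",
        # Compress chunks once they are older than the given interval.
        f"SELECT add_compression_policy('{table}', INTERVAL '{compress_after}');",
    ]
    if drop_after is not None:
        drop_after = _check(_INTERVAL, drop_after, "interval")
        # Optional: drop whole chunks past the retention window.
        statements.append(
            f"SELECT add_retention_policy('{table}', INTERVAL '{drop_after}');"
        )
    return statements


for statement in retention_statements("call_logs", "started_at", "90 days"):
    print(statement)
```

Leaving drop_after unset matches the goal stated above: older rows are compressed rather than deleted, and a retention window only applies if one is explicitly chosen.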

When the time is right, it’ll allow us to keep more history available — without the performance tradeoffs.
