We’re doubling down on Elasticsearch!

As Postmark has grown over the years, we’ve hit limits in our architecture that have impaired our ability to keep growing. To remove those limits, we took a fresh look at our architecture and decided to double down on Elasticsearch. Read on to find out what our problems were, how Elasticsearch solves them, and how this lets us make Postmark better.

The platypus architecture #

I think all successful systems have a bit of platypus architecture. It’s evolved over time, it works, but when you take a step back and look at it…it looks a little strange. There are a few extra parts that you might not really need.

Our activity database stores every single email we send out, each bounce we receive, each inbound message we process, and more. It is the backend behind the activity feed on our website and also powers many of the API calls that retrieve information. Postmark started out using just MongoDB, then added Elasticsearch for better search functionality. When MongoDB no longer suited our needs, we swapped it for Cloudant but kept Elasticsearch as the power behind all our searches. However, we kept running into issues keeping the datastore and the search index in sync, and we also needed to clean up a large amount of data every day.

We wanted to take a fresh look. At our company retreat in September, the team sat down and decided to use solely Elasticsearch as our database for messages.

Here’s why we chose Elasticsearch #

Less operational complexity #

Keeping two databases in sync is a difficult problem and has been a significant cause of downtime in the past. On top of that, supporting two database clusters carries a real cost in both software and hardware. By moving to a single Elasticsearch cluster, we’ve reduced the overall number of servers we need, and because we now only write to Elasticsearch, we no longer have to keep two databases in sync.

Confidence and experience #

We have a lot of confidence in Elasticsearch, and recent updates have added important features. The ecosystem around it is great and ensures we’ll be able to run it on our own hardware for years.

Additionally, our team is now pretty familiar with Elasticsearch and its idiosyncrasies. Production experience with a database is invaluable.

Great support for our data model #

We keep a rolling 45-day window of events. If you send an email today, that means you’ll be able to search and view that email for 45 days. This means we have a high number of inserts and deletes every day. With Elasticsearch, we chose to create a new index every day. Features such as index aliases and index templates made that easy to do.
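To make the rolling window concrete, here’s a minimal sketch in Python of the daily-index approach. The index naming scheme (`messages-YYYY.MM.DD`) is an assumption for illustration, not Postmark’s actual convention:

```python
from datetime import date, timedelta

RETENTION_DAYS = 45  # rolling window of searchable events


def index_name(day):
    # One index per day, e.g. "messages-2014.03.21".
    # (The "messages-" prefix is a hypothetical naming scheme.)
    return "messages-{:%Y.%m.%d}".format(day)


def expired_indices(today, existing):
    # Any index older than the retention window can be dropped wholesale
    # with a single delete-index call, which is far cheaper than deleting
    # millions of individual documents from one big index.
    cutoff = index_name(today - timedelta(days=RETENTION_DAYS))
    # Zero-padded dates sort lexicographically, so a string compare works.
    return sorted(name for name in existing if name < cutoff)
```

An index template can apply the right mappings and settings automatically whenever a new day’s index is created, and an alias spanning the current window lets queries address all 45 indices by a single name.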

Performance improvements for customers #

API calls that used to hit two databases, like retrieving a single bounce, now use only one. This (predictably) led to much better response times.

A graph showing a decrease in average response time from about 100ms to 25ms.

Even without changing any of our queries on our Rails app, our new Elasticsearch cluster greatly improved loading times on our activity page.

A graph showing the average response time for the activity page decrease from 2000ms to 500ms.

We’re also looking at other improvements to the customer experience, like increasing how long we store messages.

We have a lot to share #

We’ve learned a lot in the last six months and want to share it with everyone: provisioning and configuration with Chef, how the entire cluster is built on top of SmartOS, and the effort we put into the codebase and Elasticsearch config to make the migration not only smooth, but optimized for performance and reliability. Expect to see more blog posts in the weeks to come. If you have a specific topic you’re interested in, leave a comment or send us an email.