Getting bounces is now (about) 7 times faster

Building a SaaS product is hard. It’s just as important to improve your existing features as it is to release new ones. The bounce API launched in 2010, and I’ll tell you how we’re still improving it.

Knowing what to improve

The vast majority of my time as a developer is spent in little pieces of code all around our architecture. It’s pretty difficult to get a bird’s-eye view of a live system and see overall trends. That’s why I love New Relic. It made it easy to see that our web servers were spending more time handling the bounce API than the email API.

New Relic showed us email bounce processing was taking more server time than sending email!

That didn’t meet my expectations at all! The email API gets ten times the requests the bounce API gets! That made the bounce API the obvious target for improvement.

First attempt: a 20% reduction in response times

The bounce API is powered by Elasticsearch. Through New Relic, I was able to determine that the vast majority of the time was spent querying Elasticsearch. So, I took a look at the query. It was using a bool query, with a structure like:

{
  query: {
    bool: {
      must: [
        {...term...},
        {...term...},
        {...term...},
        {...term...}
      ]
    }
  }
}

I knew that filters perform much better than queries because their results are cached. So, where possible, I moved terms from the query to a filter:

{
  query: {
    bool: {
      must: [
        {...term...},
        {...term...}
      ]
    }
  },
  filter: {
    bool: {
      must: [
        {...term...},
        {...term...}
      ]
    }
  }
}

This optimization cut our average response time from 1000ms to 800ms.

Our first attempt improved our bounce API response time by 20%
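The first attempt can be sketched in Python as a small helper that splits the term clauses between the query and a top-level filter. This is only an illustration under stated assumptions: the field names (`email`, `tag`, `server_id`, `type`) and the helper names are hypothetical, since the original terms are elided above.

```python
import json


def term(field, value):
    """Build one exact-match term clause."""
    return {"term": {field: value}}


def build_top_level_filter_body(query_terms, filter_terms):
    """Keep some terms in the bool query and move the rest to a top-level filter.

    Both arguments are lists of (field, value) pairs. Field names used in the
    example below are hypothetical, not the ones from the real bounce API.
    """
    return {
        "query": {"bool": {"must": [term(f, v) for f, v in query_terms]}},
        "filter": {"bool": {"must": [term(f, v) for f, v in filter_terms]}},
    }


body = build_top_level_filter_body(
    query_terms=[("email", "user@example.com"), ("tag", "welcome")],
    filter_terms=[("server_id", 42), ("type", "HardBounce")],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same two top-level keys as the structure above, `query` and `filter`, and can be serialized and sent as the search request body.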

Second attempt: more research provides benefits

With this solid change under my belt, I wasn’t planning on making any other changes to this specific query. While doing other research on Elasticsearch, though, I noticed something odd. All the example queries in the Elasticsearch docs use a filtered query. It’s an odd format, really, and the docs just talk up the benefits of using filters over queries. I knew that already, so I ignored it. But it kept popping up (really, all the examples use this format). So, I tried it.

This is what happened when we started using filtered queries with Elasticsearch!

Quite the difference! Our query structure is now:

{
  query: {
    filtered: {
      query: {
        bool: {
          must: [
            {...term...},
            {...term...}
          ]
        }
      },
      filter: {
        bool: {
          must: [
            {...term...},
            {...term...}
          ]
        }
      }
    }
  }
}
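A minimal Python sketch of building this filtered structure follows, again with hypothetical field and helper names because the real terms are elided. A likely reason for the large difference is that a top-level filter is applied to the hits after the query has already run, while the filter inside a filtered query narrows the candidate documents before the query is evaluated.

```python
import json


def term(field, value):
    """Build one exact-match term clause."""
    return {"term": {field: value}}


def build_filtered_body(query_terms, filter_terms):
    """Wrap the bool query and the bool filter together in one filtered query.

    Both arguments are lists of (field, value) pairs. Field names used in the
    example below are hypothetical, not the ones from the real bounce API.
    """
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": [term(f, v) for f, v in query_terms]}},
                "filter": {"bool": {"must": [term(f, v) for f, v in filter_terms]}},
            }
        }
    }


body = build_filtered_body(
    query_terms=[("email", "user@example.com"), ("tag", "welcome")],
    filter_terms=[("server_id", 42), ("type", "HardBounce")],
)
print(json.dumps(body, indent=2))
```

Note that later Elasticsearch releases dropped the `filtered` query in favor of a `bool` query with a `filter` clause, so check which form the version you run expects.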

Our average response time for the bounce API is now around 140ms.
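For reference, the arithmetic behind "about 7 times faster" can be checked with the three averages quoted in this post. The variable and function names are only illustrative.

```python
# Average response times quoted in this post, in milliseconds.
baseline_ms = 1000              # original bool query
top_level_filter_ms = 800       # first attempt
filtered_query_ms = 140         # second attempt


def reduction_pct(before_ms, after_ms):
    """Percentage drop in response time going from before_ms to after_ms."""
    return (before_ms - after_ms) * 100 / before_ms


first_attempt = reduction_pct(baseline_ms, top_level_filter_ms)   # 20.0
overall = reduction_pct(baseline_ms, filtered_query_ms)           # 86.0
speedup = baseline_ms / filtered_query_ms                         # roughly 7.1

print(f"first attempt: {first_attempt:.0f}% lower")
print(f"overall: {overall:.0f}% lower, {speedup:.1f}x faster")
```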

We’re not resting on our laurels here at Postmark. We will continue to improve our existing features as well as create new ones.