Performance Testing at Postmark

As Postmark grows, fast and reliable email delivery becomes a heavier task, and performance testing becomes more fun. The more emails we send, the more details we need to watch to preserve the same quality of service.

We have been performance testing Postmark for a long time. Postmark is growing fast, and as a result, performance tests are becoming more frequent and more complex.

I would like to share a small insight into how we performance test Postmark.

Patient History #

Before we dive into performance testing details, let’s talk about the importance of keeping a history of all your performance tests.

In medicine, the more you know about your patient, the better understanding you have about their health. The same rule applies when performance testing.

The more we know about how Postmark’s lungs, heart, and muscles work, the better we can measure its performance.

Besides the documentation on Postmark’s architecture, the best way to learn about Postmark’s performance is to keep track of every performance test that has been completed: tracking history.

In our case, we started performance testing by measuring something as simple as sending 10,000 emails with a 10 KB content size.

We measured how much time it takes to send these emails out of Postmark on our staging environment and stored the results. With every update that could affect performance, we would rerun the test and compare the old and new results.
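A test history can start out very simple. As a sketch, here is a comparison of a hypothetical feature branch against a stable baseline; the run names, durations, and branch names below are invented for illustration, not real Postmark numbers:

```python
from statistics import mean

# Hypothetical history of "send 10,000 emails, 10 KB body" runs on staging,
# recorded as (branch, seconds to deliver all emails out of the system).
history = [
    ("stable", 412.0),
    ("stable", 405.5),
    ("feature/new-queue", 388.2),
]

def percent_change(old_runs, new_runs):
    """Relative change of the new branch against the stable baseline."""
    baseline = mean(t for _branch, t in old_runs)
    candidate = mean(t for _branch, t in new_runs)
    return (candidate - baseline) / baseline * 100.0

stable = [r for r in history if r[0] == "stable"]
feature = [r for r in history if r[0] != "stable"]
change = percent_change(stable, feature)  # negative means the branch got faster
```

Even a flat file of numbers like this is enough to spot a regression the moment a new branch is tested.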

As we tested more, we measured more data, like the time to send emails to Postmark, response times, the time to update statuses of all sent emails, and more. I will talk about the measures we used later on in this post.

We have a separate performance test history for production and for staging environments. We test on production only when it is absolutely necessary and when traffic is low, so the impact on our customers and on the tests is minimal.

Besides keeping track of all our performance tests, we are tracking the health of Postmark all the time. To read more about how we do this from the QA side, check out our article about automated testing. Automated tests also play a very important role in tracking Postmark’s behaviour.

Monitoring system health is a different topic though, so we will leave it for future articles about performance.

The type of tests we perform #

When performance testing, we do a combination of different tests. Some of them include:

  • Load testing
  • Spike testing
  • Stress testing

The idea behind all of the above tests is the same. In essence, what differentiates them is the size of the load, the speed at which it is introduced, and its duration.

The most frequent performance tests we do are load and spike tests. Stress testing is something we never do on production. If we need to do high load testing on production, we do it by spike testing in low traffic times.
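Those three parameters can be captured in a small set of test profiles. A sketch, with invented numbers rather than Postmark’s actual settings:

```python
# Illustrative test profiles: same harness, different load shape.
# All numbers are invented for illustration.
PROFILES = {
    # steady, expected load over a long period
    "load":   {"users": 50,   "ramp_up_s": 300, "duration_s": 3600},
    # sudden burst: near-zero ramp-up, short duration
    "spike":  {"users": 500,  "ramp_up_s": 5,   "duration_s": 120},
    # load pushed up until the system degrades
    "stress": {"users": 2000, "ramp_up_s": 600, "duration_s": 1800},
}

def load_rate(profile):
    """Users introduced per second during ramp-up."""
    p = PROFILES[profile]
    return p["users"] / p["ramp_up_s"]
```

The harness stays the same; only the profile fed into it changes, which is what makes the three test types variations of one idea.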

Test environments we use #

One of the biggest problems when performance testing is having an adequate environment to run the tests on. Ideally you would run the tests in a production environment. I would not recommend this though, since you could kill your production environment. You could also get inaccurate results, since it’s a live system, in case you are measuring the performance of one particular account and not the system as a whole.

We performance test in our staging environment as much as we can, and only perform isolated tests in production.

Our workflow #

When I say we do most of our performance tests in our staging environment, I mean higher-load spike and stress tests. We run standard load tests in production too, since they don’t affect the performance of the users of the system, and occasionally we run lower-load spike tests in production as well.

Our usual performance testing workflow looks like this:

  1. load test on staging (new branch with new feature)
  2. spike/stress test on staging (new branch with new feature)
  3. compare the results to the results on stable branch (matches the code we currently have in production)
  4. redo the tests until they are better or same as on stable branch (depending on the goal)
  5. load test on production
  6. track in detail the health of the system over the next 24 hours

How we test #

In order to performance test any big web application, you need to take into account the limitations which affect the testing.

For us, the most important limitations are environment limits (IO/CPU/RAM, Network bandwidth) and Postmark API/SMTP limits.

Limits are very important in order to execute the correct type of performance tests.

For example, you would not try to performance test email sending with 1 GB attachments if your SMTP/API will reject such large attachments. Likewise, you would not fire up a JMeter test with 1,000 threads, each reading a 100 MB email file, on a machine with 8 GB of RAM. You would run into issues with both IO and RAM: JMeter would consume all available RAM instantly and stop your test.
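A back-of-the-envelope check on the environment’s limits is worth doing before the test starts. A sketch of the JMeter example above, treating the thread count and file size as the only inputs:

```python
def memory_estimate_gb(threads, file_mb):
    """Rough lower bound: assume every thread holds its email file
    in memory at the same time."""
    return threads * file_mb / 1024

# 1,000 threads, each reading a 100 MB email file:
needed = memory_estimate_gb(1000, 100)   # roughly 98 GB
fits_in_8gb = needed <= 8                # clearly not going to work
```

Five seconds of arithmetic here saves an aborted test run and a confused hour of debugging the load generator instead of the system under test.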

Once you know the limits of the environment you use for testing, and limits of the tested system, it's time for performance testing.

My usual method of performance testing Postmark is to first find the lower limit at which performance is measurable. For example, I could send 1,000 emails to Postmark and measure performance, but I would not get any value from it, since sending 1,000 emails is so fast that the emails would not even have time to sit in the queue, even on staging. What I do is search for the sweet spot where I can see whether the email sending time decreased or increased in comparison to an older stable branch.

In order to find the sweet spot I test different email sizes, email attachments, email attachments sizes, email volume, etc.
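Searching for that sweet spot can be automated. A sketch, where the measurement function is faked and the 30-second threshold is an arbitrary stand-in for "long enough to compare between branches":

```python
def find_sweet_spot(measure, volumes, min_seconds=30.0):
    """Return the smallest email volume whose measured send time is long
    enough to compare meaningfully between branches, or None."""
    for volume in sorted(volumes):
        if measure(volume) >= min_seconds:
            return volume
    return None

# Stand-in for a real test run: pretend sending scales linearly
# at 500 emails/second (an invented number).
fake_measure = lambda volume: volume / 500.0

spot = find_sweet_spot(fake_measure, [1_000, 5_000, 10_000, 50_000])
```

In practice `measure` would be a real test run, and the same search is repeated for different content sizes and attachment counts.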

There are a couple of different points of interest to take into account when testing email sending. I will break them up into a few groups and explain the ones I think are most important.

Email related points of interest

The first and most important point of interest is, of course, the actual email you are sending. Every type of email you send will affect performance differently, so you should focus on the following points of interest related to the emails you would like to use for performance testing:

  • content size
  • content type (html/text/multipart)
  • number of attachments
  • size of the attachments
  • number of recipients
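These points of interest translate directly into parameters of the test email. A sketch that builds a payload in the shape of Postmark’s JSON email format; the addresses, sizes, and attachment contents are synthetic:

```python
import base64

def build_test_email(content_kb, attachments, attachment_kb, recipients):
    """Build a test email payload parameterized by the points of interest
    above. Field names follow Postmark's JSON email format."""
    return {
        "From": "perf-test@example.com",
        "To": ",".join(f"perf-{i}@example.com" for i in range(recipients)),
        "Subject": "performance test",
        "TextBody": "x" * (content_kb * 1024),  # content size and type
        "Attachments": [
            {
                "Name": f"blob-{i}.bin",
                "Content": base64.b64encode(b"\0" * (attachment_kb * 1024)).decode(),
                "ContentType": "application/octet-stream",
            }
            for i in range(attachments)  # number and size of attachments
        ],
    }

email = build_test_email(content_kb=10, attachments=2, attachment_kb=50, recipients=3)
```

Sweeping these four parameters is what produces the matrix of test emails mentioned earlier.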

Sending related points of interest

Once you have figured out the type of email you will be sending, you should focus on the sending itself, which has the following points of interest:

  • Maximum number of email senders that will send email at once (threads/users)
  • Number of times to send email
  • Total number of emails being sent
  • Type of sending API/SMTP
  • Ramp up period

Sending points of interest are related to how you will be sending your emails, and they are what mostly differentiates load, stress, and spike testing.

The above points of interest should be enough for you to start doing performance tests, but once you start testing, you will also need to know what to measure.

Time related points of interest

Once you initiate your performance test, you need to track all sorts of data. With Postmark, we started performance testing by simply measuring the time from when we started sending emails until they were sent out of Postmark. Today we have many different points of interest we measure. For instance:

  • time to send all emails to Postmark
  • time to see all emails in UI
  • time to send all emails out of Postmark
  • time to update all email statuses in UI
  • response times for each email sent
  • average response times
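Measuring per-email response times and their average can be as simple as wrapping the send call. A sketch with a stand-in send function:

```python
import time
from statistics import mean

def timed(send):
    """Run a send call and return (response, elapsed_seconds)."""
    start = time.perf_counter()
    response = send()
    return response, time.perf_counter() - start

# Stand-in for real API calls; a real test would send an actual email here.
results = [timed(lambda: "OK") for _ in range(100)]
response_times = [elapsed for _response, elapsed in results]
average_response_time = mean(response_times)
```

Storing the full list of per-request times, not just the average, is what later lets you look at percentiles and outliers when comparing branches.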

These are the most important time-related points of interest I watch for. There are others too, related to the backend, which are monitored by our developers.

Validation related points of interest

The last, but also very important, group of points of interest is validation related. Once you send 10,000 emails, you want to be sure that the emails were sent. We don’t send the emails to the outside world during testing; we catch them on our side. As we do that, we need to make sure we sent all emails, and that the responses are correct for all of them, so we monitor the following points of interest:

  • email response messages
  • statistics of every email sent
  • statistics from every status update for emails from queue->sent
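A validation pass over the collected responses might look like this sketch. It assumes responses in the shape of Postmark’s API reply, where an `ErrorCode` of 0 means the email was accepted; the sample data is synthetic:

```python
def validate_responses(responses, expected_count):
    """Check that every email was accepted and that the count matches.
    Returns a list of problems; an empty list means the run is clean."""
    problems = []
    if len(responses) != expected_count:
        problems.append(f"expected {expected_count} responses, got {len(responses)}")
    for r in responses:
        # ErrorCode 0 signals success; anything else, or a missing
        # MessageID, is flagged for investigation.
        if r.get("ErrorCode", -1) != 0 or not r.get("MessageID"):
            problems.append(f"bad response: {r}")
    return problems

# Synthetic sample: one accepted email, one rejected.
fake_responses = [
    {"ErrorCode": 0, "Message": "OK", "MessageID": "a1"},
    {"ErrorCode": 300, "Message": "Invalid email", "MessageID": None},
]
issues = validate_responses(fake_responses, expected_count=2)
```

Running a check like this after every test run is what turns "we sent 10,000 emails" into "10,000 emails were accepted and delivered".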

With email, sending, time, and validation related points of interest in mind, we have all the information needed to run detailed performance tests.

Summary #

Performance testing is very challenging, but at the same time a very interesting and important aspect of the software development cycle.

When performance testing email, I suggest the following:

  • Measure everything you can and as much as you can
  • Find the lower limit at which performance degrades and monitor it
  • Test as much as possible in your staging environment, and be careful not to break production if you test there
  • Make sure to thoroughly performance test before releasing to production
  • Monitor performance after release to production
  • Track history of your performance tests
  • Start with simple performance tests and then expand

I hope that the information provided here will be useful when you performance test email sending.

Let me know what you think and feel free to share your thoughts and details on how you do performance testing.