This morning, starting at 5:15am, outbound emails were queued in our system for all customers. We started processing emails at 7:45am, and by 8:43am all queues were clear and sending was back to normal. Normally we deal with issues like this swiftly, but this morning some of our alerting failed, so it took us way too long to become aware of the issue and start working on it.
We know that availability and speed are Postmark’s most important features. Delivering on them is the guiding principle that drives everything we do, and when we fall short, we take it extremely seriously.
Bottom line: we failed you this morning. We are sorry, and we’re going to make it right. Here’s what we plan to do:
- We are going to review our alerting workflows to make sure that every scenario and possible failure point is covered. Our alerting frequently tells us when something is wrong before we hear from a single customer, but clearly today that didn’t happen.
- We are going to address the root cause of the delays and better automate how we manage capacity.
- We are going to improve our status page so it is clearer when we are accepting messages and queueing them. It wasn’t right that the status page looked like messages were only slightly delayed when in fact they were severely delayed.
You rely on Postmark to send your most critical emails. Doing that reliably is the motivation that continues to get our team out of bed every morning. If you have any questions about this morning’s incident, or what we’re doing about it, please let us know at email@example.com.