Moving SMTP Listeners to Amazon EC2

Ever since we launched our SMTP listeners feature, something has been bothering me. The feature worked, but it wasn’t exactly how I wanted it to be. Here is how we had it implemented:

  • The mail server (we use Port25’s fine PowerMTA product) listens on port 25 and accepts connections. Whenever it receives a new message, it passes it to an external program. This is called pipe delivery, since the external program is a console tool that the mail server launches and the message is piped to its standard input stream.
  • We built a sophisticated management solution that modified the PowerMTA configuration whenever a customer registered a server for SMTP access. While PowerMTA really shines at quickly sending a gazillion emails at the same time, it is not very automatable. We had to edit config files and launch external processes to signal a configuration change and make the server load the new settings.
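
To make the pipe delivery idea concrete, here is a minimal sketch in Python (not the C# tool described in this post). It assumes the mail server writes one complete message to the program's standard input; the function name `handle` and the fields it returns are made up for illustration.

```python
from email import message_from_string


def handle(stream) -> dict:
    """Read one raw message from a text stream and summarize it.

    `stream` stands in for standard input: with pipe delivery the mail
    server launches the program once per message and writes the whole
    message to its stdin.
    """
    raw = stream.read()
    msg = message_from_string(raw)
    body = msg.get_payload()
    # Multipart messages return a list of parts; this sketch only
    # measures plain, single-part bodies.
    body_text = body if isinstance(body, str) else ""
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        "body_bytes": len(body_text.encode("utf-8")),
    }
```

A real pipe program would call something like `handle(sys.stdin)` at startup and hand the result to the sending pipeline. Because the whole program is launched once per message, its startup time is paid on every single delivery.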

Why Move?

While we had the configuration problem solved, the entire system was still less than ideal. The solution was somewhat clumsy and it wasn’t very scalable. Soon after releasing the feature we realized that a REST API is pretty nice, but many customers simply prefer to tweak their SMTP settings and not fiddle with third-party libraries, no matter how hard those try to mimic the standard mailer libraries out there. We got a lot more SMTP traffic than we originally anticipated, and our solution had some serious weaknesses.

  • We were running the SMTP listeners on the same machines that sent the outgoing mail. That created a single point of failure – if the listener dies, the outgoing server dies as well. If we restart the outgoing server, we have to take the listener down too.
  • I, being a stupid .NET programmer, chose to implement the pipe program in C#. C# is a fine language and the CLR is a nice platform, but it is not known for its fast startup times. Imagine getting 1000 requests and having to start the .NET virtual machine 1000 times. I tried all the smart tricks, like reducing the executable size, using native image generation and whatnot. Still, it wasn’t the fastest thing in the world.

What We Did and How We Did It

Solving this problem required some thinking outside of the Windows/.NET box, so we looked elsewhere. First, we needed a mail server that is easy to automate and is free, so that we don’t have to pay for a license on every listener instance that we launch. Second, we needed a faster pipe program. Third, we needed to easily migrate our data from the current solution to the new one.

Postfix

We chose Postfix as our mail server simply because it’s the best mail server out there. Well, we picked it as the one that is easy to get going on an Ubuntu Linux machine and that does what we want it to do. When paired with Dovecot, it can authenticate users against a SQL database. That means we don’t have to write to config files and trigger configuration reloads ever again. Last but not least, Postfix supports encrypted TLS connections, which our PowerMTA-based solution lacked. We now support the standard STARTTLS SMTP extension, which is understood by most of the mailers out there. Some of them, like the Ruby ActionMailer, detect that the server supports STARTTLS and automatically switch to it for better security.
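
The appeal of SQL-backed authentication is that registering a server becomes a row insert rather than a config edit plus reload. The sketch below illustrates only that idea in Python with an in-memory SQLite database. It is not how Dovecot is configured or how it queries a database, and the table name `smtp_users`, its columns, and the hashing scheme are all assumptions made for the example.

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is used here only as a reasonable default for the sketch.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def create_store() -> sqlite3.Connection:
    # Hypothetical schema: one row per registered SMTP login.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE smtp_users ("
        "username TEXT PRIMARY KEY, salt BLOB NOT NULL, password_hash BLOB NOT NULL)"
    )
    return conn


def register(conn: sqlite3.Connection, username: str, password: str) -> None:
    # Registering a server is just an INSERT; nothing has to be reloaded.
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO smtp_users VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )
    conn.commit()


def authenticate(conn: sqlite3.Connection, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT salt, password_hash FROM smtp_users WHERE username = ?",
        (username,),
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, hash_password(password, salt))
```

A login attempt is checked against whatever is in the table at that moment, so a newly registered server can authenticate as soon as its row is committed.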

The coolest thing about all that is that we can host it on a self-contained EC2 micro instance. Then spawn many of those! We now have a bunch of them spread across different EC2 zones.

Lua

.NET doesn’t run on Linux, so we had to pick something else to write our pipe delivery program. Well, technically it runs on Linux – there is the Mono project. The pipe program is pretty simple and I could easily make it run with Mono. That wouldn’t solve the performance problem though – I’d be swapping one elephant for another when I really needed a flea. So I went searching for something small and fast. That’s how I found Lua.

The Lua programming language is a tiny language that is mostly used as an embedded scripting language and is very popular among game programmers. Ever heard of this thing called World of Warcraft? Well, it uses Lua too. Lua is interpreted but, being so small and simple, it runs very fast. I needed exactly that – the smallest possible thing that wouldn’t have me do my own memory management (and corrupt memory all over the place). I’ve done enough C++ to hate that with a passion.

I’m always fascinated by learning new programming languages, and I really liked Lua. It is pretty easy to grasp, as most stuff is organized around a small and simple core. Besides, it is also pretty malleable. There is no built-in support for object-oriented programming of the kind most mainstream developers may expect, but you can build your own object system with tables, similar to what most JavaScript users do with prototypes. I picked all that up pretty fast and I was cranking out working production code on the first day. It’s really that simple!

The Actual Migration

We had to build some tools to migrate our existing SMTP credentials configuration to the new SQL-based login scheme. That was pretty straightforward. We set up the new instances and tested them in isolation, without letting any customer traffic reach them. Still, we did not want to flip THE SWITCH and send all traffic to the new instances. We are only human and there was a good chance we got something wrong. That is why we used Dynect to set up a DNS-based load balancer. It allowed us to route a small percentage of the traffic to the new instances and watch what happened. Things went almost smoothly – we had an issue reported with some defective mailers that insisted on sending an SMTP HELO command without parameters in some cases. (Looking at you, ActionMailer!) PowerMTA simply dropped the broken command while Postfix returned an error. Fortunately the problem was easy to fix on the client side.
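
The HELO problem comes down to a syntax rule: the command must carry the client's hostname as its argument. Here is a small Python sketch of a strict check and of a client-side guard. The function names are hypothetical and the reply texts are illustrative rather than copied from Postfix; 501 is the standard SMTP reply code for a syntax error in parameters.

```python
from typing import Tuple


def check_helo(line: str) -> Tuple[int, str]:
    """Validate a HELO/EHLO line the way a strict server might."""
    parts = line.strip().split()
    if not parts or parts[0].upper() not in ("HELO", "EHLO"):
        return 500, "command not recognized"
    if len(parts) != 2:
        # A bare "HELO" has no hostname argument, so it is rejected.
        return 501, "syntax error in parameters"
    return 250, "ok"


def build_helo(hostname: str) -> str:
    """Client-side guard: refuse to build a HELO line without a hostname."""
    hostname = hostname.strip()
    if not hostname:
        raise ValueError("HELO requires a hostname argument")
    return "HELO " + hostname
```

A lenient server would skip the argument check and carry on, which is why a client bug like this can go unnoticed until the traffic reaches a stricter server.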

We gradually increased the portion of the traffic that we send to the new instances and, since last week, 100% of it goes to the new instances. Mission accomplished!
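
The gradual ramp can be pictured as a weighted coin flip per incoming connection. The Python sketch below models only the effect of such a split. It does not use Dynect's API or real DNS, and the pool names and the share values in the usage note are invented for illustration.

```python
import random
from collections import Counter


def pick_pool(new_share: float, rng: random.Random) -> str:
    """Send a connection to the new pool with probability `new_share`."""
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must be between 0 and 1")
    return "new" if rng.random() < new_share else "old"


def simulate(new_share: float, connections: int, seed: int = 0) -> Counter:
    """Count how many of `connections` land on each pool."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return Counter(pick_pool(new_share, rng) for _ in range(connections))
```

Calling `simulate` with a share such as 0.05, then 0.25, then 1.0 mirrors the rollout: each step exposes more traffic to the new instances while leaving a quick way back if something breaks.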