Scaling Mastodon: What it takes to house 3 users


The answer to the titular question being: not that much, as it turns out.

So, I run and co-administer the Mastodon instance at xkcd.network, which I discussed briefly in a previous post here. The instance has been running without any problems for four months now, and I'm generally quite happy with how it's turned out. Orthogonally to the Mastodon instance, I've recently been making motions towards moving my mail infrastructure to a new and slightly more refined system (which, once complete, I'm hoping to write a post about). As a result, I've been looking for areas where I could save a little money to offset the cost of the new mail servers, so I reexamined the Mastodon instance with a few months of experience under my belt.

Downsizing

The Mastodon instance was at this point sitting on iris.xkcd.network, a single Scaleway ARM64-4GB virtual server (a six-core ARMv8 VM with 4 GB of RAM and 100 GB of storage split between two 50 GB volumes). For an instance with only three active users who between them don't generate a lot of traffic, that's a lot of spare capacity, though that wasn't obvious until I'd sat down and turned down some of the concurrency settings. I decided that we could (and should) probably downsize to something a little smaller, and I could put the money saved towards paying for the mail servers.

At the time of writing, an ARM64-4GB machine at Scaleway is 6 EUR per month (rounded up to the nearest Euro), so I gave myself a budget of under 6 EUR/month to work with. The next size down (and the smallest ARMv8 configuration available) at Scaleway is the ARM64-2GB, which is 3 EUR/month (and cheaper than the equivalent x86_64 box). However, I wasn't totally convinced that I could fit all of the processes necessary to run the instance into 2 GB of RAM and still have overflow capacity. In hindsight, I probably could have put the instance on one of those machines and tossed in a few gigabytes of swap space, but I wanted to reduce the chance that future increases in average system load would necessitate a further migration.

I then started looking at how much it would cost to run the instance distributed across multiple machines, which I decided would probably be more interesting from a system design standpoint. While it would increase the overall complexity of the system, I saw the following benefits:

The solution I came up with was to provision three Start1-XS servers: hermes.xkcd.network, hedwig.xkcd.network and becblanche.xkcd.network. The Start1-XS is Scaleway's smallest x86_64 configuration, with a single gigabyte of RAM, a single core and 25 GB of disk; these servers usually cost 2 EUR/month, but I removed the public IPv4 addresses assigned to hedwig and becblanche and made them IPv6-only, which reduced the cost of each to 1 EUR/month. End result: three servers (with 4 IP addresses, 3 GB of RAM and 75 GB of storage between them) for 4 EUR/month. Not bad.

Design

Mastodon is a Ruby on Rails application with more than a few moving parts: the Puma web server, which serves the web interface and the REST API; the Sidekiq workers, which handle background jobs such as federation deliveries; and a Node.js streaming API server, which pushes real-time updates out to connected clients.

A few external services are also required: a PostgreSQL database for the main data store, a Redis instance for caching and the job queues, and an SMTP server for outgoing email.

Mastodon also stores multimedia files (such as local users' uploads or media retrieved from remote instances). These can be stored on the local disk (in which case the resources are served by nginx), in an Amazon S3 bucket (or something API-compatible), or behind a CDN host.
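For a sense of what those moving parts look like in practice, this is roughly how the stock Mastodon systemd units launch the three processes from a source checkout (the flags, queues and ports here follow the upstream examples and vary between releases):

    # Web frontend (Puma), listening on port 3000 by default
    RAILS_ENV=production bundle exec puma -C config/puma.rb

    # Background job workers (Sidekiq); queue arguments omitted for brevity
    RAILS_ENV=production bundle exec sidekiq -c 25

    # Streaming API (Node.js), listening on port 4000 by default
    NODE_ENV=production PORT=4000 npm run start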

The way the instance was set up originally on iris was that everything ran on the same host (including the mail infrastructure), and the media was stored on the local disk. Moving to the new cluster, the plan was to put the database and Redis on one of the IPv6-only machines, with the rest of the processes split between the other two hosts.

Implementation

The way this actually worked out was that becblanche hosted PostgreSQL and Redis, hedwig hosted the Sidekiq processes, the Node.js streaming API processes and the media storage, and hermes hosted Puma, the nginx reverse proxy and the mail infrastructure.

The first thing to configure was the network connectivity between the three servers. Scaleway's network is set up so that every VM has an address in RFC1918 space and can communicate with other servers on that internal network. Traffic to and from servers which have public IPv4 addresses assigned passes through bidirectional NAT on ingress and egress from Scaleway's network. This meant that all three servers could already reach one another over the internal network.

I also wanted to secure the traffic passing between the three servers, as an anti-eavesdropping measure on connections between the frontend processes and the data storage backends (yes, this doesn't stop the VM hosts from simply scraping sensitive data out of RAM, but if that were in my threat model then I would be running the instance on my own dedicated hardware). I briefly considered using TLS certificates for this, but that had the potential to get out of hand very quickly with certificate authorities and the like, so instead of securing at the application layer, I decided to secure at the network layer.

I set up a full mesh of WireGuard tunnels between the three servers, then created a dummy loopback interface on each of them. I then installed the BIRD Internet Routing Daemon on each server and configured it to advertise the address on the loopback interface to the other machines over the WireGuard tunnels using OSPF (see here). There are a number of advantages to giving each server a unique host address instead of addressing them by their tunnel endpoints, which I won't go into in much detail here -- the short version is that the addresses which the various services bind to are independent of the tunnel topology, and that the PostgreSQL and Redis endpoint addresses used in the Mastodon configuration are identical on hermes and hedwig.
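To give a flavour of this, here's a minimal sketch of what one host might look like. The addresses, interface names and the 10.200.0.0/24 tunnel subnet are invented for illustration, the BIRD snippet uses 1.x syntax, and it assumes each tunnel's AllowedIPs is broad enough to let OSPF's multicast hellos through (e.g. 0.0.0.0/0):

    # Create the dummy loopback and give it the host's stable /32
    ip link add dummy0 type dummy
    ip addr add 10.200.0.1/32 dev dummy0
    ip link set dummy0 up

    # /etc/bird/bird.conf -- advertise dummy0 over the WireGuard tunnels via OSPF
    protocol device { }

    protocol direct {
        interface "dummy0";
    }

    protocol kernel {
        export all;
    }

    protocol ospf {
        area 0 {
            interface "wg*" {
                type ptp;
            };
            interface "dummy0" {
                stub;
            };
        };
    }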

I then installed and configured PostgreSQL and Redis on becblanche. This was reasonably straightforward, as the only changes I made to the stock configurations were to explicitly set the listening addresses to the virtual loopback address I had set up and to allow connections to PostgreSQL over the network.
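Concretely, that amounts to something like the following (the 10.200.0.x addresses and the database and role names are placeholders, not the instance's real ones):

    # postgresql.conf on becblanche: listen on the virtual loopback address
    listen_addresses = '10.200.0.3'

    # pg_hba.conf: allow the Mastodon hosts to connect with password authentication
    host    mastodon    mastodon    10.200.0.1/32    md5
    host    mastodon    mastodon    10.200.0.2/32    md5

    # redis.conf: bind to the same address
    bind 10.200.0.3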

The media storage presented an interesting problem. Simply storing all the files on disk on hermes or hedwig didn't really give a great deal of flexibility, and I wasn't about to pay someone else to host the media for me, because the whole point of this exercise was to save money and to learn to self-host things. Enter Minio, an open source implementation of the S3 API written in Go. I compiled and installed the Minio server and client programs on hedwig, and then started working out how the whole S3 thing actually works, because up to this point I hadn't a clue. I found these two guides particularly helpful, if only because they're the only two I could find at the time. In short, I created a new storage bucket on the Minio endpoint and set the anonymous policy (for unauthenticated users) to download-only (i.e. read-only access to objects stored in the bucket). I then created a new user access policy on the endpoint from this policy file (which allows an authenticated user to store, retrieve and delete objects in the created bucket) and created a user on the endpoint with the access policy I had just defined.
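The policy file linked above isn't reproduced here, but it's a standard S3-style policy document; something along these lines, with the bucket name being a placeholder rather than the real one:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": ["arn:aws:s3:::mastodon-media"]
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
          "Resource": ["arn:aws:s3:::mastodon-media/*"]
        }
      ]
    }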

The mail infrastructure was reasonably straightforward to set up, as it was mostly a direct copy of the configuration present on iris onto hermes, with a couple of adjustments to match the IP addresses on hermes -- it looks like this. As before, I just used OpenSMTPD and dkimproxy, because I'm familiar with those tools.

The nginx configuration on hermes is where everything starts to come together. The configuration for the xkcd.network virtual host is very similar to the stock configuration that ships with Mastodon, except that the proxy endpoints for the Puma and streaming API backends point at the appropriate addresses on hermes and hedwig. In order to serve the media resources stored on hedwig, I set up a second virtual host, media.xkcd.network, and proxied requests on that virtual host through to the Minio backend.
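A stripped-down sketch of that second virtual host (the backend address uses the same invented 10.200.0.x addressing as above; 9000 is Minio's default port, and the TLS directives are omitted):

    server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;
        server_name media.xkcd.network;

        # ssl_certificate / ssl_certificate_key directives omitted for brevity

        location / {
            proxy_pass http://10.200.0.2:9000;   # Minio on hedwig
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto https;
        }
    }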

I then cloned the Mastodon repository onto hermes and hedwig in order to set things up -- or at least I tried to on hedwig. It's a well-known fact at this point, but GitHub has no public IPv6 at all, and our code is all on GitHub. The solution to this was crude: NAT hedwig through hermes over the WireGuard tunnel. Once I had the Mastodon code on both machines, I installed all the necessary dependencies and set up the instance.
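The NAT itself is only a few lines; roughly the following, reusing the invented interface names and tunnel subnet from earlier (hedwig keeps its native IPv6 connectivity, so only IPv4 goes via hermes):

    # On hermes: enable forwarding and masquerade tunnel traffic out of the public interface
    sysctl -w net.ipv4.ip_forward=1
    iptables -t nat -A POSTROUTING -s 10.200.0.0/24 -o eth0 -j MASQUERADE

    # On hedwig: send IPv4 traffic out via the tunnel to hermes
    # (assumes the tunnel's AllowedIPs covers 0.0.0.0/0)
    ip route add default dev wg-hermes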

Testing

I didn't throw the instance at the cluster just yet though -- instead, I did the smart thing and set up a test instance under a subdomain first in order to catch any errors or misconfiguration. And there were errors and misconfiguration, mostly surrounding the media resources and getting them to load correctly.

When configuring the S3 storage on the test instance, the configuration generator had set the S3_ALIAS_HOST variable instead of the S3_HOSTNAME variable. This requires a quick bit of background on bucket addressing, as far as I understand it. As S3 was originally designed, buckets were addressed in the path component of the request URL to the S3 endpoint, e.g. http://my-s3-host.example/$BUCKET_NAME/path/to/resources. A newer scheme was later introduced in which the bucket name is part of the hostname, both in DNS and in the HTTP request headers, e.g. http://$BUCKET_NAME.my-s3-host.example/path/to/resources. The latter has the advantage that one can use various forms of indirection (such as CNAME records in DNS, or HTTP proxying and redirection) to use one's own domain with the S3 storage backend, which conveniently allows you to serve a website out of an S3 bucket.

The TL;DR of all of this is that you should set S3_HOSTNAME when you're using path-based addressing and S3_ALIAS_HOST when you're using domain-based addressing or other forms of indirection. I had the latter set when I should have had the former set -- once I figured that out, it was easy to fix.
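For the record, the S3-related part of the configuration ended up looking something like this; the variable names are the ones Mastodon documents, while the bucket name, credentials and backend address are placeholders consistent with the examples above:

    S3_ENABLED=true
    S3_BUCKET=mastodon-media
    S3_ENDPOINT=http://10.200.0.2:9000
    S3_PROTOCOL=https
    S3_HOSTNAME=media.xkcd.network
    AWS_ACCESS_KEY_ID=changeme
    AWS_SECRET_ACCESS_KEY=changeme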

Another issue I had was that some resources weren't loading correctly due to inconsistent HTTP Strict Transport Security settings across the site (because I was regularly chopping and changing nginx configuration), which, once I'd worked that one out, was also easy to fix.
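Consistency here just means every virtual host serving the site sends the same header; in nginx terms something like the following, with the max-age value being illustrative:

    add_header Strict-Transport-Security "max-age=31536000" always;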

It was also in this testing period that I discovered that both Puma and Sidekiq need access to the outside internet, as Sidekiq is responsible for pushing local posts out to remote instances, and Puma needs to make requests to other instances in order to do things like look up other users and their posts. As Sidekiq was hosted on hedwig, this meant that the forwarding and NAT on hermes would have to become a permanent fixture, which I wasn't pleased with, but couldn't help.

Migration

Migrating the production instance to the new infrastructure was quite involved; fortunately, Gargron has provided a handy guide for exactly this sort of event.

The main difference I faced was that I was migrating from plain disk-backed storage to S3-backed storage, so I made an SSH tunnel between iris and hedwig and then configured and ran awscli on iris to copy all the media files across. I also had to merge the original production configuration file with the S3-related configuration I had been using with the test instance.
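Roughly speaking, that copy looked like the following; the paths follow Mastodon's usual layout, the bucket name and port are the same placeholders as above, and awscli is assumed to have already been given the Minio access keys via aws configure:

    # On iris: forward the Minio API port on hedwig to localhost
    ssh -N -L 9000:localhost:9000 hedwig.xkcd.network &

    # Push the existing media directory into the bucket through the tunnel
    aws s3 sync /home/mastodon/live/public/system/ s3://mastodon-media/ \
        --endpoint-url http://localhost:9000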

However, to quote my first post after I started everything back up, "Hey, I think it worked".

Bonus round: Tuning

One of the contributing factors towards making this migration in the first place was that, out of curiosity, I had turned down some of the concurrency parameters on iris one afternoon and discovered that the instance still ran with no visible issues. I also had no way to introspect the performance characteristics of the instance under load in different configurations.

Changing the configuration at runtime for the purposes of testing was also quite cumbersome -- iris was running systemd, as are hermes, hedwig and becblanche (make of that what you will), so this required editing unit files and then performing a daemon-reload every time I wanted to tweak something. Additionally, if there was a spike in load for some reason then I didn't have a way to quickly spin up more Sidekiq processes or increase the number of worker processes spawned by Puma.

So I wrote some parameterised systemd unit files, which are ones whose names are of the form servicename@.service, for example dhcpcd@wlan0.service. The instance parameter, which is set when the unit is instantiated (i.e. when the service is enabled and started), is a string which encodes the parameters to be passed to the Mastodon processes. Puma and Sidekiq have a number of knobs, so the unit files for those two services execute a Perl script which takes the option string as a parameter and decodes it into command line arguments or environment variables as appropriate.
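As a sketch of the shape of these units (this isn't the deployed unit; the wrapper script name, paths and the encoding of the instance string are all invented for illustration), the Sidekiq template might look like:

    # /etc/systemd/system/mastodon-sidekiq@.service
    [Unit]
    Description=Mastodon Sidekiq worker (%i)
    After=network.target

    [Service]
    Type=simple
    User=mastodon
    WorkingDirectory=/home/mastodon/live
    Environment="RAILS_ENV=production"
    # %i carries the encoded options (e.g. "c25" for 25 threads); the wrapper
    # decodes it into real sidekiq arguments before exec'ing the process.
    ExecStart=/home/mastodon/bin/sidekiq-wrapper %i
    Restart=always

    [Install]
    WantedBy=multi-user.target

Spinning up another worker with different parameters is then just a matter of systemctl start mastodon-sidekiq@<new-option-string>.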


