NB: I have filed this under programming because it is, in its purest essence, a programming problem. However, it extends above and beyond the mere laying down of code and into overall systems design.
I have also decided to break these into shorter chunks of around 1,000-1,200 words each.
Microservices
Part 1: The Monolith
Before diving straight into what a microservice is, we should probably step back a moment and examine what it is not. Specifically, we need to understand what the microservice architecture is an answer to and a solution for. In large part, the microservice was a response to monolithic app architectures and to the opportunity presented by the ever-increasing ease of provisioning new servers through technologies such as virtualization, LXC containers (which came well before Docker), and eventually Docker itself.
For our use case, let’s examine one of the most classic and common application stacks: the humble LAMP stack.
For those who may not be familiar, LAMP stands for “Linux, Apache, MySQL, PHP”, and it was, and still is, the stack that defines much of the internet’s web services today. This is a “full stack” design, meaning it encompasses everything from the kernel and OS up through the database, runtime, and webserver. Everything you need to run an app can be found in this stack.
In most instances of the LAMP stack you’ll encounter, it’s one full, self-contained app + database on a single server. Even in larger companies, you’ll see nearly the same architecture: the kernel, runtime + application, and webserver all live on the same box and are heavily intertwined, while the database tends to sit on a separate server (or cluster, if you care about HA). A load balancer distributes requests across multiple instances of the app, which are necessarily on separate boxes. Those instances share state and session by storing all of that data in the database, as sketched below.
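To make that pattern concrete, here is a minimal sketch of app instances sharing session state through a common store. The in-memory `Map`, the `sid` cookie format, and the `loadSession`/`saveSession` helpers are illustrative stand-ins for a real MySQL table and client, not anyone’s production code:

```typescript
import * as http from "http";

// Stand-in for the shared database. In a real LAMP/MEAN deployment this
// would be a sessions table in MySQL (or Mongo), reached via your client
// library of choice.
const sessions = new Map<string, { views: number }>();

function loadSession(id: string): { views: number } {
  // Real version: SELECT data FROM sessions WHERE id = ?
  return sessions.get(id) ?? { views: 0 };
}

function saveSession(id: string, data: { views: number }): void {
  // Real version: REPLACE INTO sessions (id, data) VALUES (?, ?)
  sessions.set(id, data);
}

http.createServer((req, res) => {
  // The session id rides in a cookie. Every app instance behind the load
  // balancer reads and writes the same table, so no instance holds state
  // of its own and any box can serve any request.
  const id = req.headers.cookie?.match(/sid=(\w+)/)?.[1] ?? "anon";
  const session = loadSession(id);
  session.views += 1;
  saveSession(id, session);
  res.setHeader("Set-Cookie", `sid=${id}`);
  res.end(`You have made ${session.views} requests.\n`);
}).listen(8080);
```

Because every instance reads and writes the same store, the load balancer can route any request to any box. That property is what makes it possible to add instances at all.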
The problem: up or out?
The problem with monoliths is that they don’t scale well. Or, rather, they don’t scale out well. Oftentimes they only scale up to a certain point, and then the return on investment drops significantly. Consider a monolithic LAMP or MEAN (MongoDB, Express, Angular, Node.js) stack that processes something like, oh I don’t know, insurance applications and HR data. Given that this is a highly seasonal business, you can see where you’d have a very regular, predictable need to scale as traffic increased. January through September, things would be rather quiet and business-as-usual for your quaint little shop. Developers would be happily slinging PHP or Node while ops people hum along and hopefully patch the latest silly OpenSSL bug with branding and a logo that would make Uber jealous. Then, as the end of September approaches, the head of engineering calls an all-hands meeting and reminds everyone: Winter is coming. Okay, not so much “winter” as “open enrollment”, an insanely busy season for anyone who handles insurance or benefits. October through December are a whirlwind of “patch it NOW!” and “just survive!”. Naturally, as traffic increases literally five-fold starting on the first day of October, you need to be ready.
The head of the ops team is smart. He knows what’s coming and gathers his team to formulate a plan. You currently have two webserver + app boxes and a three-node database cluster. The database cluster is more than capable of handling the requests, but the webapp is already running at a consistent 50% CPU even under this lighter traffic load. We want to save money, after all, so we deployed two single-core, 4 GB boxes during the slow season. Now that the slow season is coming to an end, you have a clear problem: you need more capacity. The solution is less clear: deploy bigger servers or more servers? In other words, up or out?
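Some back-of-the-envelope math makes the dilemma concrete. This is a rough sketch that assumes CPU load scales linearly with traffic (it rarely does, perfectly), and the 70% target utilization is an assumed headroom figure, not something from the scenario above:

```typescript
// Rough capacity math for the open-enrollment scenario.
const currentBoxes = 2;         // two single-core app servers
const currentUtilization = 0.5; // 50% CPU during the slow season
const trafficMultiplier = 5;    // the five-fold October spike
const targetUtilization = 0.7;  // assumed headroom so a burst doesn't topple you

// Total "cores worth" of work expected at peak:
const peakLoad = currentBoxes * currentUtilization * trafficMultiplier; // 5.0

// Option 1, scale out: more single-core boxes.
const smallBoxesNeeded = Math.ceil(peakLoad / targetUtilization); // 8

// Option 2, scale up: fewer, bigger boxes (e.g. quad-core).
const coresPerBigBox = 4;
const bigBoxesNeeded = Math.ceil(peakLoad / (coresPerBigBox * targetUtilization)); // 2

console.log({ peakLoad, smallBoxesNeeded, bigBoxesNeeded });
```

Either answer covers the peak on paper. The interesting question, which the rest of this post digs into, is what each one costs you in money, redundancy, and deployment pain.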
The problem with up and the problem with out
Both up and out have their own problems. That’s not to say that a microservice architecture is free from problems, no. But it approaches them quite differently. See, one of the main issues with monoliths is that they’re notoriously hard to deploy. As a codebase grows, it becomes more picky. Angry. Cantankerous. Solutions abound for this problem, but they are ultimately band-aids: golden images, CodeDeploy, Packer (itself a tool for building golden images). In our perfect world, we think we’ve built a well-behaved app that installs cleanly every time. Call me when that happens, because I have some unicorns to sell you. So naturally, because the monolith is so hard to install once, we want to avoid installing it 10 times. Therefore, we choose to go up. We throw RAM and CPU at the problem and build the app 2-3 times instead of 10. We lose some redundancy, but we have our sanity, right?
So the problem with “out” is clear: monoliths are a pain to build and deploy, so we want to do it as few times as possible. They’re often a pain to maintain too, so more instances of the monolith means more servers that can drag me out of bed in the middle of the night with an alert from Nagios or PagerDuty. But what about up? What’s so bad about throwing more resources at the problem in a couple of boxes?
In a few short words:
- It’s expensive
- It’s wasteful (see #1)
- You sacrifice redundancy
Why “up” is wasteful
To go ahead and get out in front of the objection: up is not always wasteful. Sometimes you just need a bigger ~~boat~~ box. However, most of the time, when your reason for going bigger is “scaling”, you’re probably wasting resources and spending more than you need to. Typically you only hit capacity at specific peak times, and the box sits idle the rest of the time. Larger boxes on cloud infrastructure such as AWS are incredibly expensive, and having them sit idle for hours a day is wasteful. Imagine a c4.xlarge (which runs on the order of $150 a month on-demand) sitting at 5% CPU for 22 hours a day. Wouldn’t it be amazing if you could have that server when demand is high and then terminate it for the rest of the day? Imagine the savings! Additionally, your application does not perform symmetrically. Generally the “performance” problem is actually a bottleneck in one or two places, which forces you to respond by either multiplying instances (scaling out) or adding more raw resources (scaling up). To compensate for one slow function, we throw resources at the whole application.
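To put rough numbers on the waste, here’s a quick sketch. The hourly rate is an illustrative ballpark, not a quoted AWS price; check the current on-demand pricing for your instance type and region before trusting the output:

```typescript
// Illustrative cost of an always-on box vs. one that only runs at peak.
const hourlyRate = 0.2;    // USD/hour, ballpark figure for this sketch only
const hoursPerMonth = 730; // ~24 * 365 / 12

// Running the big box around the clock:
const alwaysOnCost = hourlyRate * hoursPerMonth; // ~$146/month

// Running it only during the 2-hour daily busy window:
const peakHoursPerDay = 2;
const peakOnlyCost = hourlyRate * peakHoursPerDay * 30.4; // ~$12/month

console.log(`Always on: $${alwaysOnCost.toFixed(2)}/month`);
console.log(`Peak only: $${peakOnlyCost.toFixed(2)}/month`);
console.log(`Wasted:    $${(alwaysOnCost - peakOnlyCost).toFixed(2)}/month`);
```

The exact figures will vary, but the shape of the problem doesn’t: an always-on box sized for your peak spends most of its life, and most of your money, doing nothing.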
Conclusion
This is not the sole problem with monoliths, nor is it by itself the reason to move to a microservice architecture. Rather, it’s an illustration of the problems that ops teams face on a daily basis. Many are understaffed and undertrained, which compounds into decisions made to serve the “tyranny of the urgent” rather than strategic investment in infrastructure and systems design. Monoliths are the easy, short-term answer for many dev teams, but unfortunately they reach the end of their scalability quickly. You can only scale up so many times, and eventually scaling out runs into problems of its own. What happens when you can reliably deploy the app, but now have to scale out at a rate many times what you were doing before? Can you handle that? Can your team of two or three handle an operational load that grows linearly to a hundred boxes or more?
In the next section, we will talk about how the microservice architecture addresses the problem of scaling, additional problems it solves, and new ones that it introduces.