How to Deploy Software

Make your team’s deploys as boring as hell and stop stressing about it.

Let's talk deployment

Whenever you make a change to your codebase, there's always going to be a risk that you're about to break something.

No one likes downtime, no one likes cranky users, and no one enjoys angry managers. So the act of deploying new code to production tends to be a pretty stressful process.

It doesn't have to be as stressful, though. There's one phrase I'm going to be reiterating over and over throughout this whole piece:

Your deploys should be as boring, straightforward, and stress-free as possible.

Deploying major new features to production should be as easy as starting a flamewar on Hacker News about spaces versus tabs. They should be easy for new employees to understand, they should be defensive towards errors, and they should be well-tested far before the first end-user ever sees a line of new code.

This is a long — sorry not sorry! — written piece specifically about the high-level aspects of deployment: collaboration, safety, and pace. There's plenty to be said for the low-level aspects as well, but those are harder to generalize across languages and, to be honest, a lot closer to being solved than the high-level process aspects. I love talking about how teams work together, and deployment is one of the most critical parts of working with other people. I think it's worth your time to evaluate how your team is faring, from time to time.

A lot of this piece stems from both my experiences during my five-year tenure at GitHub and during my last year of advising and consulting with a whole slew of tech companies big and small, with an emphasis on improving their deployment workflows (which have ranged from "pretty respectable" to "I think the servers must literally be on fire right now"). In particular, one of the startups I'm advising is Dockbit, whose product is squarely aimed at collaborating on deploys, and much of this piece stems from conversations I've had with their team. There's so many different parts of the puzzle that I thought it'd be helpful to get it written down.

I'm indebted to some friends from different companies who gave this a look-over and helped shed some light on their respective deploy perspectives: Corey Donohoe (Heroku), Jesse Toth (GitHub), Aman Gupta (GitHub), and Paul Betts (Slack). I continually found it amusing how the different companies might have taken different approaches but generally all focused on the same underlying aspects of collaboration, risk, and caution. I think there's something universal here.

Anyway, this is a long intro and for that I'd apologize, but this whole goddamn writeup is going to be long anyway, so deal with it, lol.

Table of Contents

  1. Goals

    Aren't deploys a solved problem?

  2. Prepare

    Start prepping for the deploy by thinking about testing, feature flags, and your general code collaboration approach.

  3. Branch

    Branching your code is really the fundamental part of deploying. You're segregating any possible unintended consequences of the new code you're deploying. Start thinking about different approaches involved with branch deploys, auto deploys on master, and blue/green deploys.

  4. Control

    The meat of deploys. How can you control the code that gets released? Deal with different permissions structures around deployment and merges, develop an audit trail of all your deploys, and keep everything orderly with deploy locks and deploy queues.

  5. Monitor

    Cool, so your code's out in the wild. Now you can fret about the different monitoring aspects of your deploy, gathering metrics to prove your deploy, and ultimately making the decision as to whether or not to roll back your changes.

  6. Conclusion

    "What did we learn, Palmer?"
    "I don't know, sir."
    "I don't fuckin' know either. I guess we learned not to do it again."
    "Yes, sir."

How to Deploy Software was originally published on March 1, 2016.

Goals

Aren't deploys a solved problem?

If you’re talking about the process of taking lines of code and transferring them onto a different server, then yeah, things are pretty solved and are pretty boring. You’ve got Capistrano in Ruby, Fabric in Python, Shipit in Node, all of AWS, and hell, even FTP is going to stick around for probably another few centuries. So tools aren’t really a problem right now.

So if we have pretty good tooling at this point, why do deploys go wrong? Why do people ship bugs at all? Why is there downtime? We’re all perfect programmers with perfect code, dammit.

Obviously things happen that you didn’t quite anticipate. And that’s where I think deployment is such an interesting area for small- to medium-sized companies to focus on. Very few areas will give you a bigger bang for your buck. Can you build processes into your workflow that anticipate these problems early? Can you use different tooling to help collaborate on your deploys more easily?

This isn't a tooling problem; this is a process problem.

The vast, vast majority of startups I've talked to the last few years just don't have a good handle on what a "good" deployment workflow looks like from an organizational perspective.

You don't need release managers, you don't need special deploy days, you don't need all hands on deck for every single deploy. You just need to take some smart approaches.

Prepare

Start with a good foundation

You've got to walk before you run. I think there's a trendy aspect of startups out there that all want to get on the coolest new deployment tooling, but when you pop in and look at their process they're spending 80% of their time futzing with the basics. If they were to streamline that first, everything else would fall in place a lot quicker.

Tests

Testing is the easiest place with which to start. It's not necessarily part of the literal deployment process, but it has a tremendous impact on it.

There's a lot of tricks that depend on your language or your platform or your framework, but as general advice: test your code, and speed those tests up.

My favorite quote about this was written by Ryan Tomayko in GitHub's internal testing docs:

We can make good tests run fast but we can't make fast tests be good.

So start with a good foundation: have good tests. Don't skimp out on this, because it impacts everything else down the line.

Once you start having a quality test suite that you can rely upon, though, it's time to start throwing money at the problem. If you have any sort of revenue or funding behind your team, almost the number one area you should spend money on is whatever you run your tests on. If you use something like Travis CI or CircleCI, run parallel builds if you can and double whatever you're spending today. If you run on dedicated hardware, buy a huge server.

The amount of benefit I've seen companies gain by moving to a faster test suite is one of the most important productivity benefits you can earn, simply because it impacts iteration feedback cycles, time to deploy, developer happiness, and inertia. Throw money at the problem: servers are cheap, developers are not.

I made an informal Twitter poll asking my followers just how fast their test suites ran. Granted, it's hard to account for microservices, language variation, the surprising number of people who didn't have any tests at all, and full-stack vs quicker unit tests, but it still became pretty clear that most people are going to be waiting at least five minutes after a push to see the build status.

How fast should fast really be? GitHub's tests generally ran within 2-3 minutes while I was there. We didn't have a lot of integration tests, which allowed for relatively quick test runs, but in general the faster you can run them the faster you're going to have that feedback loop for your developers.

There are a lot of projects around aimed at helping parallelize your builds. There's parallel_tests and test-queue in Ruby, for example. There's a good chance you'll need to write your tests differently if your tests aren't yet fully independent from each other, but that's really something you should be aiming to do in either case.

Feature Flags

The other aspect of all this is to start looking at your code and transitioning it to support multiple deployed codepaths at once.

Again, our goal is that your deploys should be as boring, straightforward, and stress-free as possible. The natural stress point of deploying any new code is running into problems you can't foresee, and you ultimately impact user behavior (i.e., they experience downtime and bugs). Bad code is going to end up getting deployed even if you have the best programmers in the universe. Whether that bad code impacts 100% of users or just one user is what's important.

One easy way to handle this is with feature flags. Feature flags have been around since, well, technically since the if statement was invented, but the first time I remember really hearing about a company's usage of feature flags was Flickr's 2009 post, Flipping Out.

These allow us to turn on features that we are actively developing without being affected by the changes other developers are making. It also lets us turn individual features on and off for testing.

Having features in production that only you can see, or only your team can see, or all of your employees can see provides for two things: you can test code in the real world with real data and make sure things work and "feel right", and you can get real benchmarks as to the performance and risk involved if the feature got rolled out to the general population of all your users.

The huge benefit of all of this means that when you're ready to deploy your new feature, all you have to do is flip one line to true and everyone sees the new code paths. It makes that typically-scary new release deploy as boring, straightforward, and stress-free as possible.
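As a rough illustration of those audiences (just you, your team, all staff, everyone), here's a minimal feature flag sketch in Python. The `User` and `FeatureFlag` classes and the page names are made up for this example; a real setup would persist flags somewhere outside the process.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    login: str
    is_staff: bool = False


@dataclass
class FeatureFlag:
    """Hypothetical flag with three widening audiences."""
    name: str
    enabled_for_everyone: bool = False
    enabled_for_staff: bool = False
    enabled_logins: set = field(default_factory=set)

    def enabled_for(self, user):
        if self.enabled_for_everyone:
            return True
        if self.enabled_for_staff and user.is_staff:
            return True
        # Individually enabled accounts (you, or your team).
        return user.login in self.enabled_logins


def render_issues_page(user, flag):
    # Both codepaths are deployed; the flag picks one per request.
    if flag.enabled_for(user):
        return "new issues page"
    return "legacy issues page"
```

Shipping to everyone is then the one-line change the paragraph above describes: set `enabled_for_everyone` to `True`.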

Provably-correct deploys

As an additional step, feature flags provide a great way to prove that the code you're about to deploy won't have adverse impacts on performance and reliability. There's been a number of new tools and behaviors in recent years that help you do this.

I wrote a lot about this a couple years back in my companion written piece to my talk, Move Fast and Break Nothing. The gist of it is to run both codepaths of the feature flag in production and only return the results of the legacy code, collect statistics on both codepaths, and be able to generate graphs and statistical data on whether the code you're introducing to production matches the behavior of the code you're replacing. Once you have that data, you can be sure you won't break anything. Deploys become boring, straightforward, and stress-free.


GitHub open-sourced a Ruby library called Scientist to help abstract a lot of this away. The library's being ported to most popular languages at this point, so it might be worth your time to look into this if you're interested.
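To show the shape of the idea, here's a hand-rolled sketch in Python. To be clear, this is not Scientist's API; `run_experiment` and the result fields are assumptions made up for illustration. It runs both codepaths, records timing and whether they agreed, and only ever returns the legacy result.

```python
import time


def run_experiment(name, control, candidate, results):
    """Run the legacy (control) and new (candidate) codepaths.

    Always returns the control's value. A candidate that raises or
    disagrees is recorded in `results`, never surfaced to the user.
    """
    start = time.perf_counter()
    control_value = control()
    control_seconds = time.perf_counter() - start

    start = time.perf_counter()
    try:
        candidate_value = candidate()
        candidate_error = None
    except Exception as exc:  # the new code must not break the request
        candidate_value = None
        candidate_error = exc
    candidate_seconds = time.perf_counter() - start

    results.append({
        "name": name,
        "matched": candidate_error is None and candidate_value == control_value,
        "control_seconds": control_seconds,
        "candidate_seconds": candidate_seconds,
        "candidate_error": repr(candidate_error) if candidate_error else None,
    })
    return control_value
```

In a real system `results` would feed your metrics pipeline rather than a list, so you can graph mismatch rates and timing before trusting the new path.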

The other leg of all of this is percentage rollout. Once you're pretty confident that the code you're deploying is accurate, it's still prudent to only roll it out to a small percentage of users first to double-check and triple-check nothing unforeseen is going to break. It's better to break things for 5% of users instead of 100%.

There are plenty of libraries that aim to help out with this, including Rollout in Ruby, Togglz in Java, fflip in JavaScript, and many others. There are also startups tackling this problem, like LaunchDarkly.
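If you're curious what percentage rollout looks like underneath, here's a minimal sketch in Python. It isn't taken from any of the libraries above; the function names and the hashing scheme are assumptions. The point is that a user's bucket is stable, so the same 5% of users see the feature on every request, and raising the percentage only ever adds users.

```python
import hashlib


def rollout_bucket(feature, user_id):
    """Map a (feature, user) pair to a stable bucket from 0 to 99."""
    key = f"{feature}:{user_id}".encode("utf-8")
    # Hashing the feature name too means different features
    # don't all pick on the same unlucky 5% of users.
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def in_rollout(feature, user_id, percentage):
    """True if this user falls inside the rollout percentage."""
    return rollout_bucket(feature, user_id) < percentage
```

Calling `in_rollout("new-issues", user.id, 5)` in front of the new codepath gives you the 5% case from above; bump the number as your confidence grows.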

It's also worth noting that this doesn't have to be a web-only thing. Native apps can benefit from this exact behavior too. Take a peek at GroundControl for a library that handles this behavior in iOS.


Feeling good with how you're building your code out? Great. Now that we got that out of the way, we can start talking about deploys.

Branch

Organize with branches

A lot of the organizational problems surrounding deployment stem from a lack of communication between the person deploying new code and the rest of the people who work on the app with her. You want everyone to know the full scope of changes you're pushing, and you want to avoid stepping on anyone else's toes while you do it.

There's a few interesting behaviors that can be used to help with this, and they all depend on the simplest unit of deployment: the branch.

Code branches

By "branch", I mean a branch in Git, or Mercurial, or whatever you happen to be using for version control. Cut a branch early, work on it, and push it up to your preferred code host (GitLab, Bitbucket, etc).

You should also be using pull requests, merge requests, or other code review to keep track of discussion on the code you're introducing. Deployments need to be collaborative, and using code review is a big part of that. We'll touch on pull requests in a bit more detail later in this piece.

Code Review

The topic of code review is long, complicated, and pretty specific to your organization and your risk profile. I think there's a couple important areas common to all organizations to consider, though:

Branch and deploy pacing

There's an old joke that's been passed around from time to time about code review. Whenever you open a code review on a branch with six lines of code, you're more likely to get a lot of teammates dropping in and picking apart those six lines left and right. But when you push a branch that you've been working on for weeks, you'll usually just get people commenting with a quick 👍🏼 looks good to me!

Basically, developers are usually just a bunch of goddamn lazy trolls.

You can use that to your advantage, though: build software using quick, tiny branches and pull requests. Make them small enough to where it's easy for someone to drop in and review your pull in a couple minutes or less. If you build massive branches, it will take a massive amount of time for someone else to review your work, and that leads to a general slow-down with the pace of development.

Confused at how to make everything so small? This is where those feature flags from earlier come into play. When my team of three rebuilt GitHub Issues in 2014, we had shipped probably hundreds of tiny pull requests to production behind a feature flag that only we could see. We deployed a lot of partially-built components before they were "perfect". It made it a lot easier to review code as it was going out, and it made it quicker to build and see the new product in a real-world environment.

You want to deploy quickly and often. A team of ten could probably deploy at least 7-15 branches a day without too much fretting. Again, the smaller the diff, the more boring, straightforward, and stress-free your deploys become.

Branch deploys

When you're ready to deploy your new code, you should always deploy your branch before merging. Always.

View your entire repository as a record of fact. Whatever you have on your master branch (or whatever you've changed your default branch to be) should be noted as being the absolute reflection of what is on production. In other words, you can always be sure that your master branch is "good" and is a known state where the software isn't breaking.

Branches are the question. If you merge your branch first into master and then deploy master, you no longer have an easy way to determine what your good, known state is without doing an icky rollback in version control. It's not necessarily rocket science to do, but if you deploy something that breaks the site, the last thing you want to do is have to think about anything. You just want an easy out.

This is why it's important that your deploy tooling allows you to deploy your branch to production first. Once you're sure that your performance hasn't suffered, there's no stability issues, and your feature is working as intended, then you can merge it. The whole point of having this process is not for when things work, it's when things don't work. And when things don't work, the solution is boring, straightforward, and stress-free: you redeploy master. That's it. You're back to your known "good" state.

Auto-deploys

Part of all that is to have a stronger idea of what your "known state" is. The easiest way of doing that is to have a simple rule that's never broken:

Unless you're testing a branch, whatever is deployed to production is always reflected by the master branch.

The easiest way I've seen to handle this is to just always auto-deploy the master branch if it's changed. It's a pretty simple ruleset to remember, and it encourages people to make branches for all but the most risk-free commits.

There's a number of features in tooling that will help you do this. If you're on a platform like Heroku, they might have an option that lets you automatically deploy new versions on specific branches. CI providers like Travis CI also will allow auto deploys on build success. And self-hosted tools like Heaven and hubot-deploy — tools we'll talk about in greater detail in the next section — will help you manage this as well.

Auto-deploys are also helpful when you do merge the branch you're working on into master. Your tooling should pick up a new revision and deploy the site again. Even though the content of the software isn't changing (you're effectively redeploying the same codebase), the SHA-1 does change, which makes it more explicit as to what the current known state of production is (which again, just reaffirms that the master branch is the known state).
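The branch-deploy and auto-deploy rules fit in a tiny state model. This Python sketch is purely illustrative (the `Production` class and the SHAs are made up), but it shows why rollback stays trivial: master never moves until the branch has already proven itself in production.

```python
class Production:
    """Tracks which SHA is live and which SHA master points at."""

    def __init__(self, master_sha):
        self.master_sha = master_sha
        self.deployed_sha = master_sha
        self.history = []

    def _deploy(self, sha, reason):
        self.deployed_sha = sha
        self.history.append((reason, sha))

    def deploy_branch(self, branch_sha):
        # Step 1: ship the branch itself, before any merge.
        self._deploy(branch_sha, "branch")

    def redeploy_master(self):
        # The easy out: master is still the known good state.
        self._deploy(self.master_sha, "rollback")

    def merge_to_master(self, merge_sha):
        # Step 2: only after verifying the branch. The merge creates
        # a new SHA, and the auto-deploy picks it up.
        self.master_sha = merge_sha
        self._deploy(merge_sha, "auto-deploy")

    def is_known_state(self):
        return self.deployed_sha == self.master_sha
```

The only time `is_known_state()` is false is while someone is actively testing a branch, which is exactly the rule stated above.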

Blue-green deploys

Martin Fowler has pushed this idea of blue-green deployment since his 2010 article (which is definitely worth a read). In it, Fowler talks about the concept of using two identical production environments, which he calls "blue" and "green". Blue might be the "live" production environment, and green might be the idle production environment. You can then deploy to green, verify that everything is working as intended, and make a seamless cutover from blue to green. Production gains the new code without a lot of risk.

One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production.
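Here's a minimal sketch of that cutover in Python. The `BlueGreenRouter` class and the `health_check` callback are hypothetical; in practice the "router" is your load balancer or DNS, not an object in memory. The shape is the same, though: deploy to the idle side, verify, then flip.

```python
class BlueGreenRouter:
    """Toy model of two production environments and a live pointer."""

    def __init__(self):
        self.releases = {"blue": None, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, release):
        # Users never see this until cut_over() succeeds.
        self.releases[self.idle] = release
        return self.idle

    def cut_over(self, health_check):
        target = self.idle
        if not health_check(self.releases[target]):
            raise RuntimeError(
                f"{target} failed verification; {self.live} stays live"
            )
        self.live = target
        return self.live
```

After a cutover the old environment is still sitting there with the previous release, so backing out is just flipping the pointer again.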

This is a pretty powerful idea, and it's become even more powerful with the growing popularity of virtualization, containers, and generally having environments that can be easily thrown away and forgotten. Instead of having a simple blue/green deployment, you can spin up production environments for basically everything in the visual light spectrum.

There's a multitude of reasons behind doing this, from having disaster recovery available to having additional time to test critical features before users see them, but my favorite is the additional ability to play with new code.

Playing with new code ends up being pretty important in the product development cycle. Certainly a lot of problems should be caught earlier in code review or through automated testing, but if you're trying to do real product work, it's sometimes hard to predict how something will feel until you've tried it out for an extended period of time with real data. This is why blue-green deploys in production are more important than having a simple staging server whose data might be stale or completely fabricated.

What's more, if you have a specific environment that you've spun up with your code deployed to it, you can start bringing different stakeholders on board earlier in the process. Not everyone has the technical chops to pull your code down on their machine and spin your code up locally — and nor should they! If you can show your new live screen to someone in the billing department, for example, they can give you some realistic feedback on it prior to it going out live to the whole company. That can catch a ton of bugs and problems early on.

Heroku Pipelines

Whether or not you use Heroku, take a look at how they've been building out their concept of "Review Apps" in their ecosystem: apps get deployed straight from a pull request and can be immediately played with in the real world instead of just being viewed through screenshots or long-winded "this is what it might work like in the future" paragraphs. Get more people involved early before you have a chance to inconvenience them with bad product later on.

Control

Controlling the deployment process

Look, I'm totally the hippie liberal yuppie when it comes to organizational manners in a startup: I believe strongly in developer autonomy, a bottom-up approach to product development, and generally will side with the employee rather than management. I think it makes for happier employees and better product. But with deployment, well, it's a pretty important, all-or-nothing process to get right. So I think adding some control around the deployment process makes a lot of sense.

Luckily, deployment tooling is an area where adding restrictions ends up freeing teammates up from stress, so if you do it right it's going to be a huge, huge benefit instead of what people might traditionally think of as a blocker. In other words, your process should facilitate work getting done, not get in the way of work.

Audit trails

I'm kind of surprised at how many startups I've seen unable to quickly bring up an audit log of deployments. There might be some sort of papertrail in a chat room transcript somewhere, but it's not something that is readily accessible when you need it.

The benefit of some type of audit trail for your deployments is basically what you'd expect: you'd be able to find out who deployed what to where and when. Every now and then you'll run into problems that don't manifest themselves until hours, days, or weeks after deployment, and being able to jump back and tie it to a specific code change can save you a lot of time.

A lot of services will generate these types of deployment listings fairly trivially for you. Amazon CodeDeploy and Dockbit, for example, have a lot of tooling around deploys in general but also serve as a nice trail of what happened when. GitHub's excellent Deployment API is a nice way to integrate with your external systems while still plugging deploy status directly into Pull Requests:

GitHub's deployment API

If you're playing on expert mode, plug your deployments and deployment times into one of the many, many time series databases and services like InfluxDB, Grafana, Librato, or Graphite. The ability to compare any given metric and layer deployment metrics on top of it is incredibly powerful: seeing a gradual increase of exceptions starting two hours ago might be curious at first, but not if you see an obvious deploy happen right at that time, too.
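As a sketch of the "who deployed what to where and when" record, and of lining it up against an incident, here's a small Python example. The `DeployRecord` fields and the `deploys_before` helper are assumptions for illustration, not the schema of any of the services mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DeployRecord:
    who: str
    app: str
    ref: str
    environment: str
    at: datetime


def deploys_before(log, incident_start, window=timedelta(hours=1),
                   environment="production"):
    """Deploys to `environment` in the window leading up to an incident.

    Most recent first, since the latest deploy is the usual suspect.
    """
    earliest = incident_start - window
    hits = [
        record for record in log
        if record.environment == environment
        and earliest <= record.at <= incident_start
    ]
    return sorted(hits, key=lambda record: record.at, reverse=True)
```

Once deploys are stored as plain timestamped events like this, overlaying them on an exceptions graph is mostly a matter of sending the same events to whichever time series service you already use.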

Deploy locking

Once you reach the point of having more than one person in a codebase, you're naturally going to have problems if multiple people try to deploy different code at once. While it's certainly possible to have multiple branches deployed to production at once — and it's advisable, as you grow past a certain point — you do need to have the tooling set up to deal with those deploys. Deploy locking is the first thing to take a look at.

Deploy locking is basically what you'd expect it to be: locking production so that only one person can deploy code at a time. There's many ways to do this, but the important part is that you make this visible.

The simplest way to achieve this visibility is through chat. A common pattern might be to set up deploy commands that simultaneously lock production like:

/deploy <app>/<branch> to <environment>

i.e.,

/deploy api/new-permissions to production

This makes it clear to everyone else in chat that you're deploying. I've seen a few companies hop in Slack and mention everyone in the Slack deploy room with @here I'm deploying […]!. I think that's unnecessary, and only serves to distract your coworkers. By just tossing it in the room you'll be visible enough. If it's been awhile since a deploy and it's not immediately obvious if production is being used, you can add an additional chat command that returns the current state of production.
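For a sense of how little is needed on the bot side, here's a minimal Python sketch that parses the command format above. The regular expression and the function name are my own assumptions, not how Hubot, SlashDeploy, or Dockbit parse commands.

```python
import re

# /deploy <app>/<branch> to <environment>
# The app name stops at the first slash; the branch may contain more.
DEPLOY_COMMAND = re.compile(
    r"^/deploy\s+(?P<app>[^/\s]+)/(?P<branch>\S+)\s+to\s+(?P<environment>\S+)$"
)


def parse_deploy_command(text):
    """Return {'app', 'branch', 'environment'} or None if it doesn't parse."""
    match = DEPLOY_COMMAND.match(text.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_deploy_command("/deploy api/new-permissions to production"))
```

Returning `None` for anything that doesn't match lets the bot reply with usage help instead of guessing what someone meant to ship.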

There's a number of pretty easy ways to plug this type of workflow into your chat. Dockbit has a Slack integration that adds deploy support to different rooms. There's also an open source option called SlashDeploy that integrates GitHub Deployments with Slack and gives you this workflow as well (as well as handling other aspects like locking).

Another possibility that I've seen is to build web tooling around all of this. Slack has a custom internal app that provides a visual interface to deployment. Pinterest has open sourced their web-based deployment system. You can take the idea of locking to many different forms; it just depends on what's most impactful for your team.

Once a deploy's branch has been merged to master, production should automatically unlock for the next person to use.

There's a certain amount of decorum required while locking production. Certainly you don't want people to wait to deploy while a careless programmer forgot they left production locked. Automatically unlocking on a merge to master is helpful, and you can also set up periodic reminders to mention the deployer if the environment has been locked for longer than 10 minutes, for instance. The idea is to shit and get off the pot as soon as possible.
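Pulling the locking rules together (one holder at a time, unlock on merge, nag after 10 minutes), here's a minimal in-memory sketch in Python. The `DeployLock` class is hypothetical; a real one would live in your chat bot or deploy service and store its state somewhere shared.

```python
from datetime import datetime, timedelta


class DeployLock:
    """One person holds production at a time."""

    def __init__(self, reminder_after=timedelta(minutes=10)):
        self.holder = None
        self.locked_at = None
        self.reminder_after = reminder_after

    def acquire(self, user, now):
        if self.holder is not None and self.holder != user:
            return False  # someone else is mid-deploy
        if self.holder is None:
            self.holder, self.locked_at = user, now
        return True

    def on_merge_to_master(self):
        # Merging means the deploy is done, so free production up.
        self.holder, self.locked_at = None, None

    def status(self):
        if self.holder is None:
            return "production is unlocked"
        return f"production is locked by {self.holder}"

    def reminder(self, now):
        """A nudge for the holder once the lock has gone stale, else None."""
        if self.holder is None or now - self.locked_at < self.reminder_after:
            return None
        minutes = int((now - self.locked_at).total_seconds() // 60)
        return f"@{self.holder} production has been locked for {minutes} minutes"
```

`status()` is the "what's the current state of production?" chat command, and `reminder()` is what a periodic job would call to nudge whoever wandered off.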

Deploy queueing

Once you have a lot of deployment locks happening and you have a lot of people on board deploying, you're obviously going to have some deploy contention. For that, draw from your deepest resolve of Britishness inside of you, and form a queue.

A deploy queue has a couple parts: 1) if there's a wait, add your name to the end of the list, and 2) allow for people to cut the line (sometimes Really Important Deploys Need To Happen Right This Minute and you need to allow for that).
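Both parts fit in a few lines. This Python sketch uses a made-up `DeployQueue` class to show the two operations: join at the back, or cut to the front when something really can't wait.

```python
from collections import deque


class DeployQueue:
    """First come, first served, with an escape hatch."""

    def __init__(self):
        self._waiting = deque()

    def join(self, user):
        """Add a user to the end of the line; returns their 1-based position."""
        if user not in self._waiting:
            self._waiting.append(user)
        return list(self._waiting).index(user) + 1

    def cut_line(self, user):
        """Move a user to the front for a Really Important Deploy."""
        if user in self._waiting:
            self._waiting.remove(user)
        self._waiting.appendleft(user)

    def next_up(self):
        """Pop whoever deploys next, or None if nobody is waiting."""
        return self._waiting.popleft() if self._waiting else None
```

Who gets to call `cut_line` is a social question more than a technical one, so this sketch leaves that policy out.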

The only problem with deploy queueing is having too many people queued to deploy. GitHub's been facing this internally the last year or so; come Monday when everybody wants to deploy their changes, the list of those looking to deploy can be an hour or more long. I'm not particularly a microservices advocate, but I think deploy queues specifically see a nice benefit if you're able to split things off from a majestic monolith.

Permissions

There's a number of methods to help restrict who can deploy and how someone can deploy.

2FA is one option. Hopefully your employees' chat accounts won't get popped, and hopefully they have other security measures turned on for their machines (full disk encryption, strong passwords, etc.). But for a little more peace of mind you can require a 2FA process to deploy.

You might get 2FA from your chat provider already. Campfire and Slack, for example, both support 2FA. If you want it to happen every time you deploy, however, you can build a challenge/response step into the deploy itself. Heroku and Basecamp both have a process like that internally, for instance.

Another possibility to handle the who side of permissions is to investigate what I tend to call, "riding shotgun". I've seen a number of companies who have either informal or formal processes or tooling for ensuring that at least one senior developer is involved in every deploy. There's no reason you can't build out a 2FA-style process like that into a chat client, for example, requiring both the deployer and the senior developer that's riding shotgun to confirm that code can go out.
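A "riding shotgun" rule could look something like the following Python sketch. The `DeployApproval` class is hypothetical and isn't modeled on any particular company's tooling; it just requires the deployer plus one senior developer who isn't the deployer to both confirm.

```python
class DeployApproval:
    """Two-party confirmation: the deployer plus one senior developer."""

    def __init__(self, deployer, senior_developers):
        self.deployer = deployer
        self.senior_developers = set(senior_developers)
        self.confirmed_by = set()

    def confirm(self, user):
        if user != self.deployer and user not in self.senior_developers:
            raise PermissionError(f"{user} can't confirm this deploy")
        self.confirmed_by.add(user)

    def can_deploy(self):
        deployer_confirmed = self.deployer in self.confirmed_by
        # A senior deployer can't ride shotgun for themselves.
        shotgun_confirmed = any(
            user in self.senior_developers and user != self.deployer
            for user in self.confirmed_by
        )
        return deployer_confirmed and shotgun_confirmed
```

Whether senior developers should be able to skip the second confirmation for their own deploys is a policy call; this sketch assumes they can't.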

Monitor

Admire and check your work

Once you've got your code deployed, it's time to verify that what you just did actually did what you intended it to do.

Check the playbook

All deploys should really hit the exact same game plan each time, no matter if it's a frontend change or a backend change or anything else. You're going to want to check to see if the site is still up, if the performance took a sudden turn for the worse, if error rates started elevating, or if there's an influx of new support issues. It's to your advantage to streamline that game plan.

If you have multiple sources of information for all of these aspects, try putting a link to each of these dashboards in your final deploy confirmation in chat, for example. That'll remind everyone every time to look and verify they're not impacting any metrics negatively.

Ideally, this should all be drawn from one source. Then it's easier to direct a new employee, for example, towards the important metrics to look at while making their first deploy. Pinterest's Teletraan, for example, has all of this in one interface.

Metrics

There's a number of metrics you can collect and compare that will help you determine whether you just made a successful deploy.

The most obvious, of course, is the general error rate. Has it dramatically shot up? If so, you probably should redeploy master and go ahead and fix those problems. You can automate a lot of this, and even automate the redeploy if the error rate crosses a certain threshold. Again, if you assume the master branch is always a known state you can roll back to, it makes it much easier to automate rollbacks if you trigger a slew of exceptions right after deploy.
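Here's what a threshold check like that might look like, sketched in Python. The numbers (three times the baseline, a floor of one error per minute) and the function names are assumptions I picked for illustration; tune them against your own traffic.

```python
def should_roll_back(baseline_rate, current_rate, max_ratio=3.0, min_rate=1.0):
    """Decide whether the post-deploy error rate warrants a rollback.

    Rates are errors per minute. `min_rate` keeps a quiet app from
    rolling back over a single stray exception.
    """
    if current_rate < min_rate:
        return False
    if baseline_rate <= 0:
        return True  # errors appeared where there were none before
    return current_rate / baseline_rate >= max_ratio


def check_deploy(baseline_rate, current_rate, redeploy_master):
    """Call `redeploy_master` (supplied by your tooling) if things look bad."""
    if should_roll_back(baseline_rate, current_rate):
        redeploy_master()
        return "rolled back"
    return "ok"
```

Because the rollback is just "redeploy master", the automated path and the human path end up doing exactly the same thing.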

The deployments themselves are interesting metrics to keep on-hand as well. Zooming out over the last year or so can help give you a good example of whether you're scaling the development pace up, or if it's clear that there's some problems and things are slowing down. You can also take a step further and collect metrics on who's doing the deploying and, though I haven't heard of anyone doing this explicitly yet, tie error rates back to deployer and develop a good measurement of who are reliable deployers on the team.

Post-deploy cleanup

The final bit of housework that's required is the cleanup.

The slightly aggressively-titled "Feature Toggles are one of the worst kinds of Technical Debt" talks a bit about this. If you're building things with feature flags and staff deployments, you run the risk of complicating the long-term sustainability of your codebase:

The plumbing and scaffolding logic to support branching in code becomes a nasty form of technical debt, from the moment each feature switch is introduced. Feature flags make the code more fragile and brittle, harder to test, harder to understand and maintain, harder to support, and less secure.

You don't need to do this right after a deploy; if you have a bigger feature or bugfix that needs to go out, you'll want to spend your time monitoring metrics instead of immediately deleting code. You should do it at some point after the deploy, though. If you have a large release, you can make it part of your shipping checklist to come back and remove code maybe a day or a week after it's gone out. One approach I liked to take was to prepare two pull requests: one that toggles the feature flag (i.e., ships the feature to everyone), and one that cleans up and removes all the excess code you introduced. When I'm sure that I haven't broken anything and it looks good, I can just merge the cleanup pull request later without a lot of thinking or development.

You should celebrate this internally, too: it's the final sign that your coworker has successfully finished what they were working on. And everyone likes it when a diff is almost entirely red. Removing code is fun.

Deleted branch

You can delete the branch when you're done with it, too. There's nothing wrong with deleting branches when you're done with them. If you're using GitHub's pull requests, for example, you can always restore a deleted branch, so you'll benefit from having it cleared out of your branch list but you won't actually lose any data. This step can also be automated: periodically run a script that looks for stale branches that have been merged into master, and then delete those branches.
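A cleanup script along those lines might look like this Python sketch. It assumes a remote called `origin` and a default branch called `master`, and it only checks "merged", not how old the branch is, so treat it as a starting point. It defaults to a dry run that just prints what it would delete.

```python
import subprocess


def merged_branches(git_output, remote="origin", protected=("master",)):
    """Parse `git branch -r --merged` output into deletable branch names."""
    prefix = remote + "/"
    names = []
    for line in git_output.splitlines():
        ref = line.strip()
        # Skip blanks and the "origin/HEAD -> origin/master" pointer.
        if not ref or "->" in ref or not ref.startswith(prefix):
            continue
        name = ref[len(prefix):]
        if name not in protected:
            names.append(name)
    return names


def delete_merged_branches(remote="origin", default_branch="master", dry_run=True):
    output = subprocess.run(
        ["git", "branch", "-r", "--merged", f"{remote}/{default_branch}"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in merged_branches(output, remote, protected=(default_branch,)):
        if dry_run:
            print(f"would delete {remote}/{name}")
        else:
            subprocess.run(["git", "push", remote, "--delete", name], check=True)


if __name__ == "__main__":
    delete_merged_branches(dry_run=True)
```

Run it with `dry_run=True` a few times before letting it loose on a schedule, and run `git fetch --prune` first so it's working from a current view of the remote.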

Neato

The whole ballgame

I only get emotional about two things: a moving photo of a Golden Retriever leaning with her best friend on top of a hill overlooking an ocean looking towards a beautiful sunset, and deployment workflows. The reason I care so much about this stuff is because I really do think it's a critical part of the whole ballgame. At the end of the day, I care about two things: how my coworkers are feeling, and how good the product I'm working on is. Everything else stems from those two aspects for me.

Deployments can cause stress and frustration, particularly if your company's pace of development is sluggish. They can also slow you down and prevent you from getting features and fixes out to your users.

I think it's worthwhile to think about this, and worthwhile to improve your own workflows. Spend some time and get your deploys to be as boring, straightforward, and stress-free as possible. It'll pay off.

Written by Zach Holman. Thanks for reading.
