Meet Matt Canty, Platform Engineer
Hello there, my name is Matt Canty and I am a Platform Engineer. I have been at Tandem Bank since February 2015 and have seen all of the highs and lows that come with working in a start-up.
When I started at Tandem Bank as a back-end developer, I quickly picked up an interest in the DevOps way of doing things. I was continuously improving our deployment automation. This led to faster and more frequent deployments whilst reducing the number of defects.
We worked hard at Tandem to completely change the way our back-end services, iOS and Android apps and websites were deployed. But an elephant was still lurking in the room: none of our infrastructure could benefit from these DevOps approaches...
We needed infrastructure-as-code, and we needed it fast!
If you're not aware, deploying is the process of taking code a software engineer has written and making it available for customers to use. This can mean publishing the latest version of the app to the app store or updating the website with this blog. In the context of infrastructure, this means changes to our network, servers, databases, and much more.
I joined the Platform Team during the Amazon Web Services (AWS) migration project's planning phase. I played a major role in codifying our existing infrastructure over 10 months and took part in the migration day itself.
Now that we're finally able to take advantage of the benefits of being on AWS, I have prepared some quick tips on how you might move your bank to AWS too!
1. Make Decisions (and stick to them)
Sounds easy, right? Prudence is important with large projects like this. We didn't want to disrupt the entire Tandem team, although you cannot help interrupting everybody at some point.
Here are a few decisions which I think made our lives easier:
- Infrastructure-as-code. The importance of this cannot be overstated. If you are on the fence, I urge you to go for it. The power of rebuilding your infrastructure at the push of a button brings many benefits.
- Minimising architectural changes, but using AWS products instead of managing your own.
- Using your existing deployment tools. They do infrastructure-as-code deployments just as well as code deployments!
- Short-lived feature branches as recommended by the book Accelerate, which contains the findings of a major study showing the correlation between a business's competitive advantage and the performance of its delivery teams.
- We decided to use CloudFormation so that we could take advantage of AWS's support. To make our lives a bit easier we used a templating tool called Jinja2 too.
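As a rough illustration of what templating a CloudFormation document can look like, here is a minimal Python sketch that renders a JSON template for a single encrypted S3 bucket. It is not our actual setup: it uses Python's built-in string.Template as a stand-in for Jinja2 (which adds loops, conditionals and includes on top of plain substitution), and the function name, the bucket naming scheme and the parameters are all hypothetical.

```python
import json
from string import Template

# Hypothetical CloudFormation template for one encrypted S3 bucket.
# string.Template stands in for Jinja2 here; only ${...} placeholders
# are substituted, so the JSON braces are left untouched.
BUCKET_TEMPLATE = Template("""
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "${logical_id}": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "BucketName": "${env}-${name}",
        "BucketEncryption": {
          "ServerSideEncryptionConfiguration": [
            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
          ]
        }
      }
    }
  }
}
""")


def render_bucket_template(env: str, name: str) -> dict:
    """Render the template for one environment and return it as a dict.

    The logical ID convention (CamelCase name + "Bucket") is an assumption
    made for this example only.
    """
    logical_id = "".join(part.capitalize() for part in name.split("-")) + "Bucket"
    rendered = BUCKET_TEMPLATE.substitute(env=env, name=name, logical_id=logical_id)
    # Parsing the result catches malformed JSON before anything is deployed.
    return json.loads(rendered)


if __name__ == "__main__":
    print(json.dumps(render_bucket_template("staging", "audit-logs"), indent=2))
```

The same template can then be rendered once per environment, which keeps the environments structurally identical while only the parameters differ.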
2. Get Help
You wouldn't do a civil ceremony alone, so don't do a large-scale AWS migration alone either. It's been done before and there are people out there who can help you steer clear of the mistakes you will inevitably make.
We partnered with NordCloud and received a lot of help from AWS too. They both guided our planning and decision making. NordCloud also promoted and implemented a strong governance approach from the start. This means that you can't accidentally allow things to be stored without encryption.
Looking back, NordCloud's insistence on providing an end-to-end automated pipeline with strict checks was invaluable. We're still improving it today and it provides a solid foundation for our team to improve features and fix problems.
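To give a flavour of the kind of strict check a pipeline can run, here is a small Python sketch that scans a CloudFormation template (already parsed into a dict) and reports S3 buckets that declare no encryption rule. This is an assumption-laden example of one possible rule, not the checks NordCloud built for us; the function name is hypothetical.

```python
from typing import List


def find_unencrypted_buckets(template: dict) -> List[str]:
    """Return the logical IDs of S3 buckets with no encryption rule declared.

    Illustrative only: a real governance pipeline would cover many more
    resource types and properties than this single rule.
    """
    offenders = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        rules = (
            resource.get("Properties", {})
            .get("BucketEncryption", {})
            .get("ServerSideEncryptionConfiguration", [])
        )
        if not rules:
            offenders.append(logical_id)
    return sorted(offenders)


if __name__ == "__main__":
    example = {
        "Resources": {
            "PlainBucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
        }
    }
    problems = find_unencrypted_buckets(example)
    if problems:
        # Failing the build here is what stops an unencrypted bucket
        # from ever being deployed.
        raise SystemExit("Unencrypted buckets: " + ", ".join(problems))
```

Running a check like this before every deployment turns the governance rule into something the pipeline enforces rather than something people have to remember.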
3. Deploy, Deploy, Deploy
Software is relatively straightforward to test compared to infrastructure, which has many pitfalls. If you're reading this in the future - which is inevitable - things may have improved!
The AWS documentation is comprehensive, but only by deploying frequently will you uncover the platform's complexity. The decision to use CloudFormation enabled us to lean on AWS for support during the early phase.
Make changes and deploy often whilst you can. Before long, people will depend on stability. Until then, you can take advantage by working fast and breaking things often. Nothing will teach you more about the AWS platform than developing on it continuously.
Top tip: Disable manual changes. Every manual or one-off change will haunt you at a later date, I promise. We have a few that we are aware of which were made out of necessity during development. We’re only just starting to fix those shortcuts now.
4. Rebuild Your Infrastructure Regularly
In addition to the previous point, I would advise destroying and rebuilding your infrastructure as often as possible - even automate it! Doing so proves that you can recover should something really bad happen.
If you're capable of doing this, then you can be assured that you are doing a lot of things right. However, these rebuilds didn't always go unnoticed for us. A bug in a script at the end of a long day could cause the entire platform build to fail. Lots of unhappy developers and a few hours would then be required to put things right.
It is very important to rehearse. You cannot test everything in code. In the smaller tests we measured things like time taken to transfer data in order to estimate how long the real thing might take on migration day.
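The arithmetic behind that kind of estimate is simple enough to sketch in Python. The function below extrapolates from a timed sample transfer to the full data set. The function name, the figures in the example and the safety factor are all assumptions for illustration, not the numbers we used.

```python
def estimate_transfer_seconds(
    sample_bytes: float,
    sample_seconds: float,
    total_bytes: float,
    safety_factor: float = 1.5,
) -> float:
    """Extrapolate the full transfer time from a timed sample.

    safety_factor pads the estimate, because the real transfer may well be
    slower than the rehearsal (1.5 is an arbitrary illustrative default).
    """
    if sample_bytes <= 0 or sample_seconds <= 0:
        raise ValueError("sample size and duration must be positive")
    throughput = sample_bytes / sample_seconds  # bytes per second
    return total_bytes / throughput * safety_factor


if __name__ == "__main__":
    GIB = 2**30
    seconds = estimate_transfer_seconds(10 * GIB, 120, 500 * GIB)
    print(f"Estimated transfer time: {seconds / 3600:.1f} hours")
```

With these made-up figures, a 10 GiB sample that takes two minutes suggests roughly two and a half hours for 500 GiB once the 1.5 margin is included.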
Larger-scale rehearsals involved one or more third parties being on the phone to make a change to their systems, and then revert it. Remember, third parties often need weeks to have changes approved and implemented in their systems!
A full dress rehearsal actually forced us to push back at the very last moment, having seen that there was a flaw in how we expected the database migration to work. Whenever you learn something new about the migration, update the plan. You don't want to be surprised on the day.
You've worked hard for this moment. Make it count.
You should have a run book which has been meticulously vetted for poor assumptions. You should have every 3rd party engaged and expecting your call according to your plan. You should have every team member you expect actually turn up to work that day!
Even now, things may not go as expected. For us, the data transfer took a lot longer than we anticipated. When you move from pets to cattle, or data centre to cloud, there are always going to be differences you just cannot verify ahead of time.
Providing you have a strong team who understand their domain well, you will be able to make a call on the day as to whether you continue or abort.
In the end we kicked off migration at 2.30pm on Monday, December 10th and headed home at 1am once we felt assured that the platform was stable.
You've done it. Now you can improve gradually through iteration and reap the benefits of applying infrastructure-as-code!
Good luck on your mission. It's not going to be easy.