Kristen Shi 03/02/2018

We're off to Santa Clara for SREcon 2018!

Get pumped! This March, Shopify engineers will be speaking at SREcon Americas in Santa Clara, CA, USA! This three-day conference runs March 27-29, 2018, and is dedicated to highlighting excellence, best practices, and thought leadership in engineering resilience, reliability, and scalability. Shopify engineers will be giving presentations on several topics, ranging from lessons software engineers can learn from oil refineries to the story of how Shopify's PaaS was built on Kubernetes!

Bad software doesn’t explode. You can describe it as exploding when it throws an exception, corrupts some data, or makes your computer unusable, but it doesn’t explode. When code doesn’t work, the solution is to figure out where the logic is incorrect and fix it. While SREs may be called engineers, we rarely face the kind of consequences that engineers in other industries do.

In contrast, when a chemical engineer makes a mistake designing a refinery, the consequences are very different. We’ve all seen videos of the repercussions online. Big, loud explosions reducing massive facilities to chunks of twisted metal. The reality is that working with unstable chemicals is a lot harder than keeping track of pointers in C.

Yet despite the differences, an industrial process plant can be surprisingly similar to a complex software system. Where refineries use pressure relief valves, web services degrade gracefully. Whether you’re protecting against thermal runaway in a plant or a cascading failure in a data center, the fundamental ideas can be shared by both domains.

In this talk, you’ll explore the techniques and ideas used to build and operate refineries and how we can use them to make our software systems more resilient and reliable.

Wednesday, 28 March 2018 - 5:15pm-5:35pm

Shopify has grown from fewer than 20 production services in 2011 to more than 400 in 2017. These services currently run on a wide variety of production environments, which makes it harder to share tools across applications. Moving to the cloud is a common occurrence, but Shopify decided to build a platform as a service (PaaS) to consolidate all production environments. This PaaS was built on a public cloud provider and Kubernetes (k8s). Despite the consolidation, the PaaS added maintenance and support load on the team. SREs build tools that are used by developers with a wide range of experience, which adds challenges around user experience, education, support, etc.

This talk contains a brief overview of development at Shopify, why we decided to move to the cloud, what we built, and what we learned along the way. It is not meant to be a technical introduction to using Kubernetes, but rather a case study of the team’s experiences building the PaaS. Issues regarding onboarding, education, etc. are similar to those faced in most SRE projects. The case study is meant to generalize the lessons the team learnt over the project and show how they can be applied more broadly at other organizations.

Thursday, 29 March 2018 - 1:20pm-2:00pm

If you’re an engineer interested in life at Shopify, King will be at SREcon. Reach out to him on LinkedIn if you are interested in working at Shopify, or head over to our careers page!