Creating Native Components That Accept React Native Subviews

React Native adoption has been steadily growing since its release in 2015, especially with its ability to quickly create cross-platform apps. A very strong open-source community has formed, producing great libraries like Reanimated and Gesture Handler that allow you to achieve native performance for animations and gestures while writing exclusively React Native code. At Shopify we are using React Native for many different types of applications, and are committed to giving back to the community.

However, sometimes there is a native component you made for another app, or already exists on the platform, which you want to quickly port to React Native and aren’t able to build cross-platform using exclusively React Native. The documentation for React Native has good examples of how to create a native module which exposes native methods or components, but what should you do if you want to use a component you already have and render React Native views inside of it? In this guide, I’ll show you how to make a native component which provides bottom sheet functionality to React Native and lets you render React views inside of it. 

A simple example is the bottom sheet pattern from Google’s Material Design. It’s a draggable view which peeks up from the bottom of the screen and is able to expand to take up the full screen. It renders subviews inside of the sheet, which can be interacted with when the sheet is expanded.

This guide focuses only on an Android native implementation and assumes a basic knowledge of Kotlin. When creating an application, it’s best to make sure all platforms have feature parity.

Bottom sheet functionality

Setting Up Your Project

If you already have a React Native project set up for Android with Kotlin and TypeScript, you’re ready to begin. If not, you can run react-native init NativeComponents --template react-native-template-typescript in your terminal to generate a project that is ready to go.

As part of the initial setup, you’ll need to add some Gradle dependencies to your project.

Modify the root build.gradle (android/build.gradle) to include these lines:

Make sure to substitute your current Kotlin version in the place of 1.3.61.

This will add all of the required libraries for the code used in the rest of this guide.

You should use fixed version numbers instead of + for actual development.

Creating a New Package Exposing the Native Component

To start, you need to create a new package that will expose the native component. Create a file called NativeComponentsReactPackage.kt.

Right now this doesn’t actually expose anything new, but you’ll add to the list of View Managers soon. After creating the new package, go to your Application class and add it to the list of packages.

Creating The Main View

A ViewGroupManager<T> can be thought of as a React Native version of ViewGroup from Android. It accepts any number of children provided, laying them out according to the constraints of the type T specified on the ViewGroupManager.

Create a file called ReactNativeBottomSheet.kt and a new

The basic methods you have to implement are getName() and createViewInstance().

name is what you’ll use to reference the native class from React Native.

createViewInstance is used to instantiate the native view and do initial setup.

Inflating Layouts Using XML

Before you create a real view to return, you need to set up a layout to inflate. You can set this up programmatically, but it’s much easier to inflate from an XML layout.

Here’s a fairly basic layout file that sets up some CoordinatorLayouts with behaviours for interacting with gestures. Add this to android/app/src/main/res/layout/bottom_sheet.xml.

The first child is where you’ll put all of the main content for the screen, and the second is where you’ll put the views you want inside BottomSheet. The behaviour is defined so that the second child can translate up from the bottom to cover the first child, making it appear like a bottom sheet.

Now that there is a layout created, you can go back to the createViewInstance method in ReactNativeBottomSheet.kt.

Referencing The New XML File

First, inflate the layout using the context provided from React Native. Then save references to the children for later use.

If you aren’t using Kotlin Synthetic Properties, you can do the same thing with container = findViewById(

For now, this is all you need to initialize the view and have a fully functional bottom sheet.

The only thing left to do in this class is to manage how the views passed from React Native are actually handled.

Handling Views Passed from React Native To Native Android

By overriding addView you can change where the views are placed in the native layout. The default implementation is to add any views provided as children to the main CoordinatorLayout. However, that won’t have the effect expected, as they’ll be siblings to the bottom sheet (the second child) you made in the layout.

Instead, don’t make use of super.addView(parent, child, index) (the default implementation), but manually add the views to the layout’s children by using the references stored earlier.

The basic idea followed is that the first child passed in is expected to be the main content of the screen, and the second child is the content that’s rendered inside of the bottom sheet. Do this by simply checking the current number of children on the container. If you already added a child, add the next child to the bottomSheet.

The way this logic is written, any views passed after the first one will be added to the bottom sheet. You’re designing this class to only accept two children, so you’ll make some modifications later.
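The routing rule described above can be modelled outside of Android to make the behaviour easier to see. The following TypeScript sketch is not the Kotlin addView override itself: SheetLayout and routeChild are hypothetical names, and plain strings stand in for views.

```typescript
// Hypothetical model of the two native containers; strings stand in for views.
interface SheetLayout {
  container: string[];   // main content area (first child of the layout)
  bottomSheet: string[]; // content rendered inside the bottom sheet
}

// Mirrors the rule described in the text: the first child goes to the main
// container, and every later child goes to the bottom sheet.
function routeChild(layout: SheetLayout, child: string): void {
  if (layout.container.length === 0) {
    layout.container.push(child);
  } else {
    layout.bottomSheet.push(child);
  }
}

const layout: SheetLayout = { container: [], bottomSheet: [] };
["main", "sheet", "extra"].forEach((child) => routeChild(layout, child));
// layout.container now holds "main"; layout.bottomSheet holds "sheet" and "extra".
```

The third child landing in the bottom sheet is exactly the loose behaviour that a two-child component would need to tighten.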

This is all you need for the first version of our bottom sheet. At this point, you can run react-native run-android, successfully compile the APK, and install it.

Referencing the New Native Component in React Native

To use the new native component in React Native you need to require it and export a normal React component. Also set up the props here, so it will properly accept a style and children.

Create a new component called BottomSheet.tsx in your React Native project and add the following:

Now you can update your basic App.tsx to include the new component.

This is all the code that is required to use the new native component. Notice that you're passing it two children. The first child is the content used for the main part of the screen, and the second child is rendered inside of our new native bottom sheet.

Adding Gestures

Now that there's a working native component that renders subviews from React Native, you can add some more functionality.

Being able to interact with the bottom sheet through gestures is our main use case for this component, but what if you want to programmatically collapse/expand the bottom sheet?

Since you’re using a CoordinatorLayout with a behaviour to make the bottom sheet in native code, you can make use of BottomSheetBehavior. Going back to ReactNativeBottomSheet.kt, update the createViewInstance() method.

By creating a BottomSheetBehavior you can customize how the bottom sheet functions and when you’re informed about state changes.

First, add a native method which specifies what the expanded state of the bottom sheet should be when it renders.

This adds a prop to our component called sheetState which takes a string and sets the collapsed/expanded state of the bottom sheet based on the value sent. The string sent should be either collapsed or expanded.

We can adapt our TypeScript to accept this new prop like so:

Now, when you include the component, you can change whether it’s collapsed or expanded without touching it. Here’s an example of updating your App.tsx to add a button that updates the bottom sheet state.

Now, when pressing the button, it expands the bottom sheet. However, when it’s expanded, the button disappears. If you drag the bottom sheet back down to a collapsed state, you'll notice that the button isn't updating its text. So you can set the state programmatically from React Native, but interacting with the native component isn't propagating the value of the bottom sheet's state back into React. To fix this, you will add more to the BottomSheetBehavior you created earlier.

This code adds a state change listener to the bottom sheet, so that when its collapsed/expanded state changes, you emit a React Native event that you listen to in the React component. The event is called "BottomSheetStateChange" and has the same value as the states accepted in setSheetState().

Back in the React component, you listen to the emitted event and call an optional listener prop to notify the parent that our state has changed due to a native interaction.
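The state handling on the React side can be sketched as plain TypeScript, independent of any react-native imports. This is a minimal sketch under assumptions: the names SheetState, parseSheetState, toggleSheetState and handleNativeStateChange are hypothetical, and the event payload is assumed to carry the state string directly. In a real component you would register the handler for the "BottomSheetStateChange" event; that wiring is left out here.

```typescript
// The two values the sheetState prop accepts, per the text above.
type SheetState = "collapsed" | "expanded";

// Validate an arbitrary event payload; anything unexpected becomes null.
function parseSheetState(payload: unknown): SheetState | null {
  if (payload === "collapsed") return "collapsed";
  if (payload === "expanded") return "expanded";
  return null;
}

// What a toggle button would request next.
function toggleSheetState(current: SheetState): SheetState {
  return current === "collapsed" ? "expanded" : "collapsed";
}

// Hypothetical handler: forwards valid native state changes to an
// optional listener prop and silently ignores unknown payloads.
function handleNativeStateChange(
  payload: unknown,
  onStateChange?: (state: SheetState) => void
): void {
  const state = parseSheetState(payload);
  if (state !== null && onStateChange !== undefined) {
    onStateChange(state);
  }
}

const received: SheetState[] = [];
handleNativeStateChange("expanded", (state) => { received.push(state); });
handleNativeStateChange("not-a-state", (state) => { received.push(state); });
// Only the valid "expanded" payload reaches the listener.
```

Keeping the listener optional matches the idea that the parent may or may not care about native-driven state changes.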

Update App.tsx again to use the new listener.

Now when you drag the bottom sheet, the state of the button updates with its collapsed/expanded state.

Native Code And Cross Platform Components

When creating components in React Native our goal is always to make cross-platform components that don’t require native code to perform well, but sometimes that isn’t possible or easy to do. By creating ViewGroupManager classes, we are able to extend the functionality of our native components so that we can take full advantage of React Native’s flexible layouts, with very little code required.

Additional Information

All the code included in the guide can be found at the react-native-bottom-sheet-example repo.

This guide is just an example of how to create native views that accept React Native subviews as children. If you want a complete implementation for bottom sheets on Android, check out the react-native wrapper for android BottomSheetBehavior.

You can follow the Android guidelines for CoordinatorLayout and BottomSheetBehavior to better understand what is going on. You’re essentially creating a container with two children.

If this sounds like the kind of problems you want to solve, we're always on the lookout for talent and we’d love to hear from you. Visit our Engineering career page to find out about our open positions.

Your Circuit Breaker is Misconfigured

Circuit breakers are an incredibly powerful tool for making your application resilient to service failure. But they aren’t enough. Most people don’t know that a slightly misconfigured circuit is as bad as no circuit at all! Did you know that a change in 1 or 2 parameters can take your system from running smoothly to completely failing?

I’ll show you how to predict how your application will behave in times of failure and how to configure every parameter for your circuit breaker.

At Shopify, resilient fallbacks integrate into every part of the application. A fallback is a backup behavior which activates when a particular component or service is down. For example, when Shopify’s Redis, that stores sessions, is down, the user doesn’t have to see an error. Instead, the problem is logged and the page renders with sessions soft disabled. This results in a much better customer experience. This behaviour is achievable in many cases, however, it’s not as simple as catching exceptions that are raised by a failing service.

Imagine Redis is down and every connection attempt is timing out. Each timeout is 2 seconds long. Response times will be incredibly slow, since requests are waiting for the service to time out. Additionally, during that time the request is doing nothing useful and will keep the thread busy.

Utilization, the percentage of a worker’s maximum available working capacity, increases indefinitely as the request queue builds up, resulting in a utilization graph like this:

Utilization during service outage

A worker which had a request processing rate of 5 requests per second now can only process half a request per second. That’s a tenfold decrease in throughput! With utilization this high, the service can be considered completely down. This is unacceptable for production level standards.
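The arithmetic behind that tenfold drop is short enough to write out. This sketch assumes a single-threaded worker and ignores the normal processing time of a request, so every request simply blocks for the full 2-second timeout; outageThroughput is a hypothetical helper name.

```typescript
// Requests per second a single-threaded worker can complete when every
// request blocks for the full timeout before failing.
function outageThroughput(timeoutSeconds: number): number {
  return 1 / timeoutSeconds;
}

const normalRate = 5;                     // requests per second before the outage
const outageRate = outageThroughput(2);   // 0.5 requests per second
const slowdown = normalRate / outageRate; // 10, the tenfold decrease
```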

Semian Circuit Breaker

At Shopify, this fallback utilization problem is solved by Semian Circuit Breaker. This is a circuit breaker implementation written by Shopify. The circuit breaker pattern is based on a simple observation: if a timeout is observed for a given service one or more times, it’s likely to continue to time out until that service recovers. Instead of hitting the timeout repeatedly, the resource is marked as dead and an exception is raised instantly on any call to it.

I'm looking at this from the configuration perspective of Semian circuit breaker, but another notable circuit breaker library is Hystrix by Netflix. The core functionality of their circuit breaker is the same. However, it has fewer parameters available for tuning, which means, as you will learn below, it can completely lose its effectiveness for capacity preservation.

A circuit breaker can take the above utilization graph and turn it into something more stable.

Utilization during service outage with a circuit breaker

The utilization climbs for some time before the circuit breaker opens. Once open, the utilization stabilizes, so the user may only experience some slight request delays, which is much better.

Semian Circuit Breaker Parameters

Configuring a circuit breaker isn’t a trivial task. It’s seemingly trivial because there are just a few parameters to tune: 

  • name
  • error_threshold
  • error_timeout
  • half_open_resource_timeout
  • success_threshold.

However, these parameters cannot just be assigned to arbitrary numbers or even best guesses without understanding how the system works in detail. Changes to any of these parameters can greatly affect the utilization of the worker during a service outage.

At the end, I'll show you a configuration change that drops the utilization requirement from 263% to 4%. That’s the difference between complete outage and a slight delay. But before I get to that, let’s dive into detail about what each parameter does and how it affects the circuit breaker.


Name

The name identifies the resource being protected. Each name gets its own circuit breaker. Every service type, such as MySQL, Redis, etc., should have its own unique name to ensure that excessive timeouts in a service only open the circuit for that service.

There is an additional aspect to consider here. The worker may be configured with multiple service instances for a single type. In certain environments, there can be dozens of Redis instances that a single worker can talk to.

We would never want a single Redis instance outage to cause all Redis connections to go down so we must give each instance a different name.

For this example, see the diagram below. We will model a total of 3 Redis instances. Each instance is given a name “redis_cache_#{instance_number}”.

3 Redis instances. Each instance is given a name “redis_cache_#{instance_number}”

You must understand how many services your worker can talk to. Each failing service will have an aggregating effect on the overall utilization. When going through the examples below, the maximum number of failing services you would like to account for is defined by failing_services. For example, if you have 3 Redis instances, but you only need to know the utilization when 2 of those go down, failing_services should be 2.

All examples and diagrams in this post are from the reference frame of a single worker. None of the circuit breaker state is shared across workers so we can simplify things this way.

Error Threshold

The error_threshold defines the number of errors to encounter within a given timespan before opening the circuit and starting to reject requests instantly. If the circuit is closed and error_threshold number of errors occur within a window of error_timeout, the circuit will open.

The larger the error_threshold, the longer the worker will be stuck waiting for input/output (I/O) before reaching the open state. The following diagram models a simple scenario where we have a single Redis instance failure.

error_threshold = 3, failing_services = 1

A single Redis instance failure

3 timeouts happen one after the other for the failing service instance. After the third, the circuit becomes open and all further requests raise instantly.

3 timeouts must occur during the timespan before the circuit becomes open. Simple enough, 3 times the timeout isn’t so bad. The utilization will spike, but the service will reach steady state soon after. This graph is a real world example of this spike at Shopify:

A real world example of a utilization spike at Shopify

The utilization begins to increase when the Redis services go down. After a few minutes, the circuits begin opening for each failing service and the utilization lowers to a steady state.

Furthermore, if there’s more than 1 failing service instance, the spike will be larger, last longer, and cause more delays for end users. Let’s come back to the example from the Name section with 3 separate Redis instances. Consider all 3 Redis instances being down. Suddenly the time until all circuits are open triples.

error_threshold = 3, failing_services = 3

3 failing services and each service has 3 timeouts before the circuit opens

There are 3 failing services and each service has 3 timeouts before the circuit opens. All the circuits must become open before the worker will stop being blocked by I/O.

Now it takes longer to reach steady state, because each circuit breaker wastes utilization waiting for timeouts. Imagine 40 Redis instances instead of 3: with a timeout of 1 second and an error_threshold of 3, it takes a minimum of 40 × 3 × 1 = 120 seconds, or 2 minutes, to open all the circuits.

The reason this estimate is a minimum is because the order that the requests come in cannot be guaranteed. The above diagram simplifies the scenario by assuming the requests come in a perfect order.
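The lower bound can be written as a one-line calculation. This is a sketch under the same simplification as the diagram (requests arrive in a perfect order and a single thread waits on each timeout in turn); minSecondsToOpenAllCircuits is a hypothetical helper, and the 1-second timeout for the 3-instance case is an assumption made for illustration.

```typescript
// Minimum time before every circuit is open: each failing service must
// accumulate error_threshold timeouts, one after the other.
function minSecondsToOpenAllCircuits(
  failingServices: number,
  errorThreshold: number,
  timeoutSeconds: number
): number {
  return failingServices * errorThreshold * timeoutSeconds;
}

const threeInstances = minSecondsToOpenAllCircuits(3, 3, 1);  // 9 seconds
const fortyInstances = minSecondsToOpenAllCircuits(40, 3, 1); // 120 seconds, about 2 minutes
```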

To keep the initial utilization spike low, the error_threshold should be reduced as much as possible. However, the probability of false-positives must be considered. Blips can cause the circuit to open despite the service not being down. The lower the error_threshold, the higher the probability of a false-positive circuit open.

Assume the steady-state timeout error rate is 0.1% within your error_timeout window. If errors are independent, an error_threshold of 3 gives you a 0.0000001% chance of a false positive.

$$100 \times (\text{probability\_of\_failure})^{\text{number\_of\_failures}} = 100 \times (0.001)^{3} = 0.0000001\ (\%)$$
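The false-positive estimate is simple enough to check numerically, and doing so makes the trade-off against a lower error_threshold visible. This sketch assumes errors are independent of each other; falsePositivePercent is a hypothetical helper name.

```typescript
// Chance, in percent, that `errorThreshold` independent requests all fail
// when each one fails with probability `errorRate` (a fraction, not a percent).
function falsePositivePercent(errorRate: number, errorThreshold: number): number {
  return 100 * Math.pow(errorRate, errorThreshold);
}

const withThresholdOne = falsePositivePercent(0.001, 1);   // about 0.1 percent
const withThresholdThree = falsePositivePercent(0.001, 3); // about 0.0000001 percent
```

Each extra error required multiplies the false-positive chance by the error rate, which is why small thresholds are so much more sensitive to blips.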

You must balance this probability against your error_timeout to reduce the number of false-positive circuit opens. When the circuit opens, it raises instantly for every request made during error_timeout.

Error Timeout

The error_timeout is the amount of time until the circuit breaker will try to query the resource again. It also determines the period over which the error_threshold count is measured. The larger this value is, the longer the circuit takes to recover after an outage, and the longer a false-positive circuit open affects the system.

error_threshold = 3, failing_services = 1

The circuit will stay open for error_timeout amount of time

After the failing service causes the circuit to become open, the circuit will stay open for error_timeout amount of time. The Redis instance comes back to life and error_timeout amount of time passes so requests start sending to Redis again.

It’s important to consider the error_timeout in relation to half_open_resource_timeout. These 2 parameters are the most important for your configuration. Getting these right will determine the success of the circuit breaker's resiliency mechanism in times of outage for your application.

Generally we want to minimize the error_timeout because the higher it is, the higher the recovery time. However, the primary constraints come from its interaction with these parameters. I’ll show you that maximizing error_timeout will actually preserve worker utilization.

Half Open Resource Timeout

The circuit is in half-open state when it’s checking to see if the service is back online. It does this by letting a real request through. The circuit becomes half-open after error_timeout amount of time has passed. When the operating service is completely down for an extended period of time, a steady-state behavior arises.

failing_services = 1

Circuit becomes half-open after error_timeout amount of time has passed

The error_timeout expires but the service is still down. The circuit becomes half-open and a request is sent to Redis, which times out. The circuit opens again and the process repeats as long as Redis remains down.

This flip flop between the open and half-open state is periodic which means we can deterministically predict how much time is wasted on timeouts.

By this point, you may already be speculating on how to adjust wasted utilization. The error_timeout can be increased to reduce the total time wasted in the half-open state! Awesome — but the higher it goes, the slower your application will be to recover. Furthermore, false positives will keep the circuit open for longer. Not good, especially if we have many service instances. 40 Redis instances with a timeout of 1 second is 40 seconds every cycle wasted on timeouts!

So how else do we minimize the time wasted on timeouts? The only other option is to reduce the service timeout. The lower the service timeout, the less time is wasted on waiting for timeouts. However, this cannot always be done. Adjusting this timeout is highly dependent on how long the service needs to provide the requested data. We have a fundamental problem here. We cannot reduce the service timeout because of application constraints and we cannot increase the error_timeout because the recovery time will be too slow.

Enter half_open_resource_timeout, the timeout for the resource when the circuit is in the half-open state. It gets used instead of the original timeout. Simple enough! Now, we have another tunable parameter to help adjust utilization. To reduce wasted utilization, error_timeout and half_open_resource_timeout can be tuned. The smaller half_open_resource_timeout is relative to error_timeout, the better the utilization will be.

If we have 3 failing services, our circuit diagram looks something like this:

failing_services = 3

A total of 3 timeouts before all the circuits are open

In the half-open state, each service has 1 timeout before the circuit opens. With 3 failing services, that’s a total of 3 timeouts before all the circuits are open. All the circuits must become open before the worker will stop being blocked by I/O.

Let’s solidify this example with the following timeout parameters:

error_timeout = 5 seconds
half_open_resource_timeout = 1 second

With 3 failing services, the total steady-state period is 8 seconds: 5 of those seconds are spent doing useful work and the other 3 (3 × 1 second) are wasted waiting for I/O. That’s 3/8, or 37.5%, of total utilization wasted on I/O.
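The 8-second cycle can be reproduced with a few lines of arithmetic. This is a sketch assuming a single thread and one half-open probe per failing service per cycle, as in the diagram; steadyStateWaste and its field names are hypothetical.

```typescript
interface SteadyStateWaste {
  wastedSeconds: number;  // time spent waiting on half-open probes per cycle
  periodSeconds: number;  // full open / half-open cycle length
  wastedFraction: number; // share of the cycle lost to I/O
}

// One cycle: error_timeout of useful work, then one half-open timeout
// for each failing service.
function steadyStateWaste(
  failingServices: number,
  halfOpenResourceTimeout: number,
  errorTimeout: number
): SteadyStateWaste {
  const wastedSeconds = failingServices * halfOpenResourceTimeout;
  const periodSeconds = errorTimeout + wastedSeconds;
  return { wastedSeconds, periodSeconds, wastedFraction: wastedSeconds / periodSeconds };
}

const cycle = steadyStateWaste(3, 1, 5);
// cycle.wastedSeconds is 3, cycle.periodSeconds is 8, cycle.wastedFraction is 0.375
```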

Note: Hystrix does not have an equivalent parameter for half_open_resource_timeout which may make it impossible to tune a usable steady state for applications that have a high number of failing_services.

Success Threshold

The success_threshold is the number of consecutive successes required for the circuit to close again, that is, to start accepting all requests to the circuit.

The success_threshold impacts the behavior during outages which have an error rate of less than 100%. Imagine a resource error rate of 90%, with a success_threshold of 1, the circuit will flip flop between open and closed quite often. In this case there’s a 10% chance of it closing when it shouldn’t. Flip flopping also adds additional strain on the system since the circuit must spend time on I/O to re-close.

Instead, if we increase the success_threshold to 3, the likelihood of a premature close becomes significantly lower. Now 3 successes must happen in a row to close the circuit, reducing the chance of a flip flop to 0.1% per cycle (0.1 × 0.1 × 0.1 = 0.001).

Note: Hystrix does not have an equivalent parameter for success_threshold which may make it difficult to reduce the flip flopping in times of partial outage for certain applications.
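To see how error_threshold, error_timeout and success_threshold interact, it can help to model the circuit as a small state machine. The following is a simplified sketch, not Semian's implementation: CircuitModel and its method names are hypothetical, time is passed in explicitly as seconds, and half_open_resource_timeout is left out because the model does not perform real I/O.

```typescript
type CircuitState = "closed" | "open" | "half_open";

interface CircuitOptions {
  errorThreshold: number;   // errors within errorTimeout that open the circuit
  errorTimeout: number;     // seconds: error window and time spent open
  successThreshold: number; // consecutive half-open successes needed to close
}

class CircuitModel {
  private readonly options: CircuitOptions;
  private state: CircuitState = "closed";
  private errorTimes: number[] = [];
  private openedAt = 0;
  private successes = 0;

  constructor(options: CircuitOptions) {
    this.options = options;
  }

  getState(): CircuitState {
    return this.state;
  }

  // Returns true if a request may be attempted at time `now`.
  allowRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.options.errorTimeout) {
      this.state = "half_open"; // error_timeout elapsed: let a probe through
      this.successes = 0;
    }
    return this.state !== "open";
  }

  recordError(now: number): void {
    if (this.state === "half_open") {
      this.open(now); // a single failed probe re-opens the circuit
      return;
    }
    // Only errors inside the error_timeout window count towards the threshold.
    this.errorTimes = this.errorTimes.filter((t) => now - t < this.options.errorTimeout);
    this.errorTimes.push(now);
    if (this.errorTimes.length >= this.options.errorThreshold) {
      this.open(now);
    }
  }

  recordSuccess(): void {
    if (this.state !== "half_open") return;
    this.successes += 1;
    if (this.successes >= this.options.successThreshold) {
      this.state = "closed";
      this.errorTimes = [];
    }
  }

  private open(now: number): void {
    this.state = "open";
    this.openedAt = now;
    this.errorTimes = [];
  }
}

const breaker = new CircuitModel({ errorThreshold: 3, errorTimeout: 5, successThreshold: 2 });
const trace: CircuitState[] = [];
breaker.recordError(0);
breaker.recordError(1);
breaker.recordError(2);         // third error inside the 5 second window
trace.push(breaker.getState()); // open
breaker.allowRequest(3);        // still inside error_timeout: rejected
breaker.allowRequest(7);        // 5 seconds after opening: probe allowed
trace.push(breaker.getState()); // half_open
breaker.recordSuccess();
breaker.recordSuccess();        // second consecutive success closes the circuit
trace.push(breaker.getState()); // closed
```

Passing the clock in as an argument keeps the model deterministic, which makes it easy to replay the scenarios from the diagrams.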

Lowering Wasted Utilization

Each parameter affects wasted utilization in some way. Semian can easily be configured into a state where a service outage will consume more utilization than the capacity allows. To calculate the additional utilization required, I have put together an equation to model all of the parameters of the circuit breaker. Use it to plan your outage effectively.

The Circuit Breaker Equation

[Equation image: The Circuit Breaker Equation]

This equation applies to the steady state failure scenario in the last diagram where the circuit is continuously checking the half-open state. Additional threads reduce the time spent on blocking I/O, however, the equation doesn’t account for the time it takes to context switch a thread which could be significant depending on the application. The larger the context switch time, the lower the thread count should be.

I ran a live test to validate the equation, and the observed utilization closely matched the utilization predicted by the equation.

Tuning Your Circuit

Let’s run through an example and see how the parameters can be tuned to match the application needs. In this example, I’m integrating a circuit breaker for a Rails worker configured with 2 threads. We have 42 Redis instances, each configured with its own circuit and a service timeout of 0.25s.

As a starting point, let’s go with the following parameters. Failing instances is 42 because we are judging behaviour in the worst case, when all of the Redis instances are down.

Parameter                     Value
service timeout               0.25 seconds
error_timeout                 2 seconds
half_open_resource_timeout    0.25 seconds (same as service timeout)

Plugging into The Circuit Breaker Equation, we require an extra utilization of 263%. Unacceptable! Ideally we should have something less than 30% to account for regular traffic variation.

So what do we change to drop this number?

From production metrics, I know 99% of Redis requests have a response time of less than 50ms. With a value this low, we can easily drop the half_open_resource_timeout to 50ms and still be confident that the circuit will close when Redis comes back up from an outage. Additionally, we can increase the error_timeout to 30 seconds. This means a slower recovery time but it reduces the worst case utilization.

With these new numbers, the additional utilization required drops to 4%!
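The equation image is not reproduced in this text. One formula that reproduces both figures quoted in this example is failing_services × half_open_resource_timeout / (error_timeout × threads), so the sketch below uses it. Treat it as an inference from those two numbers rather than the author's exact equation; extraUtilization and CircuitTuning are hypothetical names. Note that it expresses wasted time relative to useful time, which is why it can exceed 100%.

```typescript
interface CircuitTuning {
  failingServices: number;
  halfOpenResourceTimeout: number; // seconds
  errorTimeout: number;            // seconds
  threads: number;
}

// Extra utilization required in steady state, as a fraction (1.0 = 100%).
// Assumed formula, inferred from the figures quoted in the text.
function extraUtilization(tuning: CircuitTuning): number {
  const wastedPerCycle = tuning.failingServices * tuning.halfOpenResourceTimeout;
  return wastedPerCycle / (tuning.errorTimeout * tuning.threads);
}

const initialExtra = extraUtilization({
  failingServices: 42,
  halfOpenResourceTimeout: 0.25,
  errorTimeout: 2,
  threads: 2,
}); // 2.625, roughly the 263% quoted above

const tunedExtra = extraUtilization({
  failingServices: 42,
  halfOpenResourceTimeout: 0.05,
  errorTimeout: 30,
  threads: 2,
}); // about 0.035, roughly the 4% quoted above
```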

I use this equation as something concrete to relate back to when making tuning decisions. I hope this equation helps you with your circuit breaker configuration as it does with mine.

Author's Edit: "I fixed an error with the original circuit breaker equation in this post. success_threshold does not have an impact on the steady state utilization because it only takes 1 error to keep the circuit open again."

If this sounds like the kind of problems you want to solve, we're always on the lookout for talent and we’d love to hear from you. Visit our Engineering career page to find out about our open positions.

Great Code Reviews—The Superpower Your Team Needs

There is a general consensus that code reviews are an important aspect of highly effective teams. This research paper is one of many exploring this subject. Most organizations undergo code reviews of some form.

However, it’s all too common to see code reviews that barely scratch the surface, or that offer feedback that is unclear or hard to act upon. This robs the team of the opportunity to speed up learning, share knowledge and context, and raise the quality bar on the resulting code.

At Shopify, we want to move fast while building for the long term. In our experience, having strong code review practices has a huge impact on the growth of our engineers and in the quality of the products we build.

A Scary Scenario

Imagine you join a new team and you’re given a coding task to work on. Since you’re new on the team, you really want to show what you’re made of. You want to perform. So, this is what you do:

  1. You work frantically on your task for 3 weeks.
  2. You submit a Pull Request for review with about 1000 new lines of code.
  3. You get a couple comments about code style and a question that shows the person has no clue what this work is about.
  4. You get approval from both reviewers after fixing the code style and answering the question.
  5. You merge your branch into master, eyes closed, shoulders tense, grinding your teeth. After a few minutes, CI completes. Master is not broken. Yet.
  6. You live in fear for 6 months, not knowing when and how your code will break.

You may have lived through some of the situations above, and hopefully you’ve seen some of the red flags in that process.

Let’s talk about how we can make it much better.

Practical Code Review Practices

At Shopify, we value the speed of shipping, learning, and building for the long term. These values - which sometimes conflict - lead us to experiment with many techniques and team dynamics. In this article, I have distilled a series of very practical techniques we use at Shopify to ship valuable code that can stand the test of time.

A note about terminology: We refer to a Pull Request (PR) as one unit of work that's put forth for review before merging into the base branch. GitHub and Bitbucket users will be familiar with this term.

1. Keep Your Pull Requests Small

As simple as this sounds, this is easily the most impactful technique you can follow to level up your code review workflow. There are 2 fundamental reasons why this works:

  • It’s mentally easier to start and complete a review for a small piece. Larger PRs will naturally make reviewers delay and procrastinate examining the work, and they are more likely to be interrupted mid-review.
  • As a reviewer, it’s exponentially harder to dive deep if the PR is long. The more code there is to examine, the bigger the mental map we need to build to understand the whole piece.

Breaking up your work in smaller chunks increases your chances of getting faster and deeper reviews.

Now, it’s impossible to set one universal standard that applies to all programming languages and all types of work. Internally, for our data engineering work, the guideline is around 200-300 lines of code affected. If we go above this threshold, we almost always break up the work into smaller blocks.

Of course, we need to be careful about breaking up PRs into chunks that are too small, since this means reviewers may need to inspect several PRs to understand the overall picture.

2. Use Draft PRs

Have you heard the metaphor of building a car vs. drawing a car? It goes something like this:

  1. You’re asked to build a car.
  2. You go away for 6 months and build a beautiful Porsche.
  3. When you show it to your users, they ask about space for their 5 children and the surf boards.

Clearly, the problem here is that the goal is poorly defined and the team jumped directly into the solution before gathering enough feedback. If after step 1 we created a drawing of the car and showed it to our users, they would have asked the same questions and we would have discovered their expectations and saved ourselves 6 months of work. Software is no different: we can make the same mistake and work for a long time on a feature or module that isn't what our users need.

At Shopify, it’s common practice to use Work In Progress (WIP) PRs to elicit early feedback whose goal is validating direction (choice of algorithm, design, API, etc). Early changes mean less wasted effort on details, polish, documentation, etc.

As an author, this means you need to be open to changing the direction of your work. At Shopify, we try to embrace the principle of strong opinions, loosely held. We want people to make decisions confidently, but also be open to learning new and better alternatives, given sufficient evidence. In practice, we use GitHub’s Draft PRs: they clearly signal the work is still in flow, and GitHub prevents you from merging a Draft PR. Other tools may have similar functionality, but at the very least you can create normal PRs with a clear WIP label to indicate the work is at an early stage. This will help your reviewers focus on offering the right type of feedback.

3. One PR Per Concern

In addition to line count, another dimension to consider is how many concerns your unit of work is trying to address. A concern may be a feature, a bugfix, a dependency upgrade, an API change, etc. Are you introducing a new feature while refactoring at the same time? Fixing two bugs in one shot? Introducing a library upgrade and a new service?

Breaking down PRs into individual concerns has the following effects:

  • More independent review units and therefore better review quality
  • Fewer affected people, therefore fewer domains of expertise to gather
  • Atomicity of rollbacks: the ability to roll back a small commit or PR. This is valuable because if something goes wrong, it will be easier to identify where errors were introduced and what to roll back.
  • Separating easy stuff from hard stuff. Imagine a new feature that requires refactoring a frequently used API. You change the API, update a dozen call-sites, and then implement your feature. 80% of your changes are obvious and skimmable with no functional changes, while 20% are new code that needs careful attention to test coverage, intended behaviour, error handling, etc. and will likely go through multiple revisions. With each revision, the reviewer will need to skim through all of the changes to find the relevant bits. By splitting this in two PRs, it becomes easy to quickly land the majority of the work and to optimize the review effort applied to the harder work.

If you end up with a PR that includes more than one concern, you can break it down into individual chunks. Doing so will accelerate the iteration cycle on each individual review, giving a faster review overall. Often part of the work can land quickly, avoiding code rot and merge conflicts.

Breaking down PRs into individual concerns

In the example above, we’ve taken a PR that covered three different concerns and broken it up. You can see how each reviewer has strictly less context to go over. Best of all, as soon as any of the reviews is complete, the author can begin addressing feedback while continuing to wait for the rest of the work. In the most extreme cases, instead of completing a first draft, waiting several days (and shifting focus), and then eventually returning to address feedback, the author can work almost continuously on their family of PRs as they receive the different reviews asynchronously.

4. Focus on the Code, Not the Person

The “focus on the code, not the person” practice is about communication styles and relationships between people. Fundamentally, it’s about keeping the focus on making the product better, so that the author doesn’t perceive a review as personal criticism.

Here are some tips you can follow:

  • As a reviewer, think, “This is our code, how can we improve on it?”
  • Offer positive remarks! If you see something done well, comment on it. This reinforces good work and helps the author balance suggestions for improvement.
  • As an author, assume best intention, and don’t take comments personally.

Below are a few examples of not-so-great review comments, and a suggestion on how we can reword to emphasize the tips above.

Less of these: Move this to Markdown
More of these: How about moving this documentation into our Markdown README file? That way we can more easily share with other users.

Less of these: Read the Google Python style guidelines
More of these: We should avoid single-character variables. How about board_size or size instead?

Less of these: This feels too slow. Make it faster. Lightning fast.
More of these: This algorithm is very easy to read but I’m concerned about performance. Let’s test this with a large dataset to gauge its efficiency.

Less of these: Bool or int?
More of these: Why did you choose a list of bool values instead of integers?

Ultimately, a code review is a learning and teaching opportunity and should be celebrated as such.

5. Pick the Right People to Review

It’s often challenging to decide who should review your work. Here are some questions you can use as guidance:

  • Who has context on the feature or component you’re building?
  • Who has strong skills in the language, framework, or tool you’re using?
  • Who has strong opinions on the subject?
  • Who cares about the result of what you’re doing?
  • Who should learn this stuff? Or if you’re a junior reviewing someone more senior, use this as an opportunity to ask questions and learn. Ask all the silly questions; a strong team will find the time to share knowledge.

Whatever rules your team might have, remember that it is your responsibility as an author to seek and receive a high-quality code review from a person or people with the right context.

6. Give Your Reviewers a Map

Last but definitely not least, the description on your PR is crucial. Depending on who you picked for review, different people will have different context. The onus is on the author to help reviewers by providing key information or links to more context so they can produce meaningful feedback.

Some questions you can include in your PR templates:

  • Why is this PR necessary?
  • Who benefits from this?
  • What could go wrong?
  • What other approaches did you consider? Why did you decide on this approach?
  • What other systems does this affect?

Good code is not only bug-free; it is also useful! As an author, ensure that your PR description ties your code back to your team’s objectives, ideally with a link to a feature or bug description in your backlog. As a reviewer, start with the PR description; if it’s incomplete, send it back before attempting to judge the suitability of the code against undefined objectives. And remember, sometimes the best outcome of a code review is to realize that the code isn’t needed at all!

What’s the Benefit?

By adopting some of the techniques above, you can have a strong impact on the speed and quality of your software building process. But beyond that, there’s the potential for a cultural effect:

  • Teams will build a common understanding. The group understands your work better and you’re not the only person capable of evolving any one area of the codebase.
  • Teams will adopt a sense of shared responsibility. If something breaks, it’s not one person’s code that needs fixing. It’s the team’s work that needs fixing.

Any one person in a team should be able to take a holiday and disconnect from work for a number of days without risking the business or stressing about checking email to make sure the world didn’t end.

What Can I Do to Improve My Team’s Code Review Process?

If you lead teams, start experimenting with these techniques and find what works for your team.

If you’re an individual contributor, talk with your lead about why you think code review techniques are important, and how they improve your team’s effectiveness.

Bring this up at your next 1:1 or your next team sync.

The Importance of Code Reviews

To close, I’ll share some words from my lead, which summarize the importance of code reviews:

“We could prioritize landing mediocre but working code in the short term, and we will write the same debt-ridden code forever, or we can prioritize making you a stronger contributor, and all of your future contributions will be better (and your career brighter).

An enlightened author should be delighted to have this attention.”

We're always on the lookout for talent and we’d love to hear from you. Please take a look at our open positions on the Data Science & Engineering career page.

Bug Bounty Year in Review 2019

For the third year in a row, we’ve taken time to reflect on our Bug Bounty program. This past year was an exciting one for us because we ran multiple experiments and made a number of process improvements to increase our program speed. 

2020 Program Improvements

Building on our program’s continued success in 2019, we’re excited to announce more improvements. 

Bounties Paid in Full Within 7 Days

As of today, we pay bounties in full within 7 days of a report being triaged. Paying our program minimum on triage has been a resounding success for us and our hackers. After having experimented with paying full bounties on triage in Shopify-Experiments (described below), we’ve decided to make the same change to our public program.

Maximum Bounty is Now $50,000

We are increasing our maximum bounty amount to $50,000. Beginning today, we are doubling the bounty amounts for valid reports of Arbitrary Code Execution (now $20K–$50K), SQL Injection (now $20K–$40K), and Privilege Escalation to Shop Owner (now $10K–$30K). Trust and security is our number one priority at Shopify and these new amounts demonstrate our commitment to both.

Surfacing More Information About Duplicate Reports

Finally, we know how important it is for hackers to trust the programs they choose to work with. We value that trust. So, beginning today, anyone who files a duplicate report to our program will be added to the original report, when it exists within HackerOne. We're continuing to explore ways to share information about internally known issues with hackers and hope to have a similar announcement later this year.

Learning from Bug Bounty Peers

Towards the end of 2018, we reached out to other bug bounty programs to share experiences and lessons learned. This was amazing. We learned so much chatting with our peers and those conversations gave us better insight into improving our data analytics and experimenting with a private program.

Improving Our Analytics

At Shopify, we make data-informed decisions and our bug bounty program is no exception. However, HackerOne platform data only gives us insight into what hackers are reporting and when; it doesn’t tell us who is testing what and how often. Discussing this problem with other programs revealed how some had already tackled this obstacle; they were leveraging provisioned accounts to understand their program funnel, from invitation, to registration, to account creation, and finally testing. Hearing this, we realized we could do the same.

To participate in our bug bounty program, we have always required hackers to register for an account with a specific identifier (currently an email address). Historically, we used that registration requirement for investigating reports of suspicious platform activity. However, we realized that the same data could tell us how often people are testing our applications. Furthermore, with improvements to the HackerOne API and the ability to export all of our report data regularly, we have all the data necessary to create exciting activity reports and program trends. It’s also given us more knowledge to share in our monthly program recap tweets.

Shopify-Experiments, A Private Bug Bounty Program

Chatting with other programs, we also shared ideas about what is and isn’t working. We heard about some having success running additional private programs. Naturally, we launched a private bug bounty program to test the return on investment. We started Shopify-Experiments in mid-2019 and invited high signal, high impact hackers who have reported to our program previously or who have a proven track record on the HackerOne platform. The program allowed us to run controlled experiments aimed at improving our public program. For example, in 2019, we experimented with:

  • expanding the scope to help us better understand the workload implications
  • paying bounties in full after validating and triaging a report
  • making report disclosure mandatory and adding hackers to duplicate reports
  • allowing for self-closing reports that were submitted in good faith, but were false positives
  • increasing opportunities to collaborate with Shopify third party developers to test their apps.

These experiments had immediate benefits for our Application Security Team and the Shopify public program. For instance, after running a controlled experiment with an expanded scope, we understood the workload it would entail in our public program. So, on September 11, 2019, we added all but a few Shopify-developed apps into the scope of our public program. Since then, we’ve received great reports about these new assets, such as Report 740989 from Vulnh0lic, which identified a misconfiguration in our OAuth implementation for the Shopify Stocky app. If you’re interested in being added to the program, all it takes is 3 resolved Shopify reports with an overall signal of 3.0 or more in our program.

Improving Response Times with Automation

In 2018, our average initial response time was 17 hours. In 2019, we wanted to do better. Since we use a dedicated Slack channel to manage incoming reports, it made sense to develop a chatbot and use the HackerOne API. In January last year, we implemented HackerOne API calls to change report states, assign reports, post public and private comments as well as suggest bounty amounts.

This immediately made it easier to respond to reports from mobile devices. However, our chosen syntax was difficult to remember. For example, changing a report state was done via the command hackerone change_state <report_id> <state>. Responding with an auto response was hackerone auto_respond <report_id> <state> <response_id>. To make things easier, we introduced shorthands and emoji responses. Now, instead of typing hackerone change_state 123456 not-applicable, we can use h1 change_state 123456 na. For common invalid reports, we react with emojis which post the appropriate common response and close the report as not applicable.
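
To make the shorthand idea concrete, here is a minimal sketch of how a bot could expand the short form back into the full command before handling it. Only the h1 and na aliases come from the example above; the alias tables, the function name, and the parsing rules are assumptions for illustration, not the actual implementation of our chatbot.

```python
# Hypothetical alias tables; only "h1" and "na" appear in the text above.
COMMAND_ALIASES = {"h1": "hackerone"}
STATE_ALIASES = {"na": "not-applicable"}


def expand_shorthand(message: str) -> str:
    """Expand a shorthand chat command into its full form."""
    tokens = message.split()
    if not tokens:
        return message
    # The first token is the command prefix, e.g. "h1" -> "hackerone".
    tokens[0] = COMMAND_ALIASES.get(tokens[0], tokens[0])
    # For change_state, the fourth token is the report state.
    if len(tokens) == 4 and tokens[1] == "change_state":
        tokens[3] = STATE_ALIASES.get(tokens[3], tokens[3])
    return " ".join(tokens)


print(expand_shorthand("h1 change_state 123456 na"))
# prints: hackerone change_state 123456 not-applicable
```

Expanding aliases up front means the rest of the bot only ever has to understand the long-form commands.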

2019 Bug Bounty Statistics

Knowing how important communication is to our hackers, we continue to pride ourselves on all of our response metrics being among the best on HackerOne. For another straight year, we reduced our communication times. Including weekends, our average time to first response was 16 hours compared to 1 day and 9 hours in 2018. This was largely a result of being able to quickly close invalid reports on weekends with Slack. We reduced our average time to triage from 3 days and 6 hours in 2018 to 2 days and 13 hours in 2019.

We were quicker to pay bounties and resolve bugs; our average time to bounty from submission was 7 days and 1 hour in 2019 versus 14 days in 2018. Our average resolution time from time of triage was down to 20 days and 3 hours from 48 days and 15 hours in 2018. Lastly, we thanked 88 hackers in 2019, compared to 86 in 2018.
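
For a sense of scale, the year-over-year changes quoted above can be converted to hours and expressed as percentage reductions. The figures in this sketch are the ones stated in the text; the helper names are just for illustration.

```python
def hours(days: int = 0, hrs: int = 0) -> int:
    """Convert a days-and-hours duration into hours."""
    return days * 24 + hrs


def percent_reduction(before: int, after: int) -> float:
    """Percentage by which `after` is lower than `before`."""
    return round((before - after) / before * 100, 1)


# (2018, 2019) averages quoted above, in hours.
metrics = {
    "first response": (hours(1, 9), hours(0, 16)),
    "triage": (hours(3, 6), hours(2, 13)),
    "bounty from submission": (hours(14), hours(7, 1)),
    "resolution from triage": (hours(48, 15), hours(20, 3)),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {percent_reduction(before, after)}% less time")
# first response: 51.5% less time
# triage: 21.8% less time
# bounty from submission: 49.7% less time
# resolution from triage: 58.6% less time
```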

Average Shopify Response Times - Hours vs. Years

We continued to request disclosure on our resolved bugs. In 2019, we disclosed 74 bugs, up from 37 in 2018. We continue to believe it’s extremely important that we build a resource library to enable ethical hackers to grow in our program. We strongly encourage other companies to do the same.

Reports Disclosed - Number of Reports vs. Year

In contrast to our speed improvements and disclosures, our bounty related statistics were down from 2018, largely a result of having hosted H1-514 in October 2018, which paid out over $130,000 to hackers. Our total amount paid to hackers was down to $126,100 versus $296,400 in 2018, despite having received approximately the same number of reports: 1,379 in 2019 compared to 1,306 in 2018.

Bounties Paid - Bounties Awarded vs. Year

Number of Reports by Year - Number of Reports vs. Year

Report States by Year - Number of Reports vs. Year

Similarly, our average bounty awarded was also down in 2019, $1,139 compared to $2,052 in 2018. This is partly attributed to the amazing bugs found at H1-514 in October 2018 and our decision to merge the Shopify Scripts bounty program, which had a minimum bounty of $100, into our core bounty program in 2019. We awarded bounties to fewer reports: 107 in 2019 versus 182 in 2018.

After another successful year in 2019, we’re excited to work with more hackers in 2020. If you’re interested in helping to make commerce more secure, visit to start hacking or our careers page to check out our open Trust and Security positions.

Happy Hacking.
- Shopify Trust and Security

React Native is the Future of Mobile at Shopify

After years of native mobile development, we’ve decided to go full steam ahead building all of our new mobile apps using React Native. As I’ll explain, that decision doesn’t come lightly.

Each quarter, the majority of buyers purchase on mobile (with 71% of our buyers purchasing on mobile in Q3 of last year). Black Friday and Cyber Monday (together, BFCM) are the busiest time of year for our merchants, and buying activity during those days is a bellwether. During this year’s BFCM, Shopify merchants saw another 3% increase in purchases on mobile, an average of 69% of sales.

So why the switch to React Native? And why now? How does this fit in with our native mobile development? It’s a complicated answer that’s best served with a little background.

Mobile at Shopify Pre-2019

We have an engineering culture at Shopify of making specific early technology bets that help us move fast.

On the whole, we prefer to have few technologies as a foundation for engineering. This provides us multiple points of leverage:

  • we build extremely specific expertise in a small set of deep technologies (we often become core contributors)
  • every technology choice has quirks, but we learn them intimately
  • those outside of the initial team contribute, transfer and maintain code written by others
  • new people are onboarded more quickly.

At the same time, there are always new technologies emerging that provide us with an opportunity for a step change in productivity or capability. We experiment a lot for the opportunity to unlock order-of-magnitude improvements, but ultimately we adopt few of these for our core engineering.

When we do adopt these early languages or frameworks, we make a calculated bet. Instead of shying away from the risk, we meticulously research, explore, and evaluate such risks based on our unique set of conditions. Risky areas are often where the unexplored opportunities are hidden, so we think about how we can mitigate the risk:

  • what if a technology stops being supported by the core team?
  • what if we run into a bug we can’t fix?
  • what if the product goes in a direction against our interests?

Ruby on Rails was a nascent and obscure framework when Tobi (our CEO) first got involved as a core contributor in 2004. For years, Ruby on Rails has been seen as a non-serious, non-performant choice. But that early bet gave Shopify the momentum to outperform the competition even though it was not a popular technology choice. By using Ruby on Rails, the team was able to build faster and attract a different set of talent by using something more modern and with a higher level of abstraction than traditional programming languages and frameworks. Paul Graham talks about his decision to use Lisp in building Viaweb to similar effect, and 6 of the 10 most valuable Y Combinator companies today use Ruby on Rails (even though, again, it remains largely unpopular). In contrast, none of the 10 most valuable Y Combinator companies use Java, which is largely considered the battle-tested enterprise language.

Similarly, two years ago, Shopify decided to make the jump to Google Cloud. Again, this was a scary proposition for the 3rd largest US Retail eCommerce site in 2019: not only to migrate away from our own data centers to the cloud, but also to pick an early cloud contender. We saw the technology arc of value creation moving us toward focusing on what we’re good at (enabling entrepreneurship) and letting others (in this case Google Cloud) focus on the undifferentiated heavy lifting of maintaining physical hardware, power, security, operating system updates, etc.

What is React Native?

In 2015, Facebook announced and open sourced React Native; it was already being used internally for their mobile engineering. React Native is a framework for building native mobile apps using React. This means you can use a best-in-class JavaScript library (React) to build your native mobile user interfaces.

At Shopify, the idea had its skeptics at the time (and still does), but many saw its promise. At the company’s next Hackdays the entire company spent time on React Native. While the early team saw many benefits, they decided that we couldn’t ship an app we’d be proud of using React Native in 2015. For the most part, this had to do with performance and the absence of first-class Android support. What we did learn was that we liked the Reactive programming model and GraphQL. Also, we built and open-sourced a functional renderer for iOS after working with React Native. We adopted these technologies in 2015 for our native mobile stack, but not React Native for mobile development en masse. The Globe and Mail documented our aspirations in a comprehensive story about the first version of our mobile apps.

Until now, the standard for all mobile development at Shopify was native mobile development. We built mobile tooling and foundations teams focused on iOS and Android helping accelerate our development efforts. While these teams and the resulting applications were all successful, there was a suspicion that we could be more effective as a team if we could:

  • bring the power of JavaScript and the web to mobile
  • adopt a reactive programming model across all client-side applications
  • consolidate our iOS and Android development onto a single stack.

How React Native Works

React Native provides a way to build native cross-platform mobile apps using JavaScript. React Native is similar to React in that it allows developers to create declarative user interfaces in JavaScript, for which it internally creates a hierarchy tree of UI elements, or in React terminology, a virtual DOM. Whereas the output of ReactJS targets a browser, React Native translates the virtual DOM into mobile native views using platform native bindings that interface with application logic in JavaScript. For our purposes, the target platforms are Android and iOS, but community-driven efforts have brought React Native to other platforms such as Windows, macOS and Apple tvOS.

ReactJS targets a browser, whereas React Native can target mobile APIs.

When Will We Not Default to Using React Native?

There are situations where React Native would not be the default option for building a mobile app at Shopify. For example, if we have a requirement of:

  • deploying on older hardware (CPU <1.5GHz)
  • extensive processing
  • ultra-high performance
  • many background threads.

Reminder: Low-level libraries including many open sourced SDKs will remain purely native. And we can always create our own native modules when we need to be close to the metal.

Why Move to React Native Now?

There are 3 main reasons why now is a great time to take this stance:

  1. we learned from our acquisition of Tictail (a mobile first company that focused 100% on React Native) in 2018 how far React Native has come and made 3 deep product investments in 2019
  2. Shopify uses React extensively on the web and that know-how is now transferable to mobile
  3. we see the performance curve bending upwards (think what’s now possible in Google Docs vs. desktop Microsoft Office) and we can long-term invest in React Native like we do in Ruby, Rails, Kubernetes and Rich Media.

Mobile at Shopify in 2019

We have many mobile surfaces at Shopify for buyers and merchants to interact, both over the web and with our mobile apps. We spent time over the last year experimenting with React Native with three separate teams over three apps: Arrive, Point of Sale, and Compass.

From our experiments we learned that:

  • in rewriting the Arrive app in React Native, the team felt that they were twice as productive as with native development, even on just one mobile platform
  • testing our Point of Sale app on low-power configurations of Android hardware let us set a lower CPU threshold than previously imagined (1.5GHz vs. 2GHz)
  • we estimated ~80% code sharing between iOS and Android, and were surprised by the extremely high levels in practice: 95% (Arrive) and 99% (Compass).

As an aside, even though we’re making the decision to build all new apps using React Native, that doesn’t mean we’ll automatically start rewriting our old apps in React Native.


Arrive

At the end of 2018, we decided to rewrite one of our most popular consumer apps, Arrive (which is now the Shop app), in React Native. Arrive is no slouch: it’s a highly rated, high performing app that has millions of downloads on iOS. It was a good candidate because we didn’t have an Android version. Our efforts would help us reach all of the Android users who were clamoring for Arrive. It’s now React Native on both iOS and Android and shares 95% of the same code. We’ll do a deep dive into Arrive in a future blog post.

So far this rewrite resulted in:

  • fewer crashes on iOS than our native iOS app
  • an Android version launched
  • a team composed of mobile and non-mobile developers.

The team also came up with this cool way to instantly test work-in-progress pull requests. You simply scan a QR code from an automated GitHub comment on your phone and the JavaScript bundle is updated in your app and you’re now running the latest code from that pull request. JML, our CTO, shared the process on Twitter recently.

Point of Sale

At the beginning of 2019, we did a 6-week experiment on our flagship Point of Sale (POS) app to see if it would be a good candidate for a rewrite in React Native. We learned a lot, including that our retail merchants expect almost 2x the responsiveness in our POS due to the muscle memory of using our app while also talking to customers.

In order to best serve our retail merchants and learn about React Native in a physical retail setting, we decided to build out the new POS natively for iOS and use React Native for Android.

We went ahead with 2 teams for the following reasons:

  1. we already had a team ramped up with iOS expertise, including many of the folks that built the original POS apps
  2. we wanted to be able to benchmark our React Native engineering velocity as well as app performance against the gold standard which is native iOS
  3. to meet the high performance requirements of our merchants, we felt that we’d need all of the Facebook re-architecture updates to React Native before launch (as it turns out, they weren’t critical to our performance use cases). Having two teams on two platforms de-risked our ability to launch.

We announced a complete rewrite of POS at Unite 2019. Look for both the native iOS and React Native Android apps to launch in 2020!


Compass

The Start team at Shopify is tasked with helping folks new to entrepreneurship. Before the company-wide decision to write all mobile apps in React Native came about, the team did a deep dive into Native, Flutter and React Native as possible technology choices. They chose React Native and had iOS and Android apps live in the app stores.

The first versions of Compass, which is now Shopify Learn, were launched within 3 months with ~99% of the code shared between iOS and Android. 

Mobile at Shopify 2020+

We have lots in store for 2020.

Will we rewrite our native apps? No. That’s a decision each app team makes independently.

Will we continue to hire native engineers? Yes, LOTS!

We want to contribute to core React Native, build platform specific components, and continue to understand the subtleties of each of the platforms. This requires deep native expertise. Does this sound like you?

Partnering and Open Source

We believe that building software is a team sport. We have a commitment to the open web, open source and open standards.

We’re sponsoring Software Mansion and Krzysztof Magiera (co-founder of React Native for Android) in their open source efforts around React Native.

We’re working with William Candillon (host of Can It Be Done in React Native) for architecture reviews and performance work.

We’ll be partnering closely with the React Native team at Facebook on automation, 3rd party libraries and stewardship of some modules via Lean Core.

We are working with Discord to accelerate the open sourcing of FastList for React Native (a library which only renders list items that are in the viewport) and optimizing for Android.
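
The idea of rendering only the items in the viewport comes down to a little arithmetic. The sketch below assumes fixed-height rows and is written in Python for brevity; it is not FastList's implementation or API, and the function name, parameters, and overscan option are made up for illustration.

```python
def visible_range(scroll_offset, viewport_height, item_height, item_count, overscan=0):
    """Return the inclusive (first, last) item indices to render, or None.

    Assumes every row has the same height. `overscan` renders a few extra
    rows on each side so that fast scrolling does not reveal blank space.
    """
    if item_count <= 0 or item_height <= 0 or viewport_height <= 0:
        return None
    # First row that intersects the top edge of the viewport.
    first = scroll_offset // item_height - overscan
    # Last row whose top edge is above the bottom edge of the viewport.
    last = (scroll_offset + viewport_height - 1) // item_height + overscan
    first = max(first, 0)
    last = min(last, item_count - 1)
    if first > last:
        return None
    return first, last


# 1,000 rows of 50px in a 600px viewport, scrolled down by 125px.
print(visible_range(125, 600, 50, 1000))              # (2, 14)
print(visible_range(125, 600, 50, 1000, overscan=2))  # (0, 16)
```

With 1,000 rows, only 13 are rendered at this scroll position, which is where the savings of this approach come from.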

Developer Tooling and Foundations for React Native

When you make a bet and go deep into a technology, you want to gain maximum leverage from that choice. In order for us to build fast and get the most leverage, we have two types of teams that help the rest of Shopify build quickly. The first is a tooling team that helps with engineering setup, integration and deployment. The second is a foundations team that focuses on SDKs, code reuse and open source. We’ve already begun spinning up both of these teams in 2020 to focus on React Native.

Our popular Shopify Ping app (now called Shopify Inbox), which has enabled hundreds of thousands of customer conversations, is currently available only on iOS. In 2020, we’ll be building the Android version using React Native out of our San Francisco office and we’re hiring.

In 2019, Twitter released their desktop and mobile web apps using something called React Native Web. While this might seem confusing, it allows you to use the same React Native stack for your web app as well. Facebook promptly hired Nicolas Gallagher, the lead engineer on the project, as a result. At Shopify we’ll be doing some React Native Web experiments in 2020.

Learn More About React Native at Shopify

Join Us

Shopify is always hiring sharp folks in all disciplines. Given our particular stack (Ruby on Rails/React/React Native) we’ve always invested in people even if they don’t have this particular set of experiences coming in to Shopify. In mobile engineering (btw, I love this video about engineering opinions) we’ll continue to write mobile native code and hire native engineers (iOS and Android).

Farhan Thawar is VP Engineering for Channels and Mobile at Shopify
Twitter: @fnthawar

Scaling Mobile Development by Treating Apps as Services

Scaling development without slowing down the delivery speed of new features is a problem that companies face when they grow. Speed can be achieved through better tooling, but the bigger the teams and projects, the more tooling they need. When projects and teams use different tools to solve similar problems, it gets harder for tooling teams to create one solution that works for everybody. Additionally, it complicates knowledge sharing and makes it difficult for developers to contribute to other projects. This lack of knowledge and developers is magnified during incident response because only a handful of people have enough context and understanding of the system to mitigate and fix issues.

At Shopify, we believe in highly aligned, but loosely coupled teams—teams working independently from each other while sharing the same vision and goals—that move fast and minimize slowdowns in productivity. To continue working towards this goal, we designed tools to share processes and best practices that ease collaboration and code sharing. With tools, teams ship code fast while maintaining quality and productivity. Tooling worked efficiently for our web services, but we lacked something similar for mobile projects. Tools enforce processes that increase quality, reliability and reproducibility. A few examples include using

  • continuous integration (CI) and automated testing
  • continuous delivery to release new versions of the software
  • containers to ensure that the software runs in a controlled environment.

Treating Apps as Services

Last year, the Mobile Tooling Team shipped tools helping mobile developers be more productive, but we couldn’t enforce the usage of those tools. Moreover, checking which tools mobile apps used required digging into configuration files and scripts spread across different project repositories. We have several mobile apps available for download across Google Play and the App Store, so this approach didn’t scale.

Fortunately, Shopify has a tool that enforces tool usage and we extended it to our mobile projects. ServicesDB tracks all production services running at Shopify and has three major goals:

  1. keep track of all running services across Shopify
  2. define what it means to own a service, and what the expectations are for an owner
  3. provide tools for owners to improve the quality of the infrastructure around their services.

ServicesDB allows us to treat apps as services with an owner, and for which we define a set of expectations. We specify, in a configuration file, the information that we need to codify best practices, which allows us to check for things such as

  • Service Ownership: each project must be owned by a team or an individual, and they must be responsible for its maintenance and development. A team is accountable for any issues or requests that might arise in regards to the app.
  • Contact Information: Slack channels people use if they need more information about a certain mobile app. We also use those channels to notify teams about their projects not meeting the required checks.
  • Testing and Deployment Configuration: CI and our mobile deployment tool, Shipit Mobile, are properly configured. This check is essential because we need to be able to release a new version of our apps at any time.
  • Versioning: Apps use the latest version of our internal tools. With this check we make sure that our dependencies don’t contain known security vulnerabilities.
  • Monitoring: Bug tracking services are configured to check for errors and crashes that are happening in production.

ServicesDB checks for one of our mobile apps

ServicesDB defines a contract with the development team through automatic checks for tooling requirements on mobile projects. These checks mitigate the problem of understanding how projects are configured and which tools they use, which keeps teams highly aligned, but loosely coupled. Now, the Mobile Tooling team can see if a project can use our tooling. It allows developers to understand why some tools don’t work with their projects, and instructs them on how to fix it, as every check provides a description for how to make it pass. Some common issues are using an outdated Ruby version, or not having a bug tracking tool configured. If any check fails, we automatically create an issue on GitHub to notify the team that they aren’t meeting the contract.

GitHub issue created when a check fails. It contains instructions to fix it.
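The contract described above can be pictured as a list of checks run against a project's configuration, each carrying its own fix instructions. The following is a minimal sketch in plain Ruby, not ServicesDB's actual implementation: the `Check` struct, the configuration keys (`owner`, `slack_channel`, `ruby_version`, `bug_tracker`), and the minimum Ruby version are all assumptions made for illustration.

```ruby
# Hypothetical sketch of contract checks. Names, keys, and thresholds are
# illustrative assumptions, not ServicesDB's real schema.
Check = Struct.new(:name, :how_to_fix, :predicate) do
  def passes?(config)
    predicate.call(config)
  end
end

MINIMUM_RUBY = Gem::Version.new("2.6.0") # assumed threshold

CHECKS = [
  Check.new("ownership", "Set `owner` to a team or an individual.",
            ->(c) { !c["owner"].to_s.empty? }),
  Check.new("contact", "Set `slack_channel` to the team's channel.",
            ->(c) { c["slack_channel"].to_s.start_with?("#") }),
  Check.new("ruby_version", "Upgrade Ruby to #{MINIMUM_RUBY} or newer.",
            ->(c) { Gem::Version.new(c.fetch("ruby_version", "0")) >= MINIMUM_RUBY }),
  Check.new("monitoring", "Configure a bug tracking service.",
            ->(c) { !c["bug_tracker"].to_s.empty? }),
].freeze

# Returns one message per failing check, similar in spirit to the text of
# the issue opened for the owning team.
def failing_checks(config)
  CHECKS.reject { |check| check.passes?(config) }
        .map { |check| "#{check.name}: #{check.how_to_fix}" }
end

config = {
  "owner" => "mobile-tooling",
  "slack_channel" => "#mobile-tooling",
  "ruby_version" => "2.5.3",
}
puts failing_checks(config)
```

Keeping the fix instructions next to the predicate means a failing check can always explain itself, which is the property the contract relies on.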

Abstracting Tooling and Configuration Away

If you want to scale development efficiently, you need to be opinionated about the tools supported. Through ServicesDB we detect misconfigured projects, notify their owners, and help them to fix those issues. At the end of the day, we don’t want our mobile developers to think about tooling and configurations. Our goal is to make commerce better for everyone, so we want people to spend time solving commerce problems that provide a better experience to both buyers and entrepreneurs.

At the moment, we’ve only implemented some basic checks, but in the future we plan to define service level objectives for mobile apps and develop better tools for easing the creation of new projects and reducing build times, all while being confident that they will work as long as the defined contract is satisfied.

Intrigued? Shopify is hiring and we’d love to hear from you. Please take a look at our open positions on the Engineering career page.

How to Implement a Secure Central Authentication Service in Six Steps

As Shopify merchants grow in scale they will often introduce multiple stores into their organization. Previously, this meant that staff members had to be invited to multiple stores to set up their accounts. This introduced administrative friction and more work for the staff users who had to manage multiple accounts just to do their jobs.

We created a new service to handle centralized authentication and user identity management called, surprisingly enough, Identity. Having a central authentication service within Shopify was accomplished by building functionality on the OpenID Connect (OIDC) specification. Once we had this system in place, we built a solution to reliably and securely allow users to combine their accounts to get the benefit of single sign-on. Solving this specific problem involved a team comprising product management, user experience, engineering, and data science working together with members spread across three different cities: Ottawa, Montreal, and Waterloo.

The Shop Model

Shopify is built so that all the data belonging to a particular store (called a Shop in our data model) lives in a single database instance. The data includes core commerce objects like Products, Orders, Customers, and Users. The Users model represents the staff members who have access, with specific permissions, to the administration interface for a particular Shop.

Shop Commerce Object Relationships

User authentication and profile management belonged to the Shop itself and worked as long as your use of Shopify never went beyond a single store. As soon as a Merchant organization expanded to using multiple stores, the experience for both the person managing store users and the individual users involved more overhead. You had to sign into each store independently because there were no single sign-on (SSO) capabilities, as Shops don’t share any data with each other. The users had to manage their profile data, password, and two-step authentication on each store they had access to.

Shop isolation of users

Modelling User Accounts Within Identity

User accounts modelled within our Identity service come in two important types: Identity accounts and Legacy accounts. A service or application that a user can access via OIDC is modelled as a Destination within Identity. Examples of destinations within Shopify would be stores, the Partners dashboard, or our Community discussion forums.

A Legacy account only has access to a single store and an Identity account can be used to access multiple destinations.

Legacy account model: one destination per account. Can only access Shops

We ensured that new accounts are created as Identity accounts and that existing users with legacy accounts can be safely and securely upgraded to Identity accounts. The big problem was combining multiple legacy accounts. When a user used the same email address to sign in to several different Shopify stores, we combined these accounts into a single Identity account without blocking their access to any of the stores they used.

Combined account model: each account can have access to multiple destinations

There were six steps needed to get us to a single account to rule them all.

  1. Synchronize data from existing user accounts into a central Identity service.
  2. Have all authentication go through the central Identity service via OpenID Connect.
  3. Prompt users to combine their accounts together.
  4. Prompt users to enable a second factor (2FA) to protect their account.
  5. Create the combined Identity account.
  6. Prevent new legacy accounts from being created.

1. Synchronize Data From Existing User Accounts Into a Central Identity Service

We ensured that all user profile and security credential information was synchronized from the stores, where it's managed, into the centralized Identity service. This meant synchronizing data from the store to the Identity service every time one of the following user events occurred

  • creation
  • deletion
  • profile data update
  • security data update (password or 2FA).

2. Have All Authentication Go Through the Central Identity Service Via OpenID Connect (OIDC)

OpenID Connect is an identity layer built on top of the OAuth 2.0 protocol and the method used to delegate authentication from the Shop to the Identity service. Prior to this step, all password and 2FA verification was done within the core Shop application runtime. Given that Shopify shards the database for the core platform by Shop, all of the data associated with a given Shop is available on a single database instance.

One downside with having all authentication go through Identity is that when a user first signs into a Shopify service it requires sending the user’s browser to Identity to perform an OIDC authentication request (AuthRequest), so there is a longer delay on initial sign in to a particular store.

Users signing into Shopify got familiar with this loading spinner

3. Prompt Users to Combine Their Accounts Together

Users with an email address that can sign into more than one Shopify service are prompted to combine their accounts into a single Identity account. When a legacy user signs into a Shopify product we interrupt the OIDC AuthRequest flow, after verifying they are authenticated but before sending them to their destination, to check if they have accounts that can be upgraded.

There were two primary upgrade paths to an Identity account for a user: auto-upgrading a single legacy account or combining multiple accounts.

Auto-upgrading a single legacy account occurs when a user’s email address only has a single store association. In this case, we convert the single account into an Identity account retaining all of their profile, password, and 2FA settings. Accounts in the Identity service are modelled using single table inheritance with a type attribute specifying which class a particular record uses. Upgrading a legacy account in this case was as simple as updating the value of this type attribute. This required no other changes anywhere else within the Shopify system because the universally unique identifier (UUID) for the account didn't change and this is the value used to identify an account in other systems.
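To make the auto-upgrade path concrete, here is a small sketch in plain Ruby that stands in for a single-table-inheritance record. The `Account` struct, the class names stored in `type`, and the helper names are assumptions for illustration; the real Identity service is a Rails application whose schema isn't shown in this article.

```ruby
require "securerandom"

# Hypothetical stand-in for a row in the accounts table. With single table
# inheritance, `type` names the class a record is loaded as.
Account = Struct.new(:uuid, :email, :type, :password_hash, keyword_init: true)

# An account qualifies for auto-upgrade when it is a legacy account and no
# other account shares its email address.
def auto_upgradable?(account, all_accounts)
  account.type == "LegacyAccount" &&
    all_accounts.count { |other| other.email == account.email } == 1
end

# Only the type attribute changes. The UUID, profile, and credentials are
# left untouched, so other systems that reference the UUID are unaffected.
def auto_upgrade!(account)
  account.type = "IdentityAccount"
  account
end

accounts = [
  Account.new(uuid: SecureRandom.uuid, email: "pat@example.com",
              type: "LegacyAccount", password_hash: "hash-1"),
]

account = accounts.first
auto_upgrade!(account) if auto_upgradable?(account, accounts)
puts account.type
```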

Combining multiple accounts is triggered when a user has more than one active account (legacy or Identity) that uses the same email address. We created a new session object, called a MergeSession, for this combining process to keep track of all the data required to create the Identity account. The MergeSession was associated with an individual AuthRequest, which means that when the AuthRequest was completed, the session would no longer be active. If a user went through more than a single combining process we would have to generate a new MergeSession object for each one.

The prompt users saw when they had multiple accounts that could be combined

Shopify doesn't require users to verify their email address when creating a new store. This means it’s possible that someone could sign up for a trial using an email address they don’t have access to. Because of this, we need to verify that a user has access to the email address before we show them information about other accounts with the same email or allow them to take any actions on those other accounts. This verification involves the user requesting that an email containing a link be sent to their address.

If the user’s email address on the store they were signing in to was verified, we listed all of the other destinations where their email address was used. If a user hadn't verified their email address for the account they were authenticating into, we would only indicate that there were other accounts and that they must verify their email address before proceeding with combining them.

The prompt users saw when they signed in with an unverified email address

If any of the accounts that need combining use 2FA then the user had to provide a valid code for each required account. When someone uses SMS as a 2FA method, they could potentially save some time in this step if they use the same phone number across multiple accounts because we only require a single code for all of the destinations that use the same number. This was a secure convenience to our users in an attempt to reduce time spent on this step. Individuals using an authenticator app (e.g. Google Authenticator, Authy, 1Password, etc.), however, had to provide a code per destination because the authenticator app is configured per user account and there’s nothing associating them to one another.
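The rule for how many codes a user must enter can be sketched as a grouping problem: SMS accounts collapse by phone number, while authenticator apps stay one per account. This is an illustrative sketch only; the `TwoFactorAccount` struct and the `required_codes` helper are hypothetical names, not the Identity service's code.

```ruby
# Hypothetical view of an account taking part in a combine.
# `factor` is :sms, :authenticator, or nil when 2FA is not enabled.
TwoFactorAccount = Struct.new(:destination, :factor, :phone_number, keyword_init: true)

# Returns the list of codes the user has to provide. Accounts using SMS with
# the same phone number share one code; every authenticator app needs its own.
def required_codes(accounts)
  protected_accounts = accounts.reject { |account| account.factor.nil? }
  sms, others = protected_accounts.partition { |account| account.factor == :sms }

  sms_codes = sms.group_by(&:phone_number).map do |number, group|
    { factor: :sms, phone_number: number, destinations: group.map(&:destination) }
  end
  other_codes = others.map do |account|
    { factor: account.factor, destinations: [account.destination] }
  end

  sms_codes + other_codes
end

accounts = [
  TwoFactorAccount.new(destination: "shop-1", factor: :sms, phone_number: "+15550100"),
  TwoFactorAccount.new(destination: "shop-2", factor: :sms, phone_number: "+15550100"),
  TwoFactorAccount.new(destination: "shop-3", factor: :authenticator),
  TwoFactorAccount.new(destination: "shop-4", factor: nil),
]
required_codes(accounts).each { |code| puts code.inspect }
```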

If a user couldn’t provide a 2FA code for any accounts other than the account they are signing into, they are able to exclude that account from being combined. Legitimate reasons why a person may be unable to provide a code include if the account uses an old SMS phone number that the person no longer has access to or the person no longer has an authenticator app configured to generate a code for that account.

The idea here is that any account which was excluded can be combined at a later date when the user regains access to the account.

Once the 2FA requirements for all accounts are satisfied we prompt the user to set up a new password for their combined account. We store the encrypted password hash on an object that is keeping track of state for this session.

4. Prompt Users to Enable a Second Factor to Protect Their Account

Having a user engaged in performing account maintenance was an excellent opportunity to expose them to the benefits of protecting their account with a second factor of security. We displayed a different flow to users who already had 2FA enabled on at least one of the accounts being combined. The assumption was that they don’t need an explanation of what 2FA is, whereas someone who had never set it up most likely would.

5. Create the Combined Identity Account

Once a user had validated their 2FA configuration of choice, or opted out of setting it up, we performed the following actions:

Attach 2FA setup, if present, to an object that keeps track of the specific account combination session (MergeSession).

Merge session object with new password and 2FA configuration.

Inside a single database transaction create the complete new account, associate destinations from legacy accounts to it, and delete the old accounts.

We needed to do this inside a transaction after getting all of the information from a user to prevent the potential for reducing the security of their accounts. If a user was using 2FA before starting this process and we created the Identity account immediately after the new password was provided, there exists a small window of time when their new Identity account would be less secure than their old legacy accounts. As soon as the Identity account exists and has a password associated with it, it could be used to access destinations with only knowledge of the password. Deferring account creation until both password and 2FA are defined means that the new account can be as secure as the ones being combined were.

Final state of combined account
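The ordering argument above (collect everything first, then create the account atomically) can be sketched with an in-memory store. Everything here is a hypothetical stand-in: the `MergeSession` struct mirrors the object named in the article, but its fields, the `AccountStore` class, and the choice to give the combined account a fresh UUID are assumptions for illustration.

```ruby
require "securerandom"

# Collects what the user provides during the combining flow.
MergeSession = Struct.new(:email, :legacy_uuids, :password_hash,
                          :two_factor, :two_factor_declined, keyword_init: true) do
  # Creation is allowed only once a password exists and the user has either
  # configured 2FA or explicitly opted out.
  def ready?
    !password_hash.nil? && (!two_factor.nil? || two_factor_declined == true)
  end
end

# Tiny in-memory store that simulates a database transaction by restoring a
# snapshot when the block raises.
class AccountStore
  attr_reader :accounts

  def initialize(accounts)
    @accounts = accounts
  end

  def transaction
    snapshot = @accounts.dup
    yield self
  rescue StandardError
    @accounts = snapshot
    raise
  end

  def replace_all(accounts)
    @accounts = accounts
  end
end

def create_identity_account!(store, session)
  raise ArgumentError, "password and 2FA choice are required first" unless session.ready?

  store.transaction do |tx|
    legacy, others = tx.accounts.partition { |a| session.legacy_uuids.include?(a[:uuid]) }
    identity = {
      uuid: SecureRandom.uuid, # assumption: the combined account gets a new UUID
      type: "IdentityAccount",
      email: session.email,
      password_hash: session.password_hash,
      two_factor: session.two_factor,
      destinations: legacy.flat_map { |a| a[:destinations] },
    }
    # Create the new account and drop the legacy ones in one step.
    tx.replace_all(others + [identity])
    identity
  end
end
```

Because `create_identity_account!` refuses to run until the session is complete, there is no point at which a password-only Identity account exists for a user who previously relied on 2FA.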

Generate a session for the new account and use it to satisfy the AuthRequest that initiated this session in the first place.

Some of the more complex pieces of logic for this process included finding all of the related accounts for a given email address and the information about the destinations they had access to, replacing the legacy accounts when creating the Identity account, and ensuring that the Identity account was set up correctly with all of the required data defined. For these parts of the solution we relied on a Ruby library called ActiveOperation. It's a very small framework allowing you to isolate and model business logic within your application in an operation class. Traditionally in a Rails application you end up having to put logic either in your controllers or models and in this case we were able to have controllers and models that were very small by defining the complex business logic as operations. These operations were easily testable given that they were isolated and each class had a very specific responsibility.

There are other libraries for handling this kind of business logic process but we chose ActiveOperation because it was easy to use, made our code easier to understand, and had built-in support for the RSpec testing framework we were using.
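The operation pattern itself is easy to picture without the library. The sketch below is plain Ruby and deliberately does not use ActiveOperation's API; the class name `FindRelatedAccounts` and the account hash layout are assumptions chosen to mirror the "find all related accounts for an email address" step.

```ruby
# A plain Ruby operation object: one class, one responsibility, one entry
# point. This illustrates the pattern only and is not ActiveOperation code.
class FindRelatedAccounts
  def self.call(accounts:, email:)
    new(accounts: accounts, email: email).call
  end

  def initialize(accounts:, email:)
    @accounts = accounts
    @email = email.strip.downcase
  end

  # Returns the accounts that share the email address, along with the
  # destinations each one can access.
  def call
    @accounts
      .select { |account| account[:email].downcase == @email }
      .map { |account| { uuid: account[:uuid], destinations: account[:destinations] } }
  end
end

accounts = [
  { uuid: "a", email: "pat@example.com", destinations: ["shop-1"] },
  { uuid: "b", email: "Pat@Example.com", destinations: ["shop-2"] },
  { uuid: "c", email: "sam@example.com", destinations: ["shop-3"] },
]
puts FindRelatedAccounts.call(accounts: accounts, email: " pat@example.com ").inspect
```

A controller can then call the operation in one line, and the operation can be tested in isolation with plain data.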

We added support for the new Web Authentication (WebAuthn) standard in our Identity service just as we were beginning to roll out the account combining flow to our users. This meant that we were able to allow users to use physical security keys as a second factor when securing their accounts rather than just the options of SMS or an authenticator app.

6. Prevent New Legacy Accounts From Being Created

We didn’t want any more legacy accounts created. There were two user scenarios that needed to be updated to use the Identity creation flow: signing up for a new trial store and inviting new staff members to an existing store.

When signing up for a new store you would enter your email address as part of that process. This email address was used as the primary owner for the new store. With legacy accounts even if the email address belonged to another store we’d still be creating a new legacy account for the newly created store.

When inviting a new staff member to your store you would enter the email address for the new user and an invite would be sent to that email address that includes a link to accept the invite and finish setting up their account. Similarly to the store creation process, this would always be a new legacy account on each individual store.

In both cases with the new process we determine whether the email address belongs to an Identity account already and, if so, require the user to be authenticated for the account belonging to that email address before they can proceed.

Build New Experiences for Shopify Users That Rely on SSO Identity Accounts

As of the time of this writing over 75% of active user accounts have been auto-upgraded or combined into a single Identity account. Accounts that don’t require user interaction, such as accounts that can be auto-upgraded, can be upgraded automatically without the user signing in. The accounts that require a user to prove ownership of their accounts can only be combined when logging in. At some point in the future we will prevent users from signing into Shopify without having an Identity account.

When product teams within Shopify can rely on our active users having Identity accounts we can start building new experiences for those users that delegate authentication and profile management to the Identity service. Authorization is still up to the service leveraging these Identity accounts as Identity specifically only handles authentication and knows nothing about the permissions within the services that the accounts can access.

For our users, it means that they don’t have to create and manage a new account when Shopify launches a new service that utilizes Identity for user sign in.

If this sounds like the kind of problems you want to solve, we're always on the lookout for talent and we’d love to hear from you. Visit our Engineering career page to find out about our open positions. 

How Shopify Manages API Versioning and Breaking Changes

Earlier this year I took the train from Ottawa to Toronto. While I was waiting in line in the main hall of the station, I noticed a police officer with a detection dog. The police officer was giving the dog plenty of time at each bag or person as they worked and weaved their way back and forth along the lines. The dog would look to his handler for direction, receiving it with the wave of a hand or gesture towards the next target. That’s about the moment I began asking myself a number of questions about dogs… and APIs.

To understand why, you have to appreciate that the Canadian government recently legalized cannabis. Watching this incredibly well-trained dog work his way up and down the lines, it made me wonder, how did they “update” the dogs once the legislation changed? Can you really retrain or un-train a dog? How easy is it to implement this change, and how long does it take to roll out? So when the officer ended up next to me I couldn’t help but ask,

ME: “Excuse me, I have a question about your dog if that’s alright with you?”

OFFICER: “Sure, what’s on your mind?”

ME: “How did you retrain the dogs after the legalization of cannabis?”

OFFICER: “We didn’t. We had to retire them all and train new ones. You really can’t teach an old dog new tricks.“

ME: “Wow, seriously? How long did that take?”

OFFICER: “Yep, we needed a full THREE YEARS to retire the previous group and introduce a new generation. It was a ton of work.”

I found myself sitting on the train thinking about how easy it might have been for one layer of government, plotting out the changes, to completely underestimate the downstream impact on the K9 unit of the police services. To anyone that didn’t understand the system (dogs), the change sounds simple. Simply detect substances in a set that is now n-1 in size. In reality, due to the way this dog-dependent system works, it requires significant time and effort, and a three-year program to migrate from the old system to the new.

How We Handle API Versioning

At Shopify, we have tens of thousands of partners building on our APIs that depend on us to ensure our merchants can run their businesses every day. In April of this year, we released the first official version of our API. All consumers of our APIs require stability and predictability and our API versioning scheme at Shopify allows us to continue to develop the platform while providing apps with stable API behavior and predictable timelines for adopting changes.

The increasing growth of our API RPM quarter over quarter since 2017 overlaid with growth in active API clients

To ensure that we provide a stable and predictable API, Shopify releases a new API version every three months at the beginning of the quarter. Version names are date-based to be meaningful and semantically unambiguous (for example, 2020-01).

Shopify API Versioning Schedule
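The naming scheme can be illustrated with a few lines of Ruby. This sketch assumes that each version is named after the first month of the quarter in which it is released, which is consistent with the 2020-01 example above; the helper names are hypothetical.

```ruby
require "date"

# Maps a calendar date to the version released at the start of its quarter,
# assuming names follow the YYYY-MM pattern of the quarter's first month.
def api_version_for(date)
  quarter_start_month = ((date.month - 1) / 3) * 3 + 1
  format("%04d-%02d", date.year, quarter_start_month)
end

# Lists the four version names for a given year under the same assumption.
def api_versions_in(year)
  [1, 4, 7, 10].map { |month| format("%04d-%02d", year, month) }
end

puts api_version_for(Date.new(2020, 2, 15))
puts api_versions_in(2020).join(", ")
```

Because the names are zero-padded dates, sorting them as plain strings also sorts them chronologically, which keeps version comparisons simple.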


Although the Platform team is responsible for building the infrastructure, tooling, and systems that enforce our API versioning strategy at Shopify, there are 1000+ engineers working across Shopify, each with the ability to ship code that can ultimately affect any of our APIs. So how do we think about versioning, and help manage changes to our APIs at scale?

Our general rule of thumb about versioning is that

API versioning is a powerful tool that comes with added responsibility. Break the API contract with the ecosystem only when there are no alternatives or it’s uneconomical to do otherwise.

API versions and changes are represented in our monolith through new frozen records, one file for versions, and one for changes. API changes are packaged together and shipped as a part of a distinct version. API changes are initially introduced to the unstable version, and can optionally have a beta flag associated with the change, to prevent the change from being visible publicly. At runtime, our code can check whether a given change is in effect through an ApiChange.in_effect? construct. I’ll show you how this, and other methods of the ApiChange module are used in an example later on.

Dealing With Breaking and Non-breaking Changes

As we continue to improve our platform, changes are necessary and can be split into two broad categories: breaking and non-breaking.

Breaking changes are more problematic and require a great deal of planning, care and go-to-market effort to ensure we support the ecosystem and provide a stable commerce platform for merchants. Ultimately, a breaking change is any change that requires a third-party developer to do any migration work to maintain the existing functionality of their application. Some examples of breaking changes are

  • adding a new or modifying an existing validation to an existing resource
  • requiring a parameter that wasn’t required before
  • changing existing error response codes/messages
  • modifying the expected payload of webhooks and async callbacks
  • changing the data type of an existing field
  • changing supported filtering on existing endpoints
  • renaming a field or endpoint
  • adding a new feature that will change the meaning of a field
  • removing an existing field or endpoint
  • changing the URL structure of an existing endpoint.

Teams inside Shopify considering a breaking change conduct an impact analysis. They put themselves into the shoes of a third-party developer using the API and think through the changes that might be required. If there is ambiguity, our developer advocacy team can reach out to our partners to gain additional insight and gauge the impact of proposed changes. 

On the other hand, for a change to be considered non-breaking, it must pass our forward compatibility test. Forward compatible changes are those which can be adopted and used by any merchant, without limitation, regardless of whether shops have been migrated or any other additional conditions have been met.

Forward compatible changes can be freely adopted without worrying about whether there is a new user experience or the merchant’s data is adapted to work with the change, etc. Teams will keep these changes in the unstable API version and if forward compatibility cannot be met, keep access limited and managed by protecting the change with a beta flag.

Every change is named in the changes frozen record mentioned above, to track and manage the change, and can be referenced by its name, for example,
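As a rough picture of what a named change could look like, the sketch below uses a frozen array of hashes to stand in for the changes frozen record. The change names, fields, and the `find_api_change` helper are hypothetical and are not taken from Shopify's codebase.

```ruby
# Hypothetical stand-in for the changes frozen record: static, read-only
# data that ships with the code. Names and fields are illustrative.
API_CHANGES = [
  { name: :rename_title_to_name, version: "2020-01", beta_flag: nil },
  { name: :remove_legacy_totals, version: "unstable", beta_flag: :totals_beta },
].map(&:freeze).freeze

# Looks a change up by name and fails loudly on a typo, so code can never
# silently reference a change that was not declared.
def find_api_change(name)
  API_CHANGES.find { |change| change[:name] == name } ||
    raise(KeyError, "unknown API change: #{name}")
end

change = find_api_change(:remove_legacy_totals)
puts "#{change[:name]} ships in #{change[:version]} behind #{change[:beta_flag].inspect}"
```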


Analyzing the Impact of Breaking Changes

If a proposed change is identified as a breaking change, and there is agreement amongst the stakeholders that it’s necessary, the next step is to enable our teams to figure out just how big the change’s impact is.

Within the core monolith, teams make use of our API change tooling methods mark_breaking and mark_possibly_breaking to measure the impact of a potential breaking change. These methods work by capturing request metadata and context specific to the breaking code path then emitting this into our event pipeline, Monorail, which places the events into our data warehouse.

The mark_breaking method is called when the request would break if everything else was kept the same, while mark_possibly_breaking would be used when we aren’t sure whether the call would have an adverse effect on the calling application. An example would be the case where a property of the response has been renamed or removed entirely:
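The following sketch shows one way the two methods could be used for a renamed property. It is a hypothetical stand-in: the real `ApiChange` module's signatures are not shown in this article, so the arguments, the in-memory `EVENTS` list (standing in for the event pipeline), the request fields, and the change name are all assumptions.

```ruby
# Hypothetical stand-in for the API change tooling. The method names come
# from the article; the signatures and event shape are assumed.
module ApiChange
  EVENTS = [] # stands in for the event pipeline

  def self.mark_breaking(change, request)
    emit(change, request, :breaking)
  end

  def self.mark_possibly_breaking(change, request)
    emit(change, request, :possibly_breaking)
  end

  # Captures request metadata for the breaking code path.
  def self.emit(change, request, certainty)
    EVENTS << {
      change: change,
      certainty: certainty,
      app_id: request[:app_id],
      shop_id: request[:shop_id],
      api_version: request[:api_version],
    }
  end
  private_class_method :emit
end

# Suppose `title` is being renamed. A client that explicitly asks for
# `title` will definitely break; a client that asks for every field might
# or might not read `title`, so that call is only possibly breaking.
def instrument_title_rename(request)
  fields = request[:fields]
  if fields.nil?
    ApiChange.mark_possibly_breaking(:rename_title_to_name, request)
  elsif fields.include?("title")
    ApiChange.mark_breaking(:rename_title_to_name, request)
  end
end

instrument_title_rename(app_id: 1, shop_id: 10, api_version: "2019-10", fields: ["id", "title"])
puts ApiChange::EVENTS.last.inspect
```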


Once shipped to production, teams can use a prebuilt impact assessment report to see the potential impact of their changes across a number of dimensions.

Measuring and Managing API Adoption

Once the change has shipped as a part of an official API version, we’re able to make use of the data emitted from mark_breaking and mark_possibly_breaking to measure adoption and identify shops and apps that are still at risk. Our teams use the ApiChange.in_effect? method (made available by our API change tooling) to create conditionals and manage support for the old and new behaviour in our API. A trivial example might look something like this:
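The sketch below is a hypothetical stand-in for such a conditional. The `ApiChange.in_effect?` name comes from the article, but its real signature is not shown, so the arguments, the `CHANGES` table, and the version comparison are assumptions made for illustration.

```ruby
# Hypothetical stand-in for the API change tooling; signature assumed.
module ApiChange
  # Change name mapped to the version that first includes it.
  CHANGES = { rename_title_to_name: "2020-01" }.freeze

  # A change is in effect when the requested version is the one that
  # introduced it or a later one. Date-based names compare correctly as
  # strings, and "unstable" is treated as newer than any dated version.
  def self.in_effect?(name, requested_version)
    introduced_in = CHANGES.fetch(name)
    return true if requested_version == "unstable"
    return false if introduced_in == "unstable"

    requested_version >= introduced_in
  end
end

# Supports the old and new behaviour side by side.
def serialize_product(product, requested_version)
  if ApiChange.in_effect?(:rename_title_to_name, requested_version)
    { "name" => product[:title] }
  else
    { "title" => product[:title] }
  end
end

product = { title: "Snowboard" }
puts serialize_product(product, "2019-10").inspect
puts serialize_product(product, "2020-01").inspect
```

Keeping both branches behind one named check makes it straightforward to delete the old path once the last version that needs it is no longer supported.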

The ApiChange module and the automated instrumentation it drives allow teams at Shopify to assess the current risk to the platform based on the proportion of API calls still on the breaking path, and assist in communicating these risks to affected developers.

At Shopify, our ecosystem’s applications depend on the predictable nature of our APIs. The functionality these applications provide can be critical for merchants’ businesses to function correctly on Shopify. In order to build and maintain trust with our ecosystem, we consider any proposed breaking change thoroughly and gauge the impact of our decisions. By providing the tooling to mark and analyze API calls, we empower teams at Shopify to assess the impact of proposed changes, and build a culture that respects the impact our decisions have on our ecosystem. There are real people out there building software for our merchants, and we want to avoid ever having to ask them to replace all the dogs at once!

We're always on the lookout for talent and we’d love to hear from you. Please take a look at our open positions on the Engineering career page.

Successfully Merging the Work of 1000+ Developers

Collaboration with a large team is challenging, and even more so if it’s on a single codebase, like the Shopify monolith. Shopify changes 40 times a day. We follow a trunk-based development workflow and merge around 400 commits to master daily. There are three rules that govern how we deploy safely, but they were hard to maintain at our growing scale. Soft conflicts broke master, slow deployments caused large drift between master and production, and the time to deploy emergency merges slowed due to a backlog of pull requests. To solve these issues, we upgraded the Merge Queue (our tool to automate and control the rate of merges going into master) so it integrates with GitHub, runs continuous integration (CI) before merging to master keeping it green, removes pull requests that fail CI, and maximizes deployment throughput of pull requests.

Our three essential rules about deploying safely and maintaining master:

  1. Master must always be green (passing CI). This is important because we must be able to deploy from master at all times. If master is not green, our developers cannot merge, slowing all development across Shopify.
  2. Master must stay close to production. Drifting master too far ahead of what is deployed to production increases risk.
  3. Emergency merges must be fast. In case of emergencies, we must be able to quickly merge fixes intended to resolve the incident.

Merge Queue v1

Two years ago, we built the first iteration of the merge queue inside our open-source continuous deployment tool, Shipit. Our goal was to prevent master from drifting too far from production. Rather than merging directly to master, developers add pull requests to the merge queue which merges pull requests on their behalf.

Merge Queue v1: developers add pull requests to the merge queue, which merges pull requests on their behalf

Pull requests build up in the queue rather than merging to master all at once. Merge Queue v1 controlled the batch size of each deployment and prevented merging when there were too many undeployed pull requests on master. It reduced the risk of failure and possible drift from production. During incidents, we locked the queue to prevent any further pull requests from merging to master, giving space for emergency fixes.

Merge Queue v1 browser extension

Merge Queue v1 used a browser extension allowing developers to send a pull request to the merge queue within the GitHub UI, but also allowed them to quickly merge fixes during emergencies by bypassing the queue.

Problems with Merge Queue v1

Merge Queue v1 kept track of pull requests, but we were not running CI on pull requests while they sat in the queue. On some unfortunate days—ones with production incidents requiring a halt to deploys—we would have upwards of 50 pull requests waiting to be merged. A queue of this size could take hours to merge and deploy. There was also no guarantee that a pull request in the queue would pass CI after it was merged, since there could be soft conflicts (two pull requests that pass CI independently, but fail when merged together) between pull requests in the queue.

The browser extension was a major pain point because it was a poor experience for our developers. New developers sometimes forgot to install the extension which resulted in accidental direct merges to master instead of going through the merge queue, which can be disruptive if the deploy backlog is already large, or if there is an ongoing incident.

Merge Queue v2

This year, we completed Merge Queue v2. We focused on optimizing our throughput by reducing the time that the queue is idle, and improving the user experience by replacing the browser extension with a more integrated experience. We also wanted to address the pieces that the first merge queue didn’t address: keeping master green and faster emergency merges. In addition, our solution needed to be resilient to flaky tests—tests that can fail nondeterministically.

No More Browser Extension

Merge Queue v2 came with a new user experience. We wanted an interface for our developers to interact with that felt native to GitHub. We drew inspiration from Atlantis, which we were already using for our Terraform setup, and went with a comment-based interface.

Merge Queue v2 went with a comment-based interface

A welcome message gets issued on every pull request with instructions on how to use the merge queue. Every merge now starts with a /shipit comment. This comment fires a webhook to our system to let us know that a merge request has been initiated. We check if Branch CI has passed and if the pull request has been approved by a reviewer before adding the pull request to the queue. If successful, we issue a thumbs up emoji reaction to the /shipit comment using the GitHub addReaction GraphQL mutation.

In the case of errors, such as invalid base branch, or missing reviews, we surface the errors as additional comments on the pull request.

Jumping the queue by merging directly to master is bad for overall throughput. To ensure that everyone uses the queue, we disable the ability to merge directly to master using GitHub branch protection programmatically as part of the merge queue onboarding process.


However, we still need to be able to bypass the queue in an emergency, like resolving a service disruption. For these cases, we added a separate /shipit --emergency command that skips any checks and merges directly to master. This helps communicate to developers that this action is reserved for emergencies only and gives us auditability into the cases where this gets used.
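
As a rough illustration of the comment-driven flow described above, the sketch below parses a pull request comment and decides what happens next. The `PullRequest` fields, the function names, the `master` base-branch check, and the returned strings are all hypothetical names chosen for this example. This is not the actual Merge Queue code, which reacts to GitHub webhooks and uses the GraphQL API rather than in-memory objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PullRequest:
    """Hypothetical, simplified view of the pull request state that gets checked."""
    number: int
    base_branch: str
    branch_ci_passed: bool
    approved: bool


def parse_command(comment: str) -> Optional[str]:
    """Return "merge", "emergency", or None if the comment is not a command."""
    tokens = comment.strip().split()
    if not tokens or tokens[0] != "/shipit":
        return None
    if len(tokens) == 1:
        return "merge"
    if tokens[1:] == ["--emergency"]:
        return "emergency"
    return None  # unknown flags are ignored in this sketch


def handle_comment(pr: PullRequest, comment: str, queue: List[int]) -> str:
    """Apply a comment to the queue and return a short description of the outcome."""
    command = parse_command(comment)
    if command is None:
        return "ignored"
    if command == "emergency":
        # Emergency merges skip every check and bypass the queue.
        return "merged directly to master"
    errors = []
    if pr.base_branch != "master":
        errors.append("invalid base branch")
    if not pr.branch_ci_passed:
        errors.append("branch CI has not passed")
    if not pr.approved:
        errors.append("missing review approval")
    if errors:
        # In the flow described above, errors go back to the developer as a comment.
        return "error: " + ", ".join(errors)
    queue.append(pr.number)
    return "queued"  # the point where a thumbs up reaction would be issued
```

Keeping the parsing separate from the eligibility checks means the emergency path stays visibly distinct from the normal one, which matches the auditability goal mentioned above.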

Keeping Master Green

In order to keep master green, we took another look at how and when we merged a change to master. If we run CI before merging to master, we ensure that only green changes merge. This improves the local development experience by eliminating the cases of pulling a broken master, and by speeding up the deploy process by not having to worry about delays due to a failing build.

Our solution here is to have what we call a “predictive branch,” implemented as a git branch, onto which pull requests are merged, and CI is run. The predictive branch serves as a possible future version of master, but one where we are still free to manipulate it. We avoid maintaining a local checkout, which incurs the cost of running a stateful system that can easily be out of sync, and instead interact with this branch using the GraphQL GitHub API.

To ensure that the predictive branch on GitHub is consistent with our desired state, we use a similar pattern as React’s “Virtual DOM.” The system constructs an in-memory representation of the desired state and runs a reconciliation algorithm we developed that performs the necessary mutations to the state on GitHub. The reconciliation algorithm synchronizes our desired state to GitHub by performing two main steps. The first step is to discard obsolete merge commits. These are commits that we may have created in the past, but are no longer needed for the desired state of the tree. The second step is to create the missing desired merge commits. Once these merge commits are created, a corresponding CI run will be triggered. This pattern allows us to alter our desired state freely when the queue changes and gives us a layer of resiliency in the case of desynchronization.
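
The two reconciliation steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm Shopify runs: it assumes each predictive merge commit is identified by the ordered tuple of pull requests it contains, and `create_merge_commit` is a hypothetical callback standing in for the GitHub API call that creates a commit and triggers CI.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a predictive merge commit is keyed by the ordered PR numbers it contains.
Prefix = Tuple[int, ...]


def desired_prefixes(queue: List[int]) -> List[Prefix]:
    """Each queued PR needs a merge commit containing it and every PR ahead of it."""
    return [tuple(queue[: i + 1]) for i in range(len(queue))]


def reconcile(
    queue: List[int],
    existing: Dict[Prefix, str],
    create_merge_commit: Callable[[Prefix], str],
) -> Dict[Prefix, str]:
    """Return the merge commits that should exist once the desired state is applied."""
    wanted = desired_prefixes(queue)
    wanted_set = set(wanted)
    # Step 1: discard obsolete merge commits that are not part of the desired state.
    kept = {prefix: sha for prefix, sha in existing.items() if prefix in wanted_set}
    # Step 2: create the missing merge commits (each creation would start a CI run).
    for prefix in wanted:
        if prefix not in kept:
            kept[prefix] = create_merge_commit(prefix)
    return kept
```

For example, if commits exist for `(1,)`, `(1, 2)` and `(1, 2, 3)` and the queue changes to `[1, 3, 4]`, the sketch keeps the commit for `(1,)`, drops the other two, and creates new ones for `(1, 3)` and `(1, 3, 4)`. Because the function only compares desired and existing state, running it again after a partial failure converges on the same result.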

Merge Queue v2 runs CI in the queue

To ensure our goal of keeping master green, we need to also remove pull requests that fail CI from the queue to prevent them from cascading failures to all pull requests behind them. However, like many other large codebases, our core Shopify monolith suffers from flaky tests. The existence of these flaky tests makes removing pull requests from the queue difficult because we lack certainty about whether failed tests are legitimate or flaky. While we have work underway to clean up the test suite, we have to be resilient to the situation we have today.

We added a failure-tolerance threshold, and only remove pull requests when the number of successive failures exceeds the failure tolerance. This is based on the idea that legitimate failures will propagate to all later CI runs, but flaky tests will not block later CI runs from passing. Larger failure tolerances increase accuracy, at the cost of taking longer to remove problematic changes from the queue. To calculate the best value, we can look at the flakiness rate. To illustrate, let’s assume a flakiness rate of 25% and that runs flake independently. A false positive, where we remove a pull request because of flakiness alone, then requires every one of the successive failures to be flaky, so its probability for each failure tolerance is:

Failure tolerance   Probability of a false positive
0                   25%
1                   6.25%
2                   1.5%
3                   0.39%
4                   0.097%

From these numbers, it’s clear that the probability decreases significantly with each increase to the failure tolerance. The probability will never reach exactly 0%, but in this case, a value of 3 brings us sufficiently close. This means that on the fourth consecutive failure, we will remove the first pull request failing CI from the queue.
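
The numbers above follow from a simple calculation, sketched here under the assumption that every CI run flakes independently with the same probability. With a failure tolerance of t, a pull request is removed after t + 1 consecutive failures, so a false positive has probability equal to the flakiness rate raised to the power t + 1. The function names and the 0.5% acceptance limit mentioned below are illustrative choices, not values from the Merge Queue itself.

```python
def false_positive_probability(flakiness_rate: float, failure_tolerance: int) -> float:
    """Probability that failure_tolerance + 1 consecutive failures are all flaky.

    Assumes each CI run flakes independently with the same probability.
    """
    return flakiness_rate ** (failure_tolerance + 1)


def smallest_tolerance(flakiness_rate: float, max_false_positive: float) -> int:
    """Smallest failure tolerance whose false-positive probability is within the limit.

    Requires 0 < flakiness_rate < 1 and max_false_positive > 0, otherwise the loop
    would never terminate.
    """
    tolerance = 0
    while false_positive_probability(flakiness_rate, tolerance) > max_false_positive:
        tolerance += 1
    return tolerance


# Print the false-positive probability for tolerances 0 through 4 at 25% flakiness.
for tolerance in range(5):
    probability = false_positive_probability(0.25, tolerance)
    print(f"{tolerance}: {probability:.4%}")
```

With a 25% flakiness rate, a tolerance of 3 gives 0.25 to the fourth power, or about 0.39%. If the acceptable false-positive rate were set at 0.5%, `smallest_tolerance(0.25, 0.005)` would return 3.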

Increasing Throughput

An important objective for Merge Queue v2 was to maximize throughput. We should be continuously deploying and making sure that each deployment contains the maximum number of pull requests we deem acceptable.

To continuously deploy, we make sure that we have a constant flow of pull requests that are ready to go. Merge Queue v2 affords this by ensuring that CI is started for pull requests as soon as they are added to the queue. The impact is especially noticeable during incidents when we lock the queue. Since CI is running before merging to master, we will have pull requests passing and ready to deploy by the time the incident is resolved and the queue is unlocked. The following graph shows the number of queued pull requests rising while the queue is locked, then dropping once the queue is unlocked and pull requests merge immediately.

The number of queued pull requests rises as the queue gets locked, and then drops as the queue is unlocked and pull requests get merged immediately

To optimize the number of pull requests for each deploy, we split the pull requests in the merge queue up into batches. We define a batch as the maximum number of pull requests we can put in a single deploy. Larger batches result in higher theoretical throughput, but higher risk. In practice, the increased risk of larger batches impedes throughput by causing failures that are harder to isolate, and results in an increased number of rollbacks. In our application, we went with a batch size of 8 as a balance between throughput and risk.

At any given time, we run CI on 3 batches worth of pull requests in the queue. Having a bounded number of batches ensures that we’re only using CI resources on what we will need soon, rather than the entire set of pull requests in the queue. This helps reduce cost and resource utilization.
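
To make the batching concrete, here is a minimal sketch that splits a queue into batches of 8 and selects the pull requests in the first 3 batches for CI. The batch size and batch count come from the text. The function names and the representation of the queue as a list of pull request numbers are assumptions made for this example.

```python
from typing import List

BATCH_SIZE = 8        # pull requests per deploy (value from the text)
BATCHES_UNDER_CI = 3  # batches that have CI running at any time (value from the text)


def split_into_batches(queue: List[int], batch_size: int = BATCH_SIZE) -> List[List[int]]:
    """Split the queue, in order, into deploy-sized batches."""
    return [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]


def pull_requests_under_ci(queue: List[int]) -> List[int]:
    """Only the first few batches get CI; later pull requests wait their turn."""
    batches = split_into_batches(queue)[:BATCHES_UNDER_CI]
    return [pr for batch in batches for pr in batch]
```

With 30 queued pull requests, this yields four batches (8, 8, 8 and 6), and only the first 24 pull requests consume CI resources until earlier batches are deployed.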


We improved the user experience, safety of deploying to production, and throughput of deploys through the introduction of the Merge Queue v2. While we accomplished our goals for our current level of scale, there will be patterns and assumptions that we’ll need to revisit as we grow. Our next steps will focus on the user experience and ensure developers have the context to make decisions every step of the way. Merge Queue v2 has given us flexibility to build for the future, and this is only the beginning of our plans to scale deploys.

We’re always looking for awesome people of all backgrounds and experiences to join our team. Visit our Engineering career page to find out what we’re working on.

Four Steps to Creating Effective Game Day Tests

At Shopify, we use Game Day tests to practice how we react to unpredictable situations. Game Day tests involve deliberately triggering failure modes within our production systems, and analyzing whether the systems handle these problems in the ways we expect. I’ll walk through a set of best practices that we use for our internal Shopify Game Day tests, and how you can apply these guidelines to your own testing.

Shopify’s primary responsibility is to provide our merchants with a stable ecommerce platform. Even a small outage can have a dramatic impact on their businesses, so we put a lot of work into preventing them before they occur. We verify our code changes rigorously before they’re deployed, both through automated tests and manual verification. We also require code reviews from other developers who are aware of the context of these changes and their potential impact to the larger platform.

But these upfront checks are only part of the equation. Inevitably, things will break in ways that we don’t expect, or due to forces that are outside our control. When this happens, we need to quickly respond to the issue, analyze the situation at hand, and restore the system back to a healthy state. This requires close coordination between humans and automated systems, and the only way to ensure that it goes smoothly is to practice it beforehand. Game Day tests are a great way of training your team to expect the unexpected.

1. List All the Things That Could Break

The first step to running a successful Game Day test is to compile a list of all the potential failure scenarios that you’re interested in analyzing. Collaborate with your team to take a detailed inventory of everything that could possibly cause your systems to go haywire. List all the problem areas you know about, but don’t stop there—stretch your imagination! 

  • What are the parts of your infrastructure that you think are 100% safe? 
  • Where are your blind spots?
  • What would happen if your servers started inexplicably running out of disk space? 
  • What would happen if you suffered a DNS outage or a DDoS attack? 
  • What would happen if all network calls to a host started timing out?
  • Can your systems support 20x their current load?

You’ll likely end up with too many scenarios to reasonably test during a single Game Day testing session. Whittle down the list by comparing the estimated impact of each scenario against the difficulty you’d face in trying to reasonably simulate it. Try to avoid weighing particular scenarios based on your estimates of the likelihood that those scenarios will happen. Game Day testing is about insulating your systems against perfect storm incidents, which often hinge on failure points whose danger was initially underestimated.

2. Create a Series of Experiments

At Shopify, we’ve found that we get the best results from our Game Day tests when we run them as a series of controlled experiments. Once you’ve compiled a list of things that could break, you should start thinking about how they will break, as a list of discrete hypotheses. 

  • What are the side effects that you expect will be triggered when you simulate an outage during your test? 
  • Will the correct alerts be dispatched? 
  • Will downstream systems manifest the expected behaviors?
  • When you stop simulating a problem, will your systems recover back to their original state?

If you express these expectations in the form of testable hypotheses, it becomes much easier to plan the actual Game Day session itself. Use a separate spreadsheet (in Google Sheets or a similar tool) to catalogue each of the prerequisite steps that your team will walk through to simulate a specific failure scenario. Below those steps, indicate the behaviors that you hypothesize will occur when you trigger that scenario, along with an indicator for whether each behavior actually occurs. Lastly, make sure to list the necessary steps to restore your system back to its original state.

Example spreadsheet for a Game Day test that simulates an upstream service outage. A link to this spreadsheet is available in the “Additional Resources” section below.

3. Test Your Human Systems Too

By this point, you’ve compiled a series of short experiments that describe how you expect your systems to react to a list of failure scenarios. Now it’s time to run your Game Day test and validate your experimental hypotheses. There are a lot of different ways to run a Game Day test. One approach isn’t necessarily better than another. How you approach the testing should be tailored to the types of systems you’re testing, the way your team is structured and communicates, the impact your testing poses to production traffic, and so on. Whatever approach you take, just make sure that you track your experiment results as you go along!

However, there is one common element that should be present regardless of the specifics of your particular testing setup: team involvement. Game Day tests aren’t just about seeing how your automated systems react to unexpected pressures—you should also use the opportunity to analyze how your team handles these situations on the people side. Good team communication under pressure can make a huge difference when it comes to mitigating the impact of a production incident. 

  • What are the types of interactions that need to happen among team members as an incident unfolds? 
  • Is there a protocol for how work is distributed among multiple people? 
  • Do you need to communicate with anyone from outside your immediate team?

Make sure you have a basic system in place to prevent people from doing the same task twice, or incorrectly assuming that something is already being handled.

4. Address Any Gaps Uncovered

After running your Game Day test, it’s time to patch the holes that you uncovered. Your experiment spreadsheets should be annotated with whether each hypothesis held up in practice.

  • Did your off-hours alerting system page the on-call developer? 
  • Did you correctly switch over to reading from the backup database? 
  • Were you able to restore things back to their original healthy state?

For any gaps you uncover, work with your team to determine why the expected behavior didn’t occur, then establish a plan for how to correct the failed behavior. After doing so, you should ideally run a new Game Day test to verify that your hypotheses are now valid with the new fixes in place.

This is also the opportunity to analyze any gaps in communication between your team, or problems that you identified regarding how people distribute work among themselves when they’re under pressure. Set aside some time for a follow up discussion with the other Game Day participants to discuss the results of the test, and ask for their input on what they thought went well versus what could use some improvement. Finally, make any necessary changes to your team’s guidelines for how to respond to these incidents going forward.

In Conclusion

Using these best practices, you should be able to execute a successful Game Day test that gives you greater confidence in how your systems—and the humans that control them—will respond during unexpected incidents. And remember that a Game Day test isn’t a one-time event: you should periodically update your hypotheses and conduct new tests to make sure that your team remains prepared for the unexpected. Happy testing!

Additional resources


Sam Saffron AMA: Performance and Monitoring with Ruby

Sam Saffron is a co-founder of Discourse and the creator of the mini_profiler, memory_profiler, mini_mime and mini_racer gems. He has written extensively about various performance topics and is dedicated to ensuring Discourse keeps running fast.

Sam visited Shopify in Ottawa and talked to us about Discourse’s approach to Ruby performance and monitoring. He also participated in an AMA and answered the top-voted questions submitted by Shopifolk, which we are sharing here.

Ruby has a bad reputation when it comes to performance. What do you think are the actual problems? And do you think the community is on the right track to fix this reputation?

Sam Saffron: I think there are a lot of members of the community that are very keen to improve performance. And this runs all the way from above. DHH is also very interested in improving performance of Ruby.

I think the big problem that we have is resources and focus. A lot of times, I can feel that as a community we’re not focusing necessarily on the right thing. It’s very tempting, in performance, just to look at a micro bench. And it’s easy just to look at micro bench and make something 20 times faster, but in the big scheme of things you may not be fixing the right thing. So, it doesn’t make a big difference.

I think one area that Ruby can get better at, is finding the actual real production bottlenecks that people are seeing out there, and working towards solving them. And when I think about performance for us at Discourse, the biggest pain is memory, not CPU. When looking at adoption of Discourse, a lot of it depends on the people being able to run it on very cheap servers and they’re very constrained on memory. It’s a huge difference to adoption for us whether we can run on a 512MB system versus 1024MB. We see these memory issues in our hosting as well, our CPUs are usually doing okay, but memory is where we have issues. I wish the community would focus more on memory.

Just to summarize, I wish we looked at what big pain points consumers in the ecosystem are having and just set the agenda based on that. The other thing would be to spend more time on memory.

Are there any Ruby features or patterns that you generally avoid for performance reasons?

Sam Saffron: That’s an interesting question. Well, I’ll avoid ActiveRecord sometimes if I have something performance sensitive. For example, when I think of a user flow that I’m working on, it could be one that the user will visit once a month, or it could be an extremely busy route like the topic page. If I’m working on the topic page, which is a performance sensitive area, then I may opt to skip ActiveRecord and just use MiniSql.

As for using Ruby patterns, I don’t go and write while loops just because I hate blocks and I know that blocks are a little bit slower. I like how wonderful Ruby looks and how wonderful it reads. So, I won’t be like, “Oh, yeah, I have to write C in Ruby now because I don’t want to use blocks anywhere.” I think there’s a balancing act with patterns and I’ll only move away from them for two reasons. One is clarity. If the code will be clearer without using some of these sophisticated patterns, I’ll just go for clear and dumb versus fancy, sophisticated and pretty. I prefer clear and dumb. An example of that is I hate using “unless”. It’s a pet peeve that I have, I won’t use the unless keyword because I find it harder for me to comprehend what the code means. And the second is for performance reasons only. Only rarely where I absolutely have to take the performance hit, will I do that.

Sam Saffron presenting at Shopify in Ottawa

What is the right moment to shift focus on the performance of a product, rather than on other features? Do you have any tripwires or metrics in place?

Sam Saffron: We’re constantly thinking about performance at Discourse. We’ve always got the monitoring in place and we’re always looking at our graphs to see how things are going. I don’t think performance is something that you forget about for two years then go back and say, “Yeah, we’ll do a round of performance now.” I think there should be a culture of performance instilled day-to-day and always be considering it. It doesn’t mean performance is the only thing you should be thinking about, but it should be in the back of your mind as something that is a constant that you are trying to do.

There’s a balancing act. You want to ship new features, but as long as performance is something the team is constantly thinking about, then I think it’s safe. I would never consider shipping a new feature that is very slow just because I want to get the feature out there. I prefer to have the feature both correct and fast before shipping it.

What was one of the most difficult performance bugs you’ve found? How did you stay focused and motivated?

Sam Saffron: The thing that keeps me focused is having very clear goals. It’s important when you’re dealing with performance issues. You have a graph, it’s going a certain shape, and you want to change the shape of it. That’s your goal. You forget about everything else and it’s about taking that graph from this shape to that shape. When you can break a problem down from something that is impossible into something that is practical and easy to reason about, it’s at that point, you can attack these problems.

Particular war stories are hard. There’s nothing that screams out at me as the worst bug we’ve had. I guess memory leaks have traditionally been some of the hardest problems we’ve faced. Back in the old days we used TheRubyRacer, and it had a leak in the interop layer between Ruby and V8. It was a nightmare to find, because you’d have these processes that just keep climbing, and you don’t know what’s responsible for it. It’s something random that you’re doing but how do you get to it? So we looked at that graph and started removing parts of the app, and when you remove half of the app, the graph is suddenly stable. So, we put the other half of the app back in and slowly bisected it until we found the problem area and started resolving it. Luckily these days the tooling for debugging memory leaks is far more advanced, making it much easier to deal with issues like this.

Do you employ any kind of performance budgeting in your products and/or libraries? If you do, what metrics do you monitor and how do you decide on a budget?

Sam Saffron: Well, one constant budget I have is that any new dependency in our gem file has to be approved by me, and people have to justify its use. So I think dependencies are a big thing which is part of performance budget. In that, it’s easy to add dependencies, but to remove them later is very hard. I need to make sure that every new dependency we add is part of a performance budget that we agree we absolutely need it.

I’m constantly thinking about our performance budget. We’ve got the budget on boot. I’m very proud of the way that I can boot Rails console in under two seconds on my laptop. So boot budget is important to me, especially for dev work. If I want to just open a Rails console, I just do it. I don’t have to think that I’m going to have to wait 20 seconds for this thing to boot up. I might as well go and browse the web.

We’ve got this constant budget, they’re the high profile pages. We can’t afford any regression there. So, one thing that we’re looking at adding is alerts. If the query count on a topic page is now sitting on a median of 60 queries to SQL, if it goes up to 120, I want to get an alert saying, “There are 120 queries on this page, and there used to be 60 only.” So somebody will have a look at that, and it’ll open an alert topic on Discourse. So I definitely do want to get into more alerting that says, “Look, something happened at this point, look at it.”

What’s your take on the different Ruby runtimes out there? Is MRI still the “go to one” for a new project? If so, what do you think are the other ones missing to become real contenders?

Sam Saffron: We’ve always wanted Discourse to work on a wide array of platforms. That’s been a goal because when we started it was just about pure adoption. We didn’t care if people were paying us or not paying us, we just wanted the software to be adopted. So if it can run on JRuby, all power to JRuby. It makes adoption easier. The unfortunate thing that happened over the years is that we have never been able to run Discourse on JRuby, and there’ve been attempts out there but we are not quite there. Being able to host V8 in Java in JRuby is very very hard. A lot of what we do is married to the C implementation. It’s extremely hard to move to another world. I want there to be diversity, but unfortunately the only option we have at the moment is MRI, and I don’t see any other options in the next couple of years popping up that would be feasible.

Matz (Yukihiro Matsumoto) is saying that he wants Ruby 3 to be three times faster. Are you following the Ruby 3 development? Do you think they are going in the right direction?

Sam Saffron: I think there’s definitely a culture of performance at CRuby. There are a lot of improvements happening patch after patch where they are shaving this bit off and that bit off. CRuby itself is tracking well, but whether it’ll get three times faster or not, I don’t know. Where it gets complicated is that the ecosystem itself is tracking its own trajectory. There’s one trajectory for the engine, but another trajectory for the ecosystem.

If you look at things like Active Record, it’s not tracking three times faster for the next version of Rails, unfortunately. And that’s where all our pain is at the moment. When you look at what CRuby is doing, the goal is not making Active Record three times faster because it’s not a goal that is even practical for them to take on. So, they’re just dealing with little micro benchmarks that may help this situation or they may not help the situation, we don’t know.

Overall, do I think MRI is tracking well? Yes, MRI is tracking well, but I think we need to put a lot more focus on the ecosystem if we want the ecosystem to be 3x faster.

Is there any performance tooling that you think MRI is missing right now?

Sam Saffron: Yes. I’d say memory profiling is the big tooling piece that is missing. We have a bunch of tooling, for example, you can get full heap dumps. But the issue is how are you going to analyze it? The tooling for analysis is woeful, to say the least. If you compare Ruby on Rails to what they have in Java or .NET, we’re worlds behind. In Java and .NET, when it comes to tooling for looking at memory, you can get back traces from where something is allocated. In MRI, at best, you can get a call site of where something was allocated, you can’t get the full backtrace of where it was allocated. Having the full backtrace gives you significantly more tools to figure out and pinpoint what it is.

So, I’d say there are some bits missing of raw information that you could opt in for, that would be very handy. And a lot of tooling around visualizing and analyzing what is going on, especially when it comes to the world between managed and unmanaged because it’s very murky.

People look at a process and the process is consuming one gig of memory, and they want to know why. And if you were able at Shopify, for example, to have that picture immediately of why? You might say, well, maybe killing Unicorn workers is not what we need because all the memory looks like this and it’s coming from here. Maybe we just rewrite this little component and we don’t have to kill these Unicorns anymore because we’ve handled the root cause. I think that area is missing.

Intrigued about scaling using Ruby? Shopify is hiring and we’d love to hear from you. Please take a look at our open positions on the Engineering career page.

Make Great Decisions Quickly with TOMASP

As technical leaders and managers, our job is to make the right decision most of the time. Hiring, firing, technology choices, software architecture, and project prioritization are examples of high impact decisions that we need to make right if our teams are to be successful.

This is hard. As humans, we're naturally bad at making those types of decisions. I’ll show you how you can consistently make great decisions quickly using a simple framework called TOMASP. I made up the acronym for the purpose of this blog post, but it’s inspired by many great books (see resources) and by my personal experience of leading engineering teams for almost 15 years.

Let’s start with a concrete real world example:

Michelle, a technical lead for a popular mobile app, is agonizing about whether or not she should direct her team to rewrite the app using Flutter, a new technology for building mobile apps.

Flutter has an elegant architecture that should make development much faster without compromising quality. It was created by Google and is already in use by several other reputable companies. If Flutter delivers on its promises, Michelle’s team has a good chance of achieving their goals which seem highly unlikely with the current tech stack.

But starting a big rewrite now will be hard. It’s going to be difficult to get buy-in from senior leadership, no one on the team has experience with Flutter, and Mike, one of the senior developers on the team, is really not interested in trying something new and will probably quit if she decides to move forward with Flutter.

Before reading further ask yourself, what is the right decision here? What would you advise Michelle to do? Should she rewrite the app using Flutter or not?

I have asked this question many times and I bet most of you have an opinion on the matter. Now think about it, how much do you really know about Michelle and her team? How much do you know about the app and the problem they’re trying to solve? We will get back to Michelle and her difficult decision by the end of this post but first a little bit of theory.

How We Make the Wrong Decisions

“The normal state of your mind is that you have feelings and opinions about almost everything that comes your way”

Daniel Kahneman - Thinking, Fast and Slow

This ability of our mind to form opinions very quickly and automatically is what enables us to make thousands of decisions every day, but it can get in the way of making the best decision when the decision is complex and the impact is high. This is just one of the ways our brain can trick us into making the wrong decision.

Here are some other examples:

  • We are highly susceptible to cognitive biases
  • We put too much weight on short term emotion
  • We are overconfident about how the future will unfold (when was the last time your project finished sooner than you anticipated?)

The good news here is that it’s possible, through deliberate practice, to counteract those biases and make great decisions quickly even in complex high impact situations.

“We can’t turn off our biases, but we can counteract them”

Chip Heath, Dan Heath - Decisive

What is a Great Decision?

Before I show you how to counteract your biases using TOMASP, we need to get on the same page about what a great decision is. Let’s go through a couple of examples.

An example of a good decision:

In 2017 Shopify started to migrate its production infrastructure to Google Cloud ... 

Scaling up for BFCM used to take months, now it only takes a few days✌️.

In my experience this image is the mental model that most people have when they think of great decisions:

A decision has a direct link to the impact

In the previous example, the decision is to move to Google Cloud and the impact is the reduced effort to prep for BFCM.

Now let’s look at an example of a bad decision:

In 2017 Shopify started to migrate its production infrastructure to Google Cloud… 2 years later, Shopify is down for all merchants due to an outage in Google Cloud 😞.

Do you notice how the previous mental model is too simplistic? The same decision often leads to multiple outcomes.

Here is a better mental model for decisions:

A decision leads to execution which leads to multiple impacts. Moreover, things outside of our control will also affect the outcomes

Some things are outside of our control and a single decision often has multiple outcomes. Moreover, we never know the alternative outcomes (i.e. what would have happened if we had taken a different decision).

Considering this, we have to recognize that a great decision is NOT about the outcomes. A great decision is about how the decision is made and implemented. More specifically, a great decision is timely, considers many alternatives, and recognizes biases and uncertainty.

To put that into practice, think TOMASP. TOMASP is an acronym to help you remember the specific behaviours you can adopt to counteract your biases and make better decisions.

Timebox (T) the Decision

Define ahead of time how much time this decision is worth.

You Need This If…

It’s unclear when the decision should be made and how much time you should spend on it.

How to Do It

If the decision is hard to reverse, aim to make it the same week, otherwise aim for same day. One week for a “hard to reverse” decision might sound too little time, and it probably is. The intent here is to focus the attention and to prioritize. In my experience this can lead to a few different outcomes:

  1. Most likely, this is actually not a hard to reverse decision, and aiming to make it in the same week will lead you to focus on risk management and identify how you can reverse the decision if needed.
  2. This is truly a hard to reverse decision and it shouldn’t be made this week. However, there are aspects that can be decided this week, such as how to go about making the decision (e.g. who the key stakeholders are and what needs to be explored).

Multiple decisions are often made at the same time. Whenever this happens, make sure you’re spending the most time on the most impactful decision.

This Helps Avoid

  • Analysis Paralysis: over-analyzing (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome
  • Bikeshedding: spending a disproportionate amount of time and effort on a trivial or unimportant detail of a system, such as the color of a bikeshed for a nuclear plant

Generate More Options (O)

Expand the number of alternatives you’re considering.

You Need This If…

You’re considering “whether or not” to do something.

How to Do It

Aim to generate at least 3 reasonable options.

Do the Vanishing Option Test: if you couldn’t do what you’re currently considering, what else would you do?

Describe the problem, not the solutions, and ask diverse people for ideas, not for feedback (yet).

This Helps Avoid

  • Narrow framing: making a decision without considering the whole context
  • Taking it personally: by truly considering more than 2 options you will become less personally attached to a particular course of action

Meta (M) Decision

Decide on how you want to make the decision

You Need This If…

It’s hard to build alignment with your team or your stakeholders on what is the right decision.

How to Do It

Ask: what should we be optimizing for?

Define team values or principles first and then use them to inform the decision.

Look for heuristics. For instance, at Shopify we have the following heuristic to quickly choose a programming language: if it’s server-side business logic we default to Ruby; if it’s performance critical or needs to be highly concurrent we use Go.

This Helps Avoid

  • Mis-alignment with your team or stakeholders: I have found it easier to agree on the criteria to make the decision and the criteria can then be leveraged to quickly align on many decisions.

  • Poor implementation: having explicit decision-making criteria will make it a lot easier to articulate the rationale and to give the proper context for anyone executing on it.

Analyze (A) Your Options

Make a table to brainstorm and compare the “pros” and “cons” of each option.

You Need This If…

There is consensus very quickly or (if you’re making a decision on your own) you have very weak “pros” for all but one option.

How to Do It

Make your proposal look like a rough draft to make it easier for people to disagree.

Nominate a devil’s advocate, someone whose role is to argue for the opposite of what most people are leaning towards.

Make sure you have a diverse set of people analysing the options. I’ve gotten in trouble before when there were only developers in the room and we completely missed the UX trade-off of our decision.

For each option that you are considering, ask yourself what would have to be true for this to be the right choice.

This Helps Avoid

  • Groupthink

  • Confirmation bias

  • Status-quo bias

  • Blind spots

Step (S) Back

Hold off on making the decision until the conditions are more favorable.

You Need This If…

It’s the end of the day or the end of the week and emotions are high or energy is low.

How to Do It

Go have lunch, sleep on it, wait until Monday (or until after your next break if you don’t work Monday to Friday).

Do a 10/10/10 analysis: this is another trick I learned from the book Decisive (see resources). Ask yourself how you would feel about the decision 10 minutes later, 10 months later, and 10 years later. The long-term perspective is not necessarily the right one, but thinking about those different timescales helps put the short-term emotion in perspective.

Ask yourself these two questions:

  1. What would you advise your best friend?
  2. What would your replacement do?

This Helps Avoid

  • Putting too much weight on short term emotions

  • Irrational decision making due to low energy or fatigue

Prepare (P) to be Wrong

Chances are, you’re over-confident about how the future will unfold.

You Need This If…

Always do this :-)

How to Do It

Set “tripwires”: systems that will snap you to attention when a decision is needed. For example, a development project can be split into multiple phases with clear target dates and deliverables. At Shopify, we typically split projects into think, explore, build, and release phases. The transition between each phase acts as a tripwire. For example, before moving to build, the team and stakeholders review the technical design (the deliverable for that phase) and have to make a conscious decision to continue the project or pause it.

Whenever a phase is expected to be over 4 weeks, I like to break it down further into milestones. Again, it’s essential that each milestone has a clear target date and deliverable (e.g. 50% of the tasks are completed by Oct 10th) so that it can act as a tripwire.

You can set up additional tripwires by doing a pre-mortem analysis: imagine the worst-case scenario, then brainstorm potential root causes. You now have leading indicators that you can monitor and use as tripwires.

This Helps Avoid

  • Reacting too slowly: setting tripwires will help you detect early when things are going off the rails.

TOMASP in Action

At the beginning of this post, I gave the following example:

Michelle, a technical lead for a popular mobile app, is agonizing about whether or not she should direct her team to rewrite the app using Flutter, a new technology for building mobile apps.

Flutter has an elegant architecture that should make development much faster without compromising quality. It was created by Google and is already in use by several other reputable companies. If Flutter delivers on its promises, Michelle’s team has a good chance of achieving their goals which seem highly unlikely with the current tech stack.

But starting a big rewrite now will be hard. It’s going to be hard to get buy-in from senior leadership, no one on the team has experience with Flutter, and Mike, one of the senior developers on the team, is really not interested in trying something new and will probably quit if she decides to move forward with Flutter.

Here is how Michelle can use TOMASP to make a Great Decision Quickly:

  • Timebox (T):
    • This feels like a hard to reverse decision, so Michelle aims to make it by the end of the week.
  • Generate More Options (O):
    • Michelle uses the Vanishing Option Test to think of alternatives. If she couldn’t rewrite the whole app using Flutter what could she do?
    • Use a hybrid approach and only rewrite a section of the app in Flutter.
    • Have the iOS and Android developers systematically pair-program when implementing features.
    • Use another cross-platform framework such as React Native or Xamarin.
  • Meta (M) Decision:
    • What should Michelle optimize for? She comes up with the following hierarchy: 1) cross-platform consistency 2) performance 3) development speed
  • Analyze (A) Options:
    • Michelle concludes that for Flutter to be the right choice, a developer should be able to deliver the same level of quality in 50% or less of the time (to account for the risk and learning time of using a new technology).
  • Step (S) Back:
    • Michelle decides to make the decision first thing Friday morning and do a 10/10/10 analysis to ensure she’s not putting too much weight on short term emotion.
  • Prepare (P) to be Wrong:
    • Michelle decides to timebox a prototype: over the next 2 weeks she will pair with a developer on her team to build a section of the app using Flutter. She will then ask her team members to do a blind test and see if they can guess which part of the app has been rebuilt using Flutter.

That’s it! Even if Michelle ends up making the same decision, notice how much better she’s prepared to execute on it.

Thanks for reading. I hope you find this decision framework useful. I would be very interested in hearing how you’ve put TOMASP to use; please let me know by posting a comment below.

Some great resources:

We’re always looking for awesome people of all backgrounds and experiences to join our team. Visit our Engineering career page to find out what we’re working on.

Five Common Data Stores and When to Use Them

An important part of any technical design is choosing where to store your data. Does it conform to a schema or is it flexible in structure? Does it need to stick around forever or is it temporary?

In this article, we’ll describe five common data stores and their attributes. We hope this information will give you a good overview of different data storage options so that you can make the best possible choices for your technical design.

The five types of data stores we will discuss are

  1. Relational database
  2. Non-relational (“NoSQL”) database
  3. Key-value store
  4. Full-text search engine
  5. Message queue

Relational Database

Databases are, like, the original data store. When we stopped treating computers like glorified calculators and started using them to meet business needs, we started needing to store data. And so we (and by we, I mean Charles Bachman) invented the first database management system in 1963. By the mid to late ‘70s, these database management systems had become the relational database management systems (RDBMSs) that we know and love today.

A relational database, or RDB, is a database which uses a relational model of data.

Data is organized into tables. Each table has a schema which defines the columns for that table. The rows of the table, which each represent an actual record of information, must conform to the schema by having a value (or a NULL value) for each column.

Each row in the table has its own unique key, also called a primary key. Typically this is an integer column called “ID.” A row in another table might reference this table’s ID, thus creating a relationship between the two tables. When a column in one table references the primary key of another table, we call this a foreign key.

Using this concept of primary keys and foreign keys, we can represent incredibly complex data relationships using incredibly simple foundations.
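As a rough illustration of how primary and foreign keys tie tables together, here is a minimal sketch in plain Ruby. The BLOGS and POSTS "tables", their columns, and both helper methods are hypothetical; a real RDBMS would express the same lookups in SQL.

```ruby
# Toy in-memory "tables": each row is a Hash, and :id plays the role of the primary key.
BLOGS = [
  { id: 1, title: "Engineering" },
  { id: 2, title: "Design" }
].freeze

# POSTS rows carry blog_id, a foreign key that references the primary key of BLOGS.
POSTS = [
  { id: 10, blog_id: 1, title: "Five Common Data Stores" },
  { id: 11, blog_id: 1, title: "Fast Code in Rails" },
  { id: 12, blog_id: 2, title: "Colour Systems" }
].freeze

# Follow the relationship from a blog to its posts, roughly what
# SELECT title FROM posts WHERE blog_id = ? would do in SQL.
def post_titles_for(blog_id)
  POSTS.select { |post| post[:blog_id] == blog_id }.map { |post| post[:title] }
end

# Follow the foreign key in the other direction: from a post to its blog.
def blog_for(post_id)
  post = POSTS.find { |row| row[:id] == post_id }
  post && BLOGS.find { |row| row[:id] == post[:blog_id] }
end
```

Two simple ideas (a unique ID per row, and columns that point at another table's ID) are enough to walk the relationship in either direction.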

SQL, which stands for structured query language, is the industry standard language for interacting with relational databases.

At Shopify, we use MySQL as our RDBMS. MySQL is durable, resilient, and persistent. We trust MySQL to store our data and never, ever lose it.

Other features of RDBMSs are

  • Replicated and distributed (good for scalability)
  • Enforces schemas and atomic, consistent, isolated, and durable (ACID) transactions (leads to well-defined, expected behavior of your queries and updates)
  • Good, configurable performance (fast lookups, can tune with indices, but can be slow for cross-table queries)

When to Use a Relational Database

Use a database for storing your business critical information. Databases are the most durable and reliable type of data store. Anything that you need to store permanently should go in a database.

Relational databases are typically the most mature databases: they have withstood the test of time and continue to be an industry standard tool for the reliable storage of important data.

It’s possible that your data doesn’t conform nicely to a relational schema or your schema is changing so frequently that the rigid structure of a relational database is slowing down your development. In this case, you can consider using a non-relational database instead.

Non-Relational (NoSQL) Database

Computer scientists over the years did such a good job of designing databases to be available and reliable that we started wanting to use them for non-relational data as well. Data that doesn’t strictly conform to some schema or that has a schema which is so variable that it would be a huge pain to try to represent it in relational form.

These non-relational databases are often called “NoSQL” databases. They have roughly the same characteristics as SQL databases (durable, resilient, persistent, replicated, distributed, and performant) except for the major difference of not enforcing schemas (or enforcing only very loose schemas).

NoSQL databases can be categorized into a few types, but there are two primary types which come to mind when we think of NoSQL databases: document stores and wide column stores.

(In fact, some of the other data stores below are technically NoSQL data stores, too. We have chosen to list them separately because they are designed and optimized for different use cases than these more “traditional” NoSQL data stores.)

Document Store

A document store is basically a fancy key-value store where the key is often omitted and never used (although one does get assigned under the hood—we just don’t typically care about it). The values are blobs of semi-structured data, such as JSON or XML, and we treat the data store like it’s just a big array of these blobs. The query language of the document store will then allow you to filter or sort based on the content inside of those document blobs.
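To make the "big array of blobs" idea concrete, here is a small sketch in plain Ruby. The documents, field names, and the find_documents helper are made up for illustration; a real document store has its own query language and indexing.

```ruby
require "json"

# A toy "document store": an array of semi-structured JSON blobs with no shared schema.
DOCUMENTS = [
  '{"type": "product", "name": "Cat tree", "tags": ["cat", "furniture"], "price": 120}',
  '{"type": "product", "name": "Dog leash", "tags": ["dog"]}',
  '{"type": "review", "rating": 5, "body": "My cat loves it"}'
].map { |blob| JSON.parse(blob) }

# Filter on the content inside each document. A document that lacks the
# field simply does not match; nothing forces every blob to have it.
def find_documents(field, value)
  DOCUMENTS.select do |doc|
    stored = doc[field]
    stored.is_a?(Array) ? stored.include?(value) : stored == value
  end
end

cat_products = find_documents("tags", "cat").map { |doc| doc["name"] }
```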

A popular document store you might have heard of is MongoDB.

Wide Column Store

A wide column store is somewhere in between a document store and a relational DB. It still uses tables, rows, and columns like a relational DB, but the names and formats of the columns can be different for various rows in the same table. This strategy combines the strict table structure of a relational database with the flexible content of a document store.

Popular wide column stores you may have heard of are Cassandra and Bigtable.

At Shopify, we use Bigtable as a sink for some streaming events. Other NoSQL data stores are not widely used. We find that the majority of our data can be modeled in a relational way, so we stick to SQL databases as a rule.

When to Use a NoSQL Database

Non-relational databases are most suited to handling large volumes of data and/or unstructured data. They’re extremely popular in the world of big data because writes are fast. NoSQL databases don’t enforce complicated cross-table schemas, so writes are unlikely to be a bottleneck in a system using NoSQL.

Non-relational databases offer a lot of flexibility to developers, so they are also popular with early-stage startups or greenfield projects where the exact requirements are not yet clear.

Key-Value Store

Another way to store non-relational data is in a key-value store.

A key-value store is basically a production-scale hashmap: a map from keys to values. There are no fancy schemas or relationships between data. No tables or other logical groups of data of the same type. Just keys and values, that’s it.

At Shopify, we use two key-value stores: Redis and Memcached.

Both Redis and Memcached are in-memory key-value stores, so their performance is top-notch.

Since they are in-memory, they (necessarily) support configurable eviction policies. We will eventually run out of memory for storing keys and values, so we’ll need to delete some. The most popular strategies are Least Recently Used (LRU) and Least Frequently Used (LFU). These eviction policies make key-value stores an easy and natural way to implement a cache.
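As a sketch of what LRU eviction means in practice, here is a tiny key-value store in plain Ruby. LruStore is a hypothetical class written for this post; it is not how Redis or Memcached implement eviction internally.

```ruby
# A toy in-memory key-value store with Least Recently Used (LRU) eviction.
# Ruby hashes remember insertion order, so as long as every access
# re-inserts its key, the first key is always the least recently used one.
class LruStore
  def initialize(max_keys)
    @max_keys = max_keys
    @data = {}
  end

  def get(key)
    return nil unless @data.key?(key)

    value = @data.delete(key) # remove and re-insert to mark as most recently used
    @data[key] = value
  end

  def set(key, value)
    @data.delete(key)
    @data[key] = value
    @data.delete(@data.first[0]) if @data.size > @max_keys # evict the LRU key
    value
  end

  def keys
    @data.keys
  end
end

store = LruStore.new(2)
store.set("a", 1)
store.set("b", 2)
store.get("a")     # "a" is now the most recently used key
store.set("c", 3)  # over capacity, so "b" (the least recently used key) is evicted
```

Because the store decides on its own what to throw away, callers can treat it as a cache: write freely, and be prepared for a read to come back empty.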

(Note: There are also disk-based key-value stores, such as RocksDB, but we have no experience with them at Shopify.)

One major difference between Redis and Memcached is that Redis supports some data structures as values. You can declare that a value in Redis is a list, set, queue, hash map, or even a HyperLogLog, and then perform operations on those structures. With Memcached, everything is just a blob and if you want to perform any operations on those blobs, you have to do it yourself and then write it back to the key again.

Redis can also be configured to persist to disk, which Memcached cannot. Redis is therefore a better choice for storing persistent data, while Memcached remains only suitable for caches.

When to Use a Key-Value Store

Key-value stores are good for simple applications that need to store simple objects temporarily. An obvious example is a cache. A less obvious example is to use Redis lists to queue units of work with simple input parameters.

Full-Text Search Engine

Search engines are a special type of data store designed for a very specific use case: searching text-based documents.

Technically, search engines are NoSQL data stores. You ship semi-structured document blobs into them, but rather than storing them as-is and using XML or JSON parsers to extract information, the search engine slices and dices the document contents into a new format that is optimized for searching based on substrings of long text fields.
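One structure commonly used for this "sliced and diced" format is an inverted index, which maps each word to the documents that contain it. The sketch below is a deliberately tiny version in plain Ruby; TinySearchIndex and its tokenizer are hypothetical, and a production engine adds stemming, ranking, and much more.

```ruby
# A toy inverted index: instead of storing documents as-is, map every
# word to the list of document IDs that contain it.
class TinySearchIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  def add(doc_id, text)
    tokenize(text).uniq.each { |word| @postings[word] << doc_id }
  end

  # Return the IDs of documents that contain every word in the query.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?

    words.map { |word| @postings.fetch(word, []) }.reduce(:&).dup
  end

  private

  # Lowercase the text and split it into alphanumeric words.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = TinySearchIndex.new
index.add(1, "Cat tree for large cats")
index.add(2, "Dog leash")
index.add(3, "Scratching post for your cat")
cat_docs = index.search("cat")
```

A lookup touches only the entries for the query words, so it never has to scan the full text of every document.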

Search engines are persistent, but they’re not designed to be particularly durable. You should never use a search engine as your primary data store! It should be a secondary copy of your data, which can always be recreated from the original source in an emergency.

At Shopify we use Elasticsearch for our full-text search. Elasticsearch is replicated and distributed out of the box, which makes it easy to scale.

The most important feature of any search engine, though, is that it performs exceptionally well for text searches.

To learn more about how full-text search engines achieve this fast performance, you can check out Toria’s lightning talk from StarCon 2019.

When to Use a Full-Text Search Engine

If you have found yourself writing SQL queries with a lot of wildcard matches (for example, “SELECT * FROM products WHERE description LIKE '%cat%'” to find cat-related products) and you’re thinking about brushing up on your natural-language processing skills to improve the results… you might need a search engine!

Search engines are also pretty good at searching and filtering by exact text matches or numeric values, but databases are good at that, too. The real value add of a full-text search engine is when you need to look for particular words or substrings within longer text fields.

Message Queue

The last type of data store that you might want to use is a message queue. It might surprise you to see message queues on this list because they are considered more of a data transfer tool than a data storage tool, but message queues store your data with as much reliability and even more persistence than some of the other tools we’ve discussed already!

At Shopify, we use Kafka for all our streaming needs. Payloads called “messages” are inserted into Kafka “topics” by “producers.” On the other end, Kafka “consumers” can read messages from a topic in the same order they were inserted in.

Under the hood, Kafka is implemented as a distributed, append-only log. It’s just files! Although not human-readable files.

Kafka is typically treated as a message queue, and rightly belongs in our message queue section, but it’s technically not a queue. It’s technically a distributed log, which means that we can do things like set a data retention time of “forever” and compact our messages by key (which means we only retain the most recent value for each key) and we’ve basically got a key-value document store!
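To show why a compacted log starts to look like a key-value store, here is a toy append-only log in plain Ruby. TinyLog and its methods are hypothetical and greatly simplified; real Kafka compaction works on log segments and preserves message offsets.

```ruby
# A toy append-only log. Each message is a [key, value] pair, and
# consumers read messages in the order they were appended.
class TinyLog
  def initialize
    @messages = []
  end

  # Append a message and return its offset (its position in the log).
  def append(key, value)
    @messages << [key, value]
    @messages.size - 1
  end

  # Read every message from a given offset onward, in insertion order.
  def read_from(offset)
    @messages.drop(offset)
  end

  # Compaction by key: keep only the most recent value for each key.
  def compacted
    latest = {}
    @messages.each { |key, value| latest[key] = value }
    latest
  end
end

log = TinyLog.new
log.append("product:1", "draft")
log.append("product:2", "draft")
log.append("product:1", "published")
snapshot = log.compacted
```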

Although there are some legitimate use cases for such a design, if what you need is a key-value document store, a message queue is probably not the best tool for the job. You should use a message queue when you need to ship some data between services in a way that is fast, reliable, and distributed.

When to Use a Message Queue

Use a message queue when you need to temporarily store, queue, or ship data.

If the data is very simple and you’re just storing it for use later in the same service, you could consider using a key-value store like Redis. You might consider using Kafka for the same simple data if it’s very important data, because Kafka is more reliable and persistent than Redis. You might also consider using Kafka for a very large amount of simple data, because Kafka is easier to scale by adding distributed partitions.

Kafka is often used to ship data between services. The producer-consumer model has a big advantage over other solutions: because Kafka itself acts as the message broker, you can simply ship your data into Kafka and then the receiving service can poll for updates. If you tried to use something simpler, like Redis, you would have to implement some kind of notification or polling mechanism yourself, whereas Kafka has this built in.

In Conclusion

These are not the be-all and end-all of data stores, but we think they are the most common and useful ones. Knowing about these five types of data stores will get you on the path to making great design decisions!

What do you think? Do you have a favourite type of data store that didn’t make it on the list? Let us know in the comments below.

How to Write Fast Code in Ruby on Rails

At Shopify, we use Ruby on Rails for most of our projects. For both Rails and Ruby, there exists a healthy amount of stigma toward performance. You’ll often find examples of individuals (and entire companies) drifting away from Rails in favor of something better. On the other hand, there are many who have embraced Ruby on Rails and found success, even at our scale, processing millions of requests per minute (RPM).

Part of Shopify’s success with Ruby on Rails is an emphasis on writing fast code. But, how do you really write fast code? Largely, that’s context sensitive to the problem you’re trying to solve. Let’s talk about a few ways to start writing faster code in Active Record, Rails, and Ruby.

Active Record Performance

Active Record is Rails’ default Object Relational Mapper (ORM). Active Record is used to interact with your database by generating and executing Structured Query Language (SQL). There are many ways to query large volumes of data poorly. Here are some suggestions to help keep your queries fast.

Know When SQL Gets Executed

Active Record evaluates queries lazily. So, to query efficiently, you should know when queries are executed. Finder methods, calculations, and association methods all cause queries to be evaluated. Here’s an example:

Here the code is appending a comment to a blog post and automatically saving it to the database. It isn’t immediately obvious that this executes a SQL insert to save the appended blog post. These kinds of gotchas become easier to spot through reading documentation and experience.
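To make the idea of lazy evaluation concrete without a database, here is a toy sketch in plain Ruby. LazyQuery is a hypothetical class, not part of Active Record: chaining only records filters, and the "query" runs when the results are enumerated.

```ruby
# A toy lazily evaluated query (a stand-in for illustration, not Active Record).
class LazyQuery
  include Enumerable

  def initialize(rows, filters = [], log = [])
    @rows = rows
    @filters = filters
    @log = log # shared record of every "query" that actually ran
  end

  # Chaining returns a new query object and executes nothing.
  def where(&filter)
    LazyQuery.new(@rows, @filters + [filter], @log)
  end

  # Enumerating is what triggers execution.
  def each(&block)
    @log << "executed with #{@filters.size} filter(s)"
    @rows.select { |row| @filters.all? { |filter| filter.call(row) } }.each(&block)
  end

  def executions
    @log.size
  end
end

posts = LazyQuery.new([{ id: 1, published: true }, { id: 2, published: false }])
query = posts.where { |post| post[:published] } # nothing has run yet
before = query.executions
ids = query.map { |post| post[:id] }            # map enumerates, so the query runs now
```

The habit this is meant to build is asking "does this call only describe the query, or does it run it?" whenever you chain methods.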

Select Less Where Possible

Another way to query efficiently is to select only what you need. By default, Active Record selects all columns in SQL with SELECT *. Instead, you can leverage select and pluck to take control of your select statements:

Here, we’re selecting all IDs in a blog’s table. Notice select returns an Active Record Relation object (that you can chain query methods off of) whereas pluck returns an array of raw data.

Forget About The Query Cache

Did you know that if you execute the same SQL within the lifetime of a request, Active Record will only query the database once? Query Cache is one of the last lines of defense against redundant SQL execution. This is what it looks like in action:

In the example, subsequent blog SELECTs using the same parameters are loaded from cache. While this is helpful, depending on query cache is a bad idea. Query cache is stored in memory, so its persistence is short-lived. The cache can be disabled, so if your code will run both inside and outside of a request, it may not always be efficient.

Avoid Querying Unindexed Columns

Avoid querying unindexed columns, because it often leads to unnecessary full table scans. At scale, these queries are likely to time out and cause problems. This is more of a database best practice that directly affects query efficiency.

The obvious solution to this problem is to index the columns you need to query. What isn’t always obvious is how to do it. Databases often lock writes to a table when adding an index. This means large tables can be write-blocked for a long time.

At Shopify, we use a tool called Large Hadron Migrator (LHM) to solve these kinds of scaling migration problems for large tables. On later versions of Postgres and MySQL, there is also concurrent indexing support.

Rails Performance

Zooming out from Active Record, Rails has many other moving parts like Active Support, Active Job, Action Pack, etc. Here are some generalized best practices for writing fast code in Ruby on Rails.

Cache All The Things

If you can’t make something faster, a good alternative is to cache it. Things like complex view compilation and external API calls benefit greatly from caching, especially if the resultant data doesn’t change often.

Taking a closer look at the fundamentals of caching, key naming and expiration are critical to building effective caches. For example:

In the first block, we cache all subscription plan names indefinitely (or until the key is evicted by our caching backend). The second block caches the JSON of all posts for a given blog. Notice how cache keys change in the context of a different blog or when a new post is added to a blog. Finally, the last block caches a global count of approved comments. The key will automatically be removed by our caching backend five minutes after the initial fetch.
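The fetch-or-compute pattern with expiry can be sketched in plain Ruby as follows. TinyCache, its injectable clock, and the key name are all hypothetical; this is a simplified model of the behaviour described above, not the Rails cache store.

```ruby
# A toy in-memory cache with a fetch-or-compute API and optional expiry.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be demonstrated without sleeping.
  def initialize(clock: -> { Time.now })
    @entries = {}
    @clock = clock
  end

  # Return the cached value for key, or run the block, store its result, and return it.
  def fetch(key, expires_in: nil)
    entry = @entries[key]
    return entry.value if entry && !expired?(entry)

    value = yield
    expires_at = expires_in ? @clock.call + expires_in : nil
    @entries[key] = Entry.new(value, expires_at)
    value
  end

  private

  def expired?(entry)
    entry.expires_at && @clock.call >= entry.expires_at
  end
end

now = Time.at(0)
cache = TinyCache.new(clock: -> { now })
computations = 0

count = -> { cache.fetch("comments/approved/count", expires_in: 300) { computations += 1 } }
count.call  # miss: the block runs
count.call  # hit: the block is skipped
now += 301  # a little over five minutes pass
count.call  # expired: the block runs again
```

The expensive block ran twice for three reads, which is the whole point: the key name decides what gets shared, and the expiry decides how stale it can get.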

Throttle Bottlenecks

But what about operations you can’t cache? Things like delivering an email, sending a webhook, or even logging in can be abused by users of an application. Essentially, any expensive operation that can’t be cached should be throttled.

Rails doesn’t have a throttling mechanism by default. So, gems like rack-attack and rack-throttle can help you throttle unwanted requests. Using rack-attack:

This snippet limits a given IP’s POST requests to /admin/sign_in to 10 in 15 minutes. Depending on your application’s needs, you can also build solutions that throttle further up the stack inside your Rails app. Rack-based throttling solutions are popular because they allow you to throttle bad requests before they hit your Rails app.
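If you are curious what a throttle does under the hood, here is a minimal fixed-window sketch in plain Ruby using the same limits (10 requests per 15 minutes per IP). TinyThrottle is hypothetical and is not how rack-attack is implemented; in particular, it never cleans up old windows.

```ruby
# A toy fixed-window throttle: allow at most `limit` requests per
# `period` seconds for each key (for example, a client IP).
class TinyThrottle
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counts = Hash.new(0)
  end

  # Returns true if the request is allowed, false if it should be rejected.
  def allow?(key, now: Time.now)
    window = now.to_i / @period # every request in the same period shares a counter
    @counts[[key, window]] += 1
    @counts[[key, window]] <= @limit
  end
end

throttle = TinyThrottle.new(limit: 10, period: 15 * 60)
start = Time.at(0)
results = Array.new(11) { throttle.allow?("203.0.113.7", now: start) }
```

The first ten calls are allowed and the eleventh is rejected; once the next 15-minute window starts, the same IP is allowed again.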

Do It Later (In a Job)

A cornerstone of the request-response model we work with as web developers is speed. Keeping things snappy for users is important. So, what if we need to do something complicated and long-running?

Jobs allow us to defer work to another process through queueing systems often backed by Redis. Exporting a dataset, activating a subscription, or processing a payment are all great examples of job-worthy work. Here’s what jobs look like in Rails:

This is a trivial example of how you would write a CSV exporting job. Active Job is Rails’ job definition framework which plugs into specific queueing backends like Sidekiq or Resque.

Start Dependency Dieting

Ruby’s ecosystem is rich, and there are a lot of great libraries you can use in your project. But how much is too much? As a project grows and matures, dependencies often turn into liabilities.

Every dependency adds more code to your project. This leads to slower boot times and increased memory usage. Being aware of your project’s dependencies and making conscious decisions to minimize them helps maintain speed in the long term.

Shopify’s core monolith, for example, has ~500 gem dependencies. This year, we’ve taken steps to evaluate our gem usage and remove unnecessary dependencies where possible. This led to removing unused gems, addressing tech debt to remove legacy gems, and using a dependency management service (e.g. Dependabot).

Ruby Performance

A framework is only as fast as the language it’s written in. Here are some pointers on writing performant Ruby code. This section is inspired by Jeremy Evans’s closing keynote on performance at RubyKaigi 2019.

Use Metaprogramming Sparingly

Changing a program’s structure at runtime is a powerful feature. In a highly dynamic language like Ruby, there are significant performance costs associated with metaprogramming. Let’s look at method definition as an example:

These are three common ways of defining a method in Ruby. The first, and most common, uses def. The second uses define_method to define a metaprogrammed method. The third uses class_eval to evaluate a string at runtime as source code (which defines a method using def).
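As a minimal sketch, the three styles of definition described above might look like this. The Greeter class and its method names are hypothetical, chosen only for illustration.

```ruby
class Greeter
  # 1. Plain def.
  def hello_def
    "hello"
  end

  # 2. define_method: the method body is a block (a closure).
  define_method(:hello_define_method) { "hello" }

  # 3. class_eval with a string: the source is evaluated at runtime and uses def.
  class_eval <<~RUBY
    def hello_class_eval
      "hello"
    end
  RUBY
end

greeter = Greeter.new
greetings = [greeter.hello_def, greeter.hello_define_method, greeter.hello_class_eval]
```

All three produce a method that behaves the same from the caller's point of view; the difference is in how Ruby builds and later executes it.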

This is the output of a benchmark that measures the speed of these three methods using the benchmark-ips gem. Let’s focus on the lower half of the benchmark, which measures how many times Ruby could run each method in 5 seconds. The normal def method ran 10.9 million times, the define_method method 7.7 million times, and the class_eval def method 10.3 million times.
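To try a comparison like this without installing extra gems, here is a rough sketch that swaps benchmark-ips for the standard library’s Benchmark module. It times a fixed number of calls rather than counting iterations per time window, so the figures won’t match the ones above, and the class and method names are again hypothetical:

```ruby
require "benchmark"

class Greeter
  def via_def; "hello"; end
  define_method(:via_define_method) { "hello" }
  class_eval 'def via_class_eval; "hello"; end'
end

ITERATIONS = 1_000_000
greeter = Greeter.new

# Time the same number of calls to each method and record the elapsed seconds.
timings = {
  "def"           => Benchmark.realtime { ITERATIONS.times { greeter.via_def } },
  "define_method" => Benchmark.realtime { ITERATIONS.times { greeter.via_define_method } },
  "class_eval"    => Benchmark.realtime { ITERATIONS.times { greeter.via_class_eval } }
}

timings.each { |label, seconds| puts format("%-14s %.3fs", label, seconds) }
```

Results vary between Ruby versions and machines, so treat a single run as a hint rather than a measurement.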

While this is a trivial example, we can conclude there are clear performance differences associated with _how_ you define a method. Now, let’s look at method invocation:
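A sketch of the three invocation styles, assuming a bare object with singleton methods (the :missing method name and the return values are illustrative):

```ruby
obj = Object.new

# A regular method defined directly on obj.
def obj.invoke
  :invoked
end

# Calls to methods that don't exist end up here.
def obj.method_missing(name, *args)
  name == :missing ? :handled_by_method_missing : super
end

# Keep respond_to? consistent with what method_missing handles.
def obj.respond_to_missing?(name, include_private = false)
  name == :missing || super
end

obj.invoke         # regular dispatch
obj.send(:invoke)  # dynamic dispatch by method name
obj.missing        # no such method, so Ruby falls back to method_missing
```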

This simply defines invoke and method_missing methods on an object named obj. Then, we call the invoke method normally, using the metaprogrammed send method, and finally via method_missing.

Less surprisingly, a method invoked with send or method_missing is much slower than a regular method invocation. While these differences might seem minuscule, they add up fast in large codebases, or when called many times recursively. As a rule of thumb, use metaprogramming sparingly to prevent unnecessary slowness.

Know the Difference Between O(n) and O(1)

O(n) and O(1) describe two kinds of operations. An O(n) operation takes time that grows with the size of its input, while an O(1) operation takes constant time regardless of size. Consider this example:
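One way to see the difference is to look up the same value in an array and in a hash. This sketch uses the standard library’s Benchmark module; the collection size is an arbitrary choice:

```ruby
require "benchmark"

SIZE = 1_000_000
array  = (1..SIZE).to_a
hash   = array.to_h { |n| [n, true] }
target = SIZE # worst case for the array: the value sits at the very end

# O(n): include? walks the array element by element until it finds a match.
array_time = Benchmark.realtime { array.include?(target) }

# O(1): key? hashes the value and jumps straight to its bucket.
hash_time = Benchmark.realtime { hash.key?(target) }

puts format("array: %.6fs  hash: %.6fs", array_time, hash_time)
```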

This becomes very apparent when finding a value in an array compared to a hash. With every element you add to an array, there’s more potential data to iterate through whereas hash lookups are always constant regardless of size. The moral of the story here is to think about how your code will scale with more data.

Allocate Less

Memory management is a complicated subject in most languages, and Ruby is no exception. Essentially, the more objects you allocate, the more memory your program consumes. High-level languages usually implement garbage collection to automate removal of unused objects, making developers’ lives much easier.

Another aspect of memory management is object mutability. For example, if you need to combine two arrays together, do you allocate a new array or mutate an existing one? Which option is more memory efficient?
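For the array case, the two options look like this in Ruby (a minimal sketch with made-up values):

```ruby
a = [1, 2]
b = [3, 4]

combined = a + b # allocates a third array; a and b are left unchanged
a.concat(b)      # appends b's elements to a in place; no new array is created
```

After both calls, combined and a hold the same elements, but only the first call allocated a new array.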

Generally speaking, fewer allocations are better. Rubyists often classify these kinds of self-mutating methods as “dangerous”. Dangerous methods in Ruby often (but not always) end with an exclamation mark. Here’s an example:
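A short sketch of the pattern, using an arbitrary array of symbols:

```ruby
symbols = [:a, :b, :a, :c, :b]

# uniq allocates and returns a new array; symbols is left untouched.
unique_copy = symbols.uniq

# uniq! removes duplicates from symbols itself and returns the receiver
# (or nil when there was nothing to remove).
result = symbols.uniq!
```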

The code above allocates an array of symbols. The first uniq call allocates and returns a new array with all redundant symbols removed. The second uniq! call mutates the receiver directly to remove redundant symbols and returns itself.

If used improperly, dangerous methods can lead to unwanted side effects in your code. A best practice to follow is to avoid mutating global state while leveraging mutation on local state.

Minimize Indirection

Indirection in code, especially through layered abstractions, can be described as both a blessing and a curse. In terms of performance, it’s almost always a curse.

Merb, a web application framework that was merged into Rails, has a motto: “No code is faster than no code.” This can be interpreted as “The more layers of complexity you add to something, the slower it will be.” While this isn’t necessarily true for performance optimizing code, it’s still a good principle to remember when refactoring.

An example of necessary indirection is Active Resource, an ORM for interacting with web services. Developers don’t use it for better performance; they use it because manually crafting requests and responses is much more difficult (and error prone) by comparison.

Final Thoughts

Software development is full of tradeoffs. As developers, we have enough difficult decisions to make while juggling technical debt, code style, and code correctness. This is why optimizing for speed shouldn’t come first.

At Shopify, we treat speed as a feature. While it lends itself to better user experiences and lower server bills, it shouldn’t take precedence over the happiness of developers working on an application. Remember to keep your code fun while making it fast!

Additional Reading

We’re always looking for awesome people of all backgrounds and experiences to join our team. Visit our Engineering career page to find out what we’re working on.

How Shopify Manages Petabyte Scale MySQL Backup and Restore

At Shopify, we run a large fleet of MySQL servers, with numerous replica-sets (internally known as “shards”) spread across three Google Cloud Platform (GCP) regions. Given the petabyte scale size and criticality of data, we need a robust and efficient backup and restore solution. We drastically reduced our Recovery Time Objective (RTO) to under 30 minutes by redesigning our tooling to use disk-based snapshots, and we want to share how it was done.

Challenges with Existing Tools

For several years, we backed up our MySQL data using Percona’s Xtrabackup utility, stored its output in files, and archived them on Google Cloud Storage (GCS). While pretty robust, it posed significant challenges when backing up and restoring data. The amount of time taken to back up a petabyte of data spread across multiple regions was too long, and increasingly hard to improve. We perform backups in all availability regions to decrease the time it takes to restore data cross-region. However, the restore time for each of our shards was more than six hours, which forced us to accept a very high RTO.

While this lengthy restore time was painful when using backups for disaster recovery, we also leverage backups for day-to-day tasks, such as re-building replicas. Long restore times also impaired our ability to scale replicas up and down in a cluster for purposes like scaling our reads to replicas.

Overcoming Challenges

Since we run our MySQL servers on GCP’s Compute Engine VMs using Persistent Disk (PD) volumes for storage, we invested time in leveraging PD’s snapshot feature. Using snapshots was simple enough, conceptually. In terms of storage, each initial snapshot of a PD volume is a full copy of the data, whereas the subsequent ones are automatically incremental, storing only data that has changed.

In our benchmarks, an initial snapshot of a multi-terabyte PD volume took around 20 minutes and each incremental snapshot typically took less than 10 minutes. The incremental nature of PD snapshots allows us to snapshot disks very frequently, helps us with having the latest copy of data, and minimizes our Mean Time To Recovery.

Modernizing our Backup Infrastructure

Taking a Backup

We built our new backup tooling around the GCP API to invoke PD snapshots. This tooling takes into account the availability regions and zones, the role of the MySQL instance (replica or master), and other MySQL consistency variables. We deployed this tooling in our Kubernetes infrastructure as CronJobs, which gives the jobs a distributed nature and avoids tying them to individual MySQL VMs, so we don’t have to handle coordination in case of a host failure. The CronJob is scheduled to run every 15 minutes across all the clusters in all of our available regions, helping us avoid costs related to snapshot transfer across different regions.

Backup workflow selecting replica and calling disk API to snapshot, per cron schedule

The backup tooling creates snapshots of our MySQL instances nearly 100 times a day across all of our shards, totaling thousands of snapshots every day with virtually no failures.

Since we snapshot so frequently, it can easily cost thousands of dollars every day for snapshot storage if the snapshots aren’t deleted correctly. To ensure we only keep (and pay for) what we actually need, we built a framework to establish a retention policy that meets our Disaster Recovery needs. The tooling enforcing our retention policy is deployed and managed using Kubernetes, similar to the snapshot CronJobs. We create thousands of snapshots every day, but we also delete thousands of them, keeping only the latest two snapshots for each shard, plus dailies, weeklies, etc., in each region, per our retention policy.
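To make the idea concrete, here is a rough sketch of how a retention pass could pick snapshots to delete. It is not our production tooling: the Snapshot struct, the seven-day daily window, and the function name are assumptions for illustration, weeklies are left out, and no GCP API calls are shown.

```ruby
require "date"

# Hypothetical in-memory record of a disk snapshot.
Snapshot = Struct.new(:name, :shard, :region, :created_at, keyword_init: true)

# Returns the snapshots that fall outside the (assumed) retention policy:
# keep the latest `keep_latest` per shard and region, plus the newest
# snapshot of each calendar day within the last `keep_daily_for` days.
def snapshots_to_delete(snapshots, now:, keep_latest: 2, keep_daily_for: 7)
  cutoff = now - keep_daily_for * 24 * 60 * 60

  snapshots.group_by { |s| [s.shard, s.region] }.flat_map do |_, group|
    newest_first = group.sort_by(&:created_at).reverse
    latest = newest_first.first(keep_latest)

    dailies = newest_first
      .select { |s| s.created_at >= cutoff }
      .group_by { |s| s.created_at.to_date }
      .map { |_, same_day| same_day.first }

    group - (latest + dailies)
  end
end
```

Keeping the selection logic as a pure function over a list makes it easy to test the policy separately from the code that actually calls the snapshot API.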

Backup retention workflow, listing and deleting snapshots outside of retention policy

Performing a Restore

Having a very recent snapshot always at the ready lets us clone replicas with the most recent data possible. Because restoring a snapshot by exporting it to a new PD volume takes so little time, our RTO is now typically less than 30 minutes, including recovery from replication lag.

Backup restore workflow, selecting a latest snapshot and exporting to disk and attaching to a VM

Additionally, restoring a backup is now quite simple: The process involves creating new PDs with source as the latest snapshot to restore and starting MySQL on top of that disk. Since our snapshots are taken while MySQL is online, after restore it must go through MySQL InnoDB instance recovery, and within a few minutes the instance is ready to serve production queries.

Assuring Data Integrity and Reliability

While PD snapshot-based backups are obviously fast and efficient, we needed to ensure that they are reliable, as well. We run a backup verification process for all of the daily backups that we retain. This means verifying two daily snapshots per shard, per region.

In our backup verification tooling, we export each retained snapshot to a PD volume, attach it to a Kubernetes Job, and verify the following:

  • if a MySQL instance can be started using the backup
  • if replication can be started using MySQL Global Transaction ID (GTID) auto-positioning with that backup
  • if there is any InnoDB page-level corruption within the backup

Backup verification process, selecting daily snapshot, exporting to disk and spinning up a Kubernetes job to run verification steps

This verification process restores and verifies more than a petabyte of data every day utilizing fewer resources than expected.

PD snapshots are fast and efficient, but the snapshots created exist only inside of GCP and can only be exported to new PD volumes. To ensure data availability in case of catastrophe, we needed to store backups at an offsite location. We created tooling which backs up the data contained in snapshots to an offsite location. The tooling exports the selected snapshot to a new PD volume and runs Kubernetes Jobs to compress, encrypt, and archive the data before transferring it as files to an offsite location operated by another provider.

Evaluating the Pros and Cons of Our New Backup and Restore Solution

Pros

  • Using PD snapshots allows for faster backups compared to traditional file-based backup methods.
  • Backups taken using PD snapshots are faster to restore, as they can leverage vast computing resources available to GCP.
  • The incremental nature of snapshots results in reduced backup times, making it possible to take backups more frequently.
  • The performance impact on the donors of snapshots is noticeably lower than the performance impact on the donors of Xtrabackup-based backups.

Cons

  • Using PD snapshots is more expensive for storage compared to traditional file-based backups stored in GCS.
  • The snapshot process itself doesn’t perform any integrity checks (for example, scanning for InnoDB page corruption or ensuring data consistency), which means additional tools may need to be built.
  • Because snapshots are not inherently stored as a conveniently packaged backup, it is more tedious to copy, store, or send them off-site.

We undertook this project at the start of 2019 and, within a few months, we had a very robust backup infrastructure built around Google Cloud’s Persistent Disk snapshot API. This tooling has been serving us well and has introduced us to new possibilities beyond disaster recovery, like quickly scaling replicas up and down for reads using these snapshots.

If database systems are something that interests you, we're looking for Database Engineers to join the Datastores team! Learn all about the role on our career page. 

How Shopify Scales Up Its Development Teams

Have you clicked on this article because you’re interested in how Shopify scales its development teams and the lessons we’re learning along the way? Well, cool, you’ve come to the right place. But first, a question.

Are you sure you need to scale your team?

Really, really sure?

Are You Ready to Scale Your Team?

Hiring people is relatively straightforward, but growing effective teams is difficult. And no matter how well you do it, there will be a short-term price to pay in terms of a dip in productivity before, hopefully, you realize a gain in output. So before embarking on this journey you need to make sure your current team is operating well. Would you say that your team:

  1. Ruthlessly prioritizes its work in line with a product vision so it concentrates on the most impactful features?
  2. Maximizes the time it spends developing product, and so minimizes the time it spends on supporting activities like documentation and debates?
  3. Has the tools and methods to ship code within minutes and uncover bugs quickly?

If you can’t answer these questions positively then you can get a lot more from your current team. Maybe you don’t need to add new folks after all.

But let’s assume you’re in a good place with all of these. As we consider how to scale up a development organization, it’s fundamentally important to remember that hiring new people, no matter how brilliant they are, is a means to an end. We are striving to have more teams, each working effectively towards clear goals. So scaling up is partly about hiring great people, but mostly about building effective teams.

Hiring Great People

At Shopify we build a product that is both broad and deep to meet the needs of entrepreneurs who run many different types of business. We’ve deconstructed this domain into problem spaces and mapped them to product areas. Then we’ve broken these down into product development teams of five to nine folks, each team equipped with the skills it needs to achieve its product goals. This means a team generally consists of a product manager, back-end developers, web developers, data specialists and UX designers.

Tech Meeting with five happy adults

Develop Close Relationships with Your Talent Acquisition Team

Software development at scale is a social activity. That’s a fact that’s often underappreciated by inexperienced developers and leaders. When hiring, evaluating the technical abilities of candidates is a given, but evaluating their fit with your culture and their ability to work harmoniously with their teammates is as important. At Shopify we have a well-defined multi-step hiring process that we continually review based on its results. Technical abilities are evaluated by having the candidate participate in problem-solving and pair-programming exercises with experienced developers, and cultural fit is assessed by having them meet with their prospective teammates and leaders. These steps are time consuming, so we need to filter the candidates to ensure that only the most likely hires reach this stage. To do that, we have built close working relationships between our developers and our Talent Acquisition (TA) specialists.

I can’t overemphasize how important it is to have TA specialists who understand the company culture and the needs of each team. They make sure we meet the best candidates, making effective use of our leads’ and developers’ time. So when scaling up, the first folks to recruit are the recruiters themselves, specialists who know your market. You must spend enough time with them so that they deeply understand what it takes to be a successful developer in your teams. They will be the face of your company to candidates. Even candidates whom you do not ultimately hire (in fact, especially those ones) should feel positive about the hiring experience. If they don’t, you may find the word gets around in your market and your talent pipeline dries up.

Aim for Diversity of Experience on Teams

We aim to have teams that are diverse in many dimensions, including experience. We’ve learned that on average it takes about a year at Shopify before folks have fully onboarded and have the business context, product knowledge, and development know-how to make great decisions quickly. So, our rule of thumb is that in any team the number of onboarded folks should be greater than or equal to the number of those still onboarding. We know that the old software development model where a single subject matter expert communicates product requirements to developers leads to poor designs. Instead, we seek to have every team member empathize with entrepreneurs and for the team to have a deep understanding of the business problem they are solving. Scaling up is about creating more of these balanced and effective teams, each with ownership of a well-defined product area or business problem.

Building Effective Teams

Let’s move on from hiring and consider some other aspects of building effective teams. When talking about software development effectiveness, it’s hard to avoid talking about process. So, process! Right, with that out of the way, let’s talk about setting high standards for the craft of coding, and the tools of the craft.

Start With a Baseline

For teams to be effective quickly, they need to have a solid starting point for how they will work, how they will plan their work and track their progress, and for the tools and technologies they will use. We have established many of these starting points based on our experience, so having every new team start again from first principles would be a tremendous waste of time. That doesn’t prevent folks from innovating in these areas, but the starting baseline is clear.

Be Open About Technical Design and Code Changes

I mentioned the importance of having the right mix of onboarded vs. still-onboarding folks, and that’s partly about ensuring that every team has a deep empathy for our merchants and for what it means to ship code at Shopify scale. Beyond that, we seek to share that context across teams by being extremely open about technical designs and code changes. Our norm is that teams are completely transparent about what they are doing and what they are intending to do, and they can expect to receive feedback from almost anyone in the company who has context in their area. With this approach, there’s a risk that we have longer debates and yeah, that has been known to happen here, but we also have a shared set of values that help to prevent this. Specifically, we encourage behaviors that support “making good decisions quickly” and “building for the long term.” In this way, our standards are set by what we do and not by following a process.

Use Tooling to Codify Best Practices

Tooling is another effective way to codify best practices for teams, so we have folks who are dedicated to building a development pipeline for everyone with dashboards, so we can see how every team is doing. This infrastructure work is of great importance when scaling. Standards for code quality and testing are embedded in the toolset, so teams don’t waste time relearning the lessons of others—rather they can concentrate on the job of building great products. Once you start to scale up beyond a single team, you’ll need to dedicate some folks to build and maintain this infrastructure. (I use the plural deliberately because you’ll never have just one developer assigned to anything critical, right?)

You can read more about our tooling here on this blog. The Merge Queue and our Deprecation Toolkit are great examples of codified best practices, and you can read about how we combine these into a development pipeline in Shopify’s Tech Stack.

As the new team begins its work, we must have feedback loops to reinforce the behaviors that produce the best outcomes. From a software perspective, this is why the tooling is so important: it lets a team ship quickly and respond to the feedback of stakeholders and users.

 Two people pair programming

Use Pairing to Share Experiences

Which brings me to pairing. The absolute best way for onboarded developers to share their experience with new folks is by coding together. Pairing is optional but actively encouraged in Shopify. We have pairing rooms in all our offices, and we hold retrospectives on the results to ensure it adds value. There’s an excellent article on our blog by Mihai Popescu that describes our approach: Pair Programming Explained.

Conduct Frequent Retrospectives

From a team effectiveness perspective, frequent retrospectives allow us to step back from the ongoing tasks to get a wider perspective on how well the team is doing. At Shopify, we encourage teams to have their retrospectives facilitated by someone outside the team to bring fresh eyes and ideas. In this way, a team is driven to continually improve its effectiveness.

At Shopify we understand that hiring is only a small step towards the goal of scaling up. Ultimately, we’re trying to create more teams and have them work more effectively. We’ve found that to scale development teams you need to have a baseline to build from, an openness around technical design, effective tooling, pair programming and frequent retrospectives.

Want to Improve UI Performance? Start by Understanding Your User

My team at Shopify recently did a deep dive into the performance of the Marketing section in the Shopify admin. Our focus was to improve the UI performance. This included a mix of improvements that affected load time, perceived load time, as well as any interactions that happen after the merchant has landed in our section.

It’s important to take the time to ask yourself what the user (in our case, merchant) is likely trying to accomplish when they visit a page. Once you understand this, you can try to unblock them as quickly as possible. We as UI developers can look for opportunities to optimize for common flows and interactions the merchant is likely going to take. This helps us focus on improvements that are user centric instead of just trying to make our graphs and metrics look good.

I’ll dive into a few key areas that we found made the biggest impact on UI performance:

  • How to assess your current situation and spot areas that could be improved
  • Prioritizing the loading of components and data
  • Improving perceived loading performance by looking at how the design of loading states influences the way users experience load time

Our team has always kept performance top of mind. We follow industry best practices like route-based bundle splitting and are careful not to include any large external dependencies. Nevertheless, it was still clear that we had a lot of room for improvement.

The front end of our application is built using React, GraphQL, and Apollo. The advice in this article aims to be framework agnostic, but there are some references to React specific tooling.

Assess Your Current Situation

Develop Merchant Empathy by Testing on Real-World Devices

In order to understand what needed to be improved, we had to first put ourselves in the shoes of the merchant. We wanted to understand exactly what the merchant is experiencing when they use the Marketing section. We should be able to offer merchants a quality experience no matter what device they access the Shopify admin from.

We think testing using real, low-end devices is important. Testing on a low-end device allows us to ensure that our application performs well enough for users who may not have the latest iPhone or MacBook Pro.

Moto G3 Device

We grabbed a Moto G3 and connected the device to Chrome developer tools via the remote devices feature. If you don’t have access to a real device to test with, you can use a remote device testing service to run your application on a real device remotely.

Capture an Initial Profile

Our initial performance profile captured using Chrome Developer tools

After capturing our initial profile using the performance profiler included in the Chrome developer tools, we needed to break it down. This profile gives us a detailed timeline of every network request, JavaScript execution, and event that happens during our recording plus much, much more. We wanted to understand exactly what is happening when a merchant interacts with our section.

We ran the audit with React in development mode so we could take advantage of the user timings it provides. Running the application with React in production mode would have performed better, but having the user timings made it much easier to identify which components we needed to investigate.

React Profiler from React Dev Tools

We also took the time to capture a profile using the profiler provided by React dev tools. This tool allowed us to see React specific details like how long it took to render a component or how many times that component has been updated. The React profiler was particularly useful when we sorted our components from slowest to fastest.

Get Our Priorities in Order

After reviewing both of these profiles, we were able to take a step back and gain some perspective. It became clear that our priorities were out of order.

We found that the components and data that are most crucial to merchants were being delayed by components that could have been loaded at a later time. There was a big opportunity here to rearrange the order of operations in our favor with the ultimate goal of making the page useful as soon as possible.

We know that the majority of visits to the Marketing section are incremental. This means that the merchant navigated to the Marketing section from another page in the admin. Because the admin is a single page app, these incremental navigations are all handled client side (in our case using React Router). This means that traditional performance metrics like time to first byte or first meaningful paint may not be applicable. We instead make use of the Navigation Timing API to track navigations within the admin.

When a merchant visits the Marketing section, the following events happen:

  • JavaScript required to render the page is fetched
  • A GraphQL query is made for the data required for the page
  • The JavaScript is executed and our view is rendered with our data

Any optimizations we do will be to improve one of those events. This could mean fetching less data and JavaScript, or making the execution of the JavaScript faster.

Deprioritize Non-Essential Components and Code Execution

We wanted the browser to do the least amount of work necessary to render our page. In our case, we were asking the browser to do work that did not immediately benefit the merchant. This low-priority work was getting in the way of more important tasks. We took two approaches to reducing the amount of work that needed to be done:

  • Identifying expensive tasks that are being run repeatedly and memoizing (caching) them.
  • Identifying components that are not immediately required and deferring them.

Memoizing Repetitive and Expensive Tasks

One of the first wins here was around date formatting. The React profiler was able to identify one component that was significantly slower than the rest of the components on the page.

React Profiler Identifying <StartEndDates /> Component is Significantly Slower

The <StartEndDates /> component stood out. This component renders a calendar that allows merchants to select a start and end date. After digging into this component, we discovered that we were repeating a lot of the same tasks over and over. We found that we were constructing a new Intl.DateTimeFormat object every time we needed to format a date. By creating a single Intl.DateTimeFormat object and referencing it every time we needed to format a date, we were able to reduce the amount of work the browser needed to do in order to render this component.

<StartEndDates /> after memoization of two other date formatting utilities

This, in combination with the memoization of two other date formatting utilities, resulted in a drastic improvement in this component’s render time, taking it from ~64.7 ms down to ~0.5 ms.

Defer Non-Essential Components

Async loading allows us to load only the minimum amount of JavaScript required to render our view. It is important to keep the JavaScript we do load small and fast as it contributes to how quickly we can render the page on navigation.

One example of a component that we decided to defer was our <ImagePicker />. This component is a modal that is not visible until the merchant clicks a Select image button. Since this component is not needed on the initial load, it is a perfect candidate for deferred loading.

By moving the JavaScript required for this component into a separate bundle that is loaded asynchronously, we were able to reduce the size of the bundle that contained the JavaScript that is critical to rendering our initial view.

Get a Head Start

Prefetching the image picker when the merchant hovers over the activator button makes it feel like the modal instantly loads

Deferring the loading of components is only half the battle. Even though the component is deferred, it may still be needed later on. If we have the component and its data ready when the merchant needs it, we can provide an experience that really feels instant.

Knowing what a merchant is going to need before they explicitly request it is not an easy task. We do this by looking for hints the merchant provides along the way. This could be a hover, scrolling an element into the viewport, or common navigation flows within the Shopify admin.

In the case of our <ImagePicker /> modal, we do not need the modal until the Select image button is clicked. If the merchant hovers over the button, it’s a pretty clear hint that they will likely click. We start prefetching the <ImagePicker /> and its data so by the time the merchant clicks we have everything we need to display the modal.

Improve the Loading Experience

In a perfect world, we would never need to show a loading state. In cases where we are unable to prefetch or the data hasn’t finished downloading, we fall back to the best possible loading state by using a spinner or skeleton content. We typically choose to use a skeleton if we have an idea what the final content would look like.

Use Skeletons

Skeleton content has emerged as a best practice for loading states. When done correctly, skeletons can make the merchant feel like they have ‘arrived’ at the next state before the page has finished loading.

Skeletons are often not as effective as they could be. We found that it’s not enough to put up a skeleton and call it a day. By including static content that does not rely on data from our API, the page will feel a lot more stable as data arrives from the server. The merchant feels like they have ‘arrived’ instead of being stuck in an in-between loading state.

Animation showing how adding headings helps the merchant understand what content they can expect as the page loads.

Small tweaks like adding headings to the skeleton go a long way. These changes give the merchant a chance to scan the page and get a feel for what they can expect once the page finishes loading. They also have the added benefit of reducing the amount of layout shift that happens as data arrives.

Improve Stability

When navigating between pages, there are often going to be several loading stages. This may be caused by data being fetched from multiple sources, or the loading of resources such as images or fonts.

As we move through these loading stages, we want the page to feel as stable as possible. Drastic changes to the page’s layout are disorienting and can even cause the user to make mistakes.

Using a skeleton to help improve stability by matching the height of the skeleton to the height of the final content as closely as possible.

Here’s an example of how we used a skeleton to help improve stability. The key is to match the height of the skeleton to the height of the final content as closely as possible.

Make the Page Useful as Quickly as Possible

Rendering the ‘Create campaign’ button while we are still in the loading state

In this example, you can see that we are rendering the Create campaign button while we are still in the loading state. We know this button is always going to be rendered, so there’s no sense in hiding it while we are waiting for unrelated data to arrive. By showing this button while still in the loading state, we unblock the merchant.

No Such Thing as Too Fast

The deep dive helped our team develop best practices that we are able to apply to our work going forward. It also helped us refine a performance mindset that encourages exploration. As we develop new features, we can apply what we’ve learned while always trying to improve on these techniques. Our focus on performance has spread to other disciplines like design and research. We are able to work together to build up a clearer picture of the merchant’s intent so we can optimize for this flow.


Many of the techniques described by this article are powered by open source JavaScript libraries that we’ve developed here at Shopify.

The full collection of libraries can be found in our Quilt repo. Here you will find a large selection of packages that enable everything from preloading, to managing React forms, to using Web Workers with React.

We’re always looking for awesome people of all backgrounds and experiences to join our team. Visit our Engineering career page to find out what we’re working on.


Building Resilient GraphQL APIs Using Idempotency


A payment service that isn’t resilient could fail to complete a charge or even double-charge buyers. Also, the client calling the API wouldn’t be certain of the outcome when the request returns an error, reducing trust in the payment methods provided by that service. Shopify’s new Payment Service, which centralizes payment processing for certain payment methods, uses API idempotency to prevent these situations from happening in the first place.

Shopify's New Payment Service

The new Payment Service is owned by the Money Infrastructure team, which is responsible for the code that moves money and that handles and records the interactions with various payment providers. The service provides a GraphQL interface that’s used by Shopify and our Billing system. The Billing system charges merchants and pays Shopify Partners, based on monthly subscriptions and usage, as well as paying application developers.

The Issues With Non-resilient Payment Services

A payment API should offer an ‘exactly once’ model of resiliency. Payments should not happen twice, and the API should offer a way for clients to recover in the case of an error. When an API request can’t be re-attempted and an error happens during a payment attempt, the outcome is unknown.

For example, the Payment Service has a ChargeCreate mutation which creates a payment using the buyer’s chosen payment method. If this mutation is called by the client, and that request returns an error or times out, then without idempotency the client can’t discover what state this new payment is in.

If the error occurred before the payment was completed, and the client doesn’t retry the request, then the merchant would go unpaid. If the error occurred after the payment was completed, and the client retries the request, the retry would not be associated with the first attempt and the buyer would be double charged.

Possible Solutions

The Money Infrastructure team chose API-level idempotency to create a resilient system, but there are other approaches to dealing with this:

  • Fix manually: Ship maintenance tasks created one by one, to repair the data. This doesn’t scale.
  • Automatic Reconciliation: Write code to detect cases where the payment state is unknown and repair them. This would require ongoing work since introducing new payment methods and providers would require new reconciliation work. And the results of reconciliation would require API clients to react to these corrections as well to keep their data up to date.

What is API Idempotency?

An idempotent API is one where repeated requests with the same parameters will be executed only once, no matter how many times they’re retried. This strategy gives clients the flexibility to retry API requests which may have failed due to connection issues, without causing duplication or conflicts in the API provider’s system.

Creating an Idempotent API

There are some requirements when creating an idempotent API. Please note that if the remote service providers’ APIs are not idempotent, it will be very hard to implement an idempotent API on top of them.

Name the Request: Use Idempotency Keys

One of the parameters to every mutation is an idempotency-key, which is used to uniquely identify the request. We use a randomly generated universally unique identifier (UUID), but it could be any unique identifier.

Here is an example of a mutation and input which shows the idempotency key is part of the input. The idempotency key is a ‘first class citizen’ of the API; we’re not using an HTTP header read by middleware. This allows us to require the presence of the idempotency key using the same GraphQL parameter validation as the rest of the API, and return any errors in the usual way, rather than returning errors outside the GraphQL mechanism.
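As a rough illustration of that shape, here is a minimal Ruby sketch of a client building such a request. Only the ChargeCreate mutation name and the presence of an idempotency key in the input come from the description above; the field names (`idempotencyKey`, `amount`, `currency`), the selection set, and the `build_charge_request` helper are hypothetical.

```ruby
require "securerandom"

# Hypothetical GraphQL document: the input type, fields, and selection set
# are assumptions made for illustration only.
CHARGE_CREATE = <<~GRAPHQL
  mutation ChargeCreate($input: ChargeCreateInput!) {
    chargeCreate(input: $input) {
      charge { id status }
      userErrors { field message }
    }
  }
GRAPHQL

# Builds the request body. The idempotency key travels inside the mutation
# input, so it is validated like any other GraphQL argument.
def build_charge_request(amount:, currency:, idempotency_key: SecureRandom.uuid)
  {
    query: CHARGE_CREATE,
    variables: {
      input: {
        idempotencyKey: idempotency_key,
        amount: amount,
        currency: currency
      }
    }
  }
end

# A retry must reuse the key from the first attempt, not generate a new one.
first_attempt = build_charge_request(amount: "10.00", currency: "EUR")
key = first_attempt[:variables][:input][:idempotencyKey]
retry_attempt = build_charge_request(amount: "10.00", currency: "EUR", idempotency_key: key)
```

The important design point is that the client, not the server, chooses the key, and keeps it for as long as it may need to retry.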

Lock the API Call: Lock on the ‘Name’ (Client + Idempotency Key) to Prevent Duplicate Simultaneous Requests

One way a request can fail is a dropped network connection. If this happens after the API server has received the request and begun processing, the client can retry the request while the first attempt is still processing. To prevent duplicate simultaneous requests, a lock around the API call based on the client and idempotency key allows the server to reject the second request with an HTTP 409 status code, meaning that the client may try again shortly.
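A minimal sketch of that locking behaviour, assuming a single process: an in-memory `Set` guarded by a `Mutex` stands in for whatever shared lock a real deployment would use, and the `RequestLocks` class and its `with_lock` method are hypothetical names.

```ruby
require "set"

# Tracks which (client, idempotency key) pairs are currently being processed.
class RequestLocks
  def initialize
    @mutex = Mutex.new
    @held = Set.new
  end

  # Runs the block while holding the lock for this client and key.
  # Returns [status, body]; a concurrent duplicate gets 409 without running.
  def with_lock(client_id, idempotency_key)
    name = [client_id, idempotency_key]
    # Set#add? returns nil when the element is already present.
    acquired = @mutex.synchronize { @held.add?(name) }
    return [409, "request in progress, retry shortly"] unless acquired

    begin
      [200, yield]
    ensure
      @mutex.synchronize { @held.delete(name) }
    end
  end
end

locks = RequestLocks.new
duplicate = nil
original = locks.with_lock("billing", "key-1") do
  # A retry that arrives while the first attempt is still running is rejected.
  duplicate = locks.with_lock("billing", "key-1") { "charged twice" }
  "charged"
end
```

With several API servers, the lock would need to live somewhere all of them can see; this sketch only shows the accept-or-reject decision.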

Track Requests: Store the Incoming Requests, Uniquely Identified By Client + Idempotency Key

The Payment Service needs to keep track of these requests and stores that information in the database. The Payment Service uses a model called IncomingRequest to track information related to each request. Each model instance is uniquely identified by the client and idempotency key.

The existence of the saved IncomingRequest instance can be used to determine if any request is a new request or a retry. If the IncomingRequest model instance is loaded instead of created, then we know that the request is a retry. When the request is started it can also determine if the previous request was completed or not. If the request was previously completed, the previous response can be returned immediately.
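The load-or-create decision can be sketched as follows. This is an in-memory stand-in, not the Payment Service’s model: the `IncomingRequestStore` class, its `find_or_create` method, and the `handle_request` helper are assumptions, and a hash replaces the database table.

```ruby
# One record per (client, idempotency key); response stays nil until completion.
IncomingRequest = Struct.new(:client_id, :idempotency_key, :response, keyword_init: true) do
  def completed?
    !response.nil?
  end
end

class IncomingRequestStore
  def initialize
    @records = {}
  end

  # Returns [record, created]. created is false when the record was loaded,
  # which tells the caller that this request is a retry.
  def find_or_create(client_id, idempotency_key)
    id = [client_id, idempotency_key]
    existing = @records[id]
    return [existing, false] if existing

    record = IncomingRequest.new(client_id: client_id, idempotency_key: idempotency_key)
    @records[id] = record
    [record, true]
  end
end

# Runs the block only if no completed response is stored for this request.
def handle_request(store, client_id, idempotency_key)
  record, created = store.find_or_create(client_id, idempotency_key)
  return record.response if !created && record.completed?

  record.response = yield
end

store = IncomingRequestStore.new
charges = 0
first = handle_request(store, "billing", "key-1") { charges += 1; "charge-#{charges}" }
again = handle_request(store, "billing", "key-1") { charges += 1; "charge-#{charges}" }
# The retry returns the stored response and the charge block runs only once.
```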

Track Progress: The IncomingRequest Record Provides a Place to Track Progress for That Request

The IncomingRequest model includes a column where the progress for a request is stored as it is completed. The Payment Service breaks the progress for a given mutation into named steps, also called recovery points. The code in each step must be structured in a specific way; otherwise, any error will leave a given request in an unknown state.

Using Steps Explained

Using steps is a strategy for structuring code in a way that isolates the types of side effects a given function has. This isolation allows the progress to be recorded in a stepwise fashion, so that if an error occurs, the current state is known. There are three different kinds of side effects we need to be concerned with in this design:

  • No side effects: This step makes no HTTP calls or database writes. This is typically a qualifier function, i.e., one that resolves whether this handler can process these records in this way.
  • Local side effects: This step only makes writes to the database, and this step will be wrapped in a database transaction so that any errors will cause a rollback.
  • Remote side effects: Calls to service providers, loggers, analytics.

Each step is implemented as a ‘run’ function in a handler class, possibly paired with a ‘recover’ version of that function. A step may not need a recover step, for example, if the run step confirms that the handler is the appropriate handler. In the case of a recovery, if the handler made it further than that step, the qualification step would have succeeded in the original request and a recovery function does not need to do anything in the retry request.

How steps are used:

  1. For each step completed in the request, record the successful completion. As the request handler successfully executes each step, the IncomingRequest record is updated to the name of that completed step.
  2. If the request is retrying, but was incomplete, then recover previously completed steps, and continue. If the request is retrying a request that was not completed on a previous attempt, the handler will recover the completed steps and then continue to run the rest of the steps. Every step may have both a `run` and a `recover` function.

The flow through the steps of the initial `run`, versus a subsequent `recover`, after the initial run failed on step 3

This diagram shows the flow through the steps of the initial `run`, versus a subsequent `recover`, after the initial run failed on step 3.

Here is the handler class implementation for the Sofort payment method. Each recovery_point is configured with a run function, an optional recover function and transactional boolean. The recovery points are configured in the order that they’re executed.

Ruby makes it easy to write an internal Domain Specific Language (DSL), which results in mutation handler implementations which are straightforward and clear. Separating the steps by side-effect does force a certain coding approach, which gives a uniformity to the code.
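To make the run/recover flow concrete, here is a minimal sketch of what a recovery-point DSL along these lines might look like. It is not the Payment Service’s code: the `StepHandler` base class, the `ExampleChargeHandler` steps, and the `progress` hash (standing in for the IncomingRequest record) are all hypothetical, and the `transactional` flag is recorded but not acted on because there is no database here.

```ruby
class StepHandler
  RecoveryPoint = Struct.new(:name, :run, :recover, :transactional)

  # Each subclass keeps its own ordered list of recovery points.
  def self.recovery_points
    @recovery_points ||= []
  end

  def self.recovery_point(name, run:, recover: nil, transactional: false)
    recovery_points << RecoveryPoint.new(name, run, recover, transactional)
  end

  # progress[:last_completed] holds the name of the last finished step.
  def call(progress)
    points = self.class.recovery_points
    done = points.index { |point| point.name == progress[:last_completed] }
    resume_at = done ? done + 1 : 0

    points.each_with_index do |point, index|
      if index < resume_at
        # Already completed on a previous attempt: recover instead of re-running.
        send(point.recover) if point.recover
      else
        # A real handler would wrap transactional steps in a database transaction.
        send(point.run)
        progress[:last_completed] = point.name
      end
    end
  end
end

class ExampleChargeHandler < StepHandler
  attr_reader :log

  recovery_point :qualified, run: :check_qualifies
  recovery_point :charge_saved, run: :save_charge, recover: :load_charge, transactional: true
  recovery_point :provider_called, run: :call_provider, recover: :fetch_provider_result

  def initialize(fail_on: nil)
    @log = []
    @fail_on = fail_on
  end

  private

  def check_qualifies; step(:check_qualifies); end
  def save_charge; step(:save_charge); end
  def load_charge; step(:load_charge); end
  def call_provider; step(:call_provider); end
  def fetch_provider_result; step(:fetch_provider_result); end

  # Records the step, or simulates an error such as a dropped connection.
  def step(name)
    raise "simulated failure in #{name}" if @fail_on == name
    @log << name
  end
end
```

If the first attempt fails inside `call_provider`, the stored progress is `:charge_saved`; a retry then skips the qualifier, recovers the saved charge with `load_charge`, and runs `call_provider` again.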

Drawbacks of API Idempotency

Storing the progress of a request requires extra database writes, which adds overhead to every API call. The stepwise structure of the request handlers forces a specific coding style, which may feel awkward for developers who are new to it. It requires the developer to approach each handler implementation in a particular way, considering which type of side effects each piece of code has, and structuring it appropriately. Our team quickly learned this new style with a combination of short teaching sessions and example code.

Modifying the implementation of a mutation handler may change, add or remove recovery points. If that happens, the developer must take extra steps to ensure that the implementation can still recover from any already stored recovery points and ensure that any step can be correctly recovered from when the modified handler is deployed. We have a test suite for every handler which exercises every step, as well as the different recovery situations the code must handle. This helps us ensure that any modification is correct, and will correctly recover from the different failures.

Remembering the Side Effects is Fundamental

When considering how to implement an idempotent API in your project, start by partitioning the code in a given API implementation into steps by the kind of side effects it has. This will let you see how the parts interact and provide an opportunity to determine how to recover each part. This is the fundamental part of implementing an idempotent API.

There are always going to be trade-offs when adding idempotency to an API, both in performance, as well as ease of implementation and maintenance. We believe that using the recovery point strategy for our mutation handlers has resulted in code that’s clear, well structured and easy to maintain, which is worth the overhead of this approach.


Living on the Edge of Rails


At Shopify, we make keeping our dependencies up to date a priority. Having outdated dependencies exposes your project to security issues and contributes towards technical debt. Upgrading a large dependency like Rails can be overwhelming. If you are lost and don’t know where to start, you can read our post explaining our journey to upgrade Rails from 4.2 to 5.0 and you can watch the Upgrading Rails at scale recording from RailsConf 2018.

Our Core monolith, used by millions of users, has been running the unreleased version of Rails 6 in production since February 2019. Going forward, our application will continuously run the latest revision of the framework. There are multiple advantages for us and for the community to be living on the Edge of Rails that we’ll cover through this blog post.

The Edge of Rails is the Rails master branch which includes everything up to the very newest commit. Living on the Edge of Rails means that anytime a change is introduced upstream it becomes available in our application. We no longer need to wait multiple months for a new release.

Targeting the HEAD of Rails

Another advantage of targeting the HEAD of Rails is to cut down the time it takes for us to upgrade. Continuously integrating Rails with a small, weekly bump instead of a big one every year, reduces the size of the diff. We also realized that developers are more inclined to contribute to Rails and implement ideas they have if they can use the feature they wrote right away.

Rafael Franca, member of the Rails at Shopify team and a Rails core contributor, as well as release manager, runs our continuous integration (CI) against the framework before any new release or before merging an important change upstream. By being able to run our massive test suite composed of more than 130,000 tests, we're able to discover edge cases, find improvements where needed and propose patches upstream to make Rails better for everyone.

Xavier Noria @fxn tweeted - shout-out to byroot (a Shopify employee), in the last weeks he has focused on adapting Shopify to use Zeitwerk (which they have in production) providing extraordinary feedback about performance relevant to applications at their scale, thank you to his work this gem is better today for everyone
Zeitwerk gives Shopify a shout out for helping improve their gem

We're already seeing the positive impact this has for the Ruby and Rails community. One example is our close contribution to Zeitwerk, the new autoloader that ships with Rails 6.

Updating to the Latest Revision

Solid Track, a bot that upgrades Rails on a weekly basis to the latest revision upstream

Targeting the HEAD of Rails means that we now need to periodically bump it to the latest revision. To avoid manual steps, we created Solid Track, a bot that upgrades Rails on a weekly basis to the latest revision upstream. The bot opens a Pull Request on GitHub and pings us with a diff of the changes introduced in the framework.

Every Monday, we receive this ping and go over the new commits merged upstream and check if something that our CI didn’t catch could break once in production.

If CI is green, it’s usually good to ship. It’s possible that our test suite didn’t catch a possible issue, but we mitigate the risk thanks to the way we deploy our application. Each time we deploy, only a subset of our servers get the new changes. We call those servers “canaries”. If no new exceptions happen on the canaries for ten minutes, our shipping pipeline proceeds and deploys the changes to all remaining servers.

Solid Track bot triggering git bisect

However, if CI is red, our bot automatically takes care of triggering a git bisect to determine which change is breaking the test. This step allows us to save time and instantly identify which commit is problematic. Then we need to determine whether the change is legitimate or whether it introduced a regression upstream.
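The idea behind a bisect is a binary search over the commit range between the last known-good revision and the first known-bad one. The sketch below illustrates that search only; it is not how Solid Track or git itself is implemented, and the `first_bad_commit` helper and the commit labels are hypothetical.

```ruby
# commits is ordered oldest to newest; the first entry is known to pass and
# the last entry is known to fail. The block returns true when the test
# suite passes at the given commit.
def first_bad_commit(commits)
  good = 0
  bad = commits.length - 1

  while bad - good > 1
    mid = (good + bad) / 2
    if yield(commits[mid])
      good = mid # Still passing: the regression is later.
    else
      bad = mid  # Already failing: the regression is here or earlier.
    end
  end

  commits[bad]
end

commits = %w[a b c d e f g h]
broken_from = commits.index("f")
culprit = first_bad_commit(commits) { |commit| commits.index(commit) < broken_from }
```

Because each check halves the remaining range, a week of upstream commits needs only a handful of test runs to isolate the offending change.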

Should My Application Target the HEAD of Rails?

If targeting the HEAD of Rails is something you’d be interested in doing in your application, keep in mind that using an unreleased version of any dependency comes with a stability tradeoff. We evaluated the risk in our application and were confident enough in our tooling, test suite and the way we deploy to make this decision.

Here are the questions and answers we asked ourselves before moving forward:

1. How much will you and your team benefit from targeting HEAD?

We’ll get a lot out of this. Not only will we get all bug fixes and new features quickly, we’ll also save time and won’t have to dedicate a whole team and months of work to upgrading our application.

2. Do you have enough monitoring in place in case something goes wrong?

We have a lot of monitoring: exception reporting on Slack, Datadog metrics configured with thresholds that alert when a metric is too high or too low, and a 24-hour on-call rotation.

3. Do you have a way to deploy your application on a small subset of servers?

We use canary deploys to put changes into production on only a small subset of servers.

4. Finally, how confident are you with your test suite?

Our test suite is large and coverage is good. There is always room for improvement but we're confident with it.

Upgrading your dependencies is part of having a sane codebase. The more outdated they are the more technical debt you accumulate. If you have outdated dependencies, consider taking some time to upgrade them.


Pagination with Relative Cursors


When requesting multiple pages of records from a server, the simple implementation is to have an incremental page number in the URL. Starting at page one, each subsequent request that’s sent has a page number that’s one greater than the previous. The problem is that incremental page numbers scale poorly: the bigger the page number, the slower the query. The simple solution is relative cursor pagination because it remembers where you were and continues from that point onwards instead.

The Problem

A common task for third-party applications on Shopify is syncing the full catalogue of products. Some shops have more than 100,000 products and these can’t all be loaded in a single request as it would time out. Instead, the application would make multiple requests to Shopify for successive pages of products which look like this:


This would generate a SQL query like this:
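As a sketch of how a page number turns into an offset query, assuming a page size of 100 and page 25 (values chosen to match the 2500 and 2400 figures discussed here), and using a hypothetical `products` table and `offset_page_sql` helper:

```ruby
# Translates a page number into LIMIT/OFFSET. Table and column names are
# assumptions; real queries would also be scoped to a shop.
def offset_page_sql(page:, limit:)
  offset = (Integer(page) - 1) * Integer(limit)
  "SELECT * FROM products ORDER BY id ASC LIMIT #{Integer(limit)} OFFSET #{offset}"
end

sql = offset_page_sql(page: 25, limit: 100)
# The database has to walk past the first 2400 rows before returning 100.
```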

This query scales poorly because the bigger the offset, the slower the query. In the above example, the query needs to go through 2500 records and then discard the first 2400. Using a test shop with 14 million products, we ran some experiments loading pages of products at various offsets. Taking the average time over five runs at each offset, here are the results:


[Table: time (ms) at each offset]

Omitted from the table are tests with the 1,000,000th offset and above since they consistently timed out.

Not only do queries take a long time when a large offset is used, but there’s also a limited number of queries that can be run concurrently. If too many requests with large page numbers are made at the same time, they can pile up faster than they can be executed. This leads to unrelated, quick queries timing out while waiting to be run because all of the database connections are in use by these slow, large-offset queries.

It’s particularly problematic on large shops when third-party applications load all records for a particular model, be it products, collects, orders, or anything else. Such usage has ramifications outside of the shop they are being run on. Since multiple shops are run on the same database instances, a moderate volume of large-offset queries causes unrelated queries from shops that happen to share the same database instance to be slower or time out altogether. For the long-term health of our platform we couldn’t allow this situation to continue unchecked.

What is Relative Cursor Pagination?

Relative cursor pagination remembers where you were so that each request after the first continues from where the previous request left off. The downside is that you can no longer jump to a specific page. The easiest way to do this is to remember the id of the last record from the last page you’ve seen and continue from that record, but it requires the results to be sorted by id. With a last id of 67890 this would look like:
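A minimal sketch of such a query, together with an in-memory equivalent that makes the behaviour easy to check. The `products` table, the page size of 100, and both helper names are assumptions.

```ruby
# Builds the last-id query: filter past the cursor, then take one page.
def last_id_page_sql(last_id:, limit:)
  "SELECT * FROM products WHERE id > #{Integer(last_id)} ORDER BY id ASC LIMIT #{Integer(limit)}"
end

# The same logic over an array of hashes, for illustration.
def page_after(records, last_id:, limit:)
  records.select { |record| record[:id] > last_id }
         .sort_by { |record| record[:id] }
         .first(limit)
end

sql = last_id_page_sql(last_id: 67890, limit: 100)

records = (1..10).map { |id| { id: id } }
page = page_after(records, last_id: 3, limit: 3)
next_cursor = page.last[:id] # Becomes last_id for the following request.
```

No rows before the cursor are read and discarded, which is why the cost stays flat however deep the client pages.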

A good index setup can handle this query and will perform much better than using an offset. In this example, it’s the primary index on id. Using the same test shop, here’s how long it takes to get the same pages of records but this time using the last id:


[Table: time using offset (ms), time using last id (ms), and percentage improvement at each offset]

With an offset of 100,000 it’s over 400 times faster to use last id! It’s much faster, and it doesn’t matter how many pages you request, the last page takes around the same amount of time as the first.

Sorting and Skipping Records

Sorting by something other than id is possible by remembering the last value of the field being sorted on. For example, if you’re sorting by title, then the last value is the title of the last record in the page. If the sort value is not unique, using it alone could skip records. For example, assume you have the following products:

Sorting by Title

Requesting a page size of two sorted by title would return the products with ids 3 and 2. To request the next page, just querying by title > “Pants” would skip product 4 and start at product 1.

Sorting by Title - Product Skipped

Whatever the use case of the client that requests these records, it’s likely to have problems if records are sometimes skipped. The solution is to set a secondary sort column on a unique value, like id, and then remember both the last value and the last id. In that case the query for the second page would look like this:
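Here is a sketch of that condition, with an in-memory version to show the difference between the two cursors. The product titles for ids 1 and 3 are illustrative values chosen to be consistent with the ordering described above (only “Pants” is named in the text), and the helper names are hypothetical.

```ruby
# Roughly equivalent SQL for the second page (table name is an assumption):
#   SELECT * FROM products
#   WHERE title > 'Pants' OR (title = 'Pants' AND id > 2)
#   ORDER BY title ASC, id ASC
#   LIMIT 2

PRODUCTS = [
  { id: 1, title: "Shorts" },
  { id: 2, title: "Pants" },
  { id: 3, title: "Hat" },
  { id: 4, title: "Pants" }
].freeze

# Sort by title, breaking ties with the unique id.
def sort_by_title_and_id(products)
  products.sort_by { |product| [product[:title], product[:id]] }
end

# Cursor on title alone: skips records that share the last title.
def page_after_title(products, last_title:, limit:)
  sort_by_title_and_id(products)
    .select { |product| product[:title] > last_title }
    .first(limit)
end

# Cursor on title plus id: continues exactly after the last record seen.
def page_after_title_and_id(products, last_title:, last_id:, limit:)
  sort_by_title_and_id(products).select do |product|
    product[:title] > last_title ||
      (product[:title] == last_title && product[:id] > last_id)
  end.first(limit)
end

first_page = sort_by_title_and_id(PRODUCTS).first(2)
last = first_page.last
skipping = page_after_title(PRODUCTS, last_title: last[:title], limit: 2)
complete = page_after_title_and_id(PRODUCTS, last_title: last[:title], last_id: last[:id], limit: 2)
```

With the title-only cursor the second “Pants” product never appears on any page; adding the id tie-breaker returns it first on the second page.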

Querying in this way results in getting the expected products on the second page.

Sorting by Title - No Skipped Product

To ensure the query is performant as the number of records increases you’d need a database index set up on title and id. If an appropriate index is not set up then it could be even slower than using a page number.

Using the same test shop as before, here’s how long it takes to get the same pages of records but this time using both last value and last id:


[Table: time using offset (ms), time using last id (ms), time using last value (ms), percentage improvement over offset, and percentage improvement over last id at each offset]

Overall, it’s slower than using a last id alone, but still orders of magnitude faster than using an offset when the offset grows large.

Making it Easy for Clients to Use Relative Cursors

The field being sorted on might not be included in the response. For example, in the Shopify API, pages of products sorted by total inventory can be requested. We don’t expose total inventory directly on the product, but it can be derived by adding up the inventory_quantity from the nested variants, which are included in the response. Rather than requiring clients to do this calculation themselves, we make it easy for them by generating URLs that can be used to request the next and previous page, and including them in a Link header in the response. If there’s both a next page and a previous page it looks like this:
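On the client side, following such a header can be sketched as below. The header value is a made-up example in the general Link header format: the host, the `page_info` parameter name, the cursor values, and the `parse_link_header` helper are assumptions, and the cursors should be treated as opaque.

```ruby
# Extracts { rel => url } pairs from a Link header value.
def parse_link_header(value)
  value.scan(/<([^>]+)>\s*;\s*rel="([^"]+)"/).each_with_object({}) do |(url, rel), links|
    links[rel] = url
  end
end

header = '<https://example.com/admin/products.json?page_info=PREV_CURSOR&limit=50>; rel="previous", ' \
         '<https://example.com/admin/products.json?page_info=NEXT_CURSOR&limit=50>; rel="next"'

links = parse_link_header(header)
next_url = links["next"]
# A client requests next_url as-is and stops when no "next" link is returned.
```

Because the server hands back complete URLs, the client never has to know which field the cursor is based on.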

Conversion in Shopify

The problem of large offsets causing queries to be slow was well known within Shopify, as was the solution of using relative cursors. In our internal endpoints, we were making liberal use of them, but rolling relative cursors out to external clients is a much bigger effort. We just added API versioning to our REST API, so it’s reasonable to make such a large change as removing page numbers and switching everything to relative cursors.

As the responsibility for the different endpoints was spread across many different teams there was no clear owner of pagination as a whole. Though the problem wasn’t directly related to my team, Merchandising, our ownership of the products and collects APIs meant we were acutely aware of the problem. They’re two of the largest APIs in terms of both the volume of requests, and the number of records they deal with.

I wanted to fix the problem and no one else was tackling it, so I put together a proposal on how we could fix it across our platform and sent it to my lead and senior engineering leadership. They agreed with my solution and I got the green light to work on it. A couple more engineers joined me and together we defined the patterns all endpoints were to follow, along with the common code they would use, and a guide for how to migrate their endpoints. We made a list of all the endpoints that would need to be converted and pushed it out to the teams who owned them. Soon we had dozens of developers across the company working on it.

As third-party developers must opt in to use relative cursors for now, adoption is currently quite low and we don’t have much in the way of performance measures to share. Early usage of relative pagination on the /admin/products.json endpoint shows it to be about 11 times faster on average than comparable requests using a page number. By July 2020 no endpoints will support page numbers on any API version, and clients will need to use relative pagination. We’ll have to wait until then to see the full results of the change.


Lessons from Leading a Remote Engineering Team


For my entire engineering management career, I’ve managed remote teams. At Shopify, I manage Developer Acceleration, a department with both colocated and remote teams with members spread across four Canadian offices and in six countries.

You may think that managing remote teams is hard, and it is, but there are real benefits that you can achieve by being open to remote employees and building a remote team. Let’s talk about the benefits of a remote team, how to build your remote team, and how to set your people up to succeed.

The Benefits of Remote

It’s not a matter of right and wrong with colocated and remote. Either configuration can work and both provide benefits.

Some advantages of a remote team are: 

  • expanding to a global hiring pool
  • supporting a more diverse workforce
  • improving your ability to retain top employees
  • adding a location-based team capability

Expanding to a Global Hiring Pool

Hiring well is difficult and time consuming. Recruiters and hiring managers talk about filling the top of the funnel, which means finding suitable candidates to apply or to approach about your role. For specialized roles, like a mobile tooling developer, it can be hard to fill the funnel. A challenge with colocated teams is that your hiring pool is limited to those people who live in the same city as your office and those willing to relocate. A larger pool gives you access to more talent. On my team we’ve hired people in Cyprus, Germany, and the UK, none of whom could relocate to one of our offices in Canada.

More Diverse Workforce

A willingness to hire anywhere also gives access to a more diverse talent pool. There are people who are unwilling or unable to relocate. There are also those who need to work from home. I’ve hired people with mobility issues, people with dependents, such as young children or older parents, and people with strong ties to their communities. They are highly skilled and are excellent additions to our team but wouldn’t have been options had we required them to work out of one of our offices.

Ability to Retain Top Employees

A company invests in each employee that is hired and you want to retain good employees. By being on a remote team, I have retained people who decided to relocate for personal reasons, often out of their direct control. In one case, a spouse had a location-dependent job opportunity that the family had decided to follow. In another, the person needed to be closer to their family for health reasons. I’ve successfully relocated people to Canada, France, the Netherlands, Poland, and the USA. Relocating these high-performing employees is much less expensive than hiring and training replacements.

Location-Based Team Capability

A team may also have specific requirements, like 24/7 support, that make it advantageous to distribute people rather than centralize them. My release engineering team supports our build and deploy pipeline for Shopify developers around the world and benefits from having a 24/7 on call schedule without needing people to be on call in the middle of the night.


Building a Remote Team

An engineering manager’s job is to create an effective team. They do this by assembling the right people, defining the problem to solve, and focusing on execution. There’s a key piece in “effective team” that’s often overlooked. A collection of people isn’t a team. A team functions well together and is more than the sum of its parts. This is one of the reasons we don’t hire jerks. Building a team requires the establishment of relationships and trust, which relies on really good communication. Neither relationships nor trust can be mandated. To build the team you need to create an environment and opportunity for people to interact with one another on more than a superficial level.

Shopify has two key cultural values that support remote work:

  1. Default to open internally 
  2. Charge your trust battery

Default to Open Internally

Defaulting to open is about inclusion both in decisions and in results. At Shopify we encourage sharing investment plans, roadmaps, project updates, and tasks. This means writing a lot down, and making information discoverable, which provides a facility to transfer knowledge to remote workers. It also means being deliberate about when to use asynchronous and synchronous communication for discussions and decisions.

Asynchronous Communication

Asynchronous communication is a best practice and should be your default method of interaction as it decouples each person’s availability from their ability to participate in discussions and decisions. People need to be able to disconnect without missing out on key decisions. Asynchronous communication frees people by giving them time to focus on their work and on their personal life. My team has discussions via email or GitHub issues. Longer-form ideas and technical design documents are written and reviewed in Google Docs. Once we start building, day-to-day tasks are kept in GitHub issues and Project Boards. Project updates and related decisions are captured in our internal project management system. I’ve listed a number of tools that we use, but tools won’t solve this problem for you. Your team needs to choose communication conventions and then support those conventions with tools.

Synchronous Communication

When building teams there is also a place for synchronous communication. My teams each run a weekly check-in meeting on Google Hangouts. The structure of these meetings varies but typically includes demos or highlights of what was accomplished in the last week, short planning for the next week, and a team discussion about topics relevant that week. When managing a team across multiple time zones, common advice is to share the pain by moving the meeting time around. In my experience the result is confusion and people regularly missing the meeting. Just pick one time that is acceptable to the people on the team. Make attendance a requirement of being on the team, and set that expectation with new people before they join, even if the meeting time is outside of their regular hours.

My teams are generally built so that everyone will be working at the same time during some portion of the day. These core working hours are an opportunity to have synchronous conversations on Slack or ad hoc video meetings as needed.

Charge Your Trust Battery

Shopify is a relationship-driven company. The Trust Battery is a concept that models the health of your relationships with your co-workers. Positive interactions, like open conversations, listening to others, and following through on commitments, charge the battery. Negative interactions, like being insensitive, demanding, or doing poor work, discharge the battery. This concept brings focus to developing relationships and pushes everyone to revisit their relationships on a regular basis.

Trusted relationships don’t just happen, but they can be given a push. Be open about yourself and encourage people to share details about themselves that you’d typically get with “water cooler” conversation. To facilitate this sharing, I set up Geekbot to prompt everyone on my team in Slack each Monday to answer an optional short list of questions such as:

  • What did you do this weekend?
  • What’s something that you’ve read in the last week?
  • Any upcoming travel/vacation/conferences planned?

Participation is pretty high, and I’ve learned quite a lot about the people I work with through this short list of questions. Personal details humanize the people on the other end of your chat window and give you a better, multidimensional view of the people on your team.

Lastly, get people together in person. Use this sparingly as it can be a big request for people to travel. Pick the times when your team will get together. If you have a head office, that is typically a good anchor point. If not, consider selecting different places to share the travel burden. Support people who need it to make these in-person sessions possible. For example, if a support person is required for a team member to travel, the cost of their trip should include the cost of the support person. Respect people’s time and schedule by being clear about the outcome of the onsite. Relationship building should be a component of the trip but not the only component. On our team we use our two yearly onsites for alignment and to leave people inspired, appreciated, and recognized. We also carve out time that teams can use to plan and code together in person.

Team Hands in Middle

Setting People Up for Success

Remote workers benefit from support from their managers and company. Work with them to set up a healthy work environment, give them regular attention and information, and champion them whenever you can.

Healthy Work Environment

You want your people to be effective and do their best work, so work with them to ensure that they have a healthy working environment. Reinforce the benefits of having a good desk, chair, monitor, mouse and keyboard, and a reliable internet connection. Speak with them about identifying a place that they can designate as their “office” and how to create a separation between work and personal time when their office is in their home. Some people are good at separating these parts of their life. Others need a ritual, like walking around the block, as the separator. Establish their regular working hours so that you are both in agreement about the hours that they are working and when they are not.

Connection Through Communication

People outside of an office need help to maintain their connection to the company and to you. They’ll miss out on any hallway chatter at an office and other in-person conversations like those that happen at lunch. I have a weekly one-on-one with my employees to provide them with a steady stream of information to keep them informed. I try to bring relevant information to all of my one-on-ones by preparing a shared agenda in advance. I also ping people regularly on Slack with more timely information about the people on their teams, updates about our shared work, and to keep in touch throughout the week. If you do have an office, discuss whether spending some time there each year makes sense for them and seems like a worthwhile investment for you both. One person requested to be in the office for three weeks a year. To me, saying yes to this request was an investment in them and their future with the company.

Champion Your People

Remote people can fall into the trap of being out of sight and out of mind. Be a champion for your people. Ensure you use their name and highlight good work to your manager, your peers, and to your team. Give them credit, recommend them for relevant opportunities, and speak up on their behalf. Coach them on how to be more visible. Building relationships and working with others takes time and effort. Ultimately, their visibility depends on them and you and is important for their career progression and long-term retention.

Remote Is Worth the Effort

Building and managing a remote team takes effort: you need to keep your team engaged, provide opportunity, and ensure that each person and the team is set up for success. You need to define your methods of communication, and deliberately stay in touch throughout the week. If you’re willing to put in the work, you can reap the hiring, composition, retention, and strategic benefits that a remote team provides.

We’re always looking for awesome people of all backgrounds and experiences to join our team. Visit our Engineering career page to find out what we’re working on.

Componentizing Shopify’s Tax Engine

By Chris Inch and Vignesh Sivasubramanian

Reading Time: 8 minutes

At Shopify, we value building for the long term. This can come in many forms but within Engineering, we want to build things in a way that is easy to understand, modify, and deploy so we are confident to build without introducing bugs or unnecessary complexity. The tax engine that existed in Shopify’s codebase started out simple, but over years of development and incremental additions, it became a challenging part of code to work within. This article details how our Engineering team tackled the problems associated with complex code, and how we built for the long term by moving our tax engine to a componentized architecture within Shopify’s codebase. Oh… and we did all this without anyone noticing.

Tax Calculations: The Wild West

Tax calculations on orders are complex by nature. Many factors go into calculating the final amount charged in taxes on an order, like product type, customer location, shipping origin, and the physical and economic nexus of a business. This complexity created a complicated system within our product where ownership of tax logic was spread far and wide to components that knew too much about how tax calculations worked. It felt like the Wild West of tax code.

Lucky for us, we have a well-defined componentization architecture at Shopify and we leveraged this architecture to implement a new tax component. Essentially, we needed to retain the complexity, but eliminate the complications. Here’s how we did it.

Educate the Team

The first step to making things less complicated was creating a team that would spend time gaining knowledge of the code base around tax. We needed to fully understand which parts of Shopify were influencing tax calculations and how they were being used. And it’s not just code! Taxes are tricky in general. To be able to create a tax component, one must not only understand the code involved, but also understand the tax domain itself. We used an in-house tax Subject Matter Expert (SME) to ensure we continued to support the many complexities of calculating taxes. We also employed different strategies to bring the team’s tax knowledge up to snuff, which included weekly trivia questions on taxes around the world. This allowed us to learn the domain and have a bit of fun while doing so.

Do you know the difference between zero-rated taxes and no taxes? No? Neither did we, but with persistence and a tenacity for learning, the team leveled up on all the intricacies of taxation faced by Shopify merchants. We realized that if we wanted to make taxes an independent component in our system, we needed to be able to discern what proper tax calculations look like.

Understand Existing Tax Logic

The team figured out where tax logic was used by other systems and how it was consumed. This initial step took the most effort as we used a lot of regular expressions, scripts, and manual processes to find all of the areas that touched taxes. We found that the best way to gain expertise quickly was to work on any known bugs relating to taxes. There was some refactoring that was beneficial to tackle up front, before componentization, but some of the tax logic was so intertwined with other systems that it would be easier to refactor once the larger componentization change was in place.

Tax Engine Structure Before Componentization
Taxes Before Componentization

After a full understanding of the tax logic was achieved, the team devised the best strategy to isolate the tax logic into its own component. A component is an efficient way to organize large sections of code that change together by breaking a large code base into meaningful distinct parts, each with its own defined dependencies. After this, all communication becomes explicit over the component’s architectural boundaries. For example, one of the most complicated aspects of Shopify’s code is order creation. During the creation of an order, the tax engine is invoked by three distinct parts of Shopify: Cart -> Checkout -> Order. This change of context brings more complexity to the system because each area is using taxes in its own selfish way, without consistency. When Checkout changed how it used Taxes, it might have unknowingly broken how Cart was using it.

Creating a Tax Component

Define the Interface

In order to componentize the tax logic, first we had to define a clear interface and entry point into all the tax calls being made in Shopify’s codebase. Everything that requires tax information will pass a set of defined parameters, and expect a specific response when requesting tax rates. The tax request outlines the data it requires in a clear and understandable format. Each of the complex attributes is simply a collection of simple types, so the tax logic need not worry about the implementation of the caller.

The tax response schema is also composed of simple types that don't make any assumptions about the calling component.
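
To make “composed of simple types” concrete, here is a minimal sketch of what such a request and response pair might look like, written in Kotlin. Every field name, the `calculateTaxes` function, and the flat 13% rate are assumptions made for illustration; they are not Shopify’s actual schema or tax logic.

```kotlin
import kotlin.math.roundToLong

// Hypothetical shapes: field names are illustrative, not Shopify's real schema.
// Everything is a string, a number, or a list of such records.
data class Address(val countryCode: String, val provinceCode: String?, val zip: String?)

data class TaxableLine(
    val id: String,
    val productType: String,
    val priceInCents: Long,
    val quantity: Int,
)

data class TaxesRequestSchema(
    val origin: Address,       // where the goods ship from
    val destination: Address,  // where the customer is located
    val lines: List<TaxableLine>,
)

data class TaxLine(val lineId: String, val title: String, val rate: Double, val amountInCents: Long)

data class TaxesResponseSchema(val taxLines: List<TaxLine>)

// Stand-in engine: a single flat rate keeps the example small.
fun calculateTaxes(request: TaxesRequestSchema): TaxesResponseSchema {
    val rate = 0.13 // placeholder rate, not a real tax rule
    val taxLines = request.lines.map { line ->
        val subtotalInCents = line.priceInCents * line.quantity
        TaxLine(line.id, "Sales tax", rate, (subtotalInCents * rate).roundToLong())
    }
    return TaxesResponseSchema(taxLines)
}

fun main() {
    val request = TaxesRequestSchema(
        origin = Address("CA", "ON", "K2P 1L4"),
        destination = Address("CA", "ON", "M5V 2T6"),
        lines = listOf(TaxableLine("line-1", "apparel", priceInCents = 2_000, quantity = 2)),
    )
    println(calculateTaxes(request).taxLines.single().amountInCents) // prints 520
}
```

Because neither type refers to a cart, checkout, or order object, any caller that can fill in these fields can request taxes without the engine knowing who is asking.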

Componentized Tax Engine

The diagram above shows how each component interacts cleanly with the tax engine using well-defined requests and responses, TaxesRequestSchema and TaxesResponseSchema. With the new interface, the flow of execution in the tax engine is much more streamlined and easier to understand.

Executing the Plan

Once we had defined a clean interface to make tax requests, it was time to wrangle all the instances of tax-aware code throughout the entire Shopify codebase. We did this by moving all source files touching tax logic under the tax component. If taxes were the Wild West, then we were the Sheriff coming to town. We couldn’t leave any rogue tax code outside of our tax component. Additionally, we wanted to make our changes future-proof so that other developers at Shopify couldn’t accidentally add new code that reaches past our component boundaries. To that end, we added GitHub bot triggers to notify our team of any commits pushed against source files under the tax component. This allowed us to be sure that no additional dependencies were added to the system while it was undergoing change.

Updating our Tax Testing Suites

Every line of code that we moved within the component was tested and cleaned. Existing unit tests were re-checked, and new integration tests were written. We added end-to-end scenarios for each consumer of the tax component until we were satisfied that they tested the usage of tax logic sufficiently. This was the best way to capture failures that may have been introduced to the system as a whole. The unit tests provided confidence that the individual units of our code produced the same functionality and our integration tests provided confidence that our new component did not alter the macro functionality of the system.

Slowly but surely, we completed work on the tax component. Finally, it was ready, and there was just one thing left to do: start using it.


Our code cleanup work was complete, and the only task left was releasing it. We had high confidence in the changes we introduced through componentization of this logic. Even still, we needed to ensure we did not change the behavior of the existing system for the hundreds of thousands of merchants who rely on tax calculations within Shopify while we released it. Up to this point, the code paths into the component were not yet being used in production. For our team, it was paramount that the overall calculation of taxes remained unaffected, so we took a systematic, methodical, and measurable approach to releasing.

The Experimental Code Path

The first step to our release was to ensure that our shiny new component was calculating taxes the same way that our existing tax engine was already calculating these same taxes.

We accomplished this by running an “experiment” code path on the new component. When taxes were requested within our code, we allowed our old gnarly code to run, but we simultaneously kicked off the same calculations through the new tax component. Once we had compared the results of the old and new code paths, the results from the new component were discarded. Literally, we calculated taxes twice and measured any discrepancies between the two calculations. These result comparisons helped expose some of the more nuanced and intricate portions of code that we needed to modify or test further. Through iterations and minor revisions, we solidified the component and ensured that we didn’t introduce any new problems in the process. This also gave us the opportunity to add additional tests and increase our confidence.
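
The pattern can be sketched in a few lines of Kotlin. This is an illustration under stated assumptions, not the code we ran: the `Experiment` class, its `execute` method, and both tax functions are hypothetical, and the sketch runs the two paths one after the other rather than concurrently to stay small.

```kotlin
import kotlin.math.roundToLong

// Hypothetical helper: runs both code paths, reports any discrepancy, and
// always returns the result of the existing (control) path.
class Experiment<T>(private val name: String, private val report: (String) -> Unit) {
    fun execute(control: () -> T, candidate: () -> T): T {
        val expected = control()
        // A failure in the new path must never affect the caller.
        runCatching { candidate() }
            .onSuccess { actual ->
                if (actual != expected) report("$name: control=$expected candidate=$actual")
            }
            .onFailure { error -> report("$name: candidate failed with $error") }
        return expected // the candidate result is discarded
    }
}

// Two stand-in implementations that disagree on rounding.
fun legacyTaxInCents(subtotalInCents: Long): Long = subtotalInCents * 13 / 100
fun componentTaxInCents(subtotalInCents: Long): Long = (subtotalInCents * 0.13).roundToLong()

fun main() {
    val mismatches = mutableListOf<String>()
    val experiment = Experiment<Long>("tax-component") { mismatches += it }

    val charged = experiment.execute(
        control = { legacyTaxInCents(1_005) },
        candidate = { componentTaxInCents(1_005) },
    )

    println(charged)          // the legacy result is what the caller sees
    println(mismatches.size)  // one discrepancy was recorded for investigation
}
```

The important property is that the caller only ever sees the control result, so a bug or exception in the new path shows up as a report instead of a wrong tax amount.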

Once there were no discrepancies between old and new, it was time to release the component and start using the new architecture. In order to perform this Indiana Jones-style swap, we rolled out the component to a small number of Shopify shops first, then tested, observed, and monitored. Once we were sure that things were behaving properly, we slowly scaled up the number of shops whose checkouts used the new tax component. Eventually, over the course of a few days, 100% of shops on Shopify were using the new tax component. The tax component is now the only path through the code that is being used to calculate taxes.
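
One common way to implement this kind of gradual rollout is a deterministic percentage gate. The sketch below is an assumption about how such a gate could work, not a description of Shopify’s rollout tooling; the function name and the bucketing by shop ID are hypothetical.

```kotlin
// Hypothetical gate: each shop lands in one of 100 stable buckets, so a shop
// that is enabled at 5% is still enabled when the rollout grows to 20%.
fun usesNewTaxComponent(shopId: Long, rolloutPercent: Int): Boolean {
    require(rolloutPercent in 0..100) { "rolloutPercent must be between 0 and 100" }
    // floorMod keeps the bucket in 0..99 even for negative IDs.
    val bucket = Math.floorMod(shopId, 100L)
    return bucket < rolloutPercent
}

fun main() {
    val shopIds = 1L..1_000L
    for (percent in listOf(0, 5, 20, 100)) {
        val enabled = shopIds.count { usesNewTaxComponent(it, percent) }
        println("$percent% rollout: $enabled of 1000 shops on the new component")
    }
}
```

A production gate would more likely hash the shop ID so that buckets do not follow ID order, but the stable-bucket idea is the same.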

Benefits and Impact

Through the efforts of our tax Engineering team, we have added sustainability and extensibility to our tax engine. We did this with no downtime and no merchant impact.

Many junior developers are concerned only with building the required, correct behavior to complete their task. A software engineer needs to ensure that solutions not only deliver the correct behavior, but do it in a way that is easy to understand, modify, and deploy for years to come. Through these componentization efforts, the team organized the code base in a way that is easy for all future developers to work within.

We constantly receive praise from other developers at Shopify, thanking us for the clean entry point into the Tax Component. Componentization like this reduces cognitive load and the amount of knowledge about the internals of tax calculations that developers need to carry.

Interested in learning more about Componentization? Check out It helped us define better interfaces, flow of data and software boundaries.

We’re always looking for awesome people of all backgrounds and experiences to join our team. Visit our Engineering career page to find out what we’re working on.

Implementing Android POS Receipt Printing on Shopify

Receipts are an essential requirement of every brick-and-mortar business. They’re the proof of purchase that allows buyers to get refunds or make returns/exchanges. Last year alone, we estimate that millions of receipts were printed by merchants running Shopify Point Of Sale (POS) for iOS. This feature was only available on iOS because Shopify POS was released first for that platform and is a few development cycles ahead of its Android counterpart. Merchants that strictly needed receipt printing support had no choice but to switch to the iPad, but as of March 2019, merchants using an Android device now have the option to provide printed receipts.

The receipt generation process is unique because it’s affected by most features in Shopify POS (like discounts, tips, transactions, gift cards, and refunds) and leads to over 8 billion unique receipt content combinations! These combinations also keep growing as we expand to more countries and support newer payment methods. This article presents our approach to implementing receipt printing support, starting from our goals to an overview of all the challenges involved.

Receipt Printing Support Goals

These were the main goals the Payments & Hardware team had in mind for receipt printing support:

  1. Create a Pragmatic API: printing a receipt for an order should be as simple as a method call.
  2. Be adaptive: supporting printers from different vendors, models and paper sizes should be easily achievable.
  3. Be composable: a receipt is made out of sections, like header, footer, line items, transactions, etc. Adding new sections, in the future, should be a straightforward task.
  4. Be easy to maintain: adding or changing the content of a paper receipt should be as easy as UI development that every Android developer is familiar with.
  5. Be highly testable: almost every feature in the POS app affects the content of a receipt and the combination of content is endless. We should be very confident that the content generation logic is robust enough to cover a multitude of edge cases.

The Printing Pipeline

In order to achieve our goals, first we defined the Printing Pipeline by dividing the receipt printing process into multiple self-contained steps that are executed one after another:

The Printing Pipeline

During the Data Aggregation step, all the raw data models required to generate a receipt are gathered together. This includes information about the shop, the location where the sale is being made from, a list of payment transactions, gift cards used for payments (if applicable), etc.

In the Content Generation step, we extract all the meaningful data from the raw models to compose the receipt in a buyer-friendly way. Things that matter to the buyer, like the receipt language, date formats and currency formats are taken into account.

Now that we extracted all the meaningful data from the models, we move to the Sections Definition step. At this point, it’s time to split the receipt into smaller logical pieces that we call “receipt sections”.

Receipt Sections

After the sections are defined, the receipt is ready to be printed, so we move to the Print Request Creation step. This involves creating a print request out of the buyer-friendly receipt data and sections definition. A print request also includes other printer commands like paper cuts. Depending on the receipt being printed, there might be some paper cuts in it. For example, a gift card purchase requires paper cuts so the buyer can easily detach the printed gift card from the rest of the receipt.

Now that a print request is ready to be submitted to the printer, the Content Rendering step kicks in. It’s time to render images for each section of the receipt according to the paper size and printer resolution.

The Printing Pipeline is finalized by the Receipt Printing step. At this point, the receipt images are delivered to the printer vendor SDK and the merchant finally gets a paper receipt out of their printer.

Printing Pipeline Implementation 

Data Aggregation

The very first step is to collect all the raw models required to generate a receipt. We define an interface that asynchronously fetches all these models from either the local SQLite database or our GraphQL API.

Content Generation

After all the data models are collected by the Data Aggregation step, they go through the PrintableReceiptComposer class to be processed and transformed into a PrintableReceipt object, which is a dumb data class with pre-formatted receipt content that will be consumed down the pipeline.

In this context, the coroutine-based API for the Data Aggregation step presented earlier not only improves performance by running all requests in parallel, but also improves code readability.
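
As a rough sketch of how these two steps can fit together, the block below defines a data source with suspending fetch methods and a composer that starts all fetches before awaiting any of them. It assumes the kotlinx.coroutines library is on the classpath. The `ReceiptDataSource` interface, the raw model classes, and the fields of `PrintableReceipt` are hypothetical; only the names `PrintableReceiptComposer` and `PrintableReceipt` come from the text above.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical raw models and data source; the real ones may differ.
data class Shop(val name: String)
data class Location(val address: String)
data class Transaction(val gateway: String, val amountInCents: Long)

interface ReceiptDataSource {
    suspend fun shop(): Shop
    suspend fun location(orderId: Long): Location
    suspend fun transactions(orderId: Long): List<Transaction>
}

// Pre-formatted content only: no raw models travel further down the pipeline.
data class PrintableReceipt(val header: String, val address: String, val total: String)

class PrintableReceiptComposer(private val source: ReceiptDataSource) {
    suspend fun compose(orderId: Long): PrintableReceipt = coroutineScope {
        // Start all three fetches before awaiting any of them.
        val shop = async { source.shop() }
        val location = async { source.location(orderId) }
        val transactions = async { source.transactions(orderId) }

        val totalInCents = transactions.await().sumOf { it.amountInCents }
        PrintableReceipt(
            header = shop.await().name,
            address = location.await().address,
            total = "\$%d.%02d".format(totalInCents / 100, totalInCents % 100),
        )
    }
}

fun main(): Unit = runBlocking {
    // Fake source: each call pretends to hit the database or the network.
    val fakeSource = object : ReceiptDataSource {
        override suspend fun shop(): Shop { delay(100); return Shop("Snowdevil") }
        override suspend fun location(orderId: Long): Location { delay(100); return Location("150 Elgin St") }
        override suspend fun transactions(orderId: Long): List<Transaction> {
            delay(100)
            return listOf(Transaction("cash", 1_000), Transaction("gift_card", 1_500))
        }
    }
    println(PrintableReceiptComposer(fakeSource).compose(orderId = 1L))
}
```

Because the three fetches overlap, the fake source takes roughly as long as its slowest call instead of the sum of all three, and the composer still reads top to bottom like sequential code.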

The PrintableReceiptComposer class is where most of the business logic lives. The content of a receipt can drastically change depending on a lot of factors, like item purchased, payment type, credit card payment, payment gateway, custom tax rules, specific card brand certification requirements, exchanges, refunds, discounts, and tips. To make sure we comply with all requirements and display every feature properly on receipts, we took a heavily test-driven approach. By using test-driven development, we could write the requirements first in the form of unit tests and achieve confidence that data transformation covers not only all the features involved but also several edge cases.

Sections Definition

Now that we have all data put together in its own receipt model exactly like it will be on paper, it’s time to define what sections the receipt is made of:

Sections are just regular Android views that will be rendered in the Content Rendering step. In the Sections Definition step, we specify a list of ViewBinder-like classes, one per section, that is used during the receipt rendering step. Section binders are implementations of a functional interface with a fun bind(view: View, receipt: PrintableReceipt) method definition. All these binders do is bind the PrintableReceipt data model to a given view with little to no business logic in an almost one-to-one, view-to-content mapping. Here is an example of implementation for the total box section:
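
A minimal sketch of what such binders and the section list could look like follows. It is an illustration, not the app’s actual code: to keep it runnable off-device, `View` is a tiny stand-in for `android.view.View`, and the `PrintableReceipt` fields, the view IDs, and the binder names are all hypothetical.

```kotlin
// Stand-in for android.view.View so the sketch runs without Android: it only
// records the text assigned to each child view ID.
class View {
    val texts = mutableMapOf<String, String>()
    fun setText(id: String, value: String) {
        texts[id] = value
    }
}

// Hypothetical pre-formatted receipt content.
data class PrintableReceipt(
    val shopName: String,
    val subtotal: String,
    val taxes: String,
    val total: String,
)

// The functional interface described above.
fun interface SectionBinder {
    fun bind(view: View, receipt: PrintableReceipt)
}

val headerBinder = SectionBinder { view, receipt ->
    view.setText("shop_name", receipt.shopName)
}

// Total box: a one-to-one copy of content into views, with no business logic.
val totalBoxBinder = SectionBinder { view, receipt ->
    view.setText("subtotal", receipt.subtotal)
    view.setText("taxes", receipt.taxes)
    view.setText("total", receipt.total)
}

// The order of this list is the order of the sections on paper.
val receiptSections: List<SectionBinder> = listOf(headerBinder, totalBoxBinder)

fun main() {
    val receipt = PrintableReceipt("Snowdevil", "$20.00", "$2.60", "$22.60")
    for (binder in receiptSections) {
        val sectionView = View()
        binder.bind(sectionView, receipt)
        println(sectionView.texts)
    }
}
```

Since all formatting already happened when the PrintableReceipt was composed, a binder has nothing to decide, which is what keeps the sections easy to add and to test with screenshots.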

Print Request Creation

A PrintRequest is a printer-agnostic class composed of a sequence of receipt printer primitives (like lazily-rendered images and cut paper commands) to be executed by low-level printer integration code. It also contains the size of the paper to print on, which can be 2” or 3” wide. During this step, a PrintRequest will be created containing a list of section images and sent to our POS Hardware SDK, which integrates with every printer supported by the app.
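
One way to model such a request is a sealed hierarchy of primitives, sketched below. The type and member names other than `PrintRequest` are hypothetical, `SectionImage` stands in for an Android `Bitmap`, and the pixel dimensions are made-up values.

```kotlin
enum class PaperSize(val widthInInches: Int) { TWO_INCH(2), THREE_INCH(3) }

// Stand-in for a rendered section bitmap.
class SectionImage(val widthPx: Int, val heightPx: Int)

sealed interface PrinterPrimitive {
    // The image is rendered only when the printer integration first reads it.
    class Image(render: () -> SectionImage) : PrinterPrimitive {
        val bitmap: SectionImage by lazy(render)
    }

    object CutPaper : PrinterPrimitive
}

data class PrintRequest(val paperSize: PaperSize, val primitives: List<PrinterPrimitive>)

fun main() {
    var renders = 0
    val renderSection = { renders++; SectionImage(widthPx = 384, heightPx = 120) }

    // A receipt followed by a detachable gift card, separated by a paper cut.
    val request = PrintRequest(
        PaperSize.TWO_INCH,
        listOf(
            PrinterPrimitive.Image(renderSection),
            PrinterPrimitive.CutPaper,
            PrinterPrimitive.Image(renderSection),
        ),
    )
    println(renders) // 0: building the request rendered nothing

    for (primitive in request.primitives) {
        when (primitive) {
            is PrinterPrimitive.Image ->
                println("image ${primitive.bitmap.widthPx}x${primitive.bitmap.heightPx}")
            PrinterPrimitive.CutPaper -> println("cut")
        }
    }
    println(renders) // 2: each image was rendered once, on first access
}
```

Keeping the images lazy means a request is cheap to create and pass around, and the rendering cost is only paid once the request actually reaches a printer.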

Content Rendering

During this step, we will render each section image defined in the PrintRequest. First, the rendering process will inflate a view for the corresponding section and use the section binder to bind the PrintableReceipt object to the inflated view. Then, this bound section view will be drawn to an in-memory Bitmap at a desired scale according to the printer resolution for that paper size.

Receipt Printing

The last step happens in the Hardware SDK where the section Bitmap objects generated in the previous step will be passed down to the printer-specific SDK. At this point, a receipt will come out of the printer.

Hardware SDK Pipeline
The Hardware SDK Pipeline

The POS app will convert an Order object into a PrintRequest by executing all the aforementioned pipeline steps and then it will be sent to the ReceiptPrinterProcessManager in the POS Hardware SDK. At this point, the PrintRequest will be forwarded to a vendor-specific ReceiptPrinter implementation. Since a printer can have multiple connectivity interfaces (like Wi-Fi, Bluetooth or USB), the currently active DeviceConnection will then pass the PrintRequest down to the Printer Vendor SDK at the very last step.

The Hardware SDK is a collection of vendor-agnostic interfaces and their respective implementations that integrate with each vendor SDK. This abstraction enables us to easily add or remove support for different printers and other peripherals of different vendors in isolation, without having to change the receipt generation code.
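
The layering can be sketched with a few interfaces. The type names `ReceiptPrinterProcessManager`, `ReceiptPrinter`, `DeviceConnection`, and `PrintRequest` come from the description above, but every method and property signature below is an assumption made for illustration.

```kotlin
// Simplified stand-in for the request built by the POS app.
class PrintRequest(val sectionCount: Int)

// One connectivity interface (Wi-Fi, Bluetooth, USB) of a physical printer.
interface DeviceConnection {
    val name: String
    val isActive: Boolean
    fun send(request: PrintRequest)
}

// Vendor-agnostic contract; each vendor integration provides an implementation.
interface ReceiptPrinter {
    val connections: List<DeviceConnection>

    fun print(request: PrintRequest) {
        val connection = connections.firstOrNull { it.isActive }
            ?: error("No active connection to the printer")
        connection.send(request)
    }
}

class ReceiptPrinterProcessManager(private val printer: ReceiptPrinter) {
    fun submit(request: PrintRequest) = printer.print(request)
}

// Fake connection that records what would be handed to a vendor SDK.
class FakeConnection(override val name: String, override val isActive: Boolean) : DeviceConnection {
    val sent = mutableListOf<PrintRequest>()
    override fun send(request: PrintRequest) {
        sent += request
    }
}

fun main() {
    val usb = FakeConnection("USB", isActive = false)
    val bluetooth = FakeConnection("Bluetooth", isActive = true)
    val printer = object : ReceiptPrinter {
        override val connections = listOf(usb, bluetooth)
    }

    ReceiptPrinterProcessManager(printer).submit(PrintRequest(sectionCount = 4))
    println("${usb.name}: ${usb.sent.size}, ${bluetooth.name}: ${bluetooth.sent.size}")
}
```

Nothing above the `DeviceConnection` implementation knows which vendor SDK is in use, which is what lets a printer integration be added or removed in isolation.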


Since receipt printing is affected by over 30 features, we wanted to make sure we had multi-step test coverage to enforce correctness, especially when more advanced features, such as tax overrides, come into play. In order to achieve that, we heavily relied on unit tests and test-driven development for the Data Aggregation and Content Generation steps. The latter, which is the most critical one of all, has over 80 test cases stressing a multitude of extraordinary receipt data arrangements, like combinations of different payment types on custom gateways, or transactions in different countries with different currencies and credit card certification rules. Whenever a bug was found, a new test case was introduced along with the fix.

The correctness of the Sections Definition and Content Rendering steps is enforced by screenshot tests. Our continuous integration (CI) infrastructure generates screenshots out of receipt bitmaps and compares them pixel by pixel with pre-recorded baseline ones to ensure receipts look as expected. The Sections Definition benefits from these tests by making sure that each section is properly rendered in isolation and that all of them are composed together in the right order. The Content Rendering step, on the other hand, benefits from having canvas transformations asserted, so that the receipt generation engine can easily adjust to any printer/paper resolution.
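
The core of a pixel-by-pixel comparison is small. The sketch below assumes each screenshot has already been decoded into an array of ARGB values in row-major order; the `Screenshot` class and `countDifferingPixels` function are hypothetical and stand in for whatever the screenshot testing tool does internally.

```kotlin
// Pixels are ARGB ints in row-major order.
class Screenshot(val width: Int, val height: Int, val pixels: IntArray) {
    init {
        require(pixels.size == width * height) { "pixel count must equal width * height" }
    }
}

// Returns how many pixels differ; 0 means the render matches the baseline.
fun countDifferingPixels(baseline: Screenshot, actual: Screenshot): Int {
    require(baseline.width == actual.width && baseline.height == actual.height) {
        "screenshots must have the same dimensions"
    }
    return baseline.pixels.indices.count { baseline.pixels[it] != actual.pixels[it] }
}

fun main() {
    val white = 0xFFFFFFFF.toInt()
    val black = 0xFF000000.toInt()

    val baseline = Screenshot(2, 2, intArrayOf(white, white, black, black))
    val unchanged = Screenshot(2, 2, intArrayOf(white, white, black, black))
    val regressed = Screenshot(2, 2, intArrayOf(white, black, black, black))

    println(countDifferingPixels(baseline, unchanged)) // prints 0
    println(countDifferingPixels(baseline, regressed)) // prints 1
}
```

A test would fail whenever the count is above zero, and an intentional change to a section is accepted by recording a new baseline.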

Screenshot Test Sample
Baseline screenshot diff on GitHub after changes made to the line items receipt section

Having a componentized and reusable printing stack gives us the agility we need to focus on extending support for new printer models in the future, no matter what printing resolutions or paper sizes they operate with, and that work can be done in just a couple of hours. Taking a test-driven approach not only ensures that multiple edge cases are properly handled, but also enforces a design-by-contract methodology in which the boundaries between steps in the pipeline are well-defined and easy to maintain.

If you like working on problems like these and want to be part of a retail transformation, Shopify is hiring and we’d love to hear from you. Please take a look at the open positions on the Shopify Engineering career page.

Mobile Release Engineering at Scale with Shipit Mobile

One of the most important phases of software development is releasing the software to its end users. At Shopify, we've invested heavily in tooling for continuous deployment for our web apps. When a developer working on a web project wants to deploy their changes to production, the process is as simple as:

Merge → Build container → Run CI → Ship to production

In contrast, uploading a new version of one of our mobile apps to Google Play or the App Store involved several manual steps and a lot of human interaction that caused various problems. We wanted to provide the same level of convenience when releasing mobile apps as we do for web apps, and also take the opportunity to define a framework for all the mobile teams to adopt. For this reason, we developed a new tool: Shipit Mobile, a platform to create, view, and manage app releases.

The Issues With Mobile Releases

Automatic deploys and continuous delivery aren’t possible in mobile for several reasons, including approval wait time; coordination between developers, designers, and product managers; and the fact that our users need to update the app. If they don’t have automatic updates enabled, finding several updates of the same app multiple times a week, or even a day, is annoying.

Moreover, a new release isn’t just deploying the latest version of the code from our repository to our infrastructure. Third-party services (the app stores) are involved, and software approval and distribution is owned by them. This means that we can’t update our apps several times a day even if we wanted to.

Uploading a new version of our mobile apps to Google Play or the App Store was fraught with problems. Releasing new apps was error prone due to the high number of manual steps involved, like selecting the commit to release from or executing the script to upload the binary to the store. Different teams had different processes to release mobile apps. The release process wasn’t transferable—knowledge couldn’t be shared within the same organization and the process was inconvenient and complex. Each team had variants of their release scripts, and those scripts were complex and untested. Release version and build numbers had to be managed manually. Finally, there was a lot of responsibility and burden on the release manager. They had to make decisions, fix bugs found along the way, communicate with stakeholders, and coordinate other side tasks.

Our Solution: Shipit Mobile

Mobile Release Flow
Mobile Release flow

Releasing a new mobile app requires performing multiple steps:

  1. Pick a commit to release from
  2. Increment build and version numbers
  3. Run CI
  4. Manually test the app (QA)
  5. Iterate on testing until all the bugs and regressions are fixed
  6. Update screenshots and release notes
  7. Upload it to the store
  8. Wait for app submission to be approved

Our existing tools for releasing web apps weren't suitable for the mobile release process, so we decided to build something new, Shipit Mobile.

Shipit Mobile Home

Creating a New Release in Shipit Mobile

Create New Release in Shipit Mobile
Creating a new release in Shipit Mobile

A release starts with a new branch in the repository. We follow a trunk-based development approach in which the release branch is branched off of master. We opted for trunk-based development instead of doing the release directly from the master branch because it reduces the risk of including code that doesn’t belong to the current release by mistake if both new features and bug fixes are pushed to the branch where the release is taking place. Release branches are never merged back to the master branch, and bug fixes are pushed to master and cherry-picked to the release branch.

Branching Model
The branching model

This branching model allows us to automatically manage the release branches and avoid merge conflicts when a release is completed.

Testing and Building the Release Candidate

Release Page in Shipit Mobile
Release page in Shipit Mobile

When a new release is created, a candidate is built. A candidate in Shipit Mobile corresponds with a releasable unit of work. Every new commit in the release branch creates a new candidate in Shipit Mobile. For every candidate, the build number is incremented. A new candidate triggers two continuous integration (CI) pipelines. One is a test pipeline that ensures that the app works as expected, and the other pipeline builds the app for release. We decided to decouple these two pipelines because, if there is an emergency, we want to allow developers to release the app even if the test pipeline has not finished or has failed.
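
The rules in this paragraph can be captured in a small model. This is a sketch of the described behavior only: the `Release`, `Candidate`, and `PipelineStatus` types and their members are hypothetical and are not taken from Shipit Mobile’s code.

```kotlin
enum class PipelineStatus { PENDING, PASSED, FAILED }

data class Candidate(
    val commit: String,
    val buildNumber: Int,
    val tests: PipelineStatus = PipelineStatus.PENDING,
    val build: PipelineStatus = PipelineStatus.PENDING,
) {
    // The pipelines are decoupled: in an emergency only the build must pass.
    fun canRelease(emergency: Boolean = false): Boolean =
        build == PipelineStatus.PASSED && (emergency || tests == PipelineStatus.PASSED)
}

class Release(val version: String, private var lastBuildNumber: Int) {
    private val _candidates = mutableListOf<Candidate>()
    val candidates: List<Candidate> get() = _candidates

    // Called for every new commit on the release branch.
    fun addCommit(commit: String): Candidate {
        lastBuildNumber += 1
        return Candidate(commit, lastBuildNumber).also { _candidates += it }
    }
}

fun main() {
    val release = Release(version = "5.12.0", lastBuildNumber = 841)
    release.addCommit("a1b2c3d")
    val hotfix = release.addCommit("e4f5a6b") // cherry-picked bug fix
    println(release.candidates.map { it.buildNumber }) // [842, 843]

    // The build finished but the test pipeline failed.
    val built = hotfix.copy(build = PipelineStatus.PASSED, tests = PipelineStatus.FAILED)
    println(built.canRelease())                 // false
    println(built.canRelease(emergency = true)) // true
}
```

Modeling the two pipeline results as separate fields is what makes the emergency path possible; a single combined status could not express a finished build with failing tests.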

Distributing the App to Different Channels

Distributing the App to Different Channels

Once the app has been built and tested, and CI is passing, it’s time to upload it to the store. In Shipit Mobile we have the concept of distribution channels. A distribution channel is a platform from which the app can be downloaded. The same release binary can be uploaded to different channels. For example, we can publish the app to Google Play and send the same version internally to our support team, so they can quickly access the right version when a support request comes in and we know they have the same app our user downloaded from the store.

The two main channels we support are Google Play and the App Store. For Google Play, we use Google Play’s official API. For the App Store, we use Fastlane, which makes it easy to upload iOS apps to App Store Connect, alongside Apple’s App Store Connect API to communicate with the App Store, upload the apps and metadata, and check if the apps have been approved and are ready to release.

Configuring Shipit Mobile

Every project that releases using Shipit Mobile needs a configuration file. This file tells Shipit Mobile some basic information about the project, such as the platform, the channels we want to upload the app to, and optionally the Slack channels that need to be notified of the state of the release. We kept the configuration simple, favouring convention over configuration. This can be seen in the way we manage the metadata (release notes, screenshots, app description…). If the configuration file tells Shipit Mobile to upload the metadata to the app store, it knows where to find it and how to upload it as long as it is located in the expected folder.
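
To make the idea of convention over configuration concrete, here is a minimal TypeScript sketch. The real configuration format isn't shown in this post, so every field name (`platform`, `channels`, `slackChannels`, `uploadMetadata`), the `resolveConfig` helper, and the `metadata` folder name are assumptions chosen for illustration.

```typescript
// Hypothetical configuration shape; not Shipit Mobile's real schema.
type Platform = "android" | "ios";

interface ProjectConfig {
  platform: Platform;
  channels: string[];        // where the app should be uploaded
  slackChannels?: string[];  // optional notification targets
  uploadMetadata?: boolean;  // release notes, screenshots, app description
}

interface ResolvedConfig {
  platform: Platform;
  channels: string[];
  slackChannels: string[];
  metadataDir: string | null;
}

// Convention: the project never states where its metadata lives.
// The folder name below is an assumed convention for this sketch.
const CONVENTIONAL_METADATA_DIR = "metadata";

function resolveConfig(config: ProjectConfig): ResolvedConfig {
  if (config.channels.length === 0) {
    throw new Error("At least one distribution channel is required");
  }
  return {
    platform: config.platform,
    channels: config.channels.slice(),
    slackChannels: config.slackChannels ? config.slackChannels.slice() : [],
    metadataDir: config.uploadMetadata ? CONVENTIONAL_METADATA_DIR : null,
  };
}
```

The project only opts in to metadata upload. The location is filled in by convention, which keeps the file short and makes projects look alike.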

The Release Captain

A mobile release is a discrete process. Every release contains new features and bug fixes, and we need a person responsible for making decisions along the way. This person is the release captain.

Our previous release process was complex, and the role of the release captain wasn’t easy to transfer to others. Moreover, the release captain had to communicate the state of the release to all the people involved in it.

With Shipit Mobile we wanted to make the release captain role transferable. We made this a reality by building the platform to guide the release captain through the release process and notify the right people when the status of the release changes. It’s now easier for others to see who’s in charge of the release.

Useful Notifications

Useful notifications have been a request from our users since we started working on Shipit Mobile. Before Shipit Mobile, only the person in charge of the release knew its exact state and was responsible for communicating it to others. With Shipit Mobile we offload this responsibility from the release captain by sending notifications to Slack channels, so every developer on the team and every stakeholder can follow the release status.

Shipit Mobile Notifications

Emergency Release and Rollout

At Shopify we trust our developers and their decisions. Although we strongly recommend a passing CI for both creating a release and uploading the app to the store, we have some mechanisms in place to bypass this restriction. A developer can decide to start a release or upload the app to the store without waiting for CI. This is useful if one of the services we rely on is misbehaving and the status of the builds isn’t being received. At the end of the day, we build tools to make our developers’ lives easier, not to get in their way.

At Shopify, we work on our infrastructure to build better products quickly. Last year, the Mobile Tooling team spent time working on a scalable CI system for Android and iOS, pioneering the use of new technologies like Anka.

Also, we worked on providing a reliable CI system resilient to test and infrastructure flakiness and built tools on top of this system to improve how our mobile developers test apps.

Defining standards through tools across mobile teams and bringing good practices and conventions makes it easier for developers to share knowledge and jump between projects. This is one goal of the Mobile Tooling team.

Shipit Mobile has been used in production for over six months now. During this time we have changed and evolved the platform to accommodate our users’ needs, add new release channels, and improve the user interface. It has proven to be a useful and stable product our developers can trust to release their apps. It has reduced the complexity of a mobile app release, enabling us to speed up our release cadence from three weeks to one week, and we’ve seen more people take on the role of release captain for the first time.

Interested in Mobile Tooling? Shopify's Production Engineering team is hiring and we’d love to hear from you. Please take a look at the open positions on the Engineering career page.

A New Kubectl Plugin for Kubernetes Ingress Controller ingress-nginx

Shopify makes extensive use of ingress-nginx, an open source Kubernetes Ingress controller built upon NGINX. Nearly every request Shopify serves is handled at one point by ingress-nginx, and we are active contributors to the project. Since we make use of ingress-nginx and its many features (like annotations and configmap) so heavily, the quality of its debugging experience is very important to us. While debugging ingress-nginx was always possible using a complex litany of kubectl commands, the experience was frequently frustrating. To help solve this issue, I recently contributed a kubectl plugin to the project. It provides a number of features that make ingress-nginx much easier to upgrade and debug, saving us time and increasing our confidence while working with it.

Easier Upgrades

ingress-nginx is a fast moving project. New releases happen every few weeks, and usually contain one or two breaking changes. When running a very large cluster, it can be difficult to know whether or not any configuration changes need to be made to remain compatible with the new version. Our usual process for upgrading ingress-nginx before the plugin existed was to read the CHANGELOG for the new version, export every single ingress in a cluster as YAML, then manually grep through those YAMLs to find anything that would be broken by the new version. With the plugin, it's as simple as running its lint subcommand to get a nice, formatted list of everything you might need to change.
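
The idea behind this kind of lint check can be sketched in TypeScript. This is not the plugin's implementation (the plugin is a kubectl extension, not TypeScript), and the annotation name, versions, and messages used with it are made up. It only shows the manual grep step turned into a function: keep the breaking changes that fall inside the upgrade range, then report every ingress that uses an affected annotation.

```typescript
// Illustrative only: not the plugin's code; all names here are hypothetical.
interface Ingress {
  namespace: string;
  name: string;
  annotations: { [key: string]: string };
}

interface BreakingChange {
  version: string;           // release in which the change landed
  removedAnnotation: string; // annotation that stops working
  message: string;
}

// Compares dotted numeric versions such as "0.22.0".
// Returns a negative number, zero, or a positive number.
function compareVersions(a: string, b: string): number {
  const left = a.split(".").map(Number);
  const right = b.split(".").map(Number);
  for (let i = 0; i < Math.max(left.length, right.length); i++) {
    const diff = (left[i] || 0) - (right[i] || 0);
    if (diff !== 0) {
      return diff;
    }
  }
  return 0;
}

// Reports each ingress that uses an annotation broken by a release
// after `from` and up to and including `to`.
function lint(ingresses: Ingress[], changes: BreakingChange[], from: string, to: string): string[] {
  const relevant = changes.filter(
    (c) => compareVersions(c.version, from) > 0 && compareVersions(c.version, to) <= 0
  );
  const findings: string[] = [];
  ingresses.forEach((ingress) => {
    relevant.forEach((change) => {
      if (change.removedAnnotation in ingress.annotations) {
        findings.push(ingress.namespace + "/" + ingress.name + ": " + change.message);
      }
    });
  });
  return findings;
}
```

Comparing versions numerically matters here: a plain string comparison would sort 0.9.0 after 0.10.0.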

Improved Ingress Listing

Running the vanilla kubectl get ingresses returns a fairly minimal amount of information.

Often you don’t just care about the hosts and addresses of a whole ingress; you care about the individual paths inside the ingress, as well as the services they point to. Using the plugin, you can get a more detailed view of the contents of those ingresses without inspecting each one.

Using the plugin, it is much easier to answer questions like “what service is this path hitting?” or “does this site have TLS configured?”.
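
As a rough illustration of what a per-path listing involves, the TypeScript sketch below flattens ingresses into one row per host and path. The `Ingress` shape is a trimmed-down assumption that only loosely follows the Kubernetes Ingress resource, and `listPaths` is a hypothetical helper, not part of the plugin.

```typescript
// Simplified, assumed ingress shape; not the full Kubernetes resource.
interface IngressPath {
  path: string;
  serviceName: string;
  servicePort: number;
}

interface IngressRule {
  host: string;
  paths: IngressPath[];
}

interface Ingress {
  name: string;
  tlsHosts: string[]; // hosts covered by a TLS section
  rules: IngressRule[];
}

interface Row {
  ingress: string;
  hostAndPath: string;
  tls: boolean;
  service: string;
}

// Produces one row per path, so "what service is this path hitting?"
// and "does this host have TLS configured?" can be read off directly.
function listPaths(ingresses: Ingress[]): Row[] {
  const rows: Row[] = [];
  ingresses.forEach((ingress) => {
    ingress.rules.forEach((rule) => {
      rule.paths.forEach((p) => {
        rows.push({
          ingress: ingress.name,
          hostAndPath: rule.host + p.path,
          tls: ingress.tlsHosts.indexOf(rule.host) !== -1,
          service: p.serviceName + ":" + p.servicePort,
        });
      });
    });
  });
  return rows;
}
```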

Better Debugging

There are many common debugging strategies for ingress-nginx that can become tedious to carry out manually. Usually, you are required to find and select an ingress pod to inspect, and you are required to filter the output of whatever commands you run in order to find the information you’re looking for. The plugin provides convenient wrappers for many kubectl commands that make it quicker and easier to perform these tasks, selecting a single ingress pod automatically to run the command in. As a result, a sequence of kubectl commands to find a pod, run a command in it, and filter the output can be replaced by a single plugin command.

Likewise, inspecting the internal state of the controller is much easier. Reading the controller’s generated nginx.conf for a particular host, which previously required a hand-written kubectl invocation, can also be replaced by a single plugin command.

ingress-nginx stores some of its configuration state dynamically, making use of the openresty lua-nginx-module to add additional request handling logic. The plugin can be used to inspect this state as well. As an example, if you are using the session affinity annotations to add a session cookie, but the cookie doesn’t seem to be applied to requests, you can first use the plugin to check whether the controller is registering the annotation correctly, that is, whether the annotation is reflected in the controller’s dynamic configuration.

My Experience

Since the addition of the plugin, I have found that the time it takes me to upgrade or debug ingress-nginx has decreased substantially. When I first arrived as an intern here at Shopify in January 2019, I was tasked with upgrading our ingress-nginx deployments to version 0.22.0. I spent a long time grepping through ingress manifest dumps looking for breaking changes. I spent time trying to come up with the kubectl invocation that would allow me to inspect nginx.conf inside the controller. I didn’t even know that the dynamic configuration information existed, let alone the arcane incantations that would allow me to read it. There existed no way at all to read certificate information. It took me days to fully roll out the new version.

Near the end of my internship, I upgraded our deployments to ingress-nginx version 0.24.1. Finding breaking changes required only a few invocations of the lint subcommand. Debugging the controller configuration was similarly quick. I had the confidence to ship the new version much more quickly, and did so in a fraction of the time it had taken me a few months earlier. Much of this can be attributed to the fact that there was now a single tool that allowed me to easily perform every debugging function I had previously been doing, plus a few more that I hadn’t even known existed. In addition, having all of these previously unintuitive and little-documented debugging tricks collected together in one easily usable tool will make it far easier to get started with debugging ingress-nginx for those who are unfamiliar with the project, as I was.

It’s also true that much of this time difference is due to my growth in both confidence and competence at Shopify. I’ve learned a great deal about Kubernetes, especially the nuts and bolts of how requests to Shopify services get from client to server and back. I’ve had the privilege of being paid to work on an open-source project. I’ve learned the skills, both technical and interpersonal, to function as part of a team in a large organization. Changes that I’ve made, code that I’ve written, have helped to process literally billions of web requests during my time here. This has been an extremely productive four months.

The plugin was released as part of ingress-nginx version 0.24.0, but should be compatible with version 0.23.0 as well. You can find the full plugin documentation and install instructions in the ingress-nginx docs. To get involved with the ingress-nginx project, or to ask questions, drop into the #ingress-nginx channel on the Kubernetes Slack.

If solving problems like these interests you, consider interning at Shopify. The intern application window is now open and closes on Wednesday, May 15th at 9:00 AM EST. Applications will not be accepted after this date.

Building Shopify’s Application Security Program

Shopify builds products for an industry based on trust. From product discovery to purchase, we act as a broker of trust between the 800,000+ merchants who run their business on our platform and their customers, who come from anywhere in the world. That’s why it’s critical that everyone at Shopify understands the importance of trust in everything we build.

Security is a non-negotiable priority, and we’ve purposefully built a security mindset into our culture. It gives our security team a huge advantage because we start with engaged, talented, and security-minded members across our product teams. But, we also know how important it is that every business on our platform has access to the latest and most innovative features to help them be successful. So, the question is: How do we build an application security program that encourages safety at high speed, removes complexities, and fosters an environment for creative problem solving so that everyone can focus on delivering amazing products to our merchants?

There are three parts to our program that I will outline in this post: scaling secure applications, scaling security teams, and scaling security interactions. When I started at Shopify 7 years ago, I was the lone employee focused on security. Since then, we have grown to a team of dozens of security engineers, covering the breadth of Shopify’s applications, infrastructure, integrations, and hardware platform.

Scaling Secure Applications

As your company grows, the number of different applications and services that will be deployed will inevitably increase. For a small team, it can be daunting to think about providing security for many more services than there are team members, but there are ways to wrangle this sprawl and set your company up with trust at scale.

The first recommendation is to work across R&D disciplines (engineering, data, and UX) and decide on a homogeneous technical baseline you’ll use for your services. There are a lot of non-security advantages to doing this, so the appetite for standardization should be present already. For Shopify, deciding that we would default to all of our products being built in Ruby on Rails meant that our security tooling could go deep on the security concerns for Rails, without thinking about any other web application frameworks. We made similar technical choices up and down the stack (databases, routing, caching, and configuration management), which simplified the developer experience and also allowed us to ignore security concerns anywhere other than in the things we knew we ran.

Knowing what you are running is a lot harder than it sounds, but it is key to achieving security success at high speed. The way this is done will look different in every organization, but the objective will be the same: visibility. When a new vulnerability is announced, you need visibility into what needs to be patched and the ability to notify the responsible team or automatically kickstart the patching process for every affected service. At Shopify, our security team joined our Production Engineering team’s service tracking project and got a massive head start into having observability of the services, dependencies, and code of everything running in our environment, including the ability to automatically update application dependencies.
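
A small TypeScript sketch shows why this kind of inventory pays off. It assumes a hypothetical service registry in which each service lists its owner and pinned dependency versions. The shapes and the `affectedServices` and `notificationsByOwner` helpers are invented for this example and say nothing about how Shopify's service tracking actually works.

```typescript
// Hypothetical service inventory; names and fields are assumptions.
interface Service {
  name: string;
  owner: string;                            // team to notify
  dependencies: { [pkg: string]: string };  // package -> pinned version
}

interface Advisory {
  pkg: string;
  vulnerableVersions: string[]; // exact versions, to keep the sketch simple
}

// With an inventory, "what needs to be patched?" becomes a lookup.
function affectedServices(inventory: Service[], advisory: Advisory): Service[] {
  return inventory.filter((service) => {
    const pinned = service.dependencies[advisory.pkg];
    return pinned !== undefined && advisory.vulnerableVersions.indexOf(pinned) !== -1;
  });
}

// Groups affected service names by owning team, one notification per team.
function notificationsByOwner(services: Service[]): { [owner: string]: string[] } {
  const byOwner: { [owner: string]: string[] } = {};
  services.forEach((service) => {
    const names = byOwner[service.owner] || [];
    names.push(service.name);
    byOwner[service.owner] = names;
  });
  return byOwner;
}
```

The same lookup could feed an automated patching process instead of a notification; the hard part is keeping the inventory complete.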

Additionally, every new application gets to start with the best defaults we have come up with to this point because we have collectively started hundreds of new projects with the same framework, in the same environment, and using the same technology.

Scaling Security Teams

In a start-up, product direction must be fluid and adapt quickly based on the discovery of new information to keep the company growing. Unless security features are differentiating your product from competitors, investing in a security team isn’t usually the top growth priority. For me, it took over a year before we hired our second security team member. This meant I wore a lot of hats and used some of the tactics described above to ensure a security foundation was included in all new product development.

Growing our security team meant carving off specializations to the first few people we hired. Fraud, application security, infrastructure security, networking, and anti-abuse all started as one-person teams going deep into a particular aspect of the overall security program and feeding their lessons back into the teams across the company.

You also need to understand your options for targeted activities and where third-party services can be used to advance your security agenda. Things like penetration testing, bug bounty programs, and auditing can be used as external validation on a time- and budget-limited basis.

No matter the size of the security team, any security incident is everyone’s responsibility to respond to. Having relationships with teams across the organization will help get the right people quickly moving when you’re faced with an urgent situation or a high severity risk to mitigate. It should never happen that the security team is left with only their team members to fix high priority issues. But there are always ways that security priorities can be embedded within other projects being worked on. Maintaining a list of long-term security enhancements that are ready to be worked on is an invaluable way to make things better without the overhead of staffing an entire team.

Scaling Security Interactions

Security teams are renowned for being slow, inconsistent, and risk-averse. In trying to defeat each of those stereotypes, the path to success is to be fast, automated, and risk-aware. The way your security team interacts with the rest of the company is the most important part of consistently building secure products for the long-term.

Deploying security tripwires at the testing and code repository levels allows your team to define dangerous methods and detect unwanted patterns as they are committed. The time when a developer is writing code is the best time to course-correct towards a more secure implementation. To make this effective, flagging a security risk should be designed like any good production alert: timely, high-fidelity, actionable, and with a low false positive rate.

Helped by the success of all the approaches discussed so far, we can build these tactics once and deploy across all of our codebases. With these tactics in place, you gain confidence that even when an application is totally off your radar, you know that it’s being built in line with your security standards. An example of this approach at Shopify is how we handle html_safe. In Rails, html_safe is a confusing function that renders a given string as unescaped HTML, which can be quite unsafe and lead to cross-site scripting vulnerabilities. Our approach to solving this problem consists of renaming this method to dangerously_output_as_html so it’s clear what it does, adding a comment to any pull requests using this method that links to our training materials on mitigating XSS, and triggering an alert to our Application Security (appsec) team so they can review the proposed code change and suggest an alternative and safer approach. This allows our application security team to focus on the exponential benefit of automation rather than the linear benefit of human reviews.
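
A tripwire of this kind can be quite small. The TypeScript sketch below is not Shopify's tooling: `findDangerousCalls` and `reviewComment` are hypothetical helpers, and the training URL passed in is a placeholder. It scans only the added lines of a unified diff for a method the security team has marked as dangerous and produces a comment that could be posted on the pull request.

```typescript
// Illustrative tripwire; helper names are hypothetical.
interface Finding {
  file: string;
  line: string;
}

function findDangerousCalls(diff: string, method: string): Finding[] {
  const findings: Finding[] = [];
  let currentFile = "";
  diff.split("\n").forEach((line) => {
    if (line.indexOf("+++ ") === 0) {
      // File header of a unified diff, e.g. "+++ b/app/helpers/title_helper.rb"
      currentFile = line.slice(4).replace(/^b\//, "");
    } else if (line.charAt(0) === "+" && line.indexOf(method) !== -1) {
      // Only newly added lines are flagged, so existing code does not alert.
      findings.push({ file: currentFile, line: line.slice(1).trim() });
    }
  });
  return findings;
}

// Builds the pull request comment; trainingUrl is supplied by the caller.
function reviewComment(finding: Finding, trainingUrl: string): string {
  return finding.file + " adds a call that outputs unescaped HTML: \"" + finding.line + "\". " +
    "See " + trainingUrl + " for guidance on mitigating XSS.";
}
```

Flagging only added lines keeps the alert timely and low-noise: developers hear about the risk at the moment they introduce it, not every time they touch a file that already contains it.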

Finally, our best security interactions are the ones we don’t need to have. For example, by making risk decisions at the infrastructure level, we can provide a trustworthy security baseline with our built-in safeguards and tripwires to the teams deploying applications running in that infrastructure without them even knowing those protections are there.

These are just a few of the ways we are tackling the problem of security at scale. Our team is always on the lookout for new ideas and people to join our team to help protect the hundreds of thousands of businesses running on our platform. If these sound like the kinds of problems you want to solve, check out these available positions: Director of Security Engineering, Security Engineering Manager, and Lead Software Engineer - Mobile Security.

One Million Dollars in Bug Bounties

Today, we’re excited to announce that we’ve awarded over $1M USD in bounties through our bounty programs. At Shopify, bounty programs complement our security strategy and allow us to leverage a community of researchers who help secure our platform. They each bring their perspective and specialties and can evaluate our platform from thousands of different viewpoints to create a better Shopify product and a better user experience for the 800,000+ businesses we safeguard. Our ongoing investment is a clear indication that we are committed to security and making sure commerce is secure for everyone.

Some Bug Bounty Stats

Shopify is the fifth public program, out of 176, to reach the $1M USD milestone on HackerOne, our bug bounty platform. We’ve had some amazing reports and worked with awesome hackers over the last four years. Here are some stats to put it into perspective:

Shopify's Bug Bounty Program Stats: Highest Bounty Award $25K. 400+ Hackers Thanked. 950+ Bugs Resolved. 750+ Bounties Awarded. 375+ Public Disclosures. Held 2 Live Events.
Statistics about Shopify's Bug Bounty Programs Since Inception

Top Three Interesting Bugs

Shopify is dedicated to publicly disclosing all vulnerability reports discovered through our program to propel industry education and we strongly encourage other companies to do the same. Three of our most interesting resolved bugs over the years are:

1. SSRF in Exchange leads to ROOT access in all instances - Bounty: $25,000 

Shopify infrastructure is isolated into subsets of infrastructure. @0xacb reported it was possible to gain root access to any container in one particular subset by exploiting a server-side request forgery bug in the screenshotting functionality of Shopify Exchange. Within an hour of receiving the report, we disabled the vulnerable service, began auditing applications in all subsets and remediating across all our infrastructure. The vulnerable subset did not include Shopify core. After auditing all services, we fixed the bug by deploying a metadata concealment proxy to disable access to metadata information. We also disabled access to internal IPs on all infrastructure subsets.

2. Shopify admin authentication bypass using - Bounty: $20,000

@uzsunny reported that by creating two partner accounts sharing the same business email, it was possible to be granted “collaborator” access to a store. We tracked down the bug to incorrect logic in a piece of code that was meant to automatically convert an existing normal user account into a collaborator account. The intention was that, when a partner had a valid user account on the store, their collaborator account request could be accepted automatically, with the user account converted into a collaborator account. We fixed this issue by properly verifying that the existing account is in fact a user account.

3. Stored cross site scripting in Shopify admin and partner pages - Bounty: $5,000

@bored-engineer found we were incorrectly sanitizing sales channel icon SVG files uploaded by Partner accounts. During our remediation, we noted the XSS would execute in and the Shopify admin panel, which increased the impact of this bug. The admin functionality was not required, so it was removed. Additionally, we verified that the bug had not been exploited by any other users.

Shopify x HackerOne H1-514

Having reached $1M in awarded bounties, we’re still looking for ways to ensure our program remains competitive and attractive to hackers. This year we’ll be experimenting with new ways to drive hacker engagement and make Shopify’s bug bounty program more lucrative to hack on.

Happy Hacking!

If you’re interested in helping to make commerce more secure, visit Shopify on HackerOne to start hacking or our career page to check out our open Trust and Security positions.

Shopify Developers Share Lessons on Self-Advocacy and Dealing with Adversity in the Technology Industry

Behind The Code is an ongoing series where we share the stories of Shopify developers and how they’re solving meaningful problems at Shopify and beyond.

In celebration of International Women’s Day, we’re featuring three female developers from various backgrounds sharing their experiences navigating a career in the tech industry, as well as their work and accomplishments at Shopify.

Developer Stella Lee on The Importance of Female Mentorship and Dealing with Adversity in the Technology Industry

Stella Lee

Stella Lee is a developer working on the Shopify Checkout Experience team. The team is responsible for converting the intent of a merchant’s customer to purchase a product into a successful purchase, and making the buying experience as smooth as possible. She primarily works on building features that eliminate friction for buyers by allowing them to reuse their information and reducing the number of fields they need to fill out manually, using TypeScript and Ruby. When she’s not busy writing code, she’s the founder and co-lead of Shopify’s internal women’s Employee Resource Group (ERG), f(empower), which is supported by our Employee Experience, Diversity and Belonging team and has an executive sponsor. The group works with teams across the business to enable a work environment with equal access to opportunity and supports women to achieve their full potential. The ERG is open to all employees of Shopify and provides a safe space to vocalize the collective experiences and difficulties women in tech face.

When asked what International Women’s Day means to her, Stella mentions she never saw the value of needing a specific day to celebrate what should be celebrated every day, but she’s warmed up to it. “The reality is, we still have a long way to go and days like today give people the opportunity to celebrate and learn how each of us can make a positive impact for women. This year, I want to take the opportunity to celebrate the achievements of my fellow ladies in tech, reflect on the current state of gender parity in the industry, and outline concrete ways to advocate for women.”

Dealing with Imposter Syndrome and Learning the Importance of Self-Advocacy

One of the reasons Stella feels so strongly about empowering other women is that she grew up with a mother whom she describes as a true inspiration. After leaving behind a great education and career, her mother uprooted the family from their native home in South Korea to Canada in hopes of providing a better life for her family. “She’s the most selfless, strong and caring person that I’ve ever met. She’s the only person I’ve ever met that acts in the best interest of her children without ever expecting anything in return (not even an ounce of recognition). Her unrelenting positivity and resilience in the face of adversity time and time again is truly inspiring.”

Growing up with such a strong role model has played a part in her development and ability to navigate the workplace. She expresses the importance of women advocating for themselves in a way where they can achieve their career goals. “You can't expect anyone else to take charge of your professional development, so you need to own that and figure out what it is you need to grow.” She believes the best way to do this is to ask for the opportunities needed to develop your skills like asking to own a feature of a project, or even to champion one.

Aside from learning how to advocate for herself, Stella has had to learn how to maneuver through her feelings of imposter syndrome, an inability to internalize accomplishments and a persistent fear of being exposed as a fraud. Imposter syndrome is something that countless people face in many different professions, no matter how far one is in their career. For Stella, these feelings began when she switched to computer science halfway through completing her bachelor’s degree. “I didn’t have the full educational background nor the internship experiences that many of my colleagues or other developers had and with the emphasis on gender parity and the importance of diversity in tech, there are times when I’ve wondered if the only reason I’ve occupied a space filled with male developers was because I was a woman.” She goes on further to describe her experience attending various external engineering events where people assume she’s in a non-engineering role or comment that she doesn’t look like a developer. “Previously, I would just internalize these experiences as validation that I didn’t belong within this space. I definitely still struggle with imposter syndrome, but the consistent practice of recognizing and transforming my negative thinking patterns through thought work has helped immensely for my confidence.”

Transforming “Stupid” Questions into Good Questions

Learning how to ask good questions has been her bread and butter since working as a developer, but she believes you can only do this by asking stupid questions first because they’ll eventually become better questions. “Figure out what you don’t know by asking yourself what questions you have, set a timebox, and then take a stab at asking them to yourself or researching the question. I’m a very independent person and I find it hard to ask for help when I know that I’ll eventually find the answer, but I learned later on that doing this wasn’t the best use of my time.” A key to asking questions is to start by stating what you already know, then diving deeper into investigating how to solve that problem. “This shows you’ve done your homework, and it helps the person formulate a more intentional response that isn’t too basic or too advanced.”

Remote Developer Lead Helen Lin on Effective Ways to Manage People and Teams

Helen Lin

Helen Lin works remotely in Vancouver, BC as a Lead Web Developer on the Themes team for the Shopify Online Store. The Themes team is responsible for helping our product lines integrate features into the online store, using JavaScript, Node.js, Liquid, Ruby, and SQL. Aside from providing free themes for our merchants, her team also establishes proper standards for making features more accessible and improving web performance.

Managing People Through Alignment and Stakeholder Management

As one of the few remote technical leads at Shopify, Helen shares with us some insights into how she manages her team: “I think the makings of a good lead mainly lie in your ability to understand your different reports and effectively give direction as to how a project needs to be run. If you don’t understand the long term vision of your company and don’t know how to map out what tasks need to be accomplished, then misalignment can occur leading to low team morale and poor communication.” She periodically flies down to see her team and connect with all of her reports and stakeholders on the current project she’s working on, allowing her to build strong relationships with the people she works with. She also explained that managing people is one of the most challenging things she’s ever done but she finds it very rewarding. “I won’t say it hasn’t come with some difficulties, because it definitely has, but learning how to empower people and communicate with people with different personalities and disciplines has taught me the importance of staying connected and aligned.”

Self-Advocacy and Sharing Accomplishments

Some people struggle with getting aligned with their managers about their personal development and career goals; advocating for themselves is a struggle. Helen, who’s been working for a number of years now, feels that when advocating for yourself it’s important to ask directly for what you want. For her, self-advocacy is when she can push past her boundaries and do things she thought impossible. “Fighting through that internal fear and challenge is far more powerful than anything I have ever experienced. Now, I take that experience and find ways to help others to advocate for themselves through sharing my stories of perseverance in the face of adversity.” She also invests time helping people narrow down their aspirations and figure out what they’re passionate about. “When there is a clear vision of what you want then self-advocacy becomes easy because it's what you want to do and not what other people want you to do.”

Asking Questions and Staying Curious

For those interested in pursuing a career as a developer, she stresses how important it is to understand that asking questions and staying curious is a positive thing. “One of the best pieces of advice I’ve gotten was that I should understand that as human beings we all make mistakes, period. Regardless of if you’re a junior or senior developer, you're bound to mess up at some time. It’s not about trying not to make mistakes, but about how you can fix the mistake and move forward from it.” Building resiliency is a great muscle to flex, and not being afraid to ask questions and make mistakes is only going to make you a better developer, so don’t be afraid to speak up.

Web Developer Cathryn Griffiths on Making Career Pivots and Creating an Inclusive Space in The Technology Industry

Cathryn Griffiths

Cathryn is a web developer working on the Checkout Experience team, which is responsible for making a customer’s purchase experience as smooth as possible. She’s currently acting as a champion on a project and is in charge of making sure decisions are made, deadlines get met, and work gets done. This is her first time championing a technical project, so it comes with certain challenges, but she has a strong network within her team to help her navigate this new experience.

As someone who has pivoted careers a number of times, Cathryn knows all about how difficult it can be to switch careers and find what you’re passionate about. She’s gone from pursuing a career in academia to working in the private sector as a clinical trial project manager. “After realizing I didn’t have a passion for working in health sciences, I decided to go back to school and gave programming a shot after hearing about the exciting and challenging work that a programmer does. I enrolled in a Bachelor of Computer Science at Concordia University and by my third semester, I had secured a role as a Front-End Developer so I stopped pursuing my BCompSc and started working full-time.”

Finding Her Passion and Changing Careers

Maneuvering through different industries was not an easy thing to do, but she managed by always being open to discovering what work she actually enjoyed doing. However, making the switch to the technology industry specifically comes with its own challenges, especially as a woman pursuing a career in a male-dominated field. “When I left my first programming job to go to my second one, I hesitated a lot about moving on to the new job because I was afraid I might get stuck in a toxic work environment where my gender would be a problem. That same feeling, that reluctance, hesitation, and nervousness happened again when I thought about leaving my previous job for Shopify. Luckily, in both cases, I ended up in fantastic, supportive work environments.”

We also asked Cathryn how the technology industry can make this space more inclusive for all, especially given her experience of making the switch into tech. “On a larger scale: the more diverse our industry can be, the more inclusive it can be too. We have to hire more minorities, and have a workforce with a diverse array of races, ages, genders, sexualities, and ethnicities.” Diversity in the workplace has been proven to be very beneficial to companies, and various companies have initiatives in place to promote a more inclusive workplace and a more diverse workforce. When asked what companies can do on a day-to-day basis to promote inclusivity, she said, “On a smaller scale, something I love that Shopify does is that every meeting room has a paper pyramid, that sits right on the meeting table to help guide a more inclusive discussion. Each face of the pyramid poses a question or statement to ensure people’s voices are being heard during meetings, whether online or in person. For example, ‘Overtalking is real. Go back to the person who was interrupted and let them finish.’” These pyramids are about cultivating a space where people feel comfortable speaking up and become mindful of speaking too much. So someone like Cathryn, who is newer to the company, feels included in the conversation and supported by her coworkers.

Advice for People Looking to Switch Careers

As someone who spent years establishing herself in different career paths, she began asking herself questions like, “What are my goals?”, “What do I want to accomplish at the end of the day?” and “Am I enjoying my work?”. Asking herself these types of questions was pivotal in helping her discover which career she was the most passionate about.

Final Thoughts

“Work hard and embrace feedback. Own your accomplishments, be proud of them, and don’t be afraid to tell others about them. Additionally, own your failures and don’t be afraid to acknowledge them — it’s only by acknowledging them that you can grow from them.”

At Shopify, we’re committed to designing a workplace that challenges and supports employees to hone their craft and make meaningful impact for entrepreneurs around the world. We know that in order to build a platform that will ‘make commerce better for everyone’, we need to have a diverse team building that product and are committed to fostering an inclusive work environment that harnesses differences and brings the best out of each and every individual.  

We’re always looking for awesome people of all backgrounds and experiences to join our team. Visit our Engineering career page to find out what we’re working on.

Deconstructing the Monolith: Designing Software that Maximizes Developer Productivity

Shopify is one of the largest Ruby on Rails codebases in existence. It has been worked on for over a decade by more than a thousand developers. It encapsulates a lot of diverse functionality from billing merchants, managing 3rd party developer apps, updating products, handling shipping and so on. It was initially built as a monolith, meaning that all of these distinct functionalities were built into the same codebase with no boundaries between them. For many years this architecture worked for us, but eventually, we reached a point where the downsides of the monolith were outweighing the benefits. We had a choice to make about how to proceed.

Microservices surged in popularity in recent years and were touted as the end-all solution to all of the problems arising from monoliths. Yet our own collective experience told us that there is no one-size-fits-all solution, and microservices would bring their own set of challenges. We chose to evolve Shopify into a modular monolith, meaning that we would keep all of the code in one codebase, but ensure that boundaries were defined and respected between different components.

Each software architecture has its own set of pros and cons, and a different solution will make sense for an app depending on what phase of its growth it is in. Going from monolith to modular monolith was the next logical step for us.

Monolithic Architecture

According to Wikipedia, a monolith is a software system in which functionally distinguishable aspects are all interwoven, rather than containing architecturally separate components. What this meant for Shopify was that the code that handled calculating shipping rates lived alongside the code that handled checkouts, and there was very little stopping them from calling each other. Over time, this resulted in extremely high coupling between the code handling differing business processes.

Advantages of Monolithic Systems

Monolithic architecture is the easiest to implement. If no architecture is enforced, the result will likely be a monolith. This is especially true in Ruby on Rails, which lends itself nicely to building them due to the global availability of all code at an application level. Monolithic architecture can take an application very far since it’s easy to build and allows teams to move very quickly in the beginning to get their product in front of customers earlier. 

Maintaining the entire codebase in one place and deploying your application to a single place has many advantages. You’ll only need to maintain one repository, and be able to easily search and find all functionality in one folder. It also means only having to maintain one test and deployment pipeline, which, depending on the complexity of your application, may avoid a lot of overhead. These pipelines can be expensive to create, customize, and maintain because it takes concerted effort to ensure consistency across them all. Since all of the code is deployed in one application, the data can all live in a single shared database. Whenever a piece of data is needed, it’s a simple database query to retrieve it. 

Since monoliths are deployed to one place, only one set of infrastructure needs to be managed. Most Ruby applications come with a database, a web server, background job capabilities, and then perhaps other infrastructure components like Redis, Kafka, Elasticsearch and much more. Every additional set of infrastructure that is added increases the amount of time you will have to spend with your DevOps hat on rather than your building hat. Additional infrastructure also increases the possible points of failure, decreasing your application’s resiliency and security.

One of the most compelling benefits of choosing the monolithic architecture over multiple separate services is that you can call into different components directly, rather than needing to communicate over web service APIs. This means you don’t have to worry about API version management and backward compatibility, or about potentially laggy calls.

Disadvantages of Monolithic Systems

However, if an application or the team building it reaches a certain scale, it will eventually outgrow monolithic architecture. This occurred at Shopify in 2016 and was evident from the constantly increasing challenge of building and testing new features. Specifically, a couple of things served as tripwires for us.

The application was extremely fragile with new code having unexpected repercussions. Making a seemingly innocuous change could trigger a cascade of unrelated test failures. For example, if the code that calculates our shipping rate called into the code that calculates tax rates, then making changes to how we calculate tax rates could affect the outcome of shipping rate calculations, but it might not be obvious why. This was a result of high coupling and a lack of boundaries, which also resulted in tests that were difficult to write, and very slow to run on CI. 

Developing in Shopify required a lot of context to make seemingly simple changes. When new Shopifolk onboarded and got to know the codebase, the amount of information they needed to take in before becoming effective was massive. For example, a new developer who joined the shipping team should only need to understand the implementation of the shipping business logic before they can start building. However, the reality was that they would also need to understand how orders are created, how we process payments, and much more since everything was so intertwined. That’s too much knowledge for an individual to have to hold in their head just to ship their first feature. Complex monolithic applications result in steep learning curves.

All of the issues we experienced were a direct result of a lack of boundaries between distinct functionality in our code. It was clear that we needed to decrease the coupling between different domains, but the question was how.

Microservice Architecture

One solution that is very trendy in the industry is microservices. Microservices architecture is an approach to application development in which a large application is built as a suite of smaller services, deployed independently. While microservices would address the problems we experienced, they’d bring a whole other suite of problems.

We’d have to maintain multiple different test & deployment pipelines and take on infrastructural overhead for each service while not always having access to the data we need when we need it. Since each service is deployed independently, communicating between services means crossing the network, which adds latency and decreases reliability with every call. Additionally, large refactors across multiple services can be tedious, requiring changes across all dependent services and coordinating deploys.

Modular Monoliths

We wanted a solution that increased modularity without increasing the number of deployment units, allowing us to get the advantages of both monoliths and microservices without so many of the downsides.

Monolith vs Microservices by Simon Brown

A modular monolith is a system where all of the code powers a single application and there are strictly enforced boundaries between different domains.

Shopify’s Implementation of the Modular Monolith: Componentization

Once it was clear that we had outgrown the monolithic structure, and it was affecting developer productivity and happiness, a survey was sent out to all the developers working in our core system to identify the main pain points. We knew we had a problem, but we wanted to be data-informed when coming up with a solution, to ensure it was designed to actually solve the problem we had, not just the anecdotally reported one.

The results of that survey informed the decision to split up our codebase. In early 2017, a small but mighty team was put together to tackle this. The project was initially named “Break-Core-Up-Into-Multiple-Pieces”, and eventually evolved into “Componentization”.

Code Organization

The first issue they chose to address was code organization. At this time, our code was organized like a typical Rails application: by software concepts (models, views, controllers). The goal was to re-organize it by real-world concepts (like orders, shipping, inventory, and billing), in an attempt to make it easier to locate code, locate people who understand the code, and understand the individual pieces on their own. Each component would be structured as its own mini Rails app, with the goal of eventually namespacing them as Ruby modules. The hope was that this new organization would highlight areas that were unnecessarily coupled.

Reorganization By Real World Concepts Before And After Snapshots

Coming up with the initial list of components involved a lot of research and input from stakeholders in each area of the company. We did this by listing every Ruby class (around 6000 in total) in a massive spreadsheet and manually labeling which component it belonged in. Even though no code changed in this process, it still touched the entire codebase and was potentially very risky if done incorrectly. We achieved this move in one big bang PR built by automated scripts. Since the changes introduced were just file moves, the failures that might occur would result from our code not knowing where to find object definitions, resulting in runtime errors. Our codebase is well tested, so by running our tests locally and in CI without failures, as well as running through as much functionality as possible locally and on staging, we were able to ensure that nothing was missed. We chose to do it all in one PR so we’d disrupt developers as little as possible. An unfortunate downside of this change is that we lost a lot of our Git history in GitHub when file moves were incorrectly tracked as deletions and creations rather than renames. We can still track the origins using the `git log --follow` option, which follows history across file moves; however, GitHub doesn’t understand the move.
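To make the scripted move more concrete, here is a minimal Python sketch of how a class-to-component labeling could be turned into a list of file moves. The file names, the component names, and the `components/<component>/` directory layout are assumptions for illustration only; the post doesn't describe the actual scripts.

```python
from pathlib import PurePosixPath

# Hypothetical excerpt of the spreadsheet: file path -> component label.
COMPONENT_FOR_FILE = {
    "app/models/order.rb": "orders",
    "app/models/shipping_rate.rb": "shipping",
    "app/controllers/invoices_controller.rb": "billing",
}


def componentized_path(path: str, component: str) -> str:
    """Return the file's new location under components/<component>/ (assumed layout)."""
    return str(PurePosixPath("components") / component / path)


def plan_moves(mapping):
    """Build (old_path, new_path) pairs; a real script would hand each pair to `git mv`."""
    return [(old, componentized_path(old, component))
            for old, component in sorted(mapping.items())]


for old, new in plan_moves(COMPONENT_FOR_FILE):
    print(f"git mv {old} {new}")
```

Generating the whole plan from one mapping is what makes a single big bang PR feasible: the labeling can be reviewed in a spreadsheet, and the move itself stays mechanical.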

Isolating Dependencies

The next step was isolating dependencies, by decoupling business domains from one another. Each component defined a clean dedicated interface with domain boundaries expressed through a public API and took exclusive ownership of its associated data. While the team couldn’t achieve this for the whole Shopify codebase since it required experts from each business domain, they did define patterns and provide tools to complete the task. 

We developed a tool called Wedge in-house, which tracks the progress of each component towards its goal of isolation. It highlights any violations of domain boundaries (when another component is accessed through anything but its publicly defined API), and data coupling across boundaries. To achieve this, we wrote a tool to hook into Ruby tracepoints during CI to get a full call graph. We then sort callers and callees by component, selecting only the calls that are across component boundaries, and sending them to Wedge. Along with these calls, we send along some additional data from code analysis, like ActiveRecord associations and inheritance. Wedge then determines which of those cross-component things (calls, associations, inheritance) are ok, and which are violating. Generally:

  • Cross-component associations are always violating componentization
  • Calls are ok only to things that are explicitly public
  • Inheritance will be similar but isn’t yet fully implemented

Wedge then computes an overall score as well as lists violations per component.
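The rules above can be sketched in a few lines of Python. This isn't Wedge itself: the class names, the component mapping, the choice to attribute a violation to the referencing component, and the scoring formula (the share of cross-component references that are allowed) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    kind: str                       # "call", "association", or "inheritance"
    source: str                     # class that makes the reference
    target: str                     # class that is referenced
    target_is_public: bool = False  # True if the target belongs to a public API


# Hypothetical class-to-component mapping.
COMPONENT_OF = {
    "ShippingRate": "shipping",
    "TaxCalculator": "taxes",
    "TaxApi": "taxes",
    "Order": "orders",
}


def crosses_boundary(ref):
    return COMPONENT_OF[ref.source] != COMPONENT_OF[ref.target]


def is_violation(ref):
    if ref.kind == "association":
        return True                      # cross-component associations always violate
    if ref.kind == "call":
        return not ref.target_is_public  # calls are fine only into public API
    return False                         # inheritance is left unclassified in this sketch


def analyze(refs):
    """Return (violations per referencing component, assumed overall score)."""
    cross = [r for r in refs if crosses_boundary(r)]
    violations = Counter(COMPONENT_OF[r.source] for r in cross if is_violation(r))
    score = 1.0 if not cross else 1 - sum(violations.values()) / len(cross)
    return violations, score


refs = [
    Reference("call", "ShippingRate", "TaxCalculator"),
    Reference("call", "ShippingRate", "TaxApi", target_is_public=True),
    Reference("association", "Order", "ShippingRate"),
    Reference("call", "TaxCalculator", "TaxApi", target_is_public=True),  # same component
]
violations, score = analyze(refs)
print(dict(violations), round(score, 2))
```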

Shopify's Wedge - Tracking the Progress of Each Component Towards its Goal of Isolation

As a next step, we will graph score trends over time, and display meaningful diffs so people can see why and when the score changed.

Enforcing Boundaries

In the long term, we’d like to take this one step further and enforce these boundaries programmatically. This blog post by Dan Manges provides a detailed example of how one app team achieved boundary enforcement. While we are still researching the approach we want to take, the high-level plan is to have each component only load the other components that it has explicitly depended upon. This would result in runtime errors if it tried to access code in a component that it had not declared a dependency on. We could also trigger runtime errors or failing tests when components are accessed through anything other than their public API. 
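As a rough illustration of that high-level plan, the following Python sketch raises a runtime error when a component reaches into a component it hasn't declared as a dependency, or bypasses a public API. The `Component` and `Registry` classes, the error names, and the `rate_for` entry point are hypothetical; the approach we end up with may look quite different.

```python
class UndeclaredDependencyError(RuntimeError):
    pass


class PrivateAccessError(RuntimeError):
    pass


class Component:
    def __init__(self, name, depends_on=(), public_api=None):
        self.name = name
        self.depends_on = frozenset(depends_on)
        self.public_api = dict(public_api or {})  # entry point name -> callable


class Registry:
    def __init__(self):
        self._components = {}

    def register(self, component):
        self._components[component.name] = component

    def call(self, caller, target, entry_point, *args):
        """Route a cross-component call, enforcing declared dependencies and public API."""
        if target not in self._components[caller].depends_on:
            raise UndeclaredDependencyError(f"{caller} has not declared a dependency on {target}")
        api = self._components[target].public_api
        if entry_point not in api:
            raise PrivateAccessError(f"{entry_point} is not part of {target}'s public API")
        return api[entry_point](*args)


registry = Registry()
registry.register(Component("taxes", public_api={"rate_for": lambda region: 0.13}))
registry.register(Component("shipping", depends_on=["taxes"]))
registry.register(Component("orders"))

print(registry.call("shipping", "taxes", "rate_for", "ON"))  # allowed: declared and public
try:
    registry.call("orders", "taxes", "rate_for", "ON")
except UndeclaredDependencyError as error:
    print(error)
```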

We’d also like to untangle the domain dependency graph by removing accidental and circular dependencies. Achieving complete isolation is an ongoing task, but it’s one that all developers at Shopify are invested in and we are already seeing some of the expected benefits. As an example, we had a legacy tax engine that was no longer adequate for the needs of our merchants. Before the efforts described in this post, it would have been an almost impossible task to swap out the old system for a new one. However, since we had put so much effort into isolating dependencies, we were able to swap out our tax engine for a completely new tax calculation system.

In conclusion, no architecture is often the best architecture in the early days of a system. This isn’t to say don’t implement good software practices, but don’t spend weeks and months attempting to architect a complex system that you don’t yet know. Martin Fowler’s Design Stamina Hypothesis does a great job of illustrating this idea, by explaining that in the early stages of most applications you can move very quickly with little design. It’s practical to trade off design quality for time to market. Once the speed at which you can add features and functionality begins to slow down, that’s when it’s time to invest in good design. 

The best time to refactor and re-architect is as late as possible, as you are constantly learning more about your system and business domain as you build. Designing a complex system of microservices before you have domain expertise is a risky move that too many software projects fall into. According to Martin Fowler, “almost all the cases where I’ve heard of a system that was built as a microservice system from scratch, it has ended in serious trouble… you shouldn’t start a new project with microservices, even if you’re sure your application will be big enough to make it worthwhile”.

Good software architecture is a constantly evolving task and the correct solution for your app absolutely depends on what scale you’re operating at. Monoliths, modular monoliths, and Service Oriented Architecture fall along an evolutionary scale as your application increases in complexity. Each architecture will be appropriate for a different sized team/app and will be separated by periods of pain and suffering. When you do start experiencing many of the pain points highlighted in this article, that’s when you know you’ve outgrown the current solution and it’s time to move onto the next.

Thank you to Simon Brown for permission to post his Monolith vs Microservices image. For more information on Modular Monoliths, please check out Simon's talk from GOTO18.

We're always on the lookout for talent and we’d love to hear from you. Please take a look at our open positions on the Engineering career page.

Unifying Our GraphQL Design Patterns and Best Practices with Tutorials

Read Time: 5 minutes

In 2015, Shopify began a journey to put mobile first. The biggest undertaking was rebuilding our native Shopify Mobile app and improving the tools and technology to accomplish this. We experimented with GraphQL to build the next generation of APIs that power our mobile apps and give our 600,000+ merchants the same seamless experience when using Shopify. There are currently hundreds of Shopify developers across teams and offices contributing to our GraphQL APIs, including our two public APIs: Admin and Storefront.

Engineering a Historic Moment: Shopify Gets Ready for Cannabis in Canada

On October 17th, 2018, Canada ended a 95-year history of cannabis prohibition. For Shopify, the legalization of cannabis marked a new industry entering the Canadian retail market and we worked with governments and licensed sellers across the country to provide a safe, reliable and scalable platform for their business. For our engineering team, it meant significant changes to our platform to meet the strict requirements for this newly regulated industry.

The biggest change for Shopify was the requirement to store personal information in Canada. This required Canadian-specific infrastructure that we were able to develop through our recent move to the Cloud and Google Cloud Platform’s new region in Montreal. Using this platform as our foundation, we created a new instance of Shopify, in an entirely new region, to meet the needs of this industry. In our migration, we built several new Google Cloud Platform projects (all based in the Montreal region) which included key projects housing Shopify’s core infrastructure such as PCI compliant payment processing infrastructure and a regional data warehouse.

The core infrastructure, which runs on a mixture of Google Kubernetes Engine and Google Compute Engine, already existed in our other regions which meant adding another region was relatively straightforward. We used Terraform to declare and configure all parts of the underlying infrastructure, like networks and Kubernetes Engine clusters. We also took advantage of improved resiliency features in Google Cloud Platform, such as regional clusters. We structured our compute node clusters to segregate workloads, minimizing the noisy neighbour problem to ensure maximum stability and reliability. After a few months of building out this infrastructure, configuring and testing it, we had the first working version of this new regional infrastructure running test shops with a functional storefront and admin. That’s when we faced our next major challenge: scaling.

Social factors played a major role in our scaling plans, particularly the behavior of cannabis consumers, an area with little to no available research. Most existing research focused on cannabis producers, so we had to estimate consumer behavior ourselves. We modeled a number of different traffic scenarios and provisioned enough infrastructure to ensure we could handle the peak traffic from each one. Some of the possibilities we considered included:
  • A strong initial, worldwide surge of interest on storefront pages as curiosity about a government-run online cannabis store peaks
  • Waves of traffic based on multiple days of media coverage across the world, with local timezone spikes
  • Very strong initial sales in the first minutes and hours of store openings as Canadians rush to be one of the first to legally purchase recreational cannabis
  • Possible bursts of denial of service attacks from malicious actors 

We went through multiple cycles of load testing using a mix of different storefront traffic patterns, varying the relative percentages of search, product browsing, collection browsing and checkout actions to stress the system in different ways. Each cycle included different fixes and configuration changes to improve the performance and throughput of the system until we were satisfied that we would be able to handle all possible traffic scenarios. In addition, we modeled and tested different types of bot attacks to ensure our platform defenses were effective. Finally, we conducted multiple pre-mortem discussions and built out mitigation plans to address any scenario which would cause downtime for our merchants.
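One simple way to vary the relative percentages of storefront actions is to draw each simulated request from a weighted mix. The Python sketch below assumes two hypothetical mixes with made-up percentages; it only illustrates the idea and isn't the load testing tooling we used.

```python
import random
from collections import Counter

# Hypothetical traffic mixes: percentage of requests per storefront action.
TRAFFIC_MIXES = {
    "curious_browsers": {"search": 20, "product": 50, "collection": 25, "checkout": 5},
    "opening_rush": {"search": 10, "product": 30, "collection": 10, "checkout": 50},
}


def sample_actions(mix, count, seed=0):
    """Draw `count` actions according to the mix's relative weights."""
    rng = random.Random(seed)  # seeded so a load test cycle can be replayed
    actions = list(mix)
    weights = [mix[action] for action in actions]
    return rng.choices(actions, weights=weights, k=count)


for name, mix in TRAFFIC_MIXES.items():
    print(name, Counter(sample_actions(mix, 1000)))
```

Swapping the mix between cycles stresses different parts of the system: a checkout-heavy mix leans on payment and order creation paths, while a browse-heavy mix leans on storefront rendering and search.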

At the same time, we were solving how to keep personal information contained in Canada. This was extremely challenging as Shopify was built from day one with a number of storage and communications systems located outside of Canada, such as our data warehouse and network infrastructure. We examined each system for personal information to ensure that this information remains stored in Canada.

We ensured there were protections for regional storage in multiple places: inside the application, within the hosts, and at the network/infrastructure level. For our main Ruby on Rails application, we:

  • Built a library which captured network requests and verified the requested host belonged to a list of known safe endpoints.
  • Utilized strict network firewall rules and minimized interconnections to ensure that data wouldn’t accidentally traverse into other jurisdictions.
  • Deployed the containers which house the main application with the absolute minimum number of secrets necessary for the service to function in order to ensure that any service outside the jurisdiction reached in error would simply reject the request due to insufficient credentials.
  • Ensured the infrastructure used unique SSL certificates so data would not cross-pollinate between internal pieces of the system.
  • Deployed all these protections, in combination with monitoring and alerting, ensuring the teams involved were notified of potential issues. 
In the vast majority of cases, more than one of these protections applies to a particular piece of data or system, meaning there are multiple layers of protection in place to ensure that personal information does not migrate outside of Canada.
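The first protection in the list, checking outgoing requests against known safe endpoints, can be sketched as a small allowlist check. This Python example is an illustration only: our library was built for the Ruby on Rails application, and the host names and the `verify_host` helper here are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of in-region endpoints.
SAFE_HOSTS = frozenset({
    "db.ca-region.example.internal",
    "payments.ca-region.example.internal",
})


class UnsafeHostError(Exception):
    pass


def verify_host(url, safe_hosts=SAFE_HOSTS):
    """Return the request's host if it is allowlisted, otherwise raise."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None or host not in safe_hosts:
        raise UnsafeHostError(f"refusing request to non-allowlisted host: {host!r}")
    return host


print(verify_host("https://db.ca-region.example.internal/query"))
try:
    verify_host("https://warehouse.us-region.example.internal/export")
except UnsafeHostError as error:
    print(error)
```

Failing closed (raising when the host is missing or unknown) matches the layered approach described above: a request that slips past this check would still have to get through the firewall rules and credential limits.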

As launch day neared, we reduced the amount of change we applied to the environment to minimize risk. While the merchants were in their final testing cycles, we continued to perform load testing to ensure that the environment was optimally configured and ready. Having a successful launch day was critical for our merchants and we decided to scale the environment to handle five times the traffic and sales volume projections for launch day. Internally, we ran a series of game days (a form of fault injection where we test our assumptions about the system by degrading its dependencies under controlled conditions) for core infrastructure teams to validate that system performance and alerting were sufficient.

On launch day, merchants chose to take full advantage of the excitement and opened their stores one minute after midnight in their local time zones. That meant we’d see both retail and online launches starting at 10:31 PM EDT on October 16th (Newfoundland and Labrador) and continuing through every hour until 3:01 AM EDT on October 17th (British Columbia). At 12:01 AM Newfoundland time, the first legal sale of cannabis in Canada was made on Shopify’s point of sale in Newfoundland, followed by successful launches in Prince Edward Island, Ontario and British Columbia, all with zero downtime, excellent performance and secure storage and transmission of personal information within Canada.
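The launch window follows directly from the time zone offsets. As a quick check in Python, assuming the daylight time offsets in effect in mid-October 2018 (Newfoundland at UTC-2:30, Eastern at UTC-4, Pacific at UTC-7):

```python
from datetime import datetime, timedelta, timezone

# Fixed daylight time offsets assumed for October 17, 2018.
NDT = timezone(timedelta(hours=-2, minutes=-30), "NDT")
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")


def opening_in_edt(local_zone):
    """Convert 12:01 AM local time on October 17, 2018 to Eastern time."""
    local_opening = datetime(2018, 10, 17, 0, 1, tzinfo=local_zone)
    return local_opening.astimezone(EDT)


print(opening_in_edt(NDT).isoformat())  # 2018-10-16T22:31:00-04:00
print(opening_in_edt(PDT).isoformat())  # 2018-10-17T03:01:00-04:00
```

From an Eastern point of view, the first and last store openings were therefore four and a half hours apart.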

Being part of launching a new retail industry and acting as a trusted partner with multiple licensed sellers while building infrastructure with regional data storage requirements, all on a strict deadline, was quite a challenge which required coordination across all Shopify departments. We learned a lot about what it takes to support regulated industries and restricted markets, knowledge which will help us support similar markets in the future, both in Canada and throughout the world. A number of the technologies and processes we developed during this project will continue to be improved and reused to support future deployments with similar requirements. Overall, it was incredibly rewarding to be part of a historic launch by contributing to and supporting the success of licensed recreational cannabis retailers throughout the country.

Intrigued? Shopify is hiring and we’d love to hear from you. Please take a look at the Production Engineer and Senior Technical Security Analyst roles available.

Attracting Local Talent And Building Mobile Apps: A Developer Hiring Initiative

Shopify is a commerce platform that serves over 600,000 merchants and employs over 3000 people across the globe. We’re always on the lookout for highly skilled individuals with diverse backgrounds to join our team, and that requires us to connect with them outside traditional recruitment channels. That’s why last year we ran the “Build Things, Show Shopify” initiative, inviting developers outside of Shopify to build an app and showcase their finished product to a multidisciplinary panel of Shopify employees as well as in front of hiring managers and VCs. The outcome? Not only did we build a local developer community in Ottawa, but we added a number of potential hires to our recruiting pipeline.

How Shopify Uses Recommender Systems to Empower Entrepreneurs

Authors: Dóra Jámbor and Chen Karako 

There is a good chance you have come across a “recommended for you” statement somewhere in our data-driven world. This may be while shopping on Amazon, hunting for new tracks on Spotify, looking to decide what restaurant to go to on Yelp, or browsing through your Facebook feed — ranking and recommender systems are an extremely important feature of our day-to-day interactions.

This is no different at Shopify, a cloud-based, multi-channel commerce platform that powers over 600,000 businesses of all sizes in approximately 175 countries. Our customers are merchants that use our platform to design, set up, and manage their stores across multiple sales channels, including web, mobile, social media, marketplaces, brick-and-mortar locations, and pop-up shops.

Shopify builds many different features in order to empower merchants throughout their entrepreneurial lifecycle. But with the diversity of merchant needs and the variety of features that Shopify provides, it can quickly become difficult for people to filter out what’s relevant to them. We use recommender systems to suggest personalized insights, actions, tools and resources to our merchants that can help their businesses succeed. Every choice a merchant makes has consequences for their business and having the right recommendation at the right time can make a big difference.

In this post, we’ll describe how we design and implement our recommender system platform.


Collaborative Filtering (CF) is a common technique to generate user recommendations for a set of items. For Shopify, users are merchants, and items are business insights, apps, themes, blog posts, and other resources and content that merchants can interact with. CF allows us to leverage past user-item interactions to predict the relevance of each item to a given user. This is based on the assumption that users with similar past behavior will show similar preferences for items in the future.

The first step of designing our recommender system is choosing the right representation for user preferences. One way to represent preferences is with user-item interactions, derived from implicit signals like the user’s past purchases, installations, clicks, views, and so on. For example, in the Shopify App Store, we could use 1 to indicate an app installation and 0 to represent an unobserved interaction with the given app.

User-item Interaction

These user-item interactions can be collected across all items, producing a user preference vector.

User Preference Vector

This user preference vector allows us to see the past behavior of a given user across a set of items. Our goal is now to predict the relevance of items that the user hasn’t yet interacted with, denoted by the red 0s. A simple way of achieving our goal is to treat this as a binary classification problem. That is, based on a user’s past item interactions, we want to estimate the probability that the user will find an item relevant.

User Preference (left) and Predicted Relevance (right)


We do this binary classification by learning the relationship between each item and all other items. We first create a training matrix of all user-item interactions by stacking users’ preference vectors. Each row in this matrix serves as an individual training example. Our goal is to reconstruct our training matrix in a way that predicts relevance for unobserved interactions.
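
As a minimal sketch of the stacking step, assume interactions arrive as a mapping from each user to the set of items they have interacted with. The function name and the sample merchants and apps are hypothetical and only serve as illustration.

```python
def build_training_matrix(interactions, users, items):
    """Stack one binary preference vector per user into a training matrix."""
    column = {item: j for j, item in enumerate(items)}
    matrix = []
    for user in users:
        row = [0] * len(items)                 # 0 = unobserved interaction
        for item in interactions.get(user, ()):
            row[column[item]] = 1              # 1 = observed interaction (e.g. an install)
        matrix.append(row)
    return matrix


# Hypothetical sample data: three merchants, three apps.
items = ["reviews", "seo", "shipping"]
users = ["merchant_a", "merchant_b", "merchant_c"]
interactions = {
    "merchant_a": {"reviews", "seo"},
    "merchant_b": {"shipping"},
}
training_matrix = build_training_matrix(interactions, users, items)
```

Each row is one user's preference vector; a user with no recorded interactions, such as `merchant_c` here, simply becomes a row of zeros.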

There are a variety of machine learning methods that can achieve this task including linear models such as Sparse Linear Methods (SLIM), linear method variations (e.g., LRec), autoencoders, and matrix factorization. Despite the differences in how these models recover item relevance, they can all be used to reconstruct the original training matrix.

At Shopify, we often use linear models because of the benefits they offer in real-world applications. For the remainder of this post, we’ll focus on these techniques.

Linear methods like LRec and its variations solve this optimization problem by directly learning an item-item similarity matrix. Each column in this item-item similarity matrix corresponds to an individual item’s model coefficients.

We put these pieces together in the figure below. On the left, we have all user-item interactions, our training matrix. In the middle, we have the learned item-item similarity matrix where each column corresponds to a single item. Finally, on the right, we have the predicted relevance scores. The animation illustrates our earlier discussion of the prediction process.

User-item Interactions (left), Item-item Similarity (middle), and Predicted Relevance (right)

To generate the final user recommendations, we take the items that the user has not yet interacted with and sort their predicted scores (in red). The top scored items are then the most relevant items for the user and can be shown as recommendations as seen below.

Personalized app recommendations on the Shopify App Store
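
The whole flow, from training matrix to ranked recommendations, can be sketched in a few lines. This is an illustration under stated assumptions rather than the model used at Shopify: it swaps in a plain L2-regularized logistic regression per item, trained by gradient ascent, in place of SLIM or LRec, and every function name, hyperparameter, and data value is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_item_model(matrix, target, steps=200, lr=0.5, l2=0.01):
    """Fit one logistic model predicting column `target` from all other columns."""
    n_users, n_items = len(matrix), len(matrix[0])
    weights = [0.0] * n_items                  # weights[target] stays 0 (no self-similarity)
    for _ in range(steps):
        grads = [0.0] * n_items
        for row in matrix:
            z = sum(weights[k] * row[k] for k in range(n_items) if k != target)
            error = row[target] - sigmoid(z)
            for k in range(n_items):
                if k != target:
                    grads[k] += error * row[k]
        for k in range(n_items):
            if k != target:
                weights[k] += lr * (grads[k] / n_users - l2 * weights[k])
    return weights


def fit_item_item(matrix):
    """Train one independent model per item; column j holds item j's coefficients."""
    n_items = len(matrix[0])
    columns = [train_item_model(matrix, j) for j in range(n_items)]
    return [[columns[j][i] for j in range(n_items)] for i in range(n_items)]


def predict(matrix, similarity):
    """Predicted relevance: user-item interactions times item-item similarity."""
    n_items = len(similarity)
    return [
        [sigmoid(sum(row[i] * similarity[i][j] for i in range(n_items)))
         for j in range(n_items)]
        for row in matrix
    ]


def recommend(row, scores, top_n=1):
    """Rank only the items the user has not interacted with yet."""
    unseen = [j for j, interacted in enumerate(row) if not interacted]
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:top_n]


# Hypothetical data: items 0 and 1 are often installed together, item 2 is not.
training_matrix = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
similarity = fit_item_item(training_matrix)
scores = predict(training_matrix, similarity)
top_for_last_user = recommend(training_matrix[3], scores[3])
```

Because each call to `train_item_model` depends only on the shared training matrix, the per-item models could be fitted in separate processes without any coordination.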

Linear methods and this simple binary framework are commonly used in industry as they offer a number of desired features to serve personalized content to users. The binary aspect of the input signals and classification allows us to maintain simplicity in scaling a recommender system to new domains, while also offering flexibility with our model choice.

Scalability and Parallelizability

As shown in the figure above, we train one model per item on all user-item interactions. While the training matrix is shared across all models, the models can be trained independently from one another. This allows us to run our model training in a task-parallel manner, which also reduces the wall-clock time of training. Additionally, as the number of users and items grows, this parallel treatment favors the scalability of our models.


When building recommender systems, it’s important that we can interpret a model and explain the recommendations. This is useful when developing, evaluating, and iterating on a model, but is also helpful when surfacing recommendations to users.

The item-item similarity matrix produced by the linear recommender provides a handy tool for interpretability. Each entry in this matrix corresponds to a model coefficient that reflects the learned relationship of two items. We can use this item-item similarity to derive which coefficients are responsible for a produced set of user recommendations.
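
One way to read such an explanation off the matrix, sketched here with a hand-written similarity matrix whose values are purely illustrative, is to list the user's past items ordered by their coefficient for the recommended item. The function name is hypothetical.

```python
def explain_recommendation(user_row, similarity, recommended, item_names):
    """Rank the user's past items by their coefficient for the recommended item."""
    contributions = [
        (item_names[i], similarity[i][recommended])
        for i, interacted in enumerate(user_row)
        if interacted and i != recommended
    ]
    return sorted(contributions, key=lambda pair: pair[1], reverse=True)


# Hypothetical coefficients: similarity[i][j] is the weight of item i
# in the model that predicts item j.
item_names = ["reviews", "seo", "shipping"]
similarity = [
    [0.0, 0.8, -0.2],
    [0.6, 0.0, 0.1],
    [-0.3, 0.2, 0.0],
]
user_row = [1, 0, 1]   # the user interacted with "reviews" and "shipping"
explanation = explain_recommendation(user_row, similarity, 1, item_names)
```

Here the recommendation of `seo` is attributed mostly to the user's earlier interaction with `reviews`, which has the largest coefficient.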

Coefficients are especially helpful for recommenders that include other user features, in addition to the user-item interactions. For example, we can include merchant industry as a user feature in the model. In this case, the coefficient for a given item-user feature allows us to share with the user how their industry shaped the recommendations they see. Showing personalized explanations with recommendations is a great way of establishing trust with users.

For example, merchants’ home feeds, shown below, contain personalized insights along with explanations for why those insights are relevant to them.

Shopify Home Feed: Showing Merchants how Their Business is Doing, Along With Personalized Insights


Beyond explanations, user features are also useful for enriching the model with additional user-specific signals such as shop industry, location, product types, target audience and so on. These can also help us tackle cold-start problems for new users or items, where we don’t yet have much item interaction data. For example, using a user feature enriched model, a new merchant who has not yet interacted with any apps could now also benefit from personalized content in the App Store.


A recommender system must yield high-quality results to be useful. Quality can be defined in various ways depending on the problem at hand. There are several recommender metrics to reflect different notions of quality like precision, diversity, novelty, and serendipity. Precision can be used to measure the relevance of recommended items. However, if we solely optimize for precision, we might appeal to the majority of our users by simply recommending the most popular items to everyone, but would fail to capture subtleties of individual user preferences.

For example, the Shopify Services Marketplace, shown below, allows merchants to hire third-party experts to help with various aspects of their business.

Shopify Services Marketplace, Where Merchants can Hire Third-party Experts

To maximize the chance of fruitful collaboration, we want to match merchants with experts who can help with their unique problems. On the other hand, we also want to ensure that our recommendations are diverse and fair to avoid scenarios in which a handful of experts get an overwhelming amount of merchant requests, preventing other experts from getting exposure. This is one example where precision alone isn’t enough to evaluate the quality of our recommender system. Instead, quality metrics need to be carefully selected in order to reflect the key business metric that we hope to optimize.
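
The tension between precision and exposure can be made concrete with two small metrics. The sketch below uses precision at k together with catalog coverage as a simple stand-in for diversity; the choice of coverage, the function names, and the expert names are assumptions made for illustration.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant to the user."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)


def catalog_coverage(recommendation_lists, catalog):
    """Fraction of the catalog that appears in at least one recommendation list."""
    shown = set()
    for items in recommendation_lists:
        shown.update(items)
    return len(shown & set(catalog)) / len(catalog)


# Hypothetical experts and merchant needs.
catalog = ["ana", "ben", "cy", "dee"]
relevant = [{"ana", "cy"}, {"ben"}, {"ana", "dee"}]

popular = [["ana", "ben"]] * 3                                   # same list for everyone
personalized = [["ana", "cy"], ["ben", "dee"], ["dee", "ana"]]

popular_precision = [precision_at_k(r, rel, 2) for r, rel in zip(popular, relevant)]
popular_coverage = catalog_coverage(popular, catalog)
personalized_coverage = catalog_coverage(personalized, catalog)
```

In this toy data the popularity-based lists reach a precision of 0.5 for every merchant while exposing only half of the experts, which is the kind of imbalance a precision-only evaluation would miss.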

While recommendations across various areas of Shopify optimize different quality metrics, they’re ultimately all built with the goal of helping our merchants get the most out of our platform. Therefore, when developing a recommender system, we have to identify the metric, or a proxy for that metric, that allows us to determine whether the system is aligned with this goal.


Having a simple and flexible base model reduces the effort needed for Shopify Data Science team members to extend into new domains of Shopify. Instead, we can spend more time deepening our understanding of the merchant problems we are solving, refining key model elements, and experimenting with ways to extend the capabilities of the base model.

Moreover, having a framework of binary input signals and classification allows us to easily experiment with different models that enrich our recommendations beyond the capabilities of the linear model we presented above.

We applied this approach to provide recommendations to our merchants in a variety of contexts across Shopify. When we initially launched our recommendations through A/B tests, we observed the following results:

  • Merchants receiving personalized app recommendations on the Shopify App Store had a 50% higher app install rate compared to those who didn’t receive recommendations.
  • Merchants with a personalized home feed were up to 12% more likely to report that the content of their feed was useful, compared to those whose feeds were ranked by a non-personalized algorithm.
  • Merchants who received personalized matches with experts in the Expert Marketplace had a higher response rate, and overall collaboration between merchants and third-party experts increased.
  • Merchants who received personalized theme recommendations on the Shopify Theme Store, seen below, were over 10% more likely to launch their online store, compared to those receiving non-personalized or no recommendations.

Shopify Theme Store: Where Merchants can Select Themes for Their Online Store

This post was originally published on Medium.

This post was edited on Feb 6, 2019

We’re always working on challenging new problems on the Shopify Data team. If you’re passionate about leveraging data to help entrepreneurs, check out our open positions in Data Science and Engineering.


iOS Application Testing Strategies at Shopify


At Shopify, we use a monorepo architecture where multiple app projects coexist in one Git repository. With hundreds of commits per week, the fast pace of evolution demands a commitment to testing at all levels of an app in order to quickly identify and fix regression bugs.

This article presents the ways we test the various components of an iOS application: Models, Views, ViewModels, View Controllers, and Flows. For brevity, we ignore the details of the Continuous Integration infrastructure where these tests are run, but you can learn more from the Building a Dynamic Mobile CI System blog post.

Testing Applications, Like Building a Car

Consider the process of building a reliable car. Base components like cylinders and pistons are individually tested to comply with design specifications (Model & View tests). Then these parts are assembled into an engine, which is also tested to ensure the components fit and function well together (View Controller tests). Finally, the major subsystems like the engine, transmission, and cooling systems are connected and the entire car is test-driven by a user (Flow tests).

The complexity and slowness of a test increase as we go from unit to manual tests, so it’s important to choose the right type and number of tests for each level of the component hierarchy. The image below shows the kind of tests we use for each type of app component; it reads bottom-up, so, for example, a Model is tested with Regular Unit Tests.

Types of Tests Used for App Components

Testing Models

A Model represents a business entity like a Customer, Order, or Cart. Because Models are the foundation of all other application constructs, it’s crucial to test that the properties and methods of a model conform to their business rules. The example below shows a unit test for the Customer model where we test the rule that, for a customer with multiple addresses, the billingAddress must be the first default address.

A Word on Extensions

Changing existing APIs in a large codebase is an expensive operation, so we often introduce new functionality as Extensions. For example, the function below enables two String Arrays to be merged without duplicates.

We follow a few conventions. Each test name follows a compact and descriptive format, test<Function><Goal>. Test steps are about 15 lines at most; otherwise, the test is broken down into separate cases. Overall, each test is very simple and requires minimal cognitive load to understand what it’s checking.

Testing Views

Developers aim to implement exactly what the designers intend under various circumstances and avoid introducing visual regression bugs. To achieve this, we use Snapshot Testing to record an image of a view; subsequent tests then compare that view with the recorded snapshot and fail if they differ.

For example, consider a UITableViewCell for Ping Pong players with the user’s name, country, and rank. What happens when the user has a very long name? Does the name wrap to a second line, truncate, or does it push the rank away? We can record our design decisions as snapshot tests so we are confident the view gracefully handles such edge cases.

UITableViewCell Snapshot Test

Testing View Models

A ViewModel represents the state of a View component and decouples business models from Views; it’s the state of the UI. So, it stores information like the default value of a slider or segmented control and the validation logic of a Customer creation form. The example below shows the CustomerEntryViewModel being tested to ensure its taxExempt property is false by default, and that its state validation function works correctly given an invalid phone number.

Testing View Controllers

The ViewController sits at the top of the component composition hierarchy. It brings together multiple Views and ViewModels in one cohesive page to accomplish a business use case. So, we check whether the overall view meets the design specification and whether components are disabled or hidden based on Model state. The example below shows a Customer Details ViewController where the Recent orders section is hidden if a customer has no orders and the ‘edit’ button is disabled if the device is offline. To achieve this, we use snapshot tests as follows.

Snapshot Testing the ViewController

Testing Workflows

A Workflow uses multiple ViewControllers to achieve a use case. It’s the highest level of functionality from the user’s perspective. Flow tests aim to answer specific user questions like: can I log in with valid credentials?, can I reset my password?, and can I check out items in my cart?

We use UI Automation Tests powered by the XCUITest framework to simulate a user performing actions like entering text and clicking buttons. These tests are used to ensure all user-facing features behave as expected. The process for developing them is as follows.

  1. Identify the core user-facing features of the app, that is, features without which users cannot productively use the app. For example, a user should be able to view their inventory by logging in with valid credentials, and a user should be able to add products to their shopping cart and check out.
  2. Decompose the feature into steps and note how each step can be automated: button clicks, view controller transitions, error and confirmation alerts. This process helps to identify bottlenecks in the workflow so they can be streamlined.
  3. Write code to automate the steps, then compose these steps to automate the feature test.

The example below shows a UI Test checking that only a user with valid credentials can log in to the app. The testLogin() function is the main entry point of the test. It sets up a fresh instance of the app by calling setUpForFreshInstall(), then it calls the login() function, which simulates user actions like entering the email and password and then clicking the login button.

Considering Accessibility

One useful side effect of writing UI Automation Tests is that they improve the accessibility of the app, and this is very important for visually impaired users. Unlike Unit Tests, UI Tests don’t assume knowledge of the internal structure of the app, so you select an element to manipulate by specifying its accessibility label or string. These labels are read aloud when users turn on iOS accessibility features on their devices. For more information about the use of accessibility labels in UI Tests, watch this Xcode UI Testing - Live Tutorial Session video.

Manual Testing

Although we aim to automate as many flow tests as possible, the tools available aren’t mature enough to completely exclude manual testing. Issues like animation glitches and rendering bugs are only discovered through manual testing…some would even argue that so long as applications are built for users, manual user testing is indispensable. However, we are becoming increasingly dependent on UI Automation tests to replace Manual tests.


Testing at all levels of the app gives us the confidence to release applications frequently. But each test also adds a maintenance liability. So, testing each part of an app with the right amount and type of test is important. Here are some tips to guide your decision.

  • The speed of executing a test decreases as you go from Unit to Manual tests.
  • The human effort required to execute and maintain a test increases from Unit tests to Manual tests.
  • An app has more subcomponents than major components.
  • Expect to write a lot more Unit tests for subcomponents and fewer, more targeted tests as you move up to UI Automation and Manual tests...a concept known as the Test Pyramid.

Finally, remember that tests are there to ensure your app complies with business requirements, but these requirements will change over time. So, developers must consistently remove tests for features that no longer exist, modify existing tests to comply with new business rules, and add new tests to maintain code coverage.

If you'd like to continue talking about application testing strategies, please find me on Medium at @u.zziah.

If you are passionate about iOS development and excellent user experience, the Shopify POS team is hiring a Lead iOS Developer! Have a look at the job posting.


The Unreasonable Effectiveness of Test Retries: An Android Monorepo Case Study


At Shopify, we don't have a QA team; we have a QA culture, which means we rely on automated testing to ensure the quality of our mobile apps. A reliable Continuous Integration (CI) system allows our developers to focus on building a high-quality system, knowing with confidence that if something is wrong, our test suite will catch it. To create this confidence, we have extensive test suites that include integration tests, unit tests, screenshot tests, instrumentation tests, and linting. But every large test suite has an enemy: flakiness.

A flaky test can exhibit both a passing and failing result with the same code and requires a resilient system that can recover from those failures. Tests can fail for different reasons that aren’t related to the test itself: network or infrastructure problems, bugs in the software that runs the tests, or even cosmic rays.

Last year, we moved our Android apps and libraries to a monorepo and increased the size of our Android team. This meant more people working in the same codebase and more tests executed when a commit merged to master (we only run the entire test suite on the master branch; for other branches, only the tests related to what has changed are run). It’s only logical that the pass rate of our test suites took a hit.

Let’s assume that every test we execute is independent of the others (events like network flakiness affect all tests, but we’re not taking that into account here) and passes 99.95% of the time. We execute pipelines that each contain 100 tests. Given the pass probability of a single test, we can estimate that a pipeline will pass 0.9995^100 ≈ 95% of the time. However, the entire test suite is made up of 20 pipelines with the same pass probability, so it will pass 0.95^20 ≈ 35% of the time.

This wasn’t good and we had to improve our CI pass rate.

Developers lose trust in the test infrastructure when CI is red most of the time due to test flakiness or infrastructure issues. They’ll start assuming that every failure is a false positive caused by flakiness. Once this happens, we’ve lost the battle and gaining that developer’s trust back is difficult. So, we decided to tackle this problem in the simplest way: retrying failures.

Retries are a simple, yet powerful mechanism to increase the pass rate of our test suite. When executing tests, we believe in a fail-fast system. The earlier we get feedback, the faster we can move and that’s our end goal. Using retries may sound counterintuitive, but almost always, a slightly slower build is preferable over a user having to manually retry a build because of a flaky failure.

When each test is retried once, CI fails because of a single test only if that test fails twice. Using the same assumptions as before, the chances of that happening are 0.05% · 0.05% = 0.000025% for each test. That translates to a 99.999975% pass rate for each test. Performing the same calculation as before, for each pipeline we would expect a pass rate of 0.99999975^100 ≈ 99.9975%, and for the entire CI suite, 0.999975^20 ≈ 99.95%. Simply by retrying failing tests, the theoretical pass rate of our full CI suite increases from 35% to 99.95%.
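
The arithmetic above is easy to reproduce. The helper below is a small sketch (the function name is hypothetical) that assumes, as the text does, that tests are independent and share the same pass rate. Because it does not round the pipeline rate to 95% before compounding, its no-retry figure comes out slightly higher than the 35% quoted above.

```python
def suite_pass_rate(test_pass_rate, tests_per_pipeline, pipelines, attempts=1):
    """Probability that every test passes within `attempts` tries, assuming independence."""
    test_failure = (1.0 - test_pass_rate) ** attempts   # a test fails only if all attempts fail
    return (1.0 - test_failure) ** (tests_per_pipeline * pipelines)


without_retry = suite_pass_rate(0.9995, 100, 20)               # no retries
with_one_retry = suite_pass_rate(0.9995, 100, 20, attempts=2)  # one retry per test

print(f"without retries: {without_retry:.2%}")
print(f"with one retry:  {with_one_retry:.2%}")
```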

In each of our builds, many different systems are involved and things can go wrong while setting up the test environment. Docker can fail to load the container, bundler can fail while installing some dependencies, and so can git fetch. All of those failures can be retried. We have identified some of them as retriable failures, which means they can be retried within the same job, so we don’t need to initialize the entire test environment again.

Some other failures aren’t as easy to retry in the same job because of their side effects. Those are known as fatal failures, and we need to reload the test environment altogether. This is slower than a retriable failure, but it’s definitely faster than waiting for the developer to retry the job manually, or spending time trying to figure out why a certain task failed only to realize that the solution was to retry.

Finally, we have test failures. As we have seen, a test can be flaky. Tests can fail for multiple reasons, and based on our data, screenshot tests are flakier than the rest. If we detect a failure in a test, that single test is retried up to three times.

The message displayed when a test fails and it’s retried.
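
A per-test retry loop of this kind can be sketched as follows. This is not the CI tooling described in the post: the function names and the result dictionary are hypothetical, and a "test" is modeled as a plain callable that raises AssertionError on failure. The `flaky` flag marks tests that passed only after a retry, so they can be reported instead of silently ignored.

```python
def run_with_retries(test, max_retries=3):
    """Run `test`; on failure, retry it up to `max_retries` more times."""
    failures = []
    for attempt in range(1, max_retries + 2):   # first run plus up to max_retries retries
        try:
            test()
        except AssertionError as error:
            failures.append(str(error))
            continue
        return {"passed": True, "attempts": attempt,
                "flaky": attempt > 1, "failures": failures}
    return {"passed": False, "attempts": max_retries + 1,
            "flaky": False, "failures": failures}


def make_flaky_test(failures_before_pass):
    """Build a simulated test that fails a fixed number of times, then passes."""
    state = {"calls": 0}

    def test():
        state["calls"] += 1
        assert state["calls"] > failures_before_pass, "simulated flaky failure"

    return test


flaky_result = run_with_retries(make_flaky_test(2))    # passes on the third attempt
broken_result = run_with_retries(make_flaky_test(10))  # never passes within the budget
```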

Retries in general, and test retries in particular, aren’t ideal. They work but make CI slower and can hide reliability issues. At the end of the day, we want our developers to have a reliable CI while encouraging them to fix test flakiness if possible. For this reason, we detect all the tests that pass after a retry and notify the developers so the problem doesn’t go unnoticed. We think that a test that passes on a second attempt shouldn’t be treated as a failure, but as a warning that something can be improved. To reduce the flakiness of builds, these are the tips we recommend besides retry mechanisms:

  • Don't depend on unreliable components in your builds. Try to identify the unreliable components of your system and don’t depend on them if possible. Unfortunately, most of the time this is not possible and we need those unreliable components.
  • Work on making the component more reliable. Try to understand why the component isn’t reliable enough for your use case. If that component is under your control, make changes to increase reliability.
  • Apply caching to invoke the unreliable component less often. We need to interact with external services for different reasons. A common case is to download dependencies. Instead of downloading them for every build, we can build a cache to reduce our interactions with this external service and therefore gain resiliency.

These tips are exactly what we did from an infrastructure point of view. When this project started, the pass rate in our Android app pipeline was 31%. After identifying and applying retry mechanisms to the sources of flakiness and adding some caching to the Gradle builds, we managed to increase it to almost 90%.

Pass rate plot from March to September

Something similar happened in our iOS repository. After improving our CI infrastructure, adding the previously discussed retry mechanisms and applying the tips to reduce flakiness, the pass rate grew from 67% to 97%.

It may sound counterintuitive, but thanks to retries we can move faster having slower builds.

We love to talk about mobile tooling. Feel free to reach out to us in the comments if you want to know more or share your solution to this problem.

Intrigued? Shopify is hiring and we’d love to hear from you. Please take a look at our open positions on the Engineering career page.


Preparing Shopify for Black Friday and Cyber Monday


Making commerce better for everyone is a challenge we face on a daily basis. For our Production Engineering team, it means ensuring that our 600,000+ merchants have a reliable and scalable platform to support their business needs. We need to be able to support everything our merchants throw at us, including the influx of holiday traffic during Black Friday and Cyber Monday (BFCM). All of this needs to happen without an interruption in service. We’re proud to say that the effort we put into deploying, scaling, and launching new projects on a daily basis gives our merchants access to a platform with 99.98% uptime.

Black Friday Cyber Monday 2018 by the numbers

To put the impact of this into perspective, Black Friday and Cyber Monday is what we refer to as our World Cup. Each year, our merchants push the boundaries of our platform to handle more traffic and more sales. This year alone, merchants made over $1.5 billion USD in sales throughout the holiday weekend.

What people may not realize is that Shopify is made up of many different internal services and interaction points with third-party providers, like payment gateways and shipping carriers. The performance and reliability of each of these dependencies can potentially affect our merchants and buyers in different ways. That’s why our Production Engineering team’s preparations for BFCM run the entire gamut.

To increase the chances of success on BFCM, Production Engineering run “game days” on our systems and their dependencies. Game days are a form of fault injection where we test our assumptions about the system by degrading its dependencies under controlled conditions. For example, we’ll introduce artificial latency into the code paths that interact with shipping providers to ensure that the system continues working and doing something reasonable. That could be, for instance, falling back to another third party or hard-coded defaults if a third-party dependency were to become slow for any reason, or verifying that a particular service responds as expected to a problem with its main datastore.
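
To illustrate the idea on a small scale, the sketch below wraps a stand-in shipping-rate lookup with artificial latency and checks that the caller falls back to hard-coded defaults when the lookup exceeds its time budget. Everything here is hypothetical (the function names, the rates, and the timeouts), and it says nothing about how Shopify's game days are actually implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

DEFAULT_RATES = {"standard": 9.99}   # hypothetical hard-coded fallback


def fetch_shipping_rates():
    """Stand-in for a call to a third-party shipping provider."""
    return {"standard": 7.50, "express": 19.00}


def with_injected_latency(func, delay_seconds):
    """Fault injection: wrap `func` so that every call is artificially slowed down."""
    def slowed(*args, **kwargs):
        time.sleep(delay_seconds)
        return func(*args, **kwargs)
    return slowed


def rates_with_fallback(fetch, timeout_seconds, pool):
    """Use live rates if they arrive in time, otherwise the hard-coded defaults."""
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_seconds), "live"
    except TimeoutError:
        return DEFAULT_RATES, "fallback"


with ThreadPoolExecutor(max_workers=2) as pool:
    healthy = rates_with_fallback(fetch_shipping_rates, 0.5, pool)
    degraded = rates_with_fallback(
        with_injected_latency(fetch_shipping_rates, 0.3), 0.05, pool
    )
```

The degraded call exercises the fallback path on purpose, which is the assumption a game day is meant to verify before real traffic does.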

Besides fault injection work, Production Engineering also run load testing exercises where volumes similar to what we expect during BFCM are created synthetically and sent to the different applications to ensure that the system and its components behave well under the onslaught of requests they’ll serve on BFCM.

At Shopify, we pride ourselves on continuous and fast deploys to deliver features and fixes as fast as we can; however, the rate of change on a system increases the probability of issues that can affect our users. During the ramp-up period for BFCM, we manage the normal cadence of the company by establishing both a feature freeze and a code freeze. The feature freeze starts several weeks before BFCM and means no meaningful changes to user-facing features are deployed, to avoid changing merchants’ workflows. At that point in the year, changes, even improvements, can have an unacceptable learning curve for merchants who are diligently getting ready for the big event.

A few days before BFCM and during the event, an actual code freeze is in effect, meaning that only critical fixes can be deployed and everything else must remain in stasis. The idea is to reduce the possibility of introducing bugs and unexpected system interactions that could cause the service to be compromised during the peak days of the holiday season.

Did all of our preparations work out? With BFCM in the rearview mirror, we can say, yes. This BFCM weekend was a record breaker for Shopify. We saw nearly 11,000 orders created per minute and around 100,000 requests per second being served for extended periods during the weekend. All in all, most system metrics followed a pattern of 1.8 times what they were in 2017.

The somewhat unsurprising conclusion is that running towards the risk by injecting faults, load testing, and role-playing possible disaster scenarios pays off. Also, reliability goes beyond your “own” system: most complex platforms these days have to deal with third parties to provide the best service possible. We have learned to trust our partners but also understand that any system can have downtime and that, in the end, Shopify is responsible to our merchants and buyers.


Bug Bounty Year in Review 2018


With 2018 coming to a close, we thought it a good opportunity to once again reflect on our Bug Bounty program. At Shopify, our bounty program complements our security strategy and allows us to leverage a community of thousands of researchers who help secure our platform and create a better Shopify user experience. This was the fifth year we operated a bug bounty program, the third on HackerOne and our most successful to date (you can read about last year’s results here). We reduced our time to triage by days, got hackers paid quicker, worked with HackerOne to host the most innovative live hacking event to date and continued contributing disclosed reports for the bug bounty community to learn from.

Our Triage Process

In 2017, our average time to triage was four days. In 2018, we shaved that down to 10 hours, despite largely receiving the same volume of reports. This reduction was driven by our core program commitment to speed. With 14 members on the Application Security team, we're able to dedicate one team member a week to HackerOne triage.

When someone is the dedicated “triager” for the week at Shopify, that becomes their primary responsibility with other projects becoming secondary. Their job is to ensure we quickly review and respond to reports during regular business hours. However, having a dedicated triager doesn't preclude others from watching the queue and picking up a report.

When we receive reports that aren't N/A or Spam, we validate before triaging and open an issue internally since we pay $500 when reports are triaged on HackerOne. We self-assign reports on the HackerOne platform so other team members know the report is being worked on. The actual validation process we use depends on the severity of the issue:

  • Critical: We replicate the behavior and confirm the vulnerability, page the on-call team responsible and triage the report on HackerOne. This means the on-call team will be notified immediately of the bug and Shopify works to address it as soon as possible.
  • High: We replicate the behavior and ping the development team responsible. This is less intrusive than paging but still a priority. Collaboratively, we review the code for the issue to confirm it's new and triage the report on HackerOne.
  • Medium and Low: We’ll either replicate the behavior and review the code, or just review the code, to confirm the issue. Next, we review open issues and pull requests to ensure the bug isn't a known issue. If there are clear security implications, we'll open an issue internally and triage the report on HackerOne. If the security implications aren't clear, we'll err on the side of caution and discuss with the responsible team to get their input about whether we should triage the report on HackerOne.

This approach allows us to quickly act on reports and mitigate critical and high impact reports within hours. Medium and Low reports can take a little longer, especially where the security implications aren't clear. Development teams are responsible for prioritizing fixes for Medium and Low reports within their existing workloads, though we occasionally check in and help out.


Shopify x HackerOne H1-514
H1-514 in Montreal

In October, we hosted H1-514, our second live hacking event and the first held in our office in Montreal, Quebec. We welcomed over 40 hackers to our office to test our systems. To build on our program's core principles of responsiveness, transparency and timely payouts, we wanted to do things differently than other HackerOne live hacking events. As such, we worked with HackerOne to do a few firsts for live hacking events:

  • While other events opened submissions the morning of the event, we opened submissions when the target was announced to be able to pay hackers as soon as the event started and avoid a flood of reports
  • We disclosed resolved reports to participants during the event to spark creativity instead of leaving this to the end of the event when hacking was finished
  • We used innovative bonuses to reward creative thinking and hard work from hackers testing systems that are very important to Shopify (e.g. GraphQL, race conditions, oldest bug, regression bonuses, etc.) instead of awarding more money for the number of bugs people found
  • We gave hackers shell access to our infrastructure and asked them to report any bugs they found. While none were reported at the event, the experience and feedback informed a continued Shopify infrastructure bounty program and the Kubernetes product security team's exploration of their own bounty program.

When we signed on to host H1-514, we weren't sure what value we'd get in return since we run an open bounty program with competitive bounties. However, the hackers didn't disappoint and we received over 50 valid vulnerability reports, a few of which were critical. Reflecting on this, the success can be attributed to a few factors:

  • We ship code all the time. Our platform is constantly evolving so there's always something new to test; it's just a matter of knowing how to incentivize the effort for hackers (You can check the Product Updates and Shopify News blogs if you want to see our latest updates).
  • There were new public disclosures affecting software we use. For example, Tavis Ormandy's disclosure of Ghostscript remote code execution in ImageMagick was used in a report during the event by hacker Frans Rosen.
  • Using bonuses to incentivize hackers to explore the more complex and challenging areas of the bounty program. Bonuses included GraphQL bugs, race conditions and the oldest bug, to name a few.
  • Accepting submissions early allowed us to keep hackers focused on eligible vulnerability types and avoid them spending time on bugs that wouldn't be rewarded. This helped us manage expectations throughout the two weeks, keep hackers engaged and make sure everyone was using their time effectively.
  • We increased our scope. We wanted to see what hackers could do if we added all of our properties into the scope of the bounty program and whether they'd flock to new applications looking for easier-to-find bugs. However, despite the expanded scope, we still received a good number of reports targeting mature applications from our public program.

H1-514 in Montreal. Photo courtesy of HackerOne

Stats (as of Dec 6, 2018)

2018 was the most successful year to date for our bounty program. Not including the stats from H1-514, we saw our average bounty increase again, this time to $1,790 from $1,100 in 2017. The total amount paid to hackers was also up $90,200 compared to the previous year, to $155,750 with 60% of all resolved reports having received a bounty. We also went from one five-figure bounty awarded in 2017, to five in 2018 marked by the spikes in the following graph.

Bounty Payouts by Date

As mentioned, the team committed to quick communication, recognizing how important it is to our hackers. We pride ourselves on all of our timing metrics being among the best in the category on HackerOne. While our initial response time slipped by 5 hours to 9 hours, our triage time was reduced by over 3 days to 10 hours (it was 4 days in 2017). Both our time to bounty and resolution times also dropped, time to bounty to 30 days and resolution to 19 days, down from about a month.

Response Time by Date

Report Submitted by Date

In 2018 we received 1,010 reports. 58.7% were closed as not applicable, compared to 63.1% in 2017. This was accompanied by an increase of almost one percentage point in the share of resolved reports, to 11.3% from 10.5% in 2017. The drop in not applicable and rise in informatives (reports which contain useful information but don't warrant immediate action) is likely the result of the team's commitment to only close bugs as not applicable when the issue reported is in our tables of known issues and ineligible vulnerability types or lacks evidence of a vulnerability.

Types of Bugs Closed

We also disclosed 24 bugs on our program, one less than the previous year, but we tried to maintain our commitment to requesting disclosure for every bug resolved in our program. We continue to believe it’s extremely important that we build a resource library to enable ethical hackers to grow in our program. We strongly encourage other companies to do the same.

Despite a very successful 2018, we know there are still areas to improve upon to remain competitive. Our total number of resolved reports was down again, 113 compared to 121 despite having added new properties and functionality to our program. We resolved reports from only 62 hackers compared to 71 in 2017. Lastly, we continue to have some low severity reports remain in a triaged state well beyond our target of 1-month resolution. The implications of this are mitigated for hackers since we changed our policy earlier in the year to pay the first $500 of a bounty immediately. Since low severity reports are unlikely to receive an additional bounty, most low-severity reports are paid entirely up-front. HackerOne also made platform changes to award the hackers their reputation when we triage reports versus when we resolve them, as was previously the case.

We're planning new changes, experiments and adding new properties in 2019 so make sure to watch our program for updates.

Happy hacking!

If you're interested in helping to make commerce more secure, visit Shopify on HackerOne to start hacking or our career page to check out our open Trust and Security positions.

How an Intern Released 3 Terabytes Worth of Storage Before BFCM

Hi there! I’m Gurpreet and I’m currently finishing up my second internship at Shopify. I was part of the Products team during both of my internships. The team is responsible for building and maintaining the products area of Shopify admin. As a developer, every day is another opportunity to learn something new. Although I worked on many tasks during my internship, today I will be talking about one particular problem I solved.

The Problem

As part of the Black Friday Cyber Monday (BFCM) preparations, we wanted to make sure our database was resilient enough to smoothly handle increased traffic during flash sales. After completing an analysis of our top SQL queries, we realized that the database was scanning a large number of fixed-size storage units, called InnoDB pages, just to return a single record. We identified the records, historically kept for reporting purposes, that caused this excess scanning. After talking with different teams and making sure that these records were safe to delete, the team decided to write a background job to delete them.

So how did we accomplish this task which could have potentially taken our database down, resulting in downtime for our merchants?

The Background Job

I built the Rails background job using existing libraries that Shopify built to avoid overloading the database while performing different operations, including deletion. A naive way to perform deletions is to send either a batch delete query or one delete query per record. It’s not easy to interrupt MySQL operations, and the naive approach would easily overload the database with thousands of operations. The job-iteration library allows background jobs to run in iterations, and it’s one of the Shopify libraries I leveraged to overcome the issue. The job runs in small chunks and can be paused between iterations to let other higher priority jobs run first or to perform certain checks.

There are two parts of the job: the enumerator and the iterator. The enumerator fetches records in batches and passes one batch to the iterator at a time. The iterator then fetches the records in the given batch and deletes them. While this made sure that we weren’t deleting a large number of records in a single SQL query, we still needed to make sure we weren’t deleting the batches too fast. Deleting batches too fast results in high replication lag and can affect the availability of the database. Thankfully, we have an existing internal throttling enumerator, which I also leveraged when writing the job.

After each iteration, the throttling enumerator checks if we’re starting to overload the database. If so, it automatically pauses the job until the database is back in a healthy state. We ensured our fetch queries used proper indexes and the enumerator used a proper cursor for batches to avoid timeouts. A cursor can be thought of as flagging the last record in the previous batch. This allows fetching records for the next batch by using the flagged record as the pivot. It avoids having to re-fetch previous records and includes only the new ones in the current batch.
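To make the enumerator, iterator, cursor, and throttle ideas concrete, here is a minimal sketch in Python using SQLite. It is not the Rails job or the job-iteration API: the records table, the obsolete column, and the is_overloaded callback are all hypothetical names chosen for illustration, and the health check is something you would have to supply yourself.

```python
import sqlite3
import time


def delete_in_batches(conn, batch_size=1000, is_overloaded=lambda: False,
                      pause_seconds=1.0):
    """Delete obsolete rows in small batches, tracking progress with a cursor."""
    cursor_id = 0  # primary key of the last row in the previous batch
    deleted = 0
    while True:
        # Enumerator: find where the next batch ends, starting after the cursor.
        batch = conn.execute(
            "SELECT id FROM records WHERE id > ? AND obsolete = 1 "
            "ORDER BY id LIMIT ?",
            (cursor_id, batch_size),
        ).fetchall()
        if not batch:
            return deleted
        last_id = batch[-1][0]
        # Iterator: delete only the rows that fall inside this batch.
        result = conn.execute(
            "DELETE FROM records WHERE id > ? AND id <= ? AND obsolete = 1",
            (cursor_id, last_id),
        )
        conn.commit()
        deleted += result.rowcount
        cursor_id = last_id  # the next batch starts after this row
        # Throttle: wait between batches while the database looks unhealthy.
        while is_overloaded():
            time.sleep(pause_seconds)
```

Because every query filters on id > cursor_id, each batch can use the primary key index and never rescans rows that earlier batches already handled.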

The Aftermath

We ran the background job approximately two weeks before BFCM. It was a big deal because not only did it free up three terabytes of storage and result in large cost savings, it also made our database more resilient to flash sales.

For example, after the deletion, as seen in the chart below, our database was scanning around 3x fewer pages in order to return a single record. Since the database was reading fewer pages to return a single record, it could serve an increased number of requests during flash sales without getting overloaded by unnecessary page scans. This also meant that we were making sure our merchants got the best BFCM experience with minimal technical issues during flash sales.

Database Scanning After Deletion

Truth be told, I was very nervous watching the background job run because if anything went wrong, that meant downtime for the merchants, which is the last thing we want and man, what a horrible intern experience. At the peak, we were deleting approximately six million records a minute. The Shopify libraries I leveraged helped to make deleting over 🔥5 billion records🔥 look like a piece of cake 🎂.

5 billion Records Deleted

What I Learned

I learned so much from this project. I got vital experience with open source projects when using Shopify’s job-iteration library. I also did independent research to better understand MySQL indexes and how cursors work. For example, I didn’t know about partial indexes and how they worked. MySQL will pick a subset of prefix keys, based on the longest prefix match with predicates in the WHERE clause, to be used by the partial index to evaluate the query. Suppose we have an index on (A,B,C). A query with predicates (A,C) in the WHERE clause will only use the key A from the index, but a query with predicates (A,B) in the WHERE clause will use the keys A and B. I also learned how to use SQL EXPLAIN to analyze SQL queries. It shows exactly which indexes the database considered using, which index it ended up using, how many pages were scanned, and a lot of other useful information. Apart from improving my technical skills, working on this project made me realize the importance of collecting as much context as one can before even attempting to solve the problem. My mentor helped me with cross-team communication. Overall, context gathering allowed me to identify any possible complications ahead of time and make sure the background job ran smoothly.
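The (A,B,C) example above can be modelled in a few lines of Python. This is a deliberately simplified sketch that assumes equality predicates only; the real MySQL optimizer weighs many more factors, and the function name is made up for illustration.

```python
def usable_index_prefix(index_columns, predicate_columns):
    """Return the leading index columns that the WHERE predicates can use.

    Walks the index from left to right and stops at the first column
    that has no predicate, which is the longest-prefix rule described above.
    """
    usable = []
    for column in index_columns:
        if column not in predicate_columns:
            break
        usable.append(column)
    return usable


index = ["A", "B", "C"]
print(usable_index_prefix(index, {"A", "C"}))  # only A is usable
print(usable_index_prefix(index, {"A", "B"}))  # A and B are usable
```

A query with a predicate on B alone would get an empty prefix in this model, which is why column order matters when designing a composite index.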

Can you see yourself as one of our interns? Applications for the Summer 2019 term will be available from January 7, 2019. The deadline for applications is Monday, January 21, 2019, at 9:00 AM EST!

Director of Engineering, Lawrence Mandel Talks Road to Leadership, Growth, and Finding Balance.

Lawrence Mandel is a Director of Production Engineering leading Shopify’s Developer Acceleration team and has been at Shopify for over a year. He previously worked at IBM and Mozilla where he started as a software developer before transitioning into leadership roles. Through all his work experience, he’s learned to understand the meaning of time management and to prioritize the most important things in his life, which are his family, health, and work.  

Developer Talks: How the Command Line Can Empower You (Webinar)

On Tuesday, 27 November 2018, Eric Fung, Senior Data Scientist, presented "How the Command Line Can Empower You."

You can watch this presentation on and download the speaker notes at

Developer Talks: How the Command Line Can Empower You - November 27, 2018. 1-2pm EST

As a developer, you probably use a modern IDE that lets you write, debug, test, and deploy your code quickly and easily. However, your job often includes activities performed outside your IDE, such as working with APIs, creating screenshots, or massaging data. Eric wants to show you how the command-line can simplify, improve, or even automate some of these tasks.

Eric provides an overview of utilities that give you more ways to get your work done and get it done faster. Using real-world examples, you'll learn how to type less in the terminal, search your files with ease, manipulate images and JSON files, write code automatically, and more. All without needing to point, click, or swipe!

If you are curious about command-line tools or want to learn more about their impressive capabilities, this talk is for you. This presentation focuses on software available for macOS computers, but Linux and Windows users can benefit, as many of the tools mentioned are cross-platform.

This presentation is a 45-minute talk with 15 minutes dedicated to Q&A.

Couldn't make the presentation? Here is a link to view the presentation

About Eric Fung

Since 2010, Eric has worked on many mobile apps and games and spent five years at Shopify as an Android developer before recently transitioning to a data scientist role. At the beginning of his career, he spent a lot of time in Linux and the command-line. Eric caught the public speaking bug a few years ago and is an organizer of GDG Toronto Android. In addition to coding, he enjoys making and eating elaborate desserts.

Handling Addresses from All Around the World

Four months ago, I joined the International Growth team at Shopify. The mission of the INTL team (as we call it) is to help Shopify conquer international markets. Our team builds tools and services and enhances Shopify’s platform to make it scale to different markets where we need to tailor the experience locally to a country: add new shipping patterns, support new payment paradigms, and be compliant with local laws.

As a senior web developer, the first problem I tackled was to make sure addresses were formatted correctly for everyone, everywhere. Addresses are a core part of our merchants’ businesses, crucial when delivering products and dealing with customers. At the same time, they are also a crucial part of a customer's journey. Entering an address in a form seems obvious, but there are essential details that you need to get right when going international, details that might not seem obvious if you haven't thought about them or have never lived abroad.

I’m going to take you through some of the problems the team encountered when dealing with addresses and how we solved some of those problems.

The Problem with Addresses


Let’s start with a simple definition. At Shopify, we describe an address with the following fields:

  • First name
  • Last name
  • Address line 1
  • Address line 2
  • Zone code
  • Postal code
  • City
  • Country code
  • Phone

Zones are administrative divisions by country (see Wikipedia’s article): they are states in the US, provinces in Canada, etc. Some of these fields may be optional or required depending on the country.


When looking at the fields listed above, I’m assuming that for some readers, the order of the fields makes sense. Well, that’s not the case for most people in the world. For example:

  • In Japan, people start their address by entering their postal code. Postal codes are very precise, so with just seven digits, a whole address can be auto-completed. The last name comes first; otherwise, it’s considered rude.
  • In France, the postal code comes before the city, while in Canada it’s the opposite.

As you can imagine, the list goes on and on. None of these details can be overlooked if we want a properly localized experience for customers connecting from everywhere in the world. At the same time, creating one version of the form for every country leads to unnecessary code duplication, something to avoid if the code is to scale and remain maintainable.


Let's talk about wording. What is address1? What is zone? Parts of an address aren’t the same around the world, so how should we name form labels when building them? The tough part of these differences, from a developer’s perspective, is that we had variations per country as well as variations per locale. For example:

  • Zone can refer to "states", "provinces", "regions" or even "departments" in certain countries (such as Peru) 
  • Postal code can be called "ZIP code" or "postcode" or even "postal code"
  • address2 might refer to "apartment number", "unit number" or "suite"
  • In Japan, when displaying an address, the symbol 〒 is prepended to the postal code so, if a user enters 153-0062, it displays as 〒153-0062


Translation is the most obvious problem: form labels need translation, but so do country and zone names. Canada is written the same way in most languages, but it’s カナダ in Japanese or كندا in Arabic. Canada is bilingual, so province labels are language specific: British Columbia in English becomes Colombie-Britannique in French, etc.

Our Solution (So Far)

We’re at the beginning of our journey to go international. Solutions we come up with are never finished; we iterate and evolve as we learn more. That being said, here’s what we're doing so far.

A Database for Countries

The one thing we needed was a database storing all the data representing every country. Thankfully, we already built it at the beginning of our Internationalization journey (phew!) and had every country represented with YAML files in a GitHub repository. The database stored every country’s basic information such as country code, name, currency code, and a list of zones, where applicable.


The same way we have formats to represent dates, we created formats to describe addresses per country. With our database for countries, we can store these formats for every country.

Form Representation

What is the order we want to show input fields when presenting an address form? We came up with the following format to make it easier for reuse:

  • {fieldName}: Name of the field
  • _: line break

Here’s an example with Canada and Japan:


Form Representation Japan

Form Representation Canada

Now, with a format for every country, we dynamically reorder the fields of an address form based on the selected country. When the page loads, we already know which country the shop is located in and where the user is connecting from, so we can prepopulate the form with the country and show the fields in the right order. If the user changes the country, we also reorder the form on the fly. And since we store the data on provinces, we can also prepopulate the zone dropdown on the fly.
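As a rough illustration of how such a format string could drive field ordering, here is a small Python sketch. The two format strings are made up for this example (they are not Shopify's actual form formats), and the parser assumes field names never contain an underscore, since "_" is the line-break marker.

```python
import re

FIELD_PATTERN = re.compile(r"\{(\w+)\}")

# Hypothetical form formats, for illustration only.
FORM_FORMATS = {
    "CA": "{firstName}{lastName}_{address1}_{address2}_{city}_{country}{province}{zip}_{phone}",
    "JP": "{lastName}{firstName}_{zip}_{country}_{province}{city}_{address1}_{address2}_{phone}",
}


def parse_form_format(fmt):
    """Turn a format string into rows of field names, one row per form line."""
    return [FIELD_PATTERN.findall(row) for row in fmt.split("_")]


def form_rows(country_code):
    """Look up the format for a country and return the ordered form rows."""
    return parse_form_format(FORM_FORMATS[country_code])
```

With this shape, switching the selected country only means calling form_rows again and re-rendering the same input components in the new order.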

Display Representation

We’ve used the same representation to show an address as above and the only difference here is that extra characters used to represent an address for different locales are displayed. Here’s another example with Japan and Canada:

{country}_〒{zip}{province}{city}{address1}{address2}_{company}_{lastName} {firstName}様_{phone}
{firstName} {lastName}_{company}_{address1} {address2}_{city} {province} {zip}_{country}_{phone}

The thing to note here is that for Japan, we add characters such as 〒 to indicate that what follows is a postal code or we add 様 (“sama”) after the first name which is the formal and respectful form of Miss/Mr/Mrs. And for other countries, we can add commas if necessary and account for spaces.
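Here is a minimal Python sketch of how the two display formats above could be turned into lines of text. The rendering rules (dropping empty lines and collapsing extra spaces) are assumptions made for this example, and the sample addresses are invented.

```python
import re

# The two display formats quoted above (Japan first, then Canada).
JAPAN = "{country}_〒{zip}{province}{city}{address1}{address2}_{company}_{lastName} {firstName}様_{phone}"
CANADA = "{firstName} {lastName}_{company}_{address1} {address2}_{city} {province} {zip}_{country}_{phone}"

FIELD_PATTERN = re.compile(r"\{(\w+)\}")


def render_address(fmt, address):
    """Fill a display format with address values and return the display lines."""
    lines = []
    for row in fmt.split("_"):  # "_" marks a line break
        text = FIELD_PATTERN.sub(lambda m: address.get(m.group(1), ""), row)
        text = " ".join(text.split())  # collapse gaps left by missing fields
        if text:
            lines.append(text)
    return lines


canadian = {
    "firstName": "Jane", "lastName": "Doe", "address1": "150 Elgin St",
    "city": "Ottawa", "province": "ON", "zip": "K2P 1L4", "country": "Canada",
}
for line in render_address(CANADA, canadian):
    print(line)
```

One limitation of this sketch: literal characters such as 〒 or 様 stay in the output even when the field next to them is empty, so a production version would need a rule for dropping them.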

Labels and Translations

The other problem to resolve was the name of the labels we use to display address data. Remember, the label for postal code can be different in different countries. To solve this, we created a list of keys for certain fields. Our implementation approach is to make changes incrementally instead of taking on the enormous task (it would probably take forever!) of having our address forms work for all countries from the get-go. Based on our most popular countries, we came up with specific label keys that we translate in our front end.

So, as in our previous example, zones are Provinces in Canada and in Japan they’re Prefectures. So in our YAML file for Canada, we’ve added zone_key: province and in Japan’s we’ve added zone_key: prefecture. We translate these keys in our front end. We’ve applied this same logic to other countries and fields when needed. For example, we have zip_key: postcode for certain countries and zip_key: pincode for others. We include default values for all our label keys since we don’t have a value for all countries yet.
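The label-key lookup with defaults can be sketched as follows in Python. Only zone_key: province (Canada) and zone_key: prefecture (Japan) come from the text above; the default key names, the countries assigned to postcode and pincode, and the translation table are assumptions for illustration.

```python
# Fallback keys used when a country file does not define one (assumed names).
DEFAULT_LABEL_KEYS = {"zone_key": "region", "zip_key": "postal_code"}

# Per-country overrides, mirroring what the YAML files would hold.
COUNTRY_LABEL_KEYS = {
    "CA": {"zone_key": "province"},
    "JP": {"zone_key": "prefecture"},
    "GB": {"zip_key": "postcode"},  # assumed country for this key
    "IN": {"zip_key": "pincode"},   # assumed country for this key
}

# Front-end translations, keyed by locale and then by label key (sample data).
TRANSLATIONS = {
    "en": {"province": "Province", "prefecture": "Prefecture",
           "region": "Region", "postal_code": "Postal code",
           "postcode": "Postcode", "pincode": "PIN code"},
    "fr": {"province": "Province", "postal_code": "Code postal"},
}


def label_key(country_code, field):
    """Return the label key for a field, falling back to the default."""
    overrides = COUNTRY_LABEL_KEYS.get(country_code, {})
    return overrides.get(field, DEFAULT_LABEL_KEYS[field])


def label_for(country_code, field, locale="en"):
    """Translate the label key, falling back to the raw key if untranslated."""
    key = label_key(country_code, field)
    return TRANSLATIONS.get(locale, {}).get(key, key)
```

Keeping the key lookup separate from the translation step means a new country only needs a YAML entry, while a new language only needs new translation strings.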

Screenshot of the checkout in Japanese and English



As mentioned earlier, country names and province names need translation, so we store most of them per language. We translate country names in all of our supported locales, but we only translate zones when necessary, based on the usage and the locale. For example, Canada has translations for French and English for now, so by default the provinces will be rendered in English unless your locale is fr. We’ll evolve our translations over time.

API endpoint

Shopify is an ecosystem where many apps live. To ensure our data is up to date everywhere at the same time, we created an API endpoint to access it. This way, our iOS, Android and front-end applications will be in sync when introducing new formats for new countries. There's no need to update the information everywhere since every app will be using the endpoint. Another advantage of this approach is that in the future we might realize that some formatting isn't only country related but also locale related, e.g. firstName and lastName are reversed when the locale is Japanese no matter if the address is in Japan or Canada. Since the endpoint receives the locale for each request, this problem will be transparent to the client.

Creating Abstraction / Libraries

To make the life of developers easier, we’ve created abstraction libraries. Yes, we want to localize our apps, but we also want to keep our developers happy. And asking them to query a graph endpoint and parse the formats we came up with is… maybe a bit much. So we’ve created abstractions to overcome this:

  • Other non-public components built on top of @shopify/address such as an AddressForm and an Address to add another easy abstraction for developers which displays the address form as easily as doing:

The Future

This is the current state of how we’re solving these problems. There are drawbacks that we’re tackling, such as the fact that we need to fetch information to render an address. For instance, we’re implementing a caching solution to avoid a network call every time we want to render an address or an address form. But this will evolve as we gain more context and knowledge, and as we grow our tooling around going international.

Intrigued by Internationalization? Shopify is hiring and we’d love to hear from you. Please take a look at our open positions on the Engineering career page.

Running Apache Kafka on Kubernetes at Shopify

In the Beginning, There Was the Data Center

Shopify is a leading multi-channel commerce platform that powers over 600,000 businesses in approximately 175 countries. We first adopted Apache Kafka as our data bus for reliable messaging in 2014 and mainly used it for collecting events and log aggregation across the systems.

In that first year, our primary focus was building trust in the platform with our data analysts and developers by automating all aspects of cluster management, creating the proper in-house tooling needed for our daily operations, and helping them use it with minimum friction. Initially, our deployment was a single regional Kafka cluster in each of our data centers and one aggregate cluster for our data warehouse. The regional clusters would mirror their data to the aggregate using Kafka’s MirrorMaker.

Apache Kafka deployment in the data center

Fast forward to 2016, and we’re managing many multi-tenant clusters in all our regions. These clusters are the backbone of our data superhighway — delivering billions of messages every day to our data warehouse and other application-specific Kafka consumers. Chef provisioned, configured and managed our Kafka infrastructure in the data center. We deploy a configuration change to all clusters at once by updating one or more files in our Chef GitHub repository.

Moving to the Cloud

In 2017, Shopify started moving some services from our data centers to the cloud. We took on the task of migrating our Kafka infrastructure to the cloud with zero downtime. Our target was to achieve reliable cluster deployment with predictable and scalable performance and do all this without sacrificing ease of use and security. Migration was a three-step process:

  1. Deploy one regional Kafka cluster in each cloud region we use, and deploy an aggregate Kafka cluster in one of the regions.
  2. Mirror all regional clusters in the data center and in the cloud to both aggregate clusters in the data center and in the cloud. This guarantees both aggregate clusters will have the same data.
  3. Move Kafka clients (both producers and consumers) from the data center clusters and configure them to point to the cloud clusters.

Apache Kafka deployment during our move to the cloud

By the time we migrated all clients to the cloud clusters, the regional clusters in the data center had zero incoming traffic and we could safely shut them down. That was followed by a safe shutdown of the aggregate Kafka cluster in the data center as no more clients were reading from it.

Virtual-Machines or Kubernetes?

We compared running Kafka brokers in Google Cloud Platform (GCP) as Virtual Machines (VMs) vs. running them in containers managed by Kubernetes, and we decided to use Kubernetes for the following reasons.

The first option, using GCP VMs, is closer in concept to how we managed physical machines in the data center. There, we have full control of the individual servers, but we also need to write our own tooling to monitor and manage the state of the cluster as a whole, and to execute deployments in a way that doesn't impact Kafka availability. For example, we can’t perform a configuration change and restart all Kafka brokers at once, because this results in a service outage.

Kubernetes, on the other hand, offers abstract constructs to manage a set of containers together as a stateless or stateful cluster. Kubernetes manages a set of Pods. Each Pod is a set of functionally related containers deployed together on a server called a Node. To manage a stateful set of nodes like a Kafka cluster, we used Kubernetes StatefulSets to control deployment and scaling of containers with an ordered and graceful deployment of changes, including guarantees to prevent compromising the overall service availability. And to implement our own custom behavior that’s not provided by Kubernetes, we extended it using Custom Resources and Controllers, an extension of the Kubernetes API to create user-defined resources and implement actions when these resources are updated.

This is an example of a Kubernetes StatefulSet template used to configure a Kafka cluster of 30 nodes:

Kubernetes StatefulSet template

Containerizing Kafka

Running Kafka in a Docker container is straightforward. The simplest setup is to store the Kafka server configuration in a Kubernetes ConfigMap and mount the configuration file in the container by referencing the proper configMap key. But pulling a third-party Kafka image is risky: depending on an image from an external registry risks application failure if the image is changed or removed! We highly recommend hosting your own container registry and building your own Kafka image. In a critical software environment where you want to minimize sources of failure, it’s more reliable to build the image yourself and host it in your own registry, giving you more control over its content and availability.

Best Practices

Our Kafka Pods contain the Kafka container itself and another resource-monitoring container. Kafka isn’t friendly with frequent server restarts because restarting a Kafka broker or container means terabytes of data shuffling around the cluster. Restarting many brokers at the same time risks having offline partitions and consequently data loss. These are some of the best practices we learned and implemented to tune the cluster availability:

  • Node Affinity and Taints: Node affinity schedules Kafka containers on nodes with the required specifications. Taints guarantee that other applications can’t use nodes required for Kafka containers.
  • Inter-pod Affinity and Anti-Affinity: Prevents the Kubernetes scheduler from scheduling two Kafka containers on the same node.
  • Persistent Volumes: Provide persistent storage for Kafka Pods and guarantee that a Pod always mounts the same disk volume when it restarts.
  • Kubernetes Custom Resources: Extend Kubernetes functionality; we use them to automate and manage Kafka topic provisioning, cluster discovery, and SSL certificate distribution.
  • Kafka broker rack-awareness: Reduces the impact of a single Kubernetes zone failure by mapping Kafka containers to multiple Kubernetes zones.
  • Readiness Probes: Control how fast we roll configuration changes to cluster nodes.

We successfully migrated all our Kafka clusters to the cloud. We run multiple regional Kafka clusters and an aggregate one to mirror all other clusters before feeding its data into our data warehouse. Today, we stream billions of events daily across all clusters — these events are key to our developers, data analysts, and data scientists to build a world-class, data-driven commerce platform.

If you are excited about working on similar systems, join our Production Engineering team at Shopify: Careers at Shopify



Building Shopify POS for Android Using MVVM


There are many architectures out there to structure your app. The one we use in Shopify’s Point of Sale (POS) for Android app is the Model-View-ViewModel (MVVM) pattern based on Google’s App Architecture Guide which was announced last year at Google I/O 2017.

Shopify’s Point of Sale (POS) for Android app
Shopify POS


Our POS app is three and a half years old, and we didn’t build it using MVVM from scratch. Before the move to MVVM, we had two competing architectures in our codebase: Model View Controller (MVC) and Model View Presenter (MVP). Both did the job, but they created inconsistency within the codebase. The developers on the team had difficulty switching between the two options, and we didn’t have good answers for questions about which architecture to use when developing new screens and features. The primary advantages of adopting MVVM are a consistent architecture, automatic retention of state across configuration changes, and a clearer separation of concerns that leads to easier testing. MVVM also helps new members of the team get up to speed during onboarding, as they can now find consistent functional examples throughout the codebase and consult the official Android documentation, which the team uses as a blueprint. Google is actively maintaining the Android Architecture Components, so we get peace of mind knowing that we’ll continue to reap the benefits as this library is improved.

With a significant amount of code using legacy MVC and MVP architectures, we knew we couldn’t make the switch all at once. Instead, the team committed to writing all new screens using MVVM and converting older screens when making significant changes. Though we still have a few screens using MVC and MVP, there isn’t confusion anymore because everyone now knows there is one standard and how to incorporate it into our existing and future codebase.


I’ll explain the basic idea and flow of this architecture by describing the following components of MVVM.

Flows in a Model-View-ViewModel Architecture

View: View provides an interface for the user to interact with the app. In Shopify’s POS app, Fragment holds the View and the View holds different sub-views which handle all the user interface (UI) interactions. When the user acts on the UI (for example, a button click or text change), View tells ViewModel about the action via an interface callback. All of our MVVM setups use interfaces/contracts to interact with one another. We never hold references to the actual instance. For example, View won’t keep a reference to the actual ViewModel object, but instead to an instance of the contract object (I’ll describe it below in the example). Another task for View is to listen to LiveData changes posted by the ViewModel and then update its UI with the new data content it receives from LiveData.

ViewModel: ViewModel is responsible for fetching data and providing the updated data back to the UI. The ViewModel gets notified of UI actions via events generated by View, for example, onButtonPressed(). Based on a particular action, it fetches the data state, mutates it as per the business logic, and tells View about the new data changes by posting them to LiveData. The ViewModel instance survives configuration changes, such as screen rotations, so a re-created Activity or Fragment instance reconnects to the existing ViewModel instance, and the data held by the ViewModel object remains available to it. ViewModel dies when the associated Activity dies, or the Fragment is detached.

ViewModelProvider: This is the class responsible for providing ViewModel to the UI component and retaining that ViewModel instance while the scope of the given Activity or Fragment is alive.

Model: Model components represent the data sources (e.g., the persistent model, web service, and cache). They’re responsible for handling the data for the app. For example, if our app needs to get a list of users, it would fetch it from a local database, if available. Otherwise, it would fetch the data from the network and save it in the database for later use.

LiveData: LiveData is an observable class that acts as a container for holding data. View subscribes to LiveData objects to get notified of any data updates. LiveData respects the lifecycle states of the app components, and it only passes the updates about data when the Fragment is in the active state, i.e., only the active observers get the updates.

Let me run through a simple example to demonstrate the flow of MVVM architecture:

1. The user interacts with the View by pressing the Add Product button.

2. View tells ViewModel that a UI action happened by calling the onAddProductPressed() method of ViewModel.

3. ViewModel fetches related data from the DB, mutates it as per the business logic and then posts the new data to LiveData.

4. The View, which subscribed earlier to changes in LiveData, now gets the updated data and asks its sub-views to update their UI with the new data.
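To make the four steps concrete, here is a minimal sketch written in TypeScript rather than the Kotlin or Java an Android app would use. SimpleLiveData is a simplified stand-in for Android’s LiveData with no lifecycle awareness, and ProductListContract, ProductSource, ProductListViewModel, and ProductListView are hypothetical names invented for illustration, not code from the POS app.

```typescript
type Observer<T> = (value: T) => void;

// Simplified stand-in for LiveData: holds a value and notifies observers.
class SimpleLiveData<T> {
  private value: T;
  private observers: Observer<T>[] = [];

  constructor(initial: T) {
    this.value = initial;
  }

  observe(observer: Observer<T>): void {
    this.observers.push(observer);
    observer(this.value); // deliver the current value on subscription
  }

  postValue(next: T): void {
    this.value = next;
    this.observers.forEach((observer) => observer(next));
  }
}

// The contract the View depends on, instead of the concrete ViewModel.
interface ProductListContract {
  readonly products: SimpleLiveData<string[]>;
  onAddProductPressed(): void;
}

// Hypothetical Model: where new products come from (database, network, ...).
interface ProductSource {
  nextProduct(): string;
}

class ProductListViewModel implements ProductListContract {
  readonly products = new SimpleLiveData<string[]>([]);
  private current: string[] = [];
  private source: ProductSource;

  constructor(source: ProductSource) {
    this.source = source;
  }

  // Steps 2 and 3: react to the UI action, mutate state, post it to LiveData.
  onAddProductPressed(): void {
    this.current = [...this.current, this.source.nextProduct()];
    this.products.postValue(this.current);
  }
}

class ProductListView {
  rendered: string[] = [];
  private viewModel: ProductListContract;

  constructor(viewModel: ProductListContract) {
    this.viewModel = viewModel;
    // Step 4: the View re-renders whenever LiveData posts new data.
    viewModel.products.observe((items) => {
      this.rendered = items;
    });
  }

  // Step 1: the user presses the Add Product button.
  pressAddProduct(): void {
    this.viewModel.onAddProductPressed();
  }
}
```

Because the View only knows the contract, a test can drive ProductListViewModel directly or hand the View a fake implementation of ProductListContract.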

Benefits of Using MVVM Architecture

Since Shopify moved to MVVM, we’ve taken advantage of the benefits this architecture has to offer. MVVM offers separation of concerns. View is only responsible for UI-related logic like displaying UI data and reacting to user actions. ViewModel handles data preparation and mutation tasks. Using contracts between the View and ViewModel provides a strong separation of concerns and well-defined responsibilities. Driving the UI from a ViewModel makes our data survive configuration changes, i.e., our data state is retained due to ViewModel caching.

Testing the business logic and UI interactions is easier and more efficient with MVVM. Because of the strong separation of concerns, we can test the business logic and the different view states of the app independently. We can perform screenshot testing on the View to check the UI since it has no business logic, and similarly, we can unit test the ViewModel without having to create Fragments and Views. You can read more about it in this article about creating verifiable Android apps on Shopify Mobile’s Medium page.

LiveData takes care of complex Android lifecycle issues that happen when the user navigates through, out of, and back to the application. When updating the UI, LiveData only sends the update when the app is in an active state. When the app is in an inactive state, it doesn’t send any updates, thus saving the app from crashes or other lifecycle issues.

Finally, keeping UI code and business logic separate makes the codebase easier to modify and manage for developers as we follow a consistent architecture pattern throughout the app.

Intrigued? Shopify is hiring and we’d love to hear from you. Please take a look at our open positions on the Engineering career page.


Creating Locale-aware Number and Currency Condensing


It’s easy to transform a long English number into an abbreviated one. Two thousand turns into 2K, 1,000,000 becomes 1M and 10,000,000,000 is 10B. But when multiple languages are involved, condensing numbers stops being so straightforward.

I discovered that hard truth earlier this year as Shopify went multilingual, allowing our 600,000+ merchants to use Shopify admin in six additional languages (French, German, Japanese, Italian, Brazilian Portuguese, and Spanish). 

My team is responsible for the front-end web development of Shopify Home and Analytics within the admin, which merchants see when they’re logged in. Shopify Home and Analytics are the windows into every merchant's customers and sales. One of the internationalization challenges we faced was condensing numbers worldwide for graphs displaying essential information, including sales, visits and customer data. Without shortening numbers, many merchants would see long numbers taking up too much space on a graph’s axis, throwing off the design of Shopify’s Admin.

Without condense-number

With condense-number

Team member Andy Mockler and I wrapped up most of the project in June, over Shopify’s quarterly Hack Days, which allow Shopifolk to take a two-day break from regular work to hack uninterrupted on a project of their choice. We realized that Hack Days presented the ideal opportunity to deliver this functionality and make it available to other developers in Shopify working on their internationalization goals.

Initially, we looked around to see if there was an existing JavaScript solution that worked for us. (Spoiler alert: there wasn’t.) There’s a built-in JavaScript Intl API for language-sensitive formatting, but a proposal to add number condensing isn’t implemented. We found a couple of existing libraries that do a range of international formatting, but they either did more than we needed or were incompatible with our stack.

Ideally, we wanted to be able to take a number, like 3,000, and display an abbreviated version according to the audience’s locale. While 3,000 becomes 3K in English, it’s 3 mil in Portuguese, for example. Another consideration was different counting systems; India uses lakhs (1,00,000) and crores (1,00,00,000) instead of some Western increments like millions.

Through our research ahead of Hack Days, we stumbled across a treasure trove of international formatting data: the Unicode Common Locale Data Repository (CLDR). Unicode describes CLDR as the “largest and most extensive standard repository of locale data available.” It’s used by companies including Apple, Google, IBM, and Microsoft. It contains information about how to format dates, times, timezones, numbers, currencies, places and time periods. Most importantly, for Andy and me, it contained almost all the information we needed about abbreviating numbers. Once we combined that data with currency information from Intl.js, we were able to write a small set of functions to condense both numbers and currencies, according to locale.
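As a rough illustration of the idea, and not the condense-number implementation, here is a minimal TypeScript sketch of table-driven condensing. The ABBREVIATIONS table is hand-written, hypothetical data that covers only the examples mentioned in this post; real CLDR data has many more locales, magnitudes, and plural forms. The names Abbreviation, ABBREVIATIONS, and condenseNumber are assumptions made for this example.

```typescript
interface Abbreviation {
  threshold: number; // smallest value this rule applies to
  suffix: string;    // text appended after the scaled number
}

// Hypothetical, simplified data, ordered from largest to smallest threshold.
const ABBREVIATIONS: Record<string, Abbreviation[]> = {
  en: [
    { threshold: 1e9, suffix: "B" },
    { threshold: 1e6, suffix: "M" },
    { threshold: 1e3, suffix: "K" },
  ],
  // Only the thousands rule from the post; larger magnitudes are omitted.
  pt: [{ threshold: 1e3, suffix: " mil" }],
};

function condenseNumber(value: number, locale: string): string {
  const rules = ABBREVIATIONS[locale] ?? [];
  // Pick the largest magnitude that the value reaches.
  const rule = rules.find((candidate) => Math.abs(value) >= candidate.threshold);
  if (!rule) {
    return String(value); // nothing to condense, or unknown locale
  }
  // Keep at most one decimal place.
  const scaled = Math.round((value / rule.threshold) * 10) / 10;
  return `${scaled}${rule.suffix}`;
}
```

With this table, condenseNumber(3000, "en") gives 3K and condenseNumber(3000, "pt") gives 3 mil. Keeping the locale rules in data rather than in code is what lets a data source like CLDR drive the behavior.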

Andy has more experience with open source packages than I do and he quickly realized our code would be useful to other developers. Since our solution could help across Shopify and beyond, we decided to open it up for others to use. In July 2018, we released our package, condense-number, on npm. If you have any international number formatting needs, we’d love for you to give it a try. If we’re missing a language or feature you’d like us to support, file an issue in the condense-number repository.


Building a Data Table Component in React


I’m a front-end developer at Shopify, the leading commerce platform for over 600,000 merchants across the globe. I started in web development when the industry used tables for layout (nearly 20 years ago) and have learned my way through different web frameworks and platforms as web technology evolved. I now work on Polaris, Shopify’s design system that contains design guidelines, content guidelines, and a React component library that Shopify developers use to build the main platform and third-party app developers use to create apps on the App store.

When I started learning React, its main advantage (especially for the component library of a design system) was obvious: everything in React is a component and intended to be reused. React props make it possible to choose which component attributes and behaviors to expose and which to hard-code. So, the design system can standardize design while making customization easier.

But when it came to manipulating the DOM in React, I admit I initially felt frustrated because my background was heavy in jQuery. It’s easy to target an element in jQuery using a selector, pull a value from that element using a baked-in method, and then use another method to apply that value. My initial opinion was that React over-engineered DOM manipulation until I understood the bigger picture.

As developers, we tend to read more code than we write and I’ve inherited my fair share of legacy code. I’ve wasted many hours searching through jQuery files for that elusive piece of code that’s creating that darn animation I need to change. jQuery event listeners are often in different files than the files containing the markup of the elements they’re targeting, making it all too easy to hide the source of animations or style changes.

However, a React component controls its behavior, so you can predict exactly what it’s meant to do. There are no surprises because there is no indirection. It’s also easier to tear down event listeners in React, resulting in better performance.

The first component I worked on with the Polaris team was the data table component, and it helped me realize what makes React such a powerful library. React’s component approach made it easy to create a stateful data table component and a stateless functional cell subcomponent. Its built-in lifecycle methods also provided more control over when to re-render the data table's cell heights.

Here are the basic steps we took to build the Polaris data table component in React.

The Challenge

Building a good data table is a common design challenge most of us have had to solve at least once. By nature, a table has an inflexible grid shape with a nearly infinite potential to grow both vertically and horizontally, but it still needs to be flexible to work well on all screen sizes and orientations. The data table needs to fulfill a few requirements at once: it must be responsive, readable, contextual, and accessible.

Must Be Responsive

For a data table to fit all screen sizes and orientations, it needs to accommodate the potential for several columns of data that surpass the horizontal edges of the screen. Typically, responsive designs either stack or collapse elements at narrow widths, but these solutions break the grid structure of a data table, so it requires a different design solution.

Responsive Design Stacking

Responsive Design Collapsing

Must Be Readable

A typical use case for a data table is presenting product data to a merchant who wants to see which of their products earned the most income. The purpose of the data table is to organize the information in a way that makes it easy for the merchant, in Shopify’s case, to compare and analyze, so proper alignment is important. A flexible data table solution can account for long strings of data without creating misalignment or compromising readability.

Must Be Contextual

A well-designed data table provides context around the information, so the user isn’t confused by seemingly random cell values. This means keeping headings visible at all times so that whichever data a user is seeing, it still has meaning.

Must Be Accessible

Finally, to accommodate users with screen readers, a data table needs to have proper semantic markup and attributes.

Building a Data Table

Here’s how to create a stripped down version of the data table we built for Polaris using React (note: This post requires polaris-react. Polaris uses TypeScript, and for this example, I’ve used JavaScript). I’ve left out some features like a totals row, a footer, and sortable columns for the sake of simplicity.

Start With a Basic React Data Table

First, create a basic data table component that receives an array of headings and an array of rows as props. Map over these two arrays to extract the cell content, then break <Cell /> out into its own subcomponent and pass the content to it.

Basic Data Table Component

You can see the first problem in the image. With this many columns, the width of the table exceeds the screen width and scrolls the entire document horizontally, which isn’t ideal.

Basic Data Table Component Scrolling

One way to handle a wide table is to collapse the columns and make them expandable, but this solution only works with a limited number of columns. Beyond a certain number, the collapsed width of each column still exceeds the total screen width, especially in portrait orientation. The columns are also awkward to expand and collapse, which is a poor experience for users. To solve this, restrict the width of the table.

Making it Responsive: Add Max-width

Wrap the entire table in a div element with max-width: 100vw and give the table itself width: 100%.

Unfortunately, this doesn’t work properly at very narrow screen widths when the cell content contains long words. The longest word forces the cell width to expand and pushes the table width beyond the screen’s right edge.

Basic Data Table Component - Max Width

Sure, you can solve this with word-break: break-all, but that violates the design requirements to keep the data readable.


Basic Data Table Component - word-break: break-all

So, the next thing to do is force only the table to scroll instead of the entire document.

Making it Responsive and Readable: Create a Scroll Container

Wrap the table in a div element with overflow-x: auto to cause a scrolling behavior for the overflow content.

Scroll all the way right to the last column, and you see the next problem. The data is difficult to understand without the context of the first column, which holds the product names in this example.

Basic Data Table Component - Missing First Column Context

With several rows of data to compare, it’s difficult to remember which row corresponds to which product, and repeatedly scrolling left and right is a terrible experience for the user. As a solution, we chose to keep the first column visible at all times by fixing it in place and preventing it from scrolling along with the other columns.

Adding Context: Create a Fixed First Column

Give each cell in the first column an explicit width, then position them with position: absolute and left: 0. Then add margin-left: 145px to the remaining columns’ cells (the value must be equal to the width of the first column cells).

Add className="Cell-fixed" to the first cell of each row. The component maps through each row (and not each column) so, for simplicity, we pass a boolean prop called fixed to the cell component. It’s set to true if the current item is first in the array being mapped over. The cell component then adds the class name Cell-fixed to the cell it renders if fixed is true.

Basic Data Table Component - Fixed Column

Using an absolute position on each cell gives us a fixed first column, but creates another problem.

Basic Data Table Component - Fixed Column Issue

Typically, the DOM renders each cell height to match the height of the tallest cell in the same row, but this behavior breaks when the cells are positioned absolutely, so cell heights need to be adjusted manually.

Fixing a Bug: Adjust Cell Heights

Create a state variable called cellHeights.

Set a ref on the table element that calls a function called setTable.

Then write a function called getTallestCellHeights() that targets the table ref and creates an array of all of its <tr> elements, using getElementsByTagName.

Absolute positioning converts the fixed column to a block and breaks the natural behavior of the table, so the cell heights no longer adjust according to the height of the other cells in their row. To fix this, pull the clientHeight value from both the fixed cell and the remaining cells for each row in the array. Write a function that uses Math.max to find the highest number (the tallest height) of each cell in each row and return an array of those values.

Create a function called handleCellHeightResize() that calls getTallestCellHeights() to set the state of heights from the returned array.

The table needs to render first for the DOM to have clientHeight values to fetch, so place the call to handleCellHeightResize() in the componentDidMount() lifecycle method and re-render the component.

When mapping over the headings and rows arrays, use the same index to target the correct value in the heights array to retrieve a height value for each <Cell /> and pass it as a height prop. Because the heights array contains all heights and there are two separate calls to <Cell /> (one for headings and one for the table body), you need to increment the row index by 1 in renderRow() to skip the value for the headings cells.
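The height calculation and the index offset described in the steps above can be sketched as two pure functions. This is an illustration in TypeScript, not the Polaris source: it assumes the clientHeight values have already been read from the DOM into plain numbers, and RowHeights, tallestHeights, and bodyRowHeight are hypothetical names.

```typescript
// Measured heights for one <tr>, in pixels.
interface RowHeights {
  fixedCell: number;    // clientHeight of the absolutely positioned first cell
  otherCells: number[]; // clientHeight of the remaining cells in the row
}

// Returns the tallest height per row, in row order (headings row first).
function tallestHeights(rows: RowHeights[]): number[] {
  return rows.map((row) => Math.max(row.fixedCell, ...row.otherCells));
}

// Index 0 of the heights array belongs to the headings row, so body rows
// are offset by one.
function bodyRowHeight(heights: number[], bodyRowIndex: number): number | undefined {
  return heights[bodyRowIndex + 1];
}
```

Keeping the arithmetic separate from the DOM reads makes it easy to check the row and heading offsets without rendering a table.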

We’re close now, and there’s one final bug to solve. handleCellHeightResize() is called once after the component mounts and is never called again unless the page is refreshed. This means the height values for each cell remain the same even if the window is resized.


Set up an event listener and call the function any time the window is resized, so the cell heights readjust. In this example, I’ve used the event listener component already in Polaris.

Making it Accessible

Two important additions make a data table accessible. Add a caption that a screen reader will read and a scope attribute on the header cells. For more details, the a11y project has an article about how to create accessible data tables.

A Responsive, Accessible Data Table Component

And there you have it, a responsive, accessible data table component in React that can be used to compare and analyze a data set. Check out the full version of the Polaris React data table component in the Polaris component library that includes a full set of features like truncation, a totals row, sortable columns, and navigation.

We're always on the lookout for talent and we’d love to hear from you. Visit our Engineering career page to find out about our open positions.


Lost in Translations: Bringing the World to Shopify


At Shopify, the leading multi-channel commerce platform that powers over 600,000 businesses in approximately 175 countries, we aim to make commerce better for everyone, everywhere. Since Shopify’s early days, it’s been possible to provide customers with a localized, translated experience, but merchants had to understand English to use the platform. Fortunately, things have started to change. For the past few months, my team and I have focused on international expansion, which brings new shipping patterns, new payment paradigms, compliance with local laws, and much more to explore. However, the biggest challenge is preparing the platform for our translation efforts.

I speak French. Growing up, I learned that things have genders. A pencil is masculine, but a feather is feminine. A wall is a he, but a chair is a she. Even countries have genders, too — le Canada, but la France. It’s a construct of the language native speakers learned to deal with. It’s so natural, one can usually guess the gender of unknown or new things without even knowing what they are.

Did you know that in English, zero dictates the plural form? We’d say zero cars, car being plural. But in French, zero is always singular, as in, zéro voiture. Singular, no s. Why? I don’t know, but each language has its quirks. Sometimes they’re obvious, like genders, and sometimes more subtle, like a special pluralization rule.
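The zero example can be checked with JavaScript’s standard Intl.PluralRules API, which reports the plural category a locale uses for a given count. This is a small TypeScript illustration rather than anything from our Rails codebase. The carForms table and the describeCars function are hypothetical, and the sketch assumes a runtime that ships full locale data.

```typescript
// Hypothetical dictionary: one string per plural category and locale.
const carForms: Record<string, Record<string, string>> = {
  en: { one: "car", other: "cars" },
  fr: { one: "voiture", other: "voitures" },
};

function describeCars(locale: string, count: number): string {
  const forms = carForms[locale];
  if (!forms) {
    return String(count); // no dictionary for this locale
  }
  // Ask the locale which plural category this count falls into.
  const category = new Intl.PluralRules(locale).select(count);
  return `${count} ${forms[category] ?? forms.other}`;
}
```

Here describeCars("en", 0) uses the plural form while describeCars("fr", 0) uses the singular, which is why a plural can’t be hardcoded as “add an s when the count isn’t 1”.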

Shopify employs hundreds of developers working on millions of lines of code. For the past twelve years, we collectively hardcoded thousands and thousands of English strings scattered across all our products oblivious to our future international growth. It would be great if we could simply replace words from one language with another, but unfortunately, differences like gender and pluralizations force us to rethink established patterns.

We had to educate ourselves, build new tools, and refactor entire parts of our codebase. We made mistakes, tried different things, and failed many times. But now, six months after we started, Shopify is available in a variety of languages. What you’ll find below is a small collection of thoughts and patterns that have helped us succeed.

Stop the Bleeding

The first step, like with any significant refactoring effort, is to stop the bleeding. At Shopify, we deploy hundreds of hardcoded English words daily. If we were to translate everything that exists today, we’d have to do it again tomorrow and again the day after because we’re always deploying new hardcoded words. As brilliantly explained by my colleague Simon Hørup Eskildsen, it’s unrealistic to think you can align everyone with an email or to fix everything with a single pull request.

Fortunately, Shopify relies on automated tooling (cops, linters, and tests) to communicate best practices and correct violations. It’s the perfect medium to tell developers about new patterns and guide them with contextual insights as they learn about new practices. We built cops and linters to detect a variety of violations:

  • Hardcoded strings in HTML files
  • Hardcoded strings in specific method arguments
  • Hardcoded date and time formats

How we built the cops and linters could be a post on its own, but the concept is what matters here: we knew a pattern to avoid, so we built tools to inform and correct. These tools gave developers a strong feedback loop, prevented the addition of new violations, and gave an estimate of the size of the task we had in front of us.
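To give a flavor of the concept, here is a deliberately naive sketch of a check for hardcoded strings in HTML. It’s written in TypeScript for illustration and is not how our cops and linters are implemented. It assumes, hypothetically, that translated content appears in templates as a {{ ... }} expression, and findHardcodedStrings is an invented name.

```typescript
// Returns the text nodes that look like hardcoded, untranslated content.
function findHardcodedStrings(html: string): string[] {
  const textBetweenTags = />([^<>]+)</g;
  const violations: string[] = [];
  let match: RegExpExecArray | null;

  while ((match = textBetweenTags.exec(html)) !== null) {
    const text = match[1].trim();
    // Assumption: {{ ... }} marks a translation lookup, so it's allowed.
    const isTranslationCall = /^\{\{.*\}\}$/.test(text);
    // Ignore whitespace, numbers, and punctuation-only nodes.
    const hasLetters = /[A-Za-z]/.test(text);
    if (hasLetters && !isTranslationCall) {
      violations.push(text);
    }
  }
  return violations;
}
```

A real linter would parse the template instead of using a regular expression, but even a rough check like this can run in continuous integration and give an estimate of how much content remains to be extracted.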

Automate the Mundane

Shopify has, relatively speaking, quite a big codebase. Due to our cops and linters, we build all new features with translation in mind. However, all the hard-coded content that existed before our intervention still had to be extracted and moved to dictionaries. So we did what any engineer would do; we built tools.

Linters made identifying violations easy. We ran them against every single file of our application and found a significant number of items in need of translation. After identification, we opted for the simplest approach: create a file named after the current module, move the actual content in there, and reference it through a key created from a combination of the file path and the content itself. Slowly but surely, all the content was moved to dictionaries. The results weren’t perfect. There was duplicated content and the reference names weren’t always intuitive, but despite this, we extracted most of the basic and mundane stuff, like static content and documentation. What was left were edge cases like complex interpolations, or, as I like to call them, fun challenges.

Pseudolocalization to the Rescue

Distinguishing the extracted content from everything else immediately became a challenging issue. Yes, some sentences were now in dictionaries, but the product looked exactly the same as before. We needed to distinguish between hardcoded and extracted content, all while keeping the product in a usable state so that translators, content writers, and product managers could stay informed about our progress. Enter pseudolocalization.

Pseudolocalization (or pseudo-localization, or pseudo-translation) is a software testing method used for examining internationalization aspects of software. Instead of translating the text of the software into a foreign language, as in the process of localization, an altered version of the original language replaces the textual elements of an application.

We created a new backend built on top of Rails I18n, the default Rails framework for internationalization, that hijacked all translation calls and swapped resulting characters with an altered yet similar alternative: a became α, b became ḅ, and so on.

Word lengths differ from one language to another. German words are, on average, 30% longer than their English equivalents and can seriously mess up a UI built without this knowledge. In French, a simple word like “Save” translates to “Sauvegarder”, which is almost 200% longer. Because our pseudotranslation module intercepted all translation calls, we took the opportunity to double all vowels to mimic languages with longer words. The result remained remarkably readable: we could easily distinguish extracted content from hardcoded content and visually test the UI against longer words.
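The transformation itself is simple enough to sketch in a few lines. The following TypeScript is an illustration of the technique, not our Rails I18n backend: the LOOKALIKES table only contains the two substitutions mentioned above, and pseudolocalize is a hypothetical name.

```typescript
// Lookalike replacements; a real table would cover the whole alphabet.
const LOOKALIKES: Record<string, string> = {
  a: "\u03B1", // α
  b: "\u1E05", // ḅ
};

function pseudolocalize(text: string): string {
  // Double every vowel to mimic languages with longer words.
  const lengthened = text.replace(/[aeiou]/gi, (vowel) => vowel + vowel);
  // Swap lowercase letters for lookalikes where the table has one.
  return lengthened.replace(/[a-z]/g, (letter) => LOOKALIKES[letter] ?? letter);
}
```

With this table, Save becomes Sααvee: still readable, visibly different from hardcoded text, and two characters longer. A production version would also need to leave interpolation placeholders untouched.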

Pseudotranslation in Action on Shopify

ASCII is Dead, Long Live UTF8

Character sets also proved to be a fun challenge. Shopify runs on MySQL. Unfortunately, MySQL’s default utf8 isn’t really UTF-8. It only stores up to three bytes per code point, which means no support for hentaigana, emoji, and other characters outside of the Basic Multilingual Plane. Unless explicitly told otherwise, most of our tables didn’t support emoji and thus needed migration.
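The three-byte limit follows directly from how UTF-8 encodes code points, which a short sketch can show. This TypeScript is only an illustration, and utf8ByteLength and needsFourByteStorage are hypothetical helper names. In MySQL, the character set that stores four-byte sequences is utf8mb4.

```typescript
// Number of bytes UTF-8 uses to encode a single code point.
function utf8ByteLength(codePoint: number): number {
  if (codePoint < 0x80) return 1;    // ASCII
  if (codePoint < 0x800) return 2;   // e.g. accented Latin letters
  if (codePoint < 0x10000) return 3; // rest of the Basic Multilingual Plane
  return 4;                          // supplementary planes, e.g. most emoji
}

// True if the text contains a character that a three-byte column can't store.
function needsFourByteStorage(text: string): boolean {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(text).some(
    (character) => utf8ByteLength(character.codePointAt(0) ?? 0) === 4
  );
}
```

A check like this can be used to scan fixtures or incoming data for characters that would be rejected or mangled by a three-byte column.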

On the application side, Rails isn’t perfect either. Popular methods such as parameterize and ordinalize don’t come with international support built-in.

Identifying and fixing all of these broken behaviors wasn’t an easy task, and we’re still finding occurrences here and there. There is no secret sauce or real generic approach. Some bugs were fixed right away, others were simply deprecated, and some were only rolled out to new customers.

If anything, one trick to try is to introduce UTF8 characters in your fixtures and other data seeds. The more exposed you are to other character sets, the more likely you are to stumble on broken behavior.

Translation Platform

Preparing content for translation is one thing, but getting it actually translated is another. Now that everything was in dictionaries, we had to find a way for developers and product managers to request new translations and to talk to translators in a lean, simple, and automated way.

Managing translations isn’t part of our core expertise and other companies do this more elegantly than we ever could. Translators and other linguists rely on specialized tools that empower them with glossaries, memories, automated suggestions, and so on.

So, on one side of this equation, we have GitHub and our developers, and on the other are translators and their translation management system. Could GitHub’s API, coupled with our translation management system API, help bridge the gap between developers and translators? We bet that it could.

Leveraging APIs from both sides, we built an internal tool called “Translation Platform”. It’s a simple and efficient way for developers and translators to collaborate in a streamlined and automated manner. The concept is quite simple: each repository defines its own configuration file, which indicates where to find the language files, what the source language is, and what the target languages are. A basic example would look as follows:

Once the configuration file is in place, the Translation Platform starts listening to GitHub’s webhooks and automatically detects if a change impacts one of the repository’s language files. If it does, it uses the translation management system API to issue a new translation request, one per targeted language. From a translator’s standpoint, the tool works similarly. It listens to the translation management system’s webhooks, detects when translations are ready and approved, then automatically creates a new commit or a new pull request with the newly translated content.
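The developer-side half of that flow can be sketched as a pure function. The TypeScript below is a hypothetical illustration and not the Translation Platform’s code or configuration format: the TranslationConfig fields, the TranslationRequest shape, the .yml file naming, and the requestsForChange function are all assumptions made for this example. It also assumes that only changes to the source-language file trigger requests.

```typescript
// Hypothetical shape of a repository's translation configuration.
interface TranslationConfig {
  languageFileDirectory: string; // where the language files live
  sourceLanguage: string;        // e.g. "en"
  targetLanguages: string[];     // e.g. ["fr", "ja"]
}

interface TranslationRequest {
  file: string;
  from: string;
  to: string;
}

// Given the files changed in a push, list the translation requests to issue.
function requestsForChange(
  config: TranslationConfig,
  changedFiles: string[]
): TranslationRequest[] {
  const requests: TranslationRequest[] = [];
  for (const file of changedFiles) {
    const isSourceLanguageFile =
      file.startsWith(config.languageFileDirectory) &&
      file.endsWith(`${config.sourceLanguage}.yml`);
    if (!isSourceLanguageFile) continue;
    // One request per targeted language.
    for (const target of config.targetLanguages) {
      requests.push({ file, from: config.sourceLanguage, to: target });
    }
  }
  return requests;
}
```

In this sketch, a webhook handler would call requestsForChange with the files from the push payload and forward each resulting request to the translation management system.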

Shopify's Translation Platform

Translation Platform made gathering translations a seamless process, similar to running tests on our continuous integration environment. It gives us visibility of the entire flow while allowing us to gather logs, metrics, and data we can later use to provide SLAs and guarantees on translation requests. The simplicity of the Translation Platform was key to successfully introducing our new translation processes across the company.

Future Challenges

Localization challenges don’t stop with words. Every single UX element needs examination through an international lens. For example, shipping and payment are two concepts that vary significantly from one market to another. The iconography that accompanies them must acknowledge these differences and cultural gaps that may exist. A mailbox doesn’t look the same in Japan as it does in France. A credit card isn’t used as much in India as it is in North America.

Maps and geography represent another intriguing challenge. Centering a world map over Japan instead of the UK can go a long way with our Japanese merchants. The team needs to take special care of regions like Taiwan and Macau, which can lead to important conflicts if not labeled correctly, especially when what is considered “correct” changes depending on whom we ask.

Number formatting, addresses, and phone numbers are all things that change from one region or language to another. If something requires formatting for display purposes, the format will change with the country or the language.

We’re only at the beginning of our journey. The internationalization and globalization of a platform isn’t a small task but an ongoing effort. The same way our security experts never sleep, we expect to always be around, informing our peers about language specificities, market subtleties, and local requirements.

My name is Christian and I lead the engineering team responsible for internationalization and localization at Shopify. If these types of challenges are appealing to you, feel free to reach out to me on Twitter or through our career page.

Mohammed Ridwanul Islam: How Mentorship, the T Model and a Pen Are the Keys to His Success

Mohammed’s feature is part of our series called Behind The Code, where we share the stories of our employees and how they’re solving meaningful problems at Shopify and beyond.

Mohammed Ridwanul is a software engineer on the Eventscale team and joined Shopify a year and a half ago.

Mohammed was born in Noakhali, a small village in Bangladesh, and moved to Dubai when he was five. The village was far removed from technology: most areas had no electricity, and you could count the number of TVs on one hand. The people of Noakhali were extremely practical and had ingenious solutions to the problems that would arise. Adults who had an engineering education or background were highly regarded for how they improved the quality of life in the village. This inspired and motivated Mohammed to pursue a career in engineering, and he hopes to eventually impact communities the way those individuals impacted his.

What has your career path looked like?
I’ve had the opportunity to work in different industries including sales, advertising, and design. Also, I’m an avid musician and love making my own music and doing shows with my band. With all these different skills, I thought perhaps I could make my own game. While trying to learn everything I could about game development, I wrote my first line of code which was in C#.

All my experiences have one thing in common: I love to face tough challenges and see a rapid manifestation of the things I do or build. So, I studied engineering and got an internship at Shopify during my undergrad, which turned into my current full-time role.

What type of Engineering did you study?
I went to the University of Waterloo and took a Bachelor of Applied Sciences in Electrical Engineering.

What does your team do at Shopify?
The Eventscale team is part of the Data Platform Engineering organization. Shopify receives an immense amount of data. Acquiring such large amounts of data so that we can clean, process, reliably store, and provide easy access to it for analysis requires highly performant, specialized tools and infrastructure. The Data Platform Engineering team is responsible for building these tools.

The Eventscale team builds the tools, libraries, and infrastructure to collect event-oriented streaming data. This data is used for both internal and merchant analytics and other operational needs. We build for all platforms at Shopify including web, backend, and mobile.

What was something difficult for you to learn, and how did you go about acquiring it?
During my first time leading a team project, I had some challenges learning useful team management principles. Understanding the needs of each team member and aligning everyone to a shared vision and goal to get the work done required a different set of skills, which took time and experience to learn. Luckily, my senior co-workers consistently mentored me and taught me concepts such as project cost estimates, team management strategies, success metrics, and other fundamental project management principles. My team lead also guided me towards several books and whitepapers from other companies which have helped me develop strong opinions related to project management and strategy. Check out my Goodreads profile for a list of those books and read Ben Thomson's work on

How does your daily routine help you cultivate a good work ethic?
Mohammed Ridwanul Islam's Daily Routine
Habits, in my opinion, are useful in navigating life. I believe humans are creatures of habit; it’s challenging to carry a constant cognitive load telling yourself to do x, y, and z tasks that are good for you. Instead, by building a habit, you reduce the load as your body and mind start to realize that this is a way of life. My daily routine helped me achieve this habit formation.

What’s your favorite dev tool?
VIM. It has a learning curve, but you can have so much fun with it once you learn it. VIM is an editor you can mold into your own little product; personalized for you with custom configurations using dotfiles. You can pretty much make it behave however you want. I love it! If you’re interested, feel free to check out my custom VIM settings.

What’s your favorite language and why?
Java, mostly because it’s a strongly typed language, and to this day I prefer explicitly defining types without having the language make assumptions on types.

Are you working on any side projects?
Yes, I’m working on enterprise project management software that can be used by a consulting team to manage a large number of projects in parallel. Essentially, it’s a centralized repository for all the current projects that the consultants are handling, along with the cost breakdown and timeline details. Also, it allows the user to dig into each project further and keep records of how human resources are applied. The software tries to enforce a framework of thinking about resource management and project strategy which I have developed over the years.

What are some ways you think through challenging work?
Writing things down on paper has been my go-to method to work through challenging things. I don’t start writing code until I’ve designed the overall larger components on paper. Similarly, for any other situations in life, writing has always helped me tackle challenges.

What book(s) are you currently reading?
Designing Data-Intensive Applications by Martin Kleppmann and The Essential Rumi by Rumi.

What is the best career advice you’ve gotten?
It doesn’t matter what you do as long as it meets two criteria: 1. It positively impacts society and is aligned with your values, and 2. It allows you to push and grow yourself by doing work to the best of your abilities.

What kind of advice would you give to someone trying to break into the technology industry?
I’m a big fan of the “T” model of learning, which essentially states that you should try and be competent in a few different things (small horizontal line), but you should strive to be the authoritative figure for at least one thing (longer vertical line). Programming might be the tool used to solve tough engineering problems, but the ability to solve problems is the more critical skill. So focus on chiseling that ability which comes with exposure and specialization in one specific area.

If you’d like to get in touch with Mohammed, check out his website.

We’re hiring! If you’re interested in joining our team, check out our Engineering career page for available opportunities.

Dev Degree - A Big Bet on Software Education

“Tell me and I forget, teach me and I may remember, involve me and I learn.”
- Benjamin Franklin

When I decided to study computer science at university, my parents were skeptical. They didn’t know anyone who had chosen this as a career. Computer science was, and still is, in its infancy. Software development isn’t pure science or pure engineering; it’s a combination of the two, mixed with a remarkable amount of artistic flair. It's a profession where you grow by learning the theory and then doing. A lot of doing. It’s a profession that’s increasingly in demand. And it’s a profession so new that schools are still learning how to teach it. The supply isn’t matching the demand; not even close.

Our industry is fraught with critical shortages of skills and diversity; software developers are more valuable to companies than money [1]. It’s pretty obvious: we have to invest aggressively in growing and developing software professionals, now more than ever.

Shopify has figured out an important part of how to solve these problems. We call it Dev Degree — a work-integrated learning (WIL) program that combines an accredited university degree with double the experience of a traditional co-op. The program is already in its 3rd year, and it’s time to talk about why it’s a big deal to us.

The Beginnings of Dev Degree

While living and working in Australia, my company invested in hiring hundreds of graduate developers. The graduates were intelligent and knew their theory, but they lacked the fundamental skills and experience required for software development. This held them back in making quick impacts to our small but growing company.

To fill in the gaps, we developed an internal training program for new graduates. It helped them level up faster and turned the best practices they learned in school into practical skills for the world of software development. It wasn’t long before I recognized that this knowledge gap wasn't an isolated incident. There wasn’t just one university churning out students ill-prepared for the workforce; it was a systemic issue.

I decided to tour Australian universities and talk to their Computer Science departments. I pitched the idea of adding pieces of our training program to their curriculum to better prepare students for their careers. My company even offered to pay to develop the program. The universities loved the idea, but they didn't know how to make it a reality within their academic frameworks. I saw many nods of agreement on that tour, but no action.

Dev Degree started, in earnest, when I returned to Canada and joined Shopify. The main lesson I learned from Australia was that universities couldn’t implement a WIL curriculum without industry partners in a true long-term arrangement. Shopify seemed born to step into that role. When I approached Tobi with this embryo of an idea, he was on board to make it a reality. Tobi had his own positive experience with apprenticeships in Germany. Our shared passion for software development and Canada motivated us to give this idea another shot, and we started searching for a university partner.

Canadian universities were eager to get involved, but again, most weren’t sure how to make it happen. For many, the question was: how is this different from our co-op program?

The co-op model is straightforward. Students alternate between a school term and a work term throughout their program. In this structure, students are thrown over the wall of academia into an industry with no connection to their curriculum. WIL, on the other hand, requires a structural change to the education system that creates a fully integrated and deep learning experience for the students. To do this properly, we needed to make changes to the curriculum and assessments, fully integrate universities and companies, launch new learning programs, and provide additional student support. This was a multi-dimensional problem.

Carleton University rose to the challenge, becoming the first and founding university partner of Dev Degree. Their team understood the value of WIL and were already exploring ways to incorporate this style of learning when we met. It was clear to both sides that we had found the perfect partner to make WIL a successful reality. We were both eager to innovate and weren’t afraid to make huge structural changes to our operations.

Carleton didn’t just talk about being involved; they developed an entirely separate stream of their Bachelor of Computer Science program that allocated over 20% of credits to student practicums. This required Carleton’s Senate approval, which was granted after thoughtful debate. Our first strong partnership was formed and we were ready to get started.

Inside Dev Degree

The Dev Degree Family

The core of the Dev Degree model is building tighter feedback loops between theory and practice while layering programming and personal growth skills early on. Each semester students take 3 courses at University and spend 25 hours a week at Shopify.

Because K-12 software education is lacking, we wanted to turbo-boost students to be able to write and deploy production software, solving real problems, before they even graduate. Our bet was that this model would better engage a more diverse set of students, empower deeper understanding, and foster more critical thought when building software.

Dev Degree - Hands-On Learning

These types of challenges are not part of the university curriculum — students can only get this experience in an industry setting. Thomas Edison said innovation is 1% inspiration and 99% perspiration. By that measure, Dev Degree is a real-time training program in experimental perspiration.

But there’s also a strong link to validating that competencies are acquired. The partner university allocates at least 20% of the degree’s credits to the work students do with Shopify development teams. Students write a practicum report at the end of every term (every four months) and submit it to the university. In the report, the student describes how they have achieved specific learning outcomes. The learning outcomes used in the Dev Degree program were influenced by standards from the Association for Computing Machinery (ACM) and the IEEE Computer Society.

During the first two years, we learned a lot. It wasn’t a smooth ride as we ironed out how best to deliver this program with the university, the students, and teams at Shopify. Here are some of the most important lessons we’ve learned.

Key Lesson #1: Re-Learn True Collaboration

During our school career, we learn that the final mark is most important. We strive to deliver the perfect assignment to get that A+. This is the complete opposite of how to get good results in the real world. The best students, and the most successful people, are the ones who share their ideas early, get feedback, experiment, explore, re-compose, and iterate. They embrace failure and keep trying.

The end result is important, but you have to cheat to get the best version of it. Sounds counterintuitive, I know. But by “cheating,” I mean asking people for help and incorporating the lessons they teach you into your own work. Collaboration is a prerequisite for true learning and growth. The Lone Wolf mentality instilled in students from years of schooling is more difficult to change than we anticipated, but working directly alongside other developers, pairing regularly, allowed us to break down those habits over time.

Key Lesson #2: Start with Development Skills

Our first cohort joined Shopify after three months of Developer Skills Training, based on the ACM framework I mentioned. This was quite ambitious on our end, but we hoped it was enough time to prepare them for the real-world work they would do with our teams.

It wasn’t. After the three months, our students still didn’t have enough knowledge to make a strong impact at Shopify. To better support them, our Dev Degree team hosted additional workshops on various developer tools and technologies to get them up to speed, but we knew there was more to be done.

It was clear that we needed to pivot the first year of our program to focus more heavily on Developer Skills Training. Our students needed to be better prepared to enter a fast-paced team building impactful products. Now, Dev Degree students participate in Developer Skills Training for their entire first year at Shopify. By tripling the amount of time they spend in training, we’ve seen Dev Degree students create earlier and more positive impacts on Shopify teams.

Key Lesson #3: Mentorship Comes in Many Forms

In 2016, students were paired with technical mentors once they joined a development team. The technical mentor is a software developer who guides their mentee on a daily basis by giving direction, reviewing work, offering feedback, and answering questions. While this was successful, we identified a gap where we weren’t equipping students with the tools and support they needed to transition into the workforce. We were giving them tons of technical support, but that didn’t necessarily help them conquer the social aspects of the job.

Now, Dev Degree students receive an additional layer of mentorship. Each student is paired with two people: a technical mentor and a Life@Shopify mentor. The Life@Shopify mentor is a trusted supporter, friend, and guide who provides a listening ear and supports the student’s growth. It’s a big leap to go from high school to being a trusted member of a company. We’ve found that this combination provides students with a diverse range of support throughout their time at Shopify.

The Results

To put it bluntly, the Dev Degree model works.

We see above-average retention rates compared to traditional academia. Generally, 20-50% of students drop out of their initial program or of postsecondary programs completely. In Dev Degree, our retention rate is 95%. We’ve increased gender diversity in the program, with women accounting for over 50% of Shopify Dev Degree students, a dramatic rise from the 19% of women graduating with a computer science degree.

Companies have been focusing 66% of their philanthropic tech education on K-12 programs, with only 3% on post-secondary programs. But we need to look at the entire education system to solve the skills shortage and lack of diversity in STEM programs. And it needs to happen faster.

Traditionally, new graduates hired at Shopify take anywhere from six months to two years to fully complete onboarding and start making an impact on development teams. Skill acquisition in our WIL program happens three times faster than the average developer education: Dev Degree students become productive members of their teams after only nine months into the program, instead of up to two years after graduation.

We have a lot more to learn, and we’re not done yet. While we’re excited by our early results, a true measure of success will be seeing more universities and industry partners adopt this model. We’re working to scale the program with our partners so that the Dev Degree model starts popping up all over Canada.

That’s why we’re excited to announce the expansion of our Dev Degree program to York University’s Lassonde School of Engineering! Our first Toronto-based students have started their journey with Dev Degree, and we’re excited to see what challenging problems they’ll solve.

None of this would be possible without our academic partners at Carleton and York who worked relentlessly to get Senate approval for new WIL computer science streams and design the model itself. We truly believe that if more universities worked hand-in-hand with industry to better prepare students for the workforce, Canada would become the leader in talent development for years to come.

Introducing the Deprecation Toolkit

Shopify is happy to announce that we’ve open sourced the Deprecation Toolkit, a Ruby gem that keeps track of deprecations in your codebase in an efficient way.

At Shopify, the leading cloud-based, multi-channel commerce platform with 600,000+ merchants in over 175 countries, upgrading our dependencies is a frequently applied best practice. We even have bots that automatically upgrade dependencies when a minor version is released.

However, more complex upgrades require human intervention and the time required varies from dependency to dependency, some even taking years. We realized that we could speed up this process if our application were using as little deprecated code as possible.

The motivation for building the Deprecation Toolkit came after a few unsuccessful attempts to prevent the hundreds of developers working on our monolith from accidentally using deprecated code, both in libraries and in our own codebase.

Why Should You Use This Gem and How Can It Help?

“Did I just call a new deprecated method?” 🤔

If you are the creator or maintainer of a library, or if you’d like to deprecate methods in your application, you have a couple of options to notify consumers of your code about a future API change. The most common option is to print a warning message to standard output explaining the change happening in the next release.

This approach has a major caveat: it doesn’t prevent developers from using the deprecated code by accident. The only warning is the deprecation message, which is very easy to miss and becomes impossible to spot if there are already a lot of them.

The second option is to provide a callback mechanism that runs whenever a deprecation is triggered. If you are familiar with Ruby on Rails or Active Support, you might have heard about the ActiveSupport::Deprecation module, which allows you to configure the behavior of your choice that gets called whenever a deprecation is triggered. Active Support provides a few behavior options by default; the two most common ones are log and raise.

Raising an error when deprecated code is triggered looked like a solution, but it would mean we’d have to fix every single deprecation before activating the configuration; otherwise, our CI wouldn’t pass and that would block developers from doing their daily tasks. We needed a different way to solve this problem, one that didn’t require fixing all deprecations at once and treated existing deprecations as “acceptable”, allowing us time to fix those gradually. New deprecations, however, should be handled differently and be the ones that raise errors. This is the approach we took with the Deprecation Toolkit.
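To make the trade-off between these behaviors concrete, here is a minimal plain-Ruby sketch. It does not use Active Support or the Deprecation Toolkit; the `Deprecator` class, its `deprecate` method, and the `:raise_on_new` behavior are hypothetical names invented for this illustration.

```ruby
require "set"

# Hypothetical stand-in for a deprecation callback mechanism.
class Deprecator
  class DeprecationError < StandardError; end

  # Messages that were logged instead of raised.
  attr_reader :logged

  # behavior: :log, :raise, or :raise_on_new
  # known:    messages of deprecations already accepted (used by :raise_on_new)
  def initialize(behavior:, known: [])
    @behavior = behavior
    @known = known.to_set
    @logged = []
  end

  # Called by library code whenever a deprecated code path runs.
  def deprecate(message)
    case @behavior
    when :log
      @logged << message
    when :raise
      raise DeprecationError, message
    when :raise_on_new
      # Existing deprecations are tolerated; only unknown ones fail.
      raise DeprecationError, message unless @known.include?(message)

      @logged << message
    else
      raise ArgumentError, "unknown behavior: #{@behavior.inspect}"
    end
  end
end
```

With `:raise`, every deprecation fails the build immediately. With `:raise_on_new`, a known deprecation is only logged while an unknown one raises, which is the gradual path described above.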

Internally, we called this process “Shitlist-driven development.” My colleague Flo gave an amazing talk on it at the Red Dot Ruby Conference in 2017, called "Shitlist-driven development and other tricks for working on large codebases."

How Does It Work?

The Deprecation Toolkit uses a whitelist approach. First, you need to record all existing deprecations in your application by running your test suites, either locally or on CI. The toolkit writes each deprecation that gets triggered for a given test to YAML files. These YAML files constitute your whitelist of acceptable deprecations.

The next time your tests run, the toolkit compares all the deprecations triggered during the run against the ones marked as acceptable. A mismatch means a deprecation was either introduced or removed. Either way, the Deprecation Toolkit triggers the behavior of your choice; by default, it raises an error.
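The record-then-compare cycle described above can be sketched with Ruby’s standard YAML library. This is an illustration of the idea only, not the gem’s implementation: the `DeprecationAllowlist` class, its method names, and the file layout (one hash keyed by test name) are assumptions made for this example.

```ruby
require "yaml"

# Hypothetical helper that stores and checks the deprecations accepted per test.
class DeprecationAllowlist
  def initialize(path)
    @path = path
  end

  # Records the deprecations triggered by each test, keyed by test name.
  def record(deprecations_by_test)
    File.write(@path, deprecations_by_test.to_yaml)
  end

  # Compares the deprecations triggered by a test against the recorded ones.
  # Returns the messages that were added and the ones that were removed.
  def diff(test_name, triggered)
    recorded = load_recorded.fetch(test_name, [])
    { added: triggered - recorded, removed: recorded - triggered }
  end

  # True when a deprecation was either introduced or removed.
  def mismatch?(test_name, triggered)
    diff(test_name, triggered).values.any? { |list| !list.empty? }
  end

  private

  # Reads the recorded file, treating a missing or empty file as "nothing recorded".
  def load_recorded
    return {} unless File.exist?(@path)

    YAML.safe_load(File.read(@path)) || {}
  end
end
```

In this sketch a test runner would call `record` once to capture the current state, then call `mismatch?` after each test on later runs and raise (or apply another behavior) when it returns true.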

The toolkit has many configuration options. However, if the default configuration suits your needs, all you need to do is add the gem to your Gemfile. The Deprecation Toolkit README has a detailed configuration reference to help you set up the toolkit the way you need. You can, for example, configure the toolkit to ignore some deprecations, dynamically determine where deprecations should be recorded, or even create custom behaviors when new deprecations are introduced.

Deprecation Toolkit in Action

Keeping your system free of deprecations is part of having a sane codebase, whether that means fixing deprecations from libraries or from your own code. We’ve used the Deprecation Toolkit in our core application for about a year now. It helped us reduce the number of deprecations in our system significantly and contributed towards speeding up our dependency upgrade process. It’s instrumental in getting every developer involved in fixing deprecations, as pull requests can’t be merged if the code introduces new deprecations.

Last but not least, we gamified fixing existing deprecations amongst developers. All deprecations were grouped by component and assigned an owner, usually a team lead, to help fix them. Over time, we counted the failures and progression of each team. All participating teams viewed their results in a shared Google sheet. Splitting the deprecated code into chunks and assigning each one to a different owner made the process super smooth and even faster.

Give the Deprecation Toolkit a try; we are looking forward to hearing if it helped you and how we can improve it! If the current workflow doesn’t work for you or if you’d like to see a new feature in this gem, feel free to open an issue in our issue tracker.
