Stop trying to do it all alone, add Kit to your team. Learn more.
Lessons From Building Android Widgets

Lessons From Building Android Widgets

By Matt Bowen, James Lockhart, Cecilia Hunka, and Carlos Pereira

When the new widget announcement was made for iOS 14, our iOS team went right to work designing an experience to leverage the new platform. However, widgets aren’t new to Android and have been around for over a decade. Shopify cares deeply about its mobile experience and for as long as we’ve had the Shopify mobile app, both our Android and iOS teams ship every feature one-to-one in unison. With the spotlight now on iOS 14, this was a perfect time to revisit our offering on Android.

Since our contribution was the same across both platforms, just like our iOS counterparts at the time, we knew merchants were using our widgets, but they needed more.

Why Widgets are Important to Shopify

Our widgets mainly focus on analytics that help merchants understand how they’re doing and gain insights to make better decisions quickly about their business. Monitoring metrics is a daily activity for a lot of our merchants, and on mobile, we have the opportunity to give merchants a faster way to access this data through widgets. They provide merchants a unique avenue to quickly get a pulse on their shops that isn’t available on the web.

Add Insights widget
Add Insights widget

After gathering feedback and continuously looking for opportunities to enhance our widget capabilities, we’re at our third iteration, and we’ll share with you the challenges we faced and how we solved them.

Why We Didn’t Use React Native

A couple of years ago Shopify decided to go full on React Native. New development should be done in React Native, and we’re also migrating some apps to the technology. This includes the flagship admin app, which is the companion app to the widgets.

Then why not write the widgets in React Native?

After doing some initial investigation, we quickly hit some roadblocks (like the fact that RemoteViews are the only way to create widgets). There’s currently no official support in the React Native community for RemoteViews, which is needed to support widgets. This felt very akin to a square peg in a round hole. Our iOS app also ran into issues supporting React Native, and we were running down the same path. Shopify believes in using the right tool for the job, we believe that native development was the right call in this case.

Building the Widgets

When building out our architecture for widgets, we wanted to create a consistent experience on both Android and iOS while preserving platform idioms where it made sense. In the sections below, we want to give you a view of our experiences building widgets. Pointing out some of the more difficult challenges we faced. Our aim is to shed some light around these less used surfaces, hopefully give some inspiration, and save time when it comes to implementing widgets in your applications.

Fetching Data

Some types of widgets have data that change less frequently (for example, reminders) and some that can be forecasted for the entire day (for example, calendar and weather). In our case, the merchants need up-to-date metrics about their business, so we need to show data as fresh as possible. Delays in data can cause confusion, or even worse, delay information that could change an action. Say you follow the stock market, you expect the stock app and widget data to be as up to date as possible. If the data is multiple hours stale, you may have missed something important! For our widgets to be valuable, we need information to be fresh while considering network usage.

Fetching Data in the App

Widgets can be kept up to date with relevant and timely information by using data available locally or fetching it from a server. The server fetching can be initiated by the widget itself or by the host app. In our case, since the app doesn’t need the same information the widget needed, we decided it would make more sense to fetch it from the widget. 

One benefit to how widgets are managed in the Android ecosystem over iOS is the flexibility. On iOS you have limited communication between the app and widget, whereas on Android there doesn’t seem to be the same restrictions. This becomes clear when we think about how we configure a widget. The widget configuration screen has access to all of the libraries and classes that our main app does. It’s no different than any other screen in the app. This is mostly true with the widget as well. We can access the resources contained in our main application, so we don’t need to duplicate any code. The only restrictions in a widget come with building views, which we’ll explore later.

When we save our configuration, we use shared preferences to persist data between the configuration screen and the widget. When a widget update is triggered, the shared preferences data for a given widget is used to build our request, and the results are displayed within the widget. We can read that data from anywhere in our app, allowing us to reuse this data in other parts of our app if desired.

Making the Widgets Antifragile

The widget architecture is built in a way that updates are mindful of battery usage, where updates are controlled by the system. In the same way, our widgets must also be mindful of saving bandwidth when fetching data over a network. While developing our second iteration, we came across a peculiar problem that was exacerbated by our specific use case. Since we need data to be fresh, we always pull new data from our backend on every update. Each update is approximately 15 minutes apart to avoid having our widgets stop updating. What we found is that widgets call their update method onUpdate(), more than once in an update cycle. In widgets like calendar, these extra calls come without much extra cost as the data is stored locally. However, in our app, this was triggering two to five extra network calls for the same widget with the same data in quick succession.

In order to correct the unnecessary roundtrips, we built a simple short-lived cache:

  1. The system asks our widget to provide new data from Reportify (Shopify’s data service)
  2. We first look into the local cache using the widgetID provided by the system.
  3. If there’s data, and that data was set less than one minute ago, we return it and avoid making a network request. We also take into account configuration such as locale so as not to avoid forcing updates after a language change.
  4. Otherwise, we fetch the data as normal and store it in the cache with the timestamp.
A flow diagram highlighting the steps of the cache
The simple short-lived cache flow

With this solution, we reduced unused network calls and system load and avoided collecting incorrect analytics.

Implementing Decoder Strategy with Dynamic Selections

We follow a similar approach as we have on iOS. We create a dynamic set of queries based on what the merchant has configured.

For each metric we have a corresponding definition implementation. This approach allows each metric the ability to have complete flexibility around what data it needs, and how it decodes the data from the response.

When Android asks us to update our widgets, we pull the merchants selection from our configuration object. Since each of the metric IDs has a definition, we map over them to create a dynamic set of queries.

We include an extension on our response object that binds the definitions to a decoder. Our service sends back an array of the response data corresponding to the queries made. We map over the original definitions, decoding each chunk to the expected return type.

Building the UI

Similar to iOS, we support three widget sizes for versions prior to Android 12 and follow the same rules for cell layout, except for the small widget. The small widget on Android supports a single metric (compared to the two on iOS) and the smallest widget size on Android is a 2x1 grid. We quickly found that only a single metric would fit in this space, so this design differs slightly between the platforms.

Unlike iOS with swift previews, we were limited to XML previews and running the widget on the emulator or device. We’re also building widgets dynamically, so even XML previews were relatively useless if we wanted to see an entire widget preview. Widgets are currently on the 2022 Jetpack Compose roadmap, so this is likely to change soon with Jetpack composable previews.

With the addition of dynamic layouts in Android 12, we created five additional sizes to support each size in between the original three. These new sizes are unique to Android. This also led to using grid sizes as part of our naming convention as you can see in our WidgetLayout enum below.

For the structure of our widget, we used an enum that acts as a blueprint to map the appropriate layout file to an area of our widget. This is particularly useful when we want to add a new widget because we simply need to add a new enum configuration.

To build the widgets dynamically, we read our configuration from shared preferences and provide that information to the RemoteViews API.

If you’re familiar with the RemoteViews API, you may notice the updateView() method, which is not a default RemoteViews method. We created this extension method as a result of an issue we ran into while building our widget layout in this dynamic manner. When a widget updates, the new remote views get appended to the existing ones. As you can probably guess, the widget didn’t look so great. Even worse, more remote views get appended on each subsequent update. We found that combining the two RemoteViews API methods removeAllViews() and addView() solved this problem.

Once we build our remote views, we then pass the parent remote view to the AppWidgetProvider updateAppWidget() method to display the desired layout.

It’s worth noting that we attempted to use partiallyUpdateAppWidget() to stop our remote views from appending to each other, but encountered the same issue.

Using Dynamic Dates

One important piece of information on our widget is the last updated timestamp. It helps remove confusion by allowing the merchants to quickly know how fresh the data is they are looking at. If the data is quite stale (say you went to the cottage for the weekend and missed a few updates) and there wasn’t a displayed timestamp, you would assume the data you’re looking at is up to the second. This can cause unnecessary confusion for our merchants. The solution here was to ensure there’s some communication to our merchant when the last update was made.

In our previous design, we only had small widgets, and they were able to display only one metric. This information resulted in a long piece of text that, on smaller devices, would sometimes wrap and show over two lines. This was fine when space was abundant in our older design but not in our new data rich designs. We explored how we could best work with timestamps on widgets, and the most promising solution was to use relative time. Instead of having a static value such as “as of 3:30pm” like our previous iteration. We would have a dynamic date that would look like: “1 min, 3 sec ago.”

One thing to remember is that even though the widget is visible, we have a limited number of updates we can trigger. Otherwise, it would be consuming a lot of unnecessary resources on the device. We knew we couldn’t keep triggering updates on the widget as often as we wanted. Android has a strategy for solving this with TextClock. However, there’s no support for relative time, which wouldn’t be useful in our use case. We also explored using Alarms, but potentially updating every minute would consume too much battery. 

One big takeaway we had from these explorations was to always test your widgets under different conditions. Especially low battery or poor network. These surfaces are much more restrictive than general applications and the OS is much more likely to ignore updates.

We eventually decided that we wouldn’t use relative time and kept the widget’s refresh time as a timestamp. This way we have full control over things like date formatting and styling.

Adding Configuration

Our new widgets have a great deal of configuration options, allowing our merchants to choose exactly what they care about. For each widget size, the merchant can select the store, a certain number of metrics and a date range. This is the only part of the widget that doesn’t use RemoteViews, so there aren’t any restrictions on what type of View you may want to use. We share information between the configuration and the widget via shared preferences.

Insights widget configuration
Insights widget configuration

Working with Charts and Images

Android widgets are limited to RemoteViews as their building blocks and are very restrictive in terms of the view types supported. If you need to support anything outside of basic text and images, you need to be a bit creative.

Our widgets support both a sparkline and spark bar chart built using the MPAndroidCharts library. We have these charts already configured and styled in our main application, so the reuse here was perfect; except, we can’t use any custom Views in our widgets. Luckily, this library is creating the charts via drawing to the canvas, and we simply export the charts as a bitmap to an image view. 

Once we were able to measure the widget, we constructed a chart of the required size, create a bitmap, and set it to a RemoteView.ImageView. One small thing to remember with this approach, is that if you want to have transparent backgrounds, you’ll have to use ARGB_8888 as the Bitmap Config. This simple bitmap to ImageView approach allowed us to handle any custom drawing we needed to do. 

Eliminating Flickering

One minor, but annoying issue we encountered throughout the duration of the project is what we like to call “widget flickering.” Flickering is a side-effect of the widget updating its data. Between updates, Android uses the initialLayout from the widget’s configuration as a placeholder while the widget fetches its data and builds its new RemoteViews. We found that it wasn’t possible to eliminate this behavior, so we implemented a couple of strategies to reduce the frequency and duration of the flicker.

The first strategy is used when a merchant first places a widget on the home screen. This is where we can reduce the frequency of flickering. In an earlier section “Making the Widgets Antifragile,” we shared our short-lived cache. The cache comes into play because the OS will trigger multiple updates for a widget as soon as it’s placed on the home screen. We’d sometimes see a quick three or four flickers, caused by updates of the widget. After the widget gets its data for the first time, we prevent any additional updates from happening for the first 60 seconds, reducing the frequency of flickering.

The second strategy reduces the duration of a flicker (or how long the initialLayout is displayed). We store the widgets configuration as part of shared preferences each time it’s updated. We always have a snapshot of what widget information is currently displayed. When the onUpdate() method is called, we invoke a renderEarlyFromCache() method as early as possible. The purpose of this method is to build the widget via shared preferences. We provide this cached widget as a placeholder until the new data has arrived.

Gathering Analytics

Largest Insights widget in light mode
Largest widget in light mode

Since our first widgets were developed, we added strategic analytics in key areas so that we could understand how merchants were using the functionality.  This allowed us to learn from the usage so we could improve on them. The data team built dashboards displaying detailed views of how many widgets were installed, the most popular metrics, and sizes.

Most of the data used to build the dashboards came from analytics events fired through the widgets and the Shopify app.

For these new widgets, we wanted to better understand adoption and retention of widgets, so we needed to capture how users are configuring their widgets over time and which ones are being added or removed.

Detecting Addition and Removal of Widgets 

Unlike iOS, capturing this data in Android is straight-forward. To capture when a merchant adds a widget, we send our analytical event when the configuration is saved. When removing a widget, the widgets built-in onDeleted method gives us the widget ID of the removed widget. We can then look up our widget information in shared preferences and send our event prior permanently deleting the widget information from the device. 

Supporting Android 12

When we started development of our new widgets, our application was targeting Android 11. We knew we’d be targeting Android 12 eventually, but we didn’t want the upgrade to block the release. We decided to implement Android 12 specific features once our application targeted the newer version, leading to an unforeseen issue during the upgrade process with widget selection.

Our approach to widget selection in previous versions was to display each available size as an option. With the introduction of responsive layouts, we no longer needed to display each size as its own option. Merchants can now pick a single widget and resize to their desired layout. In previous versions, merchants can select a small, medium, and large widget. In versions 12 and up, merchants can only select a single widget that can be resized to the same layouts as small, medium, large, and several other layouts that fall in between. This pattern follows what Google does with their large weather widget included on devices, as well as an example in their documentation. We disabled the medium and small widgets in Android 12 by adding a flag to our AndroidManifest and setting that value in attrs.xml for each version:

The approach above behaves as expected, the medium and small widgets were now unavailable from the picker. However, if a merchant was already on Android 12 and had added a medium or large widget before our widget update, those widgets were removed from their home screen. This could easily be seen as a bug and reduce confidence in the feature. In retrospect, we could have prototyped what the upgrade would have looked like to a merchant who was already on Android 12.

Allowing only the large widget to be available was a data-driven decision. By tracking widget usage at launch, we saw that the large widget was the most popular and removing the other two would have the least impact on current widget merchants.

Building New Layouts

We encountered an error when building the new layouts that fit between the original small, medium and large widgets.

After researching the error, we were exceeding the Binder transaction buffer. However, the buffer’s size is 1mb and the error displayed .66mb, which wasn’t exceeding the documented buffer size. The error has appeared to stump a lot of developers. After experimenting with ways to get the size down, we found that we could either drop a couple of entire layouts or remove support for a fourth and fifth row of the small metric. We decided on the latter, which is why our 2x3 widget only has three rows of data when it has room for five.

Rethinking the Configuration Screen

Now that we have one widget to choose from, we had to rethink what our configuration screen would look like to a merchant. Without our three fixed sizes, we could no longer display a fixed number of metrics in our configuration. 

Our only choice was to display the maximum number of metrics available for the largest size (which is seven at the time of this writing). Not only did this make the most sense to us from a UX perspective, but we also had to do it this way because of how the new responsive layouts work. Android has to know all of the possible layouts ahead of time. Even if a user shrinks their widget to a size that displays a single metric, Android has to know what the other six are, so it can be resized to our largest layout without any hiccups.

We also updated the description that’s displayed at the top of the configuration screen that explains this behavior.

Capturing More Analytics

On iOS, we capture analytical data when a merchant reconfigures a widget to gain insights into usage patterns. Reconfiguration for Android was only possible in version 12 and due to the limitations of the AppWidgetProvider’s onAppWidgetOptionsChanged() method, we were unable to capture this data on Android.

I’ll share more information about building our layouts in order to give context to our problem. Setting breakpoints for new dynamic layouts works very well across all devices. Google recommends creating a mapping of your breakpoints to the desired remote view layout. To build on a previous example where we showed the buildWidgetRemoteView() method, we used this method again as part of our breakpoint mapping. This approach allows us to reliably map our breakpoints to the desired widget layout.

When reconfiguring or resizing a widget, the onAppWidgetOptionsChanged() method is called. This is where we’d want to capture our analytical data about what had changed. Unfortunately,  this view mapping doesn’t exist here. We have access to width and height values that are measured in dp, initially appearing to be useful. At first, we felt that we could discern these measurements into something meaningful and map these values back to our layout sizes. After testing on a couple of devices, we realized that the approach was unreliable and would lead to a large volume of bad analytical data. Without confidently knowing what size we are coming from, or going to, we decided to omit this particular analytics event from Android. We hope to bring this to Google’s attention, and get it included in a future release.

Shipping New Widgets

Already having a pair of existing widgets, we had to decide how to transition to the new widgets as they would be replacing the existing implementation.

We didn’t find much documentation around migrating widgets. The docs only provided a way to enhance your widget, which means adding the new features of Android 12 to something you already had. This wasn’t applicable to us since our existing widgets were so different from the ones we were building.

The major issue that we couldn’t get around was related to the sizing strategies of our existing and new widgets. The existing widgets used a fixed width and height so they’d always be square. Our new widgets take whatever space is available. There wasn’t a way to guarantee that the new widget would fit in the available space that the existing widget had occupied. If the existing widget was the same size as the new one, it would have been worth exploring further.

The initial plan we had hoped for, was to make one of our widgets transform into the new widget while removing the other one. Given the above, this strategy would not work.

The compromise we came to, so as to not completely remove all of a merchant’s widgets overnight, was to deprecate the old widgets at the same time we release the new one. To deprecate, we updated our old widget’s UI to display a message informing that the widget is no longer supported and the merchant must add the new ones.

Screen displaying the notice: This widget is no longer active. Add the new Shopify Insights widget for an improved view of your data. Learn more.
Widget deprecation message

There’s no way to add a new widget programmatically or to bring the merchant to the widget picker by tapping on the old widget. We added some communication to help ease the transition by updating our help center docs, including information around how to use widgets, pointing our old widgets to open the help center docs, and just giving lots of time before removing the deprecation message. In the end, it wasn’t the most ideal situation and we came away learning about the pitfalls within the two ecosystems.

What’s Next

As we continue to learn about how merchants use our new generation of widgets and Android 12 features, we’ll continue to hone in on the best experience across both our platforms. This also opens the way for other teams at Shopify to build on what we’ve started and create more widgets to help Merchants. 

On the topic of mobile only platforms, this leads us into getting up to speed on WearOS. Our WatchOS app is about to get a refresh with the addition of Widgetkit; it feels like a great time to finally give our Android Merchants watch support too!

Matt Bowen is a mobile developer on the Core Admin Experience team. Located in West Warwick, RI. Personal hobbies include exercising and watching the Boston Celtics and New England Patriots.

James Lockhart is a Staff Developer based in Ottawa, Canada. Experiencing mobile development over the past 10+ years: from Phonegap to native iOS/Android and now React native. He is an avid baker and cook when not contributing to the Shopify admin app.

Cecilia Hunka is a developer on the Core Admin Experience team at Shopify. Based in Toronto, Canada, she loves live music and jigsaw puzzles.

Carlos Pereira is a Senior Developer based in Montreal, Canada, with more than a decade of experience building native iOS applications in Objective-C, Swift and now React Native. Currently contributing to the Shopify admin app.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

Continue reading

Lessons From Building iOS Widgets

Lessons From Building iOS Widgets

By Carlos Pereira, James Lockhart, and Cecilia Hunka

When the iOS 14 beta was originally announced, we knew we needed to take advantage of the new widget features and get something to our merchants. The new widgets looked awesome and could really give merchants a way to see their shop’s data at a glance without needing to open our app.

Fast forward a couple of years, and we now have lots of feedback from the new design. We knew merchants were using them, but they needed more. The current design was lacking and only provided two metrics—also, they took up a lot of space. This experience prompted us to start a new project. To upgrade our original design to better fit our merchant’s needs.

Why Widgets Are Important to Shopify

Our widgets mainly focus on analytics. Analytics can help merchants understand how they’re doing and gain insights to make better decisions quickly about their business. Monitoring metrics is a daily activity for a lot of our merchants, and on mobile, we have the opportunity to give merchants a faster way to access this data through widgets. As widgets provide access to “at a glance” information about your shop and allow merchants a unique avenue to quickly get a pulse on their shops that they wouldn’t find on desktop.

A screenshot showing the add widget screen for Insights on iOS
Add Insights widget

After gathering feedback and continuously looking for opportunities to enhance our widget capabilities, we’re at our third iteration, and we’ll share with you how we approached building widgets and some of the challenges we faced.

Why We Didn’t Use React Native

A couple years ago Shopify decided to go all in on React Native. New development was done in React Native and we began migrating some apps to the new stack. Including our flagship admin app, where we were building our widgets. Which posed the question, should we write the widgets in React Native?

After doing some investigation we quickly hit some roadblocks: app extensions are limited in terms of memory, WidgetKit’s architecture is highly optimized to work with SwiftUI as the view hierarchy is serialized to disk, there’s also, at this time, no official support in the React Native community for widgets.

Shopify believes in using the right tool for the job, we believe that native development with SwiftUI was the best choice in this case.

Building the Widgets

When building out our architecture for widgets, we wanted to create a consistent experience on both iOS and Android while preserving platform idioms where it made sense. Below we’ll go over our experience and strategies building the widgets, pointing out some of the more difficult challenges we faced. Our aim is to shed some light around these less talked about surfaces, give some inspiration for your projects, and hopefully, save time when it comes to implementing your widgets.

Fetching Data

Some types of widgets have data that change less frequently (for example, reminders) and some that can be forecasted for the entire day (for example, Calendar and weather). In our case, the merchants need up-to-date metrics about their business, so we need to show data as fresh as possible. Time for our widget is crucial. Delays in data can cause confusion, or even worse, delay information that could inform a business decision. For example, let’s say you watch the stocks app. You would expect the stock app and its corresponding widget data to be as up to date as possible. If the data is multiple hours stale, you could miss valuable information for making decisions or you could miss an important drop or rise in price. With our product, our merchants need as up to date information as we can provide them to run their business.

Fetching Data in the App

Widgets can be kept up to date with relevant and timely information by using data available locally or fetching it from a server. The server fetching can be initiated by the widget itself or by the host app. In our case, since the app doesn’t share the same information as the widget, we decided it made more sense to fetch it from the widget.

We still consider moving data fetching to the app once we start sharing similar data between widgets and the app. This architecture could simplify the handling of authentication, state management, updating data, and caching in our widget since only one process will have this job rather than two separate processes. It’s worth noting that the widget can access code from the main app, but they can only communicate data through keychain and shared user defaults as widgets run on separate processes. Sharing the data fetching, however, comes with an added complexity of having a background process pushing or making data available to the widgets, since widgets must remain functional even if the app isn’t in the foreground or background. For now, we’re happy with the current solution: the widgets fetch data independently from the app while sharing the session management code and tokens.

A flow diagram highlighting widgets fetch data independently from the app while sharing the session management code and tokens
Current solution where widgets fetch data independently

Querying Business Analytics Data with Reportify and ShopifyQL

The business data and visualizations displayed in the widgets are powered by Reportify, an in-house service that exposes data through a set of schemas queried via ShopifyQL, Shopify’s Commerce data querying language. It looks very similar to SQL but is designed around data for commerce. For example, to fetch a shops total sales for the day:

Making Our Widgets Antifragile

iOS Widgets' architecture is built in a way that updates are mindful of battery usage and are budgeted by the system. In the same way, our widgets must also be mindful of saving bandwidth when fetching data over a network. While developing our second iteration we came across a peculiar problem that was exacerbated by our specific use case.

Since we need data to be fresh, we always pull new data from our backend on every update. Each update is approximately 15 minutes apart to avoid having our widgets stop updating (which you can read about why on Apple’s Developer site). We found that iOS calls the update methods, getTimeline() and getSnapshot(), more than once in an update cycle. In widgets like calendar, these extra calls come without much extra cost as the data is stored locally. However, in our app, this was triggering two to five extra network calls for the same widget with the same data in quick succession.

We also noticed these calls were causing a seemingly unrelated kick out issue affecting the app. Each widget runs on a different process than the main application, and all widgets share the keychain. Once the app requests data from the API, it checks to see if it has an authenticated token in the keychain. If that token is stale, our system pauses updates, refreshes the token, and continues network requests. In the case of our widgets, each widget call to update was creating another workflow that could need a token refresh. When we only had a single widget or update flow, it worked great! Even four to five updates would usually work pretty well. However, eventually one of these network calls would come out of order and an invalid token would get saved. On our next update, we have no way to retrieve data or request a new token resulting in a session kick out. This was a great find as it was causing a lot of frustration for our affected merchants and ourselves, who could never really put our finger on why these things would, every now and again, just log us out.

In order to correct the unnecessary roundtrips, we built a simple short-lived cache:

  1. The system asks our widget to provide new data
  2. We first look into the local cache using a key specific to that widget. On iOS, our key is produced from a configuration for that widget as there’s no unique identifiers provided. We also take into account configuration such as locale so as not to avoid forcing updates after a language change.
  3. If there’s data, and that data was set less than one minute ago, we return it and avoid making a network request. 
  4. Otherwise, we fetch the data as normal and store it in the cache with the timestamp.
      A flow diagram highlighting the steps of the cache
      The simple short-lived cache flow

      With this solution, we reduced unused network calls and system load, avoided collecting incorrect analytics, and fixed a long running bug with our app kick outs!

      Implementing Decoder Strategy with Dynamic Selections

      When fetching the data from the Analytics REST service, each widget can be configured with two to seven metrics from a total of 12. This set should grow in the future as new metrics are available too! Our current set of metrics are all time-based and have a similar structure.

      But that doesn’t mean the structure of future metrics will not change. For example, what about a metric that contains data that isn’t mapped over a time range? (like orders to fulfill, which does not contain any historical information).

      The merchant is also able to configure the order the metrics appear, which shop (if they have more than one shop), and which date range represents the data: today, last 7 days, and last 30 days.

      We had to implement a data fetching and decoding mechanism that:

      • only fetches the data the merchant requested in order to avoid asking for unneeded information
      • supports a set of metrics as well as being flexible to add future metrics with different shapes
      • supports different date ranges for the data.

      A simplified version of the solution is shown below. First, we create a struct to represent the query to the analytics service (Reportify).

      Then, we create a class to represent the decodable response. Right now it has a fixed structure (value, comparison, and chart values), but in the future we can use an enum or different subclasses to decode different shapes.

      Next, we create a response wrapper that attempts to decode the metrics based on a list of metric types passed to it. Each metric has its configuration, so we know which class is used to read the values.

      Finally, when the widget Timeline Provider asks for new data, we fetch the data from the current metrics and decode the response. 

      Building the UI

      We wanted to support the three widget sizes: small, medium, and large. From the start we wanted to have a single View to support all sizes as an attempt to minimize UI discrepancies and make the code easy to maintain.

      We started by identifying the common structure and creating components. We ended up with a Metric Cell component that has three variations:

      A metric cell from a widget
      A metric cell
      A metric cell from a widget
      A metric cell with a sparkline
      A metric cell from a widget
      A metric cell with barchart

      All three variations consist of a metric name and value, chart, and a comparison. As the widget containers become bigger, we show the merchant more data. Each view size contains more metrics, and the largest widget contains a full width chart on the first chosen metric. The comparison indicator also gets shifted from bottom to right on this variation.

      The first chosen metric, on the large widget, is shown as a full width cell with a bar chart showing the data more clearly; we call it the Primary cell. We added a structure to indicate if a cell is going to be used as primary or not. Besides the primary flag, our component doesn’t have any context about the widget size, so we use chart data as an indicator to render a cell primary or not. This paradigm fits very well with SwiftUI.

      A simplified version of the actual Cell View:

      After building our cells, we need to create a structure to render them in a grid according to the size and metrics chosen by the merchant. This component also has no context of the widget size, so our layout decisions are mainly based on how many metrics we are receiving. In this example, we’ll refer to the View as a WidgetView.

      The WidgetView is initialized with a WidgetState, a struct that holds most of the widget data such as shop information, the chosen metrics and their data, and a last updated string (which represents the last time the widget was updated).

      To be able to make decisions on layout based on the widget characteristics, we created an OptionSet called LayoutOption. This is passed as an array to the WidgetView.

      Layout options:

      That helped us not to tie this component to Widget families, rather to layout characteristics that makes this component very reusable in other contexts.

      The WidgetView layout is built using mainly a LazyVGrid component:

      A simplified version of the actual View:

      Adding Dynamic Dates

      One important piece of information on our widget is the last updated timestamp. It helps remove confusion by allowing merchants too quickly know how fresh the data is they’re looking at. Since iOS has an approximate update time with many variables, coupled with data connectivity, it’s very possible the data could be over 15 minutes old. If the data is quite stale (say you went to the cottage for the weekend and missed a few updates) and there was no update string, you would assume the data you’re looking at is up to the second. This can cause unnecessary confusion for our merchants. The solution here was to ensure there’s some communication to our merchant when the last update was.

      In our previous design, we only had small widgets, and they were able to display only one metric. This information resulted in a long string, that on smaller devices, would sometimes wrap and show over two lines. This was fine when space was abundant in our older design but not in our new data rich designs. We explored how we could best work with timestamps on widgets, and the most promising solution was to use relative time. Instead of having a static value such as “as of 3:30pm” like our previous iteration, we would have a dynamic date that would look like: “1 min, 3 sec ago.”

      One thing to remember is that even though the widget is visible, we have a limited number of updates we can trigger. Otherwise, it would be consuming a lot of unnecessary resources on the merchant’s device. We knew we couldn’t keep triggering updates on the widget as often as we wanted (nor would it be allowed), but iOS has ways to deal with this. Apple did release support for dynamic text on widgets during our development that allowed using timers on your widgets without requiring updates. We simply need to pass a style to a Text component and it automatically keeps everything up to date:

      Text("\(now, style: .relative) ago")

      It was good, but we have no options to customize the relative style. Being able to customize the relative style was an important point for us, as the current supported style does not fit well with our widget layout. One of our biggest constraints with widgets is space as we always need to think about the smallest widget possible. In the end we decided not to move forward with the relative time approach, and kept a reduced version of our previous timestamp.

      Adding Configuration

      Our new widgets have a great amount of configuration, allowing for merchants to choose exactly what they care about. For each widget size, the merchant can select the store, a certain number of metrics, and a date range. On iOS, widgets are configured through the SiriKit Intents API. We faced some challenges with the WidgetConfiguration, but fortunately, all had workarounds that fit our use cases.

      Insights widget configuration
      Insights widget configuration

      It’s Not Possible to Deselect a Metric

      When defining a field that has multiple values provided dynamically, we can limit the number of options per widget family. This was important for us, since each widget size has a different number of metrics it can support. However, the current UI on iOS for widget configuration only allows selecting a value but not deselecting it. So, once we selected a metric we couldn’t remove it, only update the selection. But what if the merchant were only interested in one metric on the small widget? We solved this with a small design change, by providing “None” as an option. If the merchant were to choose this option, it would be ignored and shown as an empty state. 

      It's not possible to validate the user selections

      With the addition of “None” and the way intents are designed, it was possible to select all “None” and have a widget with no metrics. In addition, it was possible to select the same metric twice.. We would like to be able to validate the user selection, but the Intents API didn't support it. The solution was to embrace the fact that a widget can be empty and show as an empty state. Duplicates were filtered out so any more than a single metric choice was changed to “None” before we sent any network requests.

      The First Calls to getTimeline and getSnapshot Don’t Respect the Maximum Metric Count

      For intent configurations provided dynamically, we must provide default values in the IntentHandler. In our case, the metrics list varies per widget family. In the IntentHandler, it’s not possible to query which widget family is being used. So we had to return at least as many metrics as the largest widget (seven). 

      However, even if we limit the number of metrics per family, the first getTimeline and getSnapshot calls in the Timeline Provider were filling the configuration object with all default metrics, so a small widget would have seven metrics instead of two!

      We ended up adding some cleanup code in the beginning of the Timeline Provider methods that trims the list depending on the expected number of metrics.

      Optimizing Testing

      Automated tests are a fundamental part of Shopify’s development process. In the Shopify app, we have a good amount of unit and snapshot tests. The old widgets on Android had good test coverage already, and we built on the existing infrastructure. On iOS, however, there were no tests since it’s currently not possible to add test targets against a widget extension on Xcode.

      Given this would be a complex project and we didn’t want to compromise on quality, we investigated possible solutions for it.

      The simplest solution would be to add each file on both the app and in the widget extension targets, then we could unit test it in the app side in our standard test target. We decided not to do this since we would always need to add a file to both targets, and it would bloat the Shopify app unnecessarily.

      We chose to create a separate module (a framework in our case) and move all testable code there. Then we could create unit and snapshot tests for this module.

      We ended up moving most of the code, like views and business logic, to this new module (WidgetCore), while the extension only had WidgetKit specific code and configuration like Timeline provider, widget bundle, and intent definition generated files.

      Given our code in the Shopify app is based on UIKit, we did have to update our in-house snapshot testing framework to support SwiftUI views. We were very happy with the results. We ended up achieving a high test coverage, and the tests flagged many regressions during development.

      Fast SwiftUI Previews 

      The Shopify app is a big application, and it takes a while to build. Given the widget extension is based on our main app target, it took a long time to prepare the SwiftUI previews. This caused frustration during development. It also removed one of the biggest benefits of SwiftUI—our ability to iterate quickly with Previews and the fast feedback cycle during UI development.

      One idea we had was to create a module that didn’t rely on our main app target. We created one called WidgetCore where we put a lot of our reusable Views and business logic. It was fast to build and could also render SwiftUI previews. The one caveat is, since it wasn’t a widget extension target, we couldn’t leverage the WidgetPreviewContext API to render views on a device. It meant we needed to load up the extension to ensure the designs and changes were always working as expected on all sizes and modes (light and dark).

      To solve this problem, we created a PreviewLayout extension. This had all the widget sizes based on the Apple documentation, and we were able to use it in a similar way:

      Our PreviewLayout extension would be used on all of our widget related views in our WidgetCore module to emulate the sizes in previews:

      Acquiring Analytics

      Shopify Insights widget largest size in light mode
      Largest widget in light mode

      Since our first widgets were developed, we wanted to understand how merchants are using the functionality, so we can always improve it. The data team built some dashboards showing things like a detailed view of how many widgets installed, the most popular metrics, and sizes.

      Most of the data used to build the dashboards come from analytics events fired through the widgets and the Shopify app.

      For the new widgets, we wanted to better understand adoption and retention of widgets, so we needed to capture how users are configuring their widgets over time and which ones are being added or removed.

      Managing Unique Ids

      WidgetKit has the WidgetCenter struct that allows requesting information about the widgets currently configured in the device through the getCurrentConfigurations method. However, the list of metadata returned (WidgetInfo) doesn’t have a stable unique identifier. Its identifier is the object itself, since it’s hashable. Given this constraint, if two identical widgets are added, they’ll both have the same identifier. Also, given the intent configuration is part of the id, if something changes (for example, date range) it’ll look like it’s a totally different widget.

      Given this limitation, we had to adjust the way we calculate the number of unique widgets. It also made it harder to distinguish between different life-cycle events (adding, removing, and configuring). Hopefully there will be a way to get unique ids for widgets in future versions of iOS. For now we created a single value derived from the most important parts of the widget configuration.

      Detecting, Adding, and Removing Widgets 

      Currently there’s no WidgetKit life cycle method that tells us when a widget was added, configured, or removed. We needed it so we can better understand how widgets are being used.

      After some exploration, we noticed that the only methods we could count on were getTimeline and getSnapshot. We then decided to build something that could simulate these missing life cycle methods by using the ones we had available. getSnapshot is usually called on state transitions and also on the widget Gallery, so we discarded it as an option.

      We built a solution that did the following

      1. Every time the Timeline providers’ getTimeline is called, we call WidgetKit’s getCurrentConfigurations to see what are the current widgets installed.
      2. We then compare this list with a previous snapshot we persist on disk.
      3. Based on this comparison we try to guess which widgets were added and removed.
      4. Then we triggered the proper life cycle methods: didAddWidgets(), didRemoveWidgets().

      Due to identifiers not being stable, we couldn’t find a reliable approach to detect configuration changes, so we ended up not supporting it.

      We also noticed that WidgetKit.getCurrentConfigurations’s results can have some delay. If we remove a widget, it may take a couple getTimeline calls for it to be reflected. We adjusted our analytics scheme to take that into account.

      Detecting, adding, and removing widgets solution
      Current solution

      Supporting iOS 16

      Our approach to widgets made supporting iOS 16 out of the gate really simple with a few changes. Since our lock screen complications will surface the same information as our home screen widgets, we can actually reuse the Intent configuration, Timeline Provider, and most of the views! The only change we need to make is to adjust the supported families to include .accessoryInline, .accessoryCircular, and .accessoryRectangular, and, of course, draw those views.

      Our Main View would also just need a slight adjustment to work with our existing home screen widgets.

      Migrating Gracefully

      WidgetKit was introduced for watchOS complications in iOS 16. This update comes with a foreboding message from Apple:

      As soon as you offer a widget-based complication, the system stops calling ClockKit APIs. For example, it no longer calls your CLKComplicationDataSource object’s methods to request timeline entries. The system may still wake your data source for migration requests.

      We really care about our apps at Shopify, so we really needed to unpack what this meant, and how does this affect our merchants running older devices? With some testing on devices, we were able to find out, everything is fine.

      If you’re currently running WidgetKit complications and add support for lock screen complications, your ClockKit app and complications will continue to function as you’d expect.

      What we had assumed was that WidgetKit itself was taking the place of WatchOS complications; however, to use Widgetkit on WatchOS, you need to create a new target for the Watch. This makes sense, although the APIs are so similar we had assumed it was a one and done approach. One WidgetKit extension for both platforms.

      One thing to watch out for,  if you do implement the new WidgetKit on WatchOS, if your users are on WatchOS 9 and above will lose all of their complications from ClockKit. Apple did provide a migration API to support the change that’s called instead of your old complications.

      If you don’t have the luxury of just setting your target to iOS 16, your complications will continue to load up for those on WatchOS 8 and below from our testing.

      Shipping New Widgets

      We already had a set of widgets running on both platforms, now we had to decide how to transition to the new update as they would be replacing the existing implementation. On iOS we had two different widget kinds each with their own small widget (you can think of kinds as a widget group). With the new implementation, we wanted to provide a single widget kind that offered all three sizes. We didn’t find much documentation around the migration, so we simulated what happens to the widgets under different scenarios.

      If the merchant has a widget on their home screen and the app updates, one of two things would happen:

      1. The widget would become a white blank square (the kind IDs matched).
      2. The widget just disappeared altogether (the kind ID was changed).

      The initial plan (we had hoped for) was to make one of our widgets transform into the new widget while removing the other one. Given the above, this strategy wouldn’t work. This also includes some annoying tech debt since all of our Intent files would continue to mention the name of the old widget.

      The compromise we came to, so as to not completely remove all of a merchant’s widgets overnight, was to deprecate the old widgets at the same time we release the new ones. To deprecate, we updated our old widget’s UI to display a message informing that the widget is no longer supported, and the merchant must add the new ones. The lesson here is you have to be careful when you make decisions around widget grouping as it’s not easy to change.

      There’s no way to add a new widget programmatically or to bring the merchant to the widget gallery by tapping on the old widget. We also added some communication to help ease the transition by:

        • updating our help center docs, including information around how to use widgets 
        • pointing our old widgets to open the help center docs 
        • giving lots of time before removing the deprecation message.

        In the end, it wasn’t the most ideal situation, and we came away learning about the pitfalls within the two ecosystems. One piece of advice is to really reflect on current and future needs when defining which widgets to offer and how to split them, since a future modification may not be straightforward.

        Screen displaying the notice: This widget is no longer active. Add the new Shopify Insights widget for an improved view of your data. Learn more.
        Widget deprecation message

        What’s Next

        As we continue to learn about how merchants use our new generation of widgets, we’ll continue to hone in on the best experience across both our platforms. Our widgets were made to be flexible, and we’ll be able to continually grow the list of metrics we offer through our customization. This work opens the way for other teams at Shopify to build on what we’ve started and create more widgets to help Merchants too.

        2022 is a busy year with iOS 16 coming out. We’ve got a new WidgetKit experience to integrate to our watch complications, lock screen complications, and live activities hopefully later this year!

        Carlos Pereira is a Senior Developer based in Montreal, Canada, with more than a decade of experience building native iOS applications in Objective-C, Swift and now React Native. Currently contributing to the Shopify admin app.

        James Lockhart is a Staff Developer based in Ottawa, Canada. Experiencing mobile development over the past 10+ years: from Phonegap to native iOS/Android and now React native. He is an avid baker and cook when not contributing to the Shopify admin app.

        Cecilia Hunka is a developer on the Core Admin Experience team at Shopify. Based in Toronto, Canada, she loves live music and jigsaw puzzles.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        What is a Full Stack Data Scientist?

        What is a Full Stack Data Scientist?

        At Shopify, we've embraced the idea of full stack data science and are often asked, "What does it mean to be a full stack data scientist?". The term has seen a recent surge in the data industry, but there doesn’t seem to be a consensus on a definition. So, we chatted with a few Shopify data scientists to share our definition and experience.

        What is a Full Stack Data Scientist?

        "Full stack data scientists engage in all stages of the data science lifecycle. While you obviously can’t be a master of everything, full stack data scientists deliver high-impact, relatively quickly because they’re connected to each step in the process and design of what they’re building." - Siphu Langeni, Data Scientist

        "Full stack data science can be summed up by one word—ownership. As a data scientist you own a project end-to-end. You don't need to be an expert in every method, but you need to be familiar with what’s out there. This helps you identify what’s the best solution for what you’re solving for." - Yizhar (Izzy) Toren, Senior Data Scientist

        Typically, data science teams are organized to have different data scientists work on singular aspects of a data science project. However, a full stack data scientist’s scope covers a data science project from end-to-end, including:

        • Discovery and analysis: How you collect, study, and interpret data from a number of different sources. This stage includes identifying business problems.
        • Acquisition: Moving data from diverse sources into your data warehouse.
        • Data modeling: The process for transforming data using batch, streaming, and machine learning tools.

        What Skills Make a Successful Full Stack Data Scientist? 

        "Typically the problems you're solving for, you’re understanding them as you're solving them. That’s why you need to be constantly communicating with your stakeholders and asking questions. You also need good engineering practices. Not only are you responsible for identifying a solution, you also need to build the pipeline to ship that solution into production." - Yizhar (Izzy) Toren, Senior Data Scientist

        "The most effective full stack data scientists don't just wait for ad hoc requests. Instead, they proactively propose solutions to business problems using data. To effectively do this, you need to get comfortable with detailed product analytics and developing an understanding of how your solution will be delivered to your users." - Sebastian Perez Saaibi, Senior Data Science Manager

        Full stack data scientists are generalists versus specialists. As full stack data scientists own projects from end-to-end, they work with multiple stakeholders and teams, developing a wide range of both technical and business skills, including:

        • Business acumen: Full stack data scientists need to be able to identify business problems, and then ask the right questions in order to build the right solution.
        • Communication: Good communication—or data storytelling—is a crucial skill for a full stack data scientist who typically helps influence decisions. You need to be able to effectively communicate your findings in a way that your stakeholders will understand and implement. 
        • Programming: Efficient programming skills in a language like Python and SQL are essential for shipping your code to production.
        • Data analysis and exploration: Exploratory data analysis skills are a critical tool for every full stack data scientist, and the results help answer important business questions.
        • Machine learning: Machine learning is one of many tools a full stack data scientist can use to answer a business question or solve a problem, though it shouldn’t be the default. At Shopify, we’re proponents of starting simple, then iterating with complexity.  

        What’s the Benefit of Being a Full Stack Data Scientist? 

        “You get to choose how you want to solve different problems. We don't have one way of doing things because it really depends on what the problem you’re solving for is. This can even include deciding which tooling to use.”- Yizhar (Izzy) toren, Senior Data Scientist

        “You get maximum exposure to various parts of the tech stack, develop a confidence in collaborating with other crafts, and become astute in driving decision-making through actionable insights.” - Siphu Langeni, Data Scientist

        As a generalist, is a full stack data scientist a “master of none”? While full stack data scientists are expected to have a breadth of experience across the data science specialty, each will also bring additional expertise in a specific area. At Shopify, we encourage T-shaped development. Emphasizing this type of development not only enables our data scientists to hone skills they excel at, but it also empowers us to work broadly as a team, leveraging the depth of individuals to solve complex challenges that require multiple skill sets. 

        What Tips Do You Have for Someone Looking to Become a Full Stack Data Scientist? 

        “Full stack data science might be intimidating, especially for folks coming from academic backgrounds. If you've spent a career researching and focusing on building probabilistic programming models, you might be hesitant to go to different parts of the stack. My advice to folks taking the leap is to treat it as a new problem domain. You've already mastered one (or multiple) specialized skills, so look at embracing the breadth of full stack data science as a challenge in itself.” - Sebastian Perez Saaibi, Senior Data Science Manager

        “Ask lots of questions and invest effort into gathering context that could save you time on the backend. And commit to honing your technical skills; you gain trust in others when you know your stuff!” - Siphu Langeni, Data Scientist

        To sum it up, a full stack data scientist is a data scientist who:

        • Focuses on solving business problems
        • Is an owner that’s invested in an end-to-end solution, from identifying the business problem to shipping the solution to production
        • Develops a breadth of skills that cover the full stack of data science, while building out T-shaped skills
        • Knows which tool and technique to use, and when

        If you’re interested in tackling challenges as a full stack data scientist, check out Shopify’s career page.

        Continue reading

        Managing React Form State Using the React-Form Library

        Managing React Form State Using the React-Form Library

        One of Shopify’s philosophies when it comes to adopting a new technology isn’t only to level up the proficiency of our developers so they can implement the technology at scale, but also with the intent of sharing their new found knowledge and understanding of the tech with the developer community.

        In Part 1 (Building a Form with Polaris) of this series, we were introduced to Shopify’s Polaris Design System, an open source library used to develop the UI within our Admin and here in Part 2 we’ll delve further into Shopify’s open source Quilt repo that contains 72 npm packages, one of which is the react-form library. Each package was created to facilitate the adoption and standardization of React and each has its own README and thorough documentation to help get you started.

        The react-form Library

        If we take a look at the react-form library repo we can see that it’s used to:

        “Manage React forms tersely and safely-typed with no effort using React hooks. Build up your form logic by combining hooks yourself, or take advantage of the smart defaults provided by the powerful useForm hook.”

        The useForm and useField Hooks

        The documentation categorizes the API into three main sections: Hooks, Validation, and Utilities. There are eight hooks in total and for this tutorial we’ll focus our attention on just the ones most frequently used: useForm and useField.

        useForm is a custom hook for managing the state of an entire form and makes use of many of the other hooks in the API. Once instantiated, it returns an object with all of the fields you need to manage a form. When combined with useField, it allows you to easily build forms with smart defaults for common use cases. useField is a custom hook for handling the state and validations of an input field.

        The Starter CodeSandbox

        As this tutorial is meant to be a step-by-step guide providing all relevant code snippets along the way, we highly encourage you to fork this Starter CodeSandbox so you can code along throughout the tutorial.

        If you hit any roadblocks along the way, or just prefer to jump right into the solution code, here’s the Solution CodeSandbox.

        First Things First—Clean Up Old Code

        The useForm hook creates and manages the state of the entire form which means we no longer need to import or write a single line of useState in our component, nor do we need any of the previous handler functions used to update the input values. We still need to manage the onSubmit event as the form needs instructions as to where to send the captured input but the handler itself is imported from the useForm hook.

        With that in mind let’s remove all the following previous state and handler logic from our form.

        React isn’t very happy at the moment and presents us with the following error regarding the handleTitleChange function not being defined.

        handleTitleChange is not defined

        This occurs because both TextField Components are still referencing their corresponding handler functions that no longer exist. For the time being, we’ll remove both onChange events along with the value prop for both Components.

        Although we’re removing them at the moment, they’re still required as per our form logic and will be replaced by the fields object provided by useForm.

        React still isn’t happy and presents us with the following error in regards to the Page Component assigning onAction to the handleSubmit function that’s been removed.

        handleSubmit is not defined

        It just so happens that the useForm hook provides a submit function that does the exact same thing, which we’ll destructure in the next section. For the time being we’ll assign submit to onAction and place it in quotes so that it doesn’t throw an error.

        One last and final act of cleanup is to remove the import for useState, at the top of the file, as we’ll no longer manage state directly.

        Our codebase now looks like the following:

        Importing and Using the useForm and useField Hooks

        Now that our Form has been cleaned up, let’s go ahead and import both the useForm and useField hooks from react-form. Note, for this tutorial the shopify/react-form library has already been installed as a dependency.

        import { useForm, useField } from "@shopify/react-form”;

        If we take a look at the first example of useForm in the documentation, we can see that useForm provides us quite a bit of functionality in a small package. This includes several properties and methods that can be instantiated, in addition to accepting a configuration object that’s used to define the form fields and an onSubmit function.

        In order to keep ourselves focused on the basic functionality, we start with only capturing the inputs for our two fields, title and description, and then handle the submission of the form.  We’ll also pass in the configuration object, assigning useField() to each field, and lastly, an onSubmit function.

        Since we previously removed the value and onChange props from our TextField components the inputs no longer capture nor display text. They both worked in conjunction where onChange updated state, allowing value to display the captured input once the component re-rendered. The same functionality is still required but those props are now found in the fields object, which we can easily confirm by adding a console.log and viewing the output:

        If we do a bit more investigation and expand the description key, we see all of its additional properties and methods, two of which are onChange and value.

        With that in mind, let’s add the following to our TextField components:

        It’s clear from the code we just added that we’re destructuring the fields object and assigning the key that corresponds to the input’s label field. We should also be able to type into the inputs and see the text updated. The field object also contains additional properties and methods such as reset and dirty that we’ll make use of later when we connect our submit function.

        Submitting the Form

        With our TextField components all set up, it’s time to enable the form to be submitted. As part of the previous clean up process, we updated the Page Components onAction prop and now it’s time to remove the quotes.

        Now that we’ve enabled submission of the form, let’s confirm that the onSubmit function works and take a peek at the fields object by adding a console log.

        Let’s add a title and description to our new product and click Save.

        Adding a Product Screen with a Title and Description fields
        Adding A Product

        We see the following output:

        More Than Just Submitting

        When we reviewed the useForm documentation earlier we made note of all the additional functionality that it provides, two of which we will make use of now: reset and dirty.

        Reset the Form After Submission

        reset is a method and is used to clear the form, providing the user with a clean slate to add additional products once the previous one has been saved. reset should be called only after the fields have been passed to the backend and the data has been handled appropriately, but also before the return statement.

        If you input some text and click Save, our form should clear the input fields as expected.

        Conditionally Enable The Save Button

        dirty is used to disable the Save button until the user has typed some text into either of the input fields. The Page component manages the Save button and has a disabled property that we assign the value of !dirty because its value is set to false when imported, so we need to change that to true. 

        You should now notice that the Save button is disabled until you type into either of the fields, at which point Save is enabled 

        We can also validate that it’s now disabled by examining the Save button in developer tools.

        Developer Tools screenshot showing the code disabling the save button.
        Save Button Disabled

        Form Validation

        What we might have noticed when adding dirty, is that if the user types into either field the Save button is immediately enabled. One last aspect of our form is that we’ll require the Title field to contain some input before being allowed to submit the product. To do this we’ll import the notEmpty hook from react-form.

        Assigning it also requires that we now pass useField a configuration object that includes the following keys: value and validates. The value key keeps track of the current input value and validates provides us a means of validating input based on some criteria.

        In our case, we’ll prevent the form from being submitted if the title field is empty and provide the user an error message indicating that it’s a required field.

        Let’s give it a test run and confirm it’s working as expected. Try adding only a description and then click Save.

        Add Product screen with Title and Description fields. Title field is empty and showing error message “field is required.”
        Field is required error message

        As we can see our form implements all the previous functionality and then some, all of which was done via the useForm and useField hooks. There’s quite a bit more functionality that these specific hooks provide, so I encourage you to take a deeper dive and explore them further.

        This tutorial was meant to introduce you to Shopify’s open source react-form library that’s available in Shopify’s public Quilt repo. The repo provides many more useful React hooks such as the following to name a few:

        • react-graphql: for creating type-safe and asynchronous GraphQL components for React
        • react-testing:  for testing React components according to our conventions
        • react-i18n: which includes i18n utilities for handling translations and formatting.

        Joe Keohan is a Front End Developer at Shopify, located in Staten Island, NY and working on the Self Serve Hub team enhancing the buyer’s experience. He’s been instructing software engineering bootcamps for the past 6 years and enjoys teaching others about software development. When he’s not working through an algorithm, you’ll find him jogging, surfing and spending quality time with his family.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Leveraging Go Worker Pools to Scale Server-side Data Sharing

        Leveraging Go Worker Pools to Scale Server-side Data Sharing

        Maintaining a service that processes over a billion events per day poses many learning opportunities. In an ecommerce landscape, our service needs to be ready to handle global traffic, flash sales, holiday seasons, failovers and the like. Within this blog, I'll detail how I scaled our Server Pixels service to increase its event processing performance by 170%, from 7.75 thousand events per second per pod to 21 thousand events per second per pod.

        But First, What is a Server Pixel?

        Capturing a Shopify customer’s journey is critical to contributing insights into marketing efforts of Shopify merchants. Similar to Web Pixels, Server Pixels grants Shopify merchants the freedom to activate and understand their customers’ behavioral data by sending this structured data from storefronts to marketing partners. However, unlike the Web Pixel service, Server Pixels sends these events through the server rather than client-side. This server-side data sharing is proven to be more reliable allowing for better control and observability of outgoing data to our partners’ servers. The merchant benefits from this as they are able to drive more sales at a lower cost of acquisition (CAC). With regional opt-out privacy regulations built into our service, only customers who have allowed tracking will have their events processed and sent to partners.  Key events in a customer’s journey on a storefront are captured such as checkout completion, search submissions and product views. Server Pixels is a service written in Golang which validates, processes, augments, and consequently, produces more than one billion customer events per day. However, with the management of such a large number of events, problems of scale start to emerge.

        The Problem

        Server Pixels leverages Kafka infrastructure to manage the consumption and production of events. We began to have a problem with our scale when an increase in customer events triggered an increase in consumption lag for our Kafka’s input topic. Our service was susceptible to falling behind events if any downstream components slowed down. Shown in the diagram below, our downstream components process (parse, validate, and augment) and produce events in batches:

        Flow diagram showing the flow of information from Storefronts to Kafka Consumer to the Batch Processor to the Unlimited Go Routines to the Kafka Producer and finally to Third Party Partners.
        Old design

        The problem with our original design was that an unlimited number of threads would get spawned when batch events needed to be processed or produced. So when our service received an increase in events, an unsustainable number of goroutines were generated and ran concurrently.

        Goroutines can be thought of as lightweight threads that are functions or methods that run concurrently with other functions and threads. In a service, spawning an unlimited number of goroutines to execute increasingly growing tasks on a queue is never ideal. The machine executing these tasks will continue to expend its resources, like CPU and memory, until it reaches its limit. Furthermore, our team has a service level objective (SLO) of five minutes for event processing, so any delays in processing our data would exceed our processing beyond its timed deadline. In anticipation of three times the usual load for BFCM, we needed a way for our service to work smarter, not harder.

        Our solution? Go worker pools.

        The Solution

        A flow diagram showing the Go worker pool pattern
        Go worker pool pattern

        The worker pool pattern is a design in which a fixed number of workers are given a stream of tasks to process in a queue. The tasks stay in the queue until a worker is free to pick up the task and execute it. Worker pools are great for controlling the concurrent execution for a set of defined jobs. As a result of these workers controlling the amount of concurrent goroutines in action, less stress is put on our system’s resources. This design also worked perfectly for scaling up in anticipation of BFCM without relying entirely on vertical or horizontal scaling.

        When tasked with this new design, I was surprised at the intuitive setup for worker pools. The premise was creating a Go channel that receives a stream of jobs. You can think of Go channels as pipes that connect concurrent goroutines together, allowing them to communicate with each other. You send values into channels from one goroutine and receive those values into another goroutine. The Go workers retrieve their jobs from a channel as they become available, given the worker isn’t busy processing another job. Concurrently, the results of these jobs are sent to another Go channel or to another part of the pipeline.

        So let me take you through the logistics of the code!

        The Code

        I defined a worker interface that requires a CompleteJobs function that requires a go channel of type Job.

        The Job type takes the event batch, that’s integral to completing the task, as a parameter. Other types, like NewProcessorJob, can inherit from this struct to fit different use cases of the specific task.

        New workers are created using the function NewWorker. It takes workFunc as a parameter which processes the jobs. This workFunc can be tailored to any use case, so we can use the same Worker interface for different components to do different types of work. The core of what makes this design powerful is that the Worker interface is used amongst different components to do varying different types of tasks based on the Job spec.

        CompleteJobs will call workFunc on each Job as it receives it from the jobs channel. 

        Now let’s tie it all together.

        Above is an example of how I used workers to process our events in our pipeline. A job channel and a set number of numWorkers workers are initialized. The workers are then posed to receive from the jobs channel in the CompleteJobs function in a goroutine. Putting go before the CompleteJobs function allows the function to run in a goroutine!

        As event batches get consumed in the for loop above, the batch is converted into a Job that’s emitted to the jobs channel with the <- keyword. Each worker receives these jobs accordingly from the jobs channel. The goroutine we previously called with go worker.CompleteJobs(jobs, &producerWg) runs concurrently and receives these jobs.

        But wait, how do the workers know when to stop processing events?

        When the system is ready to be scaled down, wait groups are used to ensure that any existing tasks in flight are completed before the system shuts down. A waitGroup is a type of counter in Go that blocks the execution of a function until its internal counter becomes zero. As the workers were created above, the waitGroup counter was incremented for every worker that was created with the function producerWg.Add(1). In the CompleteJobs function wg.Done() is executed when the jobs channel is closed and jobs stop being received. wg.Done decrements the waitGroup counter for every worker.

        When a context cancel signal is received (signified by <- ctx.Done() above ), the remaining batches are sent to the Job channel so the workers can finish their execution. The Job channel is closed safely enabling the workers to break out of the loop in CompleteJobs and stop processing jobs. At this point, the WaitGroups’ counters are zero and the outputBatches channel,where the results of the jobs get sent to, can be closed safely. 

        The Improvements

        A flow diagram showing the new design flow from Storefronts to Kafka consumer to batch processor to Go worker pools to Kafka producer to Third party partners.
        New design

        Once deployed, the time improvement using the new worker pool design was promising. I conducted load testing that showed as more workers were added, more events could be processed on one pod. As mentioned before, in our previous implementation our service could only handle around 7.75 thousand events per second per pod in production without adding to our consumption lag.

        My team initially set the number of workers to 15 each in the processor and producer. This introduced a processing lift of  66% (12.9 thousand events per second per pod). By upping the workers to 50, we increased our event load by 149% from the old design resulting in 19.3 thousand events per second per pod. Currently, with performance improvements we can do 21 thousand events per second per pod. A 170% increase! This was a great win for the team and gave us the foundation to be adequately prepared for BFCM 2021, where we experienced a max of 46 thousand events per second!

        Go worker pools are a lightweight solution to speed up computation and allow concurrent tasks to be more performant. This go worker pool design has been reused to work with other components of our service such as validation, parsing, and augmentation. 

        By using the same Worker interface for different components, we can scale out each part of our pipeline differently to meet its use case and expected load. 

        Cheers to more quiet BFCMs!

        Kyra Stephen is a backend software developer on the Event Streaming - Customer Behavior team at Shopify. She is passionate about working on technology that impacts users and makes their lives easier. She also loves playing tennis, obsessing about music and being in the sun as much as possible.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Shopify Data’s Guide To Opportunity Sizing

        Shopify Data’s Guide To Opportunity Sizing

        For every initiative that a business takes on, there is an opportunity potential and a cost—the cost of not doing something else. But how do you tangibly determine the size of an opportunity?

        Opportunity sizing is a method that data scientists can use to quantify the potential impact of an initiative ahead of making the decision to invest in it. Although businesses attempt to prioritize initiatives, they rarely do the math to assess the opportunity, relying instead on intuition-driven decision making. While this type of decision making does have its place in business, it also runs the risk of being easily swayed by a number of subtle biases, such as information available, confirmation bias, or our intrinsic desire to pattern-match a new decision to our prior experience.

        At Shopify, our data scientists use opportunity sizing to help our product and business leaders make sure that we’re investing our efforts in the most impactful initiatives. This method enables us to be intentional when checking and discussing the assumptions we have about where we can invest our efforts.

        Here’s how we think about opportunity sizing.

        How to Opportunity Size

        Opportunity sizing is more than just a tool for numerical reasoning, it’s a framework businesses can use to have a principled conversation about the impact of their efforts.

        An example of opportunity sizing could look like the following equation: if we build feature X, we will acquire MM (+/- delta) new active users in T timeframe under DD assumptions.

        So how do we calculate this equation? Well, first things first, although the timeframe for opportunity sizing an initiative can be anything relevant to your initiative, we recommend an annualized view of the impact so you can easily compare across initiatives. This is important because when your initiative goes live, it can have a significant impact on the in-year estimated impact of your initiative.

        Diving deeper into how to size an opportunity, below are a few methods we recommend for various scenarios.

        Directional T-Shirt Sizing

        Directional t-shirt sizing is the most common approach when opportunity sizing an existing initiative and is a method anyone (not just data scientists) can do with a bit of data to inform their intuition. This method is based on rough estimates and depends on subject matter experts to help estimate the opportunity size based on similar experiences they’ve observed in the past and numbers derived from industry standards. The estimates used in this method rely on knowing your product or service and your domain (for example, marketing, fulfillment, etc.). Usually the assumptions are generalized, assuming overall conversion rates using averages or medians, and not specific to the initiative at hand.

        For example, let’s say your Growth Marketing team is trying to update an email sequence (an email to your users about a new product or feature) and is looking to assess the size of the opportunity. Using the directional t-shirt sizing method, you can use the following data to inform your equation:

        1. The open rates of your top-performing content
        2. The industry average of open rates 

        Say your top-performing content has an open rate of five percent, while the industry average is ten percent. Based on these benchmarks, you can assume that the opportunity can be doubled (from five to ten percent).

        This method offers speed over accuracy, so there is a risk of embedded biases and lack of thorough reflection on the assumptions made. Directional t-shirt sizing should only be used in the stages of early ideation or sanity checking. Opportunity sizing for growth initiatives should use the next method: bottom-up.

        A matrix diagram with high rigor, lower rigor, new initiatives and existing initiatives as categories. It highlights that Directional T-Shirt sizing requires lower rigor opportunities
        Directional t-shirt sizing should be used for existing initiatives that require lower rigor opportunities.

        Bottom-Up Using Comparables

        Unlike directional t-shirt sizing, the bottom-up method uses the performance of a specific comparable product or system as a benchmark, and relies on the specific skills of a data scientist to make data-informed decisions. The bottom-up method is used to determine the opportunity of an existing initiative. The bottom-up method relies on observed data on similar systems, which means it tends to have a higher accuracy than directional t-shirt sizing. Here are some tips for using the bottom-up method:

        1. Understand the performance of a product or system that is comparable.

        To introduce any enhancements to your current product or system, you need to understand how it’s performing in the first place. You’ll want to identify, observe and understand the performance rates in a comparable product or system, including the specifics of its unique audience and process.

        For example, let’s say your Growth Marketing team wants to localize a new welcome email to prospective users in Italy that will go out to 100,000 new leads per year. A comparable system could be a localized welcome email in France that the team sent out the prior year. With your comparable system identified, you’ll want to dig into some key questions and performance metrics like:

        • How many people received the email?
        • Is there anything unique about that audience selection? 
        • What is the participation rate of the email? 
        • What is the conversion rate of the sequence? Or in other words, of those that opened your welcome email, how many converted to customers?

        Let’s say we identified that our current non-localized email in Italy has a click through rate (CTR) of three percent, while our localized email in France has a CTR of five percent over one year. Based on the metrics of your comparable system, you can identify a base metric and make assumptions of how your new initiative will perform.

        2. Be clear and document your assumptions.

        As you think about your initiative, be clear and document your assumptions about its potential impact and the why behind each assumption. Using the performance metrics of your comparable system, you can generate an assumed base metric and the potential impact your initiative will have on that metric. With your base metric in hand, you’ll want to consider the positive and negative impacts your initiative may have, so quantify your estimate in ranges with an upper and lower bound.

        Returning to our localized welcome email example, based on the CTR metrics from our comparable system we can assume the impact of our Italy localization initiative: if we send out a localized welcome email to 100,000 new leads in Italy, we will obtain a CTR between three and five percent (+/- delta) in one year. This is based on our assumptions that localized content will perform better than non-localized content, as seen in the performance metrics of our localized welcome email in France.

        3. Identify the impact of your initiative on your business’ wider goals.

        Now that you have your opportunity sizing estimate for your initiative, the next question that comes to mind is “what does that mean for the rest of your business goals?”. To answer this, you’ll want to estimate the impact on your top-line metric. This enables you to compare different initiatives with an apples-to-apples lens, while also avoiding the tendency to bias to larger numbers when making comparisons and assessing impact. For example, a one percent change in the number of sessions can look much bigger than a three percent change in the number of customers which is further down the funnel.

        Returning to our localized welcome email example, we should ask ourselves how an increase in CTR impacts our topline metric of active user count? Let’s say that when we localized the email in France, we saw an increase of five percent in CTR that translated to a three percent increase in active users per year. Accordingly, if we localize the welcome email in Italy, we may expect to get a three percent increase which would translate to 3,000 more active users per year.

        Second order thinking is a great asset here. It’s beneficial for you to consider potential modifiers and their impact. For instance, perhaps getting more people to click on our welcome email will reduce our funnel performance because we have lower intent people clicking through. Or perhaps it will improve funnel performance because people are better oriented to the offer. What are the ranges of potential impact? What evidence do we have to support these ranges? From this thinking, our proposal may change: we may not be able to just change our welcome email, we may also have to change landing pages, audience selection, or other upstream or downstream aspects.

        A matrix diagram with high rigor, lower rigor, new initiatives and existing initiatives as categories. It highlights that Bottom-up sizing is used for existing initiatives that requires higher rigor opportunities
        Bottom-up opportunity sizing should be used for existing initiatives that require higher rigor opportunities.


        The top-down method should be used when opportunity sizing a new initiative. This method is more nuanced as you’re not optimizing something that exists. With the top-down method, you’ll start by using a larger set of vague information, which you’ll then attempt to narrow down into a more accurate estimation based on assumptions and observations. 2

        Here are a few tips on how to implement the top-down method:

        1. Gather information about your new initiative.

        Unlike the bottom-up method, you won’t have a comparable system to establish a base metric. Instead, seek as much information on your new initiative as you can from internal or external sources.

        For example, let’s say you’re looking to size the opportunity of expanding your product or service to a new market. In this case, you might want to get help from your product research team to gain more information on the size of the market, number of potential users in that market, competitors, etc.

        2. Be clear and document your assumptions.

        Just like the bottom-up method, you’ll want to clearly identify your estimates and what evidence you have to support them. For new initiatives, typically assumptions are going to lean towards being more optimistic than existing initiatives because we’re biased to believe that our initiatives will have a positive impact. This means you need to be rigorous in testing your assumptions as part of this sizing process. Some ways to test your assumptions include:

        • Using the range of improvement of previous initiative launches to give you a sense of what's possible. 
        • Bringing the business case to senior stakeholders and arguing your case. Often this makes you have to think twice about your assumptions.

        You should be conservative in your initial estimates to account for this lack of precision in your understanding of the potential.

        Looking at our example of opportunity sizing a new market, we’ll want to document some assumptions about:

        • The size of the market: What is the size of the existing market versus the new market size. You can gather this information from external datasets. In the absence of data on a market or audience, you can also make assumptions based on similar audiences or regions elsewhere. 
        • The rate at which you think you can reach and engage this market: This includes the assumed conversion rates of new users. The conversion rates may be assumed to be similar to past performance when a new channel or audience was introduced. You can use the tips identified in the bottom-up method.

        3. Identify the impact of your initiative on your business’ wider goals.

        Like the bottom-up method, you need to assess the impact your initiative will have on your business’ wider goals. Based on the above example, what does the assumed impact of our initiative mean in terms of active users?

        A matrix diagram with high rigor, lower rigor, new initiatives and existing initiatives as categories. It highlights that Top-down sizing is for new initiatives that require higher rigor opportunities
        Top-down opportunity sizing should be used for new initiatives that require higher rigor opportunities.

        And there you have it! Opportunity sizing is a worthy investment that helps you say yes to the most impactful initiatives. It’s also a significant way for data science teams to help business leaders prioritize and steer decision-making. Once your initiative launches, test to see how close your sizing estimates were to actuals. This will help you hone your estimates over time.

        Next time your business is outlining its product roadmap, or your team is trying to decide whether it’s worth it to take on a particular project, use our opportunity sizing basics to help identify the potential opportunity (or lack thereof).

        Dr. Hilal is the VP of Data Science at Shopify, responsible for overseeing the data operations that power the company’s commercial and service lines.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Data Science & Engineering career page to find out about our open positions. Learn about how we’re hiring to design the future together—a future that is Digital by Design.

        Continue reading

        How We Built Oxygen: Hydrogen’s Counterpart for Hosting Custom Storefronts

        How We Built Oxygen: Hydrogen’s Counterpart for Hosting Custom Storefronts

        In June, we shared the details of how we built Hydrogen, our React-based framework for building custom storefronts. We talked about some of the big bets we made on new technologies like React Server components, and the many internal and external collaborations that made Hydrogen a reality.  

        This time we tell the story of Oxygen, Hydrogen’s counterpart that makes hosting Hydrogen custom storefronts easy and seamless. Oxygen guarantees fast and globally available storefronts that securely integrate with the Shopify ecosystem while eliminating additional costs of setting up third-party hosting tools. 

        We’ll dive into the experiences we focused on, the technical choices we made to build those experiences, and how those choices paved the path for Shopify to get involved in standardizing serverless runtimes in partnership with leaders like Cloudflare and Deno.

        Shopify-Integrated Merchant Experience

        Let’s first briefly look at why we built Oxygen. There are existing products in the market that can host Hydrogen custom storefronts. Oxygen’s uniqueness is in the tight integration it provides with Shopify. Our technical choices so far have largely been grounded in ensuring this integration is frictionless for the user.

        We started with GitHub for version control, GitHub actions for continuous deployment, and Cloudflare for worker runtimes and edge distribution. We combined these third-party services with first-party services such as Shopify CDN, Shopify Admin API, and Shopify Identity and Access Management. They’re glued together by Oxygen-scoped services that additionally provide developer tooling and observability. Oxygen today is the result of bundling together this collection of technologies.

        A flow diagram highlighting the interaction between the developer, GitHub, Oxygen, Cloudflare, the buyer, and Shopify
        Oxygen overview

        We introduced the Hydrogen sales channel as the connector between Hydrogen, Oxygen, and the shop admin. The Hydrogen channel is the portal that provides controls to create and manage custom storefronts, link them to the rest of shop administrative functions, and connect them to Oxygen for hosting. It is built on Shopify’s standard Rails and React stack, leveraging Polaris design system for consistent user experience across Shopify-built admin experiences.

        Fast Buyer Experience

        Oxygen exists to give merchants the confidence that Shopify will deliver an optimal buyer experience while merchants focus on their entrepreneurial objectives. Optimal buyer experience in Oxygen’s context is a combination of high availability guarantees, super fast site performance from anywhere in the world, and resilience to handle high-volume traffic.

        To Build or To Reuse

        This is where we had the largest opportunity to contemplate our technical direction. We could leverage over a decade’s experience at Shopify in building infrastructure solutions that keep the entire Shopify platform up and running to build an infrastructure layer, control plane, and proprietary V8 isolates. In fact, we did briefly and it was a successful venture! However, we ultimately decided to opt for Cloudflare’s battle-hardened worker infrastructure that guarantees buyer access to storefronts within milliseconds due to global edge distribution.

        This foundational decision significantly simplified upfront infrastructural complexity, scale and security risk considerations, allowing us to get Oxygen to merchants faster and validate our bets. These choices also leave us enough room to go back and build our own proprietary version at scale or a simpler variation of it if it makes sense both for the business and the users.

        A flow diagram highlighting the interactions between the Shopify store, Cloudflare, and the Oxygen workers.
        Oxygen workers overview

        We were able to provide merchants and buyers the promised fast performance while locking in access controls. When a buyer makes a request to a Hydrogen custom storefront hosted at, that request is received by Oxygen’s Gateway Worker running in Cloudflare. This worker is responsible for validating that the accessor has the necessary authorization to the shop and the specific storefront version before routing them to the Storefront Worker that is running the Hydrogen-based storefront code. The worker chaining is made possible using Cloudflare’s new Dynamic Dispatch API from Workers for Platforms.

        Partnerships and Open Source

        Rather than reinventing the wheel, we took the opportunity to work with leaders in the JavaScript runtimes space to collectively evolve the ecosystem. We use and contribute to the Workers for Platforms solution through tight feedback loops with Cloudflare. We also jointly established WinterCG, a JavaScript runtimes community group in partnership with Cloudflare, Vercel, Deno, and others. We leaned in to collectively building with and for the community, just the way we like it at Shopify.

        Familiar Developer Experience

        Oxygen largely provides developer-oriented capabilities and we strive to provide a developer experience that cohesively integrates with existing developer workflows. 

        Continuous Data-Informed Improvements

        While the Oxygen platform takes care of the infrastructure and distribution management of custom storefronts, it surfaces critical information about how custom storefronts are performing in production, ensuring fast feedback loops throughout the development lifecycle. Specifically, runtime logs and metrics are surfaced through the observability dashboard within the Hydrogen channel for troubleshooting and performance trend monitoring. The developer-oriented user can extrapolate actions necessary to further improve the site quality.

        A flow diagram highlighting the observability enablement side
        Oxygen observability overview

        We made very deliberate technical choices again on the observability enablement side. Unlike Shopify’s internal observability stack, Oxygen’s observability stack consists of Grafana for dashboards and alerting, Cortex for metrics, Loki for logging, and Tempo for tracing. Under the hood, Oxygen’s Trace Worker runs in Cloudflare and attaches itself to Storefront Workers to capture all of the logging and metrics information and forwards all of it to our Grafana stack. Logs are sent to Loki and metrics are sent to Cortex where the Oxygen Observability Service pulls both on-demand when the Hydrogen channel requests it.

        The tech stack was chosen for two key purposes: Oxygen provided a good test bed to experiment and evaluate these tools for a potential long-term fit for the rest of Shopify, and Oxygen’s use case is fundamentally different from internal Shopify. To support the latter, we needed a way to separate internal-facing from external-facing metrics cleanly while scaling to the data loads. We also needed the tool to be flexible enough that we can provide merchants optionality to integrate with any existing monitoring tools in their workflows.

        What’s Next

        Thanks to many flexible, eager, and collaborative merchants who maintained tight feedback loops every step of the way, Oxygen is used in production today by Allbirds, Shopify Supply, Shopify Hardware, and Denim Tears. It is generally available to our Plus merchants as of June.

        We’re just getting started though! We have our eyes on unlocking composable, plug-and-play styled usage in addition to surfacing deeper development insights earlier in the development lifecycle to shorten feedback loops. We also know there is a lot of opportunity for us to enhance the developer experience by reducing the number of surfaces to interact with, providing more control from the command line, and generally streamlining the Oxygen developer tools with the overall Shopify developer toolbox.

        We’re eager to take in all the merchant feedback as they demand the best of us. It helps us discover, learn, and push ourselves to revalidate assumptions, which will ultimately create new opportunities for the platform to evolve.

        Curious to learn more? We encourage you to check out the docs!

        Sneha is an engineering leader on the Oxygen team and has contributed to various teams building tools for a developer-oriented audience.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        How We Enable Two-Day Delivery in the Shopify Fulfillment Network

        How We Enable Two-Day Delivery in the Shopify Fulfillment Network

        Merchants sign up for Shopify Fulfillment Network (SFN) to save time on storing, packing, and shipping their merchandise, which is distributed and balanced across our network of Shopify-certified warehouses throughout North America. We optimize the placement of inventory so it’s closer to buyers, saving on shipping time, costs, and carbon emissions.

        Recently, SFN also made it easier and more affordable for merchants to offer two-day delivery  across the United States. We consider many real-world factors to provide accurate delivery dates, optimizing for sustainable ground shipment methods and low cost.

        A Platform Solution

        As with most new features at Shopify, we couldn’t just build a custom solution for SFN. Shopify is an extensible platform with a rich ecosystem of apps that solve complex merchant problems. Shopify Fulfillment Network is just one of the many third-party logistics (3PL) solutions available to merchants. 

        This was a multi-team initiative. One team built the delivery date platform in the core of Shopify, consisting of a new set of GraphQL APIs that any 3PL can use to upload their delivery dates to the Shopify platform. Another team integrated the delivery dates into the Shopify storefront, where they are shown to buyers on the product details page and at checkout.  A third team built the system to calculate and upload SFN delivery dates to the core platform. SFN is an app that merchants install in their shops, in the same way other 3PLs interact with the Shopify platform. The SFN app calls the new delivery date APIs to upload its own delivery dates to Shopify. For accuracy, the SFN delivery dates are calculated using network, carrier and product data. Let’s take a closer look at these inputs to the delivery date.

        Four Factors That Determine Delivery Date

        There are many factors to predict when a package will leave the warehouse, and how long it will spend in transport to the destination address. Each 3PL has its own particular considerations, and the Shopify platform is flexible enough to support them all. Requiring them to conform to fine-grained platform primitives such as operating days and capacity or processing and transit times would only result in loss of fidelity of their particular network and processes.

        With that in mind, we let 3PLs populate the platform with their pre-computed delivery dates that Shopify surfaces to buyers on the product details page and checkout. The 3PL has full control over all the factors that affect their delivery dates. Let’s take a look at some of these factors.

        1. Proximity to the Destination

        The time required for delivery is dependent on the distance the package must travel to arrive at its destination. Usually, the closer the inventory is to the destination, the faster the delivery. This means that SFN delivery dates depend on specific inventory availability throughout the network. 

        2. Heavy, Bulky, or Dangerous Goods

        Some carrier services aren’t applicable to merchandise that exceeds a specific weight or dimensions. Others can’t be used to transport hazardous materials. The delivery date we predict for such items must be based on a shipping carrier service that can meet the requirements.

        3. Time Spent at Fulfillment Centers

        Statutory holidays and warehouse closures affect when the package can be ready for shipment. Fulfillment centers have regular operating calendars and sometimes exceptional circumstances can force them to close, such as severe weather events.

        On any given operating day, warehouse staff need time to pick and pack items into packages for shipment. There’s also a physical limit to how many packages can be processed in a day. Again, exceptional circumstances such as illness can reduce the staff available at the fulfillment center, reducing capacity and increasing processing times.

        4. Time Spent in the Hands of Shipping Carriers

        In SFN, shipping carriers such as UPS and USPS are used to transport packages from the warehouse to their destination. Just like the warehouses, shipping carriers have their own holidays and closures that affect when the package can be picked up, transported, and delivered. These are modeled as carrier transit days, when packages are moved between hubs in the carrier’s own network, and delivery days, when packages are taken from the last hub to the final delivery address.

        Shipping carriers send trucks to pick up packages from the warehouse at scheduled times of the day. Orders that are made after the last pickup for the day have to wait until the next day to be shipped out. Some shipping carriers only pick up from the warehouse if there are enough packages to make it worth their while. Others impose a limit on the volume of packages they can take away from the warehouse, according to their truck dimensions. These capacity limits influence our choice of carrier for the package.

        Shipping carriers also publish the expected number of days it takes for them to transport a package from the warehouse to its destination (called Time in Transit). The first transit day is the day after pickup, and the last transit day is the delivery day. Some carriers deliver on Saturdays even though they won’t transport the package within their network on a Saturday.

        Putting It All Together

        Together, all of these factors are considered when we decide which fulfillment center should process a shipment and which shipping carrier service to use to transport it to its final destination. At regular intervals, we select a fulfillment center and carrier service that optimizes for sustainable two-day delivery for every merchant SKU to every US zip code. From this, we upload pre-calculated schedules of delivery dates to the Shopify platform.

        That’s a Lot of Data

        It’s a lot of data, but much of it can be shared between merchant SKUs. Our strategy is to produce multiple schedules, each one reflecting the delivery dates for inventory available at the same set of fulfillment centers and with similar characteristics such as weight and dimensions. Each SKU is mapped to a schedule, and SKUs from different shops can share the same schedule.

        Example mapping of SKUs to schedules of delivery dates
        Example mapping of SKUs to schedules of delivery dates

        In this example, SKU-1.2 from Shop 1 and SKU-3.1 from Shop 3 share the same schedule of delivery dates for heavy items stocked in both California and New York. If an order is placed today by 4pm EST for SKU-1.2 or SKU-3.1, shipping to zip code 10019 in New York, it will arrive on June 10. Likewise, if an order is placed today for SKU-1.2 or SKU-3.1 by 3pm PST, shipping to zip code 90002 in California, it will arrive on June 11.

        Looking Up Delivery Dates in Real Time

        When Shopify surfaces a delivery date on a product details page or at checkout, it’s a direct, fast lookup (under 100 ms) to find the pre-computed date for that item to the destination address. This is because the SFN app uploads pre-calculated schedules of delivery dates to the Shopify platform and maps each SKU to a schedule using the delivery date APIs.  

        SFN System overview
        System overview

        The date the buyer sees at checkout is sent back to SFN during order fulfillment, where it’s used to ensure that the order is routed to a warehouse and shipping label that meets the delivery date.

        There you have it, a highly simplified overview of how we built two-day delivery in the SFN.

        Learn More About SFN

        Spin Cycle: Shopify's SFN Team Overcomes a Cloud-Development Spiral

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Instant Performance Upgrade: From FlatList to FlashList

        Instant Performance Upgrade: From FlatList to FlashList

        When was the last time you scrolled through a list? Odds are it was today, maybe even seconds ago. Iterating over a list of items is a feature common to many frameworks, and React Native’s FlatList is no different. The FlatList component renders items in a scrollable view without you having to worry about memory consumption and layout management (sort of, we’ll explain later).

        The challenge, as many developers can attest to, is getting FlatList to perform on a range of devices without display artifacts like drops in UI frames per second (FPS) and blank items while scrolling fast.

        Our React Native Foundations team solved this challenge by creating FlashList, a fast and performant list component that can be swapped into your project with minimal effort. The requirements for FlashList included

        • High frame rate: even when using low-end devices, we want to guarantee the user can scroll smoothly, at a consistent 60 FPS or greater.
        • No more blank cells: minimize the display of empty items caused by code not being able to render them fast enough as the user scrolls (and causing them to wonder what’s up with their app).
        • Developer friendly: we wanted FlashList to be a drop-in replacement for FlatList and also include helpful features beyond what the current alternatives offer.

        The FlashList API is five stars. I've been recommending all my friends try FlashList once we open source.

        Daniel Friyia, Shopify Retail team
        Performance comparison between FlatList and FlashList

        Here’s how we approached the FlashList project and its benefits to you.

        The Hit List: Why the World Needs Faster Lists

        Evolving FlatList was the perfect match between our mission to continuously create powerful components shared across Shopify and solving a difficult technical challenge for React Native developers everywhere. As more apps move from native to React Native, how could we deliver performance that met the needs of today’s data-hungry users while keeping lists simple and easy to use for developers?

        Lists are in heavy use across Shopify, in everything from Shop to POS. For Shopify Mobile in particular, where over 90% of the app uses native lists, there was growing demand for a more performant alternative to FlatList as we moved more of our work to React Native. 

        What About RecyclerListView?

        RecyclerListView is a third-party package optimized for rendering large amounts of data with very good real-world performance. The difficulty lies in using it, as developers must spend a lot of time understanding how it works, playing with it, and requiring significant amounts of code to manage. For example, RecyclerListView needs predicted size estimates for each item and if they aren’t accurate, the UI performance suffers. It also renders the entire layout in JavaScript, which can lead to visible glitches.

        When done right, RecyclerListView works very well! But it’s just too difficult to use most of the time. It’s even harder if you have items with dynamic heights that can’t be measured in advance.

        So, we decided to build upon RecyclerListView and solve these problems to give the world a new list library that achieved native-like performance and was easy to use.

        Shop app's search page using FlashList

        The Bucket List: Our Approach to Improving React Native Lists

        Our React Native Foundations team takes a structured approach to solving specific technical challenges and creates components for sharing across Shopify apps. Once we’ve identified an ambitious problem, we develop a practical implementation plan that includes rigorous development and testing, and an assessment of whether something should be open sourced or not. 

        Getting list performance right is a popular topic in the React Native community, and we’d heard about performance degradations when porting apps from native to React Native within Shopify. This was the perfect opportunity for us to create something better. We kicked off the FlashList project to build a better list component that had a similar API to FlatList and boosted performance for all developers. We also heard some skepticism about its value, as some developers rarely saw issues with their lists on newer iOS devices.

        The answer here was simple. Our apps are used on a wide range of devices, so developing a performant solution across devices based on a component that was already there made perfect sense.

        “We went with a metrics-based approach for FlashList rather than subjective, perception-based feelings. This meant measuring and improving hard performance numbers like blank cells and FPS to ensure the component worked on low-end devices first, with high-end devices the natural byproduct.” - David Cortés, React Native Foundations team

        FlashList feedback via Twitter

        Improved Performance and Memory Utilization

        Our goal was to beat the performance of FlatList by orders of magnitude, measured by UI thread FPS and JavaScript FPS. With FlatList, developers typically see frame drops, even with simple lists. With FlashList, we improved upon the FlatList approach of generating new views from scratch on every update by using an existing, already allocated view and refreshing elements within it (that is, recycling cells). We also moved some of the layout operations to native, helping smooth over some of the UI glitches seen when RecyclerListView is provided with incorrect item sizes.

        This streamlined approach boosted performance to 60 FPS or greater—even on low-end devices!

        Comparison between FlatList and FlashList via Twitter (note this is on a very low-end device)

        We applied a similar strategy to improve memory utilization. Say you have a Twitter feed with 200-300 tweets per page, FlatList starts rendering with a large number of items to ensure they’re available as the user scrolls up or down. FlashList, in comparison, requires a much smaller buffer which reduces memory footprint, improves load times, and keeps the app significantly more responsive. The default buffer is just 250 pixels.

        Both these techniques, along with other optimizations, help FlashList achieve its goal of no more blank cells, as the improved render times can keep up with user scrolling on a broader range of devices. 

        Shopify Mobile is using FlashList as the default and the Shop team moved search to FlashList. Multiple teams have seen major improvements and are confident in moving the rest of their screens.

        Talha Naqvi, Software Development Manager, Retail Accelerate

        These performance improvements included extensive testing and benchmarking on Android devices to ensure we met the needs of a range of capabilities. A developer may not see blank items or choppy lists on the latest iPhone but that doesn’t mean it’ll work the same on a lower-end device. Ensuring FlashList was tested and working correctly on more cost-effective devices made sure that it would work on the higher-end ones too.

        Developer Friendly

        If you know how FlatList works, you know how FlashList works. It’s easy to learn, as FlashList uses almost the same API as FlatList, and has new features designed to eliminate any worries about layout and performance, so you can focus on your app’s value.

        Shotgun's main feature is its feed, so ensuring consistent and high-quality performance has always been crucial. That's why using FlashList was a no brainer. I love that compared to the classic FlatList, you can scrollToIndex to an index that is not within your view. This allowed us to quickly develop our new event calendar list, where users can jump between dates to see when and where there are events.

        It takes seconds to swap your existing list implementation from FlatList to FlashList. All you need to do is change the component name and optionally add the estimatedItemSize prop, as you can see in this example from our documentation:

        Example of FlashList usage.

        Powerful Features

        Going beyond the standard FlatList props, we added new features based on common scenarios and developer feedback. Here are three examples:

        • getItemType: improves render performance for lists that have different types of items, like text vs. image, by leveraging separate recycling pools based on type.
        Example of getItemType usage
        • Flexible column spans: developers can create grid layouts where each item’s column span can be set explicitly or with a size estimate (using overrideItemLayout), providing flexibility in how the list appears.
        • Performance metrics: as we moved many layout functions to native, FlashList can extract key metrics for developers to measure and troubleshoot performance, like reporting on blank items and FPS. This guide provides more details.

        Documentation and Samples

        We provide a number of resources to help you get started with FlashList:

        Flash Forward with FlashList

        Accelerate your React Native list performance by installing FlashList now. It’s already deployed within Shopify Mobile, Shop, and POS and we’re working on known issues and improvements. We recommend starring the FlashList repo and can’t wait to hear your feedback!

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        When is JIT Faster Than A Compiler?

        When is JIT Faster Than A Compiler?

        I had this conversation over and over before I really understood it. It goes:

        “X language can be as fast as a compiled language because it has a JIT compiler!”
        “Wow! It’s as fast as C?”
        “Well, no, we’re working on it. But in theory, it could be even FASTER than C!”
        “Really? How would that work?”
        “A JIT can see what your code does at runtime and optimize for only that. So it can do things C can’t!”
        “Huh. Uh, okay.”

        It gets hand-wavy at the end, doesn’t it? I find that frustrating. These days I work on YJIT, a JIT for Ruby. So I can make this extremely NOT hand-wavy. Let’s talk specifics.

        I like specifics.

        Wait, What’s JIT Again?

        An interpreter reads a human-written description of your app and executes it. You’d usually use interpreters for Ruby, Python, Node.js, SQL, and nearly all high-level dynamic languages. When you run an interpreted app, you download the human-written source code, and you have an interpreter on your computer that runs it. The interpreter effectively sees the app code for the first time when it runs it. So an interpreter doesn’t usually spend much time fixing or improving your code. They just run it how it’s written. An interpreter that significantly transforms your code or generates machine code tends to be called a compiler.

        A compiler typically turns that human-written code into native machine code, like those big native apps you download. The most straightforward compilers are ahead-of-time compilers. They turn human-written source code into a native binary executable, which you can download and run. A good compiler can greatly speed up your code by putting a lot of effort into improving it ahead of time. This is beneficial for users because the app developer runs the compiler for them. The app developer pays the compile cost, and users get a fast app. Sometimes people call anything a compiler if it translates from one kind of code to another—not just source code to native machine code. But when I say “compiler” here, I mean the source-code-to-machine-code kind.

        A JIT, aka a Just-In-Time compiler, is a partial compiler. A JIT waits until you run the program and then translates the most-used parts of your program into fast native machine code. This happens every time you run your program. It doesn’t write the code to a file—okay, except MJIT and a few others. But JIT compilation is primarily a way to speed up an interpreter—you keep the source code on your computer, and the interpreter has a JIT built into it. And then long-running programs go faster.

        It sounds kind of inefficient, doesn’t it? Doing it all ahead of time sounds better to me than doing it every time you run your program.

        But some languages are really hard to compile correctly ahead of time. Ruby is one of them. And even when you can compile ahead of time, often you get bad results. An ahead-of-time compiler has to create native code that will always be correct, no matter what your program does later, and sometimes that means it’s about as bad as an interpreter, which has that exact same requirement.

        Ruby is Unreasonably Dynamic

        Ruby is like my four-year-old daughter: the things I love most about it are what make it difficult.

        In Ruby, I can redefine + on integers like 3, 7, or -45. Not just at the start—if I wanted, I could write a loop and redefine what + means every time through that loop. My new + could do anything I want. Always return an even number? Print a cheerful message? Write “I added two numbers” to a log file? Sure, no problem.

        That’s thrilling and wonderful and awful in roughly equal proportions.

        And it’s not just +. It’s every operator on every type. And equality. And iteration. And hashing. And so much more. Ruby lets you redefine it all.

        The Ruby interpreter needs to stop and check every time you add two numbers if you have changed what + means in between. You can even redefine + in a background thread, and Ruby just rolls with it. It picks up the new + and keeps right on going. In a world where everything can be redefined, you can be forgiven for not knowing many things, but the interpreter handles it.

        Ruby lets you do awful, awful things. It lets you do wonderful, wonderful things. Usually, it’s not obvious which is which. You have expressive power that most languages say is a very bad idea.

        I love it.

        Compilers do not love it.

        When JITs Cheat, Users Win

        Okay, we’ve talked about why it’s hard for ahead-of-time (AOT) compilers to deliver performance gains. But then, how do JIT compilers do it? Ruby lets you constantly change absolutely everything. That’s not magically easy at runtime. If you can’t compile + or == or any operator, why can you compile some parts of the program?

        With a JIT, you have a compiler around as the program runs. That allows you to do a trick.

        The trick: you can compile the method wrong and still get away with it.

        Here’s what I mean.

        YJIT asks, “Well, what if you didn’t change what + means every time?” You almost never do that. So it can compile a version of your method where + keeps its meaning from right now. And so does equals, iteration, hashing, and everything you can change in Ruby but you nearly never do.

        But… that’s wrong. What if I do change those things? Sometimes apps do. I’m looking at you, ActiveRecord.

        But your JIT has a compiler around at runtime. So when you change what + means, it will throw away all those methods it compiled with the old definition. Poof. Gone. If you call them again, you get the interpreted version again. For a while—until JIT compiles a new version with the new definition. This is called de-optimization. When the code starts being wrong, throw it away. When 3+4 stops being 7 (hey, this is Ruby!), get rid of the code that assumed it was. The devil is in the details—switching from one version of a method to another version midway through is not easy. But it’s possible, and JIT compilers basically do it successfully.

        So your JIT can assume you don’t change + every time through the loop. Compilers and interpreters can’t get away with that.

        An AOT compiler has to create fully correct code before your app even ships. It’s very limited if you change anything. And even if it had some kind of fallback (“Okay, I see three things 3+4 could be in this app”), it can only respond at runtime with something it figured out ahead of time. Usually, that means very conservative code that constantly checks if you changed anything.

        An interpreter must be fully correct and respond immediately if you change anything. So they normally assume that you could have redefined everything at any time. The normal Ruby interpreter spends a lot of time checking if you changed the definition of + over time. You can do  clever things to speed up that check, and CRuby does. But if you make your interpreter extremely clever, pre-building optimized code and invalidating assumptions, eventually you realize that you’ve built a JIT.

        Ruby and YJIT

        I work on YJIT, which is part of CRuby. We do the stuff I mention here. It’s pretty fast.

        There are a lot of fun specifics to figure out. What do we track? How do we make it faster? When it’s invalid, do we need to recompile or cancel it? Here’s an example I wrote recently.

        You can try out our work by turning on --yjit on recent Ruby versions. You can use even more of our work if you build the latest head-of-master Ruby, perhaps with ruby-build 3.2.0-dev. You can also get all the details by reading the source, which is built right into CRuby.

        By the way, YJIT has some known bugs in 3.1 that mean you should NOT use it for real production. We’re a lot closer now—it should be production-ready for some uses in 3.2, which comes out Christmas 2022.

        What Was All That Again?

        A JIT can add assumptions to your code, like the fact that you probably didn’t change what + means. Those assumptions make the compiled code faster. If you do change what + means, you can throw away the now-incorrect code.

        An ahead-of-time compiler can’t do that. It has to assume you could change anything you want. And you can.

        An interpreter can’t do that. It has to assume you could have changed anything at any time. So it re-checks constantly. A sufficiently smart interpreter that pre-builds machine code for current assumptions and invalidates if it changes could be as fast as JIT… Because it would be a JIT.

        And if you like blog posts about compiler internals—who doesn’t?—you should hit “Yes, sign me up” up above and to the left.

        Noah Gibbs wrote the ebook Rebuilding Rails and then a lot about how fast Ruby is at various tasks. Despite being a grumpy old programmer in Inverness, Scotland, Noah believes that some day, somehow, there will be a second game as good as Stuart Smith’s Adventure Construction Set for the Apple IIe. Follow Noah on Twitter and GitHub

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Spin Cycle: Shopify’s SFN Team Overcomes a Cloud-Development Spiral

        Spin Cycle: Shopify’s SFN Team Overcomes a Cloud-Development Spiral

        You may have read about Spin, Shopify’s new cloud-based development tool. Instead of editing and running a local version of a service on a developer’s MacBook, Shopify is moving towards a world where the development servers are available on-demand as a container running in Kubernetes. When using Spin, you don’t need anything on your local machine other than an ssh client and VSCode, if that’s your editor of choice. 

        By moving development off our MacBooks and onto Spin, we unlock the ability to easily share work in progress with coworkers and can work on changes that span different codebases without any friction. And because Spin instances are lightweight and ephemeral, we don’t run the risk of messing up long-lived development databases when experimenting with data migrations.

        Across Shopify, teams have been preparing and adjusting their codebases so that their services can run smoothly in this kind of environment. In the Shopify Fulfillment Network (SFN) engineering org, we put together a team of three engineers to get us up and running on Spin.

        At first, it seemed like the job would be relatively straightforward. But as we started doing the work, we began to notice some less obvious forces at play that were pushing against our efforts.

        Since it was easier for most developers to use our old tooling instead of Spin while we were getting the kinks worked out, developers would often unknowingly commit changes that broke some functionality we’d just enabled for Spin. In hindsight, the process of getting SFN working on Spin is a great example of the kind of hidden difficulty in technical work that's more related to human systems than how to get bits of electricity to do what you want.

        Before we get to the interesting stuff, it’s important to understand the basics of the technical challenge. We'll start by getting a broad sense of the SFN codebase and then go into the predictable work that was needed to get it running smoothly in Spin. With that foundation, we’ll be able to describe how and why we started treading water, and ultimately how we’re pushing past that.

        The Shape of SFN

        SFN exists to take care of order fulfillment on behalf of Shopify merchants. After a customer has completed the checkout process, their order information is sent to SFN. We then determine which warehouse has enough inventory and is best positioned to handle the order. Once SFN has identified the right warehouse, it sends the order information to the service responsible for managing that warehouse’s operations. The state of the system is visible to the merchant through the SFN app running in the merchant’s Shopify admin. The SFN app communicates to Shopify Core via the same GraphQL queries and mutations that Shopify makes available to all app developers.

        At a highly simplified level, this is the general shape of the SFN codebase:

        Diagram of SFN codebase. SFN is a large rectangle in the center containing six boxes labelled Subcomponent. Outside of the SFN box are six directional arrows. Each arrow connects to boxes called Dependency
        SFN’s monolithic Rails application with many external dependencies

        Similar to the Shopify codebase, SFN is a monolithic Rails application divided into individual components owned by particular teams. Unlike Shopify Core, however, SFN has many strong dependencies on services outside of its own monolith.

        SFN’s biggest dependency is on Shopify itself, but there are plenty more. For example, SFN does not design shipping labels, but does need to send shipping labels to the warehouse. So, SFN is a client to a service that provides valid shipping labels. Similarly, SFN does not tell the mobile Chuck robots in a warehouse where to go— we are a client of a service that handles warehouse operations.

        The value that SFN provides is in gluing together a bunch of separate systems with some key logic living in that glue. There isn't much you can do with SFN without those dependencies around in some form.

        How SFN Handles Dependencies

        As software developers, we need quick feedback about whether an in-progress change is working as expected. And to know if something is working in SFN, that code generally needs to be validated alongside one or several of SFN’s dependencies. For example, if a developer is implementing a feature to display some text in the SFN app after a customer has placed an order, there’s no useful way to validate that change without also having Shopify available.

        So the work of getting a useful development environment for SFN with Spin appears to be about looking at each dependency, figuring how to handle that dependency, and then implementing that decision. We have a few options for how to handle any particular dependency when running SFN in Spin:

        1. Run an instance of the dependency directly in the same Spin container.
        2. Mock the dependency.
        3. Use a shared running instance of the dependency, such as a staging or live test environment.

        Given all the dependencies that SFN has, this seems like a decent amount of work for a three-person team.

        But this is not the full extent of the problem—it’s just the foundation.

        Once we added configuration to make some dependency or some functional flow of SFN work in Spin, another commit would often be added to SFN that nullifies that effort. For example, after getting some particular flow functioning in a Spin environment, the implementation of that flow might be rewritten with new dependencies that are not yet configured to work in Spin.

        One apparent solution to this problem would be simply to pay more attention to what work is in flight in the SFN codebase and better prepare for upcoming changes.

        But here’s the problem: It’s not just one or two flows changing. Across SFN, the internal implementation of functionality is constantly being improved and refactored. With over 150 SFN engineers deploying to production over 30 times a day, the SFN codebase doesn’t sit still for long. On top of that, Spin itself is constantly changing. And all of SFN’s dependencies are changing. For any dependencies that were mocked, those mocks will become stale and need to be updated.

        The more we accomplished, the more functionality existed with the potential to stop working when something changes. And when one of those regressions occurred, we needed to interrupt the dependency we were working on solving in order to keep a previously solved flow functioning. The tension between making improvements and maintaining what you’ve already built is central to much of software engineering. Getting SFN working on Spin was just a particularly good example.

        The Human Forces on the System

        After recognizing the problem, we needed to step back and look at the forces acting on the system. What incentive structures and feedback loops are contributing to the situation?

        In the case of getting SFN working on Spin, changes were happening frequently and those changes were causing regressions. Some of those changes were within our control (e.g., a change goes into SFN that isn’t configured to work in Spin), and some are less so (e.g., Spin itself changing how it uses certain inputs).

        This led us to observe two powerful feedback loops that could be happening when SFN developers are working in Spin:

        Two feedback loops: Loop of Happy Equilibrium and Spiral of Struggle.
        Loop of Happy Equilibrium and Spiral of Struggle

        If it’s painful to use Spin for SFN development, it’s less likely that developers will use Spin the next time they have to validate their work. And if a change hasn’t been developed and tested using Spin, maybe something about that change breaks a particular testing flow, and that causes another SFN developer to become frustrated enough to stop using Spin. And this cycle continues until SFN is no longer usable in Spin.

        Alternatively, if it’s a great experience to use and validate work in Spin, developers will likely want to continue using the tool, which will catch any Spin-specific regressions before they make it into the main branch.

        As you can imagine, it’s very difficult to move from the Spiral of Struggle into the positive Loop of Happy Equilibrium. Our solution is to try our best to dampen the force acting on the negative spiral while simultaneously propping up the force of the positive feedback loop. 

        As the team focused on getting SFN working on Spin, our solution to this problem was to be very intentional about where we spent our efforts while asking the other SFN developers to endure a little pain and pitch in as we go through this transition. The SFN-on-Spin team narrowed its focus to just getting SFN to a basic level of functionality on Spin so that most developers could use it for the most common validation flows, and we prioritized fixing any bugs that disrupted those areas. This meant explicitly not working to get all SFN functionality running Spin, but just enough so that we could manage upkeep. And at the same time, we asked other SFN developers to use Spin for their daily work, even though it’s missing functionality they need or want. Where they feel frustrations or see gaps, we encouraged and supported them in adding the functionality they needed.

        Breaking the cycle

        Our hypothesis is that this is a temporary stage of transition to cloud development. If we’re successful, we’ll land in the Loop of Happy Equilibrium where regressions are caught before they’re merged, individuals add the missing functionality they need, and everyone ultimately has a fun time developing. They will feel confident about shipping their code.

        Our job seems to be all about code and making computers do what we say. But many of the real-life challenges we face when working on a codebase are not apparent from code or architecture diagrams. Instead they require us to reflect on the forces operating on the humans that are building that software. And once we have an idea of what those forces might be, we can brainstorm how to disrupt or encourage the feedback loops we’ve observed.

        Jen is a Staff Software Engineer at Shopify who's spent her career seeking out and building teams that challenge the status quo. In her free time, she loves getting outdoors and spending time with her chocolate lab.

        Learn More About Spin

        The Journey to Cloud Development: How Shopify Went All-In On Spin
        The Story Behind Shopify's Isospin Tooling
        Spin Infrastructure Adventures: Containers, Systemd, and CGroups

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Mastering React’s Stable Values

        Mastering React’s Stable Values

        The concept of stable value is a distinctly React term, and especially relevant since the introduction of Functional ComponentsIt refers to values (usually coming from a hook) that have the same value across multiple renders. And they’re immediately confusing. In this post, Colin Gray, Principal Developer at Shopify, walks through some cases where they really matter and how to make sense of them.

        Continue reading

        10 Tips for Building Resilient Payment Systems

        10 Tips for Building Resilient Payment Systems

        During the past five years I’ve worked on a lot of different parts of Shopify’s payment infrastructure and helped onboard dozens of developers in one way or another. Some people came from different programming languages, others used Ruby before but were new to the payments space. What was mostly consistent among new team members was little experience in building systems at Shopify’s scale—it was new for me too when I joined.

        It’s hard to learn something when you don’t know what you don’t know. As I learned things over the years—sometimes the hard way—I eventually found myself passing on these lessons to others. I distilled these topics into a presentation I gave to my team and boiled that down into this blog post. So, without further ado, here are my top 10 tips and tricks for building resilient payment systems.

        1. Lower Your Timeouts

        Ruby’s built-in Net::HTTP client has a default timeout of 60 seconds to open a connection to a server, 60 seconds to write data, and another 60 seconds to read a response. For online applications where a human being is waiting for something to happen, this is too long. At least there’s a default timeout in place. HTTP clients in other programming languages, like http.Client in Go and http.request in Node.JS don’t have a default timeout at all! This means an unresponsive server could tie up your resources indefinitely and increase your infrastructure bill unnecessarily.

        Timeouts can also be set in data stores. For example MySQL has the MAX_EXECUTION_TIME optimizer hint for setting a per-SELECT query timeout in milliseconds. Combined with other tools like pt-kill, we try to prevent bad queries from overwhelming the database.

        If there’s only a single thing you take away from this post, dear reader, it should be to investigate and set low timeouts everywhere you can. But what is the right timeout to set? you may wonder. That ultimately depends on your application’s unique situation and can be deduced with monitoring (more on that later), but I found that an open timeout of one second with a write and read or query timeout of five seconds is a decent starting point. Consider this waiting time from the perspective of the end user: would you like to wait for more than five seconds for a page to load successfully or show an error?

        2. Install Circuit Breakers

        Timeouts put an upper bound on how long we wait before giving up. But services that go down tend to stay down for a while, so if we see multiple timeouts in a short period of time, we can improve on this by not trying at all. Much like the circuit breaker you will find in your house or apartment, once the circuit is opened or tripped, nothing is let through.

        Shopify developed Semian to protect Net::HTTP, MySQL, Redis, and gRPC services with a circuit breaker in Ruby. By raising an exception instantly once we detect a service being down, we save resources by not waiting for another timeout we expect to happen. In some cases rescuing these exceptions allows you to provide a fallback. Building and Testing Resilient Ruby on Rails Applications describes how we design and unit tests such fallbacks using Toxiproxy.

        The Semian readme recommends to concatenate the host and port of an HTTP endpoint to create the identifier for the resource being protected. Worldwide payment processing typically uses a single endpoint, but often payment gateways use local acquirers behind the scenes to optimize authorization rates and lower costs. For Shopify Payments credit card transactions, we add the merchant's country code to the endpoint host and port to create a more fine-grained Semian identifier, so an open circuit due to an local outage in one country doesn’t affect transactions for merchants in other countries.

        Semian and other circuit breaker implementations aren’t a silver bullet that will solve all your resiliency problems by adding it to your application. It requires understanding the ways your application can fail and what falling back could look like. At scale a circuit breaker can still waste a lot of resources (and money) as well. The article Your Circuit Breaker is Misconfigured explains how to fine tune this pattern for maximum performance.

        3. Understand Capacity

        Understanding a bit of queueing theory goes a long way in being able to reason about how a system will behave under load. Slightly summarized, Little’s Law states that “the average number of customers in a system (over an interval) is equal to their average arrival rate, multiplied by their average time in the system.” The arrival rate is the amount of customers entering and leaving the system.

        Some might not realize it at first, but queues are everywhere: in grocery stores, traffic, factories, and as I recently rediscovered, at a festival in front of the toilets. Jokes aside, you find queues in online applications as well. A background job, a Kafka event, and a web request all are examples of units of work processed on queues. Put in a formula, Little’s Law is expressed as capacity = throughput * latency. This also means that throughput = capacity / latency. Or in more practical terms: if we have 50 requests arrive in our queue and it takes an average of 100 milliseconds to process a request, our throughput is 500 requests per second.

        With the relationship between queue size, throughput, and latency clarified, we can reason about what changing any of the variables implies. An N+1 query increases the latency of a request and lowers our throughput. If the amount of requests coming in exceeds our capacity, the requests queue grows and at some point a client is waiting so long for their request to be served that they time out. At some point you need to put a limit on the amount of work coming in—your application can’t out scale the world. Rate limiting and load shedding are two techniques for this.

        4. Add Monitoring and Alerting

        With our newfound understanding of queues, we now have a better idea of what kind of metrics we need to monitor to know our system is at risk of going down due to overload. Google’s site reliability engineering (SRE) book lists four golden signals a user-facing system should be monitored for:

        • Latency: the amount of time it takes to process a unit of work, broken down between success and failures. With circuit breakers failures can happen very fast and lead to misleading graphs.
        • Traffic: the rate in which new work comes into the system, typically expressed in requests per minute.
        • Errors: the rate of unexpected things happening. In payments, we distinguish between payment failures and errors. An example of a failure is a charge being declined due to insufficient funds, which isn’t unexpected at all. HTTP 500 response codes from our financial partners on the other hand are. However a sudden increase in failures might need further investigation.
        • Saturation: how much load the system is under, relative to its total capacity. This could be the amount of memory used versus available or a thread pool’s active threads versus total number of threads available, in any layer of the system.

        5. Implement Structured Logging

        Where metrics provide a high-level overview of how our system is behaving, logging allows us to understand what happened inside a single web request or background job. Out of the box Ruby on Rails logs are human-friendly but hard to parse for machines. This can work okay if you have only a single application server, but beyond that you’ll quickly want to store logs in a centralized place and make them easily searchable. This is done by using structured logging in a machine readable format, like key=value pairs or JSON, allows log aggregation systems to parse and index the data. 

        In distributed systems, it’s useful to pass along some sort of correlation identifier. A hypothetical example is when a buyer initiates a payment at checkout, a correlation_id is generated by our Rails controller. This identifier is passed along to a background job that makes the API call to the payment service that handles sensitive credit card data, which contains the correlation identifier in the API parameters and SQL query comments. Because these components of our checkout process all log the correlation_id, we can easily find all related logs when we need to debug this payment attempt.

        6. Use Idempotency Keys

        Distributed systems use unreliable networks, even if the networks look reliable most of the time. At Shopify’s scale, a once in a million chance of something unreliable occurring during payment processing means it’s happening many times a day. If this is a payments API call that timed out, we want to retry the request, but do so safely. Double charging a customer's card isn’t just annoying for the card holder, it also opens up the merchant for a potential chargeback if they don’t notice the double charge and refund it. A double refund isn’t good for the merchant's business either.

        In short, we want a payment or refund to happen exactly once despite the occasional hiccups that could lead to sending an API request more than once. Our centralized payment service can track attempts, which consists of at least one or more (retried) identical API requests, by sending an idempotency key that’s unique for each one. The idempotency key looks up the steps the attempt completed (such as creating a local database record of the transaction) and makes sure we send only a single request to our financial partners. If any of these steps fail and a retried request with the same idempotency key is received, recovery steps are run to recreate the same state before continuing. Building Resilient GraphQL APIs Using Idempotency describes how our idempotency mechanism works in more detail.

        An idempotency key needs to be unique for the time we want the request to be retryable, typically 24 hours or less. We prefer using an Universally Unique Lexicographically Sortable Identifier (ULID) for these idempotency keys instead of a random version 4 UUID. ULIDs contain a 48-bit timestamp followed by 80 bits of random data. The timestamp allows ULIDs to be sorted, which works much better with the b-tree data structure databases use for indexing. In one high-throughput system at Shopify we’ve seen a 50 percent decrease in INSERT statement duration by switching from UUIDv4 to ULID for idempotency keys.

        7. Be Consistent With Reconciliation

        With reconciliation we make sure that our records are consistent with those of our financial partners. We reconcile individual records such as charges or refunds, and aggregates such as the current balance not yet paid out to a merchant. Having accurate records isn’t just for display purposes, they’re also used as input for tax forms were required to generate for merchants in some jurisdictions.

        In case of a mismatch, we record the anomaly in our database. An example is the MismatchCaptureStatusAnomaly, expressing the status of a captured local charge wasn’t the same as the status as returned by our financial partners. Often we can automatically attempt to remediate the discrepancy and mark the anomaly as resolved. In cases where this isn’t possible, the developer team investigates anomalies and ships fixes as necessary.

        Even though we attempt automatic fixes where possible, we want to keep track of the mismatch so we know what our system did and how often. We should rely on anomalies to fix things as a last resort, preferring solutions that prevent anomalies from being created in the first place.

        8. Incorporate Load testing

        While Little’s Law is a useful theorem, practice is messier: the processing time for work isn’t uniformly distributed, making it impossible to achieve 100% saturation. In practice, queue size starts growing somewhere around the 70 to 80 percent mark, and if the time spent waiting in the queue exceeds the client timeout, from the client’s perspective our service is down. If the volume of incoming work is large enough, our servers can even run out of memory to store work on the queue and crash.

        There are various ways we can keep queue size under control. For example, we use scriptable load balancers to throttle the amount of checkouts happening at any given time. In order to provide a good user experience for buyers, if the amount of buyers wanting to check out exceeds our capacity, we place these buyers on a waiting queue (I told you they are everywhere!) before allowing them to pay for their order. Surviving Flashes of High-Write Traffic Using Scriptable Load Balancers describes this system in more detail.

        We regularly test the limits and protection mechanisms of our systems by simulating large volume flash sales on specifically set up benchmark stores. Pummelling the Platform–Performance Testing Shopify describes our load testing tooling and philosophy. Specifically for load testing payments end-to-end, we have a bit of a problem: the test and staging environments of our financial partners don’t have the same capacity or latency distribution as production. To solve this, our benchmark stores are configured with a special benchmark gateway whose responses mimic these properties.

        9. Get on Top of Incident Management

        As mentioned at the start of this article, we know that failure can’t be completely avoided and is a situation that we need to prepare for. An incident usually starts when the oncall service owners get paged, either by an automatic alert based on monitoring or by hand if someone notices a problem. Once the problem is confirmed, we start the incident process with a command sent to our Slack bot spy

        The conversation moves to the assigned incident channel where we have three roles involved:

        • Incident Manager on Call (IMOC): responsible for coordinating the incident
        • Support Response Manager (SRM): responsible for public communication 
        • the service owner(s): who are working on restoring stability.

        The article Implementing ChatOps into our Incident Management Procedure goes into more detail about the process. Once the problem has been mitigated, the incident is stopped, and the Slack bot generates a Service Disruption in our services database application. The disruption contains an initial timeline of events, Slack messages marked as important, and a list of people involved.

        10. Organize Incident Retrospectives

        We aim to have an incident retrospective meeting within a week after the incident occurred. During this meeting:

        • we dig deep into what exactly happened
        • what incorrect assumptions we held about our systems
        • what we can do to prevent the same thing from happening again. 

        Once these things are understood, typically a few action items are assigned to implement safeguards to prevent the same thing from happening again.

        Retrospectives aren’t just good for trying to prevent problems, they’re also a valuable learning tool for new members of the team. At Shopify, the details of every incident are internally public for all employees to learn from. A well-documented incident can also be a training tool for newer members joining the team on call rotation, either as an archived document to refer to or by creating a disaster role playing scenario from it.

        Scratching the Surface

        I moved from my native Netherlands to Canada for this job in 2016, before Shopify became a Digital by Design company. During my work, I’m often reminded of this Dutch saying “trust arrives on foot, but leaves on horseback.” Merchants’ livelihoods are dependent on us if they pick Shopify Payments for accepting payments online or in-person, and we take that responsibility seriously. While failure isn’t completely avoidable, there are many concepts and techniques that we apply to minimize downtime, limit the scope of impact, and build applications that are resilient to failure.

        This top ten only scratches the tip of the iceberg, it was meant as an introduction to the kind of challenges the Shopify Payments team deals with after all. I usually recommend Release It! by Michael Nygard as a good resource for team members who want to learn more.

        Bart is a staff developer on the Shopify Payments team and has been working on the scalability, reliability, and security of Shopify’s payment processing infrastructure since 2016.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Data-Centric Machine Learning: Building Shopify Inbox’s Message Classification Model

        Data-Centric Machine Learning: Building Shopify Inbox’s Message Classification Model

        By Eric Fung and Diego Castañeda

        Shopify Inbox is a single business chat app that manages all Shopify merchants’ customer communications in one place, and turns chats into conversions. As we were building the product it was essential for us to understand how our merchants’ customers were using chat applications. Were they reaching out looking for product recommendations? Wondering if an item would ship to their destination? Or were they just saying hello? With this information we could help merchants prioritize responses that would convert into sales and guide our product team on what functionality to build next. However, with millions of unique messages exchanged in Shopify Inbox per month, this was going to be a challenging natural language processing (NLP) task. 

        Our team didn’t need to start from scratch, though: off-the-shelf NLP models are widely available to everyone. With this in mind, we decided to apply a newly popular machine learning process—the data-centric approach. We wanted to focus on fine-tuning these pre-trained models on our own data to yield the highest model accuracy, and deliver the best experience for our merchants.

        A merchant’s Shopify Inbox screen titled Customers that displays snippets of messages from customers that are labelled with things for easy identification like product details, checkout, and edit order.
        Message Classification in Shopify Inbox

        We’ll share our journey of building a message classification model for Shopify Inbox by applying the data-centric approach. From defining our classification taxonomy to carefully training our annotators on labeling, we dive into how a data-centric approach, coupled with a state-of-the-art pre-trained model, led to a very accurate prediction service we’re now running in production.

        Why a Data-Centric Approach?

        A traditional development model for machine learning begins with obtaining training data, then successively trying different model architectures to overcome any poor data points. This model-centric process is typically followed by researchers looking to advance the state-of-the-art, or by those who don't have the resources to clean a crowd-sourced dataset.

        By contrast, a data-centric approach focuses on iteratively making the training data better to reduce inconsistencies, thereby yielding better results for a range of models. Since anyone can download a well-performing, pre-trained model, getting a quality dataset is the key differentiator in being able to produce a high-quality system. At Shopify, we believe that better training data yields machine learning models that can serve our merchants better. If you’re interested in hearing more about the benefits of the data-centric approach, check out Andrew Ng’s talk on MLOps: From Model-centric to Data-centric.

        Our First Prototype

        Our first step was to build an internal prototype that we could ship quickly. Why? We wanted to build something that would enable us to understand what buyers were saying. It didn’t have to be perfect or complex, it just had to prove that we could deliver something with impact. We could iterate afterwards. 

        For our first prototype, we didn't want to spend a lot of time on the exploration, so we had to construct both the model and training data with limited resources. Our team chose a pre-trained model available on TensorFlow Hub called Universal Sentence Encoder. This model can output embeddings for whole sentences while taking into account the order of words. This is crucial for understanding meaning. For example, the two messages below use the same set of words, but they have very different sentiments:

        • “Love! More please. Don’t stop baking these cookies.”
        • “Please stop baking more cookies! Don’t love these.”

        To rapidly build our training dataset, we sought to identify groups of messages with similar meaning, using various dimensionality reduction and clustering techniques, including UMAP and HDBScan. After manually assigning topics to around 20 message clusters, we applied a semi-supervised technique. This approach takes a small amount of labeled data, combined with a larger amount of unlabeled data. We hand-labeled a few representative seed messages from each topic, and used them to find additional examples that were similar. For instance, given a seed message of “Can you help me order?”, we used the embeddings to help us find similar messages such as “How to order?” and “How can I get my orders?”. We then sampled from these to iteratively build the training data.

        A visualization using a scatter plot of message clusters during one of our explorations.
        A visualization of message clusters during one of our explorations.

        We used this dataset to train a simple predictive model containing an embedding layer, followed by two fully connected, dense layers. Our last layer contained the logits array for the number of classes to predict on.

        This model gave us some interesting insights. For example, we observed that a lot of chat messages are about the status of an order. This helped inform our decision to build an order status request as part of Shopify Inbox’s Instant Answers FAQ feature. However, our internal prototype had a lot of room for improvement. Overall, our model achieved a 70 percent accuracy rate and could only classify 35 percent of all messages with high confidence (what we call coverage). While our scrappy approach of using embeddings to label messages was fast, the labels weren’t always the ground truth for each message. Clearly, we had some work to do.

        We know that our merchants have busy lives and want to respond quickly to buyer messages, so we needed to increase the accuracy, coverage, and speed for version 2.0. Wanting to follow a data-centric approach, we focused on how we could improve our data to improve our performance. We made the decision to put additional effort into defining the training data by re-visiting the message labels, while also getting help to manually annotate more messages. We sought to do all of this in a more systematic way.

        Creating A New Taxonomy

        First, we dug deeper into the topics and message clusters used to train our prototype. We found several broad topics containing hundreds of examples that conflated distinct semantic meanings. For example, messages asking about shipping availability to various destinations (pre-purchase) were grouped in the same topic as those asking about what the status of an order was (post-purchase).

        Other topics had very few examples, while a large number of messages didn’t belong to any specific topic at all. It’s no wonder that a model trained on such a highly unbalanced dataset wasn’t able to achieve high accuracy or coverage.

        We needed a new labeling system that would be accurate and useful for our merchants. It also had to be unambiguous and easy to understand by annotators, so that labels would be applied consistently. A win-win for everybody!

        This got us thinking: who could help us with the taxonomy definition and the annotation process? Fortunately, we have a talented group of colleagues. We worked with our staff content designer and product researcher who have domain expertise in Shopify Inbox. We were also able to secure part-time help from a group of support advisors who deeply understand Shopify and our merchants (and by extension, their buyers).

        Over a period of two months, we got to work sifting through hundreds of messages and came up with a new taxonomy. We listed each new topic in a spreadsheet, along with a detailed description, cross-references, disambiguations, and sample messages. This document would serve as the source of truth for everyone in the project (data scientists, software engineers, and annotators).

        In parallel with the taxonomy work, we also looked at the latest pre-trained NLP models, with the aim of fine-tuning one of them for our needs. The Transformer family is one of the most popular, and we were already using that architecture in our product categorization model. We settled on DistilBERT, a model that promised a good balance between performance, resource usage, and accuracy. Some prototyping on a small dataset built from our nascent taxonomy was very promising: the model was already performing better than version 1.0, so we decided to double down on obtaining a high-quality, labeled dataset.

        Our final taxonomy contained more than 40 topics, grouped under five categories: 

        • Products
        • Pre-Purchase
        • Post-Purchase
        • Store
        • Miscellaneous

        We arrived at this hierarchy by thinking about how an annotator might approach classifying a message, viewed through the lens of a buyer. The first thing to determine is: where was the buyer on their purchase journey when the message was sent? Were they asking about a detail of the product, like its color or size? Or, was the buyer inquiring about payment methods? Or, maybe the product was broken, and they wanted a refund? Identifying the category helped to narrow down our topic list during the annotation process.

        Our in-house annotation tool displaying the message to classify at top of screen with a list possible topics grouped by category below
        Our in-house annotation tool displaying the message to classify, along with some of the possible topics, grouped by category

        Each category contains an other topic to group the messages that don’t have enough content to be clearly associated with a specific topic. We decided to not train the model with the examples classified as other because, by definition, they were messages we couldn’t classify ourselves in the proposed taxonomy. In production, these messages get classified by the model with low probabilities. By setting a probability threshold on every topic in the taxonomy, we could decide later on whether to ignore them or not.

        Since this taxonomy was pretty large, we wanted to make sure that everyone interpreted it consistently. We held several training sessions with our annotation team to describe our classification project and philosophy. We divided the annotators into two groups so they could annotate the same set of messages using our taxonomy. This exercise had a two-fold benefit:

        1. It gave annotators first-hand experience using our in-house annotation tool.
        2. It allowed us to measure inter-annotator agreement.

        This process was time-consuming as we needed to do several rounds of exercises. But, the training led us to refine the taxonomy itself by eliminating inconsistencies, clarifying descriptions, adding additional examples, and adding or removing topics. It also gave us reassurance that the annotators were aligned on the task of classifying messages.

        Let The Annotation Begin

        Once we and the annotators felt that they were ready, the group began to annotate messages. We set up a Slack channel for everyone to collaborate and work through tricky messages as they arose. This allowed everyone to see the thought process used to arrive at a classification.

        During the preprocessing of training data, we discarded single-character messages and messages consisting of only emojis. During the annotation phase, we excluded other kinds of noise from our training data. Annotators also flagged content that wasn’t actually a message typed by a buyer, such as when a buyer cut-and-pastes the body of an email they’ve received from a Shopify store confirming their purchase. As the old saying goes, garbage in, garbage out. Lastly, due to our current scope and resource constraints, we had to set aside non-English messages.

        Handling Sensitive Information

        You might be wondering how we dealt with personal information (PI) like emails or phone numbers. PI occasionally shows up in buyer messages and we took special care to ensure that it was handled appropriately. This was a complicated, and at times manual, process that involved many steps and tools.

        To avoid training our machine learning model on any messages containing PI, we couldn’t just ignore them. That would likely bias our model. Instead, we wanted to identify the messages with PI, then replace it with realistic, mock data. In this way, we would have examples of real messages that wouldn’t be identifiable to any real person.

        This anonymization process began with our annotators flagging messages containing PI. Next, we used an open-source library called Presidio to analyze and anonymize the PI. This tool ran in our data warehouse, keeping our merchants’ data within Shopify’s systems. Presidio is able to recognize many different types of PI, and the anonymizer provides different kinds of operators that can transform the instances of PI into something else. For example, you could completely remove it, mask part of it, or replace it with something else.

        In our case, we used another open-source tool called Faker to replace the PI. This library is customizable and localized, and its providers can generate realistic addresses, names, locations, URLs, and more. Here’s an example of its Python API:

        Combining Presidio and Faker allowed us to semi-automate the PI replacement, see below for a fabricated example:


        can i pickup today? i ordered this am: Sahar Singh my phone is 852 5555 1234. Email is


        can i pickup today? i ordered this am: Sahar Singh my phone is 090-722-7549. Email is


        If you’re a sharp eyed reader, you’ll notice (as we did), that our tools missed identifying a bit of fabricated PI in the above example (hint: the name). Despite Presidio using a variety of techniques (regular expressions, named entity recognition, and checksums), some PI slipped through the cracks. Names and addresses have a lot of variability and are hard to recognize reliably. This meant that we still needed to inspect the before and after output to identify whether any PI was still present. Any PI was manually replaced with a placeholder (for example, the name Sahar Singh was replaced with <PERSON>). Finally, we ran another script to replace the placeholders with Faker-generated data.

        A Little Help From The Trends

        Towards the end of our annotation project, we noticed a trend that persisted throughout the campaign: some topics in our taxonomy were overrepresented in the training data. It turns out that buyers ask a lot of questions about products!

        Our annotators had already gone through thousands of messages. We couldn’t afford to split up the topics with the most popular messages and re-classify them, but how could we ensure our model performed well on the minority classes? We needed to get more training examples from the underrepresented topics.

        Since we were continuously training a model on the labeled messages as they became available, we decided to use it to help us find additional messages. Using the model’s predictions, we excluded any messages classified with the overrepresented topics. The remaining examples belonged to the other topics, or were ones that the model was uncertain about. These messages were then manually labeled by our annotators.


        So, after all of this effort to create a high-quality, consistently labeled dataset, what was the outcome? How did it compare to our first prototype? Not bad at all. We achieved our goal of higher accuracy and coverage:


        Version 1.0 Prototype

        Version 2.0 in Production

        Size of training set



        Annotation strategy

        Based on embedding similarity

        Human labeled

        Taxonomy classes



        Model accuracy



        High confidence coverage




        Another key part of our success was working collaboratively with diverse subject matter experts. Bringing in our support advisors, staff content designer, and product researcher provided perspectives and expertise that we as data scientists couldn’t achieve alone.

        While we shipped something we’re proud of, our work isn’t done. This is a living project that will require continued development. As trends and sentiments change over time, the topics of conversations happening in Shopify Inbox will shift accordingly. We’ll need to keep our taxonomy and training data up-to-date and create new models to continue to keep our standards high.

        If you want to learn more about the data work behind Shopify Inbox, check out Building a Real-time Buyer Signal Data Pipeline for Shopify Inbox that details the real-time buyer signal data pipeline we built.

        Eric Fung is a senior data scientist on the Messaging team at Shopify. He loves baking and will always pick something he’s never tried at a restaurant. Follow Eric on Twitter.

        Diego Castañeda is a senior data scientist on the Core Optimize Data team at Shopify. Previously, he was part of the Messaging team and helped create machine learning powered features for Inbox. He loves computers, astronomy and soccer. Connect with Diego on LinkedIn.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Data Science & Engineering career page to find out about our open positions. Learn about how we’re hiring to design the future together—a future that is Digital by Design.

        Continue reading

        Spin Infrastructure Adventures: Containers, Systemd, and CGroups

        Spin Infrastructure Adventures: Containers, Systemd, and CGroups

        The Spin infrastructure team works hard at improving the stability of the system. In February 2022 we moved to Container Optimized OS (COS), the Google maintained operating system for their Kubernetes Engine SaaS offering. A month later we turned on multi-cluster to allow for increased scalability as more users came on board. Recently, we’ve increased default resources allotted to instances dramatically. However, with all these changes we’re still experiencing some issues, and for one of those, I wanted to dive a bit deeper in a post to share with you.

        Spin’s Basic Building Blocks

        First it's important to know the basic building blocks of Spin and how these systems interact. The Spin infrastructure is built on top of Kubernetes, using many of the same components that Shopify’s production applications use. Spin instances themselves are implemented via a custom resource controller that we install on the system during creation. Among other things, the controller transforms the Instance custom resource into a pod that’s booted from a special Isospin container image along with the configuration supplied during instance creation. Inside the container we utilize systemd as a process manager and workflow engine to initialize the environment, including installing dotfiles, pulling application source code, and running through bootstrap scripts. Systemd is vital because it enables a structured way to manage system initialization and this is used heavily by Spin.

        There’s definitely more to what makes Spin then what I’ve described, but from a high level and for the purposes of understanding the technical challenges ahead it's important to remember that:

        1. Spin is built on Kubernetes
        2. Instances are run in a container
        3. systemd is run INSIDE the container to manage the environment.

        First Encounter

        In February 2022, we had a serious problem with Pod relocations that we eventually tracked to node instability. We had several nodes in our Kubernetes clusters that would randomly fail and require either a reboot or to be replaced entirely. Google had decent automation for this that would catch nodes in a bad state and replace them automatically, but it was occurring often enough (five nodes per day or about one percent of all nodes) that users began to notice. Through various discussions with Shopify’s engineering infrastructure support team and Google Cloud support we eventually honed in on memory consumption being the primary issue. Specifically, nodes were running out of memory and pods were being out of memory (OOM) killed as a result. At first, this didn’t seem so suspicious, we gave users the ability to do whatever they want inside their containers and didn’t provide much resources to them (8 to 12 GB of RAM each), so it was a natural assumption that containers were, rightfully, just using too many resources. However, we found some extra information that made us think otherwise.

        First, the containers being OOM killed would occasionally be the only Spin instance on the node and when we looked at their memory usage, often it would be below the memory limit allotted to them.

        Second, in parallel to this another engineer was investigating an issue with respect to Kafka performance where he identified a healthy running instance using far more resources than should have been possible.

        The first issue would eventually be connected to a memory leak that the host node was experiencing, and through some trial and error we found that switching the host OS from Ubuntu to Container Optimized OS from Google solved it. The second issue remained a mystery. With the rollout of COS though, we saw 100 times reduction in OOM kills, which was sufficient for our goals and we began to direct our attention to other priorities.

        Second Encounter

        Fast forward a few months to May 2022. We were experiencing better stability which was a source of relief for the Spin team. Our ATC rotations weren't significantly less frantic, the infrastructure team had the chance to roll out important improvements including multi-cluster support and a whole new snapshotting process. Overall things felt much better.

        Slowly but surely over the course of a few weeks, we started to see increased reports of instance instability. We verified that the nodes weren’t leaking memory as before, so it wasn’t a regression. This is when several team members re-discovered the excess memory usage issue we’d seen before, but this time we decided to dive a little further.

        We needed a clean environment to do the analysis, so we set up a new spin instance on its own node. During our test, we monitored the Pod resource usage and the resource usage of the node it was running on. We used kubectl top pod and kubectl top node to do this. Before we performed any tests we saw

        Next, we needed to simulate memory load inside of the container. We opted to use a tool called stress, allowing us to start a process that consumes a specified amount of memory that we could use to exercise the system.

        We ran kubectl exec -it spin-muhc – bash to land inside of a shell in the container and then stress -m 1 --vm-bytes 10G --vm-hang 0 to start the test.

        Checking the resource usage again we saw

        This was great, exactly what we expected. The 10GB used by our stress test showed up in our metrics. Also, when we checked the cgroup assigned to the process we saw it was correctly assigned to the Kubernetes Pod:

        Where 24899 was the PID of the process started by stress.This looked great as well. Next, we performed the same test, but in the instance environment accessed via spin shell. Checking the resource usage we saw

        Now this was odd. Here we saw that the memory created by stress wasn’t showing up under the Pod stats (still only 14Mi), but it was showing up for the node (33504Mi). Checking the usage from in the container we saw that it was indeed holding onto memory as expected

        However, when we checked the cgroup this time, we saw something new:

        What the heck!? Why was the cgroup different? We double checked that this was the correct hierarchy by using the systemd cgroup list tool from within the spin instance: 

        So to summarize what we had seen: 

        1. When we run processes inside the container via kubectl exec, they’re correctly placed within the kubepods cgroup hierarchy. This is the hierarchy that contains the pods memory limits.
        2. When we run the same processes inside the container via spin shell, they’re placed within a cgroup hierarchy that doesn’t contain the limits. We verify this by checking the cgroup file directly:

        The value above is close to the maximum value of a 64 bit integer (about 8.5 Billion Gigabytes of memory). Needless to say, our system has less than that, so this is effectively unlimited.

        For practical purposes, this means any resource limitation we put on the Pod that runs Spin instances isn’t being honored. So Spin instances can use more memory than they’re allotted which is concerning for a few reasons, but probably most importantly, we depend on this to avoid instances from interfering with one another.

        Isolating It

        In a complex environment like Spin it’s hard to account for everything that might be affecting the system. Sometimes it’s best to distill problems down to the essential details to properly isolate the issue. We were able to reproduce the cgroup leak in a few different ways. First on the Spin instances directly using crictl or ctr and custom arguments with real Spin instances. Second, running on a local Docker environment . Setting up an experiment like this also allowed for much quicker iteration time when testing potential fixes.

        From the experiments we discovered differences between the different runtimes (containerd, Docker, and Podman) execution of systemd containers. Podman for instance has a --systemd flag that enables and disables an integration with the host systemd. containerd has a similar flag –runc-systemd-cgroup that starts runc with the systemd cgroup manager. For Docker, however, no such integration exists (you can modify the cgroup manager via daemon.json, but not via the CLI like Podman and Containerd) and we saw the same cgroup leakage. When comparing the cgroups assigned to the container processes between Docker and Podman, we saw the following




        Podman placed the systemd and stress processes in a cgroup unique to the container. This allowed Podman to properly delegate the resource limitations to both systemd and any process that systemd spawns. This was the behavior we were looking for!

        The Fix

        We now had an example of a systemd container properly being isolated from the host with Podman. The trouble was that in our Spin production environments we use Kubernetes that uses Containerd, not Podman, for the container runtime. So how could we leverage what we learned from Podman toward a solution?

        While investigating differences between Podman and Docker with respect to Systemd we came across the crux of the fix. By default Docker and containerd use a cgroup driver called cgroupfs to manage the allocation of resources while Podman uses the systemd driver (this is specific to our host operating system COS from Google). The systemd driver delegates responsibility of cgroup management to the host systemd which then properly manages the delegate systemd that’s running in the container. 

        It’s recommended for nodes running systemd on the host to use the systemd cgroup driver by default, however, COS from Google is still set to use cgroupfs. Checking the developer release notes, we see that in version 101 of COS there is a mention of switching the default cgroup driver to systemd, so the fix is coming!

        What’s Next

        Debugging this issue was an enlightening experience. If you had asked us before, Is it possible for a container to use more resources than its assigned?, we would have said no. But now that we understand more about how containers deliver the sandbox they provide, it’s become clear the answer should have been, It depends.

        Ultimately the escaped permissions were from us bind-mounting /sys/fs/cgroup read-only into the container. A subtle side effect of this, while this directory isn’t writable, all sub directories are. But since this is required of systemd to even boot up, we don’t have the option to remove it. There’s a lot of ongoing work by the container community to get systemd to exist peacefully within containers, but for now we’ll have to make do.


        Special thanks to Daniel Walsh from RedHat for writing so much on the topic. And Josh Heinrichs from the Spin team for investigating the issue and discovering the fix.

        Additional Information

        Chris is an infrastructure engineer with a focus on developer platforms. He’s also a member of the ServiceMeshCon program committee and a @Linkerd ambassador.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Shopify and Open Source: A Mutually Beneficial Relationship

        Shopify and Open Source: A Mutually Beneficial Relationship

        Shopify and Rails have grown up together. Both were in their infancy in 2004, and our CEO (Tobi) was one of the first contributors and a member of Rails Core. Shopify was built on top of Rails, and our engineering culture is rooted in the Rails Doctrine, from developer happiness to the omakase menu, sharp knives, and majestic monoliths. We embody the doctrine pillars. 

        Shopify's success is due, in part, to Ruby and Rails. We feel obligated to pay that success back to the community as best we can. But our commitment and investment are about more than just paying off that debt; we have a more meaningful and mutually beneficial goal.

        One Hundred Year Mission

        At Shopify, we often talk about aspiring to be a 100-year company–to still be around in 2122! That feels like an ambitious dream, but we make decisions and build our code so that it scales as we grow with that goal in mind. If we pull that off, will Ruby and Rails still be our tech stack? It's hard to answer, but it's part of my job to think about that tech stack over the next 100 years.

        Ruby and Rails as 100-year tools? What does that even mean?

        To get to 100 years, Rails has to be more than an easy way to get started on a new project. It's about cost-effective performance in production, well-formed opinions on the application architecture, easy upgrades, great editors, avoiding antipatterns, and choosing when you want the benefits of typing. 

        To get to 100 years, Ruby and Rails have to merit being the tool of choice, every day, for large teams and well-aged projects for a hundred years. They have to be the tool of choice for thousands of developers, across millions of lines of code, handling billions of web requests. That's the vision. That's Rails at scale.

        And that scale is where Shopify is investing.

        Why Companies Should Invest In Open Source

        Open source is the heart and soul of Rails: I’d say that Rails would be nowhere near what it is today if not for the open source community.

        Rafael França, Shopify Principal Engineer and Rails Core Team Member

        We invest in open source to build the most stable, resilient, performant version to grow our applications. How much better could it be if more people were contributing? As a community, we can do more. Ruby and Rails can only continue to be a choice for companies if we're actively investing in the development, and to do that; we need more companies involved in contributing.

        It Improves Engineering Skills

        Practice makes progress! Building open source software with cross-functional teams helps build better communication skills and offers opportunities to navigate feedback and criticism constructively. It also enables you to flex your debugging muscles and develop deep expertise in how the framework functions, which helps you build better, more stable applications for your company.

        It’s Essential to Application Health & Longevity

        Contributing to open source helps ensure that Rails benefits your application and the company in the long term. We contribute because we care about the changes and how they affect our applications. Investing upfront in the foundation is proactive, whereas rewrites and monkey patches are reactive and lead to brittle code that's hard to maintain and upgrade.

        At our scale, it's common to find issues with, or opportunities to enhance, the software we use. Why keep those improvements private? Because we build on open source software, it makes sense to contribute to those projects to ensure that they will be as great as possible for as long as possible. If we contribute to the community, it increases our influence on the software that our success is built on and helps improve our chances of becoming a 100-year company. This is why we make contributions to Ruby and Rails, and other open source projects. The commitment and investment are significant, but so are the benefits.

        How We're Investing in Ruby and Rails

        Shopify is built on a foundation of open source software, and we want to ensure that that foundation continues to thrive for years to come and that it continues to scale to meet our requirements. That foundation can’t succeed without investment and contribution from developers and companies. We don’t believe that open source development is “someone else’s problem”. We are committed to Ruby and Rails projects because the investment helps us future-proof our foundation and, therefore, Shopify. 

        We contribute to strategic projects and invest in initiatives that impact developer experience, performance, and security—not just for Shopify but for the greater community. Here are some projects we’re investing in:

        Improving Developer Tooling 

        • We’ve open-sourced projects like toxiproxy, bootsnap, packwerk, tapioca, paquito, and maintenance_tasks that are niche tools we found we needed. If we need them, other developers likely need them as well.
        • We helped add Rails support to Sorbet's gradual typing to make typing better for everyone.
        • We're working to make Ruby support in VScode best-in-class with pre-configured extensions and powerful features like refactorings.
        • We're working on automating upgrades between Ruby and Rails versions to reduce friction for developers.

        Increasing Performance

        Enhancing Security

        • We're actively contributing to bundler and rubygems to make Ruby's supply chain best-in-class.
        • We're partnering with Ruby Central to ensure the long-term success and security of through strategic investments in engineering, security-related projects, critical tools and libraries, and improving the cycle time for contributors.

        Meet Shopify Contributors

        The biggest investment you can make is to be directly involved in the future of the tools that your company relies on. We believe we are all responsible for the sustainability and quality of open source. Shopify engineers are encouraged to contribute to open source projects where possible. The commitment varies. Some engineers make occasional contributions, some are part-time maintainers of important open source libraries that we depend on, and some are full-time contributors to critical open source projects.

        Meet some of the Shopify engineers contributing to open source. Some of those faces are probably familiar because we have some well-known experts on the team. But some you might not know…yet. We're growing the next generation of Ruby and Rails experts to build for the future.

        Mike is a NYC-based engineering leader who's worked in a variety of domains, including energy management systems, bond pricing, high-performance computing, agile consulting, and cloud computing platforms. He is an active member of the Ruby open-source community, where as a maintainer of a few popular libraries he occasionally still gets to write software. Mike has spent the past decade growing inclusive engineering organizations and delivering amazing software products for Pivotal, VMware, and Shopify.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        The Story Behind Shopify’s Isospin Tooling

        The Story Behind Shopify’s Isospin Tooling

        You may have read that Shopify has built an in-house cloud development platform named Spin. In that post, we covered the history of the platform and how it powers our everyday work. In this post, we’ll take a deeper dive into one specific aspect of Spin: Isospin, Shopify’s systemd-based tooling that forms the core of how we run applications within Spin.

        The initial implementation of Spin used the time-honored POSS (Pile of Shell Scripts) design pattern. As we moved to a model where all of our applications ran in a single Linux VM, we were quickly outgrowing our tooling⁠—not to mention the added complexity of managing multiple applications within a single machine. Decisions such as what dependency services to run, in what part of the boot process, and how many copies to run became much more difficult as we ran many applications together within the same instance. Specifically, we needed a way to:

        • split up an application into its component parts
        • specify the dependencies between those parts
        • have those jobs be scheduled at the appropriate times
        • isolate services and processes from each other.

        At a certain point, stepping back, an obvious answer began to emerge. The needs we were describing weren’t merely solvable, they were already solved—by something we were already using. We were describing services, the same as any other services run by the OS. There were already tools to solve this built right into the OS. Why not leverage that?

        A Lightning Tour of systemd

        systemd’s service management works by dividing the system into a graph of units representing individual units or jobs. Each unit can specify its dependencies on other units, in granular detail, allowing systemd to determine an order to launch services in order to bring the system up and reason about cascading failure states. In addition to units representing actual services, it supports targets, which represent an abstract grouping of one or more units. Targets can have dependencies of their own and be depended on by other units, but perform no actual work. By specifying targets representing phases of the boot process and a top-level target representing the desired state of the system, systemd can quickly and comprehensively prepare services to run.

        systemd has several features which enable dynamic generation of units. Since we were injecting multiple apps into a system at runtime, with varying dependencies and processes, we made heavy use of these features to enable us to create complex systemd service graphs on the fly.

        The first of these is template unit files. Ordinarily, systemd namespaces units via their names; any service named foo will satisfy a dependency on the service named foo, and only one instance of a unit with a name can be running at once. This was obviously not ideal for us, since we have many services that we’re running per-application. Template unit files expand this distinction a bit by allowing a service to take a parameter that becomes part of its namespace. For example, a service named foo@.service could take the argument bar, running as foo@bar. This allows multiple copies of the same service to run simultaneously. The parameter is also available within the unit as a variable, allowing us to namespace runtime directories and other values with the same parameter.

        Template units were key to us since not only do they allow us to share service definitions for applications themselves, they allow us to run multiple copies of dependency services. In order to maintain full isolation between applications—and to simulate the separately-networked services they would be talking to in production—neighbor apps within a single Isospin VM don’t use the same installation of core services such as MySQL or Elasticsearch. Instead, we run one copy of these services for each app that needs it. Template units simplified this process greatly and via a single service definition, we simply run as many copies of each as we need.

        We also made use of generators, a systemd feature that allows dynamically creating units at runtime. This was useful for us since the dynamic state of our system meant that a fixed service order wasn’t really feasible. There were two primary features of Isospin’s setup that complicated things:

        1. Which app or apps to run in the system isn’t fixed, but rather is assigned when we boot a system. Thus, via information assigned at the time the system is booted, we need to choose which top-level services to enable.

        2. While many of the Spin-specific services are run for every app, dependencies on other services are dynamic. Not every app requires MySQL or Elasticsearch or so on. We needed a way to specify these systemd-level dependencies dynamically.

        Generators provided a simple way to run this. Early in the bootup process, we run a generator that creates a target named spin-app for each app to be run in the system. That target contains all of the top-level dependencies an app needs to run, and is then assigned as a dependency of the system is running target. Despite sounding complex, this requires no more than a 28-line bash script and a simple template file for the service. Likewise, we’re able to assign the appropriate dependency services as requirements of this spin-app target via another generator that runs later in the process.

        Booting Up an Example

        To help understand how this works in action, let’s walk through an example of the Isospin boot process.

        We start by creating a target named, which we use to represent Spin has finished booting. We’ll use this target later to determine whether or not the system has successfully finished starting up. We then run a generator named apps that checks the configuration to see which apps we’ve specified for the system. It then generates new dependencies on the spin-app@ target, requesting one instance per application and passing in the name of the application as its parameter.

        spin-app@ depends on several of the core services that represent a fully available Spin application, including several more generators. Via those dependencies, we run the spin-svcs@ generator to determine which system-level service dependencies to inject, such as MySQL or Elasticsearch. We also run the spin-procs@ generator that determines which command or commands to run the application itself and generates one service per command.

        Finally, we bring the app up via the spin-init@ service and its dependencies. spin-init@ represents the final state of bootstrapping necessary for the application to be ready to run, and via its recursive dependencies systemd builds out the chain of processes necessary to clone an application’s source, run bootstrap processes, and then run any necessary finalizing tasks before it’s ready to run.

        Additional Tools (and Problems)

        Although the previously described tooling got us very far, we found that we had a few additional problems that required some additional tooling to fix.

        A problem we encountered under this new model was port collision between services. In the past, our apps were able to assume they were the only app in a service, so they could claim a common port for themselves without conflict. Although systemd gave us a lot of process isolation for free, this was a hole we’d dug for ourselves and one we’d need to get out of by ourselves too. 

        The solution we settled on was simple but effective, and one that leveraged a few systemd features to simplify the process. We reasoned that port collision is only a problem because port selection was in the user’s hands. We could solve this by making port assignment the OS’s responsibility. We created a service that handles port assignment programmatically via a hashing process—by taking the service’s name into account we produce a semi-stable automated port assignment that avoids port collision with any other ports we’ve assigned on the system. This service can be used as a dependency of another service that needs to bind to a port and writes the generated port to an environment path that can be used as input to another service to inject environment variables. As long as we specify this as a dependency, we can ensure that the dependent service receives a PORT variable that it’s meant to respect and bind to.

        Another feature that came in handy is systemd’s concept of service readiness. Many process runners, including the Foreman-based solutions we’d been using in the past, have a binary concept of service readiness (either a process is running, or it isn’t) and if a process exits unexpectedly it’s considered failed

        systemd has the same model by default, but it also supports something more complex: it allows configuring a notify socket that allows an application to explicitly communicate its readiness. systemd exposes a Unix datagram socket to the service it’s running via the NOTIFY_SOCKET environment variable. When the underlying app has finished starting up and is ready, it can communicate that status via writing a message to the socket. This granularity helps avoid some of the rare but annoying gotchas with a more simple model of service readiness. It ensures that the service is only considered ready to accept connections when it's actually ready, avoiding a scenario in which external services try sending messages during the startup window. It also avoids a situation where the process remains running but the underlying service has failed during startup.

        Some of the external services we depend on use this, such as MySQL, but we also wrote our own tooling to incorporate it. Our notify-port script is a thin wrapper around web applications that monitors whether the service we’re wrapping has begun accepting HTTP connections over the port Isospin has assigned to it. By polling the port and notifying systemd when it comes up, we’ve been able to catch many real world bugs where services were waiting on the wrong port, and situations in which a server failed on startup while leaving the process alive.

        Isospin on Top

        Although we started out with some relatively simple goals, the more we worked with systemd the more we found ourselves able to leverage its tools to our advantage. By building Isospin on top of systemd, we found ourselves able to save time by reusing pre-existing structures that suited our needs and took advantage of sophisticated tooling for expressing service interdependency and service health. 

        Going forward, we plan to continue expanding on Isospin to express more complex service relationships. For example, we’re investigating the use of systemd service dependencies to allow teams to express that certain parts of their application relies on another team’s application being available.

        Misty De Méo is an engineer who specializes in developer tooling. She’s been a maintainer of the Homebrew package manager since 2011.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        How to Build Trust as a New Manager in a Fully Remote Team

        How to Build Trust as a New Manager in a Fully Remote Team

        I had been in the same engineering org for seven years before the pandemic hit. We were a highly collaborative and co-located company and the team were very used to brainstorming and working on physical whiteboards and workspaces. When it did, we moved pretty seamlessly from working together from our office to working from our homes. We didn’t pay too much attention to designing a Digital First culture and neither did we alter our ways of working dramatically. 

        It was only when I joined Shopify last September that I began to realize that working remotely in a company where you have already established trust is very different from starting in a fully remote space and building trust from the ground up. 

        What Is Different About Starting Remotely?

        So, what changes? The one word that comes to mind is intentionality. I would define intentionality as “the act of thoughtfully designing interactions or processes.” A lot of things that happen seamlessly and organically in a real-life setting takes more intentionality in a remote setting. If you deconstruct the process of building trust, you’ll find that in a physical setting trust is built in active ways (words you speak, your actions, and your expertise), but also in passive ways (your body language, demeanor, and casual water cooler talk). In a remote setting, it’s much more difficult to observe, but also build casual, non-transactional relationships with people unless you’re intentional about it. 

        Also, since you’re represented a lot more through your active voice, it’s important to work on setting up a new way of working and gaining mastery over the set of tools and skills that will help build trust and create success in your new environment.

        The 90-Day Plan

        The 90-Day Plan is named after the famous book The First 90 Days written by Michael D. Watkins. Essentially, it breaks down your onboarding journey into three steps:

        1. First 30 days: focus on your environment
        2. First 60 days: focus on your team
        3. 90 days and beyond: focus on yourself.

        First 30 Days: Focus on Your Environment

        Take the time out to think about what kind of workplace you want to create and also to reach out and understand the tone of the wider organization that you are part of.

        Study the Building 

        When you start a new job in a physical location, it’s common to study the office and understand the layout of, not only the building, but also  the company itself. When beginning work remotely, I suggest you start with a metaphorical study of the building. Try and understand the wider context of the organization and the people in it. You can do this with a mix of pairing sessions, one-on-ones, and peer group sessions. These processes help you in gaining technical and organizational context and also build relationships with peers. 

        Set Up the Right Tools

        In an office, many details of workplace setup are abstracted away from you. In a fully digital environment, you need to pay attention to set your workplace up for success. There are plenty of materials available on how to set up your home office (like on and Ensure that you take the time to set up your remote tools to your taste. 

        Build Relationships 

        If you’re remote, it’s easy to be transactional with people outside of your immediate organization. However, it’s much more fun and rewarding to take the time to build relationships with people from different backgrounds across the company. It gives you a wider context of what the company is doing and the different challenges and opportunities.

        First 60 Days: Focus on Your Team

        Use asynchronous communication for productivity and synchronous for connection.

        Establish Connection and Trust

        When you start leading a remote team, the first thing to do is establish connection and trust. You do this by spending a lot of your time in the initial days meeting your team members and understanding their aspirations and expectations. You should also, if possible, attempt to meet the team once in real life within a few months of starting. 

        Meet in Real Life

        Meeting in real life will help you form deep human relationships and understand them beyond the limited scope of workplace transactions. Once you’ve done this, ensure that you create a mix of synchronous and asynchronous processes within your time. Examples of asynchronous processes are automated dailies, code reviews, receiving feedback, and collaboration on technical design documents. We use synchronous meetings for team retros, coffee sessions, demos, and planning sessions. We try to maximize async productivity and being intentional about the times that you do come together as a team.

        Establishing Psychological Safety

        The important thing about leading a team remotely is to firmly establish a strong culture of psychological safety. Psychological safety in the workplace is important, not only for teams to feel engaged, but also for members to thrive. While it might be trickier to establish psychological safety remotely, its definitely possible. Some of the ways to do it:

        1. Default to open communication wherever possible.
        2. Engage people to speak about issues openly during retros and all-hands meetings.
        3. Be transparent about things that have not worked well for you. Setting this example will help people open themselves up to be vulnerable with their teams.

        First 90 Days - Focus on Yourself

        How do you manage and moderate your own emotions as you find your way in a new organization with this new way of working?

        FOMO Is Real

        Starting in a new place is nerve wracking. Starting while fully remote can be a lonely exercise. Working in a global company like Shopify means that you need to get used to the fact that work is always happening in some part of the globe. It’s easy to get overwhelmed and be always on. While FOMO can be very real, be aware of all the new information that you’re ingesting and take the time to reflect upon it.

        Design Your Workday

        Remote work means you’re not chained to your nine-to-five routine anymore. Reflect on the possibilities this offers you and think about how you want to design your workday. Maybe you want meeting free times to walk the dog, hit the gym, or take a power nap. Ensure you think about how to structure the day in a way that suits your life and plan your agenda accordingly.

        Try New Things

        It’s pretty intense in the first few months as you try ways to establish trust and build a strong team together. Not everything you try will take and not everything will work. The important thing is to be clear with what you’re setting out to achieve, collect feedback on what works and doesn’t, learn from the experience, and move forward.

        Being able to work in a remote work environment is both rewarding and fun. It’s definitely a new superpower that, if used well, leads to rich and absorbing experiences. The first 90 days are just the beginning of this journey. Sit back, tighten your seatbelt and get ready for a joyride of learning and growth.

        Sadhana is an engineer, book nerd, constant learner and enabler of people. Worked in the industry for more than 20 years in various roles and tech stacks. Agile Enthusiast. You can connect with Sadhana on LinkedIn and Medium.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Introducing ShopifyQL: Our New Commerce Data Querying Language 

        Introducing ShopifyQL: Our New Commerce Data Querying Language 

        At Shopify, we recognize the positive impact data-informed decisions have on the growth of a business. But we also recognize that data exploration is gated to those without a data science or coding background. To make it easier for our merchants to inform their decisions with data, we built an accessible, commerce-focused querying language. We call it ShopifyQL. ShopifyQL enables Shopify Plus merchants to explore their data with powerful features like easy to learn syntax, one-step data visualization, built-in period comparisons, and commerce-specific date functions. 

        I’ll discuss how ShopifyQL makes data exploration more accessible, then dive into the commerce-specific features we built into the language, and walk you through some query examples.

        Why We Built ShopifyQL

        As data scientists, engineers, and developers, we know that data is a key factor in business decisions across all industries. This is especially true for businesses that have achieved product market fit, where optimization decisions are more frequent. Now, commerce is a broad industry and the application of data is deeply personal to the context of an individual business, which is why we know it’s important that our merchants be able to explore their data in an accessible way.

        Standard dashboards offer a good solution for monitoring key metrics, while interactive reports with drill-down options allow deeper dives into understanding how those key metrics move. However, reports and dashboards help merchants understand what happened, but not why it happened. Often, merchants require custom data exploration to understand the why of a problem, or to investigate how different parts of the business were impacted by a set of decisions. For this, they turn to their data teams (if they have them) and the underlying data.

        Historically, our Shopify Plus merchants with data teams have employed a centralized approach in which data teams support multiple teams across the business. This strategy helps them maximize their data capability and consistently prioritizes data stakeholders in the business. Unfortunately, this leaves teams in constant competition for their data needs. Financial deep dives are prioritized over operational decision support. This leaves marketing, merchandising, fulfillment, inventory, and operations to fend for themselves. They’re then forced to either make decisions with the standard reports and dashboards available to them, or do their own custom data exploration (often in spreadsheets). Most often they end up in the worst case scenario: relying on their gut and leaving data out of the decision making process.

        Going past the reports and dashboards into the underlying datasets that drive them is guarded by complex data engineering concepts and languages like SQL. The basics of traditional data querying languages are easy to learn. However, applying querying languages to datasets requires experience with, and knowledge of, the entire data lifecycle (from data capture to data modeling). In some cases, simple commerce-specific data explorations like year-over-year sales require a more complicated query than the basic pattern of selecting data from some table with some filter. This isn’t a core competency of our average merchant. They get shut out from the data exploration process and the ability to inform their decisions with insights gleaned from custom data explorations. That’s why we built ShopifyQL.

        A Data Querying Language Built for Commerce

        We understand that merchants know their business the best and want to put the power of their data into their hands. Data-informed decision making is at the heart of every successful business, and with ShopifyQL we’re empowering Shopify Plus merchants to gain insights at every level of data analysis. 

        With our new data querying language, ShopifyQL, Shopify Plus merchants can easily query their online store data. ShopifyQL makes commerce data exploration accessible to non-technical users by simplifying traditional aspects of data querying like:

        • Building visualizations directly from the query, without having to manipulate data with additional tools.
        • Creating year-over-year analysis with one simple statement, instead of writing complicated SQL joins.
        • Referencing known commerce date ranges (For example, Black Friday), without having to remember the exact dates.
        • Accessing data specifically modeled for commerce exploration purposes, without having to connect the dots across different data sources. 

        Intuitive Syntax That Makes Data Exploration Easy

        The ShopifyQL syntax is designed to simplify the traditional complexities of data querying languages like SQL. The general syntax tree follows a familiar querying structure:

        FROM {table_name}
        SHOW|VISUALIZE {column1, column2,...} 
        TYPE {visualization_type}
        AS {alias1,alias2,...}
        BY {dimension|date}
        WHERE {condition}
        SINCE {date_offset}
        UNTIL {date_offset}
        ORDER BY {column} ASC|DESC
        COMPARE TO {date_offset}
        LIMIT {number}

        We kept some of the fundamentals of the traditional querying concepts because we believe these are the bedrock of any querying language:

        • FROM: choose the data table you want to query
        • SELECT: we changed the wording to SHOW because we believe that data needs to be seen to be understood. The behavior of the function remains the same: choose the fields you want to include in your query
        • GROUP BY: shortened to BY. Choose how you want to aggregate your metrics
        • WHERE: filter the query results
        • ORDER BY: customize the sorting of the query results
        • LIMIT: specify the number of rows returned by the query.

        On top of these foundations, we wanted to bring a commerce-centric view to querying data. Here’s what we are making available via Shopify today.

        1. Start with the context of the dataset before selecting dimensions or metrics

        We moved FROM to precede SHOW. It’s more intuitive for users to select the dataset they care about first and then the fields. When wanting to know conversion rates it's natural to think about product and then conversion rates, that's why we swapped the order of FROM and SHOW as compared to traditional querying languages.

        2. Visualize the results directly from the query

        Charts are one of the most effective ways of exploring data, and VISUALIZE aims to simplify this process. Most query languages and querying interfaces return data in tabular format and place the burden of visualizing that data on the end user. This means using multiple tools, manual steps, and copy pasting. The VISUALIZE keyword allows Shopify Plus merchants to display their data in a chart or graph visualization directly from a query. For example, if you’re looking to identify trends in multiple sales metrics for a particular product category:

        A screenshot showing the ShopifyQL code at the top of the screen and a line chart that uses VISUALIZE to chart monthly total and gross sales
        Using VISUALIZE to chart monthly total and gross sales

        We’ve made the querying process simpler by introducing smart defaults that allow you to get the same output with less lines of code. The query from above can also be written as:

        FROM sales
        VISUALIZE total_sales, gross_sales
        BY month
        WHERE product_category = ‘Shoes’
        SINCE -13m

        The query and the output relationship remains explicit, but the user is able to get to the result much faster.

        The following language features are currently being worked on, and will be available later this year:

        3. Period comparisons are native to the ShopifyQL experience

        Whether it’s year-over-year, month-over-month, or a custom date range, period comparison analyses are a staple in commerce analytics. With traditional querying languages, you either have to model a dataset to contain these comparisons as their own entries or write more complex queries that include window functions, common table expressions, or self joins. We’ve simplified that to a single statement. The COMPARE TO keyword allows ShopifyQL users to effortlessly perform period-over-period analysis. For instance, comparing this week’s sales data to last week:

        A screenshot showing the ShopifyQL code at the top of the screen and a line chart that uses VISUALIZE for comparing total sales between 2 time periods with COMPARE TO
        Comparing total sales between 2 time periods with COMPARE TO

        This powerful feature makes period-over-period exploration simpler and faster; no need to learn joins or window functions. Future development will enable multiple comparison periods for added functionality.

        4. Commerce specific date ranges simplify time period filtering

        Commerce-specific date ranges (for example Black Friday Cyber Monday, Christmas Holidays, or Easter) involve a manual lookup or a join to some holiday dataset. With ShopifyQL, we take care of the manual aspects of filtering for these date ranges and let the user focus on the analysis.

        The DURING statement, in conjunction with Shopify provided date ranges, allows ShopifyQL users to filter their query results by commerce-specific date ranges. For example, finding out what the top five selling products were during BFCM in 2021 versus 2020:

        A screenshot showing the ShopifyQL code at the top of the screen and a table that shows Product Title, Total Sales BFCM 2021, and Total Sales BFCM 2019
        Using DURING to simplify querying BFCM date ranges

        Future development will allow users to save their own date ranges unique to their business, giving them even more flexibility when exploring data for specific time periods.

        Check out our full list of current ShopifyQL features and language docs at

        Data Models That Simplify Commerce-Specific Analysis and Explorations

        ShopifyQL allows us to access data models that address commerce-specific use cases and abstract the complexities of data transformation. Traditionally, businesses trade off SQL query simplicity for functionality, which limits users’ ability to perform deep dives and explorations. Since they can’t customize the functionality of SQL, their only lever is data modeling. For example, if you want to make data exploration more accessible to business users via simple SQL, you have to either create one flat table that aggregates across all data sources, or a number of use case specific tables. While this approach is useful in answering simple business questions, users looking to dig deeper would have to write more complex queries to either join across multiple tables, leverage window functions and common table expressions, or use the raw data and SQL to create their own models. 

        Alongside ShopifyQL we’re building exploration data models that are able to answer questions across the entire spectrum of commerce: products, orders, and customers. Each model focuses on the necessary dimensions and metrics to enable data exploration associated with that domain. For example, our product exploration dataset allows users to explore all aspects of product sales such as conversion, returns, inventory, etc. The following characteristics allow us to keep these data model designs simple while maximizing the functionality of ShopifyQL:

        • Single flat tables aggregated to a lowest domain dimension grain and time attribute. There’ is no need for complicated joins, common table expressions, or window functions. Each table contains the necessary metrics that describe that domain’s interaction across the entire business, regardless of where the data is coming from (for example, product pageviews and inventory are product concerns from different business processes).
        • All metrics are fully additive across all dimensions. Users are able to leverage the ShopifyQL aggregation functions without worrying about which dimensions are conformed. This also makes table schemas relatable to spreadsheets, and easy to understand for business users with no experience in data modeling practices.
        • Datasets support overlapping use cases. Users can calculate key metrics like total sales in multiple exploration datasets, whether the focus is on products, orders, or customers. This allows users to reconcile their work and gives them confidence in the queries they write.

        Without the leverage of creating our own querying language, the characteristics above would require complex queries which would limit data exploration and analysis.

        ShopifyQL Is a Foundational Piece of Our Platform

        We built ShopifyQL for our Shopify Plus merchants, third-party developer partners, and ourselves as a way to serve merchant-facing commerce analytics. 

        Merchants can access ShopifyQL via our new first party app ShopifyQL Notebooks

        We used the ShopifyQL APIs to build an app that allows our Shopify Plus merchants to write ShopifyQL queries inside a traditional notebooks experience. The notebooks app gives users the ultimate freedom of exploring their data, performing deep dives, and creating comprehensive data stories. 

        ShopifyQL APIs enable our partners to easily develop analytics apps

        The Shopify platform allows third-party developers to build apps that enable merchants to fully customize their Shopify experience. We’ve built GraphQL endpoints for access to ShopifyQL and the underlying datasets. Developers can leverage these APIs to submit ShopifyQL queries and return the resulting data in the API response. This allows our developer partners to save time and resources by querying modeled data. For more information about our GraphQL API, check out our API documentation.

        ShopifyQL will power all analytical experiences on the Shopify platform

        We believe ShopifyQL can address all commerce analytics use cases. Our internal teams are going to leverage ShopifyQL to power the analytical experiences we create in the Shopify Admin—the online backend where merchants manage their stores. This helps us standardize our merchant-facing analytics interfaces across the business. Since we’re also the users of the language, we’re acutely aware of its gaps, and can make changes more quickly.

        Looking ahead

        We’re planning new language features designed to make querying with ShopifyQL even simpler and more powerful:

        • More visualizations: Line and bar charts are great but, we want to provide more visualization options that help users discover different insights. New visualizations on the roadmap include dual axis charts, funnels, annotations, scatter plots, and donut charts.
        • Pivoting: Pivoting data with a traditional SQL query is a complicated endeavor. We will simplify this with the capability to break down a metric by dimensional attributes in a columnar fashion. This will allow for charting trends of dimensional attributes across time for specific metrics with one simple query.
        • Aggregate conditions: Akin to a HAVING statement in SQL, we are building the capability for users to filter their queries on an aggregate condition. Unlike SQL, we’re going to allow for this pattern in the WHERE clause, removing the need for additional language syntax and keyword ordering complexity.

        As we continue to evolve ShopifyQL, our focus will remain on making commerce analytics more accessible to those looking to inform their decisions with data. We’ll continue to empower our developer partners to build comprehensive analytics apps, enable our merchants to make the most out of their data, and support our internal teams with powering their merchant-facing analytical use cases.

        Ranko is a product manager working on ShopifyQL and data products at Shopify. He's passionate about making data informed decisions more accessible to merchants.

        Are you passionate about solving data problems and eager to learn more, we’re always hiring! Reach out to us or apply on our careers page.

        Continue reading

        Making Open Source Safer for Everyone with Shopify’s Bug Bounty Program

        Making Open Source Safer for Everyone with Shopify’s Bug Bounty Program

        Zack Deveau, Senior Application Security Engineer at Shopify, shares the details behind a recent contribution to the Rails library, inspired by a bug bounty report we received. He'll go over the report and its root cause, how we fixed it in our system, and how we took it a step further to make Rails more secure by updating the default serializer for a few classes to use safe defaults.

        Continue reading

        How We Built Shopify Party

        How We Built Shopify Party

        Shopify Party is a browser-based internal tool that we built to make our virtual hangouts more fun. With Shopify’s move to remote, we wanted to explore how to give people a break from video fatigue and create a new space designed for social interaction. Here's how we built it.

        Continue reading

        8 Data Conferences Shopify Data Thinks You Should Attend

        8 Data Conferences Shopify Data Thinks You Should Attend

        Our mission at Shopify is to make commerce better for everyone. Doing this in the long term requires constant learning and development – which happen to be core values here at Shopify.

        Learning and development aren’t exclusively internal endeavors, they also depend on broadening your horizons and gaining insight from what others are doing in your field. One of our favorite formats for learning are conferences.

        Conferences are an excellent way to hear from peers about the latest applications, techniques, and use cases in data science. They’re also great for networking and getting involved in the larger data community.

        We asked our data scientists and engineers to curate a list of the top upcoming data conferences for 2022. Whether you’re looking for a virtual conference or in-person learning, we’ve got something that works for everyone.

        Hybrid Events

        Data + AI Summit 2022

        When: June 27-30
        Where: Virtual or in-person (San Francisco)

        The Data + AI Summit 2022 is a global event that provides access to some of the top experts in the data industry through keynotes, technical sessions, hands-on training, and networking. This four-day event explores various topics and technologies ranging from business intelligence, data analytics, and machine learning, to working with Presto, Looker, and Kedro. There’s great content on how to leverage open source technology (like Spark) to develop practical ways of dealing with large volumes of data. I definitely walked away with newfound knowledge.

        Ehsan K. Asl, Senior Data Engineer

        Transform 2022

        When: July 19-28
        Where: Virtual or in-person (San Francisco)

        Transform 2022 offers three concentrated events over an action-packed two weeks. The first event—The Data & AI Executive Summit—provides a leadership lens on real-world experiences and successes in applying data and AI. The following two events—The Data Week and The AI & Edge Week—dive deep into the most relevant topics in data science across industry tracks like retail, finance, and healthcare. Transform’s approach to showcasing various industry use cases is one of the reasons this is such a great conference. I find hearing how other industries are practically applying AI can help you find unique solutions to challenges in your own industry.

        Ella Hilal, VP of Data Science

        RecSys 2022

        When: September 18-23
        Where: Virtual & In-person (Seattle)

        RecSys is a conference dedicated to sharing the latest developments, techniques, and use cases in recommender systems. The conference has both a research track and industry track, allowing for different types of talks and perspectives from the field. The industry track is particularly interesting, since you get to hear about real-world recommender use cases and challenges from leading companies. Expect talks and workshops to be centered around applications of recommender systems in various settings (fashion, ecommerce, media, etc), reinforcement learning, evaluation and metrics, and bias and fairness.

        Chen Karako, Data Scientist Lead

        ODSC West

        When: November 1-3
        Where: Virtual or in-person (San Francisco)

        ODSC West is a great opportunity to connect with the larger data science community and contribute your ideas to the open source ecosystem. Attend to hear keynotes on topics like machine learning, MLOps, natural language processing, big data analytics, and new frontiers in research. On top of in-depth technical talks, there’s a pre-conference bootcamp on programming, mathematics or statistics, a career expo, and opportunities to connect with experts in the industry.

        Ryan Harter, Senior Staff Data Scientist

        In-person Events

        PyData London 2022

        When: June 17-19
        Where: London

        PyData is a community of data scientists and data engineers who use and develop a variety of open source data tools. The organization has a number of events around the world, but PyData London is one of its larger events. You can expect the first day to be tutorials providing walkthroughs on methods like data validation and training object detection with small datasets. The remaining two days are filled with talks on practical topics like solving real-world business problems with Bayesian modeling. Catch Shopify at this year’s PyData London event.

        Micayla Wood, Data Brand & Comms Senior Manager

        KDD 2022

        When: August 14-18
        Where: Washington D.C.

        KDD is a highly research-focused event and home to some of the data industry’s top innovations like personalized advertising and recommender systems. Keynote speakers range from academics to industry professionals, and policy leaders, and each talk is accompanied by well-written papers for easy reference. I find this is one of the best conferences to keep in touch with the industry trends and I’ve actually applied what I learned from attending KDD to the work that I do.

        Vincent Chio, Data Science Manager

        Re-Work Deep Learning Summit

        When: November 9-10
        Where: Toronto

        The Re-Work Deep Learning Summit focuses on showcasing the learnings from the latest advancements in AI and how businesses are applying them in the real world. With content focusing on areas like computer vision, pattern recognition, generative models, and neural networks, it was great to hear speakers not only share successes, but also the challenges that still lie in the application of AI. While I’m personally not a machine learning engineer, I still found the content approachable and interesting, especially the talks on how to set up practical ETL pipelines for machine learning applications.

        Ehsan K. Asl, Senior Data Engineer


        When: October 3-5
        Where: Budapest

        Crunch is a data science conference focused on sharing the industry’s top use cases for using data to scale a business. With talks and workshops around areas like the latest data science trends and tools, how to build effective data teams, and machine learning at scale, Crunch has great content and opportunities to meet other data scientists from around Europe. With their partner conferences (Impact and Amuse) you also have a chance to listen in to interesting and relevant product and UX talks.

        Yizhar (Izzy) Toren, Senior Data Scientist

        Rebekah Morgan is a Copy Editor on Shopify's Data Brand & Comms team.

        Are you passionate about data discovery and eager to learn more, we’re always hiring! Visit our Data Science and Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.

        Continue reading

        Building a Form with Polaris

        Building a Form with Polaris

        As companies grow in size and complexity, there comes a moment in time when teams realize the need to standardize portions of their codebase. This most often becomes the impetus for creating a common set of processes, guidelines, and systems, as well as a standard set of reusable UI elements. Shopify’s Polaris design system was born from such a need. It includes design and content guidelines along with a rich set of React components for the UI that Shopify developers use to build the Shopify admin, and that third-party app developers can use to create apps on the Shopify App Store.

        Working with Polaris

        Shopify puts a great deal of effort into Polaris and it’s used quite extensively within the Shopify ecosystem, by both internal and external developers, to build UIs with a consistent look and feel. Polaris includes a set of React components used to implement functionality such as lists, tables, cards, avatars, and more.

        Form… Forms… Everywhere

        Polaris includes 16 separate components that not only encompass the form element itself, but also the standard set of form inputs such as checkbox, color or date picker, radio button, and text field to name just a few. Here are a few examples of basic form design in Shopify admin related to creating a Product or Collection.

        Add Product and Create Collection Forms
        Add Product and Create Collection Forms

        Additional forms you’ll encounter in the Shopify admin also include creating an Order or Gift Card.

        Create Order and Issue Gift Card Forms
        Create Order and Issue Gift Card Forms

        From these form examples, we see a clear UI design implemented using reusable Polaris components.

        What We’re Building

        In this tutorial, we’ll build a basic form for adding a product that includes Title and Description fields, a Save button, and a top navigation that allows the user to return to the previous page.

        Add Product Form
        Add Product Form

        Although our initial objective is to create the basic form design, this tutorial is also meant as an introduction to React, so we’ll also add the additional logic such as state, events, and event handlers to emulate a fully functional React form. If you’re new to React then this tutorial is a great introduction into many of the fundamental underlying concepts of this library.

        Starter CodeSandbox

        As this tutorial is a step by step guide, providing all the code snippets along the way, we highly encourage you to fork this Starter CodeSandbox and code along with us. “Knowledge” is in the understanding but the “Knowing” is only gained in its application.

        If you hit any roadblocks along the way, or just prefer to jump right into the solution code, heres the Solution CodeSandbox.

        Initial Code Examination

        The codebase we’re working with is a basic create-react-app. We’ll use the PolarisForm component to build our basic form using Polaris components and then add standard React state logic. The starter code also imports the @shopify/polaris library which can be seen in the dependencies section.

        Initial Setup in CodeSandbox
        Initial Setup

        One other thing to note about our component structure is that the component folders contain both an index.js file and a ComponentName.js file. This is a common pattern visible throughout the Polaris component library, such as the Avatar component in the example below.

        Polaris Avatar Component Folder
        Polaris Avatar Component Folder

        Let’s first open the PolarisForm component. We can see that its a bare bones component that outputs some initial text just so that we know everything is working as expected.

        Choosing Components

        Choosing our Polaris components will seem intuitive but, at times, also requires a deeper understanding of the Polaris library which the tutorial is meant to introduce along the way. Here’s a list of the components we’ll include in our design:


        The actual form


        To apply a bit of styling between the fields


        The text inputs for our form


        To provide the back arrow navigation and Save button


        To apply a bit of styling around the form

        Reviewing the Polaris Documentation

        When working with any new library, it’s always best to examine the documentation or as developers like to say, RTFM. With that in mind we’ll review the Polaris documentation along the way, but for now let’s start with the Form component.

        The short description of the Form component describes it as “a wrapper component that handles the submissions of forms.” Also, in order to make it easier to start working with Polaris, each component provides best practices, related components, and several use case examples along with a corresponding CodeSandbox. The docs provide explanations for all the additional props that can be passed to the component.

        Adding Our Form

        It’s time to dive in and build out the form based on our design. The first thing we need to do is import the components we’ll be working with into the PolarisForm.js file.

        import { Form, FormLayout, TextField, Page, Card } from "@shopify/polaris";

        Now let’s render the Form and TextField components. I’ve also gone ahead and included the following TextField props: label, type, and multiline.

        So it seems our very first introduction into our Polaris journey is the following error message:

        No i18n was provided. Your application must be wrapped in an <AppProvider> component. See for implementation instructions.

        Although the message also provides a link to the AppProvider component, I’d suggest we take a few minutes to read the Get Started section of the documentation. We see there’s a working example of rendering a Button component that’s clearly wrapped in an AppProvider.

        And if we take a look at the docs for the AppProvider component it states it's “a required component that enables sharing global settings throughout the hierarchy of your application.”

        As we’ll see later, Polaris creates several layers of shared context which are used to manage distinct portions of the global state. One important feature of the Shopify admin is that it supports up to 20 languages. The AppProvider is responsible for sharing those translations to all child components across the app.

        We can move past this error by importing the AppProvider and replacing the existing React Fragment (<>) in our App component.

        The MissingAppProviderError should now be a thing of the past and the form renders as follows:

        Rendering The Initial Form
        Rendering The Initial Form

        Examining HTML Elements

        One freedom developers allow themselves when developing code is to be curious. In this case, my curiosity drives me towards viewing what the HTML elements actually look like in developer tools once rendered.

        Some questions that come to mind are: “how are they being rendered” and “do they include any additional information that isn’t visible in the component.” With that in mind, let’s take a look at the HTML structure in developer tools and see for ourselves.

        Form Elements In Developer Tools
        Form Elements In Developer Tools

        At first glance it’s clear that Polaris is prefixed to all class and ID names. There are a few more elements to the HTML hierarchy as some of the elements are collapsed, but please feel free to pursue your own curiosity and continue to dig a bit deeper into the code.

        Working With React Developer Tools

        Another interesting place to look as we satisfy our curiosity is the Components tab in Dev Tools. If you don’t see it then take a minute to install the React Developer Tools Chrome Extension. I’ve highlighted the Form component so we can focus our attention there first and see all the component hierarchy of our form. Once again, we’ll see there’s more being rendered than what we imported and rendered into our component.

        Form Elements In Developer Tools
        Form Elements In Developer Tools

        We also can see that at the very top of the hierarchy, just below App, is the AppProvider component. Being able to see the entire component hierarchy provides some context into how many nested levels of context are being rendered. 

        Context is a much deeper topic in React and is meant to allow child components to consume data directly instead of prop drilling. 

        Form Layout Component

        Now that we’ve allowed ourselves the freedom to be curious, let’s refocus ourselves on implementing the form. One thing we might have noticed in the initial layout of the elements is that there’s no space between the Title input field and the Description label. This can be easily fixed by wrapping both TextField components in a FormLayout component. 

        If we take a look at the documentation, we see that the FormLayout component is “used to arrange fields within a form using standard spacing and that by default it stacks fields vertically but also supports horizontal groups of fields.”

        Since spacing is what we needed in our design, lets include the component.

        Once the UI updates, we see that it now includes the additional space needed to meet our design specs.

        Form With Spacing Added
        Form With Spacing Added

        With our input fields in place, let’s now move onto adding the back arrow navigation and the Save button to the top of our form design.

        The Page Component

        This is where Polaris steps outside the bounds of being intuitive and requires that we do a little digging into the documentation, or better yet the HTML. Since we know that we’re rebuilding the Add Product form, then perhaps we take a moment to once again explore our curiosity and take a look at the actual Form component in Shopify admin in the Chrome Dev Tools.

        Polaris Page Component As Displayed In HTML
        Polaris Page Component As Displayed In HTML

        If we highlight the back arrow in HTML it highlights several class names prefixed with Polaris-Page. It looks like we’ve found a reference to the component we need, so now it’s off to the documentation to see what we can find.

        Located under the Structure category in the documentation, there’s a component called Page. The short description for the Page component states that it’s “used to build the outer-wrapper of a page, including the page title and associated actions.” The assumption is that title is used for the Add Product text and the associated action includes the Save button.

        Let’s give the rest of the documentation a closer look to see what props implement that functionality. Taking a look at the working example, we can correlate it with the following props:


        Adds the back arrow navigation


        Adds the title to the right of the navigation


        Adds the button which will include a event listener


        With props in hand, let’s add the Page component along with its props and set their values accordingly.

        Polaris Page Component As Displayed In HTML
        Page Component With Props

        The Card Component

        We're getting so close to completing the UI design, but based on a side by side comparison, it still needs a bit of white space around the form elements.  Of course we could opt to add our own custom CSS, but Polaris provides us a component that achieves the same results.

        Comparing Our Designs
        Comparing Our Designs

        If we take a look at the Shopify admin Add Products form in Dev Tools, we see a reference to the Card component, and it appears to be using padding (green outline in the image below) to create the space.

        Card Component Padding
        Card Component Padding

        Let’s also take a look at the documentation to see what the Card component brings to the table. The short description for the Card component states that it is “used to group similar concepts and tasks together to make Shopify easier for merchants to scan, read and get things done.”  Although it makes no mention of creating space either via padding or margin, if we look at the example provided, we see that it contains a prop called sectioned and the docs state the prop it used to “auto wrap content in a section.” Feel free to toggle the True/False buttons to confirm this does indeed create the spacing we’re looking for.

        Let’s add the Card component and include the sectioned prop.

        It seems our form has finally taken shape and our UI design is complete.

        Final Design
        Final Design

        Add the Logic

        Although our form is nice to look at, it doesnt do very much at this point. Typing into the fields doesn’t capture any input and clicking the Save button does nothing. The only feature that appears to function is clicking on the back button.

        If youre new to React then this section introduces some of the fundamental elements of React such as state, events, and event handlers.

        In React, the Forms can be configured as Controlled or Uncontrolled. If you google either concept you’ll find many articles that describe the differences and use cases for either approach. In our example we will configure the form as a Controlled form. What that means is that we’ll capture every keystroke and re-render the input in its current state.

        Adding State and Handler Functions

        Since we will be capturing every keystroke in two separate input fields we’ll opt to instantiate two instances of state. The first thing we need to do is import the useState Hook.

        import { useState } from "react";

        Now we can instantiate two unique instances of state called title and description. Instantiating state requires that we create both a state value and a setState function. React is very particular about state and requires that any updates to the state value use the setState function.

        With our state instantiated, lets create our event handler functions that manages all updates to the state. Handler functions aren’t required, however, they represent a React best practice in that developers expect a handler function as part of the convention and many times additional logic needs to take place before state is updated.

        Since we have two state values to update, we create two separate handler functions for each one, but being that we’re also implementing a form, we also need an event handler to manage when the form is submitted.

        Adding Events 

        We’re almost there. Now that state and our event handler functions are in place, it’s time to add the events and assign them the corresponding functions. The two events that we add are: onChange and onSubmit.

        Let’s start with adding the onChange event to both of the TextField components. Not only will we need to add the event, but also, being that we are implementing a Controlled form, need to include the value prop and assign the value to its corresponding state value.

        Take a moment to confirm that the form is capturing input by typing into the fields. If it works then were good to go.

        The last event we’ll add is the onSubmit event. Our logic dictates that the form would only be submitted once the user clicks on the Save button, so that’s where we’ll add the event logic.

        If we take a look at the documentation for the Page component we see that it includes an onAction prop. Although the documentation doesn’t go any further than providing an example we assume that’s the prop we use to trigger the onSubmit function.

        Of course let’s confirm that everything is now tied together by clicking on the Save button. If everything worked we should see the following console log output:

        SyntheticBaseEvent {_reactName: "onClick", _targetInst: null, type: "click", nativeEvent: PointerEvent, target: HTMLSpanElement...}

        Clearing the Form

        The very last step in our form submission process is to clear the form fields so that the merchant has a clean slate if they choose to add another product.

        This tutorial was meant to introduce you to Polaris React components available in Shopify’s Polaris Design System. The library provides a robust set of components that have been meticulously designed by our UX teams and implemented by our Development teams. The Polaris github library is open source, so feel free to look around or set up the local development environment (which uses Storybook).

        Joe Keohan is an RnD Technical Facilitator responsible for onboarding our new hire engineering developers. He has been teaching and educating for the past 10 years and is passionate about all things tech. Feel free to reach out on LinkedIn and extend your network or to discuss engineering opportunities at Shopify! When he isn’t leading sessions, you’ll find Joe jogging, surfing, coding and spending quality time with his family.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        To Thread or Not to Thread: An In-Depth Look at Ruby’s Execution Models

        To Thread or Not to Thread: An In-Depth Look at Ruby’s Execution Models

        Deploying Ruby applications using threaded servers has become widely considered as standard practice in recent years. According to the 2022 Ruby on Rails community survey, in which over 2,600 members of the global Rails community responded to a series of questions regarding their experience using Rails, threaded web servers such as Puma are by far the most popular deployment target. Similarly when it comes to job processors, the thread-based Sidekiq seems to represent the majority of the deployments. 

        In this post, I’ll explore the mechanics and reasoning behind this practice and share knowledge and advice to help you make well-informed decisions on whether or not you should utilize threads in your applications (and to that point—how many). 

        Why Are Threads the Popular Default?

        While there are certainly many different factors for threaded servers' rise in popularity, their main selling point is that they increase an application’s throughput without increasing its memory usage too much. So to fully understand the trade-offs between threads and processes, it’s important to understand memory usage.

        Memory Usage of a Web Application

        Conceptually, the memory usage of a web application can be divided in two parts.

        Two separate text boxes stacked on top of each other, the top one containing the words "Static memory" and the bottom containing the words "Processing memory".
        Static memory and processing memory are the two key components of memory usage in a web application.

        The static memory is all the data you need to run your application. It consists of the Ruby VM itself, all the VM bytecode that was generated while loading the application, and probably some static Ruby objects such as I18n data, etc. This part is like a fixed cost, meaning whether your server runs 1 or 10 threads, that part will stay stable and can be considered read-only.

        The request processing memory is the amount of memory needed to process a request. There you'll find database query results, the output of rendered templates, etc. This memory is constantly being freed by the garbage collector and reused, and the amount needed is directly proportional to the amount of thread your application runs.

        Based on this simplified model, we express the memory usage of a web application as:

        processes * (static_memory + (threads * processing_memory))

        So if you have only 512MiB available, with an application using 200MiB of static memory and needing 150MiB of processing memory, using two single threaded processes requires 700MiB of memory, while using a single process with two threads will use only 500MiB and fit in a Heroku dyno.

        Two columns of text boxes next to each other. On the left, a column representing a single process shows a box with the text "Static Memory" at the top, and two boxes with the text "Thread #2 Processing Memory beneath it. In the column on the right which represents two single threaded processes, there are four boxes, which read: "Process 1 Static Memory", "Process #1 Processing Memory", "Process #2 Static Memory", and "Process #2 Processing memory" in order from top to bottom.
        A single process with two threads uses less memory than two single threaded processes.

        However this model, like most models, is a simplified depiction of reality. Let’s bring it closer to reality by adding another layer of complexity: Copy on Write (CoW).

        Enter Copy on Write

        CoW is a common resource management technique involving sharing resources rather than duplicating them until one of the users needs to alter it, at which point the copy actually happens. If the alteration never happens, then neither does the copy.

        In old UNIX systems of the ’70s or ’80s, forking a process involved copying its entire addressable memory over to the new process address space, effectively doubling the memory usage. But since the mid ’90s, that’s no longer true as, most, if not all, fork implementations are now sophisticated enough to trick the processes into thinking they have their own private memory regions, while in reality they’re sharing it with other processes.

        When the child process is forked, its pages tables are initialized to point to the parent’s memory pages. Later on, if either the parent or the child tries to write in one of these pages, the operating system is notified and will actually copy the page before it’s modified.

        This means that if neither the child nor the parent write in these shared pages after the fork happens, forked processes are essentially free.

        A flow chart with "Parent Process Static Memory" in a text box at the top. On the second row, there are two text boxes containing the text "Process 1 Processing Memory" and "Process 2 Processing Memory", connected to the top text box with a line to illustrate resource sharing by forking of the parent process.
        Copy on Write allows for sharing resources by forking the parent process.

        So in a perfect world, our memory usage formula would now be:

        static_memory + (processes * threads * processing_memory)

        Meaning that threads would have no advantage at all over processes.

        But of course we're not in a perfect world. Some shared pages will likely be written into at some point, the question is how many? To answer this, we’ll need to know how to accurately measure the memory usage of an application.

        Beware of Deceiving Memory Metrics

        Because of CoW and other memory sharing techniques, there are now many different ways to measure the memory usage of an application or process. Depending on the context, some metrics can be more or less relevant.

        Why RSS Isn’t the Metric You’re Looking For

        The memory metric that’s most often shown by various administration tools, such as ps, is Resident Set Size (RSS). While RSS has its uses, it's really misleading when dealing with forking servers. If you fork a 100MiB process and never write in any memory region, RSS will report both processes as using 100MiB. This is inaccurate because 100MiB is being shared between the two processes—the same memory is being reported twice.

        A slightly better metric is Proportional Set Size (PSS). In PSS, shared memory region sizes are divided by the number of processes sharing them. So our 100MiB process that was forked once should actually have a PSS of 50MiB. If you’re trying to figure out whether you’re nearing memory exhaustion, this is already a much more useful metric to look at because if you add up all the PSS numbers you get how much memory is actually being used—but we can go even deeper.

        On Linux, you can get a detailed breakdown of a process memory usage though cat /proc/$PID/smaps_rollup. Here’s what it looks like for a Unicorn worker on one of our apps in production:

        And for the parent process:

        Let’s unpack what each element here means. First, the Shared and Private fields. As its name suggests, Shared memory is the sum of memory regions that are in use by multiple processes. Whereas Private memory is allocated for a specific process and isn’t shared by other processes. In this example, we see that out of the 771,912 kB of addressable memory only 437,928 kB (56.7%) are really owned by the Unicorn worker, the rest is inherited from the parent process.

        As for Clean and Dirty, Clean memory is memory that has been allocated but never written to (things like the Ruby binary and various native shared libraries). Dirty memory is memory that has been written into by at least one process. It can be shared as long as it was only written into by the parent process before it forked its children.

        Measuring and Improving Copy on Write Efficiency

        We’ve established that shared memory is a key to maximizing efficiency of processes, so the important question here is how much of the static memory is actually shared. To approximate this, we compare the worker shared memory with the parent process RSS, which is 508,544 kB in this app, so:

        worker_shared_mem / master_rss
        >>(18288 + 315648) / 508544.0 * 100

        Here we see that about two-thirds of the static memory is shared:

        A flow chart depicting worker shared memory, with Private and Parent Process Shared Static Memory in text boxes at the top, connecting to two separate columns, each containing Private Static Memory and Processing Memory.
        By comparing the worker shared memory with the parent process RSS, we can see that two thirds of this app’s static memory is shared.

        If we were looking at RSS, we’d think each extra worker would cost ~750MiB, but in reality it’s closer to ~427MiB, when an extra thread would cost ~257MiB. That’s still noticeably more, but far less than what the initial naive model would have predicted.

        There’s a number of ways an application owner can improve CoW efficiency with the general idea being to load as many things as possible as part of the boot process before the server forks. This topic is very broad and could be a whole post by itself, but here are a few quick pointers.

        The first thing to do is configure the server to fully load the application. Unicorn, Puma, and Sidekiq Enterprise all have a preload_app option for that purpose. Once that’s done, a common pattern that degrades CoW performance is memoized class variables, for example:

        Such delayed evaluation both prevents that memory from being shared and causes a slowdown for the first request to call this method. The simple solution is to instead use a constant, but when it’s not possible, the next best thing is to leverage the Rails eager_load_namespaces feature, as shown here:

        Now, locating these lazy loaded constants is the tricky part. Ruby heap-profiler is a useful tool for this. You can use it to dump the entire heap right after fork, and then after processing a few requests, see how much the process has grown and where these extra objects were allocated.

        The Case for Process-Based Servers

        So, while there are increased memory costs involved in using process-based servers, using more accurate memory metrics and optimizations like CoW to share memory between processes can alleviate some of this. But why use process-based servers such as Unicorn or Resque at all, given the increased memory cost? There are actually advantages to process-based servers that shouldn’t be overlooked, so let’s go through those. 

        Clean Timeout Mechanism

        When running large applications, you may run into bugs that cause some requests to take much longer than desirable. There could be many reasons for that—they might be specifically crafted by a malicious actor to try to DOS your service, or they might be processing an unexpectedly large amount of data. When this occurs, being able to cleanly interrupt this request is paramount for resiliency. Process-based servers can kill the worker process and fork a fresh one to replace it, ensuring the request is cleanly interrupted.

        Threads, however, can’t be interrupted cleanly. Since they directly share mutable resources with other threads, if you attempt to kill a single thread, you may leave some resources such as mutexes or database connections in an unrecoverable state, causing the other threads to run into various unrecoverable errors. 

        The Black Box of Global VM Lock Latency

        Improved latency is another major advantage of processes over threads in Ruby (and other languages with similar constraints such as Python). A typical web application process will do two types of work: CPU and I/O. So two Ruby process might look like this:

        Two rows of text boxes, containing separate boxes with the text "IO", "CPU", and "GC", representing the work of processes in a Ruby web application.
        CPU and IOs in two processes in a Ruby application.

        But in a Ruby process, because of the infamous Global VM Lock (GVL) , only one thread at a time can execute Ruby code, and when the garbage collector (GC) triggers, all threads are paused. So if we were to use two threads, the picture may instead look like this:

        Two rows of text boxes, with the individual boxes containing the text "CPU", "IO", "GVL wait" and GC", representing the work of threads in a Ruby web application and the latency introduced by the GVL.
        Global VM Lock (GVL) increases latency in Ruby threads.

        So every time two threads need to execute Ruby code at the same time, the service latency increases. How much this happens varies considerably from one application to another and even from one request to another. If you think about it, to fully saturate a process with N threads an application only needs to spend less than 1 / N of its time waiting on I/O. So 50 percent I/O for two threads, 75 percent I/O for four threads, etc. And that’s only the saturation limit, given that a request’s use of I/O and CPU is very much unpredictable, an application doing 75 percent I/O with two threads will still frequently wait on the GVL.

        The common wisdom in the Ruby community is that Ruby applications are relatively I/O heavy, but from my experience it’s not quite true, especially once you consider that GC pauses do acquire the GVL too, and Ruby applications tend to spend quite a lot of time in GC.

        Web applications are often specifically crafted to avoid long I/O operations in the web request cycle. Any potentially slow or unreliable I/O operation like calling a third-party API or sending an email notification is generally deferred to a background job queue, so the remaining I/O in web requests are mostly reasonably fast database and cache queries. A corollary is that the job processing side of applications tends to be much more I/O intensive than the web side. So job processors like Sidekiq can more frequently benefit from a higher thread count. But even for web servers, using threads can be seen as a perfectly acceptable tradeoff between throughput per dollar and latency. 

        The main problem is that as of today there isn’t really a good way to measure how much the service latency is impacted by the GVL, so service owners are left in the dark. Since Ruby doesn’t provide any way to instrument the GVL, all we’re left with are proxy metrics, like gradually increasing or decreasing the number of threads and measuring the impact on the latency metrics, but that’s far from enough.

        That’s why I recently put together a feature request and a proof of concept implementation for Ruby 3.2 to provide a GVL instrumentation API . It's a really low-level and hard to use API, but if it’s accepted I plan to publish a gem to expose simple metrics to know exactly how much time is spent waiting for the GVL, and I hope application performance monitoring services include it.

        Ractors and Fibers—Not a Silver Bullet Solution

        In the last few years, the Ruby community has been experimenting heavily with other concurrency constructs to potentially replace threads, known as Ractors and Fibers.

        Ractors can execute Ruby code in parallel, rather than having one single GVL, each Ractor has its own lock, so they theoretically could be game changing. However Ractors can’t share any global mutable state, so even sharing a database connection pool or a logger between Ractors isn’t possible. That’s a major architectural challenge that would require most libraries to be heavily refactored, and the result would likely not be as usable. I hope to be proven wrong, but I don’t expect Ractors to be used as units of execution for sizable web applications any time soon. 

        As for Fibers, they’re essentially lighter threads that are cooperatively scheduled. So everything said in the previous sections about threads and the GVL applies to them as well. They’re very well suited for I/O intensive applications that mostly just move bytes streams around and don’t spend much time executing code, but any application that doesn’t benefit from more than a handful of threads won’t benefit from using fibers.

        YJIT May Change the Status Quo

        While it’s not yet the case, the advent of YJIT may significantly increase the need to run threaded servers in the future. Since just-in-time (JIT) compilers speedup code execution at the expense of unshareable memory usage, JITing Ruby will decrease CoW performance, but will also make applications proportionally more I/O intensive.

        Right now, YJIT only offers modest speed gains, but if in the future it manages to provide even a two times speedup, it would certainly allow application owners to ramp up their number of web threads by as much to compensate for the increased memory cost.

        Tips to Remember

        Ultimately choosing between process versus thread-based servers involves many trade-offs, so it’s unreasonable to recommend either without first looking at an application’s metrics.

        But in the abstract, here are a few quick takeaways to keep in mind: 

        • Always enable application preloading to benefit from CoW as much as possible. 
        • Unless your application fits on the smallest offering or your hosting provider, use a smaller number of larger containers instead of a bigger number of smaller containers. For instance a single 4CPU 2GiB of RAM box is more efficient than 4 boxes with 1CPU 512MiB of RAM each.
        • If latency is more important to you than keeping costs low, or if you have enough free memory for it, use Unicorn to benefit from the reliable request timeout. 
          • Note: Unicorn must be protected from slow client attacks by a reverse proxy that buffer requests. If that’s a problem, Puma can be configured to run with a single thread per worker.
        • If using threads, start with only two threads unless you’re confident your application is indeed spending more than half its time waiting on I/O operations. This doesn’t apply to job processors since they tend to be much more I/O intensive and are much less latency sensitive, so they can easily benefit from higher thread counts. 

        Looking Ahead: Future Improvements to the Ruby Ecosystem

        We’re exploring a number of avenues to improve the situation for both process and thread-based servers.

        First, there’s the GVL instrumentation API mentioned previously that should hopefully allow application owners to make more informed trade-offs between throughput and latency. We could even try to use it to automatically apply backpressure by dynamically adjusting concurrency when GVL contention is over some threshold.

        Additionally, threaded web servers could theoretically implement a reliable request timeout mechanism. When a request takes longer than expected, they could stop forwarding requests to the impacted worker and wait for all other requests to either complete or timeout before killing the worker and reforking it. That’s something Matthew Draper explored a few years ago and that seems doable.

        Then, the CoW performance of Ruby itself could likely be improved further. Several patches have been merged for this purpose over the years, but we can probably do more. Notably we suspect that Ruby’s inline caches cause most of the VM bytecode to be unshared once it’s executed. I think we could also take some inspiration from what the Instagram engineering team did to improve Python’s CoW performance . For instance they introduced a gc.freeze() method that instructs the GC that all existing memory regions will become shared. Python uses this information to make more intelligent decisions around memory usage, like not using any free slots in these shared regions since it’s more efficient to allocate a new page than to dirty an old one.

        Jean Boussier is a Rails Core team member, Ruby committer, and Senior Staff Engineer on Shopify's Ruby and Rails infrastructure team. You can find him on GitHub as @byroot or on Twitter at @_byroot.

        If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.

        Continue reading

        Implementing Equality in Ruby

        Implementing Equality in Ruby

        Ruby is one of the few programming languages that get equality right. I often play around with other languages, but keep coming back to Ruby. This is largely because Ruby’s implementation of equality is so nice.

        Nonetheless, equality in Ruby isn't straightforward. There is #==, #eql?, #equal?, #===, and more. Even if you’re familiar with how to use them, implementing them can be a whole other story.

        Let's walk through all forms of equality in Ruby and how to implement them.

        Why Properly Implementing Equality Matters

        We check whether objects are equal all the time. Sometimes we do this explicitly, sometimes implicitly. Here are some examples:

        • Do these two Employees work in the same Team? Or, in code: ==
        • Is the given DiscountCode valid for this particular Product? Or, in code: product.discount_codes.include?(given_discount_code).
        • Who are the (distinct) managers for this given group of employees? Or, in code:

        A good implementation of equality is predictable; it aligns with our understanding of equality.

        An incorrect implementation of equality, on the other hand, conflicts with what we commonly assume to be true. Here is an example of what happens with such an incorrect implementation:

        The geb and geb_also objects should definitely be equal. The fact that the code says they’re not is bound to cause bugs down the line. Luckily, we can implement equality ourselves and avoid this class of bugs.

        No one-size-fits-all solution exists for an equality implementation. However, there are two kinds of objects where we do have a general pattern for implementing equality: entities and value objects. These two terms come from domain-driven design (DDD), but they’re relevant even if you’re not using DDD. Let’s take a closer look.


        Entities are objects that have an explicit identity attribute. Often, entities are stored in some database and have a unique id attribute corresponding to a unique id table column. The following Employee example class is such an entity:

        Two entities are equal when their IDs are equal. All other attributes are ignored. After all, an employee’s name might change, but that does not change their identity. Imagine getting married, changing your name, and not getting paid anymore because HR has no clue who you are anymore!

        ActiveRecord, the ORM that is part of Ruby on Rails, calls entities "models" instead, but they’re the same concept. These model objects automatically have an ID. In fact, ActiveRecord models already implement equality correctly out of the box!

        Value Objects

        Value objects are objects without an explicit identity. Instead, their value as a whole constitutes identity. Consider this Point class:

        Two Points will be equal if their x and y values are equal. The x and y values constitute the identity of the point.

        In Ruby, the basic value object types are numbers (both integers and floating-point numbers), characters, booleans, and nil. For these basic types, equality works out of the box:

        Arrays of value objects are in themselves also value objects. Equality for arrays of value objects works out of the box—for example, [17, true] == [17, true]. This might seem obvious, but this isn’t true in all programming languages.

        Other examples of value objects are timestamps, date ranges, time intervals, colors, 3D coordinates, and money objects. These are built from other value objects; for example, a money object consists of a fixed-decimal number and a currency code string.

        Basic Equality (Double Equals)

        Ruby has the == and != operators for checking whether two objects are equal or not:

        Ruby’s built-in types all have a sensible implementation of ==. Some frameworks and libraries provide custom types, which will have a sensible implementation of ==, too. Here is an example with ActiveRecord:

        For custom classes, the == operator returns true if and only if the two objects are the same instance. Ruby does this by checking whether the internal object IDs are equal. These internal object IDs are accessible using #__id__. Effectively, gizmo == thing is the same as gizmo.__id__ == thing.__id__.

        This behavior is often not a good default, however. To illustrate this, consider the Point class from earlier:

        The == operator will return true only when calling it on itself:

        This default behavior is often undesirable in custom classes. After all, two points are equal if (and only if) their x and y values are equal. This behavior is undesirable for value objects (such as Point) and entities (such as the Employee class mentioned earlier).

        The desired behavior for value objects and entities is as follows:

        Image showing the desired behavior for value objects and entities. The first pairing for value objects checks if x and y (all attributes) are equal. The second pair for entities, checks whether the id attributes are equal. The third pair shows the default ruby check, which is whether internal object ids are equal
        • For value objects (a), we’d like to check whether all attributes are equal.
        • For entities (b), we’d like to check whether the explicit ID attributes are equal.
        • By default (c), Ruby checks whether the internal object IDs are equal.

        Instances of Point are value objects. With the above in mind, a good implementation of == for Point would look as follows:

        This implementation checks all attributes and the class of both objects. By checking the class, checking equality of a Point instance and something of a different class return false rather than raise an exception.

        Checking equality on Point objects now works as intended:

        The != operator works too:

        A correct implementation of equality has three properties: reflexivity, symmetry, and transitivity.

        Image with simple circles to describe the implementation of equality having three properties: reflexivity, symmetry, and transitivity, described below the image for more context
        • Reflexivity (a): An object is equal to itself: a == a
        • Symmetry (b): If a == b, then b == a
        • Transitivity (c): If a == b and b == c, then a == c

        These properties embody a common understanding of what equality means. Ruby won’t check these properties for you, so you’ll have to be vigilant to ensure you don’t break these properties when implementing equality yourself.

        IEEE 754 and violations of reflexivity

        It seems natural that something would be equal to itself, but there is an exception. IEEE 754 defines NaN (Not a Number) as a value resulting from an undefined floating-point operation, such as dividing 0 by 0. NaN, by definition, is not equal to itself. You can see this for yourself:

        This means that == in Ruby is not universally reflexive. Luckily, exceptions to reflexivity are exceedingly rare; this is the only exception I am aware of.

        Basic Equality for Value Objects

        The Point class is an example of a value object. The identity of a value object, and thereby equality, is based on all its attributes. That is exactly what the earlier example does:

        Basic Equality for Entities

        Entities are objects with an explicit identity attribute, commonly @id. Unlike value objects, an entity is equal to another entity if and only if their explicit identities are equal.

        Entities are uniquely identifiable objects. Typically, any database record with an id column corresponds to an entity. Consider the following Employee entity class:

        Other forms of ID are possible too. For example, books have an ISBN, and recordings have an ISRC. But if you have a library with multiple copies of the same book, then ISBN won’t uniquely identify your books anymore.

        For entities, the == operator is more involved to implement than for value objects:

        This code does the following:

        • The super call invokes the default implementation of equality: Object#==. On Object, the #== method returns true if and only if the two objects are the same instance. This super call, therefore, ensures that the reflexivity property always holds.
        • As with Point, the implementation Employee#== checks class. This way, an Employee instance can be checked for equality against objects of other classes, and this will always return false.
        • If @id is nil, the entity is considered not equal to any other entity. This is useful for newly-created entities which have not been persisted yet.
        • Lastly, this implementation checks whether the ID is the same as the ID of the other entity. If so, the two entities are equal.

        Checking equality on entities now works as intended:

        Blog post of Theseus

        Implementing equality on entity objects isn’t always straightforward. An object might have an id attribute that doesn’t quite align with the object’s conceptual identity.

        Take a BlogPost class, for example, with id, title, and body attributes. Imagine creating a BlogPost, then halfway through writing the body for it, scratching everything and starting over with a new title and a new body. The id of that BlogPost will still be the same, but is it still the same blog post?

        If I follow a Twitter account that later gets hacked and turned into a cryptocurrency spambot, is it still the same Twitter account?

        These questions don’t have a proper answer. That’s not surprising, as this is essentially the Ship of Theseus thought experiment. Luckily, in the world of computers, the generally accepted answer seems to be yes: if two entities have the same id, then the entities are equal as well.

        Basic Equality with Type Coercion

        Typically, an object is not equal to an object of a different class. However, this isn’t always the case. Consider integers and floating-point numbers:

        Here, float_two is an instance of Float, and integer_two is an instance of Integer. They are equal: float_two == integer_two is true, despite different classes. Instances of Integer and Float are interchangeable when it comes to equality.

        As a second example, consider this Path class:

        This Path class provides an API for creating paths:

        The Path class is a value object, and implementing #== could be done just as with other value objects:

        However, the Path class is special because it represents a value that could be considered a string. The == operator will return false when checking equality with anything that isn’t a Path:

        It can be beneficial for path == "/usr/bin/ruby" to be true rather than false. To make this happen, the == operator needs to be implemented differently:

        This implementation of == coerces both objects to Strings, and then checks whether they are equal. Checking equality of a Path now works:

        This class implements #to_str, rather than #to_s. These methods both return strings, but by convention, the to_str method is only implemented on types that are interchangeable with strings.

        The Path class is such a type. By implementing Path#to_str, the implementation states that this class behaves like a String. For example, it’s now possible to pass a Path (rather than a String) to, and it will work because accepts anything that responds to #to_str.

        String#== also uses the to_str method. Because of this, the == operator is reflexive:

        Strict Equality

        Ruby provides #equal? to check whether two objects are the same instance:

        Here, we end up with two String instances with the same content. Because they are distinct instances, #equal? returns false, and because their content is the same, #== returns true.

        Do not implement #equal? in your own classes. It isn’t meant to be overridden. It’ll all end in tears.

        Earlier in this post, I mentioned that #== has the property of reflexivity: an object is always equal to itself. Here is a related property for #equal?:

        Property: Given objects a and b. If a.equal?(b), then a == b.

        Ruby won't automatically validate this property for your code. It’s up to you to ensure that this property holds when you implement the equality methods.
        For example, recall the implementation of Employee#== from earlier in this article:

        The call to super on the first line makes this implementation of #== reflexive. This super invokes the default implementation of #==, which delegates to #equal?. Therefore, I could have used #equal? rather than super:

        I prefer using super, though this is likely a matter of taste.

        Hash Equality

        In Ruby, any object can be used as a key in a Hash. Strings, symbols, and numbers are commonly used as Hash keys, but instances of your own classes can function as Hash keys too—provided that you implement both #eql? and #hash.

        The #eql? Method

        The #eql? method behaves similarly to #==:

        However, #eql?, unlike #==, does not perform type coercion:

        If #== doesn’t perform type coercion, the implementations of #eql? and #== will be identical. Rather than copy-pasting, however, we’ll put the implementation in #eql?, and let #== delegate to #eql?:

        I made the deliberate decision to put the implementation in #eql? and let #== delegate to it, rather than the other way around. If we were to let #eql? delegate to #==, there’s an increased risk that someone will update #== and inadvertently break the properties of #eql? (mentioned below) in the process.

        For the Path value object, whose #== method does perform type coercion, the implementation of #eql? will differ from the implementation of #==:

        Here, #== does not delegate to #eql?, nor the other way around.

        A correct implementation of #eql? has the following two properties:

        • Property: Given objects a and b. If a.eql?(b), then a == b.
        • Property: Given objects a and b. If a.equal?(b), then a.eql?(b).

        These two properties are not explicitly called out in the Ruby documentation. However, to the best of my knowledge, all implementations of #eql? and #== respect this property.

        Ruby will not automatically validate that these properties hold in your code. It’s up to you to ensure that these properties aren’t violated.

        The #hash Method

        For an object to be usable as a key in a Hash, it needs to implement not only #eql?, but also #hash. This #hash method will return an integer, the hash code, that respects the following property:

        Property: Given objects a and b. If a.eql?(b), then a.hash == b.hash.

        Typically, the implementation of #hash creates an array of all attributes that constitute identity and returns the hash of that array. For example, here is Point#hash:

        For Path, the implementation of #hash will look similar:

        For the Employee class, which is an entity rather than a value object, the implementation of #hash will use the class and the @id:

        If two objects are not equal, the hash code should ideally be different, too. This isn’t mandatory, however. It’s okay for two non-equal objects to have the same hash code. Ruby will use #eql? to tell objects with identical hash codes apart.

        Avoid XOR for Calculating Hash Codes

        A popular but problematic approach for implementing #hash uses XOR (the ^ operator). Such an implementation would calculate the hash codes of each individual attribute, and combine these hash codes with XOR. For example:

        With such an implementation, the chance of a hash code collision, which means that multiple objects have the same hash code, is higher than with an implementation that delegates to Array#hash. Hash code collisions will degrade performance and could potentially pose a denial-of-service security risk.

        A better way, though still flawed, is to multiply the components of the hash code by unique prime numbers before combining them:

        Such an implementation has additional performance overhead due to the new multiplication. It also requires mental effort to ensure the implementation is and remains correct.

        An even better way of implementing #hash is the one I’ve laid out before—making use of Array#hash:

        An implementation that uses Array#hash is simple, performs quite well, and produces hash codes with the lowest chance of collisions. It’s the best approach to implementing #hash.

        Putting it Together

        With both #eql? and #hash in place, the Point, Path, and Employee objects can be used as hash keys:

        Here, we use a Hash instance to keep track of a collection of Points. We can also use a Set for this, which uses a Hash under the hood, but provides a nicer API:

        Objects used in Sets need to have an implementation of both #eql? and #hash, just like objects used as hash keys.

        Objects that perform type coercion, such as Path, can also be used as hash keys, and thus also in sets:

        We now have an implementation of equality that works for all kinds of objects.

        Mutability, Nemesis of Equality

        So far, the examples for value objects have assumed that these value objects are immutable. This is with good reason because mutable value objects are far harder to deal with.

        To illustrate this, consider a Point instance used as a hash key:

        The problem arises when changing attributes of this point:

        Because the hash code is based on the attributes, and an attribute has changed, the hash code is no longer the same. As a result, collection no longer seems to contain the point. Uh oh!

        There are no good ways to solve this problem except for making value objects immutable.

        This isn’t a problem with entities. This is because the #eql? and #hash methods of an entity are solely based on its explicit identity—not its attributes.

        So far, we’ve covered #==, #eql?, and #hash. These three methods are sufficient for a correct implementation of equality. However, we can go further to improve that sweet Ruby developer experience and implement #===.

        Case Equality (Triple Equals)

        The #=== operator, also called the case equality operator, isn’t really an equality operator at all. Rather, it’s better to think of it as a membership testing operator. Consider the following:

        Here, Range#=== checks whether a range covers a certain element. It’s also common to use case expressions to achieve the same:

        This is also where case equality gets its name. Triple-equals is called case equality, because case expressions use it.

        You never need to use case. It’s possible to rewrite a case expression using if and ===. In general, case expressions tend to look cleaner. Compare:

        The examples above all use Range#===, to check whether the range covers a certain number. Another commonly used implementation is Class#===, which checks whether an object is an instance of a class:

        I’m rather fond of the #grep method, which uses #=== to select matching elements from an array. It can be shorter and sweeter than using #select:

        Regular expressions also implement #===. You can use it to check whether a string matches a regular expression:

        It helps to think of a regular expression as the (infinite) collection of all strings that can be produced by it. The set of all strings produced by /[a-z]/ includes the example string "+491573abcde". Similarly, you can think of a Class as the (infinite) collection of all its instances, and a Range as the collection of all elements in that range. This way of thinking clarifies that #=== really is a membership testing operator.

        An example of a class that could implement #=== is a PathPattern class:

        An example instance is"/bin/*"), which matches anything directly under the /bin directory, such as /bin/ruby, but not /var/log.

        The implementation of PathPattern#=== uses Ruby’s built-in File.fnmatch to check whether the pattern string matches. Here is an example of it in use:

        Worth noting is that File.fnmatch calls #to_str on its arguments. This way, #=== automatically works on other string-like objects as well, such as Path instances:

        The PathPattern class implements #===, and therefore PathPattern instances work with case/when, too:

        Ordered Comparison

        For some objects, it’s useful not only to check whether two objects are the same, but how they are ordered. Are they larger? Smaller? Consider this Score class, which models the scoring system of my university in Ghent, Belgium.

        (I was a terrible student. I’m not sure if this was really how the scoring even worked — but as an example, it will do just fine.)

        In any case, we benefit from having such a Score class. We can encode relevant logic there, such as determining the grade and checking whether or not a score is passing. For example, it might be useful to get the lowest and highest score out of a list:

        However, as it stands right now, the expressions scores.min and scores.max will result in an error: comparison of Score with Score failed (ArgumentError). We haven’t told Ruby how to compare two Score objects. We can do so by implementing Score#&<=>:

        An implementation of #<=> returns four possible values:

        • It returns 0 when the two objects are equal.
        • It returns -1 when self is less than other.
        • It returns 1 when self is greater than other.
        • It returns nil when the two objects cannot be compared.

        The #<=> and #== operators are connected:

        • Property: Given objects a and b. If (a <=> b) == 0, then a == b.
        • Property: Given objects a and b. If (a <=> b) != 0, then a != b.

        As before, it’s up to you to ensure that these properties hold when implementing #== and #<=>. Ruby won’t check this for you.

        For simplicity, I’ve left out the implementation Score#== in the Score example above. It’d certainly be good to have that, though.

        In the case of Score#<=>, we bail out if other is not a score, and otherwise, we call #<=> on the two values. We can check that this works: the expression <=> evaluates to -1, which is correct because a score of 6 is lower than a score of 12. (Did you know that the Belgian high school system used to have a scoring system where 1 was the highest and 10 was the lowest? Imagine the confusion!)

        With Score#<=> in place, scores.max now returns the maximum score. Other methods such as #min, #minmax, and #sort work as well.

        However, we can’t yet use operators like <. The expression scores[0] < scores[1], for example, will raise an undefined method error: undefined method `<' for #<Score:0x00112233 @value=6>. We can solve that by including the Comparable mixin:

        By including Comparable, the Score class automatically gains the <, <=, >, and >= operators, which all call <=> internally. The expression scores[0] < scores[1] now evaluates to a boolean, as expected.

        The Comparable mixin also provides other useful methods such as #between? and #clamp.

        Wrapping Up

        We talked about the following topics:

        • the #== operator, used for basic equality, with optional type coercion
        • #equal?, which checks whether two objects are the same instance
        • #eql? and #hash, which are used for testing whether an object is a key in a hash
        • #===, which isn’t quite an equality operator, but rather a “is kind of” or “is member of” operator
        • #<=> for ordered comparison, along with the Comparable module, which provides operators such as < and >=

        You now know all you need to know about implementing equality in Ruby. For more information check out the following resources:

        The Ruby documentation is a good place to find out more about equality:

        I also found the following resources useful:

        Denis is a Senior Software Engineer at Shopify. He has made it a habit of thanking ATMs when they give him money, thereby singlehandedly staving off the inevitable robot uprising.

        If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.

        Continue reading

        Lessons Learned From Running Apache Airflow at Scale

        Lessons Learned From Running Apache Airflow at Scale

        By Megan Parker and Sam Wheating

        Apache Airflow is an orchestration platform that enables development, scheduling and monitoring of workflows. At Shopify, we’ve been running Airflow in production for over two years for a variety of workflows, including data extractions, machine learning model training, Apache Iceberg table maintenance, and DBT-powered data modeling. At the time of writing, we are currently running Airflow 2.2 on Kubernetes, using the Celery executor and MySQL 8.

        System diagram showing Shopify's Airflow Architecture
        Shopify’s Airflow Architecture

        Shopify’s usage of Airflow has scaled dramatically over the past two years. In our largest environment, we run over 10,000 DAGs representing a large variety of workloads. This environment averages over 400 tasks running at a given moment and over 150,000 runs executed per day. As adoption increases within Shopify, the load incurred on our Airflow deployments will only increase. As a result of this rapid growth, we have encountered a few challenges, including slow file access, insufficient control over DAG (directed acyclic graph) capabilities, irregular levels of traffic, and resource contention between workloads, to name a few.

        Below we’ll share some of the lessons we learned and solutions we built in order to run Airflow at scale.

        1. File Access Can Be Slow When Using Cloud Storage

        Fast file access is critical to the performance and integrity of an Airflow environment. A well defined strategy for file access ensures that the scheduler can process DAG files quickly and keep your jobs up-to-date.

        Airflow keeps its internal representation of its workflows up-to-date by repeatedly scanning and reparsing all the files in the configured DAG directory. These files must be scanned often in order to maintain consistency between the on-disk source of truth for each workload and its in-database representation. This means the contents of the DAG directory must be consistent across all schedulers and workers in a single environment (Airflow suggests a few ways of achieving this).

        At Shopify, we use Google Cloud Storage (GCS) for the storage of DAGs. Our initial deployment of Airflow utilized GCSFuse to maintain a consistent set of files across all workers and schedulers in a single Airflow environment. However, at scale this proved to be a bottleneck on performance as every file read incurred a request to GCS. The volume of reads was especially high because every pod in the environment had to mount the bucket separately.

        After some experimentation we found that we could vastly improve performance across our Airflow environments by running an NFS (network file system) server within the Kubernetes cluster. We then mounted this NFS server as a read-write-many volume into the worker and scheduler pods. We wrote a custom script which synchronizes the state of this volume with  GCS, so that users only have to interact with GCS for uploading or managing DAGs. This script runs in a separate pod within the same cluster. This also allows us to conditionally sync only a subset of the DAGs from a given bucket, or even sync DAGs from multiple buckets into a single file system based on the environment’s configuration (more on this later).

        Altogether this provides us with fast file access as a stable, external source of truth, while maintaining our ability to quickly add or modify DAG files within Airflow. Additionally, we can use Google Cloud Platform’s IAM (identify and access management) capabilities to control which users are able to upload files to a given environment. For example, we allow users to upload DAGs directly to the staging environment but limit production environment uploads to our continuous deployment processes.

        Another factor to consider when ensuring fast file access when running Airflow at scale is your file processing performance. Airflow is highly configurable and offers several ways to tune the background file processing (such as the sort modethe parallelism, and the timeout). This allows you to optimize your environments for interactive DAG development or scheduler performance depending on the requirements.

        2. Increasing Volumes Of Metadata Can Degrade Airflow Operations

        In a normal-sized Airflow deployment, performance degradation due to metadata volume wouldn’t be an issue, at least within the first years of continuous operation.

        However, at scale the metadata starts to accumulate pretty fast. After a while this can start to incur additional load on the database. This is noticeable in the loading times of the Web UI and even more so during Airflow upgrades, during which migrations can take hours.

        After some trial and error, we settled on a metadata retention policy of 28 days, and implemented a simple DAG which uses ORM (object–relational mapping) queries within a PythonOperator to delete rows from any tables containing historical data (DagRuns, TaskInstances, Logs, TaskRetries, etc). We settled on 28 days as this gives us sufficient history for managing incidents and tracking historical job performance, while keeping the volume of data in the database at a reasonable level.

        Unfortunately, this means that features of Airflow which rely on durable job history (for example, long-running backfills) aren’t supported in our environment. This wasn’t a problem for us, but it may cause issues depending on your retention period and usage of Airflow.

        As an alternative approach to a custom DAG, Airflow has recently added support for a db clean command which can be used to remove old metadata. This command is available in Airflow version 2.3.

        3. DAGs Can Be Hard To Associate With Users And Teams

        When running Airflow in a multi-tenant setting (and especially at a large organization), it’s important to be able to trace a DAG back to an individual or team. Why? Because if a job is failing, throwing errors or interfering with other workloads, us administrators can quickly reach out to the appropriate users.

        If all of the DAGs were deployed directly from one repository, we could simply use git blame to track down the job owner. However, since we allow users to deploy workloads from their own projects (and even dynamically generate jobs at deploy-time), this becomes more difficult.

        In order to easily trace the origin of DAGs, we introduced a registry of Airflow namespaces, which we refer to as an Airflow environment’s manifest file.

        The manifest file is a YAML file where users must register a namespace for their DAGs. In this file they will include information about the jobs’ owners and source github repository (or even source GCS bucket), as well as define some basic restrictions for their DAGs. We maintain a separate manifest per-environment and upload it to GCS alongside with the DAGs.

        4. DAG Authors Have A Lot Of Power

        By allowing users to directly write and upload DAGs to a shared environment, we’ve granted them a lot of power. Since Airflow is a central component of our data platform, it ties into a lot of different systems and thus jobs have wide-ranging access. While we trust our users, we still want to maintain some level of control over what they can and cannot do within a given Airflow Environment. This is especially important at scale as it becomes unfeasible for the Airflow administrators to review all jobs before they make it to production.

        In order to create some basic guardrails, we’ve implemented a DAG policy which reads configuration from the previously mentioned Airflow manifest, and rejects DAGs which don’t conform to their namespace’s constraints by raising an AirflowClusterPolicyViolation.

        Based on the contents of the manifest file, this policy will apply a few basic restrictions to DAG files, such as:

        • A DAG ID must be prefixed with the name of an existing namespace, for ownership.
        • Tasks in a DAG must only enqueue tasks to the specified celery queue—more on this later.
        • Tasks in a DAG can only be run in specified pools, to prevent one workload from taking over another’s capacity.
        • Any KubernetesPodOperators in this DAG must only launch pods in the specified namespaces, to prevent access to other namespace’s secrets.
        • Tasks in a DAG can only launch pods into specified sets of external kubernetes clusters

        This policy can be extended to enforce other rules (for example, only allowing a limited set of operators), or even mutate tasks to conform to a certain specification (for exampke, adding a namespace-specific execution timeout to all tasks in a DAG).

        Here’s a simplified example demonstrating how to create a DAG policy which reads the previously shared manifest file, and implements the first three of the controls mentioned above:

        These validations provide us with sufficient traceability while also creating some basic controls which reduce DAGs abilities to interfere with each other.

        5. Ensuring A Consistent Distribution Of Load Is Difficult

        It’s very tempting to use an absolute interval for your DAGs schedule interval—simply set the DAG to run every timedelta(hours=1) and you can walk away, safely knowing that your DAG will run approximately every hour. However, this can lead to issues at scale.

        When a user merges a large number of automatically-generated DAGs, or writes a python file which generates many DAGs at parse-time, all the DAGRuns will be created at the same time. This creates a large surge of traffic which can overload the Airflow scheduler, as well as any external services or infrastructure which the job is utilizing (for example, a Trino cluster).

        After a single schedule_interval has passed, all these jobs will run again at the same time, thus leading to another surge of traffic. Ultimately, this can lead to suboptimal resource utilization and increased execution times.

        While crontab-based schedules won’t cause these kinds of surges, they come with their own issues. Humans are biased towards human-readable schedules, and thus tend to create jobs which run at the top of every hour, every hour, every night at midnight, etc. Sometimes there’s a valid application-specific reason for this (for example, every night at midnight we want to extract the previous day’s data), but often we have found users just want to run their job on a regular interval. Allowing users to directly specify their own crontabs can lead to bursts of traffic which can impact SLOs and put uneven load on external systems.

        As a solution to both these issues, we use a deterministically randomized schedule interval for all automatically generated DAGs (which represent the vast majority of our workflows). This is typically based on a hash of a constant seed such as the dag_id.

        The below snippet provides a simple example of a function which generates deterministic, random crontabs which yield constant schedule intervals. Unfortunately, this limits the range of possible intervals since not all intervals can be expressed as a single crontab. We have not found this limited choice of schedule intervals to be limiting, and in cases when we really need to run a job every five hours, we just accept that there will be a single four hours interval each day.

        Thanks to our randomized schedule implementation, we were able to smooth the load out significantly. The below image shows the number of tasks completed every 10 minutes over a twelve hour period in our single largest Airflow environment.

        Bar graph showing the Tasks Executed versus time. Shows a per 10–minute Interval in our Production Airflow Environment
        Tasks Executed per 10–minute Interval in our Production Airflow Environment

        6. There Are Many Points of Resource Contention

        There’s a lot of possible points of resource contention within Airflow, and it’s really easy to end up chasing bottlenecks through a series of experimental configuration changes. Some of these resource conflicts can be handled within Airflow, while others may require some infrastructure changes. Here’s a couple of ways which we handle resource contention within Airflow at Shopify:


        One way to reduce resource contention is to use Airflow pools. Pools are used to limit the concurrency of a given set of tasks. These can be really useful for reducing disruptions caused by bursts in traffic. While pools are a useful tool to enforce task isolation, they can be a challenge to manage because only administrators have access to edit them via the Web UI.

        We wrote a custom DAG which synchronizes the pools in our environment with the state specified in a Kubernetes Configmap via some simple ORM queries. This lets us manage pools alongside the rest of our Airflow deployment configuration and allows users to update pools via a reviewed Pull Request without needing elevated access. 

        Priority Weight

        Priority_weight allows you to assign a higher priority to a given task. Tasks with a higher priority will float to the top of the pile to be scheduled first. Although not a direct solution to resource contention, priority_weight can be useful to ensure that latency-sensitive critical tasks are run prior to lower priority tasks. However, given that the priority_weight is an arbitrary scale, it can be hard to determine the actual priority of a task without comparing it to all other tasks. We use this to ensure that our basic Airflow monitoring DAG (which emits simple metrics and powers some alerts) always runs as promptly as possible.

        It’s also worthwhile to note that by default, the effective priority_weight of a task used when making scheduling decisions is the sum of its own weight and that of all its downstream tasks. What this means is that upstream tasks in large DAGs are often favored over tasks in smaller DAGs. Therefore, using priority_weight requires some knowledge of the other DAGs running in the environment.

        Celery Queues and Isolated Workers

        If you need your tasks to execute in separate environments (for example, dependencies on different python libraries, higher resource allowances for intensive tasks, or differing level of access), you can create additional queues which a subset of jobs submit tasks to. Separate sets of workers can then be configured to pull from separate queues. A task can be assigned to a separate queue using the queue argument in operators. To start a worker which runs tasks from a different queue, you can use the following command:

        bashAirflow celery worker –queues <list of queues>

        This can help ensure that sensitive or high-priority workloads have sufficient resources, as they won’t be competing with other workloads for worker capacity.

        Any combination of pools, priority weights and queues can be useful in reducing resource contention. While pools allow for limiting concurrency within a single workload, a priority_weight can be used to make individual tasks run at a lower latency than others. If you need even more flexibility, worker isolation provides fine-grained control over the environment in which your tasks are executed.

        It’s important to remember that not all resources can be carefully allocated in Airflow—scheduler throughput, database capacity and Kubernetes IP space are all finite resources which can’t be restricted on a workload-by-workload basis without the creation of isolated environments.

        Going Forward…

        There are many considerations that go into running Airflow with such high throughput, and any combination of solutions can be useful. We’ve learned a ton and we hope you’ll remember these lessons and apply some of our solutions in your own Airflow infrastructure and tooling.

        To sum up our key takeaways:

        • A combination of GCS and NFS allows for both performant and easy to use file management.
        • Metadata retention policies can reduce degradation of Airflow performance.
        • A centralized metadata repository can be used to track DAG origins and ownership.
        • DAG Policies are great for enforcing standards and limitations on jobs.
        • Standardized schedule generation can reduce or eliminate bursts in traffic.
        • Airflow provides multiple mechanisms for managing resource contention.

        What’s next for us? We’re currently working on applying the principles of scaling Airflow in a single environment as we explore splitting our workloads across multiple environments. This will make our platform more resilient, allow us to fine-tune each individual Airflow instance based on its workloads’ specific requirements, and reduce the reach of any one Airflow deployment.

        Got questions about implementing Airflow at scale? You can reach out to either of the authors on the Apache Airflow slack community.

        Megan has worked on the data platform team at Shopify for the past 9 months where she has been working on enhancing the user experience for Airflow and Trino. Megan is located in Toronto, Canada where she enjoys any outdoor activity, especially biking and hiking.

        Sam is a Senior developer from Vancouver, BC who has been working on the Data Infrastructure and Engine Foundations teams at Shopify for the last 2.5 years. He is an internal advocate for open source software and a recurring contributor to the Apache Airflow project.

        Interested in tackling challenging problems that make a difference? Visit our Data Science & Engineering career page to browse our open positions. You can also contribute to Apache Airflow to improve Airflow for everyone.

        Continue reading

        Asynchronous Communication is the Great Leveler in Engineering

        Asynchronous Communication is the Great Leveler in Engineering

        In March 2020—the early days of the pandemic—Shopify transitioned to become a remote first- company. We call it being Digital by Design. We are now proud to employ Shopifolk around the world.

        Not only has being Digital by Design allowed our staff the flexibility to work from wherever they work best, it has also increased the amount of time that they are able to spend with their families, friends, or social circles. In the pre-remote world, many of us had to move far away from our hometowns in order to get ahead in our careers. However, recently, my own family has been negotiating with the reality of aging parents, and remote working has allowed us to move back closer to home so we are always there for them whilst still doing the jobs we love.

        However, being remote isn’t without its challenges. Much of the technology industry has spent decades working in colocated office space. This has formed habits in all of us that aren’t compatible with effective remote work. We’re now on a journey of mindfully unraveling these default behaviors and replacing them with remote-focused ways of working.

        If you’ve worked in an office before, you’ll be familiar with synchronous communication and how it forms strong bonds between colleagues: a brief chat in the kitchen whilst getting a coffee, a discussion at your desk that was prompted by a recent code change, or a conversation over lunch.

        With the layout of physical office spaces encouraging spontaneous interactions, context could be gathered and shared through osmosis with little specific effort—it just happened.

        There are many challenges that you face in engineering when working on a globally distributed team. Not everyone is online at the same time of the day, meaning that it can be harder to get immediate answers to questions. You might not even know who to ask when you can’t stick your head up and look around to see who is at their desks. You may worry about ensuring that the architectural direction that you’re about to take in the codebase is the right one when building for the long term—how can you decide when you’re sitting at home on your own?

        Teams have had to shift to using a different toolbox of skills now that everyone is remote. One such skill is the shift to more asynchronous communication: an essential glue that holds a distributed workforce together. It’s inclusive of teams in different time zones, it leaves an audit trail of communication and decision making, encourages us to communicate concisely, and enables everybody the same window into the company, regardless of where they are in the world.

        However, an unstructured approach can be challenging, especially when teams are working on establishing their communication norms. It helps to have a model with which to reason about how best to communicate for a given purpose and to understand what side-effects of that communication might be.

        The Spectrum of Synchronousness

        When working remotely, we have to adapt to a different landscape. The increased flexibility of working hours, the ability to find flow and do deep work, and the fact that our colleagues are in different places means we can’t rely on the synchronous, impromptu interactions of the office anymore. We have to navigate a continuum of communication choices between synchronous and asynchronous, choosing the right way to communicate for the right group at the right time.

        It’s possible to represent different types of communication on a spectrum, as seen in the diagram below.

        A diagram showing spectrum of communication types with the extremes being Synchronous Impermanent Connect and Asynchronous Permanent Disconnected
        The spectrum of communication types

        Let’s walk the spectrum from left to right—from synchronous to asynchronous—to understand the kinds of choices that we need to make when communicating in a remote environment.

        • Video calls and pair programming are completely synchronous: all participants need to be online at the same time.
        • Chats are written and can be read later, but due to their temporal nature have a short half-life. Usually there’s an expectation that they’ll be read or replied to fairly quickly, else they’re gone.
        • Recorded video is more asynchronous, however they’re typically used as a way of broadcasting some information or complimenting a longer document, and their relevance can fade rapidly.
        • Email is archival and permanent and is typically used for important communication. People may take many days to reply or not reply at all.
        • Written documents are used for technical designs, in-depth analysis, or cornerstones of projects. They may be read many years after they were written but need to be maintained and often represent a snapshot in time.
        • Wikis and READMEs are completely asynchronous, and if well-maintained, can last and be useful forever.

        Shifting to Asynchronous

        When being Digital by Design, we have to be intentionally more asynchronous. It’s a big relearning of how to work collaboratively. In offices, we could get by synchronously, but there was a catch: colleagues at home, on vacation, or in different offices would have no idea what was going on. Now we’re all in that position, we have to adapt in order to harness all of the benefits of working with a global workforce.

        By treating everyone as remote, we typically write as a primary form of communication so that all employees can have access to the same information wherever they are. We replace meetings with asynchronous interactions where possible so that staff have more flexibility over their time. We record and rebroadcast town halls so that staff in other timezones can experience the feeling of watching them together. We document our decisions so that others can understand the archeology of codebases and projects. We put effort into editing and maintaining our company-wide documentation in all departments, so that all employees have the same source of truth about teams, the organization chart, and projects.

        This shift is challenging, but it’s worthwhile: effective asynchronous collaboration is how engineers solve hard problems for our merchants at scale, collaborating as part of a global team. Whiteboarding sessions have been replaced with the creation of collaborative documents in tools such as Miro. In-person Town Halls have been replaced with live streamed events that are rebroadcast on different time zones with commentary and interactions taking place in Slack. The information that we all have in our heads has needed to be written, recorded, and documented. Even with all of the tools provided, it requires a total mindset shift to use them effectively.

        We’re continually investing in our developer tools and build systems to enable our engineers to contribute to our codebases and ship to production any time, no matter where they are. We’re also investing in internal learning resources and courses so that new hires can autonomously level up their skills and understand how we ship software. We have regular broadcasts of show and tell and demo sessions so that we can all gather context on what our colleagues are building around the world. And most importantly, we take time to write regular project and mission updates so that everyone in the company can feel the pulse of the organization.

        Asynchronous communication is the great leveler: it connects everyone together and treats everyone equally.

        Permanence in Engineering

        In addition to giving each employee the same window into our culture, asynchronous communication also has the benefit of producing permanent artifacts. These could be written documents, pull requests, emails, or videos. As per our diagram, the more asynchronous the communication, the more permanent the artifact. Therefore, shifting to asynchronous communication means that not only are teams able to be effective remotely, but they also produce archives and audit trails for their work.

        The whole of Shopify uses a single source of truth—an internal archive of information called the Vault. Here, Shopifolk can find all of the information that they need to get their work done: information on teams, projects, the latest news and video streams, internal podcasts and blog posts. Engineers can quickly find architecture diagrams and design documents for active projects.

        When writing design documents for major changes to the codebase, a team produces an archive of their decisions and actions through time. By producing written updates to projects every week, anyone in the company can capture the current context and where it has been derived from. By recording team meetings and making detailed minutes, those that were unable to attend can catch up later on-demand. A shift to asynchronous communication means a shift to implied shift to permanence of communication, which is beneficial for discovery[g][h][i], reflection and understanding.

        For example, when designing new features and architecture, we collaborate asynchronously on design documents via GitHub. New designs are raised as issues in our technical designs repository, which means that all significant changes to our codebase are reviewed, ratified and archived publicly. This mirrors how global collaboration works on the open source projects we know and love. Working so visibly can be intimidating for those that haven’t done it before, so we ensure that we mentor and pair with those that are doing it for the first time.

        Establishing Norms and Boundaries

        Yet, multiple mediums of communication incur many choices in how to use them effectively. When you have the option to communicate via chat, email, collaborative document or GitHub issue, picking the right one can become overwhelming and frustrating. Therefore we encourage our teams to establish their preferred norms and to write them down. For example:

        • What are the response time expectations within a team for chat versus email?
        • How are online working hours clearly discoverable for each team member?
        • How is consensus reached on important decisions?
        • Is a synchronous meeting ever necessary?
        • What is the correct etiquette for “raising your hand” in video calls?
        • Where are design documents stored so they’re easily accessible in the future?

        By agreeing upon the right medium to use for given situations, teams can work out what’s right for them in a way that supports flexibility, autonomy, and clarity. If you’ve never done this in your team, give it a go. You’ll be surprised how much easier it makes your day to day work.

        The norms that our teams define bridge both synchronous and asynchronous expectations. At Shopify, my team members ensure that they make the most of the windows of overlap that they have each day, setting aside time to be interruptible for pair programming, impromptu chats and meetings, and collaborative design sessions. Conversely, the times of the day when teams have less overlap are equally important. Individuals are encouraged to block out time in the calendar, turn on their “do not disturb” status, and find the space and time to get into a flow state and be productive

        A natural extension of the communication norms cover when writing and shipping code. Given that our global distribution of staff can potentially incur delays when it comes to reviewing, merging, and deploying, teams are encouraged to explore and define how they reach alignment and get things shipped. This can happen by raising the priority of reviewing pull requests created in other timezones first thing in the morning before getting on with your own work, through to finding additional engineers that may be external to the team but on the same timezone to lean in and offer support, review, and pairing.

        Maintaining Connectivity

        Once you get more comfortable with working ever more asynchronously, it can be tempting to want to make everything asynchronous: stand-ups on Slack, planning in Miro, all without needing anyone to be on a video call at all. However, if we look back at the diagram one more time, we’ll see that there’s an important third category: connectivity. Humans are social beings, and feeling connected to other, real humans—not just avatars—is critical to our wellbeing. This means that when shifting to asynchronous work we also need to ensure that we maintain that connection. Sometimes having a synchronous meeting can be a great thing, even if it’s less efficient—the ability to see other faces, and to chat, can’t be taken for granted.

        We actively work to ensure that we remain connected to each other at Shopify. Pair programming is a core part of our engineering culture, and we love using Tuple to solve problems collaboratively, share context about our codebases, and provide an environment to help hundreds of new engineers onboard and gain confidence working together with us.

        We also strongly advocate for plenty of time to get together and have fun. And no, I’m not talking about generic, awkward corporate fun. I’m talking about hanging out with colleagues and throwing things at them in our very own video game: Shopify Party (our internal virtual world for employees to play games or meet up). I’m talking about your team spending Friday afternoon playing board games together remotely. And most importantly, I’m talking about at least twice a year, we encourage teams to come together in spaces we’ve invested in around the world for meaningful and intentional moments for brainstorming, team building, planning, and establishing connections offline.

        Asynchronous brings efficiency, and synchronous brings connectivity. We’ve got both covered at Shopify.

        James Stanier is Director of Engineering at Shopify. He is also the author of Become an Effective Software Manager and Effective Remote Work. He holds a Ph.D. in computer science and runs

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Double Entry Transition Tables: How We Track State Changes At Shopify

        Double Entry Transition Tables: How We Track State Changes At Shopify

        Recently we launched Shopify Balance, a money management account and card that gives Shopify merchants quick access to their funds with no fees. After the beta launch of Shopify Balance, the Shopify Data team was brought in to answer the question: how do we reliably count the number of merchants using Balance? In particular, how do we count this historically?

        While this sounds like a simple question, it’s foundationally critical to knowing if our product is a success and if merchants are actually using it. It’s also more complicated than it seems to answer.

        To be considered as using Shopify Balance, a merchant has to have both an active Shopify Balance account and an active Shopify account. This means we needed to build something to track the state changes of both accounts simultaneously, and make that tracking robust and reliable over time. Enter double entry transition tables. While very much an “invest up front and save a ton of time in the long run” strategy, double entry transition tables give us the flexibility to see the individual inputs that cause a given change. It does all of this while simplifying our queries and reducing long term maintenance on our reporting.

        In this post, we’ll explore how we built a data pipeline using double entry transition tables to answer our question: how many Shopify merchants are using Shopify Balance? We’ll go over how we designed something that scales as our product grows in complexity, the benefits of using double entry transition tables—from ease of use to future proofing our reporting—and some sample queries using our new table.

        What Are Double Entry Transition Tables?

        Double entry transition tables are essentially a data presentation format that tracks changes in attributes of entities over time. At Shopify, one of our first use cases of a double entry transition table was used to track the state of merchants using the platform, allowing us to report on how many merchants have active accounts. In comparison to a standard transition table that has from and to columns, double entry transition tables output two rows for each state change, along with a new net_change column. They can also combine many individual tracked attributes into a single output.

        It took me a long time to wrap my head around this net_change column, but it essentially works like this: if you want to track the status of something over time, every time the status changes from one state to another or vice versa, there will be two entries:

        1. net_change = -1: this row is the previous state
        2. net_change = +1: this row is the new state

        Double entry transition tables have many advantages including:

        • The net_change column is additive: this is the true benefit of using this type of table. This allows you to quickly get the number of entities that are in a certain state by summing up net_change while filtering for the state you care about.
        • Identifying cause of change: for situations where you care about an overall status (one that depends on several underlying statuses), you can go into the table and see which of the individual attributes caused the change.
        • Preserving all timing information: the output preserves all timing information, and even correctly orders transitions that have identical timestamps. This is helpful for situations where you need to know something like the duration of a given status.
        • Easily scaled with additional attributes: if the downstream dependencies are written correctly, you can add additional attributes to your table as the product you’re tracking grows in complexity. The bonus is that you don’t have to rewrite any existing SQL or PySpark, all thanks to the additive nature of the net_change column.

        For our purpose of identifying how many merchants are using Shopify Balance, double entry transition tables allow us to track state changes for both the Shopify Balance account and the Shopify account in a single table. It also gives us a clean way to query the status of each entity over time. But how do we do this?

        Building Our Double Entry Transition Pipelines

        First, we need to prepare individual attribute tables to be used as inputs for our double entry transition data infrastructure. We need at least one attribute, but it can scale to any number of attributes as the product we’re tracking grows.

        In our case, we created individual attribute tables for both the Shopify Balance account status and the Shopify account status. An attribute input table must have a specific set of columns:

        • a partition key that’s common across attribute, which in our case is an account_id
        • a sort key, generally a transition_at timestamp and an index
        • an attribute you want to track.

        Using a standard transition table, we can convert it to an attribute with a simple PySpark job:

        Note the index column. We created this index using a row number window function, ordering by the transition_id any time we have duplicate account_id and transition_at sets in our original data. While simple, it serves the role of a tiebreak should there be two transition events with identical timestamps. This ensures we always have a unique account_id, transition_at, index set in our attribute for correct ordering of events. The index plays a key role later on when we create our double entry transition table, ensuring we’re able to capture the order of our two states.

        Our Shopify Balance status attribute table showing a merchant that joined and left Shopify Balance.

        Now that we have our two attribute tables, it’s time to feed these into our double entry transition pipelines. This system (called build merge state transitions) takes our individual attribute tables and first generates a combined set of unique rows using a partition_key (in our case, the account_id column), and a sort_key (in our case, the transition_at and index columns). It then creates one column per attribute, and fills in the attribute columns with values from their respective tables, in the order defined by the partition_key and sort_key. Where values are missing, it fills in the table using the previous known value for that attribute. Below you can see two example attributes being merged together and filled in:

        Two example attributes merged into a single output table.

        This table is then run through another process that creates our net_change column and assigns a +1 value to all current rows. It also inserts a second row for each state change with a net_change value of -1. This net_change column now represents the direction of each state change as outlined earlier.

        Thanks to our pipeline, setting up a double entry transition table is a very simple PySpark job:

        Note in the code above we’ve specified default values. These are used to fill in the initial null values for the attributes. Now below is the output of our final double entry transition table, which we call our accounts_transition_facts table. The table captures both a merchant’s Shopify and Shopify Balance account statuses over time. Looking at the shopify_status column, we can see they went from inactive to active in 2018, while the balance_status column shows us that they went from not_on_balance to active on March 14, 2021, and subsequently from active to inactive on April 23, 2021:

        A merchant that joined and left Shopify Balance in our accounts_transition_facts double entry transition table.

        Using Double Entry Transition Tables

        Remember how I mentioned that the net_change column is additive? This makes working with double entry transition tables incredibly easy. The ability to sum the net_change column significantly reduces the SQL or PySpark needed to get counts of states. For example, using our new account_transition_facts table, we can identify the total number of active accounts on Shopify Balance, using both the Shopify Balance status and Shopify status. All we have to do is sum our net_change column while filtering for the attribute statuses we care about:

        Add in a grouping on a date column and we can see the net change in accounts over time:

        We can even use the output in other PySpark jobs. Below is an example of a PySpark job consuming the output of our account_transition_facts table. In this case, we are adding the daily net change in account numbers to an aggregate daily snapshot table for Shopify Balance:

        There are many ways you can achieve the same outputs using SQL or PySpark, but having a double entry transition table in place significantly simplifies the code at query time. And as mentioned earlier, if you write the code using the additive net_change column, you won’t need to rewrite any SQL or PySpark when you add more attributes to your double entry transition table.

        We won’t lie, it took a lot of time and effort to build the first version of our account_transition_facts table. But thanks to our investment, we now have a reliable way to answer our initial question: how do we count the number of merchants using Balance? It’s easy with our double entry transition table! Grouping by the status we care about, simply sum net_change and viola, we have our answer.

        Not only does our double entry transition table simply and elegantly answer our question, but it also easily scales with our product. Thanks to the additive nature of the net_change column, we can add additional attributes without impacting any of our existing reporting. This means this is just the beginning for our account_transition_facts table. In the coming months, we’ll be evaluating other statuses that change over time, and adding in those that make sense for Shopify Balance into our table. Next time you need to reliably count multiple states, try exploring double entry transition tables.

        Justin Pauley is a Data Scientist working on Shopify Balance. Justin has a passion for solving complex problems through data modeling, and is a firm believer that clean data leads to better storytelling. In his spare time he enjoys woodworking, building custom Lego creations, and learning new technologies. Justin can be reached via LinkedIn.

        Are you passionate about data discovery and eager to learn more, we’re always hiring! Reach out to us or apply on our careers page.

        If you’re interested in building solutions from the ground up and would like to come work with us, please check out Shopify’s career page.

        Continue reading

        Shopify Invests in Research for Ruby at Scale

        Shopify Invests in Research for Ruby at Scale

        Shopify is continuing to invest on Ruby on Rails at scale. We’ve taken that further recently by funding high-profile academics to focus their work towards Ruby and the needs of the Ruby community. Over the past year we have given nearly half a million dollars in gifts to influential researchers that we trust to make a significant impact on the Ruby community for the long term.

        Shopify engineers and researchers at a recent meetup in London

        We want developments in programming languages and their implementations to be explored in Ruby, so that support for Ruby's unique properties are built in from the start. For example, Ruby's prevalent metaprogramming motivated a whole new kind of inline caching to be developed and presented as a paper at one of the top programming language conferences, and Ruby's unusually loose C extension API motivated a new kind of C interpreter to run virtualized C. These innovations wouldn't have happened if academics weren't looking at Ruby.

        We want programming language research to be evaluated against the workloads that matter to companies using Ruby. We want researchers to understand the scale of our code bases, how frequently they're deployed, and the code patterns we use in them. For example, a lot of VM research over the last couple of decades has traded off a long warmup optimization period for better peak performance, but this doesn't work for companies like Shopify where we're redeploying very frequently. Researchers aren't aware of these kinds of problems unless we partner with them and guide them.

        We think that working with academics like this will be self-perpetuating. With key researchers thinking and talking about Ruby, more early career researchers will consider working with Ruby and solving problems that are important to the Ruby community.

        Let’s meet Shopify’s new research collaborators.

        Professor Laurence Tratt

        Professor Laurence Tratt describes his vision for optimizing Ruby

        Professor Laurence Tratt is the Shopify and Royal Academy of Engineering Research Chair in Language Engineering at King’s College London. Jointly funded by Shopify, the Royal Academy, and King’s College, Laurie is looking at the possibility of automatically generating a just-in-time compiler from the existing Ruby interpreter through hardware meta-tracing and basic-block stitching.

        Laurie has an eclectic and influential research portfolio, and extensive writing on many aspects of improving dynamic languages and programming. He has context from the Python community and the groundbreaking work towards meta-tracing in the PyPy project. Laurie also works to build the programming language implementation community for the long term by co-organising a summer school series for early career researchers, bringing them together with experienced researchers from academia and industry.

        Professor Steve Blackburn

        Professor Steve Blackburn is building a new model for applied garbage collection

        Professor Steve Blackburn is an academic at the Australian National University and Google Research. Shopify funded his group’s work on MMTk, the memory management toolkit, a general library for garbage collection that brings together proven garbage collection algorithms with a framework for research into new ideas for garbage collection. We’re putting MMTk into Ruby so that Ruby can get the best current collectors today and future garbage collectors can be tested against Ruby.

        Steve is a world-leading expert in garbage collection, and Shopify’s funding is putting Ruby’s unique requirements for memory management into his focus.

        Dr Stefan Marr

        Dr Stefan Marr is an expert in benchmarking dynamic language implementations

        Dr Stefan Marr is a Senior Lecturer at the University of Kent in the UK and a Royal Society Industrial Fellow. With the support of Shopify, he’s examining how we can make interpreters faster and improve interpreter startup and warmup time.

        Stefan has a distinguished reputation for benchmarking techniques, differential analysis between languages and implementation techniques, and dynamic language implementation. He co-invented a new method for inline caching that has been instrumental for improving the performance of Ruby’s metaprogramming in TruffleRuby.

        Shopify engineers and research collaborators discuss how to work together to improve Ruby

        We’ve been bringing together the researchers that we’re funding with our senior Ruby community engineers to share their knowledge of what’s already possible and what could be possible, combining our understanding of how Ruby and Rails are used at scale today and what the community needs.

        These external researchers are all in addition to our own internal teams doing publishable research-level work on Ruby, with YJIT and TruffleRuby, and more efforts.

        Part of Shopify’s Ruby and Rails Infrastructure Team listening to research proposals

        We’ll be looking forward to sharing more about our investments in Ruby research over the coming years in blog posts and academic papers.

        Chris Seaton has a PhD in optimizing Ruby and works on TruffleRuby, a highly optimizing implementation of Ruby, and research projects at Shopify.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Maestro: The Orchestration Language Powering Shopify Flow

        Maestro: The Orchestration Language Powering Shopify Flow

        Adagio misterioso

        Shopify recently unveiled a new version of Shopify Flow. Merchants extensively use Flow’s workflow language and associated execution engine to customize Shopify, automate tedious, repetitive tasks, and focus on what matters. Flow comes with a comprehensive library of templates for common use cases, and detailed documentation to guide merchants in customizing their workflows.

        For the past couple of years my team has been working on transitioning Flow from a successful Shopify Plus App into a platform designed to power the increasing automation and customization needs across Shopify. One of the main technical challenges we had to address was the excessive coupling between the Flow editor and engine. Since they shared the same data structures, the editor and engine couldn't evolve independently, and we had limited ability to tailor these data structures for their particular needs. This problem was significant because editor and engine have fundamentally very different requirements.

        The Flow editor provides a merchant-facing visual workflow language. Its language must be declarative, capturing the merchant’s intent without dealing with how to execute that intent. The editor concerns itself mainly with usability, understandability, and interactive editing of workflows. The Flow engine, in turn, needs to efficiently execute workflows at scale in a fault-tolerant manner. Its language can be more imperative, but it must have good support for optimizations and have at-least-once execution semantics that ensures workflow executions recover from crashes. However, editor and engine also need to play together nicely. For example, they need to agree on the type system, which is used to find user errors and to support IDE-like features, such as code completion and inline error reporting within the visual experience.

        We realized it was important to tackle this problem right away, and it was crucial to get it right while minimizing disruptions to merchants. We proceeded incrementally.

        First, we designed and implemented a new domain-specific orchestration language that addressed the requirements of the Flow engine. We call this language Maestro. We then implemented a new, horizontally scalable engine to execute Maestro orchestrations. Next, we created a translation layer from original Flow workflow data structures into Maestro orchestrations. This allowed us to execute existing Flow workflows with the new engine. At last, we slowly migrated all Flow workflows to use the new engine, and by BFCM 2020 essentially all workflows were executing in the new engine.

        We were then finally in a position to deal with the visual language. So we implemented a brand new visual experience, including a new language for the Flow editor. This language is more flexible and expressive than the original, so any of the existing workflows could be easily migrated. The language also can be translated into Maestro orchestrations, so it could be directly executed by the new engine. Finally, once we were satisfied with the new experience, we started migrating existing Flow workflows, and by early 2022, all Flow workflows had been migrated to use the new editor and new engine.

        In the remainder of this post I want to focus on the new orchestration language, Maestro. I’ll give you an overview of its design and implementation, and then focus on how it neatly integrates with and addresses the requirements of the new version of Shopify Flow.


        A Sample of Maestro

        Allegro grazioso

        Let’s take a quick tour to get a taste of what Maestro looks like and what exactly it does. Maestro isn’t a general purpose programming language, but rather an orchestration language focused solely on coordinating the sequence in which calls to functions on some host language are made, while capturing which data is passed between those function calls. For example, suppose you want to implement code that calls a remote service to fetch some relevant customers and then deletes those customers from the database. The Maestro language can’t implement the remote service call nor the database call themselves, but it can orchestrate those calls in a fault-tolerant fashion. The main benefit of using Maestro is that the state of the execution is precisely captured and can be made durable, so you can observe the progression and restart where you left in the presence of crashes.

        The following Maestro code, slightly simplified for presentation, implements an orchestration similar to the example above. It first defines the shape of the data involved in the orchestration: an object type called Customer with a few attributes. It then defines three functions. Function fetch_customers takes no parameters and returns an array of Customers. Its implementation simply performs a GET HTTP request to the appropriate service. The delete_customer function, in this example, simulates the database deletion by calling the print function from the standard library. The orchestration function represents the main entry point. It uses the sequence expression to coordinate the function calls: first call fetch_customers, binding the result to the customers variable, then map over the customers calling delete_customer on each.

        Maestro functions declare interfaces to encapsulate expressions: the bodies of fetch_customers and delete_customer are call expressions, and the body of orchestration is a sequence expression that composes other expressions. But at some point we must yield to the host language to implement the actual service request, database call, print, and so on. This is accomplished by a function whose body is a primitive expression, meaning it binds to the host language code registered under the declared key. For example, these are the declarations of the get and print functions from the standard library of our Ruby implementation:

        We now can use the Maestro interpreter to execute the orchestration function. This is one possible simplified output from the command line:

        The output contains the result of calling print twice, once for each of the customers returned by the fetch service. The interesting aspect here is that the -c flag instructed the interpreter to also dump checkpoints to the standard output.

        Checkpoints are what Maestro uses to store execution state. They contain enough information to understand what has already happened in the orchestration and what wasn’t completed yet. For example, the first checkpoint contains the result of the service request that includes a JSON object with the information about customers to delete. In practice, checkpoints are sent to durable storage, such as Kafka, Redis, or MySQL. Then, if the interpreter stops for some reason, we can restart and point it to the existing checkpoints. The interpreter can recover by skipping expressions for which a checkpoint already exists. If we crash while deleting customers from the database, for example, we wouldn’t re-execute the fetch request because we already have its result.

        The checkpoints mechanism allows Maestro to provide at-least-once semantics for primitive calls, exactly what’s expected of Shopify Flow workflows. In fact, the new Flow engine, at a high level, is essentially a horizontally scalable, distributed pool of workers that execute the Maestro interpreter on incoming events for orchestrations generated by Flow. Checkpoints are used for fault tolerance as well as to give merchants feedback on each execution step, current status, and so on.

        Flow and Maestro Ensemble

        Presto festoso

        Now that we know what Maestro is capable of, let’s see how it plays together with Flow. The following workflow, for example, shows a typical Flow automation use case. It triggers when orders in a store are created and checks for certain conditions in that order, based on the presence of discount codes or the customer’s email. If the condition predicate matches successfully, it adds a tag to the order and subsequently sends an email to the store owner to alert of the discount.

        Screenshot of the Flow app showing the visualization of creating a workflow based on conditions
        A typical Flow automation use case

        Consider a merchant using the Flow App to create and execute this workflow. There are four main activities involved 

        1. navigating the possible tasks and types to use in the workflow
        2. validating that the workflow is correct
        3. activating the workflow so it starts executing on events
        4. monitoring executions.

        Catalog of Tasks and Types

        The Flow Editor displays a catalog of tasks for merchants to pick from. Those are triggers, conditions, and actions provided both by Shopify and Shopify Apps via Shopify Flow Connectors. Furthermore, Flow allows merchants to navigate Shopify’s GraphQL Admin API objects in order to select relevant data for the workflow. For example, the Order created trigger in this workflow conceptually brings an Order resource that represents the order that was just created. So, when the merchant is defining a condition or passing arguments to actions, Flow assists in navigating the attributes reachable from that Order object. To do so, Flow must have a model of the GraphQL API types and understand the interface expected and provided by tasks. Flow achieves this by building on top of Maestro types and functions, respectively.

        Flow models types as decorated Maestro types: the structure is defined by Maestro types, but Flow adds information, such as field and type descriptions. Most types involved in workflows come from APIs, such as the Shopify GraphQL Admin API. Hence, Flow has an automated process to consume APIs and generate the corresponding Maestro types. Additional types can be defined, for example, to model data included in the events that correspond to triggers, and model the expected interface of actions. For instance, the following types are simplified versions of the event data and Shopify objects involved in the example:

        Flow then uses Maestro functions and calls to model the behavior of triggers, conditions, and actions. The following Maestro code shows function definitions for the trigger and actions involved in the workflow above.

        Actions are mapped directly to Maestro functions that define the expected parameters and return types. An action used in a workflow is a call to the corresponding function. A trigger, however, is mapped to a data hydration function that takes event data, which often includes only references by IDs, and loads additional data required by the workflow. For example, the order_created function takes an OrderCreatedTrigger, which contains the order ID as an Integer, and performs API requests to load an Order object, which contains additional fields like name and discountCode. Finally, conditions are currently a special case in that they’re translated to a sequence of function calls based on the predicate defined for the condition (more on that in the next section).

        Workflow Validation

        Once a workflow is created, it needs validation. For that, Flow composes a Maestro function representing the whole workflow. The parameter of the workflow function is the trigger data since it’s the input for its execution. The body of the function corresponds to the transitions and configurations of tasks in the workflow. For example, the following function corresponds to the example:

        The first call in the sequence corresponds to the trigger function that’s used to hydrate objects from the event data. The next three steps correspond to the logical expression configured for the condition. Each disjunction branch becomes a function call (to eq and ends_with, respectively), and the result is computed with or. A Maestro match expression is used to pattern match on the result. If it’s true, the control flow goes to the sequence expression that calls the functions corresponding to the workflow actions.

        Flow now can rely on Maestro static analysis to validate the workflow function. Maestro will type check, verify that every referred variable is in scope, verify that object navigation is correct (for example, that is valid), and so on. Then, any error found through static analysis is mapped back to the corresponding workflow node and presented in context in the Editor. In addition to returning errors, static analysis results contain symbol tables for each expression indicating which variables are in scope and what their types are. This supports the Editor in providing code completion and other suggestions that are specific for each workflow step. The following screenshot, for example, shows how the Editor can guide users in navigating the fields present in objects available when selecting the Add order tags action.

        Flow App Editor screenshot shows how the Editor can guide users in navigating the fields present in objects available when selecting the Add order tags action. The editor shows detailed information for ther.
        Shopify Flow App editor

        Note that transformation and validation run while a Flow workflow is being edited, either in the Flow Editor or via APIs. This operation is synchronous and, thus, must be very fast since merchants are waiting for the results. This architecture is similar to how modern IDEs send source code to a language service that parses the code into a lower level representation and returns potential errors and additional static analysis results.

        Workflow Activation

        Once a workflow is ready, it needs to be activated to start executing. The process is initially similar to validation in that Flow generates the corresponding Maestro function. However, there are a few additional steps. First, Maestro performs a static usage analysis: for each call to a primitive function it computes which attributes of the returned type are used by subsequent steps. For example, the call to shopify::admin::order_created returns a tuple (Shop, Order) but not all attributes of those types are used. In particular, isn’t used by this workflow. It wouldn’t only be inefficient to hydrate that value, in the presence of recursive definitions (such as Order has a Customer who has Orders), it would be impossible to determine where to stop digging into the type graph. The result of usage analysis is then passed at runtime to the host function implementation. The runtime can use it to tailor how it computes the values it returns, for instance, by optimizing the queries to the Admin GraphQL API.

        Second, Maestro performs a compilation step. The idea is to apply optimizations, removing anything unnecessary for the runtime execution of the function, such as type definitions and auxiliary functions that aren’t invoked by the workflow function. The result is a simplified, small, and efficient Maestro function. The compiled function is then packaged together with the result of usage analysis and becomes an orchestration. Finally, the orchestration is serialized and deployed to the Flow Engine that observes events and runs the Maestro interpreter on the orchestration.

        Monitoring Executions

        As the Flow Engine executes orchestrations, the Maestro interpreter emits checkpoints. As we discussed before, checkpoints are used by the engine when restarting the interpreter to ensure at-least-once semantics for actions. Additionally, checkpoints are sent back to Flow to feed the Activity page, which lists workflow executions. Since checkpoints have detailed information about the output of every primitive function call, they can be used to map back to the originating workflow step and offer insight into the behavior of executions.

        A screenshot of the Flow run log from the Activity Page. It displays the results of the workflow execution at each step
        Shopify Flow Run Log from the Activity page

        For instance, the image above shows a Run Log for a specific execution of the example, which can be accessed from the Activity page. Note that Flow highlights the branch of the workflow that executed and which branch of the condition disjunction actually evaluated to true at runtime. All this information comes directly from interpreting checkpoints and mapping back to the workflow.

        Outro: Future Work

        Largo maestoso

        In this post I introduced Maestro, a domain-specific orchestration language we developed to power Shopify Flow. I gave a sample of what Maestro looks like and how it neatly integrates with Flow, supporting features of both the Flow Editor as well as the Flow Engine. Maestro has been powering Flow for a while, but we are planning more, such as:

        • Improving the expressiveness of the Flow workflow language, making better use of all the capabilities Maestro offers. For example, allowing the definition of variables to bind the result of actions for subsequent use, support for iteration, pattern matching, and so on.
        • Implementing additional optimizations on deployment, such as merging Flow workflows as a single orchestration to avoid redundant hydration calls for the same event.
        • Using the Maestro interpreter to support previewing and testing of Flow workflows, employing checkpoints to show results and verify assertions.

        If you are interested in working with Flow and Maestro or building systems from the ground up to solve real-world problems. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.

        Continue reading

        React Native Skia—For Us, For You, and For Fun

        React Native Skia—For Us, For You, and For Fun

        Right now, you are likely reading this content on a Skia surface. It powers Chrome, Android, Flutter, and others. Skia is a cross-platform drawing library that provides a set of drawing primitives that you can trust to run anywhere: iOS, Android, macOS, Windows, Linux, the browser, and now React Native.

        Our goal with this project is twofold. First we want to provide React Native, which is notorious for its limited graphical capabilities, with a set of powerful 2D drawing primitives that are consistent across iOS, Android, and the Web.  Second is to bridge the gap between graphic designers and React Native: by providing the same UI capabilities as a tool like Figma. Everyone can now speak the same language.

        Skia logo image. The background is back and Skia is displayed in cursive rainbow font in the middle of the screen
        React Native Skia logo

        To bring the Skia library to React Native, we needed to rely on the new React Native architecture, JavaScript Interface (JSI). This new API enables direct communication between JavaScript and native modules using C++ instead of asynchronous messages between the two worlds. JSI allows us to expose the Skia API directly in the following way:

        We are making this API virtually 100% compatible with the Flutter API allowing us to do two things:

        1. Leverage the completeness and conciseness of their drawing API
        2. Eventually provide react-native-web support for Skia using CanvasKit, the Skia WebAssembly build used by Flutter for its web apps.

        React is all about declarative UIs, so we are also providing a declarative API built on top of the imperative one. The example above can also be written as:

        This API allows us to provide an impressive level of composability to express complex drawings, and it allows us to perform declarative optimizations. We leverage the React Reconciler to perform the work of diffing the internal representation states, and we pass the differences through to the Skia engine.

        React Native Skia offers a wide range of APIs such as advanced image filters, shaders, SVG, path operations, vertices, and text layouts. The demo below showcases a couple of drawing primitives previously unavailable in the React Native ecosystem. Each button contains a drop and inner shadow, the progress bar is rendered with an angular gradient,  and the bottom sheet uses a backdrop filter to blur the content below it.

        Below is an example of mesh gradients done using React Native Skia

        Reanimated 2 (a project also supported by Shopify) brought to life the vision of React Native developers writing animations directly in JavaScript code by running it on a dedicated thread. Animations in React native Skia work the same way. Below is an example of animation in Skia:

        Example of the Breathe code animated

        If your drawing animation depends on an outside view, like a React Native gesture handler, for instance, we also provide a direct connector to Reanimated 2.

        With React Native Skia, we expect to be addressing a big pain point of the React Native community. And it is safe to say that we are only getting started. We are working on powerful new features which we cannot wait to share with you in the upcoming months. We also cannot wait to see what you build with it. What are you waiting for!? npm install @shopify/react-native-skia.

        Christian Falch has been involved with React Native since 2018, both through open source projects and his Fram X consultancy. He has focused on low-level React Native coding integrating native functionality with JavaScript and has extensive experience writing C++ based native modules.

        William Candillon is the host of the “Can it be done in React Native?” YouTube series, where he explores advanced user-experiences and animations in the perspective of React Native development. While working on this series, William partnered with Christian to build the next-generation of React Native UIs using Skia.

        Colin Gray is a Principal Developer of Mobile working on Shopify’s Point of Sale application. He has been writing mobile applications since 2010, in Objective-C, RubyMotion, Swift, Kotlin, and now React Native. He focuses on stability, performance, and making witty rejoinders in engineering meetings. Reach out on LinkedIn to discuss mobile opportunities at Shopify!

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Data Is An Art, Not Just A Science—And Storytelling Is The Key

        Data Is An Art, Not Just A Science—And Storytelling Is The Key

        People often equate data science with statistics, but it’s so much more than that. When data science first emerged as a craft, it was a combination of three different skill sets: science, mathematics, and art. But over time, we’ve drifted. We’ve come to prioritize the scientific side of our skillset and have lost sight of the creative part.

        A venn diagram with three circles: Math, Art, and Science. In the middle is Data Science
        A Venn diagram of the skills that make up the craft of data science

        One of the most neglected, yet arguably most important, skills from the artistic side of data science is communication. Communication is key to everything we do as data scientists. Without it, our businesses won’t be able to understand our work, let alone act on it.

        Being a good data storyteller is key to being a good data scientist. Storytelling captures your stakeholders’ attention, builds trust with them, and invites them to fully engage with your work. Many people are intimidated by numbers. By framing a narrative for them, you create a shared foundation they can work from. That’s the compelling promise of data storytelling.

        Data science is a balancing act—math and science have their role to play, but so do art and communication. Storytelling can be the binding force that unites them all. In this article, I’ll explore how to tell an effective data story and illustrate with examples from our practice at Shopify. Let’s dive in.

        What Is Data Storytelling?

        When you Google data storytelling, you get definitions like: “Data storytelling is the ability to effectively communicate insights from a dataset using narratives and visualizations.” And while this isn’t untrue, it feels anemic. There’s a common misconception that data storytelling is all about charts, when really, that’s just the tip of the iceberg.

        Even if you design the most perfect visualization in the world—or run a report, or create a dashboard—your stakeholders likely won’t know what to do with the information. All of the burden of uncovering the story and understanding the data falls back onto them.

        At its core, data storytelling is about taking the step beyond the simple relaying of data points. It’s about trying to make sense of the world and leveraging storytelling to present insights to stakeholders in a way they can understand and act on. As data scientists, we can inform and influence through data storytelling by creating personal touch points between our audience and our analysis. As with any good story, you need the following key elements:

        1. The main character: Every story needs a hero. The central figure or “main character” in a data story is the business problem. You need to make sure to clearly identify the problem, summarize what you explored when considering the problem, and provide any reframing of the problem necessary to get deeper insight.
        2. The setting: Set the stage for your story with context. What background information is key to understanding the problem? You're not just telling the story; you're providing direction for the interpretation, ideally in as unbiased a way as possible. Remember that creating a data story doesn’t mean shoe-horning data into a preset narrative—as data scientists, it’s our job to analyze the data and uncover the unique narrative it presents.
        3. The narrator: To guide your audience effectively, you need to speak to them in a way they understand and resonate with. Ideally, you should communicate your data story in the language of the receiver. For example, if you’re communicating to a non-technical audience, try to avoid using jargon they won’t be familiar with. If you have to use technical terms or acronyms, be sure to define them so you’re all on the same page.
        4. The plot: Don’t leave your audience hanging—what happens next? The most compelling stories guide the reader to a response and data can direct the action by providing suggestions for next steps. By doing this, you position yourself as an authentic partner, helping your stakeholders figure out different approaches to solving the problem.

        Here’s how this might look in practice on a sample data story:


        Main Character Setting Narrator Plot
        Identify the business question you're trying to solve. What background information is key to understanding the problem. Ensure you're communicating in a way that your audience will understand. Use data to direct the action by providing next steps.
        Ex. Why aren't merchants using their data to guide their business decisions? Ex. How are they using existing analytic products and what might be preventing use? Ex. Our audience are busy execs who prefer short bulleted lists in a Slack message. Ex. Data shows merchants spend too much time going back and forth between their Analytics and Admin page. We recommend surfacing analytics right within their workflow.


        With all that in mind, how do you go about telling effective data stories? Let me show you.

        1. Invest In The Practice Of Storytelling

        In order to tell effective data stories, you need to invest in the right support structures. First of all, that means laying the groundwork with a strong data foundation. The right foundation ensures you have easy access to data that is clean and conformed, so you can move quickly and confidently. At Shopify, our data foundations are key to everything we do—it not only supports effective data storytelling, but also enables us to move purposefully during unprecedented moments.

        For instance, we’ve seen the impact data storytelling can have while navigating the pandemic. In the early days of COVID-19, we depended on data storytelling to give us a clear lens into what was happening, how our merchants were coping, and how we could make decisions based on what we were seeing. This is a story that has continued to develop and one that we still monitor to this day.

        Since then, our data storytelling approach has continued to evolve internally. The success of our data storytelling during the pandemic was the catalyst for us to start institutionalizing data storytelling through a dedicated working group at Shopify. This is a group for our data scientists, led by data scientists—so they fully own this part of our craft maturity.

        Formalizing this support network has been key to advancing our data storytelling craft. Data scientists can drop in or schedule a review of a project in process. This group also provides feedback and informed guidance on how to improve the story that the analysis is trying to tell so communications back to stakeholders is most impactful. The goal is to push our data scientists to take their practice to the next level—by providing context, explaining what angles they already explored, offering ways to reframe the problem, and sharing potential next steps.

        Taking these steps to invest in the practice of data storytelling ensures that when our audience receives our data communications, they’re equipped with accurate data and useful guidance to help them choose the best course of action. By investing in the practice of data storytelling, you too can ensure you’re producing the highest quality work for your stakeholders—establishing you as a trusted partner.

        2. Identify Storytelling Tools And Borrow Techniques From The Best

        Having the right support systems in place is key to making sure you’re telling the right stories—but how you tell those stories is just as important. One of our primary duties as data scientists is decision support. This is where the art and communication side of the practice comes in. It's not just a one-and-done, "I built a dashboard, someone else can attend to that story now." You’re committed to transmitting the story to your audience. The question then becomes, how can you communicate it as effectively as possible, both to technical and non-technical partners?

        At Shopify, we’ve been inspired by and have adopted design studio Duarte’s Slidedocs approach. Slidedocs is a way of using presentation software like PowerPoint to create visual reports that are meant to be read, not presented. Unlike a chart or a dashboard, what the Slidedoc gives you is a well-framed narrative. Akin to a “policy brief” (like in government), you can pack a dense amount of information and visuals into an easily digestible format that your stakeholders can read at their leisure. Storytelling points baked into our Slidedocs include:

        • The data question we’re trying to answer
        • A description of our findings
        • A graph or visualization of the data 
        • Recommendations based on our findings
        • A link to the in-depth report
        • How to contact the storyteller
        A sample slidedoc example from Shopify Data. It highlights information for a single question by describing findings and recommendations. It also links out to the in-depth report
        An example of how to use Slidedocs for data storytelling

        Preparing a Slidedoc is a creative exercise—there’s no one correct way to present the data, it’s about understanding your audience and shaping a story that speaks to them. What it allows us to do is guide our stakeholders as they explore the data and come to understand what it’s communicating. This helps them form personal touchpoints with the data, allowing them to make a better decision at the end.

        While the Slidedocs format is a useful method for presenting dense information in a digestible way, it’s not the only option. For more inspiration, you can learn a lot from teams who excel at effective communication, such as marketing, PR, and UX. Spend time with these teams to identify their methods of communication and how they scaffold stories to be consumed. The important thing is to find tools that allow you to present information in a way that’s action-oriented and tailored for the audience you’re speaking to.

        3. Turn Storytelling Into An Experience

        The most effective way to help your audience feel invested in your data story is to let them be a part of it. Introducing interactivity allows your audience to explore different facets of the story on demand, in a sense, co-creating the story with you. If you supply data visualizations, consider ways that you can allow your audience to filter them, drill into certain details, or otherwise customize them to tell bigger, smaller, or different stories. Showing, not telling, is a powerful storytelling technique.

        A unique way we’ve done this at Shopify is through a product we created for our merchants that lets them explore their own data. Last fall, we launched the BFCM 2021 Notebook—a data storytelling experience for our merchants with a comprehensive look at their store performance over Black Friday and Cyber Monday (BFCM).

        While we have existing features for our merchants that show, through reports and contextual analytics, how their business is performing, we wanted to take it to the next level by giving them more agency and a personal connection to their own data. That said, we understand it can be overwhelming for our merchants (or anyone!) to have access to a massive set of data, but not know how to explore it. People might not know where to start or feel scared that they’ll do it wrong.

        Example BFCM 2021 Notebook. The notebook shows a graph of the sales over time during BFCM weekend 2021
        Shopify’s BFCM Notebook

        What the BFCM Notebook provided was a scaffold to support merchants’ data exploration. It’s an interactive visual companion that enables merchants to dive into their performance data (e.g. total sales, top-performing products, buyer locations) during their busiest sale season. Starting with total sales, merchants could drill into their data to understand their results based on products, days of the week, or location. If they wanted to go even deeper, they could click over the visualizations to see the queries that powered them—enabling them to start thinking about writing queries of their own.

        Turning data storytelling into an experience has given our merchants the confidence to explore their own data, which empowers them to take ownership of it. When you’re creating a data story, consider: Are there opportunities to let the end user engage with the story in interactive ways?

        Happily Ever After

        Despite its name, data science isn’t just a science; it’s an art too. Data storytelling unites math, science, art, and communication to help you weave compelling narratives that help your stakeholders comprehend, reflect on, and make the best decisions about your data. By investing in your storytelling practice, using creative storytelling techniques, and including interactivity, you can build trust with your stakeholders and increase their fluency with data. The creative side of data science isn’t an afterthought—it’s absolutely vital to a successful practice.

        Wendy Foster is the Director of Engineering & Data Science for Core Optimize at Shopify. Wendy and her team are focused on exploring how to better support user workflows through product understanding, and building experiences that help merchants understand and grow their business.

        Are you passionate about data discovery and eager to learn more, we’re always hiring! Visit our Data Science and Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.

        Continue reading

        Building a Business System Integration and Automation Platform at Shopify

        Building a Business System Integration and Automation Platform at Shopify

        Companies organize and automate their internal processes with a multitude of business systems. Since companies function as a whole, these systems need to be able to talk to one another. At Shopify, we took advantage of Ruby, Rails, and our scale with these technologies to build a business system integration solution.

        The Modularization of Business Systems

        In step with software design’s progression from monolithic to modular architecture, business systems have proliferated over the past 20 years, becoming smaller and more focused. Software hasn’t only targeted the different business domains like sales, marketing, support, finance, legal, and human resources, but the niches within or across these domains, like tax, travel, training, documentation, procurement, and shipment tracking. Targeted applications can provide the best experience by enabling rapid development within a small, well defined space.

        The Gap

        The transition from monolithic to modular architecture doesn’t remove the need for interaction between modules. Maintaining well-defined, versioned interfaces and integrating with other modules is one of the biggest costs of modularization. In the business systems space, however, it doesn’t always make sense for vendors to take responsibility for integration, or do it in the same way.

        Business systems are built on different tech stacks with different levels of competition and different customer requirements. This landscape leads to business systems with asymmetric interfaces (from SOAP to FTP to GraphQL) and integration capabilities (from complete integration platforms to nothing). Businesses are left with a gap between their systems and no clear, easy way to fill it.

        Organic Integration

        Connecting these systems on an as needed basis leads to a hacky hodgepodge of:

        • ad hoc code (often running on individual’s laptops)
        • integration platforms like Zapier
        • users downloading and uploading CSVs
        • third party integration add ons from app stores
        • out of the box integrations
        • custom integrations built on capable business systems.

        Frequently data won’t be going from the source system directly to the target system, but has multiple layovers in whatever systems it could integrate with. The only determining factors are the skillsets and creativity of the people involved in building the integration.

        When a company is small this can work, but as companies scale and the number of integrations grow it becomes unmanageable: data flows are convoluted, raising security questions, and making business critical automation fragile. Just like with monolithic architecture it can become too terrifying and complex to change anything, paralyzing the business systems and preventing them from adapting and scaling to support the company.

        Integration Platform as a Service

        The solution, as validated by the existence of numerous Integration Platform as a Service (IPaaS) solutions like Mulesoft, Dell Boomi, and Zapier, is yet another piece of software that’s responsible for integrating business systems. The consistency provided by using one application for all integration can solve the issues of visibility, fragility, reliability, and scalability.


        At Shopify, we ran into this problem, created a small team of business system integration developers and put them to work building on Mulesoft. This was an improvement, but, because Shopify is a software company, it wasn’t perfect.

        Isolation from Shopify Development

        Shopify employs thousands of developers. We have infrastructure, training, and security teams. We maintain a multitude of packages and have tons of Slack channels for getting help, discussing ideas, and learning about best practices. Shopify is intentionally narrow in the technologies it uses (Ruby, React, and Go) to benefit from this scale.

        Mulesoft is a proprietary platform leveraging XML configuration for the Java virtual machine. This isn’t part of Shopify’s tech stack, so we missed out on many of the advantages of developing at Shopify.

        Issues with Integrating Internal Applications

        Mulesoft’s cloud runtime takes care of infrastructure for its users, a huge advantage of using the platform. However, Shopify has a number of internal services, like shipment tracking, as well as infrastructure, like Kafka, that for security reasons can only be used from within Shopify’s cloud. This meant that we would need to build infrastructure skills on our team to host Mulesoft on our own cloud.

        Although using Mulesoft initially seemed to lower the costs of connecting business systems, due to our unique situation, it had more drawbacks than developing on Shopify’s tech stack.

        Building on Shopify’s Stack

        Unless performance is paramount, in which case we use Go, Ruby is Shopify’s choice for backend development. Generally Shopify uses the Rails framework, so if we’re going to start building business system integrations on Shopify’s tech stack, Ruby on Rails is our choice. The logic for choosing Ruby on Rails within the context of development at Shopify is straightforward, but how do we use it for business system integration?

        The Design Priorities

        When the platform is complete, we want to build reliable integrations quickly. To turn that idea into a design, we need to look at the technical aspects of business system integration that differentiate it from the standard application development Rails is designed around.


        Generally applications are structured around a domain and get to determine the requirements, the data they will and won’t accept. An integration, however, isn’t the source of truth for anything. Any validation we introduce in an integration will be, at best, a duplication of logic in the target application. At worst our logic will create erroneous errors.

        I did this the other day with a Sorbet Struct. I was using it to organize data before posting it. Unfortunately a field was required in the struct that wasn’t required in the target system. This resulted in records failing in transit when the target system would have accepted them.


        Many business systems are highly configurable. Changes in their configuration can lead to changes in their APIs, affecting integrations.

        Airtable, for example, uses the column names as the JSON keys in their API, so changing a column name in the user interface can break an integration. We need to provide visibility into exactly what integrations are doing to help system admins avoid creating errors and quickly resolve them when they arise.


        Business systems are diverse, created at different times by different developers using different technologies and design patterns. For integration work this⁠—most importantly⁠—leads to a wide variety of interfaces like FTP, REST, SOAP, JSON, XML, and GraphQL. If we want a centralized, standardized place to build integrations it needs to support whatever requirements are thrown at it.


        Integrations deal with sensitive information, personally identifiable information (PII), compensation, and anything else that needs to move between business systems. We need to make sure that we aren’t exposing this data.


        Small, point to point integrations are the most reliable and maintainable. This design has the potential to create a lot of duplicate code and infrastructure. If we want to build integrations quickly we need to reuse as much as possible.


        Those are some nice high-level design priorities. How did we implement them?


        From the beginning of the project, documentation has been a priority. We document

        • decisions that we’re making, so they’re understood and challenged in the future as needs change
        • the integrations living on our platform
        • the clients we’ve implemented for connecting to different systems and how to use them
        • how to build on the platform as a whole.

        Initially we were using GitHub’s built-in wiki, but being able to version control our documentation and commit updates alongside the code made it easier to trace changes and ensure documentation was being kept up to date. Fortunately Shopify’s infrastructure makes it very easy to add a static site to a git repository.

        Design priorities covered: transparency, reusability

        Language Features

        Ruby is a mature, feature-rich language. Beyond being Turing complete, over the years it’s added a plethora of features to make programming simpler and more concise. It also has an extensive package ecosystem thanks to Ruby’s wide usage, long life, and generous community. In addition to reusing our code, we’re able to leverage other developer’s and organization’s code. Many business systems have great, well-maintained gems, so integrating with them is as simple as adding the gem and credentials.

        Design priorities covered: reusability

        Rails Engines

        We reused Shopify Core’s architecture, designing our application as a modular monolith made up of Rails Engines. Initially the application didn’t take advantage of Rails Engines and simply used namespaces within the app directory. It quickly became apparent that this model made tracking down an individual integration’s code difficult. You have to go through every one of the app directories, controllers, helpers, and more to see if an integration’s namespace was present.

        After a lot of research and a few conversations with my Shopify engineering mentor, I began to understand Rails Engines. Rails engines are a great fit for our platform because integrations have relatively obvious boundaries, so it’s easy and advantageous to modularize them.

        This design enabled us to reuse the same infrastructure for all our integrations. It also enabled us to share code across integrations by creating a common Rails Engine, without the overhead of packaging it up into rubygems or duplicating it. This reduces both development and maintenance costs.

        In addition, this architecture benefitted transparency by keeping all of the code in one place and modularizing it. It’s easy to know what integrations exist and what code belongs to them.

        Design priorities covered: reusability, transparent

        Eliminating Data Storage

        Our business system integration platform won’t be the source of truth for any business data. The business data comes from other business systems and passes through our application.

        If we start storing data in our application it can become stale, out of sync with the source of truth. We could end up sending stale data to other systems and triggering the wrong processes. Tracking this all down requires digging through databases, logs, and timestamps in multiple systems, some without good visibility.

        Data storage adds complexity, hurts transparency, and introduces security and compliance concerns.

        Design priorities covered: transparent, minimal, secure


        Business system integration consists almost entirely of business logic. In Rails, there are multiple places this could live, but they generally involve abstractions designed around building standalone applications, not integrations. Using one of these abstractions would add complexity and obfuscate the logic.

        Actions were floating around Shopify as a potential home for business logic. They have the same structure as Active Jobs, one public method, perform, and don’t reference any other Actions. The Action concept provides consistency, making all integration logic easy to find. It also provides transparency by putting all business logic in one place, so it’s only necessary to look at one Action to understand a data flow.

        One of the side effects of Actions is code duplication. This was a trade-off we accepted. Given that integrations should be acting independently, we would prefer to duplicate some code than tightly couple integrations.

        Design priorities covered: transparent, minimal

        Embracing Hashes

        Dataflows are the purpose of our application. In every integration we are dealing with at least two API abstractions of complex systems. Introducing our own abstractions on top of these abstractions can quickly compound complexity. If we want the application to be transparent, it needs to be obvious what data is flowing through it and how the data is being modified.

        Most of the data we’re working with is JSON. In Ruby, JSON is represented as a hash, so working with hashes directly often provides the best transparency with the least room for introducing errors.

        I know, I know. We all hate to see strings in our code, but hear me out. You receive a JSON payload. You need to transform it and send out another JSON payload with different keys. You could map the original payload to an object, map that object to another object, and map the final object back to JSON. If you want to track that transformation, though, you need to track it through three transformations. On the other hand, you could use a hash and a transform function and have the mapping clearly displayed.

        Using hashes leads to more transparency than abstracting them away, but it also can lead to typos and therefore errors, so it’s important to be careful. If you’re using a string key multiple times, turn it into a constant.

        Design priorities covered: transparent, minimal

        Low-level Mocking

        At Shopify, we generally use Mocha for mocking, but for our use case we default to WebMock. WebMock mocks at the request level, so you see the URL, including query parameters, headers, and request body explicitly in tests. This makes it easy to work directly with business systems API documentation because this is the level it’s documented at, and it allows us to understand exactly what our integrations are doing.

        There are some cases, though, where we use Mocha, for example with SOAP. Reading a giant XML text string doesn’t provide useful visibility into what data is being sent. WebMock tests also become complex when many requests are involved in the integration. We’re working on improving the testing experience for complex integrations with common factories and prebuilt WebMocks.

        Design priorities covered: transparent


        Perhaps most importantly, we’ve been able to tap into development at Shopify by leveraging our:

        • infrastructure, so all we have to do to stand up an application or add a component is run dev runtime
        • training team to help onboard our developers
        • developer pipeline for hiring
        • observability through established logging, metrics and tracing setups
        • internal shipment tracking service
        • security team standards and best practices

        The list could go on forever.

        Design priorities covered: reusability, security

        It’s been a year since work on our Rails integration platform began. Now, we have 18 integrations running, have migrated all our Mulesoft apps to the new platform, have doubled the number of developers from one to two and have other teams building integrations on the platform. The current setup enables us to build simple integrations, the majority of our use case, quickly and securely with minimal maintenance. We’re continuing to work on ways to minimize and simplify the development process, while supporting increased complexity, without harming transparency. We’re currently focused on improving test mock management and the onboarding process and, of course, building new integrations.

        Will is a Senior Developer on the Solutions Engineering Team. He likes building systems that free people to focus on creative, iterative, connective work by taking advantage of computers' scalability and consistency.

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        Möbius: Shopify’s Unified Edge

        Möbius: Shopify’s Unified Edge

        While working on improvements to the Shopify platform in terms of how we handle traffic, we identified that through time each change became more and more challenging. When deploying a feature, if we wanted the whole platform to benefit from it, we couldn’t simply build it “one way fits all”—we already had more than six different ways for the traffic to reach us. To list some, we had a traffic path for:

        • “general population” (GenPop) Shopify Core: the monolith serving shops that’s already using an edge provider
        • GenPop Shopify applications and services
        • GenPop Shopify applications and services that’s using an edge provider
        • Personally identifiable information (PII) restricted Shopify Core: where we need to make sure that the traffic and related data are kept in a specific area of the world
        • PII restricted Shopify applications and services
        • publicly-accessible APIs that are required for Mutual Transport Layer Security (mTLS) authentication.
        LOTR Meme stating: "How many ways are there for traffic to reach Shopify?"
        LOTR Meme

        We had to choose which traffic path could use those new features or build the same feature in more than six different ways. Moreover, many different traffic paths means that observability gets blurred. When we receive requests and something doesn’t work properly, figuring out why and how to fix it requires more time. It also takes longer to onboard new team members to all of those possibilities and how to distinguish between them.

        LOTR meme image of Frodo holding the ring stating: One edge to reach them all, one edge to secure them; One edge to bring them all and in the clusters bind them.
        One edge to reach them all, one edge to secure them; One edge to bring them all and in the clusters bind them.

        This isn’t the way to build a highway to new features and improvements. I’d like to tell you why and how we built the one edge to front our services and systems.

        One Does Not Simply Know What “The Edge” Stands For

        LOTR meme image of Aragorn stating: One does not simply know what "The Edge" stands for.
        One does not simply know what "The Edge" stands for.

        The most straightforward definition of the edge, or network edge, is the point at which an enterprise-owned network connects to a third-party network. With cloud computing, lines are slightly blurred as we use third parties to provide us with servers and networks (even more when using a provider to front your network, like we do at Shopify). But in both those cases, as long as they’re used and controlled by Shopify, they’re considered part of our network.

        The edge of Shopify is where requests from outside our network are made to reach our network.

        The Fellowship of the Edge

        Unifying our edge became our next objective and two projects were born to make this possible: Möbius, which as the name taken from the “Möbius strip” suggests, was to be the one edge of Shopify and Shopify Front End (SFE), the routing layer that receives traffic from Möbius and dispatches it to where it needs to go.

        A flow diagram showing Möbius’s traffic path that takes requests from the internet to the routing layer and then sends traffic to the application’s clusters for traffic to be served. Purple entities are on the traffic path for PII restricted traffic, while the beige ones are for the GenPop traffic.
        Möbius’s traffic path takes requests from the internet to the routing layer and then sends traffic to the application’s clusters for traffic to be served. Purple entities are on the traffic path for PII restricted traffic, while the beige ones are for the GenPop traffic.

        About a year before starting Möbius, we already had a small number of applications handled through our edge, but we saw limitations in terms of how to properly automate such an approach at scale, while the gains to the platform justified the monetary costs to reach those gains. We designed SFE and Möbius together, leading to a better separation of concerns between the edge and the routing layers.

        The Shopify Front End

        SFE is designed to provide a unified routing layer behind Möbius. Deployed in many different regions, routing clusters can receive any kind of web traffic from Möbius, whether for Shopify Core or Applications. Those clusters are mainly nginx deployments with custom Lua code to handle the routing according to a number of criteria, including but not limited to the IP address a client connected to and the domain that was used to reach Shopify. For the PII restricted requirements, parallel deployments of the same routing clusters code are deployed in the relevant regions.

        To handle traffic for applications and services, SFE works by using a centralized API receiving requests from Kubernetes controllers deployed in every cluster using such applications and services. This allows linking the domain names declared by an application to the clusters where the application is deployed. We also use this to provide active/active (when two instances of a given service can receive requests at the same time) or active/passive (when only a single instance of a given service can receive requests) load balancing.

        Providing load balancing at the routing layer instead of DNS allows for near instantaneous traffic changes instead of depending on the Time to Live as described in my previous post. It avoids those decisions being made on the client side and thus provides us with better command and control over the traffic.


        Möbius’s core concerns are simple: we grab the traffic from outside of Shopify and make sure it makes its way inside of Shopify in a stable, secure, and performant manner. Outside of Shopify is any client connecting to Shopify from outside a Shopify cluster. Inside of Shopify is, as far as Möbius is concerned, the routing cluster with the lowest latency to the receiving edge’s point-of-presence (PoP).

        Möbius is responsible for TLS and TCP termination with the clients, and doing that termination as close as possible to the client. It brings faster requests and better DDoS protection, plus it allows us to filter malicious requests before the traffic even reaches our clusters. This is something that was already done for our GenPop Shopify Core traffic, but Möbius now standardizes. On top of handling the certificates for the shops, we added an automated path to handle certificates for applications domains.

        A flow diagram showing the configuration of the edge with Möbius and SFE. Domains updates are intercepted to update the edge provider’s domains and certificates store, making sure that we’re able to terminate TCP and TLS for those domains and let the request follow its path
        Configuration of the edge with Möbius and SFE. Domains updates are intercepted to update the edge provider’s domains and certificates store, making sure that we’re able to terminate TCP and TLS for those domains and let the request follow its path

        SFE already needs to be aware of the domains that the applications respond to, so instead of building the same logic a second time to configure the edge, we piggybacked on the work the SFE controller was already doing. We added handlers in the centralized API to configure those domains at the edge, through API requests to our vendor, and indicate we’re expecting to receive traffic on those, and to forward requests to SFE. Our API handler takes care of each and any DNS challenge to validate that we own the domain in order for the traffic to start flowing, but also obtains a valid certificate.

        Prior to Möbius, if an application owner wanted to take advantage of the edge, they had to configure their domain manually at the edge (validating ownership, obtaining a certificate, setting up the routing), but Möbius provides full automation of that setup, allowing application owners to simply configure their ingress and DNS and ripe the benefits of the edge right away.

        Finally, it’s never easy to have many systems migrate to use a new one. We aimed to make that change as easy as possible for application owners. With automation deploying all that was required, the last required step was a simple DNS change for applications domains, from targeting a direct-to-cluster record to targeting Möbius. We wanted to keep that change manual to make sure that application owners own the process and make sure that nothing gets broken.

        A screenshot of the dashboard for the application. It displays hostnames configured to serve an application and the status of it's edge
        Example dashboard for the application (accessible publicly and used for debugging connectivity issues with merchants). On the dashboard, we can find a link to its edge logs, see that the domains of the application are properly configured at the edge to receive traffic, and provide a TLS certificate. A test link also allows to simulate a connection to the platform using that domain so the response can be verified manually.

        To make sure all is fine for an application before (and after!) migration, we also added observability in the form of easy:

        • access to the logs for a given application at the edge
        • identification of which domains an application will have configured at the edge,
        • understanding of what is the status of those domains.

        This allows owners of applications and services to immediately identify if one of their domains isn’t configured or behaving as expected.

        Our Precious Edge

        A drawing of Gollum's face (from Lord of the Rings) staring at a Mobius strip like it's the ring

        On top of all the direct benefits that Möbius provides right away, it allows us to build the future of Shopify’s edge. Different teams are already working on improvements to the way we do caching at the edge, for instance, or on ways to use other edge features that we’re not already taking advantage of. We also have ongoing projects to handle cluster-to-cluster communications by avoiding the traffic from going through the edge and coming back to our clusters by taking advantage of SFE.

        Using new edge features and standardizing internal communications is possible because we unified the edge. There are exceptions where we need to avoid cross-dependency for applications and services on which either Möbius or SFE depend to function. If we were to onboard them to use Möbius and SFE, whenever an issue would happen, we would be in a crash-loop situation: Möbius/SFE requires that application to work, but that application requires Möbius/SFE to work.

        It’s now way easier to explain to new Shopifolk how traffic reaches us and what happens between a client and Shopify. There’s no need for as many conditionals in those explanations, nor as many whiteboards… but we might need more of those to explain all that we do as we grow the capabilities on our now-unified edge!

        Raphaël Beamonte holds a Ph.D. in Computer Engineering in systems performance analysis and tracing, and sometimes gives lectures to future engineers, at Polytechnique Montréal, about Distributed Systems and Cloud Computing.

        If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.

        Continue reading

        Code Ranges: A Deeper Look at Ruby Strings

        Code Ranges: A Deeper Look at Ruby Strings

        Contributing to any of the Ruby implementations can be a daunting task. A lot of internal functionality has evolved over the years or been ported from one implementation to another, and much of it is undocumented. This post is an informal look at what makes encoding-aware strings in Ruby functional and performant. I hope it'll help you get started digging into Ruby on your own or provide some additional insight into all the wonderful things the Ruby VM does for you.

        Ruby has an incredibly flexible, if not unusual, string representation. Ruby strings are generally mutable, although the core library has both immutable and mutable variants of many operations. There’s also a mechanism for freezing strings that makes String objects immutable on a per-object or per-file basis. If a string literal is frozen, the VM will use an interned version of the string. Additionally, strings in Ruby are encoding-aware, and Ruby ships with 100+ encodings that can be applied to any string, which is in sharp contrast to other languages that use one universal encoding for all its strings or prevent the construction of invalid strings.

        Depending on the context, different encodings are applied when creating a string without an explicit encoding. By default, the three primary ones used are UTF-8, US-ASCII, and ASCII-8BIT (aliased as BINARY). The encoding associated with a string can be changed with or without validation. It is possible to create a string with an underlying byte sequence that is invalid in the associated encoding.

        The Ruby approach to strings allows the language to adapt to many legacy applications and esoteric platforms. The cost of this flexibility is the runtime overhead necessary to consider encodings in nearly all string operations. When two strings are appended, their encodings must be checked to see if they're compatible. For some operations, it's critical to know whether the string has valid data for its attached encoding. For other operations, it's necessary to know where the character or grapheme boundaries are.

        Depending on the encoding, some operations are more efficient than others. If a string contains only valid ASCII characters, each character is one byte wide. Knowing each character is only a byte allows operations like String#[], String#chr, and String#downcase to be very efficient. Some encodings are fixed width—each "character" is exactly N bytes wide. (The term "character" is vague when it comes to Unicode. Ruby strings (as of Ruby 3.1) have methods to iterate over bytes, characters, code points, and grapheme clusters. Rather than get bogged down in the minutiae of each, I'll focus on the output from String#each_char and use the term "character" throughout.) Many operations with fixed-width encodings can be efficiently implemented as character offsets are trivial to calculate. In UTF-8, the default internal string encoding in Ruby (and many other languages), characters are variable width, requiring 1 - 4 bytes each. That generally complicates operations because it's not possible to determine character offsets or even the total number of characters in the string without scanning all of the bytes in the string. However, UTF-8 is backwards-compatible with ASCII. If a UTF-8 string consists of only ASCII characters, each character will be one byte wide, and if the runtime knows that it can optimize operations on such strings the same as if the string had the simpler ASCII encoding.

        Code Ranges

        In general, the only way to tell if a string consists of valid characters for its associated encoding is to do a full scan of all the bytes. This is an O(n) process, and while not the least efficient operation in the world, it is something we want to avoid. Languages that don't allow invalid strings only need to do the validation once, at string creation time. Languages that ahead-of-time (AOT) compile can validate string literals during compilation. Languages that only have immutable strings can guarantee that once a string is validated, it can never become invalid. Ruby has none of those properties, so its solution to reducing unnecessary string scans is to cache information about each string in a field known as a code range.

        There are four code range values:


        The code range occupies an odd place in the runtime. As a place for the runtime to record profile information, it's an implementation detail. There is no way to request the code range directly from a string. However, since the code range records information about validity, it also impacts how some operations perform. Consequently, a few String methods allow you to derive the string's code range, allowing you to adapt your application accordingly.

        The mappings are:

        Code range
        Ruby code equivalent
        No representation*
        str.valid_encoding? && !str.ascii_only?

        Table 1:  Mapping of internal code range values to public Ruby methods.

        * – Code ranges are lazily calculated in most cases. However, when requesting information about a property that a code range encompasses, the code range is calculated on demand. As such, you may pass strings around that have an ENC_CODERANGE_UNKNOWN code range, but asking information about its validity or other methods that require the code range, such as a string's character length, will calculate and cache the code range before returning a value to the caller.

        Given its odd standing as somewhat implementation detail, somewhat not, every major Ruby implementation associates a code range with a string. If you ever work on a Ruby implementation's internals or a native extension involving String objects, you'll almost certainly run into working with and potentially managing the code range value.


        In MRI, the code range value is stored as an int value in the object header with bitmask flags representing the values. Each of the values is mutually exclusive from one another. This is important to note because logically, every string with an ASCII-compatible encoding and consists of only ASCII characters is a valid string. However, such a string will never have a code range value of ENC_CODERANGE_VALID. You should use the ENC_CODERANGE(obj) macro to extract the code range value and then compare it against one of the defined code range constants, treating the code range constants essentially the same as an enum (e.g., if (cr == ENC_CODERANGE_7BIT) { ... }).

        If you try to use the code range values as bitmasks directly, you'll have very confusing and difficult to debug results. Due to the way the masks are defined, if a string is annotated as being both ENC_CODERANGE_7BIT and ENC_CODERANGE_VALID it will appear to be ENC_CODERANGE_BROKEN. Conversely, if you try to branch on a combined mask like if (cr & (ENC_CODERANGE_7BIT | ENC_CODERANGE_VALID)) { ... }, that will include ENC_CODERANGE_BROKEN strings. This is because the four valid values are only represented by two bits in the object header. The compact representation makes efficient use of the limited space in the object header but can be misleading to anyone used to working with bitmasks to match and set attributes.

        To help illustrate the point a bit better, I've ported some of the relevant C code to Ruby (see Listing 1):

        Listing 1: MRI's original C code range representation recreated in Ruby.

        JRuby has a very similar implementation to MRI, storing the code range value as an int compactly within the object header, occupying only two bits. In TruffleRuby, the code range is represented as an enum and stored as an int in the object's shape. The enum representation takes up additional space but prevents the class of bugs from misapplication of bitmasks.

        String Operations and Code Range Changes

        The object's code range is a function of both its sequence of bytes and the encoding associated with the object to interpret those bytes. Consequently, when either the bytes change or the encoding changes, the code range value has the potential to be invalidated. When such an operation occurs, the safest thing to do is to perform a complete code range scan of the resulting string. To the best of our ability, however, we want to avoid recalculating the code range when it is not necessary to do so.

        MRI avoids unnecessary code range scans via two primary mechanisms. The first is to simply scan for the code range lazily by changing the string's code range value to ENC_CODERANGE_UNKNOWN. When an operation is performed that needs to know the real code range, MRI calculates it on demand and updates the cached code range with the new result. If the code range is never needed, it's never calculated. (MRI will calculate the code range eagerly when doing so is cheap. In particular, when lexing a source file, MRI already needs to examine every byte in a string and be aware of the string's encoding, so taking the extra step to discover and record the code range value is rather cheap.)

        The second way MRI avoids code range scans is to reason about the code range values of any strings being operated on and how an operation might result in a new code range. For example, when working with strings with an ENC_CODERANGE_7BIT code range value, most operations can preserve the code range value since all ASCII characters stay within the 0x00 - 0x7f range. Whether you take a substring, change the casing of characters, or strip off whitespace, the resulting string is guaranteed to also have the ENC_CODERANGE_7BIT value, so performing a full code range scan would be wasteful. The code in Listing 1 demonstrates some operations on a string with an ENC_CODERANGE_7BIT code range and how the resulting string always has the same code range.

        Listing 2: Changing the case of a string with an ENC_CODERANGE_7BIT code range will always result in a string that also has an ENC_CODERANGE_7BIT code range.

        Sometimes the code range value on its own is insufficient for a particular optimization, in which case MRI will consider additional context. For example, MRI tracks whether a string is "single-byte optimizable." A string is single-byte optimizable if its code range is ENC_CODERANGE_7BIT or if the associated encoding uses characters that are only one-byte wide, such as is the case with the ASCII-8BIT/BINARY encoding used for I/O. If a string is single-byte optimizable, we know that String#reverse must retain the same code range because each byte corresponds to a single character, so reversing the bytes can't change their meaning.

        Unfortunately, the code range is not always easily derivable, particularly when the string's code range is ENC_CODERANGE_VALID or ENC_CODERANGE_BROKEN, in which case a full code range scan may prove to be necessary. Operations performed on a string with an ENC_CODERANGE_VALID code range might result in an ENC_CODERANGE_7BIT string if the source string's encoding is ASCII-compatible; otherwise, it would result in a string with an ENC_CODERANGE_VALID encoding. (We've deliberately set aside the case of String#setbyte which could cause a string to have an ENC_CODERANGE_BROKEN code range value. Generally, string operations in Ruby are well-defined and won't result in a broken string.) In Listing 2, you can see some examples of operations performed against a string with an ENC_CODERANGE_VALID code range resulting in strings with either an ENC_CODERANGE_7BIT code range or an ENC_CODERANGE_VALID coderange.

        Listing 3: Changing the case of a string with an ENC_CODERANGE_VALID code range might result in a string with a different code range.

        Since the source string may have an ENC_CODERANGE_UNKNOWN value and the operation may not need the resolved code range, such as String#reverse called on a string with the ASCII-8BIT/BINARY encoding, it's possible to generate a resulting string that also has an ENC_CODERANGE_UNKNOWN code range. That is to say, it's quite possible to have a string that is ASCII-only but which has an unknown code range that, when operated on, still results in a string that may need to have a full code range scan performed later on. Unfortunately, this is just the trade-off between lazily computing code ranges and deriving the code range without resorting to a full byte scan of the string. To the end user, there is no difference because the code range value will be computed and be accurate by the time it is needed. However, if you're working on a native extension, a Ruby runtime's internals, or are just profiling your Ruby application, you should be aware of how a code range can be set or deferred.

        TruffleRuby and Code Range Derivations

        As a slight digression, I'd like to take a minute to talk about code ranges and their derivations in TruffleRuby. Unlike other Ruby implementations, such as MRI and JRuby, TruffleRuby eagerly computes code range values so that strings never have an ENC_CODERANGE_UNKNOWN code range value. The trade-off that TruffleRuby makes is that it may calculate code range values that are never needed, but string operations are simplified by never needing to calculate a code range on-demand. Moreover, TruffleRuby can derive the code range of an operation's result string without needing to perform a full byte scan in more situations than MRI or JRuby can.

        While eagerly calculating the code range may seem wasteful, it amortizes very well over the lifetime of a program due to TruffleRuby's extensive reuse of string data. TruffleRuby uses ropes as its string representation, a tree-based data structure where the leaves look like a traditional C-style string, while interior nodes represent string operations linking other ropes together. (If you go looking for references to "rope" in TruffleRuby, you might be surprised to see they're mostly gone. TruffleRuby still very much uses ropes, but the TruffleRuby implementation of ropes was promoted to a top-level library in the Truffle family of language implementations, which TruffleRuby has adopted. If you use any other language that ships with the GraalVM distribution, you're also using what used to be TruffleRuby's ropes.) A Ruby string points to a rope, and a rope holds the critical string data.

        For instance, on a string concatenation operation, rather than allocate a new buffer and copy data into it, with ropes we create a "concat rope" with each of the strings being concatenated as its children (see Fig.1). The string is then updated to point at the new concat rope. While that concat rope does not contain any byte data (delegating that to its children), it does store a code range value, which is easy to derive because each child rope is guaranteed to have both a code range value and an associated encoding object.

        Figure 1: A sample rope for the result of “Hello “ + “François”

        Moreover, rope metadata are immutable, so getting a rope's code range value will never incur more overhead than a field read. TruffleRuby takes advantage of that property to use ropes as guards in inline caches for its JIT compiler. Additionally, TruffleRuby can specialize string operations based on the code ranges for any argument strings. Since most Ruby programs never deal with ENC_CODERANGE_BROKEN strings, TruffleRuby's JIT will eliminate any code paths that deal with that code range. If a broken string does appear at runtime, the JIT will deoptimize and handle the operation on a slow path, preserving Ruby's full semantics. Likewise, while Ruby supports 100+ encodings out of the box, the TruffleRuby JIT will optimize a Ruby application for the small number of encodings it uses.

        A String By Any Other Name

        Often string performance discussions are centered around web template rendering or text processing. While important use cases, strings are also used extensively within the Ruby runtime. Every symbol or regular expression has an associated string, and they're consulted for various operations. The real fun comes with Ruby's metaprogramming facilities: strings can be used to access instance variables, look up methods, send messages to objects, evaluate code snippets, and more. Improvements (or degradations) in string performance can have large, cascading effects.

        Backing up a step, I don't want to oversell the importance of code ranges for fast metaprogramming. They are an ingredient in a somewhat involved recipe. The code range can be used to quickly disqualify strings known not to match, such as those with the ENC_CODERANGE_BROKEN code range value. In the past, the code range was used to fail fast when particular identifiers were only allowed to be ASCII-only. While not currently implemented in MRI, such a check could be used to dismiss strings with the ENC_CODERANGE_VALID code range when all identifiers are known to be ENC_CODERANGE_7BIT, and vice versa. However, once a string passes the code range check, there's still the matter of seeing if it matches an identifier (instance variable, method, constant, etc.). With TruffleRuby, that check can be satisfied quickly because its immutable ropes are interned and can be compared by reference. In MRI and JRuby, the equality check may involve a linear pass over the string data as the string is interned. Even that process gets murky depending on whether you're working with a dynamically generated string or a frozen string literal. If you're interested in a deep dive on the difficulties and solutions to making metaprogramming fast in Ruby, Chris Seaton has published a paper about the topic and I've presented a talk about it at RubyKaigi.


        More so than many other contemporary languages, Ruby exposes functionality that is difficult to optimize but which grants developers a great deal of expressivity. Code ranges are a way for the VM to avoid repeated work and optimize operations on a per-string basis, guiding away from slow paths when that functionality isn't needed. Historically, that benefit has been most keenly observed when running in the interpreter. When integrated with a JIT with deoptimization capabilities, such as TruffleRuby, code ranges can help eliminate generated code for the types of strings used by your application and the VM internally.

        Knowing what code ranges are and what they're used for can help you debug issues, both for performance and correctness. At the end of the day, a code range is a cache, and like all caches, it may contain the wrong value. While such instances within the Ruby VM are rare, they're not unheard of. More commonly, a native extension manipulating strings may fail to update a string's code range properly. Hopefully, with a firm understanding of code ranges, you’ll find Ruby's handling of strings less daunting.

        Kevin is a Staff Developer on the Ruby & Rails Infrastructure team where he works on TruffleRuby. When he’s not working on Ruby internals, he enjoys collecting browser tabs, playing drums, and hanging out with his family.

        If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.

        Continue reading

        Leveraging Shopify’s API to Build the Latest Marketplace Kit

        Leveraging Shopify’s API to Build the Latest Marketplace Kit

        In February, we released the second version of Marketplace Kit: a collection of boilerplate app code and documentation allowing third-party developers to integrate with Shopify, build selected commerce features, and launch a world-class marketplace in any channel.

        Previously, we used the node app generated by the Shopify command-line interface (CLI) as a foundation. However, this approach came with two drawbacks: If changes are made to the Shopify CLI, it would affect our code and documentation. Also, we had limited control over best practices, which forced us to use the CLI's node app dependencies.

        Since then, we've decoupled code from the Shopify CLI and separated the Marketplace Kit sample app into two separate apps: a full-stack admin app and a buyer-facing client app. For these, we chose dependencies that were widely used, such as Express and NextJS, to appeal to the largest number of partners possible. Open-sourced versions of the apps are publicly available for anyone to try out.

        Shopify’s APIs mentioned or shown in this article:

        Here are a few ways we leveraged Shopify’s APIs to create the merchant-facing Marketplace admin app for version 2.0.

        Before We Get Started

        Here’s a brief overview of app development at Shopify. The most popular server-side language used within the Shopify CLI, to ease the development of apps is Node JS, a server-side JavaScript runtime. That’s why we used it for the Marketplace Kit’s sample admin app. With Node JS, we use a web framework library called Express JS, chosen for reasons such as ease of use and popularity word-wide.

        On the client-side of the admin and buyer apps, we use the main JavaScript frontend library at Shopify: React JS. In the buyer-side app, we chose Next JS, a framework for React JS. This mainly provides structure to the application, as well as built-in features like server-side rendering and typescript support. When sharing data between frontend and backend apps, we use GraphQL and their helper libraries for ease of integration, Apollo Client and Apollo Server.

        It’s also helpful to have familiarity with some key web development concepts, such as JSON Web Token (JWT) authentication, and Vanilla JavaScript, a compiled programming language best known for being used as a scripting language for web pages.

        JWT Authentication with App Bridge

        Let’s start with how we chose to handle authentication in our apps to assess a working example of using an internal library to ease development within Shopify. App Bridge is a standalone library offering React component wrappers for some app actions. It provides an out-of-the box solution for embedding your app inside of the Shopify admin, Shopify POS, and Shopify Mobile. Since we're using React for our embedded channel admin app, we leveraged additional App Bridge imports to handle authentication. Here is a client-side example from the channels admin app:

        The app object, which is an instance of useAppBridge, is used to pass contextual information about the embedded app. We chose to wrap the authenticatedFetch function usage inside of a function with custom auth redirecting. Notice that the authenticatedFetch import does many things under the hood. Notably, it adds two HTTP headers: Authorization with a JWT session token created on demand and X-Requested-With with XMLHttpRequest (basically narrows down the request type and improves security).

        This is the server-side snippet that handles the session token. It resides in our main server file, where we define our spec-compliant GraphQL server, using an Express app as middleware. Within the configuration of our ApolloServer’s context property, you'll see how we handle the auth header:

        Notice how we leverage Shopify’s Node API to decode the session token and then to load the session data, providing us with the store’s access token. Fantastic!

        Quick tip: To add more stores, you can switch out the store value in .env and run the Shopify CLI's shopify app serve command!

        Serving REST & GraphQL With Express

        In our server-side code, we use the apollo-server-express package instead of simply using apollo-server:

        The setup for a GraphQL server using the express-specific package is quite similar to how we would do it with the barebones default package. The difference is that we apply the Apollo Server instance as middleware to an Express HTTP instance with graphQLServer.applyMiddleware({ app }) (or whatever you named your instance).

        If you look at the entire file, you'll see that the webhooks and routes for the Express application are added after starting the GraphQL server. The advantage of using the apollo-server-express package over apollo-server is being able to serve REST and GraphQL at the same time using Express. Serving GraphQL within Express allows us to use Node middleware for common problems like rate-limiting, security and authentication. The trade-off is using a little bit more boilerplate, but since apollo-server is a wrapper to the Express specific one, there’s no noticeable performance difference.

        Check out the Apollo team’s blog post Using Express with GraphQL to read more.

        Custom Client Wrappers

        Here’s an example of custom API clients for data fetching from Shopify’s Node API, applying GraphQL and REST:

        This allows for easier control of our request configuration, like adding custom User-Agent headers with a unique header title for the project, including its npm package version.

        Although Shopify generally encourages using GraphQL, sometimes it makes sense to use REST. In the admin app, we used it for one call: getting the products listings count. There was no need to create a specific query when an HTTP-GET request contains all the information required—using GraphQL would not offer any advantage. It’s also a good example of using REST in the application, ensuring developers who use the admin app as a starting point see an example that takes advantage of both ways to fetch data depending on what’s best for the situation.

        Want to Challenge Yourself?

        For full instructions on getting started with Marketplace Kit, check out our official documentation. To give you an idea, here are screenshots of the embedded admin app and the buyer app, in that order, upon completion of the tutorials in the docs:

        More Stories on App Development and GraphQL:

        For articles aimed at partners, check out our Shopify’s Partners blog where we cover more content related to app development at Shopify.

        Kenji Duggan is a Backend Developer Intern at Shopify, working on the Strategic Partners team under Marketplace Foundation. Recently, when he’s not learning something new as a web developer, he is probably working out or watching anime

        Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

        Continue reading

        The Magic of Merlin: Shopify's New Machine Learning Platform

        The Magic of Merlin: Shopify's New Machine Learning Platform

        Shopify's machine learning platform team builds the infrastructure, tools and abstracted layers to help data scientists streamline, accelerate and simplify their machine learning workflows. There are many different kinds of machine learning use cases at Shopify, internal and external. Internal use cases are being developed and used in specialized domains like fraud detection and revenue predictions. External use cases are merchant and buyer facing, and include projects such as product categorization and recommendation systems.

        At Shopify we build for the long term, and last year we decided to redesign our machine learning platform. We need a machine learning platform that can handle different (often conflicting) requirements, inputs, data types, dependencies and integrations. The platform should be flexible enough to support the different aspects of building machine learning solutions in production, and enable our data scientists to use the best tools for the job.

        In this post, we walk through how we built Merlin, our magical new machine learning platform. We dive into the architecture, working with the platform, and a product use case.

        The Magic of Merlin

        Our new machine learning platform is based on an open source stack and technologies. Using open source tooling end-to-end was important to us because we wanted to both draw from and contribute to the most up-to-date technologies and their communities as well as provide the agility in evolving the platform to our users’ needs.

        Merlin’s objective is to enable Shopify's teams to train, test, deploy, serve and monitor machine learning models efficiently and quickly. In other words, Merlin enables:

        1. Scalability: robust infrastructure that can scale up our machine learning workflows
        2. Fast Iterations: tools that reduce friction and increase productivity for our data scientists and machine learning engineers by minimizes the gap between prototyping and production
        3. Flexibility: users can use any libraries or packages they need for their models

        For the first iteration of Merlin, we focused on enabling training and batch inference on the platform.

        Merlin Architecture

        A high level diagram of Merlin’s architecture

        Merlin gives our users the tools to run their machine learning workflows. Typically, large scale data modeling and processing at Shopify happens in other parts of our data platform, using tools such as Spark. The data and features are then saved to our data lake or Pano, our feature store. Merlin uses these features and datasets as inputs to the machine learning tasks it runs, such as preprocessing, training, and batch inference.

        With Merlin, each use case runs in a dedicated environment that can be defined by its tasks, dependencies and required resources — we call these environments Merlin Workspaces. These dedicated environments also enable distributed computing and scalability for the machine learning tasks that run on them. Behind the scenes, Merlin Workspaces are actually Ray clusters that we deploy on our Kubernetes cluster, and are designed to be short lived for batch jobs, as processing only happens for a certain amount of time.

        We built the Merlin API as a consolidated service to allow the creation of Merlin Workspaces on demand. Our users can then use their Merlin Workspace from Jupyter Notebooks to prototype their work, or orchestrate it through Airflow or Oozie.

        Merlin’s architecture, and Merlin Workspaces in particular, are enabled by one of our core components—Ray.

        What Is Ray?

        Ray is an open source framework that provides a simple, universal API for building distributed systems and tools to parallelize machine learning workflows. Ray is a large ecosystem of applications, libraries and tools dedicated to machine learning such as distributed scikit-learn, XGBoost, TensorFlow, PyTorch, etc.

        When using Ray, you get a cluster that enables you to distribute your computation across multiple CPUs and machines. In the following example, we train a model using Ray:

        We start by importing the Ray package. We call ray.init() to start a new Ray runtime that can run either on a laptop/machine or connect to an existing Ray cluster locally or remotely. This enables us to seamlessly take the same code that runs locally, and run it on a distributed cluster. When working with a remote Ray cluster, we can use the Ray Client API to connect to it and distribute the work.

        In the example above, we use the integration between Ray and XGBoost to train a new model and distribute the training across a Ray cluster by defining the number of Ray actors for the job and different resources each Ray actor will use (CPUs, GPUs, etc.).

        For more information, details and examples for Ray usage and integrations, check out the Ray documentation.

        Ray In Merlin

        At Shopify, machine learning development is usually done using Python. We chose to use Ray for Merlin's distributed workflows because it enables us to write end-to-end machine learning workflows with Python, integrate it with the machine learning libraries we use at Shopify and easily distribute and scale them with little to no code changes. In Merlin, each machine learning project comes with the Ray library as part of its dependencies, and uses it for distributed preprocessing, training and prediction.

        Ray makes it easy for data scientists and machine learning engineers to move from prototype to production. Our users start by prototyping on their local machines or in a Jupyter Notebook. Even at this stage, their work can be distributed on a remote Ray cluster, allowing them to run the code at scale from an early stage of development.

        Ray is a fast evolving open source project. It has short release cycles and the Ray team is continuously adding and working on new features. In Merlin, we adopted capabilities and features such as:

        • Ray Train: a library for distributed deep learning which we use for training our TensorFlow and PyTorch models
        • Ray Tune: a library for experiment execution and hyperparameter tuning
        • Ray Kubernetes Operator: a component for managing deployments of Ray on Kubernetes and autoscale Ray clusters

        Building On Merlin

        A diagram of the user’s development journey in Merlin

        A user’s first interaction with Merlin usually happens when they start a new machine learning project. Let’s walk through a user’s development journey:

        1. Creating a new project: The user starts by creating a Merlin Project where they can place their code and specify the requirements and packages they need for development
        2. Prototyping: Next, the user will create a Merlin Workspace, the sandbox where they use Jupyter notebooks to prototype on a distributed and scalable environment
        3. Moving to Production: When the user is done prototyping, they can productionize their project by updating their Merlin Project with the updated code and any additional requirements
        4. Automating: Once the Merlin Project is updated, the user can orchestrate and schedule their workflow to run regularly in production
        5. Iterating: When needed, the user can iterate on their project by spinning up another Merlin Workspace and prototyping with different models, features, parameters, etc.

        Let's dive a little deeper into these steps.

        Merlin Projects

        The first step of each machine learning use case on our platform is creating a dedicated Merlin Project. Users can create Merlin Projects for machine learning tasks like training a model or performing batch predictions. Each project can be customized to fit the needs of the project by specifying the system-level packages or Python libraries required for development. From a technical perspective, a Merlin Project is a Docker container with a dedicated virtual environment (e.g. Conda, pyenv, etc.), which isolates code and dependencies. As the project requirements change, the user can update and change their Merlin Project to fit their new needs. Our users can leverage a simple-to-use command line interface that allows them to create, define and use their Merlin Project.

        Below is an example of a Merlin Project file hierarchy:

        The config.yml file allows users to specify the different dependencies and machine learning libraries that they need for their use case. All the code relevant to a specific use case is stored in the src folder.

        Once users push their Merlin Project code to their branch, our CI/CD pipelines build a custom Docker image.

        Merlin Workspaces

        Once the Merlin Project is ready, our data scientists can use the centralized Merlin API to create dedicated Merlin Workspaces in prototype and production environments. The interface abstracts away all of the infrastructure-related logic (e.g. deployment of Ray clusters on Kubernetes, creation of ingress, service accounts) so they can focus on the core of the job.

        A high level architecture diagram of Merlin Workspaces

        Merlin Workspaces also allow users to define the resources required for running their project. While some use cases need GPUs, others might need more memory and additional CPUs or more machines to run on. The Docker image that was created for a Merlin Project will be used to spin up the Ray cluster in a dedicated Kubernetes namespace for a Merlin Workspace. The user can configure all of this through the Merlin API, which gives them either a default environment or allows them to select the specific resource types (GPUs, memory, machine types, etc.) that their job requires.

        Here’s an example of a payload that we send the Merlin API in order to create a Merlin Workspace:

        Using this payload will result in a new Merlin Workspace which will spin up a new Ray cluster with the specific pre-built Docker image of one of our models at Shopify—our product categorization model, which we’ll dive into more later on. This cluster will use 20 Ray workers, each one with 10 CPUs, 30GB of memory and 1 nvidia-tesla-t4 GPU. The cluster will be able to scale up to 30 workers.

        After the job is complete, the Merlin Workspace can be shut down, either manually or automatically, and return the resources back to the Kubernetes cluster.

        Prototyping From Jupyter Notebooks

        Once our users have their Merlin Workspace up and running, they can start prototyping and experimenting with their code from Shopify’s centrally hosted JupyterHub environment. This environment allows them to spin up a new machine learning notebook using their Merlin Project's Docker image, which includes all their code and dependencies that will be available in their notebook.

        An example of how our users can create a Merlin Jupyter Notebook

        From the notebook, the user can access the Ray Client API to connect remotely to their Merlin Workspaces. They can then run their remote Ray Tasks and Ray Actors to parallelize and distribute the computation work on the Ray cluster underlying the Merlin Workspace.

        This method of working with Merlin minimizes the gap between prototyping and production by providing our users with the full capabilities of Merlin and Ray right from the beginning.

        Moving to Production

        Once the user is done prototyping, they can push their code to their Merlin Project. This will kick off our CI/CD pipelines and create a new version of the project's Docker image.

        Merlin was built to be fully integrated with the tools and systems that we already have in place to process data at Shopify. Once the Merlin Project's production Docker image is ready, the user can build the orchestration around their machine learning flows using declarative YAML (yet another markup language) templates or by configuring a DAG (directed Acyclic Graph) in our Airflow environment. The jobs can be scheduled to run periodically, call the production Merlin API to spin up Merlin Workspaces and run Merlin jobs on them.

        A simple example of an Airflow DAG running a training job on Merlin

        The DAG in the image above demonstrates a training flow, where we create a Merlin Workspace, train our model on it and—when it’s done—delete the workspace and return the resources back to the Kubernetes cluster.

        We also integrated Merlin with our monitoring and observability tools. Each Merlin Workspace gets its own dedicated Datadog dashboard which allows users to monitor their Merlin job. It also helps them understand more about the computation load of their job and the resources it requires. On top of this, each Merlin job sends its logs to Splunk so that our users can also debug their job based on the errors or stacktrace.

        At this point, our user's journey is done! They created their Merlin Project, prototyped their use case on a Merlin Workspace and scheduled their Merlin jobs using one of the orchestrators we have at Shopify (e.g Airflow). Later on, when the data scientist needs to update their model or machine learning flow, they can go back to their Merlin Project to start the development cycle again from the prototype phase.

        Now that we explained Merlin's architecture and our user journey, let's dive into how we onboarded a real-world algorithm to Merlin—Shopify’s product categorization model.

        Onboarding Shopify’s Product Categorization Model to Merlin

        A high level diagram of the machine learning workflow for the Product Categorization model

        Recently we rebuilt our product categorization model to ensure we understand what our merchants are selling, so we can build the best products that help power their sales. This is a complex use case that requires several workflows for its training and batch prediction. Onboarding this use case to Merlin early on enabled us to validate our new platform, as it requires large scale computation and includes complex machine learning logic and flows. The training and batch prediction workflows were migrated to Merlin and converted using Ray.

        Migrating the training code

        To onboard the product categorization model training stage to Merlin, we integrated its Tensorflow training code with Ray Train, for distributing training across a Ray cluster. With Ray Train, changing the code to support the distributed training was easy and required few code changes - the original logic stayed the same, and the core changes are described in the example below.

        The following is an example of how we integrated Ray Train with our Tensorflow training code for this use-case:

        The TensorFlow logic for the training step stays the same, but is separated out into its own function. The primary change is adding Ray logic to the main function. Ray Train allows us to specify the job configuration, with details such as number of workers, backend type, and GPU usage.

        Migrating inference

        The inference step in the product categorization model is a multi-step process. We migrated each step separately, using the following method. We used Ray ActorPool to distribute each step of batch inference across a Ray cluster. Ray ActorPool is similar to Python's `multiprocessing.Pool` and allows scheduling Ray tasks over a fixed pool of actors. Using Ray ActorPool is straightforward and allows easy configuration for parallelizing the computation.

        Here’s an example of how we integrated Ray ActorPool with our existing inference code to perform distributed batch predictions:

        We first need to create our Predictor class (a Ray Actor), which includes the logic for loading the product categorization model, and performing predictions on product datasets. In the main function, we use the size of the cluster (ray.available_resources()["CPU"]) to create all the Actors that will run in the ActorPool. We then send all of our dataset partitions to the ActorPool for prediction.

        While this method works for us at the moment, we plan to migrate to using Ray Dataset Pipelines which provides a more robust way to distribute the load of the data and perform batch inference across the cluster with less dependence on the number of data partitions or their size.

        What's next for Merlin

        As Merlin and its infrastructure mature, we plan to continue growing and evolving to better support the needs of our users. Our aspiration is to create a centralized platform that streamlines our machine learning workflows in order to enable our data scientists to innovate and focus on their craft.

        Our next milestones include:

        • Migration: Intention to migrate all of Shopify’s machine learning use cases and workflows to Merlin and adding a low code framework to onboard new use cases
        • Online inference: Support real time serving of machine learning models at scale
        • Model lifecycle management: Add model registry and experiment tracking
        • Monitoring: Support monitoring for machine learning

        While Merlin is still a new platform at Shopify, it’s already empowering us with the scalability, fast iteration and flexibility that we had in mind when designing it. We're excited to keep building the platform and onboarding new data scientists, so Merlin can help enable the millions of businesses powered by Shopify.

        Isaac Vidas is a tech lead on the ML Platform team, focusing on designing and building Merlin, Shopify’s machine learning platform. Connect with Isaac on LinkedIn.

        If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.

        Continue reading

        Start your free 14-day trial of Shopify