When is JIT Faster Than A Compiler?


I had this conversation over and over before I really understood it. It goes:

“X language can be as fast as a compiled language because it has a JIT compiler!”
“Wow! It’s as fast as C?”
“Well, no, we’re working on it. But in theory, it could be even FASTER than C!”
“Really? How would that work?”
“A JIT can see what your code does at runtime and optimize for only that. So it can do things C can’t!”
“Huh. Uh, okay.”

It gets hand-wavy at the end, doesn’t it? I find that frustrating. These days I work on YJIT, a JIT for Ruby. So I can make this extremely NOT hand-wavy. Let’s talk specifics.

I like specifics.

Wait, What’s JIT Again?

An interpreter reads a human-written description of your app and executes it. Ruby, Python, JavaScript (Node.js), SQL, and nearly all high-level dynamic languages are usually interpreted. When you run an interpreted app, you download the human-written source code, and an interpreter on your computer runs it. The interpreter effectively sees the app code for the first time when it runs it, so it doesn’t usually spend much time fixing or improving your code. It just runs the code as written. An interpreter that significantly transforms your code or generates machine code tends to be called a compiler.

A compiler typically turns that human-written code into native machine code, like those big native apps you download. The most straightforward compilers are ahead-of-time compilers. They turn human-written source code into a native binary executable, which you can download and run. A good compiler can greatly speed up your code by putting a lot of effort into improving it ahead of time. This is beneficial for users because the app developer runs the compiler for them. The app developer pays the compile cost, and users get a fast app. Sometimes people call anything a compiler if it translates from one kind of code to another—not just source code to native machine code. But when I say “compiler” here, I mean the source-code-to-machine-code kind.

A JIT, aka a Just-In-Time compiler, is a partial compiler. A JIT waits until you run the program and then translates the most-used parts of your program into fast native machine code. This happens every time you run your program. It doesn’t write the code to a file—okay, except MJIT and a few others. But JIT compilation is primarily a way to speed up an interpreter—you keep the source code on your computer, and the interpreter has a JIT built into it. And then long-running programs go faster.

It sounds kind of inefficient, doesn’t it? Doing it all ahead of time sounds better to me than doing it every time you run your program.

But some languages are really hard to compile correctly ahead of time. Ruby is one of them. And even when you can compile ahead of time, often you get bad results. An ahead-of-time compiler has to create native code that will always be correct, no matter what your program does later, and sometimes that means it’s about as bad as an interpreter, which has that exact same requirement.

Ruby is Unreasonably Dynamic

Ruby is like my four-year-old daughter: the things I love most about it are what make it difficult.

In Ruby, I can redefine + on integers like 3, 7, or -45. Not just at the start—if I wanted, I could write a loop and redefine what + means every time through that loop. My new + could do anything I want. Always return an even number? Print a cheerful message? Write “I added two numbers” to a log file? Sure, no problem.
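For example, here’s a toy demonstration of exactly that, safe to paste into irb:

```ruby
# Perfectly legal Ruby: redefine + on all integers at runtime.
class Integer
  alias_method :original_plus, :+   # keep the old + around

  def +(other)
    puts "I added two numbers"      # a cheerful side effect
    original_plus(other)            # then do the real addition
  end
end

3 + 4   # prints "I added two numbers" and returns 7

# Put things back so the rest of the program behaves normally.
class Integer
  alias_method :+, :original_plus
end
```

Every integer addition anywhere in the process goes through the new definition until you undo it.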

That’s thrilling and wonderful and awful in roughly equal proportions.

And it’s not just +. It’s every operator on every type. And equality. And iteration. And hashing. And so much more. Ruby lets you redefine it all.

The Ruby interpreter needs to stop and check, every time you add two numbers, whether you’ve changed what + means in between. You can even redefine + in a background thread, and Ruby just rolls with it. It picks up the new + and keeps right on going. In a world where everything can be redefined, you can be forgiven for not knowing much for certain, but the interpreter handles it.

Ruby lets you do awful, awful things. It lets you do wonderful, wonderful things. Usually, it’s not obvious which is which. You have expressive power that most languages say is a very bad idea.

I love it.

Compilers do not love it.

When JITs Cheat, Users Win

Okay, we’ve talked about why it’s hard for ahead-of-time (AOT) compilers to deliver performance gains. But then, how do JIT compilers do it? Ruby lets you constantly change absolutely everything. That’s not magically easy at runtime. If you can’t compile + or == or any operator, why can you compile some parts of the program?

With a JIT, you have a compiler around as the program runs. That allows you to do a trick.

The trick: you can compile the method wrong and still get away with it.

Here’s what I mean.

YJIT asks, “Well, what if you didn’t change what + means every time?” You almost never do that. So it can compile a version of your method where + keeps its meaning from right now. And so do equality, iteration, hashing, and everything else you can change in Ruby but nearly never do.

But… that’s wrong. What if I do change those things? Sometimes apps do. I’m looking at you, ActiveRecord.

But your JIT has a compiler around at runtime. So when you change what + means, it will throw away all those methods it compiled with the old definition. Poof. Gone. If you call them again, you get the interpreted version again. For a while—until JIT compiles a new version with the new definition. This is called de-optimization. When the code starts being wrong, throw it away. When 3+4 stops being 7 (hey, this is Ruby!), get rid of the code that assumed it was. The devil is in the details—switching from one version of a method to another version midway through is not easy. But it’s possible, and JIT compilers basically do it successfully.
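To make the trick concrete, here’s a toy Ruby sketch of guard-based invalidation. This is an illustration of the idea only, not how YJIT actually stores or invalidates code:

```ruby
# Toy sketch: compiled entries remember the global "redefinition
# serial" they were built under. Redefining anything bumps the
# serial, so stale entries fail the guard and get recompiled.
class ToyJIT
  attr_reader :compiles

  def initialize
    @serial   = 0    # bumped on every (re)definition
    @bodies   = {}   # name => current definition (the "interpreted" version)
    @compiled = {}   # name => [serial_at_compile_time, code]
    @compiles = 0
  end

  def define(name, &body)
    @serial += 1            # any redefinition invalidates compiled code
    @bodies[name] = body
  end

  def call(name, *args)
    serial, code = @compiled[name]
    if serial == @serial    # guard: do our assumptions still hold?
      code.call(*args)      # fast path: reuse "compiled" code
    else
      @compiles += 1        # de-optimize: recompile against current defs
      @compiled[name] = [@serial, @bodies[name]]
      @bodies[name].call(*args)
    end
  end
end

jit = ToyJIT.new
jit.define(:+) { |a, b| a + b }
jit.call(:+, 3, 4)            # first call compiles
jit.call(:+, 3, 4)            # guard passes: fast path, no recompile
jit.define(:+) { |a, b| 7 }   # hey, this is Ruby!
jit.call(:+, 1, 1)            # old code discarded, recompiled
```

The real thing guards on much finer-grained assumptions and generates machine code rather than reusing a Proc, but the throw-it-away-and-recompile shape is the same.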

So your JIT can assume you don’t change + every time through the loop. Compilers and interpreters can’t get away with that.

An AOT compiler has to create fully correct code before your app even ships. It’s very limited if you change anything. And even if it had some kind of fallback (“Okay, I see three things 3+4 could be in this app”), it can only respond at runtime with something it figured out ahead of time. Usually, that means very conservative code that constantly checks if you changed anything.

An interpreter must be fully correct and respond immediately if you change anything. So it normally assumes that you could have redefined everything at any time. The normal Ruby interpreter spends a lot of time checking whether you’ve changed the definition of +. You can do clever things to speed up that check, and CRuby does. But if you make your interpreter extremely clever, pre-building optimized code and invalidating assumptions, eventually you realize that you’ve built a JIT.

Ruby and YJIT

I work on YJIT, which is part of CRuby. We do the stuff I mention here. It’s pretty fast.

There are a lot of fun specifics to figure out. What do we track? How do we make it faster? When it’s invalid, do we need to recompile or cancel it? Here’s an example I wrote recently.

You can try out our work by turning on --yjit on recent Ruby versions. You can use even more of our work if you build the latest head-of-master Ruby, perhaps with ruby-build 3.2.0-dev. You can also get all the details by reading the source, which is built right into CRuby.

By the way, YJIT has some known bugs in 3.1 that mean you should NOT use it for real production. We’re a lot closer now—it should be production-ready for some uses in 3.2, which comes out Christmas 2022.

What Was All That Again?

A JIT can add assumptions to your code, like the fact that you probably didn’t change what + means. Those assumptions make the compiled code faster. If you do change what + means, you can throw away the now-incorrect code.

An ahead-of-time compiler can’t do that. It has to assume you could change anything you want. And you can.

An interpreter can’t do that. It has to assume you could have changed anything at any time. So it re-checks constantly. A sufficiently smart interpreter that pre-builds machine code for current assumptions and invalidates it when they change could be as fast as a JIT… because it would be a JIT.

And if you like blog posts about compiler internals—who doesn’t?—you should hit “Yes, sign me up” up above and to the left.

Noah Gibbs wrote the ebook Rebuilding Rails and then a lot about how fast Ruby is at various tasks. Despite being a grumpy old programmer in Inverness, Scotland, Noah believes that some day, somehow, there will be a second game as good as Stuart Smith’s Adventure Construction Set for the Apple IIe. Follow Noah on Twitter and GitHub

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

Spin Cycle: Shopify’s SFN Team Overcomes a Cloud-Development Spiral

You may have read about Spin, Shopify’s new cloud-based development tool. Instead of editing and running a local version of a service on a developer’s MacBook, Shopify is moving towards a world where the development servers are available on-demand as a container running in Kubernetes. When using Spin, you don’t need anything on your local machine other than an ssh client and VSCode, if that’s your editor of choice. 

By moving development off our MacBooks and onto Spin, we unlock the ability to easily share work in progress with coworkers and can work on changes that span different codebases without any friction. And because Spin instances are lightweight and ephemeral, we don’t run the risk of messing up long-lived development databases when experimenting with data migrations.

Across Shopify, teams have been preparing and adjusting their codebases so that their services can run smoothly in this kind of environment. In the Shopify Fulfillment Network (SFN) engineering org, we put together a team of three engineers to get us up and running on Spin.

At first, it seemed like the job would be relatively straightforward. But as we started doing the work, we began to notice some less obvious forces at play that were pushing against our efforts.

Since it was easier for most developers to use our old tooling instead of Spin while we were getting the kinks worked out, developers would often unknowingly commit changes that broke some functionality we’d just enabled for Spin. In hindsight, the process of getting SFN working on Spin is a great example of the kind of hidden difficulty in technical work that's more related to human systems than how to get bits of electricity to do what you want.

Before we get to the interesting stuff, it’s important to understand the basics of the technical challenge. We'll start by getting a broad sense of the SFN codebase and then go into the predictable work that was needed to get it running smoothly in Spin. With that foundation, we’ll be able to describe how and why we started treading water, and ultimately how we’re pushing past that.

The Shape of SFN

SFN exists to take care of order fulfillment on behalf of Shopify merchants. After a customer has completed the checkout process, their order information is sent to SFN. We then determine which warehouse has enough inventory and is best positioned to handle the order. Once SFN has identified the right warehouse, it sends the order information to the service responsible for managing that warehouse’s operations. The state of the system is visible to the merchant through the SFN app running in the merchant’s Shopify admin. The SFN app communicates to Shopify Core via the same GraphQL queries and mutations that Shopify makes available to all app developers.

At a highly simplified level, this is the general shape of the SFN codebase:

SFN’s monolithic Rails application with many external dependencies

Similar to the Shopify codebase, SFN is a monolithic Rails application divided into individual components owned by particular teams. Unlike Shopify Core, however, SFN has many strong dependencies on services outside of its own monolith.

SFN’s biggest dependency is on Shopify itself, but there are plenty more. For example, SFN does not design shipping labels, but it does need to send shipping labels to the warehouse. So, SFN is a client to a service that provides valid shipping labels. Similarly, SFN does not tell the mobile Chuck robots in a warehouse where to go—we are a client of a service that handles warehouse operations.

The value that SFN provides is in gluing together a bunch of separate systems with some key logic living in that glue. There isn't much you can do with SFN without those dependencies around in some form.

How SFN Handles Dependencies

As software developers, we need quick feedback about whether an in-progress change is working as expected. And to know if something is working in SFN, that code generally needs to be validated alongside one or several of SFN’s dependencies. For example, if a developer is implementing a feature to display some text in the SFN app after a customer has placed an order, there’s no useful way to validate that change without also having Shopify available.

So the work of getting a useful development environment for SFN with Spin appears to be about looking at each dependency, figuring out how to handle it, and then implementing that decision. We have a few options for how to handle any particular dependency when running SFN in Spin:

  1. Run an instance of the dependency directly in the same Spin container.
  2. Mock the dependency.
  3. Use a shared running instance of the dependency, such as a staging or live test environment.

Given all the dependencies that SFN has, this seems like a decent amount of work for a three-person team.

But this is not the full extent of the problem—it’s just the foundation.

Once we added configuration to make some dependency or some functional flow of SFN work in Spin, another commit would often be added to SFN that nullifies that effort. For example, after getting some particular flow functioning in a Spin environment, the implementation of that flow might be rewritten with new dependencies that are not yet configured to work in Spin.

One apparent solution to this problem would be simply to pay more attention to what work is in flight in the SFN codebase and better prepare for upcoming changes.

But here’s the problem: It’s not just one or two flows changing. Across SFN, the internal implementation of functionality is constantly being improved and refactored. With over 150 SFN engineers deploying to production over 30 times a day, the SFN codebase doesn’t sit still for long. On top of that, Spin itself is constantly changing. And all of SFN’s dependencies are changing. For any dependencies that were mocked, those mocks will become stale and need to be updated.

The more we accomplished, the more functionality existed with the potential to stop working when something changes. And when one of those regressions occurred, we needed to interrupt the dependency we were working on solving in order to keep a previously solved flow functioning. The tension between making improvements and maintaining what you’ve already built is central to much of software engineering. Getting SFN working on Spin was just a particularly good example.

The Human Forces on the System

After recognizing the problem, we needed to step back and look at the forces acting on the system. What incentive structures and feedback loops are contributing to the situation?

In the case of getting SFN working on Spin, changes were happening frequently and those changes were causing regressions. Some of those changes were within our control (e.g., a change goes into SFN that isn’t configured to work in Spin), and some are less so (e.g., Spin itself changing how it uses certain inputs).

This led us to observe two powerful feedback loops that could be happening when SFN developers are working in Spin:

Two feedback loops: the Loop of Happy Equilibrium and the Spiral of Struggle

If it’s painful to use Spin for SFN development, it’s less likely that developers will use Spin the next time they have to validate their work. And if a change hasn’t been developed and tested using Spin, maybe something about that change breaks a particular testing flow, and that causes another SFN developer to become frustrated enough to stop using Spin. And this cycle continues until SFN is no longer usable in Spin.

Alternatively, if it’s a great experience to use and validate work in Spin, developers will likely want to continue using the tool, which will catch any Spin-specific regressions before they make it into the main branch.

As you can imagine, it’s very difficult to move from the Spiral of Struggle into the positive Loop of Happy Equilibrium. Our solution is to try our best to dampen the force acting on the negative spiral while simultaneously propping up the force of the positive feedback loop. 

As the team focused on getting SFN working on Spin, our solution was to be very intentional about where we spent our efforts while asking the other SFN developers to endure a little pain and pitch in as we went through this transition. The SFN-on-Spin team narrowed its focus to getting SFN to a basic level of functionality on Spin so that most developers could use it for the most common validation flows, and we prioritized fixing any bugs that disrupted those areas. This meant explicitly not working to get all SFN functionality running on Spin, but just enough that we could manage the upkeep. At the same time, we asked other SFN developers to use Spin for their daily work, even when it was missing functionality they needed or wanted. Where they felt frustrations or saw gaps, we encouraged and supported them in adding the functionality they needed.

Breaking the Cycle

Our hypothesis is that this is a temporary stage of transition to cloud development. If we’re successful, we’ll land in the Loop of Happy Equilibrium, where regressions are caught before they’re merged, individuals add the missing functionality they need, and everyone ultimately has a fun time developing and feels confident about shipping their code.

Our job seems to be all about code and making computers do what we say. But many of the real-life challenges we face when working on a codebase are not apparent from code or architecture diagrams. Instead they require us to reflect on the forces operating on the humans that are building that software. And once we have an idea of what those forces might be, we can brainstorm how to disrupt or encourage the feedback loops we’ve observed.

Jen is a Staff Software Engineer at Shopify who's spent her career seeking out and building teams that challenge the status quo. In her free time, she loves getting outdoors and spending time with her chocolate lab.


Mastering React’s Stable Values

The concept of a stable value is a distinctly React term, and especially relevant since the introduction of functional components. It refers to values (usually coming from a hook) that keep the same value across multiple renders. And they’re immediately confusing. In this post, Colin Gray, Principal Developer at Shopify, walks through some cases where they really matter and how to make sense of them.

10 Tips for Building Resilient Payment Systems

During the past five years I’ve worked on a lot of different parts of Shopify’s payment infrastructure and helped onboard dozens of developers in one way or another. Some people came from different programming languages, others used Ruby before but were new to the payments space. What was mostly consistent among new team members was little experience in building systems at Shopify’s scale—it was new for me too when I joined.

It’s hard to learn something when you don’t know what you don’t know. As I learned things over the years—sometimes the hard way—I eventually found myself passing on these lessons to others. I distilled these topics into a presentation I gave to my team and boiled that down into this blog post. So, without further ado, here are my top 10 tips and tricks for building resilient payment systems.

1. Lower Your Timeouts

Ruby’s built-in Net::HTTP client has a default timeout of 60 seconds to open a connection to a server, 60 seconds to write data, and another 60 seconds to read a response. For online applications, where a human being is waiting for something to happen, that’s too long. At least there’s a default timeout in place. HTTP clients in other programming languages, like http.Client in Go and http.request in Node.js, don’t have a default timeout at all! This means an unresponsive server could tie up your resources indefinitely and increase your infrastructure bill unnecessarily.

Timeouts can also be set in data stores. For example MySQL has the MAX_EXECUTION_TIME optimizer hint for setting a per-SELECT query timeout in milliseconds. Combined with other tools like pt-kill, we try to prevent bad queries from overwhelming the database.

If there’s only a single thing you take away from this post, dear reader, it should be to investigate and set low timeouts everywhere you can. But what is the right timeout to set? you may wonder. That ultimately depends on your application’s unique situation and can be deduced with monitoring (more on that later), but I found that an open timeout of one second with a write and read or query timeout of five seconds is a decent starting point. Consider this waiting time from the perspective of the end user: would you like to wait for more than five seconds for a page to load successfully or show an error?
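As a starting point, the values suggested above look like this with Ruby’s Net::HTTP (the host is hypothetical, and the numbers should be tuned with real monitoring data):

```ruby
require "net/http"

# Starting-point timeouts: 1s to connect, 5s to write and read.
http = Net::HTTP.new("payments.example.com", 443)  # hypothetical host
http.use_ssl       = true
http.open_timeout  = 1   # seconds to establish the connection
http.write_timeout = 5   # seconds to send the request
http.read_timeout  = 5   # seconds to wait for the response

# A request that exceeds any of these raises Net::OpenTimeout or
# Net::ReadTimeout, which you can rescue to fail fast or fall back.
```

Compare that worst case of roughly eleven seconds to the three minutes you could wait with the defaults.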

2. Install Circuit Breakers

Timeouts put an upper bound on how long we wait before giving up. But services that go down tend to stay down for a while, so if we see multiple timeouts in a short period of time, we can improve on this by not trying at all. Much like the circuit breaker you will find in your house or apartment, once the circuit is opened or tripped, nothing is let through.

Shopify developed Semian to protect Net::HTTP, MySQL, Redis, and gRPC services with a circuit breaker in Ruby. By raising an exception instantly once we detect a service being down, we save resources by not waiting for another timeout we expect to happen. In some cases rescuing these exceptions allows you to provide a fallback. Building and Testing Resilient Ruby on Rails Applications describes how we design and unit tests such fallbacks using Toxiproxy.

Semian and other circuit breaker implementations aren’t a silver bullet that will solve all your resiliency problems by adding it to your application. It requires understanding the ways your application can fail and what falling back could look like. At scale a circuit breaker can still waste a lot of resources (and money) as well. The article Your Circuit Breaker is Misconfigured explains how to fine tune this pattern for maximum performance.
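With those caveats in mind, wiring Semian in front of outbound HTTP calls looks roughly like this configuration sketch (the parameter values are illustrative, not recommendations):

```ruby
require "semian"
require "semian/net_http"

# Illustrative values only: tune thresholds for your own traffic.
SEMIAN_PARAMETERS = {
  tickets: 1,            # concurrent callers allowed through the bulkhead
  success_threshold: 1,  # successes needed to close the circuit again
  error_threshold: 3,    # errors within error_timeout that open the circuit
  error_timeout: 10,     # seconds the circuit stays open before retrying
}.freeze

# Give every outbound host/port pair its own circuit breaker.
Semian::NetHTTP.semian_configuration = proc do |host, port|
  SEMIAN_PARAMETERS.merge(name: "#{host}_#{port}")
end
```

Once the circuit is open, calls raise immediately instead of waiting on another timeout, and rescuing that exception is where your fallback behavior lives.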

3. Understand Capacity

Understanding a bit of queueing theory goes a long way toward being able to reason about how a system will behave under load. Slightly summarized, Little’s Law states that “the average number of customers in a system (over an interval) is equal to their average arrival rate, multiplied by their average time in the system.” The arrival rate is the number of customers entering the system per unit of time, which in a stable system equals the rate leaving it.

Some might not realize it at first, but queues are everywhere: in grocery stores, traffic, factories, and as I recently rediscovered, at a festival in front of the toilets. Jokes aside, you find queues in online applications as well. A background job, a Kafka event, and a web request are all examples of units of work processed on queues. Put in a formula, Little’s Law is expressed as capacity = throughput * latency. This also means that throughput = capacity / latency. Or in more practical terms: if our system can hold 50 requests in flight and it takes an average of 100 milliseconds to process each request, our throughput is 500 requests per second.
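That arithmetic, as a quick Ruby sanity check (integer milliseconds keep the math exact):

```ruby
# Little's Law: capacity = throughput * latency,
# so throughput = capacity / latency.
capacity   = 50    # requests the system can hold in flight
latency_ms = 100   # average milliseconds to process one request

throughput_per_sec = capacity * 1000 / latency_ms
puts throughput_per_sec                   # 500 requests per second

# An N+1 query that doubles latency halves throughput:
puts capacity * 1000 / (latency_ms * 2)   # 250 requests per second
```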

With the relationship between queue size, throughput, and latency clarified, we can reason about what changing any of the variables implies. An N+1 query increases the latency of a request and lowers our throughput. If the number of requests coming in exceeds our capacity, the request queue grows, and at some point a client is waiting so long for their request to be served that it times out. At some point you need to put a limit on the amount of work coming in—your application can’t out-scale the world. Rate limiting and load shedding are two techniques for this.

4. Add Monitoring and Alerting

With our newfound understanding of queues, we now have a better idea of what kind of metrics we need to monitor to know our system is at risk of going down due to overload. Google’s site reliability engineering (SRE) book lists four golden signals a user-facing system should be monitored for:

  • Latency: the amount of time it takes to process a unit of work, broken down between success and failures. With circuit breakers failures can happen very fast and lead to misleading graphs.
  • Traffic: the rate in which new work comes into the system, typically expressed in requests per minute.
  • Errors: the rate of unexpected things happening. In payments, we distinguish between payment failures and errors. An example of a failure is a charge being declined due to insufficient funds, which isn’t unexpected at all. HTTP 500 response codes from our financial partners, on the other hand, are errors. However, a sudden increase in failures might need further investigation.
  • Saturation: how much load the system is under, relative to its total capacity. This could be the amount of memory used versus available or a thread pool’s active threads versus total number of threads available, in any layer of the system.

5. Implement Structured Logging

Where metrics provide a high-level overview of how our system is behaving, logging allows us to understand what happened inside a single web request or background job. Out of the box, Ruby on Rails logs are human-friendly but hard to parse for machines. This can work okay if you have only a single application server, but beyond that you’ll quickly want to store logs in a centralized place and make them easily searchable. Structured logging in a machine-readable format, like key=value pairs or JSON, allows log aggregation systems to parse and index the data.

In distributed systems, it’s useful to pass along some sort of correlation identifier. A hypothetical example is when a buyer initiates a payment at checkout, a correlation_id is generated by our Rails controller. This identifier is passed along to a background job that makes the API call to the payment service that handles sensitive credit card data, which contains the correlation identifier in the API parameters and SQL query comments. Because these components of our checkout process all log the correlation_id, we can easily find all related logs when we need to debug this payment attempt.
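A minimal sketch of both ideas in Ruby; the event names and fields are illustrative, not our actual schema:

```ruby
require "json"
require "logger"
require "securerandom"
require "time"

# One JSON object per log line, so aggregators can parse and index it.
logger = Logger.new($stdout)
logger.formatter = proc do |severity, time, _progname, payload|
  JSON.generate({ severity: severity, time: time.utc.iso8601 }.merge(payload)) << "\n"
end

# Generated once at the edge (e.g. the Rails controller), then passed
# along to background jobs and downstream services.
correlation_id = SecureRandom.uuid

logger.info(event: "payment.authorize.started", correlation_id: correlation_id)
logger.info(event: "payment.authorize.succeeded", correlation_id: correlation_id)
```

Searching the aggregator for that one correlation_id then surfaces every component’s view of the same payment attempt.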

6. Use Idempotency Keys

Distributed systems use unreliable networks, even if the networks look reliable most of the time. At Shopify’s scale, a once in a million chance of something unreliable occurring during payment processing means it’s happening many times a day. If this is a payments API call that timed out, we want to retry the request, but do so safely. Double charging a customer's card isn’t just annoying for the card holder, it also opens up the merchant for a potential chargeback if they don’t notice the double charge and refund it. A double refund isn’t good for the merchant's business either.

In short, we want a payment or refund to happen exactly once despite the occasional hiccups that could lead to sending an API request more than once. Our centralized payment service can track attempts, which consist of one or more identical (possibly retried) API requests, by sending an idempotency key that’s unique for each one. The idempotency key looks up the steps the attempt completed (such as creating a local database record of the transaction) and makes sure we send only a single request to our financial partners. If any of these steps fail and a retried request with the same idempotency key is received, recovery steps are run to recreate the same state before continuing. Building Resilient GraphQL APIs Using Idempotency describes how our idempotency mechanism works in more detail.

An idempotency key needs to be unique for the time we want the request to be retryable, typically 24 hours or less. We prefer using an Universally Unique Lexicographically Sortable Identifier (ULID) for these idempotency keys instead of a random version 4 UUID. ULIDs contain a 48-bit timestamp followed by 80 bits of random data. The timestamp allows ULIDs to be sorted, which works much better with the b-tree data structure databases use for indexing. In one high-throughput system at Shopify we’ve seen a 50 percent decrease in INSERT statement duration by switching from UUIDv4 to ULID for idempotency keys.
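Here’s a sketch of that ULID layout in plain Ruby, for illustration; a production system would likely use a maintained ULID gem instead:

```ruby
require "securerandom"

# ULID layout: 48-bit millisecond timestamp, then 80 random bits,
# encoded as 26 characters of Crockford base32.
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid(time = Time.now)
  ms     = (time.to_f * 1000).to_i            # 48-bit timestamp
  random = SecureRandom.random_number(2**80)  # 80 random bits
  value  = (ms << 80) | random                # 128 bits total

  # 26 base32 chars hold 130 bits; the top 2 bits are always zero.
  (0...26).map { |i| CROCKFORD32[(value >> (125 - 5 * i)) & 0x1f] }.join
end

key = ulid
# Keys generated later sort after earlier ones, so b-tree index
# inserts cluster near the end instead of landing on random pages.
```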

7. Be Consistent With Reconciliation

With reconciliation we make sure that our records are consistent with those of our financial partners. We reconcile individual records such as charges or refunds, and aggregates such as the current balance not yet paid out to a merchant. Having accurate records isn’t just for display purposes; they’re also used as input for tax forms we’re required to generate for merchants in some jurisdictions.

In case of a mismatch, we record the anomaly in our database. An example is the MismatchCaptureStatusAnomaly, which expresses that the status of a captured local charge wasn’t the same as the status returned by our financial partners. Often we can automatically attempt to remediate the discrepancy and mark the anomaly as resolved. In cases where this isn’t possible, the developer team investigates anomalies and ships fixes as necessary.

Even though we attempt automatic fixes where possible, we want to keep track of the mismatch so we know what our system did and how often. We should rely on anomalies to fix things as a last resort, preferring solutions that prevent anomalies from being created in the first place.

8. Incorporate Load testing

While Little’s Law is a useful theorem, practice is messier: the processing time for work isn’t uniformly distributed, making it impossible to achieve 100% saturation. In practice, queue size starts growing somewhere around the 70 to 80 percent mark, and if the time spent waiting in the queue exceeds the client timeout, from the client’s perspective our service is down. If the volume of incoming work is large enough, our servers can even run out of memory to store work on the queue and crash.

There are various ways we can keep queue size under control. For example, we use scriptable load balancers to throttle the amount of checkouts happening at any given time. In order to provide a good user experience for buyers, if the amount of buyers wanting to check out exceeds our capacity, we place these buyers on a waiting queue (I told you they are everywhere!) before allowing them to pay for their order. Surviving Flashes of High-Write Traffic Using Scriptable Load Balancers describes this system in more detail.

We regularly test the limits and protection mechanisms of our systems by simulating large volume flash sales on specifically set up benchmark stores. Pummelling the Platform–Performance Testing Shopify describes our load testing tooling and philosophy. Specifically for load testing payments end-to-end, we have a bit of a problem: the test and staging environments of our financial partners don’t have the same capacity or latency distribution as production. To solve this, our benchmark stores are configured with a special benchmark gateway whose responses mimic these properties.

9. Get on Top of Incident Management

As mentioned at the start of this article, we know that failure can’t be completely avoided and is a situation we need to prepare for. An incident usually starts when the on-call service owners get paged, either by an automatic alert based on monitoring or by hand if someone notices a problem. Once the problem is confirmed, we start the incident process with a command sent to our Slack bot, spy.

The conversation moves to the assigned incident channel where we have three roles involved:

  • Incident Manager on Call (IMOC): responsible for coordinating the incident
  • Support Response Manager (SRM): responsible for public communication
  • Service owner(s): responsible for restoring stability

The article Implementing ChatOps into our Incident Management Procedure goes into more detail about the process. Once the problem has been mitigated, the incident is stopped, and the Slack bot generates a Service Disruption in our services database application. The disruption contains an initial timeline of events, Slack messages marked as important, and a list of people involved.

10. Organize Incident Retrospectives

We aim to hold an incident retrospective meeting within a week of the incident. During this meeting:

  • we dig deep into what exactly happened
  • we surface the incorrect assumptions we held about our systems
  • we determine what we can do to prevent the same thing from happening again

Once these things are understood, we typically assign a few action items to implement safeguards.

Retrospectives aren’t just good for preventing problems; they’re also a valuable learning tool for new members of the team. At Shopify, the details of every incident are internally public for all employees to learn from. A well-documented incident can also be a training tool for newer members joining the team’s on-call rotation, either as an archived document to refer to or as the basis for a disaster role-playing scenario.

Scratching the Surface

I moved from my native Netherlands to Canada for this job in 2016, before Shopify became a Digital by Design company. During my work, I’m often reminded of the Dutch saying “trust arrives on foot, but leaves on horseback.” Merchants’ livelihoods depend on us when they pick Shopify Payments for accepting payments online or in person, and we take that responsibility seriously. While failure isn’t completely avoidable, there are many concepts and techniques we apply to minimize downtime, limit the scope of impact, and build applications that are resilient to failure.

This top ten only scratches the surface; it was meant as an introduction to the kinds of challenges the Shopify Payments team deals with, after all. I usually recommend Release It! by Michael Nygard as a good resource for team members who want to learn more.

Bart is a staff developer on the Shopify Payments team and has been working on the scalability, reliability, and security of Shopify’s payment processing infrastructure since 2016.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.


Data-Centric Machine Learning: Building Shopify Inbox’s Message Classification Model


By Eric Fung and Diego Castañeda

Shopify Inbox is a single business chat app that manages all Shopify merchants’ customer communications in one place, and turns chats into conversions. As we were building the product it was essential for us to understand how our merchants’ customers were using chat applications. Were they reaching out looking for product recommendations? Wondering if an item would ship to their destination? Or were they just saying hello? With this information we could help merchants prioritize responses that would convert into sales and guide our product team on what functionality to build next. However, with millions of unique messages exchanged in Shopify Inbox per month, this was going to be a challenging natural language processing (NLP) task. 

Our team didn’t need to start from scratch, though: off-the-shelf NLP models are widely available to everyone. With this in mind, we decided to apply a newly popular machine learning process—the data-centric approach. We wanted to focus on fine-tuning these pre-trained models on our own data to yield the highest model accuracy, and deliver the best experience for our merchants.

A merchant’s Shopify Inbox “Customers” screen, displaying snippets of customer messages labelled by topic (product details, checkout, edit order) for easy identification.
Message Classification in Shopify Inbox

We’ll share our journey of building a message classification model for Shopify Inbox by applying the data-centric approach. From defining our classification taxonomy to carefully training our annotators on labeling, we dive into how a data-centric approach, coupled with a state-of-the-art pre-trained model, led to a very accurate prediction service we’re now running in production.

Why a Data-Centric Approach?

A traditional development model for machine learning begins with obtaining training data, then successively trying different model architectures to overcome any poor data points. This model-centric process is typically followed by researchers looking to advance the state-of-the-art, or by those who don't have the resources to clean a crowd-sourced dataset.

By contrast, a data-centric approach focuses on iteratively making the training data better to reduce inconsistencies, thereby yielding better results for a range of models. Since anyone can download a well-performing, pre-trained model, getting a quality dataset is the key differentiator in being able to produce a high-quality system. At Shopify, we believe that better training data yields machine learning models that can serve our merchants better. If you’re interested in hearing more about the benefits of the data-centric approach, check out Andrew Ng’s talk on MLOps: From Model-centric to Data-centric.

Our First Prototype

Our first step was to build an internal prototype that we could ship quickly. Why? We wanted to build something that would enable us to understand what buyers were saying. It didn’t have to be perfect or complex, it just had to prove that we could deliver something with impact. We could iterate afterwards. 

For our first prototype, we didn't want to spend a lot of time on the exploration, so we had to construct both the model and training data with limited resources. Our team chose a pre-trained model available on TensorFlow Hub called Universal Sentence Encoder. This model can output embeddings for whole sentences while taking into account the order of words. This is crucial for understanding meaning. For example, the two messages below use the same set of words, but they have very different sentiments:

  • “Love! More please. Don’t stop baking these cookies.”
  • “Please stop baking more cookies! Don’t love these.”

To rapidly build our training dataset, we sought to identify groups of messages with similar meaning, using various dimensionality reduction and clustering techniques, including UMAP and HDBScan. After manually assigning topics to around 20 message clusters, we applied a semi-supervised technique. This approach takes a small amount of labeled data, combined with a larger amount of unlabeled data. We hand-labeled a few representative seed messages from each topic, and used them to find additional examples that were similar. For instance, given a seed message of “Can you help me order?”, we used the embeddings to help us find similar messages such as “How to order?” and “How can I get my orders?”. We then sampled from these to iteratively build the training data.
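The seed-expansion step can be sketched with cosine similarity over sentence embeddings (the toy 3-dimensional vectors below stand in for real Universal Sentence Encoder outputs, which are much higher-dimensional):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings standing in for Universal Sentence Encoder vectors.
embeddings = {
    "Can you help me order?": [0.9, 0.1, 0.0],
    "How to order?":          [0.8, 0.2, 0.1],
    "Do you ship to Canada?": [0.1, 0.9, 0.2],
}

seed = "Can you help me order?"
candidates = [m for m in embeddings if m != seed]
# Rank unlabeled messages by similarity to the seed; the closest ones
# become additional training examples for the seed's topic.
ranked = sorted(candidates,
                key=lambda m: cosine(embeddings[seed], embeddings[m]),
                reverse=True)
print(ranked[0])  # "How to order?"
```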

A scatter-plot visualization of message clusters during one of our explorations.

We used this dataset to train a simple predictive model containing an embedding layer, followed by two fully connected, dense layers. Our last layer contained the logits array for the number of classes to predict on.
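A numpy sketch of that architecture’s forward pass (dimensions and weights here are placeholders, not the trained model, which was built in TensorFlow):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the real model consumed sentence embeddings from
# Universal Sentence Encoder and produced one logit per topic.
embed_dim, hidden, n_classes = 8, 16, 20

# Randomly initialized weights stand in for trained parameters.
W1, b1 = rng.normal(size=(embed_dim, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
W_out, b_out = rng.normal(size=(hidden, n_classes)), np.zeros(n_classes)

def forward(sentence_embedding: np.ndarray) -> np.ndarray:
    """Embedding -> two fully connected ReLU layers -> logits array."""
    h1 = np.maximum(0, sentence_embedding @ W1 + b1)
    h2 = np.maximum(0, h1 @ W2 + b2)
    return h2 @ W_out + b_out  # one logit per class to predict on

logits = forward(rng.normal(size=embed_dim))
print(logits.shape)  # (20,)
```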

This model gave us some interesting insights. For example, we observed that a lot of chat messages are about the status of an order. This helped inform our decision to build an order status request as part of Shopify Inbox’s Instant Answers FAQ feature. However, our internal prototype had a lot of room for improvement. Overall, our model achieved a 70 percent accuracy rate and could only classify 35 percent of all messages with high confidence (what we call coverage). While our scrappy approach of using embeddings to label messages was fast, the labels weren’t always the ground truth for each message. Clearly, we had some work to do.

We know that our merchants have busy lives and want to respond quickly to buyer messages, so we needed to increase the accuracy, coverage, and speed for version 2.0. Wanting to follow a data-centric approach, we focused on how we could improve our data to improve our performance. We made the decision to put additional effort into defining the training data by re-visiting the message labels, while also getting help to manually annotate more messages. We sought to do all of this in a more systematic way.

Creating A New Taxonomy

First, we dug deeper into the topics and message clusters used to train our prototype. We found several broad topics containing hundreds of examples that conflated distinct semantic meanings. For example, messages asking about shipping availability to various destinations (pre-purchase) were grouped in the same topic as those asking about what the status of an order was (post-purchase).

Other topics had very few examples, while a large number of messages didn’t belong to any specific topic at all. It’s no wonder that a model trained on such a highly unbalanced dataset wasn’t able to achieve high accuracy or coverage.

We needed a new labeling system that would be accurate and useful for our merchants. It also had to be unambiguous and easy to understand by annotators, so that labels would be applied consistently. A win-win for everybody!

This got us thinking: who could help us with the taxonomy definition and the annotation process? Fortunately, we have a talented group of colleagues. We worked with our staff content designer and product researcher who have domain expertise in Shopify Inbox. We were also able to secure part-time help from a group of support advisors who deeply understand Shopify and our merchants (and by extension, their buyers).

Over a period of two months, we got to work sifting through hundreds of messages and came up with a new taxonomy. We listed each new topic in a spreadsheet, along with a detailed description, cross-references, disambiguations, and sample messages. This document would serve as the source of truth for everyone in the project (data scientists, software engineers, and annotators).

In parallel with the taxonomy work, we also looked at the latest pre-trained NLP models, with the aim of fine-tuning one of them for our needs. The Transformer family is one of the most popular, and we were already using that architecture in our product categorization model. We settled on DistilBERT, a model that promised a good balance between performance, resource usage, and accuracy. Some prototyping on a small dataset built from our nascent taxonomy was very promising: the model was already performing better than version 1.0, so we decided to double down on obtaining a high-quality, labeled dataset.

Our final taxonomy contained more than 40 topics, grouped under five categories: 

  • Products
  • Pre-Purchase
  • Post-Purchase
  • Store
  • Miscellaneous

We arrived at this hierarchy by thinking about how an annotator might approach classifying a message, viewed through the lens of a buyer. The first thing to determine is: where was the buyer on their purchase journey when the message was sent? Were they asking about a detail of the product, like its color or size? Or, was the buyer inquiring about payment methods? Or, maybe the product was broken, and they wanted a refund? Identifying the category helped to narrow down our topic list during the annotation process.

Our in-house annotation tool, displaying the message to classify along with some of the possible topics, grouped by category.

Each category contains an “other” topic to group the messages that don’t have enough content to be clearly associated with a specific topic. We decided not to train the model with the examples classified as “other” because, by definition, they were messages we couldn’t classify ourselves in the proposed taxonomy. In production, these messages get classified by the model with low probabilities. By setting a probability threshold on every topic in the taxonomy, we could decide later whether to ignore them or not.
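That per-topic thresholding might look something like this (topic names and threshold values are illustrative, not the production configuration):

```python
# Per-topic confidence thresholds; values are illustrative only.
THRESHOLDS = {"order-status": 0.80, "shipping": 0.75, "refund": 0.85}

def route(predicted_topic: str, probability: float) -> str:
    """Accept a prediction only if it clears its topic's threshold."""
    if probability >= THRESHOLDS.get(predicted_topic, 1.0):
        return predicted_topic
    return "other"  # low confidence: treat like the catch-all topic

print(route("order-status", 0.91))  # order-status
print(route("refund", 0.60))        # other
```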

Since this taxonomy was pretty large, we wanted to make sure that everyone interpreted it consistently. We held several training sessions with our annotation team to describe our classification project and philosophy. We divided the annotators into two groups so they could annotate the same set of messages using our taxonomy. This exercise had a two-fold benefit:

  1. It gave annotators first-hand experience using our in-house annotation tool.
  2. It allowed us to measure inter-annotator agreement.
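One common way to quantify inter-annotator agreement is Cohen’s kappa, which corrects raw agreement for chance; the post doesn’t name the metric used, so this is an illustrative stdlib-only sketch for two annotators:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same messages."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a) / n**2
    return (observed - expected) / (1 - expected)

a = ["order", "shipping", "order", "refund", "order"]
b = ["order", "shipping", "refund", "refund", "order"]
print(round(cohens_kappa(a, b), 4))  # 0.6875
```

A kappa of 1.0 means perfect agreement; values well below that signal a taxonomy that annotators interpret inconsistently.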

This process was time-consuming as we needed to do several rounds of exercises. But, the training led us to refine the taxonomy itself by eliminating inconsistencies, clarifying descriptions, adding additional examples, and adding or removing topics. It also gave us reassurance that the annotators were aligned on the task of classifying messages.

Let The Annotation Begin

Once we and the annotators felt that they were ready, the group began to annotate messages. We set up a Slack channel for everyone to collaborate and work through tricky messages as they arose. This allowed everyone to see the thought process used to arrive at a classification.

During the preprocessing of training data, we discarded single-character messages and messages consisting of only emojis. During the annotation phase, we excluded other kinds of noise from our training data. Annotators also flagged content that wasn’t actually a message typed by a buyer, such as when a buyer cut-and-pastes the body of an email they’ve received from a Shopify store confirming their purchase. As the old saying goes, garbage in, garbage out. Lastly, due to our current scope and resource constraints, we had to set aside non-English messages.
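Those preprocessing rules can be sketched as a simple filter (the emoji check here, based on Unicode categories, is an approximation; the post doesn’t specify how the real pipeline detected emoji):

```python
import unicodedata

def is_emoji_or_symbol(ch: str) -> bool:
    # "So" (Symbol, other) covers most emoji; a rough approximation.
    return unicodedata.category(ch) == "So"

def keep_message(text: str) -> bool:
    """Drop single-character messages and emoji-only messages."""
    stripped = text.strip()
    if len(stripped) <= 1:
        return False
    if all(is_emoji_or_symbol(ch) or ch.isspace() for ch in stripped):
        return False
    return True

messages = ["Do you ship to Canada?", "k", "👍👍", "thanks!"]
print([m for m in messages if keep_message(m)])
```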

Handling Sensitive Information

You might be wondering how we dealt with personal information (PI) like emails or phone numbers. PI occasionally shows up in buyer messages and we took special care to ensure that it was handled appropriately. This was a complicated, and at times manual, process that involved many steps and tools.

To avoid training our machine learning model on any messages containing PI, we couldn’t just ignore them. That would likely bias our model. Instead, we wanted to identify the messages with PI, then replace it with realistic, mock data. In this way, we would have examples of real messages that wouldn’t be identifiable to any real person.

This anonymization process began with our annotators flagging messages containing PI. Next, we used an open-source library called Presidio to analyze and anonymize the PI. This tool ran in our data warehouse, keeping our merchants’ data within Shopify’s systems. Presidio is able to recognize many different types of PI, and the anonymizer provides different kinds of operators that can transform the instances of PI into something else. For example, you could completely remove it, mask part of it, or replace it with something else.
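As a rough illustration of that analyze-and-anonymize flow, here’s a stdlib-only stand-in (simple regexes take the place of Presidio’s recognizers, which also use named entity recognition and checksums; the patterns are illustrative, not production-grade):

```python
import re

# Regex stand-ins for two of the recognizer types Presidio provides.
# Names and addresses are far harder to detect than this.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def anonymize(text: str) -> str:
    """Replace detected PI with type placeholders, Presidio-style."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

msg = "my phone is 852 5555 1234. Email is jane@example.com"
print(anonymize(msg))  # my phone is <PHONE>. Email is <EMAIL>
```

In the real pipeline the placeholders are then swapped for realistic mock data rather than left as type markers.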

In our case, we used another open-source tool called Faker to replace the PI. This library is customizable and localized, and its providers can generate realistic addresses, names, locations, URLs, and more. Here’s an example of its Python API:

Combining Presidio and Faker allowed us to semi-automate the PI replacement. See below for a fabricated example:

Before: can i pickup today? i ordered this am: Sahar Singh my phone is 852 5555 1234. Email is

After: can i pickup today? i ordered this am: Sahar Singh my phone is 090-722-7549. Email is


If you’re a sharp-eyed reader, you’ll notice (as we did) that our tools missed identifying a bit of fabricated PI in the above example (hint: the name). Despite Presidio using a variety of techniques (regular expressions, named entity recognition, and checksums), some PI slipped through the cracks. Names and addresses have a lot of variability and are hard to recognize reliably. This meant that we still needed to inspect the before and after output to identify whether any PI was still present. Any remaining PI was manually replaced with a placeholder (for example, the name Sahar Singh was replaced with <PERSON>). Finally, we ran another script to replace the placeholders with Faker-generated data.

A Little Help From The Trends

Towards the end of our annotation project, we noticed a trend that persisted throughout the campaign: some topics in our taxonomy were overrepresented in the training data. It turns out that buyers ask a lot of questions about products!

Our annotators had already gone through thousands of messages. We couldn’t afford to split up the topics with the most popular messages and re-classify them, but how could we ensure our model performed well on the minority classes? We needed to get more training examples from the underrepresented topics.

Since we were continuously training a model on the labeled messages as they became available, we decided to use it to help us find additional messages. Using the model’s predictions, we excluded any messages classified with the overrepresented topics. The remaining examples belonged to the other topics, or were ones that the model was uncertain about. These messages were then manually labeled by our annotators.
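That selection step can be sketched as follows (topic names and the confidence cutoff are illustrative):

```python
# Use the in-progress model's predictions to pick which unlabeled
# messages go to annotators next.
OVERREPRESENTED = {"product-question"}
CONFIDENCE_CUTOFF = 0.9

def needs_annotation(predicted_topic: str, probability: float) -> bool:
    """Skip messages the model confidently assigns to a majority topic;
    everything else (minority topics, uncertain cases) gets hand-labeled."""
    confident_majority = (
        predicted_topic in OVERREPRESENTED
        and probability >= CONFIDENCE_CUTOFF
    )
    return not confident_majority

predictions = [
    ("product-question", 0.97),  # confidently majority -> skip
    ("product-question", 0.55),  # uncertain -> annotate
    ("refund", 0.92),            # minority topic -> annotate
]
print([needs_annotation(t, p) for t, p in predictions])  # [False, True, True]
```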


So, after all of this effort to create a high-quality, consistently labeled dataset, what was the outcome? How did it compare to our first prototype? Not bad at all. We achieved our goal of higher accuracy and coverage:


Version 1.0 Prototype vs. Version 2.0 in Production:

  • Size of training set:
  • Annotation strategy: based on embedding similarity (1.0), human labeled (2.0)
  • Taxonomy classes: ~20 (1.0), 40+ (2.0)
  • Model accuracy: 70% (1.0)
  • High confidence coverage: 35% (1.0)




Another key part of our success was working collaboratively with diverse subject matter experts. Bringing in our support advisors, staff content designer, and product researcher provided perspectives and expertise that we as data scientists couldn’t achieve alone.

While we shipped something we’re proud of, our work isn’t done. This is a living project that will require continued development. As trends and sentiments change over time, the topics of conversations happening in Shopify Inbox will shift accordingly. We’ll need to keep our taxonomy and training data up-to-date and create new models to continue to keep our standards high.

If you want to learn more about the data work behind Shopify Inbox, check out Building a Real-time Buyer Signal Data Pipeline for Shopify Inbox that details the real-time buyer signal data pipeline we built.

Eric Fung is a senior data scientist on the Messaging team at Shopify. He loves baking and will always pick something he’s never tried at a restaurant. Follow Eric on Twitter.

Diego Castañeda is a senior data scientist on the Core Optimize Data team at Shopify. Previously, he was part of the Messaging team and helped create machine learning powered features for Inbox. He loves computers, astronomy and soccer. Connect with Diego on LinkedIn.



Spin Infrastructure Adventures: Containers, Systemd, and CGroups


The Spin infrastructure team works hard at improving the stability of the system. In February 2022 we moved to Container Optimized OS (COS), the Google-maintained operating system for their Kubernetes Engine SaaS offering. A month later we turned on multi-cluster support to allow for increased scalability as more users came on board. Recently, we’ve dramatically increased the default resources allotted to instances. However, with all these changes we’re still experiencing some issues, and for one of those I wanted to dive a bit deeper and share what we found.

Spin’s Basic Building Blocks

First it's important to know the basic building blocks of Spin and how these systems interact. The Spin infrastructure is built on top of Kubernetes, using many of the same components that Shopify’s production applications use. Spin instances themselves are implemented via a custom resource controller that we install on the system during creation. Among other things, the controller transforms the Instance custom resource into a pod that’s booted from a special Isospin container image along with the configuration supplied during instance creation. Inside the container we utilize systemd as a process manager and workflow engine to initialize the environment, including installing dotfiles, pulling application source code, and running through bootstrap scripts. Systemd is vital because it enables a structured way to manage system initialization and this is used heavily by Spin.

There’s definitely more to what makes Spin than what I’ve described, but from a high level, and for the purposes of understanding the technical challenges ahead, it’s important to remember that:

  1. Spin is built on Kubernetes
  2. Instances are run in a container
  3. systemd is run INSIDE the container to manage the environment.

First Encounter

In February 2022, we had a serious problem with Pod relocations that we eventually tracked to node instability. Several nodes in our Kubernetes clusters would randomly fail and require either a reboot or replacement. Google had decent automation that would catch nodes in a bad state and replace them automatically, but it was happening often enough (five nodes per day, or about one percent of all nodes) that users began to notice. Through various discussions with Shopify’s engineering infrastructure support team and Google Cloud support, we eventually homed in on memory consumption as the primary issue. Specifically, nodes were running out of memory, and pods were being out-of-memory (OOM) killed as a result. At first, this didn’t seem so suspicious: we let users do whatever they want inside their containers and didn’t give them many resources (8 to 12 GB of RAM each), so it was natural to assume containers were, rightfully, just using too many resources. However, we found some extra information that made us think otherwise.

First, a container being OOM killed would occasionally be the only Spin instance on the node, and when we looked at its memory usage, it was often below the memory limit allotted to it.

Second, in parallel to this, another engineer investigating a Kafka performance issue identified a healthy running instance using far more resources than should have been possible.

The first issue would eventually be connected to a memory leak that the host node was experiencing, and through some trial and error we found that switching the host OS from Ubuntu to Google’s Container Optimized OS solved it. The second issue remained a mystery. With the rollout of COS, though, we saw a 100x reduction in OOM kills, which was sufficient for our goals, and we began to direct our attention to other priorities.

Second Encounter

Fast forward a few months to May 2022. We were experiencing better stability, which was a source of relief for the Spin team. Our ATC rotations were significantly less frantic, and the infrastructure team had the chance to roll out important improvements, including multi-cluster support and a whole new snapshotting process. Overall, things felt much better.

Slowly but surely over the course of a few weeks, we started to see increased reports of instance instability. We verified that the nodes weren’t leaking memory as before, so it wasn’t a regression. This is when several team members re-discovered the excess memory usage issue we’d seen before, but this time we decided to dive a little further.

We needed a clean environment to do the analysis, so we set up a new Spin instance on its own node. During our test, we monitored the Pod resource usage and the resource usage of the node it was running on. We used kubectl top pod and kubectl top node to do this. Before we performed any tests, we saw:

Next, we needed to simulate memory load inside of the container. We opted to use a tool called stress, allowing us to start a process that consumes a specified amount of memory that we could use to exercise the system.

We ran kubectl exec -it spin-muhc -- bash to land inside a shell in the container, and then stress -m 1 --vm-bytes 10G --vm-hang 0 to start the test.

Checking the resource usage again, we saw:

This was great, exactly what we expected. The 10GB used by our stress test showed up in our metrics. Also, when we checked the cgroup assigned to the process we saw it was correctly assigned to the Kubernetes Pod:

Where 24899 was the PID of the process started by stress. This looked great as well. Next, we performed the same test, but in the instance environment accessed via spin shell. Checking the resource usage, we saw:

Now this was odd. Here we saw that the memory created by stress wasn’t showing up under the Pod stats (still only 14Mi), but it was showing up for the node (33504Mi). Checking the usage from inside the container, we saw that it was indeed holding onto memory as expected:

However, when we checked the cgroup this time, we saw something new:

What the heck!? Why was the cgroup different? We double-checked that this was the correct hierarchy by using the systemd cgroup list tool from within the Spin instance:

So to summarize what we had seen: 

  1. When we run processes inside the container via kubectl exec, they’re correctly placed within the kubepods cgroup hierarchy. This is the hierarchy that contains the pod’s memory limits.
  2. When we run the same processes inside the container via spin shell, they’re placed within a cgroup hierarchy that doesn’t contain the limits. We verified this by checking the cgroup file directly:

The value above is close to the maximum value of a 64-bit integer (about 8.5 billion gigabytes of memory). Needless to say, our system has less than that, so this is effectively unlimited.

For practical purposes, this means any resource limitation we put on the Pod that runs Spin instances isn’t being honored. Spin instances can therefore use more memory than they’re allotted, which is concerning for a few reasons, but most importantly because we depend on those limits to keep instances from interfering with one another.
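A process’s placement can be checked by parsing /proc/&lt;pid&gt;/cgroup; here’s a small sketch (the sample paths below are fabricated, and real paths are cluster-specific):

```python
def in_kubepods_hierarchy(proc_cgroup_text: str) -> bool:
    """Parse /proc/<pid>/cgroup contents and report whether the process
    sits under the kubepods hierarchy, where the pod's limits apply."""
    for line in proc_cgroup_text.splitlines():
        # Each line looks like: hierarchy-ID:controller-list:cgroup-path
        _, _, path = line.split(":", 2)
        if "kubepods" in path:
            return True
    return False

# Fabricated examples of the two cases we observed:
via_kubectl_exec = "0::/kubepods/burstable/pod1234/abcd"
via_spin_shell = "0::/system.slice/stress.service"
print(in_kubepods_hierarchy(via_kubectl_exec))  # True
print(in_kubepods_hierarchy(via_spin_shell))    # False
```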

Isolating It

In a complex environment like Spin, it’s hard to account for everything that might be affecting the system. Sometimes it’s best to distill problems down to the essential details to properly isolate the issue. We were able to reproduce the cgroup leak in a few different ways: first on real Spin instances, using crictl or ctr with custom arguments, and second in a local Docker environment. Setting up an experiment like this also allowed for much quicker iteration when testing potential fixes.

From the experiments, we discovered differences in how the runtimes (containerd, Docker, and Podman) execute systemd containers. Podman, for instance, has a --systemd flag that enables and disables an integration with the host systemd. containerd has a similar flag, --runc-systemd-cgroup, that starts runc with the systemd cgroup manager. Docker, however, has no such integration (you can modify the cgroup manager via daemon.json, but not via the CLI like Podman and containerd), and we saw the same cgroup leakage. When comparing the cgroups assigned to the container processes between Docker and Podman, we saw the following:




Podman placed the systemd and stress processes in a cgroup unique to the container. This allowed Podman to properly delegate the resource limitations to both systemd and any process that systemd spawns. This was the behavior we were looking for!

The Fix

We now had an example of a systemd container properly being isolated from the host with Podman. The trouble was that our Spin production environments use Kubernetes, which uses containerd, not Podman, as the container runtime. So how could we leverage what we learned from Podman toward a solution?

While investigating differences between Podman and Docker with respect to systemd, we came across the crux of the fix. By default, Docker and containerd use a cgroup driver called cgroupfs to manage the allocation of resources, while Podman uses the systemd driver (this is specific to our host operating system, COS from Google). The systemd driver delegates responsibility for cgroup management to the host systemd, which then properly manages the delegate systemd running in the container.
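For reference, switching Docker to the systemd driver is a change to /etc/docker/daemon.json (this is the standard Docker setting, not Spin-specific configuration; the kubelet has a matching cgroupDriver field that must agree with the runtime):

```json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
```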

It’s recommended that nodes running systemd on the host use the systemd cgroup driver by default; however, COS from Google is still set to use cgroupfs. Checking the developer release notes, we see that version 101 of COS mentions switching the default cgroup driver to systemd, so the fix is coming!

What’s Next

Debugging this issue was an enlightening experience. If you had asked us before, “Is it possible for a container to use more resources than it’s assigned?”, we would have said no. But now that we understand more about how containers deliver the sandbox they provide, it’s become clear the answer should have been, “It depends.”

Ultimately, the escape came from us bind-mounting /sys/fs/cgroup read-only into the container. A subtle side effect of this: while that directory itself isn’t writable, all of its subdirectories are. But since systemd requires the mount to even boot up, we don’t have the option to remove it. There’s a lot of ongoing work by the container community to get systemd to exist peacefully within containers, but for now we’ll have to make do.


Special thanks to Daniel Walsh from Red Hat for writing so much on the topic, and to Josh Heinrichs from the Spin team for investigating the issue and discovering the fix.

Additional Information

Chris is an infrastructure engineer with a focus on developer platforms. He’s also a member of the ServiceMeshCon program committee and a @Linkerd ambassador.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.


Shopify and Open Source: A Mutually Beneficial Relationship

Shopify and Rails have grown up together. Both were in their infancy in 2004, and our CEO (Tobi) was one of the first contributors and a member of Rails Core. Shopify was built on top of Rails, and our engineering culture is rooted in the Rails Doctrine, from developer happiness to the omakase menu, sharp knives, and majestic monoliths. We embody the doctrine pillars. 

Shopify's success is due, in part, to Ruby and Rails. We feel obligated to pay that success back to the community as best we can. But our commitment and investment are about more than just paying off that debt; we have a more meaningful and mutually beneficial goal.

One Hundred Year Mission

At Shopify, we often talk about aspiring to be a 100-year company: still around in 2122! That feels like an ambitious dream, but with that goal in mind we make decisions and build our code so that it scales as we grow. If we pull that off, will Ruby and Rails still be our tech stack? It's hard to answer, but it's part of my job to think about that tech stack over the next 100 years.

Ruby and Rails as 100-year tools? What does that even mean?

To get to 100 years, Rails has to be more than an easy way to get started on a new project. It's about cost-effective performance in production, well-formed opinions on the application architecture, easy upgrades, great editors, avoiding antipatterns, and choosing when you want the benefits of typing. 

To get to 100 years, Ruby and Rails have to merit being the tool of choice, every day, for large teams and well-aged projects for a hundred years. They have to be the tool of choice for thousands of developers, across millions of lines of code, handling billions of web requests. That's the vision. That's Rails at scale.

And that scale is where Shopify is investing.

Why Companies Should Invest In Open Source

Open source is the heart and soul of Rails: I’d say that Rails would be nowhere near what it is today if not for the open source community.

Rafael França, Shopify Principal Engineer and Rails Core Team Member

We invest in open source to build the most stable, resilient, performant version on which to grow our applications. How much better could it be if more people were contributing? As a community, we can do more. Ruby and Rails can only continue to be a choice for companies if we're actively investing in their development, and to do that, we need more companies involved in contributing.

It Improves Engineering Skills

Practice makes progress! Building open source software with cross-functional teams helps build better communication skills and offers opportunities to navigate feedback and criticism constructively. It also enables you to flex your debugging muscles and develop deep expertise in how the framework functions, which helps you build better, more stable applications for your company.

It’s Essential to Application Health & Longevity

Contributing to open source helps ensure that Rails benefits your application and the company in the long term. We contribute because we care about the changes and how they affect our applications. Investing upfront in the foundation is proactive, whereas rewrites and monkey patches are reactive and lead to brittle code that's hard to maintain and upgrade.

At our scale, it's common to find issues with, or opportunities to enhance, the software we use. Why keep those improvements private? Because we build on open source software, it makes sense to contribute to those projects to ensure that they will be as great as possible for as long as possible. If we contribute to the community, it increases our influence on the software that our success is built on and helps improve our chances of becoming a 100-year company. This is why we make contributions to Ruby and Rails, and other open source projects. The commitment and investment are significant, but so are the benefits.

How We're Investing in Ruby and Rails

Shopify is built on a foundation of open source software, and we want to ensure that that foundation continues to thrive for years to come and that it continues to scale to meet our requirements. That foundation can’t succeed without investment and contribution from developers and companies. We don’t believe that open source development is “someone else’s problem”. We are committed to Ruby and Rails projects because the investment helps us future-proof our foundation and, therefore, Shopify. 

We contribute to strategic projects and invest in initiatives that impact developer experience, performance, and security—not just for Shopify but for the greater community. Here are some projects we’re investing in:

Improving Developer Tooling 

  • We’ve open-sourced projects like toxiproxy, bootsnap, packwerk, tapioca, paquito, and maintenance_tasks: niche tools we found we needed. If we need them, other developers likely need them as well.
  • We helped add Rails support to Sorbet's gradual typing to make typing better for everyone.
  • We're working to make Ruby support in VS Code best-in-class, with pre-configured extensions and powerful features like refactorings.
  • We're working on automating upgrades between Ruby and Rails versions to reduce friction for developers.

Increasing Performance

Enhancing Security

  • We're actively contributing to bundler and rubygems to make Ruby's supply chain best-in-class.
  • We're partnering with Ruby Central to ensure long-term success and security through strategic investments in engineering, security-related projects, critical tools and libraries, and improving the cycle time for contributors.

Meet Shopify Contributors

The biggest investment you can make is to be directly involved in the future of the tools that your company relies on. We believe we are all responsible for the sustainability and quality of open source. Shopify engineers are encouraged to contribute to open source projects where possible. The commitment varies. Some engineers make occasional contributions, some are part-time maintainers of important open source libraries that we depend on, and some are full-time contributors to critical open source projects.

Meet some of the Shopify engineers contributing to open source. Some of those faces are probably familiar because we have some well-known experts on the team. But some you might not know…yet. We're growing the next generation of Ruby and Rails experts to build for the future.

Mike is a NYC-based engineering leader who's worked in a variety of domains, including energy management systems, bond pricing, high-performance computing, agile consulting, and cloud computing platforms. He is an active member of the Ruby open-source community, where as a maintainer of a few popular libraries he occasionally still gets to write software. Mike has spent the past decade growing inclusive engineering organizations and delivering amazing software products for Pivotal, VMware, and Shopify.


The Story Behind Shopify’s Isospin Tooling

You may have read that Shopify has built an in-house cloud development platform named Spin. In that post, we covered the history of the platform and how it powers our everyday work. In this post, we’ll take a deeper dive into one specific aspect of Spin: Isospin, Shopify’s systemd-based tooling that forms the core of how we run applications within Spin.

The initial implementation of Spin used the time-honored POSS (Pile of Shell Scripts) design pattern. As we moved to a model where all of our applications ran in a single Linux VM, we quickly outgrew our tooling, not to mention the added complexity of managing multiple applications within a single machine. Decisions such as which dependency services to run, in what part of the boot process, and how many copies to run became much more difficult as we ran many applications together within the same instance. Specifically, we needed a way to:

  • split up an application into its component parts
  • specify the dependencies between those parts
  • have those jobs be scheduled at the appropriate times
  • isolate services and processes from each other.

At a certain point, stepping back, an obvious answer began to emerge. The needs we were describing weren’t merely solvable; they were already solved, by something we were already using. We were describing services, the same as any other services run by the OS. There were already tools to solve this built right into the OS. Why not leverage them?

A Lightning Tour of systemd

systemd’s service management works by dividing the system into a graph of units representing individual services or jobs. Each unit can specify its dependencies on other units in granular detail, allowing systemd to determine an order in which to launch services to bring the system up, and to reason about cascading failure states. In addition to units representing actual services, it supports targets, which represent an abstract grouping of one or more units. Targets can have dependencies of their own and be depended on by other units, but perform no actual work. By specifying targets representing phases of the boot process and a top-level target representing the desired state of the system, systemd can quickly and comprehensively prepare services to run.

systemd has several features which enable dynamic generation of units. Since we were injecting multiple apps into a system at runtime, with varying dependencies and processes, we made heavy use of these features to enable us to create complex systemd service graphs on the fly.

The first of these is template unit files. Ordinarily, systemd namespaces units via their names; any service named foo will satisfy a dependency on the service named foo, and only one instance of a unit with a name can be running at once. This was obviously not ideal for us, since we have many services that we’re running per-application. Template unit files expand this distinction a bit by allowing a service to take a parameter that becomes part of its namespace. For example, a service named foo@.service could take the argument bar, running as foo@bar. This allows multiple copies of the same service to run simultaneously. The parameter is also available within the unit as a variable, allowing us to namespace runtime directories and other values with the same parameter.

Template units were key to us since not only do they allow us to share service definitions for applications themselves, they allow us to run multiple copies of dependency services. In order to maintain full isolation between applications—and to simulate the separately-networked services they would be talking to in production—neighbor apps within a single Isospin VM don’t use the same installation of core services such as MySQL or Elasticsearch. Instead, we run one copy of these services for each app that needs it. Template units simplified this process greatly and via a single service definition, we simply run as many copies of each as we need.
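As a rough illustration (this is not Shopify's actual unit file), a template unit for a per-app MySQL instance might look something like the following, where %i expands to the instance parameter (for example, the app name in spin-mysql@shop):

```ini
# /etc/systemd/system/spin-mysql@.service (hypothetical sketch)
[Unit]
Description=MySQL instance for app %i
After=network.target

[Service]
# %i namespaces the data and runtime directories per application,
# so spin-mysql@shop and spin-mysql@admin run side by side.
ExecStart=/usr/sbin/mysqld --datadir=/var/lib/spin/%i/mysql
RuntimeDirectory=spin-mysql/%i
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

A dependency on spin-mysql@shop.service then pulls up exactly one MySQL copy for the shop app, independent of any neighbor's.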

We also made use of generators, a systemd feature that allows dynamically creating units at runtime. This was useful for us since the dynamic state of our system meant that a fixed service order wasn’t really feasible. There were two primary features of Isospin’s setup that complicated things:

  1. Which app or apps to run in the system isn’t fixed, but rather is assigned when we boot a system. Thus, via information assigned at the time the system is booted, we need to choose which top-level services to enable.

  2. While many of the Spin-specific services are run for every app, dependencies on other services are dynamic. Not every app requires MySQL or Elasticsearch or so on. We needed a way to specify these systemd-level dependencies dynamically.

Generators provided a simple way to handle this. Early in the bootup process, we run a generator that creates a target named spin-app for each app to be run in the system. That target contains all of the top-level dependencies an app needs to run, and is then assigned as a dependency of the “system is running” target. Despite sounding complex, this requires no more than a 28-line bash script and a simple template file for the service. Likewise, we’re able to assign the appropriate dependency services as requirements of this spin-app target via another generator that runs later in the process.
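The 28-line bash script itself isn't reproduced in the post, but the equivalent logic can be sketched roughly as follows (in Python for clarity; the file paths and unit names here are illustrative, not Shopify's actual ones). Generators add dependencies by symlinking unit files into a `<target>.wants` directory inside the generator output directory:

```python
import os

def generate_spin_units(normal_dir: str, apps_file: str) -> list:
    """Create one spin-app@<name>.service dependency per configured app.

    normal_dir is the generator output directory systemd passes in;
    apps_file lists one application name per line.
    """
    wants_dir = os.path.join(normal_dir, "spin.target.wants")
    os.makedirs(wants_dir, exist_ok=True)
    created = []
    with open(apps_file) as f:
        for line in f:
            app = line.strip()
            if not app:
                continue
            link = os.path.join(wants_dir, f"spin-app@{app}.service")
            # Symlinking into a .wants directory is how generators
            # attach dependencies to a target at boot time.
            if not os.path.islink(link):
                os.symlink("/usr/lib/systemd/system/spin-app@.service", link)
            created.append(link)
    return created
```

At boot, systemd reads the generated symlinks and brings up one spin-app@ instance per listed application.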

Booting Up an Example

To help understand how this works in action, let’s walk through an example of the Isospin boot process.

We start by creating a target that we use to represent that Spin has finished booting. We’ll use this target later to determine whether or not the system has successfully finished starting up. We then run a generator named apps that checks the configuration to see which apps we’ve specified for the system. It then generates new dependencies on the spin-app@ target, requesting one instance per application and passing in the name of the application as its parameter.

spin-app@ depends on several of the core services that represent a fully available Spin application, including several more generators. Via those dependencies, we run the spin-svcs@ generator to determine which system-level service dependencies to inject, such as MySQL or Elasticsearch. We also run the spin-procs@ generator, which determines which command or commands run the application itself and generates one service per command.

Finally, we bring the app up via the spin-init@ service and its dependencies. spin-init@ represents the final state of bootstrapping necessary for the application to be ready to run, and via its recursive dependencies systemd builds out the chain of processes necessary to clone an application’s source, run bootstrap processes, and then run any necessary finalizing tasks before it’s ready to run.

Additional Tools (and Problems)

Although the previously described tooling got us very far, a few remaining problems required some extra tooling to fix.

A problem we encountered under this new model was port collision between services. In the past, our apps could assume they were the only app on the machine, so they could claim a common port for themselves without conflict. Although systemd gave us a lot of process isolation for free, this was a hole we’d dug for ourselves, and one we’d need to get out of by ourselves too.

The solution we settled on was simple but effective, and it leveraged a few systemd features to simplify the process. We reasoned that port collision is only a problem because port selection was in the user’s hands; we could solve it by making port assignment the OS’s responsibility. We created a service that handles port assignment programmatically via hashing: by taking the service’s name into account, we produce a semi-stable automated port assignment that avoids collision with any other ports we’ve assigned on the system. This service can be used as a dependency of another service that needs to bind to a port; it writes the generated port to an environment file that systemd uses to inject environment variables into the dependent service. As long as we specify this dependency, we can ensure that the dependent service receives a PORT variable that it’s meant to respect and bind to.
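The post doesn't show the hashing scheme itself, but the core idea can be sketched in a few lines. The function name and port range below are assumptions for illustration, not Shopify's actual values:

```python
import hashlib

def assign_port(service_name: str, base: int = 20000, span: int = 20000) -> int:
    """Derive a semi-stable port from a service name.

    The same name always maps to the same port, and distinct names
    rarely collide within the [base, base + span) range.
    """
    # Hash the name and fold the first four bytes into the port range.
    digest = hashlib.sha256(service_name.encode("utf-8")).digest()
    return base + int.from_bytes(digest[:4], "big") % span
```

A real implementation would also need to detect and resolve the occasional collision, for example by probing whether the port is taken and rehashing.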

Another feature that came in handy is systemd’s concept of service readiness. Many process runners, including the Foreman-based solutions we’d used in the past, have a binary concept of service readiness (either a process is running, or it isn’t), and if a process exits unexpectedly it’s considered failed.

systemd has the same model by default, but it also supports something more sophisticated: it allows configuring a notify socket through which an application can explicitly communicate its readiness. systemd exposes a Unix datagram socket to the service it’s running via the NOTIFY_SOCKET environment variable. When the underlying app has finished starting up and is ready, it communicates that status by writing a message to the socket. This granularity helps avoid some of the rare but annoying gotchas of a simpler model of service readiness. It ensures that the service is only considered ready to accept connections when it's actually ready, avoiding a scenario in which external services try sending messages during the startup window. It also avoids a situation where the process remains running but the underlying service has failed during startup.

Some of the external services we depend on use this, such as MySQL, but we also wrote our own tooling to incorporate it. Our notify-port script is a thin wrapper around web applications that monitors whether the service we’re wrapping has begun accepting HTTP connections over the port Isospin has assigned to it. By polling the port and notifying systemd when it comes up, we’ve been able to catch many real world bugs where services were waiting on the wrong port, and situations in which a server failed on startup while leaving the process alive.
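Shopify's notify-port script isn't reproduced in the post, but its core idea (poll the assigned port, then tell systemd we're ready) can be sketched as follows. The function names and defaults are illustrative:

```python
import os
import socket
import time

def sd_notify(message: bytes = b"READY=1") -> bool:
    """Report readiness to systemd over the NOTIFY_SOCKET datagram socket."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not running under a Type=notify service
    if path.startswith("@"):
        # A leading '@' denotes Linux's abstract socket namespace.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.send(message)
    return True

def wait_for_port(port: int, host: str = "127.0.0.1",
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(interval)
    return False

def notify_when_up(port: int) -> bool:
    """Tell systemd we're ready only once the app actually accepts connections."""
    return wait_for_port(port) and sd_notify()
```

Wrapping the web process this way is what catches both failure modes mentioned above: a server listening on the wrong port never becomes ready, and a process that stays alive after a failed startup never reports READY=1.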

Isospin on Top

Although we started out with some relatively simple goals, the more we worked with systemd, the more we found ourselves able to leverage its tools to our advantage. By building Isospin on top of systemd, we saved time by reusing pre-existing structures that suited our needs and took advantage of its sophisticated tooling for expressing service interdependency and service health.

Going forward, we plan to continue expanding on Isospin to express more complex service relationships. For example, we’re investigating the use of systemd service dependencies to allow teams to express that certain parts of their application rely on another team’s application being available.

Misty De Méo is an engineer who specializes in developer tooling. She’s been a maintainer of the Homebrew package manager since 2011.


How to Build Trust as a New Manager in a Fully Remote Team

I had been in the same engineering org for seven years before the pandemic hit. We were a highly collaborative, co-located company, and the team was used to brainstorming and working on physical whiteboards and shared workspaces. When the pandemic did hit, we moved pretty seamlessly from working together in our office to working from our homes. We didn’t pay much attention to designing a Digital First culture, and neither did we alter our ways of working dramatically.

It was only when I joined Shopify last September that I began to realize that working remotely in a company where you have already established trust is very different from starting in a fully remote space and building trust from the ground up. 

What Is Different About Starting Remotely?

So, what changes? The one word that comes to mind is intentionality. I would define intentionality as “the act of thoughtfully designing interactions or processes.” A lot of things that happen seamlessly and organically in a real-life setting take more intentionality in a remote setting. If you deconstruct the process of building trust, you’ll find that in a physical setting trust is built in active ways (the words you speak, your actions, and your expertise), but also in passive ways (your body language, demeanor, and casual water-cooler talk). In a remote setting, it’s much more difficult to observe, and also to build casual, non-transactional relationships with people, unless you’re intentional about it.

Also, since you’re represented a lot more through your active voice, it’s important to work on setting up a new way of working and gaining mastery over the set of tools and skills that will help build trust and create success in your new environment.

The 90-Day Plan

The 90-Day Plan is named after the famous book The First 90 Days by Michael D. Watkins. Essentially, it breaks down your onboarding journey into three steps:

  1. First 30 days: focus on your environment
  2. First 60 days: focus on your team
  3. 90 days and beyond: focus on yourself.

First 30 Days: Focus on Your Environment

Take the time out to think about what kind of workplace you want to create and also to reach out and understand the tone of the wider organization that you are part of.

Study the Building 

When you start a new job in a physical location, it’s common to study the office and understand the layout of not only the building, but also the company itself. When beginning work remotely, I suggest you start with a metaphorical study of the building. Try to understand the wider context of the organization and the people in it. You can do this with a mix of pairing sessions, one-on-ones, and peer group sessions. These processes help you gain technical and organizational context and also build relationships with peers.

Set Up the Right Tools

In an office, many details of workplace setup are abstracted away from you. In a fully digital environment, you need to pay attention to setting your workplace up for success. There are plenty of materials available on how to set up your home office. Ensure that you take the time to set up your remote tools to your taste.

Build Relationships 

If you’re remote, it’s easy to be transactional with people outside of your immediate organization. However, it’s much more fun and rewarding to take the time to build relationships with people from different backgrounds across the company. It gives you a wider context of what the company is doing and the different challenges and opportunities.

First 60 Days: Focus on Your Team

Use asynchronous communication for productivity and synchronous for connection.

Establish Connection and Trust

When you start leading a remote team, the first thing to do is establish connection and trust. You do this by spending a lot of your time in the initial days meeting your team members and understanding their aspirations and expectations. You should also, if possible, attempt to meet the team once in real life within a few months of starting. 

Meet in Real Life

Meeting in real life will help you form deep human relationships with your team members and understand them beyond the limited scope of workplace transactions. Once you’ve done this, ensure that you create a mix of synchronous and asynchronous processes within your team. Examples of asynchronous processes are automated dailies, code reviews, receiving feedback, and collaboration on technical design documents. We use synchronous meetings for team retros, coffee sessions, demos, and planning sessions. We try to maximize async productivity and to be intentional about the times that we do come together as a team.

Establishing Psychological Safety

The important thing about leading a team remotely is to firmly establish a strong culture of psychological safety. Psychological safety in the workplace is important, not only for teams to feel engaged, but also for members to thrive. While it might be trickier to establish psychological safety remotely, it’s definitely possible. Some of the ways to do it:

  1. Default to open communication wherever possible.
  2. Engage people to speak about issues openly during retros and all-hands meetings.
  3. Be transparent about things that have not worked well for you. Setting this example will help people open themselves up to be vulnerable with their teams.

First 90 Days: Focus on Yourself

How do you manage and moderate your own emotions as you find your way in a new organization with this new way of working?

FOMO Is Real

Starting in a new place is nerve-wracking. Starting while fully remote can be a lonely exercise. Working in a global company like Shopify means getting used to the fact that work is always happening in some part of the globe. It’s easy to get overwhelmed and be “always on.” While FOMO can be very real, be aware of all the new information you’re ingesting and take the time to reflect upon it.

Design Your Workday

Remote work means you’re no longer chained to a nine-to-five routine. Reflect on the possibilities this offers and think about how you want to design your workday. Maybe you want meeting-free time to walk the dog, hit the gym, or take a power nap. Ensure you structure the day in a way that suits your life and plan your agenda accordingly.

Try New Things

It’s pretty intense in the first few months as you try ways to establish trust and build a strong team together. Not everything you try will take and not everything will work. The important thing is to be clear with what you’re setting out to achieve, collect feedback on what works and doesn’t, learn from the experience, and move forward.

Being able to work in a remote work environment is both rewarding and fun. It’s definitely a new superpower that, if used well, leads to rich and absorbing experiences. The first 90 days are just the beginning of this journey. Sit back, tighten your seatbelt and get ready for a joyride of learning and growth.

Sadhana is an engineer, book nerd, constant learner, and enabler of people. She has worked in the industry for more than 20 years in various roles and tech stacks, and is an agile enthusiast. You can connect with Sadhana on LinkedIn and Medium.


Introducing ShopifyQL: Our New Commerce Data Querying Language

At Shopify, we recognize the positive impact data-informed decisions have on the growth of a business. But we also recognize that data exploration is gated off from those without a data science or coding background. To make it easier for our merchants to inform their decisions with data, we built an accessible, commerce-focused querying language. We call it ShopifyQL. ShopifyQL enables Shopify Plus merchants to explore their data with powerful features like easy-to-learn syntax, one-step data visualization, built-in period comparisons, and commerce-specific date functions.

I’ll discuss how ShopifyQL makes data exploration more accessible, then dive into the commerce-specific features we built into the language, and walk you through some query examples.

Why We Built ShopifyQL

As data scientists, engineers, and developers, we know that data is a key factor in business decisions across all industries. This is especially true for businesses that have achieved product market fit, where optimization decisions are more frequent. Now, commerce is a broad industry and the application of data is deeply personal to the context of an individual business, which is why we know it’s important that our merchants be able to explore their data in an accessible way.

Standard dashboards offer a good solution for monitoring key metrics, while interactive reports with drill-down options allow deeper dives into understanding how those key metrics move. However, reports and dashboards help merchants understand what happened, but not why it happened. Often, merchants require custom data exploration to understand the why of a problem, or to investigate how different parts of the business were impacted by a set of decisions. For this, they turn to their data teams (if they have them) and the underlying data.

Historically, our Shopify Plus merchants with data teams have employed a centralized approach in which data teams support multiple teams across the business. This strategy helps them maximize their data capability, but it means constantly prioritizing among data stakeholders in the business. Unfortunately, it leaves teams in constant competition for their data needs. Financial deep dives get prioritized over operational decision support, leaving marketing, merchandising, fulfillment, inventory, and operations to fend for themselves. They’re then forced either to make decisions with the standard reports and dashboards available to them or to do their own custom data exploration (often in spreadsheets). Most often they end up in the worst-case scenario: relying on their gut and leaving data out of the decision-making process.

Going past the reports and dashboards into the underlying datasets that drive them is guarded by complex data engineering concepts and languages like SQL. The basics of traditional data querying languages are easy to learn. However, applying querying languages to datasets requires experience with, and knowledge of, the entire data lifecycle (from data capture to data modeling). In some cases, simple commerce-specific data explorations like year-over-year sales require a more complicated query than the basic pattern of selecting data from some table with some filter. This isn’t a core competency of our average merchant. They get shut out from the data exploration process and the ability to inform their decisions with insights gleaned from custom data explorations. That’s why we built ShopifyQL.

A Data Querying Language Built for Commerce

We understand that merchants know their business the best and want to put the power of their data into their hands. Data-informed decision making is at the heart of every successful business, and with ShopifyQL we’re empowering Shopify Plus merchants to gain insights at every level of data analysis. 

With our new data querying language, ShopifyQL, Shopify Plus merchants can easily query their online store data. ShopifyQL makes commerce data exploration accessible to non-technical users by simplifying traditional aspects of data querying like:

  • Building visualizations directly from the query, without having to manipulate data with additional tools.
  • Creating year-over-year analysis with one simple statement, instead of writing complicated SQL joins.
  • Referencing known commerce date ranges (for example, Black Friday), without having to remember the exact dates.
  • Accessing data specifically modeled for commerce exploration purposes, without having to connect the dots across different data sources. 

Intuitive Syntax That Makes Data Exploration Easy

The ShopifyQL syntax is designed to simplify the traditional complexities of data querying languages like SQL. The general syntax tree follows a familiar querying structure:

FROM {table_name}
SHOW|VISUALIZE {column1, column2,...} 
TYPE {visualization_type}
AS {alias1,alias2,...}
BY {dimension|date}
WHERE {condition}
SINCE {date_offset}
UNTIL {date_offset}
COMPARE TO {date_offset}
LIMIT {number}

We kept some of the fundamentals of the traditional querying concepts because we believe these are the bedrock of any querying language:

  • FROM: choose the data table you want to query
  • SELECT: we changed the wording to SHOW because we believe that data needs to be seen to be understood. The behavior remains the same: choose the fields you want to include in your query
  • GROUP BY: shortened to BY. Choose how you want to aggregate your metrics
  • WHERE: filter the query results
  • ORDER BY: customize the sorting of the query results
  • LIMIT: specify the number of rows returned by the query.
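Put together, these fundamentals read almost like plain English. As an illustrative sketch (the table and column names here are hypothetical, not a guaranteed schema):

```
FROM sales
SHOW total_sales
BY month
WHERE product_category = 'Shoes'
SINCE -12m
LIMIT 100
```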

On top of these foundations, we wanted to bring a commerce-centric view to querying data. Here’s what we are making available via Shopify today.

1. Start with the context of the dataset before selecting dimensions or metrics

We moved FROM to precede SHOW because it’s more intuitive for users to select the dataset they care about first and then the fields. When you want to know conversion rates, it’s natural to think about the product first and the conversion rates second, so we swapped the order of FROM and SHOW relative to traditional querying languages.

2. Visualize the results directly from the query

Charts are one of the most effective ways of exploring data, and VISUALIZE aims to simplify this process. Most query languages and querying interfaces return data in tabular format and place the burden of visualizing that data on the end user. This means using multiple tools, manual steps, and copy-pasting. The VISUALIZE keyword allows Shopify Plus merchants to display their data in a chart or graph directly from a query. For example, if you’re looking to identify trends in multiple sales metrics for a particular product category:

A screenshot showing the ShopifyQL code at the top of the screen and a line chart that uses VISUALIZE to chart monthly total and gross sales
Using VISUALIZE to chart monthly total and gross sales

We’ve made the querying process simpler by introducing smart defaults that allow you to get the same output with fewer lines of code. The query from above can also be written as:

FROM sales
VISUALIZE total_sales, gross_sales
BY month
WHERE product_category = 'Shoes'
SINCE -13m

The relationship between the query and its output remains explicit, but the user gets to the result much faster.

The following language features are currently being worked on, and will be available later this year:

3. Period comparisons are native to the ShopifyQL experience

Whether it’s year-over-year, month-over-month, or a custom date range, period comparison analyses are a staple of commerce analytics. With traditional querying languages, you either have to model a dataset to contain these comparisons as their own entries or write more complex queries that include window functions, common table expressions, or self joins. We’ve simplified that to a single statement. The COMPARE TO keyword allows ShopifyQL users to effortlessly perform period-over-period analysis. For instance, comparing this week’s sales data to last week’s:

A screenshot showing the ShopifyQL code at the top of the screen and a line chart that uses VISUALIZE for comparing total sales between 2 time periods with COMPARE TO
Comparing total sales between 2 time periods with COMPARE TO

This powerful feature makes period-over-period exploration simpler and faster; no need to learn joins or window functions. Future development will enable multiple comparison periods for added functionality.
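As an illustrative sketch (the table, column, and offset values here are hypothetical), a week-over-week comparison could be expressed as compactly as:

```
FROM sales
VISUALIZE total_sales
BY day
SINCE -7d
COMPARE TO -14d
```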

4. Commerce specific date ranges simplify time period filtering

Commerce-specific date ranges (for example, Black Friday Cyber Monday, Christmas Holidays, or Easter) typically involve a manual lookup or a join to a holiday dataset. With ShopifyQL, we take care of the manual aspects of filtering for these date ranges and let the user focus on the analysis.

The DURING statement, in conjunction with Shopify provided date ranges, allows ShopifyQL users to filter their query results by commerce-specific date ranges. For example, finding out what the top five selling products were during BFCM in 2021 versus 2020:

A screenshot showing the ShopifyQL code at the top of the screen and a table that shows Product Title, Total Sales BFCM 2021, and Total Sales BFCM 2019
Using DURING to simplify querying BFCM date ranges
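A rough sketch of what such a query might look like (the exact names of the Shopify-provided date ranges and the keyword placement are illustrative, not confirmed syntax):

```
FROM sales
SHOW total_sales
BY product_title
DURING bfcm2021
COMPARE TO bfcm2020
LIMIT 5
```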

Future development will allow users to save their own date ranges unique to their business, giving them even more flexibility when exploring data for specific time periods.

Check out the full list of current ShopifyQL features in our language docs.

Data Models That Simplify Commerce-Specific Analysis and Explorations

ShopifyQL allows us to access data models that address commerce-specific use cases and abstract the complexities of data transformation. Traditionally, businesses trade off SQL query simplicity for functionality, which limits users’ ability to perform deep dives and explorations. Since they can’t customize the functionality of SQL, their only lever is data modeling. For example, if you want to make data exploration more accessible to business users via simple SQL, you have to either create one flat table that aggregates across all data sources, or a number of use case specific tables. While this approach is useful in answering simple business questions, users looking to dig deeper would have to write more complex queries to either join across multiple tables, leverage window functions and common table expressions, or use the raw data and SQL to create their own models. 

Alongside ShopifyQL we’re building exploration data models that are able to answer questions across the entire spectrum of commerce: products, orders, and customers. Each model focuses on the necessary dimensions and metrics to enable data exploration associated with that domain. For example, our product exploration dataset allows users to explore all aspects of product sales such as conversion, returns, inventory, etc. The following characteristics allow us to keep these data model designs simple while maximizing the functionality of ShopifyQL:

  • Single flat tables aggregated to the lowest domain dimension grain and time attribute. There’s no need for complicated joins, common table expressions, or window functions. Each table contains the necessary metrics that describe that domain’s interaction across the entire business, regardless of where the data is coming from (for example, product pageviews and inventory are product concerns from different business processes).
  • All metrics are fully additive across all dimensions. Users are able to leverage the ShopifyQL aggregation functions without worrying about which dimensions are conformed. This also makes table schemas relatable to spreadsheets, and easy to understand for business users with no experience in data modeling practices.
  • Datasets support overlapping use cases. Users can calculate key metrics like total sales in multiple exploration datasets, whether the focus is on products, orders, or customers. This allows users to reconcile their work and gives them confidence in the queries they write.

Without the leverage of creating our own querying language, the characteristics above would require complex queries which would limit data exploration and analysis.

ShopifyQL Is a Foundational Piece of Our Platform

We built ShopifyQL for our Shopify Plus merchants, third-party developer partners, and ourselves as a way to serve merchant-facing commerce analytics. 

Merchants can access ShopifyQL via our new first-party app, ShopifyQL Notebooks

We used the ShopifyQL APIs to build an app that allows our Shopify Plus merchants to write ShopifyQL queries inside a traditional notebooks experience. The notebooks app gives users the ultimate freedom of exploring their data, performing deep dives, and creating comprehensive data stories. 

ShopifyQL APIs enable our partners to easily develop analytics apps

The Shopify platform allows third-party developers to build apps that enable merchants to fully customize their Shopify experience. We’ve built GraphQL endpoints for access to ShopifyQL and the underlying datasets. Developers can leverage these APIs to submit ShopifyQL queries and return the resulting data in the API response. This allows our developer partners to save time and resources by querying modeled data. For more information about our GraphQL API, check out our API documentation.

ShopifyQL will power all analytical experiences on the Shopify platform

We believe ShopifyQL can address all commerce analytics use cases. Our internal teams are going to leverage ShopifyQL to power the analytical experiences we create in the Shopify Admin—the online backend where merchants manage their stores. This helps us standardize our merchant-facing analytics interfaces across the business. Since we’re also the users of the language, we’re acutely aware of its gaps, and can make changes more quickly.

Looking ahead

We’re planning new language features designed to make querying with ShopifyQL even simpler and more powerful:

  • More visualizations: Line and bar charts are great, but we want to provide more visualization options that help users discover different insights. New visualizations on the roadmap include dual axis charts, funnels, annotations, scatter plots, and donut charts.
  • Pivoting: Pivoting data with a traditional SQL query is a complicated endeavor. We will simplify this with the capability to break down a metric by dimensional attributes in a columnar fashion. This will allow for charting trends of dimensional attributes across time for specific metrics with one simple query.
  • Aggregate conditions: Akin to a HAVING statement in SQL, we are building the capability for users to filter their queries on an aggregate condition. Unlike SQL, we’re going to allow for this pattern in the WHERE clause, removing the need for additional language syntax and keyword ordering complexity.

As we continue to evolve ShopifyQL, our focus will remain on making commerce analytics more accessible to those looking to inform their decisions with data. We’ll continue to empower our developer partners to build comprehensive analytics apps, enable our merchants to make the most out of their data, and support our internal teams with powering their merchant-facing analytical use cases.

Ranko is a product manager working on ShopifyQL and data products at Shopify. He's passionate about making data informed decisions more accessible to merchants.

Are you passionate about solving data problems and eager to learn more? We’re always hiring! Reach out to us or apply on our careers page.

Making Open Source Safer for Everyone with Shopify’s Bug Bounty Program


Zack Deveau, Senior Application Security Engineer at Shopify, shares the details behind a recent contribution to the Rails library, inspired by a bug bounty report we received. He'll go over the report and its root cause, how we fixed it in our system, and how we took it a step further to make Rails more secure by updating the default serializer for a few classes to use safe defaults.

How We Built Shopify Party


Shopify Party is a browser-based internal tool that we built to make our virtual hangouts more fun. With Shopify’s move to remote, we wanted to explore how to give people a break from video fatigue and create a new space designed for social interaction. Here's how we built it.

8 Data Conferences Shopify Data Thinks You Should Attend


Our mission at Shopify is to make commerce better for everyone. Doing this in the long term requires constant learning and development – which happen to be core values here at Shopify.

Learning and development aren’t exclusively internal endeavors; they also depend on broadening your horizons and gaining insight from what others are doing in your field. One of our favorite formats for learning is conferences.

Conferences are an excellent way to hear from peers about the latest applications, techniques, and use cases in data science. They’re also great for networking and getting involved in the larger data community.

We asked our data scientists and engineers to curate a list of the top upcoming data conferences for 2022. Whether you’re looking for a virtual conference or in-person learning, we’ve got something that works for everyone.

Hybrid Events

Data + AI Summit 2022

When: June 27-30
Where: Virtual or in-person (San Francisco)

The Data + AI Summit 2022 is a global event that provides access to some of the top experts in the data industry through keynotes, technical sessions, hands-on training, and networking. This four-day event explores various topics and technologies ranging from business intelligence, data analytics, and machine learning, to working with Presto, Looker, and Kedro. There’s great content on how to leverage open source technology (like Spark) to develop practical ways of dealing with large volumes of data. I definitely walked away with newfound knowledge.

Ehsan K. Asl, Senior Data Engineer

Transform 2022

When: July 19-28
Where: Virtual or in-person (San Francisco)

Transform 2022 offers three concentrated events over an action-packed two weeks. The first event—The Data & AI Executive Summit—provides a leadership lens on real-world experiences and successes in applying data and AI. The following two events—The Data Week and The AI & Edge Week—dive deep into the most relevant topics in data science across industry tracks like retail, finance, and healthcare. Transform’s approach to showcasing various industry use cases is one of the reasons this is such a great conference. I find hearing how other industries are practically applying AI can help you find unique solutions to challenges in your own industry.

Ella Hilal, VP of Data Science

RecSys 2022

When: September 18-23
Where: Virtual or in-person (Seattle)

RecSys is a conference dedicated to sharing the latest developments, techniques, and use cases in recommender systems. The conference has both a research track and industry track, allowing for different types of talks and perspectives from the field. The industry track is particularly interesting, since you get to hear about real-world recommender use cases and challenges from leading companies. Expect talks and workshops to be centered around applications of recommender systems in various settings (fashion, ecommerce, media, etc), reinforcement learning, evaluation and metrics, and bias and fairness.

Chen Karako, Data Scientist Lead


ODSC West 2022

When: November 1-3
Where: Virtual or in-person (San Francisco)

ODSC West is a great opportunity to connect with the larger data science community and contribute your ideas to the open source ecosystem. Attend to hear keynotes on topics like machine learning, MLOps, natural language processing, big data analytics, and new frontiers in research. On top of in-depth technical talks, there’s a pre-conference bootcamp on programming, mathematics or statistics, a career expo, and opportunities to connect with experts in the industry.

Ryan Harter, Senior Staff Data Scientist

In-person Events

PyData London 2022

When: June 17-19
Where: London

PyData is a community of data scientists and data engineers who use and develop a variety of open source data tools. The organization has a number of events around the world, but PyData London is one of its larger events. You can expect the first day to be tutorials providing walkthroughs on methods like data validation and training object detection with small datasets. The remaining two days are filled with talks on practical topics like solving real-world business problems with Bayesian modeling. Catch Shopify at this year’s PyData London event.

Micayla Wood, Data Brand & Comms Senior Manager

KDD 2022

When: August 14-18
Where: Washington D.C.

KDD is a highly research-focused event and home to some of the data industry’s top innovations, like personalized advertising and recommender systems. Keynote speakers range from academics to industry professionals and policy leaders, and each talk is accompanied by well-written papers for easy reference. I find this is one of the best conferences for keeping up with industry trends, and I’ve actually applied what I learned from attending KDD to the work that I do.

Vincent Chio, Data Science Manager

Re-Work Deep Learning Summit

When: November 9-10
Where: Toronto

The Re-Work Deep Learning Summit focuses on showcasing the learnings from the latest advancements in AI and how businesses are applying them in the real world. With content focusing on areas like computer vision, pattern recognition, generative models, and neural networks, it was great to hear speakers not only share successes, but also the challenges that still lie in the application of AI. While I’m personally not a machine learning engineer, I still found the content approachable and interesting, especially the talks on how to set up practical ETL pipelines for machine learning applications.

Ehsan K. Asl, Senior Data Engineer


Crunch

When: October 3-5
Where: Budapest

Crunch is a data science conference focused on sharing the industry’s top use cases for using data to scale a business. With talks and workshops around areas like the latest data science trends and tools, how to build effective data teams, and machine learning at scale, Crunch has great content and opportunities to meet other data scientists from around Europe. With their partner conferences (Impact and Amuse) you also have a chance to listen in to interesting and relevant product and UX talks.

Yizhar (Izzy) Toren, Senior Data Scientist

Rebekah Morgan is a Copy Editor on Shopify's Data Brand & Comms team.

Are you passionate about data discovery and eager to learn more? We’re always hiring! Visit our Data Science and Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.

Building a Form with Polaris


As companies grow in size and complexity, there comes a moment in time when teams realize the need to standardize portions of their codebase. This most often becomes the impetus for creating a common set of processes, guidelines, and systems, as well as a standard set of reusable UI elements. Shopify’s Polaris design system was born from such a need. It includes design and content guidelines along with a rich set of React components for the UI that Shopify developers use to build the Shopify admin, and that third-party app developers can use to create apps on the Shopify App Store.

Working with Polaris

Shopify puts a great deal of effort into Polaris and it’s used quite extensively within the Shopify ecosystem, by both internal and external developers, to build UIs with a consistent look and feel. Polaris includes a set of React components used to implement functionality such as lists, tables, cards, avatars, and more.

Form… Forms… Everywhere

Polaris includes 16 separate components that not only encompass the form element itself, but also the standard set of form inputs such as checkbox, color or date picker, radio button, and text field to name just a few. Here are a few examples of basic form design in Shopify admin related to creating a Product or Collection.

Add Product and Create Collection Forms
Add Product and Create Collection Forms

You’ll also encounter other forms in the Shopify admin, such as creating an Order or issuing a Gift Card.

Create Order and Issue Gift Card Forms
Create Order and Issue Gift Card Forms

From these form examples, we see a clear UI design implemented using reusable Polaris components.

What We’re Building

In this tutorial, we’ll build a basic form for adding a product that includes Title and Description fields, a Save button, and a top navigation that allows the user to return to the previous page.

Add Product Form
Add Product Form

Although our initial objective is to create the basic form design, this tutorial is also meant as an introduction to React, so we’ll add logic such as state, events, and event handlers to emulate a fully functional React form. If you’re new to React, this tutorial is a great introduction to many of the fundamental concepts of the library.

Starter CodeSandbox

As this tutorial is a step-by-step guide, providing all the code snippets along the way, we highly encourage you to fork this Starter CodeSandbox and code along with us. “Knowledge” is in the understanding, but the “knowing” is only gained in its application.

If you hit any roadblocks along the way, or just prefer to jump right into the solution code, here’s the Solution CodeSandbox.

Initial Code Examination

The codebase we’re working with is a basic create-react-app. We’ll use the PolarisForm component to build our basic form using Polaris components and then add standard React state logic. The starter code also imports the @shopify/polaris library which can be seen in the dependencies section.

Initial Setup in CodeSandbox
Initial Setup

One other thing to note about our component structure is that the component folders contain both an index.js file and a ComponentName.js file. This is a common pattern visible throughout the Polaris component library, such as the Avatar component in the example below.

Polaris Avatar Component Folder
Polaris Avatar Component Folder

Let’s first open the PolarisForm component. We can see that it’s a bare-bones component that outputs some initial text, just so we know everything is working as expected.

Choosing Components

Choosing our Polaris components may seem intuitive, but at times it requires a deeper understanding of the Polaris library, which this tutorial introduces along the way. Here’s a list of the components we’ll include in our design:


  • Form: the actual form
  • FormLayout: to apply a bit of styling between the fields
  • TextField: the text inputs for our form
  • Page: to provide the back arrow navigation and Save button
  • Card: to apply a bit of styling around the form

Reviewing the Polaris Documentation

When working with any new library, it’s always best to examine the documentation or as developers like to say, RTFM. With that in mind we’ll review the Polaris documentation along the way, but for now let’s start with the Form component.

The short description of the Form component describes it as “a wrapper component that handles the submissions of forms.” Also, in order to make it easier to start working with Polaris, each component provides best practices, related components, and several use case examples along with a corresponding CodeSandbox. The docs provide explanations for all the additional props that can be passed to the component.

Adding Our Form

It’s time to dive in and build out the form based on our design. The first thing we need to do is import the components we’ll be working with into the PolarisForm.js file.

import { Form, FormLayout, TextField, Page, Card } from "@shopify/polaris";

Now let’s render the Form and TextField components. I’ve also gone ahead and included the following TextField props: label, type, and multiline.
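As a rough sketch, that step might look like the following (reconstructed for illustration since the original snippet isn’t shown here; exact prop requirements can vary by Polaris version):

```jsx
import { Form, FormLayout, TextField, Page, Card } from "@shopify/polaris";

function PolarisForm() {
  return (
    <Form>
      {/* State and event wiring are added later in the tutorial */}
      <TextField label="Title" type="text" />
      <TextField label="Description" type="text" multiline={4} />
    </Form>
  );
}

export default PolarisForm;
```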

So it seems the first stop on our Polaris journey is the following error message:

No i18n was provided. Your application must be wrapped in an <AppProvider> component. See for implementation instructions.

Although the message also provides a link to the AppProvider component, I’d suggest we take a few minutes to read the Get Started section of the documentation. We see there’s a working example of rendering a Button component that’s clearly wrapped in an AppProvider.

And if we take a look at the docs for the AppProvider component it states it's “a required component that enables sharing global settings throughout the hierarchy of your application.”

As we’ll see later, Polaris creates several layers of shared context which are used to manage distinct portions of the global state. One important feature of the Shopify admin is that it supports up to 20 languages. The AppProvider is responsible for sharing those translations to all child components across the app.

We can move past this error by importing the AppProvider and replacing the existing React Fragment (<>) in our App component.
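For illustration, the wrapped App component might look like this (the translations import path follows the Polaris docs; the PolarisForm file path is an assumption about this project’s layout):

```jsx
import { AppProvider } from "@shopify/polaris";
import enTranslations from "@shopify/polaris/locales/en.json";
import PolarisForm from "./PolarisForm";

function App() {
  return (
    // AppProvider shares global settings, including the i18n translations,
    // with every Polaris component rendered below it.
    <AppProvider i18n={enTranslations}>
      <PolarisForm />
    </AppProvider>
  );
}

export default App;
```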

The MissingAppProviderError should now be a thing of the past and the form renders as follows:

Rendering The Initial Form
Rendering The Initial Form

Examining HTML Elements

One freedom developers allow themselves when developing code is to be curious. In this case, my curiosity drives me towards viewing what the HTML elements actually look like in developer tools once rendered.

Some questions that come to mind are: “how are they being rendered” and “do they include any additional information that isn’t visible in the component.” With that in mind, let’s take a look at the HTML structure in developer tools and see for ourselves.

Form Elements In Developer Tools
Form Elements In Developer Tools

At first glance it’s clear that Polaris is prefixed to all class and ID names. There are a few more elements to the HTML hierarchy as some of the elements are collapsed, but please feel free to pursue your own curiosity and continue to dig a bit deeper into the code.

Working With React Developer Tools

Another interesting place to look as we satisfy our curiosity is the Components tab in Dev Tools. If you don’t see it then take a minute to install the React Developer Tools Chrome Extension. I’ve highlighted the Form component so we can focus our attention there first and see all the component hierarchy of our form. Once again, we’ll see there’s more being rendered than what we imported and rendered into our component.

Form Elements In Developer Tools
Form Elements In Developer Tools

We can also see that at the very top of the hierarchy, just below App, is the AppProvider component. Seeing the entire component hierarchy gives us a sense of how many nested levels of context are being rendered.

Context is a much deeper topic in React and is meant to allow child components to consume data directly instead of prop drilling. 

Form Layout Component

Now that we’ve allowed ourselves the freedom to be curious, let’s refocus ourselves on implementing the form. One thing we might have noticed in the initial layout of the elements is that there’s no space between the Title input field and the Description label. This can be easily fixed by wrapping both TextField components in a FormLayout component. 

If we take a look at the documentation, we see that the FormLayout component is “used to arrange fields within a form using standard spacing and that by default it stacks fields vertically but also supports horizontal groups of fields.”

Since spacing is what we needed in our design, let’s include the component.
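The change itself is small; a sketch:

```jsx
<Form>
  <FormLayout>
    {/* FormLayout stacks its children vertically with standard spacing */}
    <TextField label="Title" type="text" />
    <TextField label="Description" type="text" multiline={4} />
  </FormLayout>
</Form>
```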

Once the UI updates, we see that it now includes the additional space needed to meet our design specs.

Form With Spacing Added
Form With Spacing Added

With our input fields in place, let’s now move onto adding the back arrow navigation and the Save button to the top of our form design.

The Page Component

This is where Polaris steps outside the bounds of being intuitive and requires a little digging into the documentation, or better yet the HTML. Since we know we’re rebuilding the Add Product form, let’s take a moment to once again explore our curiosity and examine the actual form in the Shopify admin using Chrome Dev Tools.

Polaris Page Component As Displayed In HTML
Polaris Page Component As Displayed In HTML

If we inspect the back arrow in the HTML, several class names prefixed with Polaris-Page light up. It looks like we’ve found a reference to the component we need, so now it’s off to the documentation to see what we can find.

Located under the Structure category in the documentation, there’s a component called Page. The short description for the Page component states that it’s “used to build the outer-wrapper of a page, including the page title and associated actions.” The assumption is that title is used for the Add Product text and the associated action includes the Save button.

Let’s give the rest of the documentation a closer look to see what props implement that functionality. Taking a look at the working example, we can correlate it with the following props:


  • breadcrumbs: adds the back arrow navigation
  • title: adds the title to the right of the navigation
  • primaryAction: adds the button, which will include an event listener


With props in hand, let’s add the Page component along with its props and set their values accordingly.
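A sketch of the result (the breadcrumb content and url here are placeholder values, and the primaryAction handler is wired up later in the tutorial):

```jsx
<Page
  breadcrumbs={[{ content: "Products", url: "/products" }]}
  title="Add Product"
  primaryAction={{ content: "Save" }}
>
  {/* ...the Form and FormLayout from the previous steps... */}
</Page>
```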

Page Component With Props
Page Component With Props

The Card Component

We’re getting close to completing the UI design, but based on a side-by-side comparison, it still needs a bit of white space around the form elements. Of course, we could opt to add our own custom CSS, but Polaris provides a component that achieves the same result.

Comparing Our Designs
Comparing Our Designs

If we take a look at the Shopify admin Add Products form in Dev Tools, we see a reference to the Card component, and it appears to be using padding (green outline in the image below) to create the space.

Card Component Padding
Card Component Padding

Let’s also take a look at the documentation to see what the Card component brings to the table. The short description for the Card component states that it is “used to group similar concepts and tasks together to make Shopify easier for merchants to scan, read and get things done.” Although it makes no mention of creating space via either padding or margin, if we look at the example provided, we see that it contains a prop called sectioned, and the docs state the prop is used to “auto wrap content in a section.” Feel free to toggle the True/False buttons to confirm this does indeed create the spacing we’re looking for.

Let’s add the Card component and include the sectioned prop.
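Sketching the updated hierarchy (placeholder breadcrumb values as before):

```jsx
<Page
  breadcrumbs={[{ content: "Products", url: "/products" }]}
  title="Add Product"
  primaryAction={{ content: "Save" }}
>
  <Card sectioned>
    {/* ...the Form and FormLayout from the previous steps... */}
  </Card>
</Page>
```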

It seems our form has finally taken shape and our UI design is complete.

Final Design
Final Design

Add the Logic

Although our form is nice to look at, it doesn’t do very much at this point. Typing into the fields doesn’t capture any input, and clicking the Save button does nothing. The only feature that appears to function is the back button.

If you’re new to React, this section introduces some of its fundamental elements, such as state, events, and event handlers.

In React, forms can be configured as controlled or uncontrolled. If you google either concept, you’ll find many articles describing the differences and use cases for each approach. In our example, we’ll configure the form as a controlled form, which means we’ll capture every keystroke and re-render the input with its current state.
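Stripped of React specifics, the controlled pattern is simply: the input’s displayed value always comes from state, and every keystroke routes through a handler that updates that state. A minimal stand-alone sketch of the idea (no React or Polaris involved; createState is a hypothetical stand-in for React state):

```javascript
// A tiny stand-in for React state, to illustrate the controlled-input loop.
function createState(initial) {
  let value = initial;
  return {
    get: () => value,                  // what the input would render
    set: (next) => { value = next; },  // what the onChange handler calls
  };
}

const title = createState("");

// Each keystroke reports the full new value; the handler stores it,
// and the input re-renders from state.
function handleTitleChange(newValue) {
  title.set(newValue);
}

["W", "Wi", "Win"].forEach(handleTitleChange);
console.log(title.get()); // "Win"
```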

Adding State and Handler Functions

Since we’ll be capturing every keystroke in two separate input fields, we’ll instantiate two instances of state. The first thing we need to do is import the useState Hook.

import { useState } from "react";

Now we can instantiate two unique instances of state called title and description. Instantiating state requires that we create both a state value and a setState function. React is very particular about state and requires that any updates to the state value use the setState function.

With our state instantiated, let’s create the event handler functions that manage all updates to state. Handler functions aren’t required; however, they’re a React best practice: developers expect them by convention, and additional logic often needs to run before state is updated.

Since we have two state values to update, we’ll create a separate handler function for each one. And because we’re also implementing a form, we need a third event handler to manage the form’s submission.
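Inside the component body, that might look like the following sketch (the handler names are our own choice, not mandated by React or Polaris):

```jsx
const [title, setTitle] = useState("");
const [description, setDescription] = useState("");

// Polaris's TextField passes the new value directly to onChange,
// so each handler receives a string rather than a DOM event.
const handleTitleChange = (newValue) => setTitle(newValue);
const handleDescriptionChange = (newValue) => setDescription(newValue);

// Fired when the form is submitted via the Save button.
const handleSubmit = (event) => {
  console.log(event);
};
```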

Adding Events 

We’re almost there. Now that state and our event handler functions are in place, it’s time to add the events and assign them the corresponding functions. The two events that we add are: onChange and onSubmit.

Let's start by adding the onChange event to both TextField components. Because we're implementing a controlled form, we also need to include the value prop and assign it to the corresponding state value.

Take a moment to confirm that the form is capturing input by typing into the fields. If it works, then we're good to go.

The last event we’ll add is the onSubmit event. Our logic dictates that the form would only be submitted once the user clicks on the Save button, so that’s where we’ll add the event logic.

If we take a look at the documentation for the Page component, we see that it includes an onAction prop. Although the documentation doesn't go further than providing an example, we can assume that's the prop we use to trigger the onSubmit function.

Let's confirm that everything is now tied together by clicking the Save button. If everything worked, we should see the following console log output:

SyntheticBaseEvent {_reactName: "onClick", _targetInst: null, type: "click", nativeEvent: PointerEvent, target: HTMLSpanElement...}

Clearing the Form

The very last step in our form submission process is to clear the form fields so that the merchant has a clean slate if they choose to add another product.

This tutorial was meant to introduce you to the React components available in Shopify's Polaris Design System. The library provides a robust set of components that have been meticulously designed by our UX teams and implemented by our development teams. The Polaris GitHub library is open source, so feel free to look around or set up the local development environment (which uses Storybook).

Joe Keohan is an RnD Technical Facilitator responsible for onboarding our new hire engineering developers. He has been teaching and educating for the past 10 years and is passionate about all things tech. Feel free to reach out on LinkedIn and extend your network or to discuss engineering opportunities at Shopify! When he isn’t leading sessions, you’ll find Joe jogging, surfing, coding and spending quality time with his family.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.


To Thread or Not to Thread: An In-Depth Look at Ruby’s Execution Models


Deploying Ruby applications using threaded servers has come to be considered standard practice in recent years. According to the 2022 Ruby on Rails community survey, in which over 2,600 members of the global Rails community responded to questions about their experience using Rails, threaded web servers such as Puma are by far the most popular deployment target. Similarly, when it comes to job processors, the thread-based Sidekiq seems to represent the majority of deployments.

In this post, I'll explore the mechanics and reasoning behind this practice and share knowledge and advice to help you make well-informed decisions about whether you should use threads in your applications (and if so, how many).

Why Are Threads the Popular Default?

While there are certainly many different factors for threaded servers' rise in popularity, their main selling point is that they increase an application’s throughput without increasing its memory usage too much. So to fully understand the trade-offs between threads and processes, it’s important to understand memory usage.

Memory Usage of a Web Application

Conceptually, the memory usage of a web application can be divided into two parts.

Two separate text boxes stacked on top of each other, the top one containing the words "Static memory" and the bottom containing the words "Processing memory".
Static memory and processing memory are the two key components of memory usage in a web application.

The static memory is all the data you need to run your application. It consists of the Ruby VM itself, all the VM bytecode that was generated while loading the application, and probably some static Ruby objects such as I18n data, etc. This part is like a fixed cost, meaning whether your server runs 1 or 10 threads, that part will stay stable and can be considered read-only.

The request processing memory is the amount of memory needed to process a request. There you'll find database query results, the output of rendered templates, and so on. This memory is constantly being freed by the garbage collector and reused, and the amount needed is directly proportional to the number of threads your application runs.

Based on this simplified model, we express the memory usage of a web application as:

processes * (static_memory + (threads * processing_memory))

So if you have only 512MiB available, with an application using 200MiB of static memory and needing 150MiB of processing memory, using two single threaded processes requires 700MiB of memory, while using a single process with two threads will use only 500MiB and fit in a Heroku dyno.
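As a quick sanity check of that arithmetic, the model can be sketched in a few lines (numbers taken from the example above):

```ruby
# Simplified model: each process pays the static cost, and each of its
# threads adds its own processing memory (all figures in MiB).
def total_memory(processes:, threads:, static: 200, processing: 150)
  processes * (static + threads * processing)
end

total_memory(processes: 2, threads: 1) # => 700
total_memory(processes: 1, threads: 2) # => 500
```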

Two columns of text boxes next to each other. On the left, a column representing a single process with two threads shows a box with the text "Static Memory" at the top, and two boxes with the text "Thread #1 Processing Memory" and "Thread #2 Processing Memory" beneath it. In the column on the right, which represents two single threaded processes, there are four boxes, which read: "Process #1 Static Memory", "Process #1 Processing Memory", "Process #2 Static Memory", and "Process #2 Processing Memory" in order from top to bottom.
A single process with two threads uses less memory than two single threaded processes.

However this model, like most models, is a simplified depiction of reality. Let’s bring it closer to reality by adding another layer of complexity: Copy on Write (CoW).

Enter Copy on Write

CoW is a common resource management technique involving sharing resources rather than duplicating them until one of the users needs to alter it, at which point the copy actually happens. If the alteration never happens, then neither does the copy.

In old UNIX systems of the ’70s and ’80s, forking a process involved copying its entire addressable memory over to the new process address space, effectively doubling the memory usage. But since the mid ’90s that's no longer true, as most, if not all, fork implementations are now sophisticated enough to trick the processes into thinking they have their own private memory regions, while in reality they're sharing them with other processes.

When the child process is forked, its page tables are initialized to point to the parent's memory pages. Later on, if either the parent or the child tries to write to one of these pages, the operating system is notified and actually copies the page before it's modified.

This means that if neither the child nor the parent write in these shared pages after the fork happens, forked processes are essentially free.

A flow chart with "Parent Process Static Memory" in a text box at the top. On the second row, there are two text boxes containing the text "Process 1 Processing Memory" and "Process 2 Processing Memory", connected to the top text box with a line to illustrate resource sharing by forking of the parent process.
Copy on Write allows for sharing resources by forking the parent process.

So in a perfect world, our memory usage formula would now be:

static_memory + (processes * threads * processing_memory)
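Plugging the earlier example numbers into this ideal formula shows why: with perfect sharing, the static cost is paid only once, so two single-threaded processes land exactly where one dual-threaded process does.

```ruby
# Ideal Copy-on-Write model: the static memory is shared by all processes,
# so only the processing memory scales with concurrency (figures in MiB).
def ideal_memory(processes:, threads:, static: 200, processing: 150)
  static + processes * threads * processing
end

ideal_memory(processes: 2, threads: 1) # => 500
ideal_memory(processes: 1, threads: 2) # => 500
```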

Meaning that threads would have no advantage at all over processes.

But of course we're not in a perfect world. Some shared pages will likely be written to at some point; the question is how many. To answer this, we'll need to know how to accurately measure the memory usage of an application.

Beware of Deceiving Memory Metrics

Because of CoW and other memory sharing techniques, there are now many different ways to measure the memory usage of an application or process. Depending on the context, some metrics can be more or less relevant.

Why RSS Isn’t the Metric You’re Looking For

The memory metric that’s most often shown by various administration tools, such as ps, is Resident Set Size (RSS). While RSS has its uses, it's really misleading when dealing with forking servers. If you fork a 100MiB process and never write in any memory region, RSS will report both processes as using 100MiB. This is inaccurate because 100MiB is being shared between the two processes—the same memory is being reported twice.

A slightly better metric is Proportional Set Size (PSS). In PSS, shared memory region sizes are divided by the number of processes sharing them. So our 100MiB process that was forked once should actually have a PSS of 50MiB. If you’re trying to figure out whether you’re nearing memory exhaustion, this is already a much more useful metric to look at because if you add up all the PSS numbers you get how much memory is actually being used—but we can go even deeper.

On Linux, you can get a detailed breakdown of a process's memory usage through cat /proc/$PID/smaps_rollup. Here's what it looks like for a Unicorn worker on one of our apps in production:

And for the parent process:

Let's unpack what each element here means, starting with the Shared and Private fields. As its name suggests, shared memory is the sum of the memory regions that are in use by multiple processes, whereas private memory is allocated to a specific process and isn't shared by others. In this example, we see that out of the 771,912 kB of addressable memory, only 437,928 kB (56.7%) are really owned by the Unicorn worker; the rest is inherited from the parent process.

As for Clean and Dirty, Clean memory is memory that has been allocated but never written to (things like the Ruby binary and various native shared libraries). Dirty memory is memory that has been written into by at least one process. It can be shared as long as it was only written into by the parent process before it forked its children.

Measuring and Improving Copy on Write Efficiency

We’ve established that shared memory is a key to maximizing efficiency of processes, so the important question here is how much of the static memory is actually shared. To approximate this, we compare the worker shared memory with the parent process RSS, which is 508,544 kB in this app, so:

worker_shared_mem / master_rss
>> (18288 + 315648) / 508544.0 * 100  # ≈ 65.7

Here we see that about two-thirds of the static memory is shared:

A flow chart depicting worker shared memory, with Private and Parent Process Shared Static Memory in text boxes at the top, connecting to two separate columns, each containing Private Static Memory and Processing Memory.
By comparing the worker shared memory with the parent process RSS, we can see that two thirds of this app’s static memory is shared.

If we were looking at RSS, we'd think each extra worker costs ~750MiB, but in reality it's closer to ~427MiB, whereas an extra thread would cost ~257MiB. That's still noticeably more, but far less than what the initial naive model would have predicted.

There’s a number of ways an application owner can improve CoW efficiency with the general idea being to load as many things as possible as part of the boot process before the server forks. This topic is very broad and could be a whole post by itself, but here are a few quick pointers.

The first thing to do is configure the server to fully load the application. Unicorn, Puma, and Sidekiq Enterprise all have a preload_app option for that purpose. Once that’s done, a common pattern that degrades CoW performance is memoized class variables, for example:
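The stripped example presumably showed a lazily memoized class-level value along these lines (the class name and data are illustrative, not from the original post):

```ruby
class ExchangeRates
  # Computed on first use. In a preloading server, that first call typically
  # happens inside a forked worker, so the resulting objects live in private,
  # unshared pages in every worker.
  def self.all
    @all ||= { "USD" => 1.0, "EUR" => 0.92, "CAD" => 1.35 }
  end
end
```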

Such delayed evaluation both prevents that memory from being shared and slows down the first request that calls the method. The simple solution is to use a constant instead; when that's not possible, the next best thing is to leverage the Rails eager_load_namespaces feature.
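As a sketch of that approach (the Rails wiring is an assumption based on the eager_load_namespaces API; anything registered there just has to respond to eager_load!):

```ruby
class ExchangeRates
  def self.all
    @all ||= { "USD" => 1.0, "EUR" => 0.92, "CAD" => 1.35 } # illustrative data
  end

  # A namespace registered via `config.eager_load_namespaces << ExchangeRates`
  # must respond to eager_load!; Rails invokes it at boot, before the server
  # forks, so the memoized data ends up in shared pages.
  def self.eager_load!
    all
  end
end
```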

Now, locating these lazy loaded constants is the tricky part. Ruby heap-profiler is a useful tool for this. You can use it to dump the entire heap right after fork, and then after processing a few requests, see how much the process has grown and where these extra objects were allocated.

The Case for Process-Based Servers

So, while there are increased memory costs involved in using process-based servers, using more accurate memory metrics and optimizations like CoW to share memory between processes can alleviate some of this. But why use process-based servers such as Unicorn or Resque at all, given the increased memory cost? There are actually advantages to process-based servers that shouldn’t be overlooked, so let’s go through those. 

Clean Timeout Mechanism

When running large applications, you may run into bugs that cause some requests to take much longer than desirable. There could be many reasons for that: the requests might be specifically crafted by a malicious actor to try to DoS your service, or they might be processing an unexpectedly large amount of data. When this occurs, being able to cleanly interrupt the request is paramount for resiliency. Process-based servers can kill the worker process and fork a fresh one to replace it, ensuring the request is cleanly interrupted.

Threads, however, can’t be interrupted cleanly. Since they directly share mutable resources with other threads, if you attempt to kill a single thread, you may leave some resources such as mutexes or database connections in an unrecoverable state, causing the other threads to run into various unrecoverable errors. 

The Black Box of Global VM Lock Latency

Improved latency is another major advantage of processes over threads in Ruby (and other languages with similar constraints, such as Python). A typical web application process does two types of work: CPU and I/O. So two Ruby processes might look like this:

Two rows of text boxes, containing separate boxes with the text "IO", "CPU", and "GC", representing the work of processes in a Ruby web application.
CPU and IOs in two processes in a Ruby application.

But in a Ruby process, because of the infamous Global VM Lock (GVL), only one thread at a time can execute Ruby code, and when the garbage collector (GC) triggers, all threads are paused. So if we were to use two threads, the picture may instead look like this:

Two rows of text boxes, with the individual boxes containing the text "CPU", "IO", "GVL wait", and "GC", representing the work of threads in a Ruby web application and the latency introduced by the GVL.
Global VM Lock (GVL) increases latency in Ruby threads.

So every time two threads need to execute Ruby code at the same time, the service latency increases. How much this happens varies considerably from one application to another, and even from one request to another. If you think about it, to fully saturate a process with N threads, an application only needs to spend less than 1 - 1/N of its time waiting on I/O: 50 percent I/O for two threads, 75 percent I/O for four threads, etc. And that's only the saturation limit; given that a request's use of I/O and CPU is very much unpredictable, an application doing 75 percent I/O with two threads will still frequently wait on the GVL.
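That threshold is easy to restate in code: a process with N threads is saturated whenever the I/O fraction drops below 1 - 1/N.

```ruby
# I/O fraction below which N threads fully saturate the CPU: each thread
# only has to spend 1/N of its time on CPU for the process to saturate.
def saturation_io_threshold(threads)
  1.0 - (1.0 / threads)
end

saturation_io_threshold(2) # => 0.5
saturation_io_threshold(4) # => 0.75
```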

The common wisdom in the Ruby community is that Ruby applications are relatively I/O heavy, but from my experience it’s not quite true, especially once you consider that GC pauses do acquire the GVL too, and Ruby applications tend to spend quite a lot of time in GC.

Web applications are often specifically crafted to avoid long I/O operations in the web request cycle. Any potentially slow or unreliable I/O operation like calling a third-party API or sending an email notification is generally deferred to a background job queue, so the remaining I/O in web requests are mostly reasonably fast database and cache queries. A corollary is that the job processing side of applications tends to be much more I/O intensive than the web side. So job processors like Sidekiq can more frequently benefit from a higher thread count. But even for web servers, using threads can be seen as a perfectly acceptable tradeoff between throughput per dollar and latency. 

The main problem is that as of today there isn’t really a good way to measure how much the service latency is impacted by the GVL, so service owners are left in the dark. Since Ruby doesn’t provide any way to instrument the GVL, all we’re left with are proxy metrics, like gradually increasing or decreasing the number of threads and measuring the impact on the latency metrics, but that’s far from enough.

That's why I recently put together a feature request and a proof-of-concept implementation for Ruby 3.2 to provide a GVL instrumentation API. It's a really low-level and hard-to-use API, but if it's accepted I plan to publish a gem exposing simple metrics that show exactly how much time is spent waiting for the GVL, and I hope application performance monitoring services will include it.

Ractors and Fibers—Not a Silver Bullet Solution

In the last few years, the Ruby community has been experimenting heavily with other concurrency constructs that could potentially replace threads: Ractors and Fibers.

Ractors can execute Ruby code in parallel: rather than sharing one single GVL, each Ractor has its own lock, so they could theoretically be game changing. However, Ractors can't share any global mutable state, so even sharing a database connection pool or a logger between Ractors isn't possible. That's a major architectural challenge that would require most libraries to be heavily refactored, and the result would likely not be as usable. I hope to be proven wrong, but I don't expect Ractors to be used as units of execution for sizable web applications any time soon.

As for Fibers, they're essentially lighter threads that are cooperatively scheduled. So everything said in the previous sections about threads and the GVL applies to them as well. They're very well suited for I/O-intensive applications that mostly just move byte streams around and don't spend much time executing code, but an application that doesn't benefit from more than a handful of threads won't benefit from using fibers either.

YJIT May Change the Status Quo

While it's not yet the case, the advent of YJIT may significantly increase the need to run threaded servers in the future. Since just-in-time (JIT) compilers speed up code execution at the expense of unshareable memory usage, JITing Ruby will decrease CoW performance, but it will also make applications proportionally more I/O intensive.

Right now, YJIT only offers modest speed gains, but if in the future it manages to provide even a two times speedup, it would certainly allow application owners to ramp up their number of web threads by as much to compensate for the increased memory cost.

Tips to Remember

Ultimately choosing between process versus thread-based servers involves many trade-offs, so it’s unreasonable to recommend either without first looking at an application’s metrics.

But in the abstract, here are a few quick takeaways to keep in mind: 

  • Always enable application preloading to benefit from CoW as much as possible. 
  • Unless your application fits on the smallest offering of your hosting provider, use a smaller number of larger containers instead of a bigger number of smaller ones. For instance, a single box with 4 CPUs and 2GiB of RAM is more efficient than 4 boxes with 1 CPU and 512MiB of RAM each.
  • If latency is more important to you than keeping costs low, or if you have enough free memory for it, use Unicorn to benefit from the reliable request timeout. 
    • Note: Unicorn must be protected from slow client attacks by a reverse proxy that buffers requests. If that's a problem, Puma can be configured to run with a single thread per worker.
  • If using threads, start with only two threads unless you’re confident your application is indeed spending more than half its time waiting on I/O operations. This doesn’t apply to job processors since they tend to be much more I/O intensive and are much less latency sensitive, so they can easily benefit from higher thread counts. 

Looking Ahead: Future Improvements to the Ruby Ecosystem

We’re exploring a number of avenues to improve the situation for both process and thread-based servers.

First, there’s the GVL instrumentation API mentioned previously that should hopefully allow application owners to make more informed trade-offs between throughput and latency. We could even try to use it to automatically apply backpressure by dynamically adjusting concurrency when GVL contention is over some threshold.

Additionally, threaded web servers could theoretically implement a reliable request timeout mechanism. When a request takes longer than expected, they could stop forwarding requests to the impacted worker and wait for all other requests to either complete or timeout before killing the worker and reforking it. That’s something Matthew Draper explored a few years ago and that seems doable.

Then, the CoW performance of Ruby itself could likely be improved further. Several patches have been merged for this purpose over the years, but we can probably do more. Notably, we suspect that Ruby's inline caches cause most of the VM bytecode to become unshared once it's executed. I think we could also take some inspiration from what the Instagram engineering team did to improve Python's CoW performance. For instance, they introduced a gc.freeze() method that instructs the GC that all existing memory regions will become shared. Python uses this information to make smarter decisions around memory usage, like not using the free slots in these shared regions, since it's more efficient to allocate a new page than to dirty an old one.

Jean Boussier is a Rails Core team member, Ruby committer, and Senior Staff Engineer on Shopify's Ruby and Rails infrastructure team. You can find him on GitHub as @byroot or on Twitter at @_byroot.



Implementing Equality in Ruby


Ruby is one of the few programming languages that get equality right. I often play around with other languages, but keep coming back to Ruby. This is largely because Ruby’s implementation of equality is so nice.

Nonetheless, equality in Ruby isn't straightforward. There are #==, #eql?, #equal?, #===, and more. Even if you're familiar with how to use them, implementing them can be a whole other story.

Let's walk through all forms of equality in Ruby and how to implement them.

Why Properly Implementing Equality Matters

We check whether objects are equal all the time. Sometimes we do this explicitly, sometimes implicitly. Here are some examples:

  • Do these two Employees work in the same Team? Or, in code (names illustrative): employee_a.team == employee_b.team.
  • Is the given DiscountCode valid for this particular Product? Or, in code: product.discount_codes.include?(given_discount_code).
  • Who are the (distinct) managers for this given group of employees? Or, in code: employees.map(&:manager).uniq.

A good implementation of equality is predictable; it aligns with our understanding of equality.

An incorrect implementation of equality, on the other hand, conflicts with what we commonly assume to be true. Here is an example of what happens with such an incorrect implementation:

The geb and geb_also objects should definitely be equal. The fact that the code says they’re not is bound to cause bugs down the line. Luckily, we can implement equality ourselves and avoid this class of bugs.

No one-size-fits-all solution exists for an equality implementation. However, there are two kinds of objects where we do have a general pattern for implementing equality: entities and value objects. These two terms come from domain-driven design (DDD), but they’re relevant even if you’re not using DDD. Let’s take a closer look.


Entities

Entities are objects that have an explicit identity attribute. Often, entities are stored in some database and have a unique id attribute corresponding to a unique id table column. The following Employee example class is such an entity:

Two entities are equal when their IDs are equal. All other attributes are ignored. After all, an employee’s name might change, but that does not change their identity. Imagine getting married, changing your name, and not getting paid anymore because HR has no clue who you are anymore!

ActiveRecord, the ORM that is part of Ruby on Rails, calls entities "models" instead, but they’re the same concept. These model objects automatically have an ID. In fact, ActiveRecord models already implement equality correctly out of the box!

Value Objects

Value objects are objects without an explicit identity. Instead, their value as a whole constitutes identity. Consider this Point class:

Two Points will be equal if their x and y values are equal. The x and y values constitute the identity of the point.

In Ruby, the basic value object types are numbers (both integers and floating-point numbers), characters, booleans, and nil. For these basic types, equality works out of the box:

Arrays of value objects are in themselves also value objects. Equality for arrays of value objects works out of the box—for example, [17, true] == [17, true]. This might seem obvious, but this isn’t true in all programming languages.

Other examples of value objects are timestamps, date ranges, time intervals, colors, 3D coordinates, and money objects. These are built from other value objects; for example, a money object consists of a fixed-decimal number and a currency code string.

Basic Equality (Double Equals)

Ruby has the == and != operators for checking whether two objects are equal or not:

Ruby’s built-in types all have a sensible implementation of ==. Some frameworks and libraries provide custom types, which will have a sensible implementation of ==, too. Here is an example with ActiveRecord:

For custom classes, the == operator returns true if and only if the two objects are the same instance. Ruby does this by checking whether the internal object IDs are equal. These internal object IDs are accessible using #__id__. Effectively, gizmo == thing is the same as gizmo.__id__ == thing.__id__.

This behavior is often not a good default, however. To illustrate this, consider the Point class from earlier:

The == operator will return true only when calling it on itself:
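With a bare-bones Point class (a sketch, since the original snippet was omitted), the default identity-based == behaves like this:

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

point = Point.new(1, 2)

point == point           # => true  (same instance)
point == Point.new(1, 2) # => false (distinct instances, default Object#==)
```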

This default behavior is often undesirable: after all, two points should be equal if (and only if) their x and y values are equal. That goes for value objects (such as Point) and entities (such as the Employee class mentioned earlier) alike.

The desired behavior for value objects and entities is as follows:

Image showing the desired behavior for value objects and entities. The first pairing for value objects checks if x and y (all attributes) are equal. The second pair for entities, checks whether the id attributes are equal. The third pair shows the default ruby check, which is whether internal object ids are equal
  • For value objects (a), we’d like to check whether all attributes are equal.
  • For entities (b), we’d like to check whether the explicit ID attributes are equal.
  • By default (c), Ruby checks whether the internal object IDs are equal.

Instances of Point are value objects. With the above in mind, a good implementation of == for Point would look as follows:
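A sketch of such an implementation (the attribute readers are assumed from the earlier class definition):

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Two Points are equal when their classes and all their attributes match.
  def ==(other)
    self.class == other.class && x == other.x && y == other.y
  end
end
```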

This implementation checks all attributes, as well as the class of both objects. By checking the class, comparing a Point instance with something of a different class returns false rather than raising an exception.

Checking equality on Point objects now works as intended:

The != operator works too:

A correct implementation of equality has three properties: reflexivity, symmetry, and transitivity.

Image with simple circles to describe the implementation of equality having three properties: reflexivity, symmetry, and transitivity, described below the image for more context
  • Reflexivity (a): An object is equal to itself: a == a
  • Symmetry (b): If a == b, then b == a
  • Transitivity (c): If a == b and b == c, then a == c

These properties embody a common understanding of what equality means. Ruby won’t check these properties for you, so you’ll have to be vigilant to ensure you don’t break these properties when implementing equality yourself.

IEEE 754 and violations of reflexivity

It seems natural that something would be equal to itself, but there is an exception. IEEE 754 defines NaN (Not a Number) as a value resulting from an undefined floating-point operation, such as dividing 0 by 0. NaN, by definition, is not equal to itself. You can see this for yourself:
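In Ruby, NaN is easy to produce with floating-point division:

```ruby
nan = 0.0 / 0.0 # Float::NAN

nan.nan?   # => true
nan == nan # => false: NaN isn't even equal to itself
```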

This means that == in Ruby is not universally reflexive. Luckily, exceptions to reflexivity are exceedingly rare; this is the only exception I am aware of.

Basic Equality for Value Objects

The Point class is an example of a value object. The identity of a value object, and thereby equality, is based on all its attributes. That is exactly what the earlier example does:

Basic Equality for Entities

Entities are objects with an explicit identity attribute, commonly @id. Unlike value objects, an entity is equal to another entity if and only if their explicit identities are equal.

Entities are uniquely identifiable objects. Typically, any database record with an id column corresponds to an entity. Consider the following Employee entity class:

Other forms of ID are possible too. For example, books have an ISBN, and recordings have an ISRC. But if you have a library with multiple copies of the same book, then ISBN won’t uniquely identify your books anymore.

For entities, the == operator is more involved to implement than for value objects:

This code does the following:

  • The super call invokes the default implementation of equality: Object#==. On Object, the #== method returns true if and only if the two objects are the same instance. This super call, therefore, ensures that the reflexivity property always holds.
  • As with Point, the Employee#== implementation checks the class. This way, checking an Employee instance for equality against objects of other classes always returns false.
  • If @id is nil, the entity is considered not equal to any other entity. This is useful for newly-created entities which have not been persisted yet.
  • Lastly, this implementation checks whether the ID is the same as the ID of the other entity. If so, the two entities are equal.
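Put together, those four steps amount to something like this sketch (the constructor details are assumed):

```ruby
class Employee
  attr_reader :id

  def initialize(id:, name:)
    @id = id
    @name = name
  end

  def ==(other)
    super ||                         # same instance? (preserves reflexivity)
      (self.class == other.class &&  # only Employees can compare equal
       !id.nil? &&                   # unpersisted entities match nothing else
       id == other.id)               # same ID means same entity
  end
end
```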

Checking equality on entities now works as intended:

Blog post of Theseus

Implementing equality on entity objects isn’t always straightforward. An object might have an id attribute that doesn’t quite align with the object’s conceptual identity.

Take a BlogPost class, for example, with id, title, and body attributes. Imagine creating a BlogPost, then halfway through writing the body for it, scratching everything and starting over with a new title and a new body. The id of that BlogPost will still be the same, but is it still the same blog post?

If I follow a Twitter account that later gets hacked and turned into a cryptocurrency spambot, is it still the same Twitter account?

These questions don’t have a proper answer. That’s not surprising, as this is essentially the Ship of Theseus thought experiment. Luckily, in the world of computers, the generally accepted answer seems to be yes: if two entities have the same id, then the entities are equal as well.

Basic Equality with Type Coercion

Typically, an object is not equal to an object of a different class. However, this isn’t always the case. Consider integers and floating-point numbers:
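The example presumably looked something like this:

```ruby
float_two = 2.0
integer_two = 2

float_two.class          # => Float
integer_two.class        # => Integer
float_two == integer_two # => true
```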

Here, float_two is an instance of Float, and integer_two is an instance of Integer. They are equal: float_two == integer_two is true, despite different classes. Instances of Integer and Float are interchangeable when it comes to equality.

As a second example, consider this Path class:

This Path class provides an API for creating paths:

The Path class is a value object, and implementing #== could be done just as with other value objects:

However, the Path class is special because it represents a value that could be considered a string. The == operator will return false when checking equality with anything that isn’t a Path:

It can be beneficial for path == "/usr/bin/ruby" to be true rather than false. To make this happen, the == operator needs to be implemented differently:

This implementation of == coerces both objects to Strings, and then checks whether they are equal. Checking equality of a Path now works:

This class implements #to_str, rather than #to_s. These methods both return strings, but by convention, the to_str method is only implemented on types that are interchangeable with strings.

The Path class is such a type. By implementing Path#to_str, the implementation states that this class behaves like a String. For example, it’s now possible to pass a Path (rather than a String) to a method such as, and it will work, because accepts anything that responds to #to_str.

String#== also uses the to_str method. Because of this, the == operator is symmetric: "/usr/bin/ruby" == path is true, just as path == "/usr/bin/ruby" is.

Strict Equality

Ruby provides #equal? to check whether two objects are the same instance:
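The behavior looks like this:

```ruby
a = "hello"
b = "hello".dup # same content, distinct instance

a == b      # => true
a.equal?(b) # => false
a.equal?(a) # => true
```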

Here, we end up with two String instances with the same content. Because they are distinct instances, #equal? returns false, and because their content is the same, #== returns true.

Do not implement #equal? in your own classes. It isn’t meant to be overridden. It’ll all end in tears.

Earlier in this post, I mentioned that #== has the property of reflexivity: an object is always equal to itself. Here is a related property for #equal?:

Property: Given objects a and b. If a.equal?(b), then a == b.

Ruby won't automatically validate this property for your code. It’s up to you to ensure that this property holds when you implement the equality methods.

For example, recall the implementation of Employee#== from earlier in this article:

The call to super on the first line makes this implementation of #== reflexive. This super invokes the default implementation of #==, which delegates to #equal?. Therefore, I could have used #equal? rather than super:

I prefer using super, though this is likely a matter of taste.

Hash Equality

In Ruby, any object can be used as a key in a Hash. Strings, symbols, and numbers are commonly used as Hash keys, but instances of your own classes can function as Hash keys too—provided that you implement both #eql? and #hash.

The #eql? Method

The #eql? method behaves similarly to #==:

However, #eql?, unlike #==, does not perform type coercion:
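Revisiting the Float/Integer example from earlier:

```ruby
2.0 == 2      # => true  (#== coerces)
2.0.eql?(2)   # => false (#eql? does not)
2.0.eql?(2.0) # => true
```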

If #== doesn’t perform type coercion, the implementations of #eql? and #== will be identical. Rather than copy-pasting, however, we’ll put the implementation in #eql?, and let #== delegate to #eql?:

I made the deliberate decision to put the implementation in #eql? and let #== delegate to it, rather than the other way around. If we were to let #eql? delegate to #==, there’s an increased risk that someone will update #== and inadvertently break the properties of #eql? (mentioned below) in the process.

For the Path value object, whose #== method does perform type coercion, the implementation of #eql? will differ from the implementation of #==:

Here, #== does not delegate to #eql?, nor the other way around.

A correct implementation of #eql? has the following two properties:

  • Property: Given objects a and b. If a.eql?(b), then a == b.
  • Property: Given objects a and b. If a.equal?(b), then a.eql?(b).

These two properties are not explicitly called out in the Ruby documentation. However, to the best of my knowledge, all implementations of #eql? and #== respect these properties.

Ruby will not automatically validate that these properties hold in your code. It’s up to you to ensure that these properties aren’t violated.

The #hash Method

For an object to be usable as a key in a Hash, it needs to implement not only #eql?, but also #hash. This #hash method will return an integer, the hash code, that respects the following property:

Property: Given objects a and b. If a.eql?(b), then a.hash == b.hash.

Typically, the implementation of #hash creates an array of all attributes that constitute identity and returns the hash of that array. For example, here is Point#hash:

For Path, the implementation of #hash will look similar:

For the Employee class, which is an entity rather than a value object, the implementation of #hash will use the class and the @id:

If two objects are not equal, the hash code should ideally be different, too. This isn’t mandatory, however. It’s okay for two non-equal objects to have the same hash code. Ruby will use #eql? to tell objects with identical hash codes apart.

Avoid XOR for Calculating Hash Codes

A popular but problematic approach for implementing #hash uses XOR (the ^ operator). Such an implementation would calculate the hash codes of each individual attribute, and combine these hash codes with XOR. For example:

With such an implementation, the chance of a hash code collision, which means that multiple objects have the same hash code, is higher than with an implementation that delegates to Array#hash. Hash code collisions will degrade performance and could potentially pose a denial-of-service security risk.

A better way, though still flawed, is to multiply the components of the hash code by unique prime numbers before combining them:

Such an implementation has additional performance overhead due to the extra multiplications. It also requires mental effort to ensure the implementation is and remains correct.

An even better way of implementing #hash is the one I’ve laid out before—making use of Array#hash:

An implementation that uses Array#hash is simple, performs quite well, and produces hash codes with the lowest chance of collisions. It’s the best approach to implementing #hash.

Putting it Together

With both #eql? and #hash in place, the Point, Path, and Employee objects can be used as hash keys:

Here, we use a Hash instance to keep track of a collection of Points. We can also use a Set for this, which uses a Hash under the hood, but provides a nicer API:

Objects used in Sets need to have an implementation of both #eql? and #hash, just like objects used as hash keys.

Objects that perform type coercion, such as Path, can also be used as hash keys, and thus also in sets:

We now have an implementation of equality that works for all kinds of objects.

Mutability, Nemesis of Equality

So far, the examples for value objects have assumed that these value objects are immutable. This is with good reason because mutable value objects are far harder to deal with.

To illustrate this, consider a Point instance used as a hash key:

The problem arises when changing attributes of this point:

Because the hash code is based on the attributes, and an attribute has changed, the hash code is no longer the same. As a result, collection no longer seems to contain the point. Uh oh!

There are no good ways to solve this problem except for making value objects immutable.

This isn’t a problem with entities. This is because the #eql? and #hash methods of an entity are solely based on its explicit identity—not its attributes.

So far, we’ve covered #==, #eql?, and #hash. These three methods are sufficient for a correct implementation of equality. However, we can go further to improve that sweet Ruby developer experience and implement #===.

Case Equality (Triple Equals)

The #=== operator, also called the case equality operator, isn’t really an equality operator at all. Rather, it’s better to think of it as a membership testing operator. Consider the following:
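Presumably something like:

```ruby
(0...50) === 25 # => true
(0...50) === 75 # => false
```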

Here, Range#=== checks whether a range covers a certain element. It’s also common to use case expressions to achieve the same:

This is also where case equality gets its name: the triple-equals operator is called case equality because case expressions use it.

Strictly speaking, you never need to use case. It’s always possible to rewrite a case expression using if and ===. In general, though, case expressions tend to look cleaner. Compare:
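A hypothetical grading example to illustrate the difference:

```ruby
score = 85

# With if and explicit #===:
grade =
  if (0...50) === score
    "fail"
  elsif (50...80) === score
    "pass"
  else
    "distinction"
  end

# With case — the same logic, with less noise:
grade =
  case score
  when 0...50  then "fail"
  when 50...80 then "pass"
  else              "distinction"
  end

grade # => "distinction"
```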

The examples above all use Range#===, to check whether the range covers a certain number. Another commonly used implementation is Class#===, which checks whether an object is an instance of a class:
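For example:

```ruby
Integer === 42      # => true
Integer === "hello" # => false
```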

I’m rather fond of the #grep method, which uses #=== to select matching elements from an array. It can be shorter and sweeter than using #select:

Regular expressions also implement #===. You can use it to check whether a string matches a regular expression:
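For example:

```ruby
/[a-z]/ === "+491573abcde" # => true
/[a-z]/ === "+49157312345" # => false
```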

It helps to think of a regular expression as the (infinite) collection of all strings that can be produced by it. The set of all strings produced by /[a-z]/ includes the example string "+491573abcde". Similarly, you can think of a Class as the (infinite) collection of all its instances, and a Range as the collection of all elements in that range. This way of thinking clarifies that #=== really is a membership testing operator.

An example of a class that could implement #=== is a PathPattern class:

An example instance is"/bin/*"), which matches anything directly under the /bin directory, such as /bin/ruby, but not /var/log.

The implementation of PathPattern#=== uses Ruby’s built-in File.fnmatch to check whether the pattern string matches. Here is an example of it in use:

Worth noting is that File.fnmatch calls #to_str on its arguments. This way, #=== automatically works on other string-like objects as well, such as Path instances:

The PathPattern class implements #===, and therefore PathPattern instances work with case/when, too:

Ordered Comparison

For some objects, it’s useful not only to check whether two objects are the same, but also how they are ordered. Are they larger? Smaller? Consider this Score class, which models the scoring system of my university in Ghent, Belgium.

(I was a terrible student. I’m not sure if this was really how the scoring even worked — but as an example, it will do just fine.)

In any case, we benefit from having such a Score class. We can encode relevant logic there, such as determining the grade and checking whether or not a score is passing. For example, it might be useful to get the lowest and highest score out of a list:

However, as it stands right now, the expressions scores.min and scores.max will result in an error: comparison of Score with Score failed (ArgumentError). We haven’t told Ruby how to compare two Score objects. We can do so by implementing Score#<=>:

An implementation of #<=> returns one of four possible values:

  • It returns 0 when the two objects are equal.
  • It returns -1 when self is less than other.
  • It returns 1 when self is greater than other.
  • It returns nil when the two objects cannot be compared.

The #<=> and #== operators are connected:

  • Property: Given objects a and b. If (a <=> b) == 0, then a == b.
  • Property: Given objects a and b. If (a <=> b) != 0, then a != b.

As before, it’s up to you to ensure that these properties hold when implementing #== and #<=>. Ruby won’t check this for you.

For simplicity, I’ve left out the implementation of Score#== in the Score example above. It’d certainly be good to have that, though.

In the case of Score#<=>, we bail out if other is not a Score, and otherwise, we call #<=> on the two underlying values. We can check that this works: the expression <=> evaluates to -1, which is correct because a score of 6 is lower than a score of 12. (Did you know that the Belgian high school system used to have a scoring system where 1 was the highest and 10 was the lowest? Imagine the confusion!)

With Score#<=> in place, scores.max now returns the maximum score. Other methods such as #min, #minmax, and #sort work as well.

However, we can’t yet use operators like <. The expression scores[0] < scores[1], for example, will raise an undefined method error: undefined method `<' for #<Score:0x00112233 @value=6>. We can solve that by including the Comparable mixin:

By including Comparable, the Score class automatically gains the <, <=, >, and >= operators, which all call <=> internally. The expression scores[0] < scores[1] now evaluates to a boolean, as expected.

The Comparable mixin also provides other useful methods such as #between? and #clamp.

Wrapping Up

We talked about the following topics:

  • the #== operator, used for basic equality, with optional type coercion
  • #equal?, which checks whether two objects are the same instance
  • #eql? and #hash, which are used for testing whether an object is a key in a hash
  • #===, which isn’t quite an equality operator, but rather an “is kind of” or “is member of” operator
  • #<=> for ordered comparison, along with the Comparable module, which provides operators such as < and >=

You now know all you need to know about implementing equality in Ruby. For more information check out the following resources:

The Ruby documentation is a good place to find out more about equality:

I also found the following resources useful:

Denis is a Senior Software Engineer at Shopify. He has made it a habit of thanking ATMs when they give him money, thereby singlehandedly staving off the inevitable robot uprising.

If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.

Lessons Learned From Running Apache Airflow at Scale

By Megan Parker and Sam Wheating

Apache Airflow is an orchestration platform that enables development, scheduling and monitoring of workflows. At Shopify, we’ve been running Airflow in production for over two years for a variety of workflows, including data extractions, machine learning model training, Apache Iceberg table maintenance, and DBT-powered data modeling. At the time of writing, we are currently running Airflow 2.2 on Kubernetes, using the Celery executor and MySQL 8.

System diagram: Shopify’s Airflow Architecture

Shopify’s usage of Airflow has scaled dramatically over the past two years. In our largest environment, we run over 10,000 DAGs representing a large variety of workloads. This environment averages over 400 tasks running at a given moment and over 150,000 runs executed per day. As adoption increases within Shopify, the load incurred on our Airflow deployments will only increase. As a result of this rapid growth, we have encountered a few challenges, including slow file access, insufficient control over DAG (directed acyclic graph) capabilities, irregular levels of traffic, and resource contention between workloads, to name a few.

Below we’ll share some of the lessons we learned and solutions we built in order to run Airflow at scale.

1. File Access Can Be Slow When Using Cloud Storage

Fast file access is critical to the performance and integrity of an Airflow environment. A well-defined strategy for file access ensures that the scheduler can process DAG files quickly and keep your jobs up-to-date.

Airflow keeps its internal representation of its workflows up-to-date by repeatedly scanning and reparsing all the files in the configured DAG directory. These files must be scanned often in order to maintain consistency between the on-disk source of truth for each workload and its in-database representation. This means the contents of the DAG directory must be consistent across all schedulers and workers in a single environment (Airflow suggests a few ways of achieving this).

At Shopify, we use Google Cloud Storage (GCS) for the storage of DAGs. Our initial deployment of Airflow utilized GCSFuse to maintain a consistent set of files across all workers and schedulers in a single Airflow environment. However, at scale this proved to be a bottleneck on performance as every file read incurred a request to GCS. The volume of reads was especially high because every pod in the environment had to mount the bucket separately.

After some experimentation we found that we could vastly improve performance across our Airflow environments by running an NFS (network file system) server within the Kubernetes cluster. We then mounted this NFS server as a read-write-many volume into the worker and scheduler pods. We wrote a custom script which synchronizes the state of this volume with GCS, so that users only have to interact with GCS for uploading or managing DAGs. This script runs in a separate pod within the same cluster. This also allows us to conditionally sync only a subset of the DAGs from a given bucket, or even sync DAGs from multiple buckets into a single file system based on the environment’s configuration (more on this later).

Altogether this provides us with fast file access as a stable, external source of truth, while maintaining our ability to quickly add or modify DAG files within Airflow. Additionally, we can use Google Cloud Platform’s IAM (identity and access management) capabilities to control which users are able to upload files to a given environment. For example, we allow users to upload DAGs directly to the staging environment but limit production environment uploads to our continuous deployment processes.

Another factor to consider when ensuring fast file access when running Airflow at scale is your file processing performance. Airflow is highly configurable and offers several ways to tune the background file processing (such as the sort mode, the parallelism, and the timeout). This allows you to optimize your environments for interactive DAG development or scheduler performance depending on the requirements.

2. Increasing Volumes Of Metadata Can Degrade Airflow Operations

In a normal-sized Airflow deployment, performance degradation due to metadata volume wouldn’t be an issue, at least within the first years of continuous operation.

However, at scale the metadata starts to accumulate pretty fast. After a while this can start to incur additional load on the database. This is noticeable in the loading times of the Web UI and even more so during Airflow upgrades, during which migrations can take hours.

After some trial and error, we settled on a metadata retention policy of 28 days, and implemented a simple DAG which uses ORM (object–relational mapping) queries within a PythonOperator to delete rows from any tables containing historical data (DagRuns, TaskInstances, Logs, TaskRetries, etc). We settled on 28 days as this gives us sufficient history for managing incidents and tracking historical job performance, while keeping the volume of data in the database at a reasonable level.

Unfortunately, this means that features of Airflow which rely on durable job history (for example, long-running backfills) aren’t supported in our environment. This wasn’t a problem for us, but it may cause issues depending on your retention period and usage of Airflow.

As an alternative approach to a custom DAG, Airflow has recently added support for a db clean command which can be used to remove old metadata. This command is available in Airflow version 2.3.

3. DAGs Can Be Hard To Associate With Users And Teams

When running Airflow in a multi-tenant setting (and especially at a large organization), it’s important to be able to trace a DAG back to an individual or team. Why? Because if a job is failing, throwing errors, or interfering with other workloads, we administrators can quickly reach out to the appropriate users.

If all of the DAGs were deployed directly from one repository, we could simply use git blame to track down the job owner. However, since we allow users to deploy workloads from their own projects (and even dynamically generate jobs at deploy-time), this becomes more difficult.

In order to easily trace the origin of DAGs, we introduced a registry of Airflow namespaces, which we refer to as an Airflow environment’s manifest file.

The manifest file is a YAML file where users must register a namespace for their DAGs. In this file, they include information about the jobs’ owners and source GitHub repository (or even source GCS bucket), as well as define some basic restrictions for their DAGs. We maintain a separate manifest per environment and upload it to GCS alongside the DAGs.
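A hypothetical manifest entry might look like this (the field names are illustrative, not Shopify’s actual schema):

```yaml
sample-namespace:
  owners:
    - data-team@example.com
  source: https://github.com/example-org/sample-dags
  constraints:
    queue: sample-namespace-queue
    pools:
      - sample-namespace-pool
    kubernetes_namespaces:
      - sample-namespace
```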

4. DAG Authors Have A Lot Of Power

By allowing users to directly write and upload DAGs to a shared environment, we’ve granted them a lot of power. Since Airflow is a central component of our data platform, it ties into a lot of different systems and thus jobs have wide-ranging access. While we trust our users, we still want to maintain some level of control over what they can and cannot do within a given Airflow Environment. This is especially important at scale as it becomes unfeasible for the Airflow administrators to review all jobs before they make it to production.

In order to create some basic guardrails, we’ve implemented a DAG policy which reads configuration from the previously mentioned Airflow manifest, and rejects DAGs which don’t conform to their namespace’s constraints by raising an AirflowClusterPolicyViolation.

Based on the contents of the manifest file, this policy will apply a few basic restrictions to DAG files, such as:

  • A DAG ID must be prefixed with the name of an existing namespace, for ownership.
  • Tasks in a DAG must only enqueue tasks to the specified Celery queue—more on this later.
  • Tasks in a DAG can only be run in specified pools, to prevent one workload from taking over another’s capacity.
  • Any KubernetesPodOperators in this DAG must only launch pods in the specified namespaces, to prevent access to other namespaces’ secrets.
  • Tasks in a DAG can only launch pods into specified sets of external Kubernetes clusters.

This policy can be extended to enforce other rules (for example, only allowing a limited set of operators), or even mutate tasks to conform to a certain specification (for example, adding a namespace-specific execution timeout to all tasks in a DAG).

Here’s a simplified example demonstrating how to create a DAG policy which reads the previously shared manifest file, and implements the first three of the controls mentioned above:

These validations provide us with sufficient traceability while also creating some basic controls which reduce DAGs’ ability to interfere with each other.

5. Ensuring A Consistent Distribution Of Load Is Difficult

It’s very tempting to use an absolute interval for your DAG’s schedule interval—simply set the DAG to run every timedelta(hours=1), and you can walk away, safely knowing that your DAG will run approximately every hour. However, this can lead to issues at scale.

When a user merges a large number of automatically generated DAGs, or writes a Python file which generates many DAGs at parse-time, all the DAGRuns will be created at the same time. This creates a large surge of traffic which can overload the Airflow scheduler, as well as any external services or infrastructure which the job is utilizing (for example, a Trino cluster).

After a single schedule_interval has passed, all these jobs will run again at the same time, thus leading to another surge of traffic. Ultimately, this can lead to suboptimal resource utilization and increased execution times.

While crontab-based schedules won’t cause these kinds of surges, they come with their own issues. Humans are biased towards human-readable schedules, and thus tend to create jobs which run at the top of every hour, every hour, every night at midnight, etc. Sometimes there’s a valid application-specific reason for this (for example, every night at midnight we want to extract the previous day’s data), but often we have found users just want to run their job on a regular interval. Allowing users to directly specify their own crontabs can lead to bursts of traffic which can impact SLOs and put uneven load on external systems.

As a solution to both these issues, we use a deterministically randomized schedule interval for all automatically generated DAGs (which represent the vast majority of our workflows). This is typically based on a hash of a constant seed such as the dag_id.

The below snippet provides a simple example of a function which generates deterministic, random crontabs which yield constant schedule intervals. Unfortunately, this limits the range of possible intervals, since not all intervals can be expressed as a single crontab. We have not found this restricted choice of schedule intervals to be a problem in practice, and in cases when we really need to run a job every five hours, we just accept that there will be a single four-hour interval each day.
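The snippet itself isn’t reproduced here, but a function in that spirit might look like this (the hashing scheme is an assumption, not Shopify’s actual implementation):

```python
import zlib

def deterministic_crontab(dag_id: str, interval_hours: int = 1) -> str:
    """Derive a pseudo-random but stable crontab from a DAG's id.

    The same dag_id always yields the same schedule, while different
    dag_ids get spread across the hour, smoothing out traffic surges.
    """
    seed = zlib.crc32(dag_id.encode("utf-8"))
    minute = seed % 60
    if interval_hours == 1:
        return f"{minute} * * * *"
    # Offset the starting hour too, then step by the interval. Only
    # intervals that divide 24 evenly yield perfectly constant gaps.
    start_hour = (seed // 60) % interval_hours
    return f"{minute} {start_hour}-23/{interval_hours} * * *"
```

Because the schedule is a pure function of the dag_id, it survives redeploys and scheduler restarts without any stored state.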

Thanks to our randomized schedule implementation, we were able to smooth the load out significantly. The below image shows the number of tasks completed every 10 minutes over a twelve-hour period in our single largest Airflow environment.

Bar graph: Tasks Executed per 10-minute Interval in our Production Airflow Environment

6. There Are Many Points of Resource Contention

There are a lot of possible points of resource contention within Airflow, and it’s really easy to end up chasing bottlenecks through a series of experimental configuration changes. Some of these resource conflicts can be handled within Airflow, while others may require some infrastructure changes. Here are a couple of ways we handle resource contention within Airflow at Shopify:


Pools

One way to reduce resource contention is to use Airflow pools. Pools are used to limit the concurrency of a given set of tasks. These can be really useful for reducing disruptions caused by bursts in traffic. While pools are a useful tool to enforce task isolation, they can be a challenge to manage because only administrators have access to edit them via the Web UI.

We wrote a custom DAG which synchronizes the pools in our environment with the state specified in a Kubernetes Configmap via some simple ORM queries. This lets us manage pools alongside the rest of our Airflow deployment configuration and allows users to update pools via a reviewed Pull Request without needing elevated access. 

Priority Weight

The priority_weight parameter allows you to assign a higher priority to a given task. Tasks with a higher priority will float to the top of the pile to be scheduled first. Although not a direct solution to resource contention, priority_weight can be useful for ensuring that latency-sensitive, critical tasks run before lower-priority tasks. However, given that priority_weight is an arbitrary scale, it can be hard to determine the actual priority of a task without comparing it to all other tasks. We use this to ensure that our basic Airflow monitoring DAG (which emits simple metrics and powers some alerts) always runs as promptly as possible.

It’s also worthwhile to note that by default, the effective priority_weight of a task used when making scheduling decisions is the sum of its own weight and that of all its downstream tasks. What this means is that upstream tasks in large DAGs are often favored over tasks in smaller DAGs. Therefore, using priority_weight requires some knowledge of the other DAGs running in the environment.

Celery Queues and Isolated Workers

If you need your tasks to execute in separate environments (for example, dependencies on different python libraries, higher resource allowances for intensive tasks, or differing level of access), you can create additional queues which a subset of jobs submit tasks to. Separate sets of workers can then be configured to pull from separate queues. A task can be assigned to a separate queue using the queue argument in operators. To start a worker which runs tasks from a different queue, you can use the following command:

airflow celery worker --queues <list of queues>

This can help ensure that sensitive or high-priority workloads have sufficient resources, as they won’t be competing with other workloads for worker capacity.

Any combination of pools, priority weights and queues can be useful in reducing resource contention. While pools allow for limiting concurrency within a single workload, a priority_weight can be used to make individual tasks run at a lower latency than others. If you need even more flexibility, worker isolation provides fine-grained control over the environment in which your tasks are executed.

It’s important to remember that not all resources can be carefully allocated in Airflow—scheduler throughput, database capacity and Kubernetes IP space are all finite resources which can’t be restricted on a workload-by-workload basis without the creation of isolated environments.

Going Forward…

There are many considerations that go into running Airflow with such high throughput, and any combination of solutions can be useful. We’ve learned a ton and we hope you’ll remember these lessons and apply some of our solutions in your own Airflow infrastructure and tooling.

To sum up our key takeaways:

  • A combination of GCS and NFS allows for both performant and easy to use file management.
  • Metadata retention policies can reduce degradation of Airflow performance.
  • A centralized metadata repository can be used to track DAG origins and ownership.
  • DAG Policies are great for enforcing standards and limitations on jobs.
  • Standardized schedule generation can reduce or eliminate bursts in traffic.
  • Airflow provides multiple mechanisms for managing resource contention.

What’s next for us? We’re currently working on applying the principles of scaling Airflow in a single environment as we explore splitting our workloads across multiple environments. This will make our platform more resilient, allow us to fine-tune each individual Airflow instance based on its workloads’ specific requirements, and reduce the reach of any one Airflow deployment.

Got questions about implementing Airflow at scale? You can reach out to either of the authors on the Apache Airflow slack community.

Megan has worked on the data platform team at Shopify for the past 9 months where she has been working on enhancing the user experience for Airflow and Trino. Megan is located in Toronto, Canada where she enjoys any outdoor activity, especially biking and hiking.

Sam is a Senior developer from Vancouver, BC who has been working on the Data Infrastructure and Engine Foundations teams at Shopify for the last 2.5 years. He is an internal advocate for open source software and a recurring contributor to the Apache Airflow project.

Interested in tackling challenging problems that make a difference? Visit our Data Science & Engineering career page to browse our open positions. You can also contribute to Apache Airflow to improve Airflow for everyone.

Asynchronous Communication is the Great Leveler in Engineering

In March 2020—the early days of the pandemic—Shopify transitioned to become a remote-first company. We call it being Digital by Design. We are now proud to employ Shopifolk around the world.

Not only has being Digital by Design allowed our staff the flexibility to work from wherever they work best, it has also increased the amount of time that they are able to spend with their families, friends, or social circles. In the pre-remote world, many of us had to move far away from our hometowns in order to get ahead in our careers. However, recently, my own family has been negotiating with the reality of aging parents, and remote working has allowed us to move back closer to home so we are always there for them whilst still doing the jobs we love.

However, being remote isn’t without its challenges. Much of the technology industry has spent decades working in colocated office space. This has formed habits in all of us that aren’t compatible with effective remote work. We’re now on a journey of mindfully unraveling these default behaviors and replacing them with remote-focused ways of working.

If you’ve worked in an office before, you’ll be familiar with synchronous communication and how it forms strong bonds between colleagues: a brief chat in the kitchen whilst getting a coffee, a discussion at your desk that was prompted by a recent code change, or a conversation over lunch.

With the layout of physical office spaces encouraging spontaneous interactions, context could be gathered and shared through osmosis with little specific effort—it just happened.

There are many challenges that you face in engineering when working on a globally distributed team. Not everyone is online at the same time of the day, meaning that it can be harder to get immediate answers to questions. You might not even know who to ask when you can’t stick your head up and look around to see who is at their desks. You may worry about ensuring that the architectural direction that you’re about to take in the codebase is the right one when building for the long term—how can you decide when you’re sitting at home on your own?

Teams have had to shift to using a different toolbox of skills now that everyone is remote. One such skill is the shift to more asynchronous communication: an essential glue that holds a distributed workforce together. It’s inclusive of teams in different time zones, it leaves an audit trail of communication and decision making, it encourages us to communicate concisely, and it gives everybody the same window into the company, regardless of where they are in the world.

However, an unstructured approach can be challenging, especially when teams are working on establishing their communication norms. It helps to have a model with which to reason about how best to communicate for a given purpose and to understand what the side effects of that communication might be.

The Spectrum of Synchronousness

When working remotely, we have to adapt to a different landscape. The increased flexibility of working hours, the ability to find flow and do deep work, and the fact that our colleagues are in different places means we can’t rely on the synchronous, impromptu interactions of the office anymore. We have to navigate a continuum of communication choices between synchronous and asynchronous, choosing the right way to communicate for the right group at the right time.

It’s possible to represent different types of communication on a spectrum, as seen in the diagram below.

A diagram showing spectrum of communication types with the extremes being Synchronous Impermanent Connect and Asynchronous Permanent Disconnected
The spectrum of communication types

Let’s walk the spectrum from left to right—from synchronous to asynchronous—to understand the kinds of choices that we need to make when communicating in a remote environment.

  • Video calls and pair programming are completely synchronous: all participants need to be online at the same time.
  • Chats are written and can be read later, but due to their temporal nature have a short half-life. Usually there’s an expectation that they’ll be read or replied to fairly quickly, else they’re gone.
  • Recorded video is more asynchronous; however, it’s typically used to broadcast information or complement a longer document, and its relevance can fade rapidly.
  • Email is archival and permanent and is typically used for important communication. People may take many days to reply or not reply at all.
  • Written documents are used for technical designs, in-depth analysis, or cornerstones of projects. They may be read many years after they were written but need to be maintained and often represent a snapshot in time.
  • Wikis and READMEs are completely asynchronous, and if well-maintained, can last and be useful forever.

Shifting to Asynchronous

When being Digital by Design, we have to be intentionally more asynchronous. It’s a big relearning of how to work collaboratively. In offices, we could get by synchronously, but there was a catch: colleagues at home, on vacation, or in different offices would have no idea what was going on. Now that we’re all in that position, we have to adapt in order to harness all of the benefits of working with a global workforce.

By treating everyone as remote, we typically write as a primary form of communication so that all employees can have access to the same information wherever they are. We replace meetings with asynchronous interactions where possible so that staff have more flexibility over their time. We record and rebroadcast town halls so that staff in other timezones can experience the feeling of watching them together. We document our decisions so that others can understand the archeology of codebases and projects. We put effort into editing and maintaining our company-wide documentation in all departments, so that all employees have the same source of truth about teams, the organization chart, and projects.

This shift is challenging, but it’s worthwhile: effective asynchronous collaboration is how engineers solve hard problems for our merchants at scale, collaborating as part of a global team. Whiteboarding sessions have been replaced with the creation of collaborative documents in tools such as Miro. In-person Town Halls have been replaced with live streamed events that are rebroadcast in different time zones with commentary and interactions taking place in Slack. The information that we all have in our heads has needed to be written, recorded, and documented. Even with all of the tools provided, it requires a total mindset shift to use them effectively.

We’re continually investing in our developer tools and build systems to enable our engineers to contribute to our codebases and ship to production any time, no matter where they are. We’re also investing in internal learning resources and courses so that new hires can autonomously level up their skills and understand how we ship software. We have regular broadcasts of show and tell and demo sessions so that we can all gather context on what our colleagues are building around the world. And most importantly, we take time to write regular project and mission updates so that everyone in the company can feel the pulse of the organization.

Asynchronous communication is the great leveler: it connects everyone together and treats everyone equally.

Permanence in Engineering

In addition to giving each employee the same window into our culture, asynchronous communication also has the benefit of producing permanent artifacts. These could be written documents, pull requests, emails, or videos. As per our diagram, the more asynchronous the communication, the more permanent the artifact. Therefore, shifting to asynchronous communication means that not only are teams able to be effective remotely, but they also produce archives and audit trails for their work.

The whole of Shopify uses a single source of truth—an internal archive of information called the Vault. Here, Shopifolk can find all of the information that they need to get their work done: information on teams, projects, the latest news and video streams, internal podcasts and blog posts. Engineers can quickly find architecture diagrams and design documents for active projects.

When writing design documents for major changes to the codebase, a team produces an archive of their decisions and actions through time. By producing written updates to projects every week, anyone in the company can capture the current context and where it was derived from. By recording team meetings and taking detailed minutes, those who were unable to attend can catch up later on demand. A shift to asynchronous communication implies a shift to permanence of communication, which is beneficial for discovery, reflection, and understanding.

For example, when designing new features and architecture, we collaborate asynchronously on design documents via GitHub. New designs are raised as issues in our technical designs repository, which means that all significant changes to our codebase are reviewed, ratified and archived publicly. This mirrors how global collaboration works on the open source projects we know and love. Working so visibly can be intimidating for those that haven’t done it before, so we ensure that we mentor and pair with those that are doing it for the first time.

Establishing Norms and Boundaries

Yet, multiple mediums of communication incur many choices in how to use them effectively. When you have the option to communicate via chat, email, collaborative document or GitHub issue, picking the right one can become overwhelming and frustrating. Therefore we encourage our teams to establish their preferred norms and to write them down. For example:

  • What are the response time expectations within a team for chat versus email?
  • How are online working hours clearly discoverable for each team member?
  • How is consensus reached on important decisions?
  • Is a synchronous meeting ever necessary?
  • What is the correct etiquette for “raising your hand” in video calls?
  • Where are design documents stored so they’re easily accessible in the future?

By agreeing upon the right medium to use for given situations, teams can work out what’s right for them in a way that supports flexibility, autonomy, and clarity. If you’ve never done this in your team, give it a go. You’ll be surprised how much easier it makes your day to day work.

The norms that our teams define bridge both synchronous and asynchronous expectations. At Shopify, my team members ensure that they make the most of the windows of overlap that they have each day, setting aside time to be interruptible for pair programming, impromptu chats and meetings, and collaborative design sessions. Conversely, the times of the day when teams have less overlap are equally important. Individuals are encouraged to block out time in the calendar, turn on their “do not disturb” status, and find the space and time to get into a flow state and be productive.

A natural extension of these communication norms covers writing and shipping code. Given that our global distribution of staff can incur delays in reviewing, merging, and deploying, teams are encouraged to explore and define how they reach alignment and get things shipped. This can range from prioritizing reviews of pull requests created in other time zones first thing in the morning, before getting on with your own work, to finding additional engineers outside the team but in the same time zone to lean in and offer support, review, and pairing.

Maintaining Connectivity

Once you get more comfortable with working ever more asynchronously, it can be tempting to want to make everything asynchronous: stand-ups on Slack, planning in Miro, all without needing anyone to be on a video call at all. However, if we look back at the diagram one more time, we’ll see that there’s an important third category: connectivity. Humans are social beings, and feeling connected to other, real humans—not just avatars—is critical to our wellbeing. This means that when shifting to asynchronous work we also need to ensure that we maintain that connection. Sometimes having a synchronous meeting can be a great thing, even if it’s less efficient—the ability to see other faces, and to chat, can’t be taken for granted.

We actively work to ensure that we remain connected to each other at Shopify. Pair programming is a core part of our engineering culture, and we love using Tuple to solve problems collaboratively, share context about our codebases, and provide an environment to help hundreds of new engineers onboard and gain confidence working together with us.

We also strongly advocate for plenty of time to get together and have fun. And no, I’m not talking about generic, awkward corporate fun. I’m talking about hanging out with colleagues and throwing things at them in our very own video game: Shopify Party (our internal virtual world for employees to play games or meet up). I’m talking about your team spending Friday afternoon playing board games together remotely. And most importantly, I’m talking about teams coming together, at least twice a year, in spaces we’ve invested in around the world for meaningful and intentional moments of brainstorming, team building, planning, and establishing connections offline.

Asynchronous brings efficiency, and synchronous brings connectivity. We’ve got both covered at Shopify.

James Stanier is Director of Engineering at Shopify. He is also the author of Become an Effective Software Manager and Effective Remote Work. He holds a Ph.D. in computer science.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

Double Entry Transition Tables: How We Track State Changes At Shopify

Recently we launched Shopify Balance, a money management account and card that gives Shopify merchants quick access to their funds with no fees. After the beta launch of Shopify Balance, the Shopify Data team was brought in to answer the question: how do we reliably count the number of merchants using Balance? In particular, how do we count this historically?

While this sounds like a simple question, it’s foundationally critical to knowing if our product is a success and if merchants are actually using it. It’s also more complicated than it seems to answer.

To be considered as using Shopify Balance, a merchant has to have both an active Shopify Balance account and an active Shopify account. This means we needed to build something to track the state changes of both accounts simultaneously, and make that tracking robust and reliable over time. Enter double entry transition tables. While very much an “invest up front and save a ton of time in the long run” strategy, double entry transition tables give us the flexibility to see the individual inputs that cause a given change. It does all of this while simplifying our queries and reducing long term maintenance on our reporting.

In this post, we’ll explore how we built a data pipeline using double entry transition tables to answer our question: how many Shopify merchants are using Shopify Balance? We’ll go over how we designed something that scales as our product grows in complexity, the benefits of using double entry transition tables—from ease of use to future proofing our reporting—and some sample queries using our new table.

What Are Double Entry Transition Tables?

Double entry transition tables are essentially a data presentation format that tracks changes in attributes of entities over time. At Shopify, one of the first uses of a double entry transition table was to track the state of merchants using the platform, allowing us to report on how many merchants have active accounts. In comparison to a standard transition table that has from and to columns, double entry transition tables output two rows for each state change, along with a new net_change column. They can also combine many individual tracked attributes into a single output.

It took me a long time to wrap my head around this net_change column, but it essentially works like this: every time the status of something you’re tracking changes from one state to another, there will be two entries:

  1. net_change = -1: this row is the previous state
  2. net_change = +1: this row is the new state

Double entry transition tables have many advantages including:

  • The net_change column is additive: this is the true benefit of using this type of table. This allows you to quickly get the number of entities that are in a certain state by summing up net_change while filtering for the state you care about.
  • Identifying cause of change: for situations where you care about an overall status (one that depends on several underlying statuses), you can go into the table and see which of the individual attributes caused the change.
  • Preserving all timing information: the output preserves all timing information, and even correctly orders transitions that have identical timestamps. This is helpful for situations where you need to know something like the duration of a given status.
  • Easily scaled with additional attributes: if the downstream dependencies are written correctly, you can add additional attributes to your table as the product you’re tracking grows in complexity. The bonus is that you don’t have to rewrite any existing SQL or PySpark, all thanks to the additive nature of the net_change column.
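To make the two-row pattern and that additivity concrete, here’s a plain-Python sketch with made-up data (the real table lives in our warehouse and is queried with SQL or PySpark):

```python
# Each state change produces two rows: net_change=-1 closes the previous
# state and net_change=+1 opens the new one. Made-up data: one account is
# created inactive, becomes active, then goes inactive again.
rows = [
    {"account_id": 1, "status": "inactive", "net_change": +1},  # created
    {"account_id": 1, "status": "inactive", "net_change": -1},  # leaves inactive
    {"account_id": 1, "status": "active",   "net_change": +1},  # becomes active
    {"account_id": 1, "status": "active",   "net_change": -1},  # leaves active
    {"account_id": 1, "status": "inactive", "net_change": +1},  # inactive again
]

def count_in_state(rows, status):
    # Because net_change is additive, the number of accounts currently in a
    # state is just the sum of net_change over rows in that state.
    return sum(r["net_change"] for r in rows if r["status"] == status)
```

Summing over all five rows gives zero active accounts and one inactive account; summing over only the first three (as of the activation) gives one active account.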

For our purpose of identifying how many merchants are using Shopify Balance, double entry transition tables allow us to track state changes for both the Shopify Balance account and the Shopify account in a single table. It also gives us a clean way to query the status of each entity over time. But how do we do this?

Building Our Double Entry Transition Pipelines

First, we need to prepare individual attribute tables to be used as inputs for our double entry transition data infrastructure. We need at least one attribute, but it can scale to any number of attributes as the product we’re tracking grows.

In our case, we created individual attribute tables for both the Shopify Balance account status and the Shopify account status. An attribute input table must have a specific set of columns:

  • a partition key that’s common across attributes, which in our case is an account_id
  • a sort key, generally a transition_at timestamp and an index
  • an attribute you want to track.

We can convert a standard transition table into an attribute table with a simple PySpark job:
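As a plain-Python sketch of what that job does (the production version is PySpark; the column names are illustrative, and the row_number window function is mimicked here with a sort and groupby):

```python
from itertools import groupby

def to_attribute_table(transitions):
    # Order rows within each (account_id, transition_at) group by
    # transition_id, mimicking a row_number() window function.
    ordered = sorted(
        transitions,
        key=lambda r: (r["account_id"], r["transition_at"], r["transition_id"]),
    )
    out = []
    for _, group in groupby(ordered, key=lambda r: (r["account_id"], r["transition_at"])):
        for i, row in enumerate(group, start=1):
            out.append({
                "account_id": row["account_id"],
                "transition_at": row["transition_at"],
                "index": i,                          # tiebreaker for duplicate timestamps
                "balance_status": row["to_status"],  # the attribute we track
            })
    return out
```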

Note the index column. We created this index using a row number window function, ordering by the transition_id any time we have duplicate account_id and transition_at sets in our original data. While simple, it serves as a tiebreaker should there be two transition events with identical timestamps. This ensures we always have a unique account_id, transition_at, index set in our attribute for correct ordering of events. The index plays a key role later on when we create our double entry transition table, ensuring we’re able to capture the order of our two states.

Our Shopify Balance status attribute table showing a merchant that joined and left Shopify Balance.

Now that we have our two attribute tables, it’s time to feed these into our double entry transition pipelines. This system (called build merge state transitions) takes our individual attribute tables and first generates a combined set of unique rows using a partition_key (in our case, the account_id column), and a sort_key (in our case, the transition_at and index columns). It then creates one column per attribute, and fills in the attribute columns with values from their respective tables, in the order defined by the partition_key and sort_key. Where values are missing, it fills in the table using the previous known value for that attribute. Below you can see two example attributes being merged together and filled in:

Two example attributes merged into a single output table.

This table is then run through another process that creates our net_change column and assigns a +1 value to all current rows. It also inserts a second row for each state change with a net_change value of -1. This net_change column now represents the direction of each state change as outlined earlier.

Thanks to our pipeline, setting up a double entry transition table is a very simple PySpark job:
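A plain-Python sketch of what the pipeline does under the hood, including the default values used to seed each attribute (the production version is PySpark; the names and the exact shape of the -1 rows are illustrative):

```python
def merge_attributes(attr_tables, defaults):
    """Sketch of the double entry transition pipeline.
    attr_tables: {attr_name: rows with account_id, transition_at, index, value}
    defaults: initial value per attribute, e.g. {"balance_status": "not_on_balance"}.
    """
    # 1. Collect the unique (partition_key, sort_key) rows across all attributes.
    keys = sorted({(r["account_id"], r["transition_at"], r["index"])
                   for rows in attr_tables.values() for r in rows})
    lookup = {name: {(r["account_id"], r["transition_at"], r["index"]): r["value"]
                     for r in rows}
              for name, rows in attr_tables.items()}
    # 2. Fill in one column per attribute, carrying the previous known value
    #    forward and starting from the defaults.
    filled, current = [], {}
    for account_id, transition_at, index in keys:
        state = current.setdefault(account_id, dict(defaults))
        for name in attr_tables:
            value = lookup[name].get((account_id, transition_at, index))
            if value is not None:
                state[name] = value
        filled.append({"account_id": account_id, "transition_at": transition_at,
                       "index": index, **state})
    # 3. Emit double entries: a -1 row re-stating the previous state and a
    #    +1 row for the new state.
    out, prev = [], {}
    for row in filled:
        if row["account_id"] in prev:
            out.append({**prev[row["account_id"]], "net_change": -1})
        out.append({**row, "net_change": +1})
        prev[row["account_id"]] = row
    return out
```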

Note in the code above we’ve specified default values. These are used to fill in the initial null values for the attributes. Below is the output of our final double entry transition table, which we call our account_transition_facts table. The table captures both a merchant’s Shopify and Shopify Balance account statuses over time. Looking at the shopify_status column, we can see they went from inactive to active in 2018, while the balance_status column shows us that they went from not_on_balance to active on March 14, 2021, and subsequently from active to inactive on April 23, 2021:

A merchant that joined and left Shopify Balance in our account_transition_facts double entry transition table.

Using Double Entry Transition Tables

Remember how I mentioned that the net_change column is additive? This makes working with double entry transition tables incredibly easy. The ability to sum the net_change column significantly reduces the SQL or PySpark needed to get counts of states. For example, using our new account_transition_facts table, we can identify the total number of active accounts on Shopify Balance, using both the Shopify Balance status and Shopify status. All we have to do is sum our net_change column while filtering for the attribute statuses we care about:
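In plain-Python terms (the production query is SQL or PySpark; the status values are illustrative):

```python
def active_balance_accounts(rows):
    # Total merchants using Balance: sum net_change over rows where both
    # the Shopify account and the Balance account are active.
    return sum(r["net_change"] for r in rows
               if r["shopify_status"] == "active" and r["balance_status"] == "active")
```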

Add in a grouping on a date column and we can see the net change in accounts over time:
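Sketched the same way, grouping on a hypothetical transition_date column derived from transition_at:

```python
from collections import defaultdict

def net_change_by_day(rows):
    # Net change in active Balance accounts per day. The transition_date
    # column is a hypothetical date derived from transition_at.
    daily = defaultdict(int)
    for r in rows:
        if r["shopify_status"] == "active" and r["balance_status"] == "active":
            daily[r["transition_date"]] += r["net_change"]
    return dict(daily)
```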

We can even use the output in other PySpark jobs. Below is an example of a PySpark job consuming the output of our account_transition_facts table. In this case, we are adding the daily net change in account numbers to an aggregate daily snapshot table for Shopify Balance:
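A plain-Python sketch of that consumption step (the real job is PySpark; the snapshot column names are hypothetical):

```python
def add_net_change_to_snapshot(snapshot_rows, daily_net_change):
    # Attach the daily net change in Balance accounts to each row of an
    # aggregate daily snapshot table, defaulting to 0 on quiet days.
    return [{**row, "net_account_change": daily_net_change.get(row["date"], 0)}
            for row in snapshot_rows]
```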

There are many ways you can achieve the same outputs using SQL or PySpark, but having a double entry transition table in place significantly simplifies the code at query time. And as mentioned earlier, if you write the code using the additive net_change column, you won’t need to rewrite any SQL or PySpark when you add more attributes to your double entry transition table.

We won’t lie, it took a lot of time and effort to build the first version of our account_transition_facts table. But thanks to our investment, we now have a reliable way to answer our initial question: how do we count the number of merchants using Balance? It’s easy with our double entry transition table! Grouping by the status we care about, simply sum net_change and voilà, we have our answer.

Not only does our double entry transition table simply and elegantly answer our question, but it also easily scales with our product. Thanks to the additive nature of the net_change column, we can add additional attributes without impacting any of our existing reporting. This means this is just the beginning for our account_transition_facts table. In the coming months, we’ll be evaluating other statuses that change over time, and adding in those that make sense for Shopify Balance into our table. Next time you need to reliably count multiple states, try exploring double entry transition tables.

Justin Pauley is a Data Scientist working on Shopify Balance. Justin has a passion for solving complex problems through data modeling, and is a firm believer that clean data leads to better storytelling. In his spare time he enjoys woodworking, building custom Lego creations, and learning new technologies. Justin can be reached via LinkedIn.

Are you passionate about data discovery and eager to learn more? We’re always hiring! Reach out to us or apply on our careers page.

If you’re interested in building solutions from the ground up and would like to come work with us, please check out Shopify’s career page.

Shopify Invests in Research for Ruby at Scale

Shopify is continuing to invest in Ruby on Rails at scale. We’ve taken that further recently by funding high-profile academics to focus their work on Ruby and the needs of the Ruby community. Over the past year we have given nearly half a million dollars in gifts to influential researchers we trust to make a significant impact on the Ruby community for the long term.

Shopify engineers and researchers at a recent meetup in London

We want developments in programming languages and their implementations to be explored in Ruby, so that support for Ruby's unique properties is built in from the start. For example, Ruby's prevalent metaprogramming motivated a whole new kind of inline caching to be developed and presented as a paper at one of the top programming language conferences, and Ruby's unusually loose C extension API motivated a new kind of C interpreter to run virtualized C. These innovations wouldn't have happened if academics weren't looking at Ruby.

We want programming language research to be evaluated against the workloads that matter to companies using Ruby. We want researchers to understand the scale of our code bases, how frequently they're deployed, and the code patterns we use in them. For example, a lot of VM research over the last couple of decades has traded off a long warmup optimization period for better peak performance, but this doesn't work for companies like Shopify where we're redeploying very frequently. Researchers aren't aware of these kinds of problems unless we partner with them and guide them.

We think that working with academics like this will be self-perpetuating. With key researchers thinking and talking about Ruby, more early career researchers will consider working with Ruby and solving problems that are important to the Ruby community.

Let’s meet Shopify’s new research collaborators.

Professor Laurence Tratt

Professor Laurence Tratt describes his vision for optimizing Ruby

Professor Laurence Tratt is the Shopify and Royal Academy of Engineering Research Chair in Language Engineering at King’s College London. Jointly funded by Shopify, the Royal Academy, and King’s College, Laurie is looking at the possibility of automatically generating a just-in-time compiler from the existing Ruby interpreter through hardware meta-tracing and basic-block stitching.

Laurie has an eclectic and influential research portfolio, and extensive writing on many aspects of improving dynamic languages and programming. He has context from the Python community and the groundbreaking work towards meta-tracing in the PyPy project. Laurie also works to build the programming language implementation community for the long term by co-organising a summer school series for early career researchers, bringing them together with experienced researchers from academia and industry.

Professor Steve Blackburn

Professor Steve Blackburn is building a new model for applied garbage collection

Professor Steve Blackburn is an academic at the Australian National University and Google Research. Shopify funded his group’s work on MMTk, the memory management toolkit, a general library for garbage collection that brings together proven garbage collection algorithms with a framework for research into new ideas for garbage collection. We’re putting MMTk into Ruby so that Ruby can get the best current collectors today and future garbage collectors can be tested against Ruby.

Steve is a world-leading expert in garbage collection, and Shopify’s funding is putting Ruby’s unique requirements for memory management into his focus.

Dr Stefan Marr

Dr Stefan Marr is an expert in benchmarking dynamic language implementations

Dr Stefan Marr is a Senior Lecturer at the University of Kent in the UK and a Royal Society Industrial Fellow. With the support of Shopify, he’s examining how we can make interpreters faster and improve interpreter startup and warmup time.

Stefan has a distinguished reputation for benchmarking techniques, differential analysis between languages and implementation techniques, and dynamic language implementation. He co-invented a new method for inline caching that has been instrumental for improving the performance of Ruby’s metaprogramming in TruffleRuby.

Shopify engineers and research collaborators discuss how to work together to improve Ruby

We’ve been bringing together the researchers that we’re funding with our senior Ruby community engineers to share their knowledge of what’s already possible and what could be possible, combining our understanding of how Ruby and Rails are used at scale today and what the community needs.

These external researchers are all in addition to our own internal teams doing publishable research-level work on Ruby, through YJIT, TruffleRuby, and other efforts.

Part of Shopify’s Ruby and Rails Infrastructure Team listening to research proposals

We look forward to sharing more about our investments in Ruby research over the coming years in blog posts and academic papers.

Chris Seaton has a PhD in optimizing Ruby and works on TruffleRuby, a highly optimizing implementation of Ruby, and research projects at Shopify.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

Maestro: The Orchestration Language Powering Shopify Flow

Adagio misterioso

Shopify recently unveiled a new version of Shopify Flow. Merchants extensively use Flow’s workflow language and associated execution engine to customize Shopify, automate tedious, repetitive tasks, and focus on what matters. Flow comes with a comprehensive library of templates for common use cases, and detailed documentation to guide merchants in customizing their workflows.

For the past couple of years my team has been working on transitioning Flow from a successful Shopify Plus App into a platform designed to power the increasing automation and customization needs across Shopify. One of the main technical challenges we had to address was the excessive coupling between the Flow editor and engine. Since they shared the same data structures, the editor and engine couldn't evolve independently, and we had limited ability to tailor these data structures for their particular needs. This problem was significant because editor and engine have fundamentally very different requirements.

The Flow editor provides a merchant-facing visual workflow language. Its language must be declarative, capturing the merchant’s intent without dealing with how to execute that intent. The editor concerns itself mainly with usability, understandability, and interactive editing of workflows. The Flow engine, in turn, needs to efficiently execute workflows at scale in a fault-tolerant manner. Its language can be more imperative, but it must have good support for optimizations and have at-least-once execution semantics that ensures workflow executions recover from crashes. However, editor and engine also need to play together nicely. For example, they need to agree on the type system, which is used to find user errors and to support IDE-like features, such as code completion and inline error reporting within the visual experience.

We realized it was important to tackle this problem right away, and it was crucial to get it right while minimizing disruptions to merchants. We proceeded incrementally.

First, we designed and implemented a new domain-specific orchestration language that addressed the requirements of the Flow engine. We call this language Maestro. We then implemented a new, horizontally scalable engine to execute Maestro orchestrations. Next, we created a translation layer from original Flow workflow data structures into Maestro orchestrations. This allowed us to execute existing Flow workflows with the new engine. At last, we slowly migrated all Flow workflows to use the new engine, and by BFCM 2020 essentially all workflows were executing in the new engine.

We were then finally in a position to deal with the visual language. So we implemented a brand new visual experience, including a new language for the Flow editor. This language is more flexible and expressive than the original, so all existing workflows could be migrated easily. The language can also be translated into Maestro orchestrations, so it can be executed directly by the new engine. Finally, once we were satisfied with the new experience, we started migrating existing Flow workflows, and by early 2022, all Flow workflows had been migrated to the new editor and new engine.

In the remainder of this post I want to focus on the new orchestration language, Maestro. I’ll give you an overview of its design and implementation, and then focus on how it neatly integrates with and addresses the requirements of the new version of Shopify Flow.


A Sample of Maestro

Allegro grazioso

Let’s take a quick tour to get a taste of what Maestro looks like and what exactly it does. Maestro isn’t a general purpose programming language, but rather an orchestration language focused solely on coordinating the sequence in which calls to functions in some host language are made, while capturing which data is passed between those function calls. For example, suppose you want to implement code that calls a remote service to fetch some relevant customers and then deletes those customers from the database. The Maestro language can’t implement the remote service call or the database call itself, but it can orchestrate those calls in a fault-tolerant fashion. The main benefit of using Maestro is that the state of the execution is precisely captured and can be made durable, so you can observe the progression and restart where you left off in the presence of crashes.

The following Maestro code, slightly simplified for presentation, implements an orchestration similar to the example above. It first defines the shape of the data involved in the orchestration: an object type called Customer with a few attributes. It then defines three functions. Function fetch_customers takes no parameters and returns an array of Customers. Its implementation simply performs an HTTP GET request to the appropriate service. The delete_customer function, in this example, simulates the database deletion by calling the print function from the standard library. The orchestration function represents the main entry point. It uses the sequence expression to coordinate the function calls: first call fetch_customers, binding the result to the customers variable, then map over the customers calling delete_customer on each.

Maestro functions declare interfaces to encapsulate expressions: the bodies of fetch_customers and delete_customer are call expressions, and the body of orchestration is a sequence expression that composes other expressions. But at some point we must yield to the host language to implement the actual service request, database call, print, and so on. This is accomplished by a function whose body is a primitive expression, meaning it binds to the host language code registered under the declared key. For example, these are the declarations of the get and print functions from the standard library of our Ruby implementation:
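As a rough illustration of how a host might bind primitive keys to native code, here is a minimal Ruby sketch. Everything in it (the PRIMITIVES table, the "io.print" and "net.http.get" keys, the method names) is hypothetical and not Maestro's actual API:

```ruby
require "net/http"
require "uri"

# Hypothetical registry mapping primitive keys to host-language blocks.
PRIMITIVES = {}

def register_primitive(key, &impl)
  PRIMITIVES[key] = impl
end

# A print primitive backed by the Ruby standard library.
register_primitive("io.print") { |args| $stdout.puts(args.fetch(:message)) }

# A GET primitive backed by Net::HTTP.
register_primitive("net.http.get") { |args| Net::HTTP.get(URI(args.fetch(:url))) }

# The interpreter would dispatch a primitive expression by its declared key.
def call_primitive(key, args)
  PRIMITIVES.fetch(key).call(args)
end
```

A function whose body is a primitive expression would then resolve its declared key against such a registry when the interpreter reaches it.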

We can now use the Maestro interpreter to execute the orchestration function. This is one possible simplified output from the command line:

The output contains the result of calling print twice, once for each of the customers returned by the fetch service. The interesting aspect here is that the -c flag instructed the interpreter to also dump checkpoints to the standard output.

Checkpoints are what Maestro uses to store execution state. They contain enough information to understand what has already happened in the orchestration and what wasn’t completed yet. For example, the first checkpoint contains the result of the service request that includes a JSON object with the information about customers to delete. In practice, checkpoints are sent to durable storage, such as Kafka, Redis, or MySQL. Then, if the interpreter stops for some reason, we can restart and point it to the existing checkpoints. The interpreter can recover by skipping expressions for which a checkpoint already exists. If we crash while deleting customers from the database, for example, we wouldn’t re-execute the fetch request because we already have its result.
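The recovery rule can be sketched in a few lines of Ruby. The Checkpoints class and step names below are invented for illustration; the real engine persists checkpoints to durable storage rather than an in-memory hash:

```ruby
# Toy in-memory checkpoint store: step name => recorded result.
class Checkpoints
  def initialize(store = {})
    @store = store
  end

  # Run the block only if this step has no checkpoint yet; otherwise
  # skip re-execution and return the previously recorded result.
  def run(step)
    return @store[step] if @store.key?(step)
    @store[step] = yield
  end
end

fetches = 0
cp = Checkpoints.new
customers = cp.run("fetch_customers") { fetches += 1; [{ id: 1 }, { id: 2 }] }

# Simulated crash and restart against the same checkpoints:
# the fetch is not re-executed, its recorded result is reused.
recovered = cp.run("fetch_customers") { fetches += 1; [] }
```

After the restart, fetches is still 1 and recovered equals the original result, which is exactly the skip-what-already-happened behavior described above.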

The checkpoints mechanism allows Maestro to provide at-least-once semantics for primitive calls, exactly what’s expected of Shopify Flow workflows. In fact, the new Flow engine, at a high level, is essentially a horizontally scalable, distributed pool of workers that execute the Maestro interpreter on incoming events for orchestrations generated by Flow. Checkpoints are used for fault tolerance as well as to give merchants feedback on each execution step, current status, and so on.

Flow and Maestro Ensemble

Presto festoso

Now that we know what Maestro is capable of, let’s see how it plays together with Flow. The following workflow, for example, shows a typical Flow automation use case. It triggers when orders in a store are created and checks for certain conditions in that order, based on the presence of discount codes or the customer’s email. If the condition predicate matches successfully, it adds a tag to the order and subsequently sends an email to the store owner to alert them about the discount.

Screenshot of the Flow app showing the visualization of creating a workflow based on conditions
A typical Flow automation use case

Consider a merchant using the Flow App to create and execute this workflow. There are four main activities involved:

  1. navigating the possible tasks and types to use in the workflow
  2. validating that the workflow is correct
  3. activating the workflow so it starts executing on events
  4. monitoring executions.

Catalog of Tasks and Types

The Flow Editor displays a catalog of tasks for merchants to pick from. Those are triggers, conditions, and actions provided both by Shopify and Shopify Apps via Shopify Flow Connectors. Furthermore, Flow allows merchants to navigate Shopify’s GraphQL Admin API objects in order to select relevant data for the workflow. For example, the Order created trigger in this workflow conceptually brings an Order resource that represents the order that was just created. So, when the merchant is defining a condition or passing arguments to actions, Flow assists in navigating the attributes reachable from that Order object. To do so, Flow must have a model of the GraphQL API types and understand the interface expected and provided by tasks. Flow achieves this by building on top of Maestro types and functions, respectively.

Flow models types as decorated Maestro types: the structure is defined by Maestro types, but Flow adds information, such as field and type descriptions. Most types involved in workflows come from APIs, such as the Shopify GraphQL Admin API. Hence, Flow has an automated process to consume APIs and generate the corresponding Maestro types. Additional types can be defined, for example, to model data included in the events that correspond to triggers, and model the expected interface of actions. For instance, the following types are simplified versions of the event data and Shopify objects involved in the example:

Flow then uses Maestro functions and calls to model the behavior of triggers, conditions, and actions. The following Maestro code shows function definitions for the trigger and actions involved in the workflow above.

Actions are mapped directly to Maestro functions that define the expected parameters and return types. An action used in a workflow is a call to the corresponding function. A trigger, however, is mapped to a data hydration function that takes event data, which often includes only references by IDs, and loads additional data required by the workflow. For example, the order_created function takes an OrderCreatedTrigger, which contains the order ID as an Integer, and performs API requests to load an Order object, which contains additional fields like name and discountCode. Finally, conditions are currently a special case in that they’re translated to a sequence of function calls based on the predicate defined for the condition (more on that in the next section).
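As a sketch of the hydration idea in plain Ruby (the load_order stub stands in for real Admin API requests, and all names here are illustrative):

```ruby
# Stubbed Admin API lookup; in reality this would be one or more
# GraphQL requests to the Shopify Admin API.
def load_order(order_id)
  { id: order_id, name: "#1001", discount_code: "SUMMER10" }
end

# Hydration: the trigger event carries only an ID, so load the richer
# object that the rest of the workflow navigates.
def order_created(trigger)
  load_order(trigger.fetch(:order_id))
end

order = order_created(order_id: 42)
```

The workflow then sees a full order (name, discount code, and so on) even though the event itself only carried the ID.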

Workflow Validation

Once a workflow is created, it needs validation. For that, Flow composes a Maestro function representing the whole workflow. The parameter of the workflow function is the trigger data since it’s the input for its execution. The body of the function corresponds to the transitions and configurations of tasks in the workflow. For example, the following function corresponds to the example:

The first call in the sequence corresponds to the trigger function that’s used to hydrate objects from the event data. The next three steps correspond to the logical expression configured for the condition. Each disjunction branch becomes a function call (to eq and ends_with, respectively), and the result is computed with or. A Maestro match expression is used to pattern match on the result. If it’s true, the control flow goes to the sequence expression that calls the functions corresponding to the workflow actions.
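The same evaluation order can be mimicked in plain Ruby to make the control flow concrete; the field names and sample values below are invented for illustration:

```ruby
# Invented sample data standing in for the hydrated Order object.
order = { discount_code: "SUMMER10", email: "jane@example.com" }

# Each disjunction branch becomes its own call (eq, ends_with),
# and the results are combined with or.
eq_branch        = order[:discount_code] == "SUMMER10"
ends_with_branch = order[:email].end_with?("@example.com")
condition        = eq_branch || ends_with_branch

# Pattern match on the result: only when it is true do the
# action calls (tagging, email) run.
actions_ran = []
case condition
when true  then actions_ran << :add_order_tags << :send_email
when false then nil
end
```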

Flow can now rely on Maestro static analysis to validate the workflow function. Maestro will type check, verify that every referred variable is in scope, verify that object navigation is correct (for example, that every attribute accessed on an object actually exists), and so on. Then, any error found through static analysis is mapped back to the corresponding workflow node and presented in context in the Editor. In addition to returning errors, static analysis results contain symbol tables for each expression indicating which variables are in scope and what their types are. This supports the Editor in providing code completion and other suggestions that are specific to each workflow step. The following screenshot, for example, shows how the Editor can guide users in navigating the fields present in objects available when selecting the Add order tags action.

Flow App Editor screenshot showing how the Editor guides users in navigating the fields present in objects available when selecting the Add order tags action.
Shopify Flow App editor

Note that transformation and validation run while a Flow workflow is being edited, either in the Flow Editor or via APIs. This operation is synchronous and, thus, must be very fast since merchants are waiting for the results. This architecture is similar to how modern IDEs send source code to a language service that parses the code into a lower level representation and returns potential errors and additional static analysis results.

Workflow Activation

Once a workflow is ready, it needs to be activated to start executing. The process is initially similar to validation in that Flow generates the corresponding Maestro function. However, there are a few additional steps. First, Maestro performs a static usage analysis: for each call to a primitive function it computes which attributes of the returned type are used by subsequent steps. For example, the call to shopify::admin::order_created returns a tuple (Shop, Order), but not all attributes of those types are used by this workflow. Not only would it be inefficient to hydrate unused values; in the presence of recursive definitions (such as an Order having a Customer who has Orders), it would be impossible to determine where to stop digging into the type graph. The result of usage analysis is then passed at runtime to the host function implementation. The runtime can use it to tailor how it computes the values it returns, for instance, by optimizing the queries to the Admin GraphQL API.
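The essence of usage analysis can be illustrated with a small Ruby sketch: collect the attribute paths that later steps read, then hydrate only those. The path representation and helper below are invented for illustration:

```ruby
# Attribute paths that subsequent workflow steps actually read.
USED_PATHS = [%w(order name), %w(order discount_code)].freeze

# Hydrate only the used attributes instead of walking the whole
# (possibly recursive) type graph.
def hydrate(full_record, used_paths)
  used_paths.each_with_object({}) do |(root, attr), out|
    (out[root] ||= {})[attr] = full_record.dig(root, attr)
  end
end

full = {
  "order" => {
    "name" => "#1001",
    "discount_code" => "SUMMER10",
    "customer" => { "email" => "jane@example.com" }  # never requested
  }
}
slim = hydrate(full, USED_PATHS)
```

Here slim contains only order.name and order.discount_code; the customer subtree is never loaded.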

Second, Maestro performs a compilation step. The idea is to apply optimizations, removing anything unnecessary for the runtime execution of the function, such as type definitions and auxiliary functions that aren’t invoked by the workflow function. The result is a simplified, small, and efficient Maestro function. The compiled function is then packaged together with the result of usage analysis and becomes an orchestration. Finally, the orchestration is serialized and deployed to the Flow Engine that observes events and runs the Maestro interpreter on the orchestration.

Monitoring Executions

As the Flow Engine executes orchestrations, the Maestro interpreter emits checkpoints. As we discussed before, checkpoints are used by the engine when restarting the interpreter to ensure at-least-once semantics for actions. Additionally, checkpoints are sent back to Flow to feed the Activity page, which lists workflow executions. Since checkpoints have detailed information about the output of every primitive function call, they can be used to map back to the originating workflow step and offer insight into the behavior of executions.
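Conceptually, rendering the Run Log is a join between the emitted checkpoints and the workflow's steps. A toy Ruby sketch, with an invented checkpoint shape:

```ruby
# Invented checkpoint shape: which step ran and what it produced.
checkpoints = [
  { step: "check_discount", output: true },
  { step: "add_order_tags", output: "ok" }
]

# Static mapping from step ids back to the workflow nodes merchants see.
step_labels = {
  "check_discount" => "Check if order has a discount",
  "add_order_tags" => "Add order tags"
}

# The Run Log joins checkpoints with their human-readable labels;
# steps without a checkpoint simply did not execute on this run.
run_log = checkpoints.map do |cp|
  { label: step_labels.fetch(cp[:step]), output: cp[:output] }
end
```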

A screenshot of the Flow run log from the Activity Page. It displays the results of the workflow execution at each step
Shopify Flow Run Log from the Activity page

For instance, the image above shows a Run Log for a specific execution of the example, which can be accessed from the Activity page. Note that Flow highlights the branch of the workflow that executed and which branch of the condition disjunction actually evaluated to true at runtime. All this information comes directly from interpreting checkpoints and mapping back to the workflow.

Outro: Future Work

Largo maestoso

In this post I introduced Maestro, a domain-specific orchestration language we developed to power Shopify Flow. I gave a sample of what Maestro looks like and how it neatly integrates with Flow, supporting features of both the Flow Editor as well as the Flow Engine. Maestro has been powering Flow for a while, but we are planning more, such as:

  • Improving the expressiveness of the Flow workflow language, making better use of all the capabilities Maestro offers. For example, allowing the definition of variables to bind the result of actions for subsequent use, support for iteration, pattern matching, and so on.
  • Implementing additional optimizations on deployment, such as merging Flow workflows as a single orchestration to avoid redundant hydration calls for the same event.
  • Using the Maestro interpreter to support previewing and testing of Flow workflows, employing checkpoints to show results and verify assertions.

If you are interested in working with Flow and Maestro or building systems from the ground up to solve real-world problems, visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.

React Native Skia—For Us, For You, and For Fun

Right now, you are likely reading this content on a Skia surface. It powers Chrome, Android, Flutter, and others. Skia is a cross-platform drawing library that provides a set of drawing primitives that you can trust to run anywhere: iOS, Android, macOS, Windows, Linux, the browser, and now React Native.

Our goal with this project is twofold. First, we want to provide React Native, which is notorious for its limited graphical capabilities, with a set of powerful 2D drawing primitives that are consistent across iOS, Android, and the Web. Second, we want to bridge the gap between graphic designers and React Native by providing the same UI capabilities as a tool like Figma, so everyone can speak the same language.

Skia logo image. The background is black and Skia is displayed in cursive rainbow font in the middle of the screen
React Native Skia logo

To bring the Skia library to React Native, we needed to rely on the new React Native architecture’s JavaScript Interface (JSI). This new API enables direct communication between JavaScript and native modules using C++, instead of asynchronous messages between the two worlds. JSI allows us to expose the Skia API directly in the following way:

We are making this API virtually 100% compatible with the Flutter API, allowing us to do two things:

  1. Leverage the completeness and conciseness of their drawing API
  2. Eventually provide react-native-web support for Skia using CanvasKit, the Skia WebAssembly build used by Flutter for its web apps.

React is all about declarative UIs, so we are also providing a declarative API built on top of the imperative one. The example above can also be written as:

This API allows us to provide an impressive level of composability to express complex drawings, and it allows us to perform declarative optimizations. We leverage the React Reconciler to perform the work of diffing the internal representation states, and we pass the differences through to the Skia engine.

React Native Skia offers a wide range of APIs such as advanced image filters, shaders, SVG, path operations, vertices, and text layouts. The demo below showcases a couple of drawing primitives previously unavailable in the React Native ecosystem. Each button contains a drop and inner shadow, the progress bar is rendered with an angular gradient, and the bottom sheet uses a backdrop filter to blur the content below it.

Below is an example of mesh gradients done using React Native Skia

Reanimated 2 (a project also supported by Shopify) brought to life the vision of React Native developers writing animations directly in JavaScript code by running it on a dedicated thread. Animations in React Native Skia work the same way. Below is an example of animation in Skia:

Example of the Breathe code animated

If your drawing animation depends on an outside view, like a React Native gesture handler, for instance, we also provide a direct connector to Reanimated 2.

With React Native Skia, we expect to address a big pain point of the React Native community. And it is safe to say that we are only getting started. We are working on powerful new features which we cannot wait to share with you in the upcoming months. We also cannot wait to see what you build with it. What are you waiting for!? npm install @shopify/react-native-skia.

Christian Falch has been involved with React Native since 2018, both through open source projects and his Fram X consultancy. He has focused on low-level React Native coding integrating native functionality with JavaScript and has extensive experience writing C++ based native modules.

William Candillon is the host of the “Can it be done in React Native?” YouTube series, where he explores advanced user-experiences and animations in the perspective of React Native development. While working on this series, William partnered with Christian to build the next-generation of React Native UIs using Skia.

Colin Gray is a Principal Developer of Mobile working on Shopify’s Point of Sale application. He has been writing mobile applications since 2010, in Objective-C, RubyMotion, Swift, Kotlin, and now React Native. He focuses on stability, performance, and making witty rejoinders in engineering meetings. Reach out on LinkedIn to discuss mobile opportunities at Shopify!

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

Data Is An Art, Not Just A Science—And Storytelling Is The Key

People often equate data science with statistics, but it’s so much more than that. When data science first emerged as a craft, it was a combination of three different skill sets: science, mathematics, and art. But over time, we’ve drifted. We’ve come to prioritize the scientific side of our skillset and have lost sight of the creative part.

A venn diagram with three circles: Math, Art, and Science. In the middle is Data Science
A Venn diagram of the skills that make up the craft of data science

One of the most neglected, yet arguably most important, skills from the artistic side of data science is communication. Communication is key to everything we do as data scientists. Without it, our businesses won’t be able to understand our work, let alone act on it.

Being a good data storyteller is key to being a good data scientist. Storytelling captures your stakeholders’ attention, builds trust with them, and invites them to fully engage with your work. Many people are intimidated by numbers. By framing a narrative for them, you create a shared foundation they can work from. That’s the compelling promise of data storytelling.

Data science is a balancing act—math and science have their role to play, but so do art and communication. Storytelling can be the binding force that unites them all. In this article, I’ll explore how to tell an effective data story and illustrate with examples from our practice at Shopify. Let’s dive in.

What Is Data Storytelling?

When you Google data storytelling, you get definitions like: “Data storytelling is the ability to effectively communicate insights from a dataset using narratives and visualizations.” And while this isn’t untrue, it feels anemic. There’s a common misconception that data storytelling is all about charts, when really, that’s just the tip of the iceberg.

Even if you design the most perfect visualization in the world—or run a report, or create a dashboard—your stakeholders likely won’t know what to do with the information. All of the burden of uncovering the story and understanding the data falls back onto them.

At its core, data storytelling is about taking the step beyond the simple relaying of data points. It’s about trying to make sense of the world and leveraging storytelling to present insights to stakeholders in a way they can understand and act on. As data scientists, we can inform and influence through data storytelling by creating personal touch points between our audience and our analysis. As with any good story, you need the following key elements:

  1. The main character: Every story needs a hero. The central figure or “main character” in a data story is the business problem. You need to make sure to clearly identify the problem, summarize what you explored when considering the problem, and provide any reframing of the problem necessary to get deeper insight.
  2. The setting: Set the stage for your story with context. What background information is key to understanding the problem? You're not just telling the story; you're providing direction for the interpretation, ideally in as unbiased a way as possible. Remember that creating a data story doesn’t mean shoe-horning data into a preset narrative—as data scientists, it’s our job to analyze the data and uncover the unique narrative it presents.
  3. The narrator: To guide your audience effectively, you need to speak to them in a way they understand and resonate with. Ideally, you should communicate your data story in the language of the receiver. For example, if you’re communicating to a non-technical audience, try to avoid using jargon they won’t be familiar with. If you have to use technical terms or acronyms, be sure to define them so you’re all on the same page.
  4. The plot: Don’t leave your audience hanging—what happens next? The most compelling stories guide the reader to a response and data can direct the action by providing suggestions for next steps. By doing this, you position yourself as an authentic partner, helping your stakeholders figure out different approaches to solving the problem.

Here’s how this might look in practice on a sample data story:


  • Main Character: Identify the business question you're trying to solve. Ex. Why aren't merchants using their data to guide their business decisions?
  • Setting: What background information is key to understanding the problem? Ex. How are they using existing analytic products and what might be preventing use?
  • Narrator: Ensure you're communicating in a way that your audience will understand. Ex. Our audience are busy execs who prefer short bulleted lists in a Slack message.
  • Plot: Use data to direct the action by providing next steps. Ex. Data shows merchants spend too much time going back and forth between their Analytics and Admin page. We recommend surfacing analytics right within their workflow.


With all that in mind, how do you go about telling effective data stories? Let me show you.

1. Invest In The Practice Of Storytelling

In order to tell effective data stories, you need to invest in the right support structures. First of all, that means laying the groundwork with a strong data foundation. The right foundation ensures you have easy access to data that is clean and conformed, so you can move quickly and confidently. At Shopify, our data foundation is key to everything we do—it not only supports effective data storytelling, but also enables us to move purposefully during unprecedented moments.

For instance, we’ve seen the impact data storytelling can have while navigating the pandemic. In the early days of COVID-19, we depended on data storytelling to give us a clear lens into what was happening, how our merchants were coping, and how we could make decisions based on what we were seeing. This is a story that has continued to develop and one that we still monitor to this day.

Since then, our data storytelling approach has continued to evolve internally. The success of our data storytelling during the pandemic was the catalyst for us to start institutionalizing data storytelling through a dedicated working group at Shopify. This is a group for our data scientists, led by data scientists—so they fully own this part of our craft maturity.

Formalizing this support network has been key to advancing our data storytelling craft. Data scientists can drop in or schedule a review of a project in progress. This group also provides feedback and informed guidance on how to improve the story that the analysis is trying to tell, so communications back to stakeholders are most impactful. The goal is to push our data scientists to take their practice to the next level—by providing context, explaining what angles they already explored, offering ways to reframe the problem, and sharing potential next steps.

Taking these steps to invest in the practice of data storytelling ensures that when our audience receives our data communications, they’re equipped with accurate data and useful guidance to help them choose the best course of action. By investing in the practice of data storytelling, you too can ensure you’re producing the highest quality work for your stakeholders—establishing you as a trusted partner.

2. Identify Storytelling Tools And Borrow Techniques From The Best

Having the right support systems in place is key to making sure you’re telling the right stories—but how you tell those stories is just as important. One of our primary duties as data scientists is decision support. This is where the art and communication side of the practice comes in. It's not just a one-and-done, "I built a dashboard, someone else can attend to that story now." You’re committed to transmitting the story to your audience. The question then becomes, how can you communicate it as effectively as possible, both to technical and non-technical partners?

At Shopify, we’ve been inspired by and have adopted design studio Duarte’s Slidedocs approach. Slidedocs is a way of using presentation software like PowerPoint to create visual reports that are meant to be read, not presented. Unlike a chart or a dashboard, what the Slidedoc gives you is a well-framed narrative. Akin to a “policy brief” (like in government), you can pack a dense amount of information and visuals into an easily digestible format that your stakeholders can read at their leisure. Storytelling points baked into our Slidedocs include:

  • The data question we’re trying to answer
  • A description of our findings
  • A graph or visualization of the data 
  • Recommendations based on our findings
  • A link to the in-depth report
  • How to contact the storyteller
A sample slidedoc example from Shopify Data. It highlights information for a single question by describing findings and recommendations. It also links out to the in-depth report
An example of how to use Slidedocs for data storytelling

Preparing a Slidedoc is a creative exercise—there’s no one correct way to present the data, it’s about understanding your audience and shaping a story that speaks to them. What it allows us to do is guide our stakeholders as they explore the data and come to understand what it’s communicating. This helps them form personal touchpoints with the data, allowing them to make a better decision at the end.

While the Slidedocs format is a useful method for presenting dense information in a digestible way, it’s not the only option. For more inspiration, you can learn a lot from teams who excel at effective communication, such as marketing, PR, and UX. Spend time with these teams to identify their methods of communication and how they scaffold stories to be consumed. The important thing is to find tools that allow you to present information in a way that’s action-oriented and tailored for the audience you’re speaking to.

3. Turn Storytelling Into An Experience

The most effective way to help your audience feel invested in your data story is to let them be a part of it. Introducing interactivity allows your audience to explore different facets of the story on demand, in a sense, co-creating the story with you. If you supply data visualizations, consider ways that you can allow your audience to filter them, drill into certain details, or otherwise customize them to tell bigger, smaller, or different stories. Showing, not telling, is a powerful storytelling technique.

A unique way we’ve done this at Shopify is through a product we created for our merchants that lets them explore their own data. Last fall, we launched the BFCM 2021 Notebook—a data storytelling experience for our merchants with a comprehensive look at their store performance over Black Friday and Cyber Monday (BFCM).

While we have existing features for our merchants that show, through reports and contextual analytics, how their business is performing, we wanted to take it to the next level by giving them more agency and a personal connection to their own data. That said, we understand it can be overwhelming for our merchants (or anyone!) to have access to a massive set of data, but not know how to explore it. People might not know where to start or feel scared that they’ll do it wrong.

Example BFCM 2021 Notebook. The notebook shows a graph of the sales over time during BFCM weekend 2021
Shopify’s BFCM Notebook

What the BFCM Notebook provided was a scaffold to support merchants’ data exploration. It’s an interactive visual companion that enables merchants to dive into their performance data (e.g. total sales, top-performing products, buyer locations) during their busiest sale season. Starting with total sales, merchants could drill into their data to understand their results based on products, days of the week, or location. If they wanted to go even deeper, they could click through the visualizations to see the queries that powered them—enabling them to start thinking about writing queries of their own.

Turning data storytelling into an experience has given our merchants the confidence to explore their own data, which empowers them to take ownership of it. When you’re creating a data story, consider: Are there opportunities to let the end user engage with the story in interactive ways?

Happily Ever After

Despite its name, data science isn’t just a science; it’s an art too. Data storytelling unites math, science, art, and communication to help you weave compelling narratives that help your stakeholders comprehend, reflect on, and make the best decisions about your data. By investing in your storytelling practice, using creative storytelling techniques, and including interactivity, you can build trust with your stakeholders and increase their fluency with data. The creative side of data science isn’t an afterthought—it’s absolutely vital to a successful practice.

Wendy Foster is the Director of Engineering & Data Science for Core Optimize at Shopify. Wendy and her team are focused on exploring how to better support user workflows through product understanding, and building experiences that help merchants understand and grow their business.

Are you passionate about data discovery and eager to learn more? We’re always hiring! Visit our Data Science and Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.


Building a Business System Integration and Automation Platform at Shopify

Companies organize and automate their internal processes with a multitude of business systems. Since companies function as a whole, these systems need to be able to talk to one another. At Shopify, we took advantage of Ruby, Rails, and our scale with these technologies to build a business system integration solution.

The Modularization of Business Systems

In step with software design’s progression from monolithic to modular architecture, business systems have proliferated over the past 20 years, becoming smaller and more focused. Software hasn’t only targeted the different business domains like sales, marketing, support, finance, legal, and human resources, but also the niches within or across these domains, like tax, travel, training, documentation, procurement, and shipment tracking. Targeted applications can provide the best experience by enabling rapid development within a small, well-defined space.

The Gap

The transition from monolithic to modular architecture doesn’t remove the need for interaction between modules. Maintaining well-defined, versioned interfaces and integrating with other modules is one of the biggest costs of modularization. In the business systems space, however, it doesn’t always make sense for vendors to take responsibility for integration, or do it in the same way.

Business systems are built on different tech stacks with different levels of competition and different customer requirements. This landscape leads to business systems with asymmetric interfaces (from SOAP to FTP to GraphQL) and integration capabilities (from complete integration platforms to nothing). Businesses are left with a gap between their systems and no clear, easy way to fill it.

Organic Integration

Connecting these systems on an as-needed basis leads to a hacky hodgepodge of:

  • ad hoc code (often running on individuals’ laptops)
  • integration platforms like Zapier
  • users downloading and uploading CSVs
  • third-party integration add-ons from app stores
  • out-of-the-box integrations
  • custom integrations built on capable business systems.

Frequently, data doesn’t travel directly from the source system to the target system, but has multiple layovers in whatever systems it could integrate with. The only determining factors are the skillsets and creativity of the people involved in building the integration.

When a company is small this can work, but as companies scale and the number of integrations grows, it becomes unmanageable: data flows are convoluted, raising security questions and making business-critical automation fragile. Just like with monolithic architecture, it can become too terrifying and complex to change anything, paralyzing the business systems and preventing them from adapting and scaling to support the company.

Integration Platform as a Service

The solution, as validated by the existence of numerous Integration Platform as a Service (IPaaS) solutions like Mulesoft, Dell Boomi, and Zapier, is yet another piece of software that’s responsible for integrating business systems. The consistency provided by using one application for all integration can solve the issues of visibility, fragility, reliability, and scalability.


At Shopify, we ran into this problem, so we created a small team of business system integration developers and put them to work building on Mulesoft. This was an improvement but, because Shopify is a software company, it wasn’t perfect.

Isolation from Shopify Development

Shopify employs thousands of developers. We have infrastructure, training, and security teams. We maintain a multitude of packages and have tons of Slack channels for getting help, discussing ideas, and learning about best practices. Shopify is intentionally narrow in the technologies it uses (Ruby, React, and Go) to benefit from this scale.

Mulesoft is a proprietary platform leveraging XML configuration for the Java virtual machine. This isn’t part of Shopify’s tech stack, so we missed out on many of the advantages of developing at Shopify.

Issues with Integrating Internal Applications

Mulesoft’s cloud runtime takes care of infrastructure for its users, a huge advantage of using the platform. However, Shopify has a number of internal services, like shipment tracking, as well as infrastructure, like Kafka, that for security reasons can only be used from within Shopify’s cloud. This meant that we would need to build infrastructure skills on our team to host Mulesoft on our own cloud.

Although using Mulesoft initially seemed to lower the costs of connecting business systems, due to our unique situation, it had more drawbacks than developing on Shopify’s tech stack.

Building on Shopify’s Stack

Unless performance is paramount, in which case we use Go, Ruby is Shopify’s choice for backend development. Generally Shopify uses the Rails framework, so if we’re going to start building business system integrations on Shopify’s tech stack, Ruby on Rails is our choice. The logic for choosing Ruby on Rails within the context of development at Shopify is straightforward, but how do we use it for business system integration?

The Design Priorities

When the platform is complete, we want to build reliable integrations quickly. To turn that idea into a design, we need to look at the technical aspects of business system integration that differentiate it from the standard application development Rails is designed around.


Generally, applications are structured around a domain and get to determine their requirements: the data they will and won’t accept. An integration, however, isn’t the source of truth for anything. Any validation we introduce in an integration will be, at best, a duplication of logic in the target application. At worst, our logic will raise spurious errors.

I did this the other day with a Sorbet Struct. I was using it to organize data before posting it. Unfortunately a field was required in the struct that wasn’t required in the target system. This resulted in records failing in transit when the target system would have accepted them.
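A plain-Ruby sketch of that failure mode (with a stdlib Struct standing in for the Sorbet struct, and all field names hypothetical): the struct enforces a requirement the target system doesn’t have, so records that would have been accepted downstream fail in transit instead.

```ruby
# Hypothetical record shape; the "validation" duplicates logic the
# target system doesn't actually enforce.
PaymentRecord = Struct.new(:employee_id, :amount, :cost_center, keyword_init: true) do
  def initialize(**kwargs)
    raise ArgumentError, "cost_center is required" if kwargs[:cost_center].nil?
    super
  end
end

PaymentRecord.new(employee_id: 1, amount: 100, cost_center: "R&D") # fine

begin
  # The target system would have accepted this record without cost_center.
  PaymentRecord.new(employee_id: 2, amount: 250)
rescue ArgumentError => e
  puts e.message
end
```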


Many business systems are highly configurable. Changes in their configuration can lead to changes in their APIs, affecting integrations.

Airtable, for example, uses the column names as the JSON keys in their API, so changing a column name in the user interface can break an integration. We need to provide visibility into exactly what integrations are doing to help system admins avoid creating errors and quickly resolve them when they arise.
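A toy illustration (the column names are hypothetical): because the column name is the JSON key, a rename in the user interface silently changes the payload shape.

```ruby
# An Airtable-style payload keyed by column name.
record = { "fields" => { "Employee Name" => "Ada" } }
record["fields"]["Employee Name"]  # => "Ada"

# After an admin renames the column to "Full Name" in the UI:
renamed = { "fields" => { "Full Name" => "Ada" } }
renamed["fields"]["Employee Name"] # => nil; the lookup silently fails
```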


Business systems are diverse, created at different times by different developers using different technologies and design patterns. For integration work this—most importantly—leads to a wide variety of interfaces like FTP, REST, SOAP, JSON, XML, and GraphQL. If we want a centralized, standardized place to build integrations, it needs to support whatever requirements are thrown at it.


Integrations deal with sensitive information: personally identifiable information (PII), compensation, and anything else that needs to move between business systems. We need to make sure that we aren’t exposing this data.


Small, point-to-point integrations are the most reliable and maintainable, but this design has the potential to create a lot of duplicate code and infrastructure. If we want to build integrations quickly, we need to reuse as much as possible.


Those are some nice high-level design priorities. How did we implement them?


From the beginning of the project, documentation has been a priority. We document

  • decisions that we’re making, so they’re understood and challenged in the future as needs change
  • the integrations living on our platform
  • the clients we’ve implemented for connecting to different systems and how to use them
  • how to build on the platform as a whole.

Initially we used GitHub’s built-in wiki, but moving the documentation into the repository, where it’s version controlled and updated in commits alongside the code, made it easier to trace changes and ensure documentation was kept up to date. Fortunately, Shopify’s infrastructure makes it very easy to add a static site to a git repository.

Design priorities covered: transparency, reusability

Language Features

Ruby is a mature, feature-rich language. Beyond being Turing complete, over the years it’s added a plethora of features to make programming simpler and more concise. It also has an extensive package ecosystem thanks to Ruby’s wide usage, long life, and generous community. In addition to reusing our own code, we’re able to leverage other developers’ and organizations’ code. Many business systems have great, well-maintained gems, so integrating with them is as simple as adding the gem and credentials.

Design priorities covered: reusability

Rails Engines

We reused Shopify Core’s architecture, designing our application as a modular monolith made up of Rails Engines. Initially the application didn’t take advantage of Rails Engines and simply used namespaces within the app directory. It quickly became apparent that this model made tracking down an individual integration’s code difficult: you had to go through each of the app subdirectories (controllers, helpers, and more) to see if an integration’s namespace was present.

After a lot of research and a few conversations with my Shopify engineering mentor, I began to understand Rails Engines. Rails engines are a great fit for our platform because integrations have relatively obvious boundaries, so it’s easy and advantageous to modularize them.

This design enabled us to reuse the same infrastructure for all our integrations. It also enabled us to share code across integrations by creating a common Rails Engine, without the overhead of packaging it up into rubygems or duplicating it. This reduces both development and maintenance costs.

In addition, this architecture benefitted transparency by keeping all of the code in one place and modularizing it. It’s easy to know what integrations exist and what code belongs to them.

Design priorities covered: reusability, transparency

Eliminating Data Storage

Our business system integration platform won’t be the source of truth for any business data. The business data comes from other business systems and passes through our application.

If we start storing data in our application it can become stale, out of sync with the source of truth. We could end up sending stale data to other systems and triggering the wrong processes. Tracking this all down requires digging through databases, logs, and timestamps in multiple systems, some without good visibility.

Data storage adds complexity, hurts transparency, and introduces security and compliance concerns.

Design priorities covered: transparency, minimalism, security


Business system integration consists almost entirely of business logic. In Rails, there are multiple places this could live, but they generally involve abstractions designed around building standalone applications, not integrations. Using one of these abstractions would add complexity and obfuscate the logic.

Actions were floating around Shopify as a potential home for business logic. They have the same structure as Active Jobs: one public method, perform, and no references to other Actions. The Action concept provides consistency, making all integration logic easy to find. It also provides transparency by putting all business logic in one place, so it’s only necessary to look at one Action to understand a data flow.

One of the side effects of Actions is code duplication. This was a trade-off we accepted. Given that integrations should be acting independently, we would prefer to duplicate some code than tightly couple integrations.
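A minimal sketch of what such an Action might look like (all names here are hypothetical; the post doesn’t show the real implementation): a single public perform method holds the whole data flow, and no other Actions are called.

```ruby
module PayrollSync # hypothetical integration namespace
  class SyncEmployeeAction
    # One public entry point, mirroring Active Job's perform.
    def perform(employee)
      payload = build_payload(employee)
      # ...post `payload` to the target system here...
      payload
    end

    private

    # Private helpers are fine; calling other Actions is not.
    def build_payload(employee)
      { "name" => employee.fetch(:name), "email" => employee.fetch(:email) }
    end
  end
end

PayrollSync::SyncEmployeeAction.new.perform(name: "Ada", email: "ada@example.com")
# => {"name"=>"Ada", "email"=>"ada@example.com"}
```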

Design priorities covered: transparency, minimalism

Embracing Hashes

Dataflows are the purpose of our application. In every integration we are dealing with at least two API abstractions of complex systems. Introducing our own abstractions on top of these abstractions can quickly compound complexity. If we want the application to be transparent, it needs to be obvious what data is flowing through it and how the data is being modified.

Most of the data we’re working with is JSON. In Ruby, JSON is represented as a hash, so working with hashes directly often provides the best transparency with the least room for introducing errors.

I know, I know. We all hate to see strings in our code, but hear me out. You receive a JSON payload. You need to transform it and send out another JSON payload with different keys. You could map the original payload to an object, map that object to another object, and map the final object back to JSON. If you want to track that transformation, though, you need to track it through three transformations. On the other hand, you could use a hash and a transform function and have the mapping clearly displayed.

Using hashes leads to more transparency than abstracting them away, but it also can lead to typos and therefore errors, so it’s important to be careful. If you’re using a string key multiple times, turn it into a constant.
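A sketch of that approach (the field names are hypothetical): the key mapping is declared once, as a constant, so the whole transformation is visible in one place and repeated string keys can’t drift apart.

```ruby
require "json"

# The mapping from incoming JSON keys to outgoing keys, in one place.
FIELD_MAP = {
  "employeeName"  => "name",
  "employeeEmail" => "email"
}.freeze

# Build the outgoing payload directly from the incoming hash.
def transform(payload)
  FIELD_MAP.each_with_object({}) do |(from, to), out|
    out[to] = payload[from]
  end
end

incoming = JSON.parse('{"employeeName":"Ada","employeeEmail":"ada@example.com"}')
transform(incoming)
# => {"name"=>"Ada", "email"=>"ada@example.com"}
```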

Design priorities covered: transparency, minimalism

Low-level Mocking

At Shopify, we generally use Mocha for mocking, but for our use case we default to WebMock. WebMock mocks at the request level, so you see the URL, including query parameters, headers, and request body explicitly in tests. This makes it easy to work directly with business systems API documentation because this is the level it’s documented at, and it allows us to understand exactly what our integrations are doing.

There are some cases, though, where we use Mocha, for example with SOAP. Reading a giant XML text string doesn’t provide useful visibility into what data is being sent. WebMock tests also become complex when many requests are involved in the integration. We’re working on improving the testing experience for complex integrations with common factories and prebuilt WebMocks.

Design priorities covered: transparency


Perhaps most importantly, we’ve been able to tap into development at Shopify by leveraging our:

  • infrastructure, so all we have to do to stand up an application or add a component is run dev runtime
  • training team to help onboard our developers
  • developer pipeline for hiring
  • observability through established logging, metrics and tracing setups
  • internal shipment tracking service
  • security team standards and best practices

The list could go on forever.

Design priorities covered: reusability, security

It’s been a year since work on our Rails integration platform began. Now, we have 18 integrations running, have migrated all our Mulesoft apps to the new platform, have doubled the number of developers from one to two and have other teams building integrations on the platform. The current setup enables us to build simple integrations, the majority of our use case, quickly and securely with minimal maintenance. We’re continuing to work on ways to minimize and simplify the development process, while supporting increased complexity, without harming transparency. We’re currently focused on improving test mock management and the onboarding process and, of course, building new integrations.

Will is a Senior Developer on the Solutions Engineering Team. He likes building systems that free people to focus on creative, iterative, connective work by taking advantage of computers' scalability and consistency.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.


Möbius: Shopify’s Unified Edge

While working on improvements to how the Shopify platform handles traffic, we found that each change was becoming more and more challenging over time. When deploying a feature, if we wanted the whole platform to benefit from it, we couldn’t simply build it “one way fits all”—we already had more than six different ways for traffic to reach us. To list some, we had a traffic path for:

  • “general population” (GenPop) Shopify Core: the monolith serving shops that’s already using an edge provider
  • GenPop Shopify applications and services
  • GenPop Shopify applications and services that’s using an edge provider
  • Personally identifiable information (PII) restricted Shopify Core: where we need to make sure that the traffic and related data are kept in a specific area of the world
  • PII restricted Shopify applications and services
  • publicly-accessible APIs that require Mutual Transport Layer Security (mTLS) authentication.
LOTR Meme stating: "How many ways are there for traffic to reach Shopify?"

We had to choose which traffic paths could use those new features, or build the same feature in more than six different ways. Moreover, having many different traffic paths blurs observability: when we receive requests and something doesn’t work properly, figuring out why and how to fix it requires more time. It also takes longer to onboard new team members to all of those possibilities and how to distinguish between them.

LOTR meme image of Frodo holding the ring stating: One edge to reach them all, one edge to secure them; One edge to bring them all and in the clusters bind them.
One edge to reach them all, one edge to secure them; One edge to bring them all and in the clusters bind them.

This isn’t the way to build a highway to new features and improvements. I’d like to tell you why and how we built the one edge to front our services and systems.

One Does Not Simply Know What “The Edge” Stands For

LOTR meme image of Aragorn stating: One does not simply know what "The Edge" stands for.
One does not simply know what "The Edge" stands for.

The most straightforward definition of the edge, or network edge, is the point at which an enterprise-owned network connects to a third-party network. With cloud computing, lines are slightly blurred as we use third parties to provide us with servers and networks (even more when using a provider to front your network, like we do at Shopify). But in both those cases, as long as they’re used and controlled by Shopify, they’re considered part of our network.

The edge of Shopify is where requests from outside our network are made to reach our network.

The Fellowship of the Edge

Unifying our edge became our next objective, and two projects were born to make this possible: Möbius, which, as the name taken from the “Möbius strip” suggests, was to be the one edge of Shopify; and Shopify Front End (SFE), the routing layer that receives traffic from Möbius and dispatches it to where it needs to go.

A flow diagram showing Möbius’s traffic path that takes requests from the internet to the routing layer and then sends traffic to the application’s clusters for traffic to be served. Purple entities are on the traffic path for PII restricted traffic, while the beige ones are for the GenPop traffic.
Möbius’s traffic path takes requests from the internet to the routing layer and then sends traffic to the application’s clusters for traffic to be served. Purple entities are on the traffic path for PII restricted traffic, while the beige ones are for the GenPop traffic.

About a year before starting Möbius, we already handled a small number of applications through our edge, but we saw limitations in how to properly automate such an approach at scale, even though the gains to the platform justified the monetary costs. We designed SFE and Möbius together, leading to a better separation of concerns between the edge and the routing layers.

The Shopify Front End

SFE is designed to provide a unified routing layer behind Möbius. Deployed in many different regions, routing clusters can receive any kind of web traffic from Möbius, whether for Shopify Core or Applications. Those clusters are mainly nginx deployments with custom Lua code to handle the routing according to a number of criteria, including but not limited to the IP address a client connected to and the domain that was used to reach Shopify. For the PII restricted requirements, parallel deployments of the same routing clusters code are deployed in the relevant regions.

To handle traffic for applications and services, SFE uses a centralized API that receives requests from Kubernetes controllers deployed in every cluster running such applications and services. This links the domain names declared by an application to the clusters where the application is deployed. We also use this to provide active/active (when two instances of a given service can receive requests at the same time) or active/passive (when only a single instance of a given service can receive requests) load balancing.

Providing load balancing at the routing layer instead of DNS allows for near instantaneous traffic changes instead of depending on the Time to Live as described in my previous post. It avoids those decisions being made on the client side and thus provides us with better command and control over the traffic.


Möbius’s core concerns are simple: we grab the traffic from outside of Shopify and make sure it makes its way inside of Shopify in a stable, secure, and performant manner. Outside of Shopify is any client connecting to Shopify from outside a Shopify cluster. Inside of Shopify is, as far as Möbius is concerned, the routing cluster with the lowest latency to the receiving edge’s point-of-presence (PoP).

Möbius is responsible for TLS and TCP termination with clients, doing that termination as close as possible to the client. This brings faster requests and better DDoS protection, and allows us to filter malicious requests before the traffic even reaches our clusters. This was already done for our GenPop Shopify Core traffic, but Möbius now standardizes it. On top of handling the certificates for shops, we added an automated path to handle certificates for application domains.

A flow diagram showing the configuration of the edge with Möbius and SFE. Domains updates are intercepted to update the edge provider’s domains and certificates store, making sure that we’re able to terminate TCP and TLS for those domains and let the request follow its path
Configuration of the edge with Möbius and SFE. Domains updates are intercepted to update the edge provider’s domains and certificates store, making sure that we’re able to terminate TCP and TLS for those domains and let the request follow its path

SFE already needs to be aware of the domains that applications respond to, so instead of building the same logic a second time to configure the edge, we piggybacked on the work the SFE controller was already doing. We added handlers in the centralized API to configure those domains at the edge, through API requests to our vendor, indicating that we expect to receive traffic on them and that requests should be forwarded to SFE. Our API handler takes care of any DNS challenges needed to validate that we own a domain, both for traffic to start flowing and to obtain a valid certificate.

Prior to Möbius, if an application owner wanted to take advantage of the edge, they had to configure their domain manually at the edge (validating ownership, obtaining a certificate, setting up the routing). Möbius provides full automation of that setup, allowing application owners to simply configure their ingress and DNS and reap the benefits of the edge right away.

Finally, it’s never easy to have many systems migrate to use a new one. We aimed to make that change as easy as possible for application owners. With automation deploying all that was required, the last required step was a simple DNS change for applications domains, from targeting a direct-to-cluster record to targeting Möbius. We wanted to keep that change manual to make sure that application owners own the process and make sure that nothing gets broken.

A screenshot of the dashboard for the application. It displays hostnames configured to serve an application and the status of its edge
Example dashboard for the application (accessible publicly and used for debugging connectivity issues with merchants). On the dashboard, we can find a link to its edge logs, see that the domains of the application are properly configured at the edge to receive traffic, and provide a TLS certificate. A test link also lets us simulate a connection to the platform using that domain so the response can be verified manually.

To make sure all is fine for an application before (and after!) migration, we also added observability, making it easy to:

  • access the logs for a given application at the edge
  • identify which domains an application has configured at the edge
  • understand the status of those domains.

This allows owners of applications and services to immediately identify if one of their domains isn’t configured or behaving as expected.

Our Precious Edge

A drawing of Gollum's face (from Lord of the Rings) staring at a Mobius strip like it's the ring

On top of all the direct benefits that Möbius provides right away, it allows us to build the future of Shopify’s edge. Different teams are already working on improvements to the way we do caching at the edge, for instance, or on ways to use other edge features that we’re not already taking advantage of. We also have ongoing projects to handle cluster-to-cluster communications by avoiding the traffic from going through the edge and coming back to our clusters by taking advantage of SFE.

Using new edge features and standardizing internal communications is possible because we unified the edge. There are exceptions where we need to avoid cross-dependency: applications and services that Möbius or SFE themselves depend on to function. If we onboarded them to Möbius and SFE, any issue would put us in a crash-loop situation: Möbius/SFE requires that application to work, but that application requires Möbius/SFE to work.

It’s now way easier to explain to new Shopifolk how traffic reaches us and what happens between a client and Shopify. There’s no need for as many conditionals in those explanations, nor as many whiteboards… but we might need more of those to explain all that we do as we grow the capabilities on our now-unified edge!

Raphaël Beamonte holds a Ph.D. in Computer Engineering in systems performance analysis and tracing, and sometimes gives lectures to future engineers, at Polytechnique Montréal, about Distributed Systems and Cloud Computing.

If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.


Code Ranges: A Deeper Look at Ruby Strings

Contributing to any of the Ruby implementations can be a daunting task. A lot of internal functionality has evolved over the years or been ported from one implementation to another, and much of it is undocumented. This post is an informal look at what makes encoding-aware strings in Ruby functional and performant. I hope it'll help you get started digging into Ruby on your own or provide some additional insight into all the wonderful things the Ruby VM does for you.

Ruby has an incredibly flexible, if unusual, string representation. Ruby strings are generally mutable, although the core library has both immutable and mutable variants of many operations. There’s also a mechanism for freezing strings that makes String objects immutable on a per-object or per-file basis. If a string literal is frozen, the VM will use an interned version of the string. Additionally, strings in Ruby are encoding-aware, and Ruby ships with 100+ encodings that can be applied to any string, which is in sharp contrast to other languages that use one universal encoding for all their strings or prevent the construction of invalid strings.

Depending on the context, different encodings are applied when creating a string without an explicit encoding. By default, the three primary ones used are UTF-8, US-ASCII, and ASCII-8BIT (aliased as BINARY). The encoding associated with a string can be changed with or without validation. It is possible to create a string with an underlying byte sequence that is invalid in the associated encoding.
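A small demonstration of that last point (the byte values here are arbitrary): force_encoding retags a string’s encoding without validating the bytes, so you can end up with a string that is invalid in its own encoding.

```ruby
# force_encoding changes the associated encoding without validation,
# so the resulting string can be invalid in that encoding.
s = "\xFF\xFE".dup.force_encoding(Encoding::UTF_8)
s.encoding        # => #<Encoding:UTF-8>
s.valid_encoding? # => false

# The same bytes are perfectly valid in another encoding.
s.force_encoding(Encoding::ASCII_8BIT).valid_encoding? # => true
```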

The Ruby approach to strings allows the language to adapt to many legacy applications and esoteric platforms. The cost of this flexibility is the runtime overhead necessary to consider encodings in nearly all string operations. When two strings are appended, their encodings must be checked to see if they're compatible. For some operations, it's critical to know whether the string has valid data for its attached encoding. For other operations, it's necessary to know where the character or grapheme boundaries are.

Depending on the encoding, some operations are more efficient than others. If a string contains only valid ASCII characters, each character is one byte wide. Knowing each character is only a byte allows operations like String#[], String#chr, and String#downcase to be very efficient. Some encodings are fixed width—each "character" is exactly N bytes wide—and many operations with fixed-width encodings can be efficiently implemented, as character offsets are trivial to calculate.

(The term "character" is vague when it comes to Unicode. Ruby strings (as of Ruby 3.1) have methods to iterate over bytes, characters, code points, and grapheme clusters. Rather than get bogged down in the minutiae of each, I'll focus on the output from String#each_char and use the term "character" throughout.)

In UTF-8, the default internal string encoding in Ruby (and many other languages), characters are variable width, requiring 1–4 bytes each. That generally complicates operations because it's not possible to determine character offsets, or even the total number of characters in the string, without scanning all of the bytes in the string. However, UTF-8 is backwards-compatible with ASCII. If a UTF-8 string consists of only ASCII characters, each character will be one byte wide, and if the runtime knows that, it can optimize operations on such strings the same as if the string had the simpler ASCII encoding.
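The byte/character relationship described above is easy to observe from plain Ruby; a quick sketch:

```ruby
# An ASCII-only UTF-8 string: bytes and characters line up one-to-one.
ascii = "hello"
ascii.ascii_only?    # => true
ascii.bytesize       # => 5
ascii.length         # => 5

# One multibyte character breaks that correspondence:
# "é" occupies two bytes in UTF-8.
accented = "h\u00E9llo"
accented.ascii_only? # => false
accented.bytesize    # => 6
accented.length      # => 5
```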

Code Ranges

In general, the only way to tell if a string consists of valid characters for its associated encoding is to do a full scan of all the bytes. This is an O(n) process, and while not the least efficient operation in the world, it is something we want to avoid. Languages that don't allow invalid strings only need to do the validation once, at string creation time. Languages that ahead-of-time (AOT) compile can validate string literals during compilation. Languages that only have immutable strings can guarantee that once a string is validated, it can never become invalid. Ruby has none of those properties, so its solution to reducing unnecessary string scans is to cache information about each string in a field known as a code range.

There are four code range values:

  • ENC_CODERANGE_UNKNOWN: the string has not been scanned yet, so nothing is known about its contents
  • ENC_CODERANGE_7BIT: the string consists entirely of ASCII characters (every byte is in the 0x00 to 0x7F range)
  • ENC_CODERANGE_VALID: the string is valid for its encoding and contains at least one non-ASCII character
  • ENC_CODERANGE_BROKEN: the string contains byte sequences that are invalid for its encoding

The code range occupies an odd place in the runtime. As a place for the runtime to record profile information, it's an implementation detail, and there is no way to request the code range directly from a string. However, since the code range records information about validity, it also impacts how some operations perform. Consequently, a few String methods let you derive the string's code range so you can adapt your application accordingly.

The mappings are:

Code range              Ruby code equivalent
ENC_CODERANGE_UNKNOWN   No representation*
ENC_CODERANGE_7BIT      str.ascii_only?
ENC_CODERANGE_VALID     str.valid_encoding? && !str.ascii_only?
ENC_CODERANGE_BROKEN    !str.valid_encoding?

Table 1:  Mapping of internal code range values to public Ruby methods.

* – Code ranges are lazily calculated in most cases. However, when you request information about a property that the code range encompasses, the code range is calculated on demand. As such, you may pass around strings whose code range is ENC_CODERANGE_UNKNOWN, but asking about a string's validity, or calling other methods that require the code range (such as a string's character length), will calculate and cache the code range before returning a value to the caller.

Despite that odd standing, part implementation detail and part observable behavior, every major Ruby implementation associates a code range with each string. If you ever work on a Ruby implementation's internals or a native extension involving String objects, you'll almost certainly end up reading, and potentially managing, code range values.


In MRI, the code range value is stored as an int in the object header, with bitmask flags representing the values. The values are mutually exclusive. This is important to note because, logically, every string that has an ASCII-compatible encoding and consists only of ASCII characters is also a valid string; however, such a string will never have a code range value of ENC_CODERANGE_VALID. You should use the ENC_CODERANGE(obj) macro to extract the code range value and then compare it against one of the defined code range constants, treating the constants essentially like enum members (e.g., if (cr == ENC_CODERANGE_7BIT) { ... }).

If you try to use the code range values as bitmasks directly, you'll get confusing, difficult-to-debug results. Due to the way the masks are defined, a string annotated as both ENC_CODERANGE_7BIT and ENC_CODERANGE_VALID will appear to be ENC_CODERANGE_BROKEN. Conversely, if you branch on a combined mask like if (cr & (ENC_CODERANGE_7BIT | ENC_CODERANGE_VALID)) { ... }, you will also match ENC_CODERANGE_BROKEN strings. This is because the four possible values are represented by only two bits in the object header. That compact representation makes efficient use of the limited space in the object header, but it can mislead anyone used to matching and setting attributes with bitmasks.

To help illustrate the point a bit better, I've ported some of the relevant C code to Ruby (see Listing 1):

Listing 1: MRI's original C code range representation recreated in Ruby.
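The original listing isn't reproduced here, so below is a minimal sketch in the same spirit. The bit positions are illustrative (MRI actually stores these flags among the object header's user flag bits), but the relationships between the constants are the ones described above:

```ruby
# Sketch of MRI's two-bit code range representation, ported to Ruby.
# Constant values are illustrative, not MRI's actual flag bits.
ENC_CODERANGE_UNKNOWN = 0
ENC_CODERANGE_7BIT    = 0b01
ENC_CODERANGE_VALID   = 0b10
ENC_CODERANGE_BROKEN  = 0b11   # == ENC_CODERANGE_7BIT | ENC_CODERANGE_VALID
ENC_CODERANGE_MASK    = 0b11

# The ENC_CODERANGE(obj) macro masks the two bits out of the header flags,
# so comparisons must be exact equality checks, never bitwise tests.
def coderange_of(flags)
  flags & ENC_CODERANGE_MASK
end

coderange_of(ENC_CODERANGE_7BIT) == ENC_CODERANGE_7BIT            # => true
# The bitmask pitfall: 7BIT combined with VALID looks like BROKEN.
(ENC_CODERANGE_7BIT | ENC_CODERANGE_VALID) == ENC_CODERANGE_BROKEN  # => true
```

The last line is the bug described above in miniature: OR-ing two "good" values together yields the "bad" one.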

JRuby's implementation is very similar to MRI's, storing the code range compactly as an int in the object header, occupying only two bits. In TruffleRuby, the code range is represented as an enum and stored as an int in the object's shape. The enum representation takes up additional space but prevents the class of bugs caused by misapplying bitmasks.

String Operations and Code Range Changes

The object's code range is a function of both its sequence of bytes and the encoding associated with the object to interpret those bytes. Consequently, when either the bytes change or the encoding changes, the code range value has the potential to be invalidated. When such an operation occurs, the safest thing to do is to perform a complete code range scan of the resulting string. To the best of our ability, however, we want to avoid recalculating the code range when it is not necessary to do so.

MRI avoids unnecessary code range scans via two primary mechanisms. The first is to simply scan for the code range lazily by changing the string's code range value to ENC_CODERANGE_UNKNOWN. When an operation is performed that needs to know the real code range, MRI calculates it on demand and updates the cached code range with the new result. If the code range is never needed, it's never calculated. (MRI will calculate the code range eagerly when doing so is cheap. In particular, when lexing a source file, MRI already needs to examine every byte in a string and be aware of the string's encoding, so taking the extra step to discover and record the code range value is rather cheap.)

The second way MRI avoids code range scans is to reason about the code range values of the strings being operated on and how an operation might produce a new code range. For example, when working with strings with an ENC_CODERANGE_7BIT code range value, most operations can preserve the code range value since all ASCII characters stay within the 0x00 to 0x7F range. Whether you take a substring, change the casing of characters, or strip off whitespace, the resulting string is guaranteed to also have the ENC_CODERANGE_7BIT value, so performing a full code range scan would be wasteful. The code in Listing 2 demonstrates some operations on a string with an ENC_CODERANGE_7BIT code range and how the resulting string always has the same code range.

Listing 2: Changing the case of a string with an ENC_CODERANGE_7BIT code range will always result in a string that also has an ENC_CODERANGE_7BIT code range.
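Listing 2's code isn't reproduced here, but you can observe the behavior it demonstrates through String#ascii_only?, which returns true exactly when a string's code range is ENC_CODERANGE_7BIT:

```ruby
s = "hello world"
s.ascii_only?           # => true: ENC_CODERANGE_7BIT

# Each of these operations can preserve the 7BIT code range
# without rescanning the result's bytes:
s.upcase.ascii_only?    # => true
s[0, 5].ascii_only?     # => true
s.strip.ascii_only?     # => true
```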

Sometimes the code range value on its own is insufficient for a particular optimization, in which case MRI will consider additional context. For example, MRI tracks whether a string is "single-byte optimizable." A string is single-byte optimizable if its code range is ENC_CODERANGE_7BIT or if the associated encoding uses characters that are only one-byte wide, such as is the case with the ASCII-8BIT/BINARY encoding used for I/O. If a string is single-byte optimizable, we know that String#reverse must retain the same code range because each byte corresponds to a single character, so reversing the bytes can't change their meaning.
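To see the difference single-byte optimizability makes, compare reversing the same bytes under a multi-byte and a single-byte encoding (String#b returns a copy with the ASCII-8BIT/BINARY encoding):

```ruby
utf8 = "café"
utf8.reverse           # => "éfac": characters are reversed, so the two
                       #    bytes of "é" must stay together

bin = utf8.b           # same bytes, ASCII-8BIT/BINARY encoding
bin.reverse            # single-byte optimizable: a plain byte reversal suffices
bin.reverse.bytes == utf8.bytes.reverse  # => true
```

In the binary case the runtime never needs to think about character boundaries, which is exactly what the single-byte optimizable check captures.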

Unfortunately, the code range is not always easily derivable, particularly when the string's code range is ENC_CODERANGE_VALID or ENC_CODERANGE_BROKEN, in which case a full code range scan may prove necessary. Operations performed on a string with an ENC_CODERANGE_VALID code range might result in an ENC_CODERANGE_7BIT string if the source string's encoding is ASCII-compatible; otherwise, the result will have an ENC_CODERANGE_VALID code range. (We've deliberately set aside the case of String#setbyte, which could cause a string to have an ENC_CODERANGE_BROKEN code range value. Generally, string operations in Ruby are well-defined and won't result in a broken string.) In Listing 3, you can see some examples of operations performed against a string with an ENC_CODERANGE_VALID code range resulting in strings with either an ENC_CODERANGE_7BIT code range or an ENC_CODERANGE_VALID code range.

Listing 3: Changing the case of a string with an ENC_CODERANGE_VALID code range might result in a string with a different code range.
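Listing 3's code isn't reproduced here, but the effect is easy to observe with the methods from Table 1:

```ruby
s = "François"
s.valid_encoding?        # => true
s.ascii_only?            # => false: ENC_CODERANGE_VALID

s.upcase.ascii_only?     # => false: "FRANÇOIS" is still ENC_CODERANGE_VALID
s[0, 4]                  # => "Fran"
s[0, 4].ascii_only?      # => true: the substring is ENC_CODERANGE_7BIT
```

The substring dropped the only non-ASCII character, so its code range narrows to ENC_CODERANGE_7BIT even though the source string's stays ENC_CODERANGE_VALID.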

Since the source string may have an ENC_CODERANGE_UNKNOWN value and the operation may not need the resolved code range, such as String#reverse called on a string with the ASCII-8BIT/BINARY encoding, it's possible to generate a resulting string that also has an ENC_CODERANGE_UNKNOWN code range. That is to say, it's quite possible to have a string that is ASCII-only but which has an unknown code range that, when operated on, still results in a string that may need to have a full code range scan performed later on. Unfortunately, this is just the trade-off between lazily computing code ranges and deriving the code range without resorting to a full byte scan of the string. To the end user, there is no difference because the code range value will be computed and be accurate by the time it is needed. However, if you're working on a native extension, a Ruby runtime's internals, or are just profiling your Ruby application, you should be aware of how a code range can be set or deferred.

TruffleRuby and Code Range Derivations

As a slight digression, I'd like to take a minute to talk about code ranges and their derivations in TruffleRuby. Unlike other Ruby implementations, such as MRI and JRuby, TruffleRuby eagerly computes code range values so that strings never have an ENC_CODERANGE_UNKNOWN code range value. The trade-off that TruffleRuby makes is that it may calculate code range values that are never needed, but string operations are simplified by never needing to calculate a code range on-demand. Moreover, TruffleRuby can derive the code range of an operation's result string without needing to perform a full byte scan in more situations than MRI or JRuby can.

While eagerly calculating the code range may seem wasteful, it amortizes very well over the lifetime of a program due to TruffleRuby's extensive reuse of string data. TruffleRuby uses ropes as its string representation, a tree-based data structure where the leaves look like a traditional C-style string, while interior nodes represent string operations linking other ropes together. (If you go looking for references to "rope" in TruffleRuby, you might be surprised to see they're mostly gone. TruffleRuby still very much uses ropes, but the TruffleRuby implementation of ropes was promoted to a top-level library in the Truffle family of language implementations, which TruffleRuby has adopted. If you use any other language that ships with the GraalVM distribution, you're also using what used to be TruffleRuby's ropes.) A Ruby string points to a rope, and a rope holds the critical string data.

For instance, on a string concatenation operation, rather than allocate a new buffer and copy data into it, with ropes we create a "concat rope" with each of the strings being concatenated as its children (see Fig.1). The string is then updated to point at the new concat rope. While that concat rope does not contain any byte data (delegating that to its children), it does store a code range value, which is easy to derive because each child rope is guaranteed to have both a code range value and an associated encoding object.

Figure 1: A sample rope for the result of "Hello " + "François"
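As an illustration only (this is not TruffleRuby's real API, and the names below are hypothetical), a concat rope can derive its code range from its children's cached code ranges without touching any bytes, assuming both children share the same encoding:

```ruby
# Hypothetical rope sketch: leaves cache a code range alongside their
# bytes; a concat node derives its own code range from its children.
LeafRope   = Struct.new(:bytes, :code_range)
ConcatRope = Struct.new(:left, :right) do
  def code_range
    crs = [left.code_range, right.code_range]
    return :broken    if crs.include?(:broken)     # any broken child taints the result
    return :seven_bit if crs.all?(:seven_bit)      # ASCII + ASCII stays ASCII
    :valid                                         # otherwise valid, with non-ASCII content
  end
end

hello    = LeafRope.new("Hello ", :seven_bit)
francois = LeafRope.new("François", :valid)
ConcatRope.new(hello, francois).code_range  # => :valid
```

No byte scan happens anywhere in that derivation, which is why eager code range computation amortizes so well with ropes.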

Moreover, rope metadata are immutable, so getting a rope's code range value will never incur more overhead than a field read. TruffleRuby takes advantage of that property to use ropes as guards in inline caches for its JIT compiler. Additionally, TruffleRuby can specialize string operations based on the code ranges for any argument strings. Since most Ruby programs never deal with ENC_CODERANGE_BROKEN strings, TruffleRuby's JIT will eliminate any code paths that deal with that code range. If a broken string does appear at runtime, the JIT will deoptimize and handle the operation on a slow path, preserving Ruby's full semantics. Likewise, while Ruby supports 100+ encodings out of the box, the TruffleRuby JIT will optimize a Ruby application for the small number of encodings it uses.

A String By Any Other Name

Often string performance discussions are centered around web template rendering or text processing. While important use cases, strings are also used extensively within the Ruby runtime. Every symbol or regular expression has an associated string, and they're consulted for various operations. The real fun comes with Ruby's metaprogramming facilities: strings can be used to access instance variables, look up methods, send messages to objects, evaluate code snippets, and more. Improvements (or degradations) in string performance can have large, cascading effects.

Backing up a step, I don't want to oversell the importance of code ranges for fast metaprogramming. They are an ingredient in a somewhat involved recipe. The code range can be used to quickly disqualify strings known not to match, such as those with the ENC_CODERANGE_BROKEN code range value. In the past, the code range was used to fail fast when particular identifiers were only allowed to be ASCII-only. While not currently implemented in MRI, such a check could be used to dismiss strings with the ENC_CODERANGE_VALID code range when all identifiers are known to be ENC_CODERANGE_7BIT, and vice versa. However, once a string passes the code range check, there's still the matter of seeing if it matches an identifier (instance variable, method, constant, etc.). With TruffleRuby, that check can be satisfied quickly because its immutable ropes are interned and can be compared by reference. In MRI and JRuby, the equality check may involve a linear pass over the string data as the string is interned. Even that process gets murky depending on whether you're working with a dynamically generated string or a frozen string literal. If you're interested in a deep dive on the difficulties and solutions to making metaprogramming fast in Ruby, Chris Seaton has published a paper about the topic and I've presented a talk about it at RubyKaigi.
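To make the metaprogramming angle concrete, each lookup below funnels an ordinary string through the identifier-matching machinery discussed above (Point is just a throwaway example class):

```ruby
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  def magnitude
    Math.sqrt(@x**2 + @y**2)
  end
end

pt = Point.new(3, 4)
pt.instance_variable_get("@x")  # => 3: string matched against instance variable names
pt.send("magnitude")            # => 5.0: string matched against method names
pt.respond_to?("magnitude")     # => true
```

Each of those strings must be checked against interned identifiers, so the speed of string comparison (and the quick disqualifications code ranges enable) directly affects metaprogramming-heavy code.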


More so than many other contemporary languages, Ruby exposes functionality that is difficult to optimize but which grants developers a great deal of expressivity. Code ranges are a way for the VM to avoid repeated work and optimize operations on a per-string basis, guiding away from slow paths when that functionality isn't needed. Historically, that benefit has been most keenly observed when running in the interpreter. When integrated with a JIT with deoptimization capabilities, such as TruffleRuby, code ranges can help eliminate generated code for the types of strings used by your application and the VM internally.

Knowing what code ranges are and what they're used for can help you debug issues, both for performance and correctness. At the end of the day, a code range is a cache, and like all caches, it may contain the wrong value. While such instances within the Ruby VM are rare, they're not unheard of. More commonly, a native extension manipulating strings may fail to update a string's code range properly. Hopefully, with a firm understanding of code ranges, you’ll find Ruby's handling of strings less daunting.

Kevin is a Staff Developer on the Ruby & Rails Infrastructure team where he works on TruffleRuby. When he’s not working on Ruby internals, he enjoys collecting browser tabs, playing drums, and hanging out with his family.

If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.


Leveraging Shopify’s API to Build the Latest Marketplace Kit


In February, we released the second version of Marketplace Kit: a collection of boilerplate app code and documentation allowing third-party developers to integrate with Shopify, build selected commerce features, and launch a world-class marketplace in any channel.

Previously, we used the Node app generated by the Shopify command-line interface (CLI) as a foundation. However, this approach came with two drawbacks: changes to the Shopify CLI would ripple into our code and documentation, and we had limited control over best practices because we were tied to the CLI's Node app dependencies.

Since then, we've decoupled code from the Shopify CLI and separated the Marketplace Kit sample app into two separate apps: a full-stack admin app and a buyer-facing client app. For these, we chose dependencies that were widely used, such as Express and NextJS, to appeal to the largest number of partners possible. Open-sourced versions of the apps are publicly available for anyone to try out.


Here are a few ways we leveraged Shopify’s APIs to create the merchant-facing Marketplace admin app for version 2.0.

Before We Get Started

Here’s a brief overview of app development at Shopify. The most popular server-side option for apps scaffolded with the Shopify CLI is Node.js, a server-side JavaScript runtime, and that’s what we used for the Marketplace Kit’s sample admin app. On top of Node.js, we use Express, a web framework library chosen for its ease of use and worldwide popularity.

On the client side of the admin and buyer apps, we use the main JavaScript frontend library at Shopify: React. For the buyer-side app, we chose Next.js, a framework for React that mainly provides structure to the application, as well as built-in features like server-side rendering and TypeScript support. When sharing data between frontend and backend apps, we use GraphQL along with Apollo Client and Apollo Server, its helper libraries that ease integration.

It’s also helpful to be familiar with some key web development concepts, such as JSON Web Token (JWT) authentication, and with vanilla JavaScript (plain JavaScript, best known as the scripting language of web pages).

JWT Authentication with App Bridge

Let’s start with how we chose to handle authentication in our apps, as a working example of using an internal library to ease development within Shopify. App Bridge is a standalone library offering React component wrappers for some app actions. It provides an out-of-the-box solution for embedding your app inside of the Shopify admin, Shopify POS, and Shopify Mobile. Since we're using React for our embedded channel admin app, we leveraged additional App Bridge imports to handle authentication. Here is a client-side example from the channels admin app:

The app object, an instance returned by useAppBridge, passes contextual information about the embedded app. We chose to wrap the authenticatedFetch function inside a function with custom auth redirecting. Notice that the authenticatedFetch import does many things under the hood. Notably, it adds two HTTP headers: Authorization, with a JWT session token created on demand, and X-Requested-With, set to XMLHttpRequest (which identifies the request type and improves security).

This is the server-side snippet that handles the session token. It resides in our main server file, where we define our spec-compliant GraphQL server, using an Express app as middleware. Within the configuration of our ApolloServer’s context property, you'll see how we handle the auth header:

Notice how we leverage Shopify’s Node API to decode the session token and then to load the session data, providing us with the store’s access token. Fantastic!

Quick tip: To add more stores, you can switch out the store value in .env and run the Shopify CLI's shopify app serve command!

Serving REST & GraphQL With Express

In our server-side code, we use the apollo-server-express package instead of simply using apollo-server:

The setup for a GraphQL server using the express-specific package is quite similar to how we would do it with the barebones default package. The difference is that we apply the Apollo Server instance as middleware to an Express HTTP instance with graphQLServer.applyMiddleware({ app }) (or whatever you named your instance).

If you look at the entire file, you'll see that the webhooks and routes for the Express application are added after starting the GraphQL server. The advantage of using the apollo-server-express package over apollo-server is being able to serve REST and GraphQL at the same time using Express. Serving GraphQL within Express allows us to use Node middleware for common concerns like rate limiting, security, and authentication. The trade-off is a little more boilerplate, but since apollo-server is itself a wrapper around the Express-specific package, there’s no noticeable performance difference.

Check out the Apollo team’s blog post Using Express with GraphQL to read more.

Custom Client Wrappers

Here’s an example of custom API clients for data fetching from Shopify’s Node API, applying GraphQL and REST:

This allows for easier control of our request configuration, like adding custom User-Agent headers with a unique header title for the project, including its npm package version.

Although Shopify generally encourages using GraphQL, sometimes it makes sense to use REST. In the admin app, we used it for one call: getting the product listings count. There was no need to create a specific query when an HTTP GET request contains all the information required; using GraphQL would not offer any advantage. It’s also a good example of using REST in the application, ensuring developers who use the admin app as a starting point see an example that takes advantage of both ways to fetch data, depending on what’s best for the situation.

Want to Challenge Yourself?

For full instructions on getting started with Marketplace Kit, check out our official documentation. To give you an idea, here are screenshots of the embedded admin app and the buyer app, in that order, upon completion of the tutorials in the docs:

More Stories on App Development and GraphQL:

For articles aimed at partners, check out our Shopify’s Partners blog where we cover more content related to app development at Shopify.

Kenji Duggan is a Backend Developer Intern at Shopify, working on the Strategic Partners team under Marketplace Foundation. Recently, when he’s not learning something new as a web developer, he is probably working out or watching anime



The Magic of Merlin: Shopify's New Machine Learning Platform


Shopify's machine learning platform team builds the infrastructure, tools and abstracted layers to help data scientists streamline, accelerate and simplify their machine learning workflows. There are many different kinds of machine learning use cases at Shopify, internal and external. Internal use cases are being developed and used in specialized domains like fraud detection and revenue predictions. External use cases are merchant and buyer facing, and include projects such as product categorization and recommendation systems.

At Shopify we build for the long term, and last year we decided to redesign our machine learning platform. We need a machine learning platform that can handle different (often conflicting) requirements, inputs, data types, dependencies and integrations. The platform should be flexible enough to support the different aspects of building machine learning solutions in production, and enable our data scientists to use the best tools for the job.

In this post, we walk through how we built Merlin, our magical new machine learning platform. We dive into the architecture, working with the platform, and a product use case.

The Magic of Merlin

Our new machine learning platform is based on an open source stack and technologies. Using open source tooling end-to-end was important to us because we wanted to both draw from and contribute to the most up-to-date technologies and their communities as well as provide the agility in evolving the platform to our users’ needs.

Merlin’s objective is to enable Shopify's teams to train, test, deploy, serve and monitor machine learning models efficiently and quickly. In other words, Merlin enables:

  1. Scalability: robust infrastructure that can scale up our machine learning workflows
  2. Fast Iterations: tools that reduce friction and increase productivity for our data scientists and machine learning engineers by minimizing the gap between prototyping and production
  3. Flexibility: users can use any libraries or packages they need for their models

For the first iteration of Merlin, we focused on enabling training and batch inference on the platform.

Merlin Architecture

A high level diagram of Merlin’s architecture

Merlin gives our users the tools to run their machine learning workflows. Typically, large scale data modeling and processing at Shopify happens in other parts of our data platform, using tools such as Spark. The data and features are then saved to our data lake or Pano, our feature store. Merlin uses these features and datasets as inputs to the machine learning tasks it runs, such as preprocessing, training, and batch inference.

With Merlin, each use case runs in a dedicated environment that can be defined by its tasks, dependencies and required resources — we call these environments Merlin Workspaces. These dedicated environments also enable distributed computing and scalability for the machine learning tasks that run on them. Behind the scenes, Merlin Workspaces are actually Ray clusters that we deploy on our Kubernetes cluster, and are designed to be short lived for batch jobs, as processing only happens for a certain amount of time.

We built the Merlin API as a consolidated service to allow the creation of Merlin Workspaces on demand. Our users can then use their Merlin Workspace from Jupyter Notebooks to prototype their work, or orchestrate it through Airflow or Oozie.

Merlin’s architecture, and Merlin Workspaces in particular, are enabled by one of our core components—Ray.

What Is Ray?

Ray is an open source framework that provides a simple, universal API for building distributed systems, along with tools to parallelize machine learning workflows. Around Ray is a large ecosystem of applications, libraries, and tools dedicated to machine learning, such as distributed scikit-learn, XGBoost, TensorFlow, and PyTorch.

When using Ray, you get a cluster that enables you to distribute your computation across multiple CPUs and machines. In the following example, we train a model using Ray:

We start by importing the Ray package. Calling ray.init() starts a new Ray runtime, which can run locally on a laptop or machine, or connect to an existing Ray cluster, local or remote. This enables us to seamlessly take the same code that runs locally and run it on a distributed cluster. When working with a remote Ray cluster, we can use the Ray Client API to connect to it and distribute the work.

In the example above, we use the integration between Ray and XGBoost to train a new model and distribute the training across a Ray cluster by defining the number of Ray actors for the job and different resources each Ray actor will use (CPUs, GPUs, etc.).

For more information, details and examples for Ray usage and integrations, check out the Ray documentation.

Ray In Merlin

At Shopify, machine learning development is usually done using Python. We chose to use Ray for Merlin's distributed workflows because it enables us to write end-to-end machine learning workflows with Python, integrate it with the machine learning libraries we use at Shopify and easily distribute and scale them with little to no code changes. In Merlin, each machine learning project comes with the Ray library as part of its dependencies, and uses it for distributed preprocessing, training and prediction.

Ray makes it easy for data scientists and machine learning engineers to move from prototype to production. Our users start by prototyping on their local machines or in a Jupyter Notebook. Even at this stage, their work can be distributed on a remote Ray cluster, allowing them to run the code at scale from an early stage of development.

Ray is a fast evolving open source project. It has short release cycles and the Ray team is continuously adding and working on new features. In Merlin, we adopted capabilities and features such as:

  • Ray Train: a library for distributed deep learning which we use for training our TensorFlow and PyTorch models
  • Ray Tune: a library for experiment execution and hyperparameter tuning
  • Ray Kubernetes Operator: a component for managing deployments of Ray on Kubernetes and autoscale Ray clusters

Building On Merlin

A diagram of the user’s development journey in Merlin

A user’s first interaction with Merlin usually happens when they start a new machine learning project. Let’s walk through a user’s development journey:

  1. Creating a new project: The user starts by creating a Merlin Project where they can place their code and specify the requirements and packages they need for development
  2. Prototyping: Next, the user will create a Merlin Workspace, the sandbox where they use Jupyter notebooks to prototype on a distributed and scalable environment
  3. Moving to Production: When the user is done prototyping, they can productionize their project by updating their Merlin Project with the updated code and any additional requirements
  4. Automating: Once the Merlin Project is updated, the user can orchestrate and schedule their workflow to run regularly in production
  5. Iterating: When needed, the user can iterate on their project by spinning up another Merlin Workspace and prototyping with different models, features, parameters, etc.

Let's dive a little deeper into these steps.

Merlin Projects

The first step of each machine learning use case on our platform is creating a dedicated Merlin Project. Users can create Merlin Projects for machine learning tasks like training a model or performing batch predictions. Each project can be customized by specifying the system-level packages or Python libraries required for development. From a technical perspective, a Merlin Project is a Docker container with a dedicated virtual environment (e.g. Conda or pyenv), which isolates code and dependencies. As the project requirements change, the user can update their Merlin Project to fit their new needs. Users can leverage a simple command line interface to create, define, and use their Merlin Project.

Below is an example of a Merlin Project file hierarchy:

The config.yml file allows users to specify the different dependencies and machine learning libraries that they need for their use case. All the code relevant to a specific use case is stored in the src folder.

Once users push their Merlin Project code to their branch, our CI/CD pipelines build a custom Docker image.

Merlin Workspaces

Once the Merlin Project is ready, our data scientists can use the centralized Merlin API to create dedicated Merlin Workspaces in prototype and production environments. The interface abstracts away all of the infrastructure-related logic (e.g. deployment of Ray clusters on Kubernetes, creation of ingress, service accounts) so they can focus on the core of the job.

A high level architecture diagram of Merlin Workspaces

Merlin Workspaces also allow users to define the resources required for running their project. While some use cases need GPUs, others might need more memory and additional CPUs or more machines to run on. The Docker image that was created for a Merlin Project will be used to spin up the Ray cluster in a dedicated Kubernetes namespace for a Merlin Workspace. The user can configure all of this through the Merlin API, which gives them either a default environment or allows them to select the specific resource types (GPUs, memory, machine types, etc.) that their job requires.

Here’s an example of a payload that we send the Merlin API in order to create a Merlin Workspace:

Using this payload will result in a new Merlin Workspace which will spin up a new Ray cluster with the specific pre-built Docker image of one of our models at Shopify—our product categorization model, which we’ll dive into more later on. This cluster will use 20 Ray workers, each one with 10 CPUs, 30GB of memory and 1 nvidia-tesla-t4 GPU. The cluster will be able to scale up to 30 workers.

After the job is complete, the Merlin Workspace can be shut down, either manually or automatically, and return the resources back to the Kubernetes cluster.

Prototyping From Jupyter Notebooks

Once our users have their Merlin Workspace up and running, they can start prototyping and experimenting with their code from Shopify’s centrally hosted JupyterHub environment. This environment allows them to spin up a new machine learning notebook using their Merlin Project's Docker image, which includes all their code and dependencies that will be available in their notebook.

An example of how our users can create a Merlin Jupyter Notebook

From the notebook, the user can access the Ray Client API to connect remotely to their Merlin Workspaces. They can then run their remote Ray Tasks and Ray Actors to parallelize and distribute the computation work on the Ray cluster underlying the Merlin Workspace.
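For example, connecting from a notebook to a workspace's Ray cluster via the Ray Client might look like this (the address and the task body are hypothetical):

```python
import ray

# Connect to the Merlin Workspace's Ray cluster (hypothetical address)
ray.init(address="ray://my-workspace.internal:10001")

@ray.remote
def score_batch(batch):
    # Hypothetical per-batch computation; runs remotely on the cluster
    return [len(item) for item in batch]

batches = [["shirt", "shoes"], ["mug"]]
results = ray.get([score_batch.remote(b) for b in batches])
```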

This method of working with Merlin minimizes the gap between prototyping and production by providing our users with the full capabilities of Merlin and Ray right from the beginning.

Moving to Production

Once the user is done prototyping, they can push their code to their Merlin Project. This will kick off our CI/CD pipelines and create a new version of the project's Docker image.

Merlin was built to be fully integrated with the tools and systems we already use to process data at Shopify. Once the Merlin Project's production Docker image is ready, the user can build the orchestration around their machine learning flows using declarative YAML templates or by configuring a DAG (Directed Acyclic Graph) in our Airflow environment. These jobs can be scheduled to run periodically; each run calls the production Merlin API to spin up Merlin Workspaces and run Merlin jobs on them.

A simple example of an Airflow DAG running a training job on Merlin

The DAG in the image above demonstrates a training flow, where we create a Merlin Workspace, train our model on it and—when it’s done—delete the workspace and return the resources back to the Kubernetes cluster.
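That flow could be sketched in Airflow like this, assuming hypothetical helper functions that wrap the Merlin API calls:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical wrappers around the Merlin API
from merlin_helpers import create_workspace, run_training, delete_workspace

with DAG(
    dag_id="product_categorization_training",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    create = PythonOperator(task_id="create_merlin_workspace", python_callable=create_workspace)
    train = PythonOperator(task_id="train_model", python_callable=run_training)
    delete = PythonOperator(task_id="delete_merlin_workspace", python_callable=delete_workspace)

    create >> train >> delete
```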

We also integrated Merlin with our monitoring and observability tools. Each Merlin Workspace gets its own dedicated Datadog dashboard which allows users to monitor their Merlin job. It also helps them understand more about the computation load of their job and the resources it requires. On top of this, each Merlin job sends its logs to Splunk so that our users can also debug their job based on the errors or stacktrace.

At this point, our user's journey is done! They created their Merlin Project, prototyped their use case on a Merlin Workspace, and scheduled their Merlin jobs using one of the orchestrators we have at Shopify (e.g. Airflow). Later on, when the data scientist needs to update their model or machine learning flow, they can go back to their Merlin Project to start the development cycle again from the prototype phase.

Now that we’ve explained Merlin's architecture and our user journey, let's dive into how we onboarded a real-world algorithm to Merlin—Shopify’s product categorization model.

Onboarding Shopify’s Product Categorization Model to Merlin

A high level diagram of the machine learning workflow for the Product Categorization model

Recently we rebuilt our product categorization model to ensure we understand what our merchants are selling, so we can build the best products that help power their sales. This is a complex use case that requires several workflows for its training and batch prediction. Onboarding this use case to Merlin early on enabled us to validate our new platform, as it requires large scale computation and includes complex machine learning logic and flows. The training and batch prediction workflows were migrated to Merlin and converted using Ray.

Migrating the training code

To onboard the product categorization model training stage to Merlin, we integrated its TensorFlow training code with Ray Train to distribute training across a Ray cluster. With Ray Train, supporting distributed training required only a few code changes: the original logic stayed the same, and the core changes are described in the example below.

The following is an example of how we integrated Ray Train with our TensorFlow training code for this use case:
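A sketch of that integration, based on the Ray Train Trainer API from around the time of this work (the model and dataset helpers are hypothetical stand-ins for the original code):

```python
import tensorflow as tf
import ray
from ray.train import Trainer

def train_func(config):
    # Ray Train sets up TF_CONFIG on each worker, so the
    # multi-worker strategy works without manual cluster wiring.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        model = build_model()  # hypothetical: the original model-building code
    dataset = load_training_data()  # hypothetical: the original data loading
    model.fit(dataset, epochs=config["epochs"])

ray.init()
trainer = Trainer(backend="tensorflow", num_workers=20, use_gpu=True)
trainer.start()
trainer.run(train_func, config={"epochs": 10})
trainer.shutdown()
```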

The TensorFlow logic for the training step stays the same, but is separated out into its own function. The primary change is adding Ray logic to the main function. Ray Train allows us to specify the job configuration, with details such as the number of workers, backend type, and GPU usage.

Migrating inference

The inference step in the product categorization model is a multi-step process. We migrated each step separately, using the following method. We used Ray ActorPool to distribute each step of batch inference across a Ray cluster. Ray ActorPool is similar to Python's `multiprocessing.Pool` and allows scheduling Ray tasks over a fixed pool of actors. Using Ray ActorPool is straightforward and allows easy configuration for parallelizing the computation.

Here’s an example of how we integrated Ray ActorPool with our existing inference code to perform distributed batch predictions:
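A sketch of that pattern (the model loading and the dataset partitions are hypothetical stand-ins for the original code):

```python
import ray
from ray.util import ActorPool

@ray.remote
class Predictor:
    def __init__(self):
        # Hypothetical: load the product categorization model once per actor
        self.model = load_model()

    def predict(self, partition):
        return self.model.predict(partition)

ray.init()

# Size the pool to the cluster's available CPUs
num_actors = int(ray.available_resources()["CPU"])
pool = ActorPool([Predictor.remote() for _ in range(num_actors)])

# Send every dataset partition to the pool for prediction
predictions = list(
    pool.map(lambda actor, part: actor.predict.remote(part), partitions)
)
```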

We first need to create our Predictor class (a Ray Actor), which includes the logic for loading the product categorization model, and performing predictions on product datasets. In the main function, we use the size of the cluster (ray.available_resources()["CPU"]) to create all the Actors that will run in the ActorPool. We then send all of our dataset partitions to the ActorPool for prediction.

While this method works for us at the moment, we plan to migrate to Ray Dataset Pipelines, which provide a more robust way to distribute the data and perform batch inference across the cluster, with less dependence on the number or size of the data partitions.

What's next for Merlin

As Merlin and its infrastructure mature, we plan to continue growing and evolving to better support the needs of our users. Our aspiration is to create a centralized platform that streamlines our machine learning workflows in order to enable our data scientists to innovate and focus on their craft.

Our next milestones include:

  • Migration: Migrate all of Shopify’s machine learning use cases and workflows to Merlin, and add a low-code framework for onboarding new use cases
  • Online inference: Support real time serving of machine learning models at scale
  • Model lifecycle management: Add model registry and experiment tracking
  • Monitoring: Support monitoring for machine learning

While Merlin is still a new platform at Shopify, it’s already empowering us with the scalability, fast iteration and flexibility that we had in mind when designing it. We're excited to keep building the platform and onboarding new data scientists, so Merlin can help enable the millions of businesses powered by Shopify.

Isaac Vidas is a tech lead on the ML Platform team, focusing on designing and building Merlin, Shopify’s machine learning platform. Connect with Isaac on LinkedIn.

If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.


Best-in-Class Developer Experience with Vite and Hydrogen


Hydrogen is a framework that combines React and Vite for creating custom storefronts on Shopify. It maximizes performance for end-users and provides a best-in-class developer experience for you and your team. Since it focuses on evergreen browsers, Hydrogen can leverage modern capabilities, best practices, and the latest tooling in web development to bring the future of ecommerce closer.

Creating a framework requires a lot of choices for frontend tooling. One major part of it is the bundler. Traditionally, developers had no native way to organize their code in JavaScript modules. Therefore, to minimize the amount of code and waterfall requests in the browser, new frontend tools like Webpack started to appear, powering projects such as Next.js and many more.

Bundling code became the de facto practice over the last decade, especially when using view libraries like React or Vue. While these tools successfully solved the problem, they quickly became hard to understand and maintain due to the increasing complexity of the modern web. On top of that, the development process started to slow down because bundling and compiling are inherently slow: the more files in a project, the more work the tool needs to do. Repeat this process for every change made during active development, and one can quickly see how the developer experience (DX) tanks.

Diagram showing bundle-based dev server. Modules are bundled and compiled to be server ready
Bundle-based dev server image from Vite.js docs

Thanks to the introduction of ES Modules (a native mechanism to author JavaScript modules) and its support in browsers, some new players like Snowpack and Parcel appeared and started shaping up the modern web development landscape.

Image showing use of native ES Modules to minimize the amount of bundling required during development
Native ESM-based dev server from Vite.js docs

This new generation of web tooling aims to improve the DX of building apps. Whereas Webpack needs complex configuration even for simple things due to its high flexibility, these new tools provide sensible but configurable defaults. Furthermore, they leverage native ES Modules to minimize the amount of bundling required during development. In particular, they tend to bundle and cache only third-party dependencies to keep the number of files downloaded by the browser low. Some dependencies may have dozens or hundreds of files, but they don't need to be updated often. User code, on the other hand, is served to the browser unbundled, thus speeding up refresh rates when making changes.

Enter Vite. With its evergreen and modern philosophy, we believe Vite aligns perfectly with Hydrogen. Featuring a lightning-fast development server with hot module replacement, a rich plugin ecosystem, and clever default configurations that make it work out of the box for most apps, Vite was among the top options to power Hydrogen's development engine.

Why Vite?

Vite is French for "quick", and the Hydrogen team can confirm: it's really fast. From the installation and setup to its hot reloading, things that used to be a DX pain are (mostly) gone. It’s also highly configurable and simple to use.

This is partly thanks to the two magnificent tools that power it: ESBuild, a lightning-fast Go-based compiler for JavaScript, and Rollup, a flexible and intelligible bundler. However, Vite is much more than the sum of these parts.

Ease of Use

In Vite, the main entry point is a simple index.html file, making it a first-class citizen instead of an afterthought asset. Everything else flows from there via stylesheet and script tags. Vite crawls and analyzes all of the imported assets and transforms them accordingly.

Thanks to its default values, most flavors of CSS and JavaScript, including JSX, TypeScript (TS), and PostCSS, work out of the box.

Let me reiterate this: it just works™. No painful configuration is needed to get those new CSS prefixes or the latest TS type checking working. It even lets you import WebAssembly or SVG files from JavaScript just like that. Also, since Vite's main target is modern browsers, it’s prepared to optimize the code and styles by using the latest supported features by default.

We value the simplicity Vite brings to Hydrogen and pass it on to our users. It adds up to a lot of time saved configuring your tooling compared to other alternatives.

A Proven Plugin System

Rollup has been around for a much longer time than Vite. It does one thing and does it very well: bundling. The key here is that Vite can tell it what to bundle.

Furthermore, Rollup has a truly rich plugin ecosystem that is fully compatible with Vite. With this, Vite provides hooks during development and building phases that enable advanced use cases, such as transforming specific syntax like Vue files. There are many plugins out there that use these hooks for anything you can imagine: Markdown pages with JSX, SSR-ready icons, automatic image minification, and more.

In Hydrogen, we found these Vite hooks easier to understand and use than those in Webpack, and they allow us to write more maintainable code.


A common task that tends to slow down web development is compiling JavaScript flavors and new features to older and widely supported code. Babel, a compiler written in JavaScript, has been the king in this area for a long time.

However, new tools like ESBuild started to appear recently with a very particular characteristic: they use a machine-compiled language to transform JavaScript instead of using JavaScript itself. In addition, and perhaps more importantly, they also apply sophisticated algorithms to avoid repeating AST parsing and parallelize work, thus establishing a new baseline for speed.

Apart from using ESBuild, Vite applies many other optimizations and tricks to speed up development. For instance, it pre-bundles some third-party dependencies and caches them in the filesystem to enable faster startups.

All in all, we can say Vite is one of the fastest alternatives out there when it comes to local development, and this is something we also want our users to benefit from in Hydrogen.


Along with Snowpack and Parcel, Vite is one of the first tools to embrace ECMAScript Modules (ESM) and inject JavaScript into the browser using script tags with type=module.

This, paired with hot-module replacement (HMR), means that changes to files on the local filesystem are updated instantly in the browser.

Vite is also building for the future of the web and the NPM ecosystem. While most third-party libraries are still on CommonJS (CJS) style modules (native in Node.js), the new standard is ESM. Vite performs an exhaustive import analysis of dependencies and transforms CJS modules into ESM automatically, thus letting you always import code in a modern fashion. And this is not something to take lightly: CJS and ESM interoperability is one of the biggest headaches web developers have faced in recent years.

As app developers ourselves in Hydrogen, it is such a relief we can focus on coding without wasting time on this issue. Someday most packages will, hopefully, follow the ESM standard. Until that day, Vite has us covered.

Server-Side Rendering

Server-side rendering (SSR) is a critical piece to modern frameworks like Hydrogen and is another place where Vite shines. It extends Rollup hooks to provide SSR information, thus enabling many advanced use cases.

For example, it is possible to transform the same imported file in different ways depending on the running environment (browser or server). This is key to supporting some advanced features we need in Hydrogen, such as React Server Components, which to date had only been available in Webpack.

Vite can also load front-end code in the server by converting dependencies to a Node-compatible runtime and modules to CJS. Think of simply importing a React application in Node. It greatly eases the way SSR works and is something Hydrogen leverages to remove extra dependencies and simplify code.


Last but not least, Vite has a large and vibrant community around it.

Many projects in addition to Hydrogen are relying on and contributing to Vite, such as Vitest, SvelteKit, Astro, Storybook, and many more.

And it's not just about the projects, but also the people behind them who are incredibly active and always willing to help in Vite's Discord channel. From Vite's creator, @youyuxi, to many other contributors and maintainers such as @patak_dev, @alecdotbiz, or @antfu7.

Hydrogen is also a proud sponsor of Vite. We want to support the project to ensure it stays up to date with the latest DX improvements to make web developers’ lives easier.

How Hydrogen uses Vite

Our goal when building Hydrogen on top of Vite was to keep things as “close to the metal” as possible and not reinvent the wheel. CLI tools can rely on Vite commands internally, and most of the required configuration is abstracted away.

Creating a Vite-powered Hydrogen storefront is as easy as adding the @shopify/hydrogen/plugin plugin to your vite.config.js:
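A minimal vite.config.js along those lines might look like this (the plugin name comes from the text; the exact options are assumptions, so check Hydrogen's documentation for the real shape):

```javascript
// vite.config.js
import { defineConfig } from 'vite';
import hydrogen from '@shopify/hydrogen/plugin';

export default defineConfig({
  plugins: [hydrogen()],
});
```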

Behind the scenes, we are invoking four different plugins:

  • hydrogen-config: This is responsible for altering the default Vite config values for Hydrogen projects. It helps ensure bundling for both Node.js and Worker runtimes work flawlessly, and that third-party packages are processed properly.
  • react-server-dom-vite: It adds support for React Server Components (RSC). We extracted this plugin from Hydrogen core and made it available in the React repository.
  • hydrogen-middleware: This plugin is used to hook into Vite’s dev server configuration and inject custom behavior. It allows us to respond to SSR and RSC requests while leaving the asset requests to Vite’s default web server.
  • @vitejs/plugin-react: This is an official Vite plugin that adds some goodies for React development, such as fast refresh in the browser.

With just this, Hydrogen is able to support server components, streaming requests, clever caching, and more. By combining this with all the features Shopify already provides, you can unlock unparalleled performance and best-in-class DX for your storefront.

Choosing the Right Tool

There are still many advanced use cases where Webpack is a good fit since it is very mature and flexible. Many projects and teams, such as React’s, rely heavily on it for their day-to-day development.

However, Vite makes building modern apps a delightful experience and empowers framework authors with many tools to make development easier. Storefront developers can enjoy a best-in-class DX while building new features at a faster pace. We chose Vite for Hydrogen and are happy with that decision so far.

Fran works as a Staff Software Engineer on the Hydrogen team at Shopify. Located in Tokyo, he's a web enthusiast and an active open source contributor who enjoys all things tech and all things coconut. Connect with Fran on Twitter and GitHub.

If you’re passionate about solving complex problems at scale, and you’re eager to learn more, we're hiring! Reach out to us or apply on our careers page.


10 Books Shopify’s Tech Talent Think You Should Read


How we think, absorb information, and maximize time—these are the topics Shopify developers and engineers are reading up on.

We have a book bar of the company’s favorite reads and make sure any employee who wants a copy of any title can get one. So we thought we’d flip the script and ask 10 of our technical minds to tell us the books they think everyone in tech should read this year.

Many of their choices were timeless, suggesting a clear desire to level up beyond hard skills. There are a couple deep dives into software design and computing systems, but many of the titles on this reading list are guides for reframing personal habits and patterns: taking notes, receiving feedback, sharing knowledge, and staying focused amid myriad distractions.

The Talent Code by Daniel Coyle

(Bantam Books)

I received my copy of The Talent Code shortly before uprooting my life to attend a front-end bootcamp. The school sent a copy to every student about to start their nine-week program. Coyle’s thesis is “Greatness isn’t born. It’s grown.” He highlights areas that allow us to become great at almost anything: deep practice, passion, and master coaching. The book made me rethink whether I’m destined to be bad at some things. One example for me was softball, but a more pressing use case was my upcoming immersion in coding. Coyle’s lessons helped me thrive during my course’s long hours, but I haven’t applied the same lessons to softball, yet.

Carys Mills, Staff Front End Developer

The 5 Elements of Effective Thinking by Edward B. Burger and Michael Starbird

(Princeton University Press)

I’ve always followed the adage of “work smarter, not harder,” but in knowledge work, how do we “think smarter, not harder”? The 5 Elements of Effective Thinking presents an answer, packaged in a framework that’s applicable in work and life more broadly. The book is short and pithy. I keep it near my desk. The elements of the book include how to understand a topic, how to think about failure, how to generate good questions, and how to follow those questions. I won’t spoil the fifth element for you; you’ll have to read about it yourself!

Ash Furrow, Senior Staff Developer

Thanks for the Feedback: The Science and Art of Receiving Feedback Well by Sheila Heen and Douglas Stone

(Viking Adult)

As developers, we give and receive feedback all the time—every code review, tech review, and, of course, feedback on our foundational and soft skills too. There’s a lot of focus on how to do a good code review—how to give feedback, but there’s also an art of receiving feedback. Sheila Heen and Douglas Stone’s Thanks for the Feedback: The Science and Art of Receiving Feedback Well does an excellent job of laying out the different layers involved in receiving feedback and the different kinds there are. Being able to identify the kind of feedback I’m getting (beyond "constructive")—appreciation or encouragement, coaching or evaluative—has helped me leverage even poorly delivered feedback to positively impact my personal and professional growth.

Swati Swoboda, Development Manager, Shipping

How to Take Smart Notes by Sönke Ahrens


Occasionally there are books that will totally flip how you think about doing something. How to Take Smart Notes is one of those. The title is about notes, but the book is about taking a totally different approach to learning and digesting information. Even if you choose not to follow the exact note taking technique it describes, the real value is in teaching you how to think about your own methods of absorbing and integrating new information. It’s completely changed the approach I take to studying nonfiction books.

Rose Wiegley, Staff Software Engineer

Extreme Ownership: How U.S. Navy SEALs Lead and Win by Jocko Willink and Leif Babin

(Echelon Front)

The book that I'd recommend people read, if they haven't read it before, is actually a book we recommend internally at Shopify: Extreme Ownership by Jocko Willink and Leif Babin. Don't let the fact that it's about the Navy SEALs put you off. There are so many generally applicable lessons that are critical as our company continues to grow at a rapid pace. Success in a large organization—especially one that is globally distributed—is about decentralized leadership from teams and individuals: we all have the autonomy and permission to go forth and build amazing things for our merchants, so we should do just that whilst setting great examples for others to follow.

James Stanier, Director of Engineering, Core

The Elements of Computing Systems: Building a Modern Computer from First Principles by Noam Nisan and Shimon Schocken

(The MIT Press)

Curious how tiny hardware chips become the computers we work on? I highly recommend The Elements of Computing Systems for any software developer wanting a more well-rounded understanding of a computer’s abstraction layers—not just at the level you’re most comfortable with, but even deeper. This workbook guides you through building your own computer from the ground up: hardware chip specifications, assembly language, programming language, and operating system. The authors did a great job of including the right amount of knowledge to not overwhelm readers. This book has given me a stronger foundation in computing systems while working at Shopify. Don’t like technical books? The authors also have lectures on Coursera available for free.

Maple Ong, Senior Developer

A Philosophy of Software Design by John Ousterhout

(Yaknyam Press)

A Philosophy of Software Design tackles a complicated topic: how to manage complexity while building systems. And, surprisingly, it’s an easy read. One of Stanford computer science professor John Ousterhout’s insights I strongly agree with is that working code isn’t enough. Understanding the difference between tactical vs strategic coding helps you level up—specifically, recognizing when a system is more complex than it needs to be is a crucial yet underrated skill. I also like how Ousterhout likens software to playing a team sport, and when he explains why our work as developers isn’t only writing code that works, but also creating code and systems that allow others to work easily. Read with an open mind. A Philosophy of Software Design offers a different perspective from most books on the subject.

Stella Miranda, Senior Developer

Living Documentation by Cyrille Martraire

(Addison-Wesley Professional)

Living Documentation isn’t so much about writing good documentation as about transmitting knowledge, which is the real purpose of documentation. In the tech world, where code is the source of truth, we often rely on direct interactions to share context, but this is a fragile process: knowledge can be diluted from one person to another and even lost when people leave a company. On the other side of the spectrum lies traditional documentation. It’s more perennial but requires significant effort to keep relevant, which is the main reason documentation is the most overlooked task in the tech world. Living Documentation attempts to bridge the gap between these two extremes by applying development tools and methods to documentation in an incremental way, ensuring knowledge transmission in a 100-year company.

Frédéric Bonnet, Staff Developer

Uncanny Valley by Anna Wiener

(MCD Books)

Sometimes you need to read something that’s both resonant and entertaining in addition to job or specific skill-focused books. In the memoir Uncanny Valley, Anna Wiener vividly describes her journey from working as a publishing assistant in New York to arriving in the Bay Area and befriending CEOs of tech unicorns. At a time when tech is one of the biggest and most influential industries in the world, her sharp observations and personal reflections force those of us working in the sector to look at ourselves with a critical eye.

Andrew Lo, Staff Front End Developer

Deep Work: Rules For Focused Success in a Distracted World by Cal Newport

(Grand Central Publishing)

I've found that the most impactful way to tackle hard problems is to first get into a flow state. Having the freedom to work uninterrupted for long blocks of time has often been the differentiator in discovering creative solutions. Once you've experienced it, it's tough going back to working any other way. Most of the activities we do as knowledge workers benefit from this level of attention and focus. And if you've never tried working in long, focused time blocks, Deep Work should convince you to give it a shot. A word of warning though: make sure you have a bottle of water and some snacks handy. It's easy to completely lose track of time and skip meals. Don't do that!

Krishna Satya, Development Manager

For more book recommendations, check out this Twitter thread from last year’s National Book Lovers Day.



How We Built the Add to Favorite Animation in Shop


I just want you to feel it

Jay Prince, from the song Feel It

I use the word feeling a lot when working on animations and gestures. For example, animations or gestures sometimes feel right or wrong. I think about that word a lot because our experiences using software are based on an intuitive understanding of the real world. When you throw something in real life, it influences how you expect something on screen to behave after you drag and release it.

By putting work, love, and care into UI details and designs, we help shape the experience and feeling users have when using an app. All the technical details and work are in service of the user's experiences and feelings. The user may not consciously notice the subtle animations we create, but if we do our job well, the tiniest gesture will feel good to them.

The team working on Shop, our digital shopping assistant, recently released a feature that allows buyers to favorite products and shops. By pressing a heart button on a product, buyers can save those products for later. When they do, the product image drops into the heart icon (containing a list of favorite products) in the navigation tab at the bottom.

In this post, I’ll show you how I approached implementing the Add to Favorite animation in Shopify’s Shop app. Specifically, we can look at the animation of the product image thumbnail appearing, then moving into the favorites tab bar icon:

Together, we'll learn:

  • How to sequence animations.
  • How to animate multiple properties at the same time.
  • What interpolation is.

Getting Started

When I start working on an animation from a video provided by a designer, I like to slow it down so I can see what's happening more clearly:

If a slowed video isn’t provided, you can record the animation using Monosnap or Quicktime. This also allows you to slowly scrub through the video. Fortunately, we also have this great motion spec to work with as well:

As you can see, the motion spec defines the sequence of animations. Based on the spec, we can determine:

  • which properties are animating
  • what values to animate to
  • how long each animation will take
  • the easing curve of the animation
  • the overall order of the animations

Planning the Sequence

Firstly, we should recognize that there are two elements being animated:

  • the product thumbnail
  • the favorites tab bar icon

The product thumbnail animates first, then the Favorites tab bar icon. Let's break it down step by step:

1. Product thumbnail fades in from 0% to 100% opacity. At the same time, it scales from 0 to 1.2.
2. Product thumbnail scales from 1.2 to 1
(A 50 ms pause where nothing happens)
3. Product thumbnail moves down, then disappears instantly at the end of this step.
4. The Favorite tab bar icon moves down. At the same time, it changes color from white to purple.
5. The Favorite tab bar icon moves up. At the same time, it changes color from purple to white.
6. The Favorite tab bar icon moves down.
7. The Favorite tab bar icon moves up to its original position.


Each of the above steps is an animation with a duration and an easing curve, as specified in the motion spec provided by the motion designer. The easing curves define how a property changes over time:

Coding the Animation Sequence

Let's write code! The Shop app is a React Native application and we use the Reanimated library to implement animations.

For this animation sequence, multiple properties are animated at times. However, these animations happen together, driven by the same timings and curves, so we can use a single shared value for the whole sequence. That shared progress value can drive the animations for each step by moving from 1 to 2 to 3, and so on.
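A sketch of that sequence with Reanimated's timing helpers (the durations and easings here are illustrative placeholders, not the actual motion spec values):

```javascript
import {
  useSharedValue, withSequence, withTiming, withDelay, Easing,
} from 'react-native-reanimated';

const progress = useSharedValue(0);

const startAnimation = () => {
  progress.value = withSequence(
    withTiming(1, { duration: 150, easing: Easing.out(Easing.quad) }), // fade in, overshoot scale
    withTiming(2, { duration: 100 }),                                  // settle scale back to 1
    withDelay(50, withTiming(3, { duration: 200 })),                   // pause, then move down
    withTiming(4, { duration: 100 }),                                  // tab icon down, turns purple
    withTiming(5, { duration: 100 }),                                  // tab icon up, turns white
    withTiming(6, { duration: 100 }),                                  // tab icon down again
    withTiming(7, { duration: 100 })                                   // tab icon back to rest
  );
};
```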

So the progress value tells us which step of the animation we're in, and we can set the animated properties accordingly. As you can see, this sequence of steps matches the steps we wrote down above, along with each step's duration and easing curve, including a delay at step 3:

We can now start mapping the progress value to the animated properties!

Product Thumbnail Styles

First let's start with the product thumbnail fading in:

What does interpolate mean?

Interpolating maps a value from an input range to an output range. For example, if the input range is [0, 1] and the output range is [0, 10], then as the input increases from 0 to 1, the output increases correspondingly from 0 to 10. In this case, we're mapping the progress value from [0, 1] to [0, 1] (so no change in value).

In the first step of the animation, the progress value changes from 0 to 1 and we want the opacity to go from 0 to 1 during that time so that it fades in. “Clamping” means that when the input value is greater than 1, the output value stays at 1 (it restricts the output to the maximum and minimum of the output range). So the thumbnail will fade in during step 1, then stay at full opacity for the next steps because of the clamping.
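To make the clamping behavior concrete, here's a simplified plain-JavaScript sketch of the idea (an illustration, not Reanimated's actual interpolate implementation):

```javascript
// Simplified sketch of interpolate with clamping (not Reanimated's real code).
// Maps `value` from inputRange to outputRange, linearly within each segment.
function interpolate(value, inputRange, outputRange) {
  const first = inputRange[0];
  const last = inputRange[inputRange.length - 1];
  // Clamping: restrict the input to the bounds of the input range.
  const clamped = Math.min(Math.max(value, first), last);
  // Find which segment the clamped value falls into.
  let i = 0;
  while (i < inputRange.length - 2 && clamped > inputRange[i + 1]) i++;
  const t = (clamped - inputRange[i]) / (inputRange[i + 1] - inputRange[i]);
  return outputRange[i] + t * (outputRange[i + 1] - outputRange[i]);
}

// The thumbnail fades in during step 1, then clamping holds opacity at 1.
const opacityAt = (progress) => interpolate(progress, [0, 1], [0, 1]);
```

So opacityAt(0.5) is 0.5 mid fade-in, and once progress passes 1, clamping holds the opacity at 1 for the following steps.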

However, we also want the thumbnail to disappear instantly at step 3. In this case, we don't use interpolate because we don't want it to animate a fade-out. Instead, we want an instant disappearance:
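In plain JavaScript terms, the idea is a conditional rather than an interpolation (names here are illustrative, not the actual Shop app code):

```javascript
// Sketch: opacity follows the fade-in until step 3, then drops to 0 instantly.
// `progress` is the shared value described above.
function thumbnailOpacity(progress) {
  if (progress >= 3) return 0;               // instant disappearance at step 3
  return Math.min(Math.max(progress, 0), 1); // fade in during step 1, then hold
}
```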

Now the item is fading in, but it also has to grow in scale and then shrink back a bit:

This interpolation is saying that from step 0 to 1, we want scale to go from 0 to 1.2. From step 1 to 2, we want the scale to go from 1.2 to 1. After step 2, it stays at 1 (clamping).

Let's do the final property, translating it vertically:

So we're moving from position -60 to -34 (halfway behind the tab bar) between steps 2 and 3. After step 3, the opacity becomes 0 and it disappears! Let's test the above code:

Nice, it fades in while scaling up, then scales back down, then slides down halfway under the tab bar, and then disappears.

Tab Bar Icon Styles

Now we just need to write the Favorite tab bar icon styles!

First, let's handle the heart becoming filled (turning purple), then unfilled (turning white). I did this by positioning the filled heart icon over the unfilled one, then fading in the filled one over the unfilled one. Therefore, we can use a simple opacity animation where we move from 0 to 1 and back to 0 over steps 3, 4 and 5:

For the heart bouncing up and down we have:

From steps 3 to 7, this makes the icon move up and down, creating a bouncing effect. Let's see how it looks!

Nice, we now see the tab bar icon react to having a product move into it.

Match Cut

By using a single shared value, we ensured that the heart icon moves down immediately when the thumbnail disappears, creating a match cut. A “match cut” is a cinematic technique where the movement of one item cuts immediately to the movement of another item during a scene transition. The downward movement the user's eye expects as the product thumbnail slides down cuts to a matching downward movement of the heart icon. This creates an association between the item and the Favorites section in the user's mind.

In an earlier approach, I tried using setTimeout to start the tab bar icon animation after the thumbnail one. I found that when the JS thread was busy, this would delay the second animation, which ruined the match cut transition! The sequence felt wrong with that delay, so I abandoned this approach. Using withDelay from Reanimated would have avoided the issue by keeping the timer on the UI thread.

When I started learning React Native, the animation code was intimidating. I hope this post helps make implementing animations in React Native more fun and approachable. When done right, they can make user interactions feel great!

You can see this animation by favoriting a product in the Shop app!

Special thanks to Amber Xu for designing these animations, providing me with great specs and videos to implement them, and answering my many questions.

Andrew Lo is a Staff Front End Developer on Shop's Design Systems team. He works remotely from Toronto, Canada.

Wherever you are, your next journey starts here! If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Intrigued? Visit our Engineering career page to find out about our open positions and learn about Digital by Design.

Continue reading

A Data Scientist’s Guide To Measuring Product Success


If you’re a data scientist on a product team, much of your work involves getting a product ready for release. You may conduct exploratory data analyses to understand your product’s market, or build the data models and pipelines needed to power a new product feature, or design a machine learning model to unlock new product functionality. But your work doesn’t end once a product goes live. After a product is released, it’s your job to help identify if your product is a success.

Continue reading

Using Terraform to Manage Infrastructure


Large applications are often a mix of code your team has written and third-party applications your team needs to manage. These third-party applications could be things like AWS or Docker. In my team’s case, it’s Twilio TaskRouter.

The configuration of these services may not change as often as your app code does, but when it does, the process is fraught with the potential for errors. This is because there is no way to write tests for the changes or easily roll them back–things we depend on as developers when shipping our application code.

Using Terraform improves your infrastructure management by allowing users to implement engineering best practices in what would otherwise be a GUI with no accountability, tests, or revision history.

On the Conversations team, we recently implemented Terraform to manage a piece of our infrastructure to great success. Let’s take a deeper look at why we did it, and how.

My team builds Shopify’s contact center. When a merchant or partner interacts with an agent, they are likely going through a tool we’ve built. Our app suite contains applications we’ve built in-house and third-party tools. One of these tools is Twilio TaskRouter.

TaskRouter is a multi-channel skill-based task routing API. It handles creating tasks (voice, chat, etc.) and routing them to the most appropriate agent, based on a set of routing rules and agent skills that we configure.

As our business grows and becomes more complex, we often need to make changes to how merchants are routed to the appropriate agent.

When that happens, someone needs to go into our Twilio console and use the graphic user interface (GUI) to update the configuration. This process is fairly straightforward and works well for getting off the ground quickly. However, the complexity quickly becomes too high for one person to understand in its entirety.

In addition, the GUI doesn’t provide a clear history of changes or a way to roll them back.

As developers, we are used to viewing a commit history, reading PR descriptions and tests to understand why changes happened, and rolling back changes that are not working as expected. When working with Twilio TaskRouter, we had none of these.

Using Terraform to Configure Infrastructure

Terraform is an open source tool for configuring infrastructure as code.

It is a state machine for infrastructure that brings all the benefits of the engineering best practices listed above to infrastructure that was previously only manageable via a GUI.

Terraform requires three things to work:

  1. A reliable API. When using Terraform, we stop using the GUI and rely on Terraform to make our changes for us via the API. Anything you can’t change with the API, you won’t be able to manage with Terraform.
  2. A Go client library. Terraform is written in Go and requires a client library for the API you’re targeting written in Go. The client library makes HTTP(S) calls to your target app.
  3. A Terraform provider. The core Terraform software uses a provider to interact with the target API. Providers are written in Go using the Terraform Plugin SDK.

With these three pieces, you can manage just about any application with Terraform!


A Terraform provider adds a set of resources Terraform can manage. Providers are not part of Terraform’s code. They are created separately to manage a specific application. Twilio did not have a provider when we started this project, so we made our own.

Since launching this project, Twilio has developed its own Terraform provider, which can be found here.

At its core, a provider enables Terraform to perform CRUD operations on a set of resources. Armed with a provider, Terraform can manage the state of the application.

Creating a Provider

Note: If you are interested in setting up Terraform for a service that already has a provider, you can skip to the next section.

Here is the basic structure of a Terraform provider:
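The original listing isn't reproduced here, but a typical layout matching the description below might look like this (file names illustrative):

```
go.mod
go.sum
Makefile
main.go
examples/
  main.tf
twilio/
  provider.go
  resource_twilio_activity.go
```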

This folder structure contains your Go dependencies, a Makefile for running commands, an example file for local development, and a directory called twilio. This is where our provider lives.

A provider must contain a resource file for every type of resource you want to manage. Each resource file contains a set of CRUD instructions for Terraform to follow–you’re basically telling Terraform how to manage this resource.

Here is the function defining what an activity resource is in our provider:

Note: Go is a strongly typed language, so the syntax might look unusual if you’re not familiar with it. Luckily you do not need to be a Go expert to write your own provider!

This file defines what Terraform needs to do to create, read, update and destroy activities in Task Router. Each of these operations is defined by a function in the same file.

The file also defines an Importer function, a special type of function that allows Terraform to import existing infrastructure. This is very handy if you already have infrastructure running and want to start using Terraform to manage it.

Finally, the function defines a schema–these are the parameters provided by the API for performing CRUD operations. In the case of Task Router activities, the parameters are friendly_name, available, and workspace_sid.

To round out the example, let’s look at the create function we wrote:

Note: Most of this code is boilerplate Terraform provider code which you can find in their docs.

The function is passed context, a schema resource, and an empty interface.

We instantiate the Twilio API client and find our workspace (Task Router activities all exist under a single workspace).

Then we format our parameters (defined in our Schema in the resourceTwilioActivity function) and pass them into the create method provided to us by our API client library.

Because this function creates a new resource, we set the id (setID) to the sid of the result of our API call. In Twilio, a sid is a unique identifier for a resource. Now Terraform is aware of the newly created resource and its unique identifier, which means it can make changes to the resource.

Using Terraform

Once you have created your provider or are managing an app that already has a provider, you’re ready to start using Terraform.

Terraform uses a DSL for managing resources. The good news is that this DSL is more straightforward than the Go code that powers the provider.

The DSL is simple enough that with some instruction, non-developers should be able to make changes to your infrastructure safely–but more on that later.

Here is the code for defining a new Task Router activity:
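The original snippet isn't reproduced here, but with a custom provider it would look something like this (the resource type, names, and references are illustrative):

```hcl
resource "twilio_activity" "offline" {
  friendly_name = "Offline"
  available     = false
  workspace_sid = twilio_workspace.support.sid

  depends_on = [twilio_workspace.support]
}
```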

Yup, that’s it!

We create a block declaring the resource type and what we want to call it. In that block, we pass the variables defined in the Schema block of our resourceTwilioActivity, and any resources that it depends on. In this case, activities need to exist within a workspace. So we pass in the workspace resource in the depends_on array. Terraform knows it needs this resource to exist or to create it before attempting to create the activity.

Now that you have defined your resource, you’re ready to start seeing the benefits of Terraform.

Terraform has a few commands, but plan and apply are most common. Plan will print out a text-based representation of the changes you’re about to make:

Terraform makes visualizing the changes to your infrastructure very easy. At this planning step you may uncover unintended changes: if there was already an offline activity, the plan step would show you an update instead of a create. In that case, all you need to do is change your resource block’s name and run terraform plan again.

When you are satisfied with your changes, run terraform apply to make the changes to your infrastructure. Now Terraform will know about the newly created resource, and its generated id, allowing you to manage it exclusively through Terraform moving forward.

To get the full benefit of Terraform (PRs, reviews, etc.), we use an additional tool called Atlantis to manage our GitHub integration.

This allows people to make pull requests with changes to resource files, and have Atlantis add a comment to the PR with the output of terraform plan. Once the review process is done, we comment atlantis apply -p terraform to make the change. Then the PR is merged.

We have come a long way from managing our infrastructure with a GUI in a web app! We have a Terraform provider communicating via a Go API client to manage our infrastructure as code. With Atlantis plugged into our team’s GitHub, we now have many of the best practices we rely on when writing software–reviewable PRs that are easy to understand and roll back if necessary, with a clear history that can be scanned with a git blame.

How was Terraform Received by Other Teams?

The most rewarding part of this project was how it was received by other teams. Instead of business and support teams making requests and waiting for developers to change Twilio workflows, Terraform empowered them to do it themselves. In fact, some people’s first PRs were changes to our Terraform infrastructure!

Along with freeing up developer time and making the business teams more independent, Terraform provides visibility to infrastructure changes over time. Terraform shows the impact of changes, and the ease of searching GitHub for previous changes makes it easy to understand the history of changes our teams have made.

Building great tools will often require maintaining third-party infrastructure. In my team’s case, this means managing Twilio TaskRouter to route tasks to support agents properly.

As the needs of your team grow, the way you configure your infrastructure will likely change as well. Tracking these changes and being confident in making them is very important but can be difficult.

Terraform makes these changes more predictable and empowers developers and non-developers alike to use software engineering best practices when making these changes.

Jeremy Cobb is a developer at Shopify. He is passionate about solving problems with code and improving his serve on the tennis court.


Continue reading

Creating a React Library for Consistent Data Visualization


At Shopify, we tell a lot of stories through data visualization. This is the driving force behind business decisions—not only for our merchants, but also for teams within Shopify.

With more than 10,000 Shopify employees, though, it is only natural that different teams started using different tools to display data, which is great—after all, creative minds create diverse solutions, right? The problem is that it led to a lot of inconsistencies, like these two line charts that used to live in the Shopify admin—the page you see after logging in to Shopify, where you can set up your store, configure your settings, and manage your business—for example:

Let’s play Spot the Difference: line widths, dashed line styles, legend styles, background grids, one has labels on the X axis, the other doesn’t... And this isn’t just a visual styles problem: because they use different libraries, one was accessible to screen readers and the other wasn’t; one was printable, the other not.

To solve this problem, the Insights team has been working on creating a React data visualization library—Polaris Viz—that other teams can rely on to quickly implement data visualization without having to solve the same problems over and over again.

But first things first, if you haven’t yet, I recommend you start by reading my co-worker Miru Alves’ amazing blog post where she describes how we used Delta-E and Contrast Ratio calculations to create a color matrix with a collection of colors we can choose from to safely use without violating any accessibility rules.

This post is going to focus on the process of implementing the light and dark themes in the library, as well as allowing library consumers to create their own themes, since not all Shopify brands like Shop, Handshake, or Oberlo use the same visual identity.

Where Did the Inconsistencies Come From?

When we started tackling this issue, the first thing we noticed was that even in places that were already using only Polaris Viz, we had visual inconsistencies. This is because our original components API looked like this:

As you can see, changing the appearance of a chart involved many different options spread in different props, and you either had to create a wrapping component that has all the correct values or pass the props over and over again to each instance. OK, this explains a lot.

Ideally, all charts in the admin should use either the default dark or light themes that the UX team created, so we should make it easy for developers to choose light or dark without all this copyin’ && pasta.

Implementing Themes

To cover the use cases of teams that used the default dark or light themes, we removed all the visual style props and introduced a new theme prop to all chart components:

  • The theme prop accepts the name of a theme defined in a record of Themes.
  • The Theme type contains all visual configurations like colors, line styles, spacing, and if bars should be rounded or not.

These changes allow consumers to have all the good styles by default—styles that match our visual identity, take accessibility into consideration, and have no accidental discrepancies—and they just have to pass in theme='Light' if they want to use the Light theme instead of the Dark one.

This change should cover the majority of use cases, but we still need to support other visual identities. Putting back all those style props would lead to the same problems for whoever wasn’t using the default styles. So how could we make it easy to specify a different visual identity?

Introducing the PolarisVizProvider

We needed a way to allow consumers to define what their own visual identity looks like in a centralized manner so all charts across their applications would just use the correct styles. So instead of having the chart components consume the themes record from a const directly, we introduced a context provider that stores the themes:

By having the provider accept a themes prop, we allow consumers to overwrite the Default and Light themes or add their own. This implementation could cause some problems though: what happens if a user overwrites the Default theme but doesn’t provide all the properties necessary to render a chart? For example, what if they forget to pass the tooltip background color?

To solve this, we first implemented a createTheme function:
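The implementation isn't shown here; a minimal sketch of the idea, with a stand-in default theme (values illustrative, not the library's real constants), might look like:

```javascript
// Stand-in for the library's default theme (illustrative values only).
const DEFAULT_THEME = {
  chartContainer: { backgroundColor: 'DarkGray' },
  seriesColors: { single: ['purple'] },
  tooltip: { backgroundColor: 'black' },
};

// createTheme: fill any properties missing from the partial theme
// with the library's default values.
function createTheme(partialTheme = {}) {
  const theme = {};
  for (const key of Object.keys(DEFAULT_THEME)) {
    theme[key] = { ...DEFAULT_THEME[key], ...(partialTheme[key] || {}) };
  }
  return theme;
}
```

A consumer who only overrides the chart background still gets a complete theme: the tooltip background and series colors fall back to the defaults.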

createTheme allows you to pass in a partial theme and obtain a complete theme. All the properties that are missing in the partial theme will just use the library’s default values.

Next, we implemented a createThemes function. It guarantees that even if properties are overwritten, the theme record will always contain the Default and Light themes:

With both of these in place, we just needed to update the PolarisVizProvider implementation:

Overwriting the Default Theme

From a consumer perspective, this means that you could wrap your application with a PolarisVizProvider, define your Default theme, and all charts will automagically inherit the correct styles. For example:

All charts inside of <App/> will have a blue background by default:

It hurts my eyes, but IT WORKS!

Creating Multiple Themes

You can also define multiple extra themes in the PolarisVizProvider. Each top level key in this object is used as a theme name that you can pass to individual charts later on. For example:

The first chart uses a theme named AngryRed and the second HappyGreen.

We did have to repeat the definition of the single series color (seriesColors.single = ['black']) in both themes though, and it would be even more annoying if we had multiple properties shared by both themes and only wanted to overwrite some. We can make this easier by changing the implementation of the createTheme function to accept an optional baseTheme, instead of always using the default from the library:

With those changes in place, as a consumer I can just import createTheme from the library and use AngryRed as the base theme when creating HappyGreen:
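Roughly, that looks like the following self-contained sketch (again with illustrative names and a stand-in default theme, not the library's real code):

```javascript
// Stand-in default theme (illustrative).
const DEFAULT_THEME = {
  chartContainer: { backgroundColor: 'DarkGray' },
  seriesColors: { single: ['purple'] },
};

// createTheme with an optional base theme: missing properties fall back
// to the base theme instead of always falling back to the library default.
function createTheme(partialTheme = {}, baseTheme = DEFAULT_THEME) {
  const theme = {};
  for (const key of Object.keys(baseTheme)) {
    theme[key] = { ...baseTheme[key], ...(partialTheme[key] || {}) };
  }
  return theme;
}

const AngryRed = createTheme({
  chartContainer: { backgroundColor: 'red' },
  seriesColors: { single: ['black'] },
});

// HappyGreen only overrides the background; it inherits
// seriesColors.single = ['black'] from AngryRed.
const HappyGreen = createTheme(
  { chartContainer: { backgroundColor: 'green' } },
  AngryRed
);
```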

Making Colors Change According to the Data Set

Another important feature we had in the library and didn’t want to lose was to change the series colors according to the data.

In this example, we’re applying a green gradient to the first chart to highlight the highest values as having more ordered items—more sales—is a good thing! In the second chart though, we’re applying a red gradient to highlight the highest values, since having more people return what they ordered isn’t such a good thing.

It would be super cumbersome to create extra themes any time we wanted a specific data series to use a different color, so we changed our DataSeries type to accept an optional color that can overwrite the series color coming from the theme:

So for the example above, we could have something like:

Next Steps

Polaris Viz will be open source soon! If you want to get access to the beta version of the library, help us test, or suggest features that might be useful for you, reach out to us at

Krystal is a Staff Developer on the Visualization Experiences team. When she’s not obsessing over colors, shapes and animation she’s usually hosting karaoke & billiards nights with friends or avoiding being attacked by her cat, Pluma.

Continue reading

Test Budget: Time Constrained CI Feedback


At Shopify we run more than 170,000 tests in our core monolith. Naturally, we're constantly exploring ways to make this faster, and the Test Infrastructure team analyzed the feasibility of introducing a test budget: a fixed amount of time for tests to run. The goal is to speed up the continuous integration (CI) test running phase by accepting more risk. To achieve that goal we used prioritization to reorder the test execution plan in order to increase the probability of a fast failure. Our analysis provided insights into the effectiveness of executing prioritized tests under a time constraint. The single most important finding was that we were able to find failures after we had run only 70% of the test-selection suite.

The Challenge

Shopify’s codebase relies on CI to avoid regressions before releasing new features. As the code submission rate grows along with the development team size, so does the size of the test pool and the time between code check-ins and test result feedback. As seen in the figure below, developers occasionally get late CI feedback, while at other times the CI builds complete in under 10 minutes. This inconsistent cadence of receiving CI feedback leads to more frequent context switches.

The feedback time varies

Various techniques exist to speed up CI such as running tests in parallel or reducing the number of tests to run with test selection. Balancing the cost of running tests against the value of running them is a fundamental topic in test selection. Furthermore, if we think of the value as a variable then we can make the following observations for executing tests:

  • No amount of tests can give us complete confidence that no production issue will occur.
  • The risk of production issues is lower if we run all the tests.
  • As complexity of the system increases, the value of testing any individual component decreases.
  • Not all tests increase our confidence level the same way.

The Approach

It’s important to note first the difference between test selection and test prioritization. Test selection deterministically selects all tests that correspond to the given changes using a call graph. Test prioritization, on the other hand, orders the tests with the goal of discovering failures fast. Also, that sorted set won’t always be the same for the same change, since the prioritization techniques use historical data.

The system we built produces a prioritized set of tests on top of test selection and constrains the execution of those tests using a predetermined time budget. Having established that there’s a limited time to execute the tests, the next step is to determine what’s the best time to stop executing tests and enforce it.

The time constraint or budget, and where the name Test Budget comes from, is the predetermined time we terminate test execution while considering that we must find as many failures as possible during that period of time.

System Overview

The guiding principle we used to build the Test Budget was: we can't be sure there will be no bugs in production that affect the users after running our test suite in any configuration.

To identify the most valuable tests to run within an established time budget, the following steps must be performed:

  1. identify prioritization criteria and compute the respective prioritized sets of tests
  2. compute the metrics for all criteria and analyze the results to determine the best criteria
  3. further analyze the data to pick a time constraint for running the tests

The image below gives a structural overview of the test prioritization system we built. First, we compute the prioritized sets of tests using historical test results for every prioritization criterion (for example, the criterion failure_rate has its own prioritized set of tests). Then, given some commit and the test-selection set that corresponds to that commit, we execute the prioritized tests as a CI build. These prioritized tests are a subset of the test-selection test suite.

Test Prioritization System

First, the system obtains the test result data needed by the prioritization techniques. The data is ingested into a Rails app that’s responsible for the processing and persistence. It exposes the test results through an HTTP API and a GUI. For persistence, we chose to use Redis, not only because of the unstructured nature of our data, but also because of the Redis Sorted Sets data structure that enables us to query for ordered sets of tests in O(log n) time, where n is the number of elements in the set.

The goal of the next step is to select a subset of tests given the changes of the committed code. We created a pipeline that’s being triggered for a percentage of the builds that contain failures. We execute this pipeline with a specific prioritization each time and calculate metrics based on it.

Modeling Risk

During the CI phase, the risk of not finding a fault can be thought of as a numbers game. How certain are we that the application will be released successfully if we have tested all the flows? What if we test the same flows 1,000 times? We leaned on test prioritization to order the tests in such a way that early faults are found as soon as possible, which encouraged the application of heuristics as the prioritization criteria. This section explores how to measure the risk of not detecting faults within the time budget, and how that risk changes when tests are skipped using the best heuristics rather than at random.

Prioritization Criteria

We built six test prioritization criteria that produced a rating for every test in the codebase:

  • failure_rate: how frequently a test fails based on historical data.
  • avg_duration: how fast a test executes. Executing faster tests allows us to execute more tests in a short amount of time.
  • churn: a file that’s changing too much could be more brittle.
  • coverage: how much of the source code is executed when running a test.
  • complexity: based on the lines of code per file.
  • default: this is the random order set.

Evaluation Criteria

After we get the prioritized tests, we need to evaluate the results of executing the test suite following the prioritized order. We chose two groups of metrics to evaluate the criteria:

  1. The first includes the Time to First Failure (TTFF), which acts as a tripwire: if the time to first failure is 10 minutes, then we can’t enforce a time constraint lower than 10 minutes.
  2. The second group of metrics includes the Average Percentage of Faults Detected (APFD) and the Convergence Index. We needed to start thinking of the test execution timing problem using a risk scale, which would open the way for us to run fewer tests by tweaking how much risk we will accept.

The APFD is a measure of how early a particular test suite execution detects failures. APFD is calculated using the following formula:

APFD = 1 − (F1 + F2 + … + Fm) / (n × m) + 1 / (2n)

The equation tells us that to calculate the APFD we take 1, subtract the sum of the positions of the tests that expose each failure divided by the product of the number of tests and the number of failures, and add a correction term of 1/(2n). In the equation above:

  • n is the number of test cases in the test suite
  • m is the total number of failures in the test suite
  • Fi is the position, in the prioritized order, of the first test that exposes fault i

The APFD values range from 0 to 1, where higher APFD values imply a better prioritization.
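Assuming the standard APFD definition above (this sketch is illustrative, not the team's actual code), the calculation is straightforward:

```javascript
// APFD: average percentage of faults detected.
// n = number of test cases; positions = 1-indexed position, in the
// prioritized order, of the first test exposing each fault.
function apfd(positions, n) {
  const m = positions.length;
  const sumOfPositions = positions.reduce((a, b) => a + b, 0);
  return 1 - sumOfPositions / (n * m) + 1 / (2 * n);
}
```

For instance, with n = 100 and m = 4 faults whose first-exposing tests sit at positions summing to 101, the formula gives 1 − 101/400 + 1/200 = 0.7525.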

For example, for the test suites (produced by different prioritization algorithms) T1 and T2 that each have a total number of tests (n) = 100 and total number of faults (m) = 4, we get a matrix of fault positions (not reproduced here), and we calculate their APFD values:

APFD Values

The first prioritization has a better APFD rating (0.7525 versus 0.6425).

The Convergence Index tells us when to stop testing within a time constrained environment because a high convergence indicates we’re running fewer tests and finding a big percentage of failures.

Convergence Index = (percentage of faults detected) / (percentage of tests executed)

The formula to calculate the Convergence Index is the percentage of faults detected divided by the percentage of tests executed.
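As a quick sketch (my code, not the team's):

```javascript
// Convergence index: percentage of faults detected divided by
// percentage of tests executed. Higher values mean we found a larger
// share of the failures while running a smaller share of the tests.
function convergenceIndex(faultsDetected, totalFaults, testsExecuted, totalTests) {
  return (faultsDetected / totalFaults) / (testsExecuted / totalTests);
}
```

Finding 80% of the failures after executing only 60% of the tests, for example, gives a convergence index of about 1.33.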

Data Analysis

For each build, we created and instrumented a prioritized pipeline to produce artifacts for building the prioritization sets and emit test results to Kafka topics.

The prioritization pipeline in Buildkite

We ran the prioritized pipeline multiple times so we could apply statistical analysis to our results. Finally, we used Python notebooks to combine all the measurements and easily visualize the percentiles. For APFD and TTFF we decided to use boxplots to visualize possible outliers and the skewness of the data.

When Do We Find the First Failing Test?

We used the TTFF metric to quantify how fast we could know that the CI will eventually fail. Finding the failure within a time window is critical because the goal is to enforce that window and stop the test execution when the time window ends.


In the figure above we present the statistical distributions for the prioritization criteria using boxplots. The median time to find a failure is less than five minutes for all the criteria. Complexity, churn, and avg_duration have the worst third quartile results with a maximum of 16 minutes. On the other hand, default and failure_rate gave more promising results with a median of less than three minutes.

Which Prioritization Criteria Have the Best Failure Detection Rates?

We used the APFD metric to compare the prioritization criteria. A higher APFD value indicates a better failure detection rate.

APFD scores

The figure above presents the boxplots of APFD values for all the prioritization criteria. We notice that there isn’t a significant difference between the churn and complexity prioritization criteria. Both have median values close to zero, which makes them poor choices for prioritizing the tests. We also see that failure_rate has the best detection rate, marginally better than the random (default) one.

Which Prioritization Criteria Has the Quickest Convergence Time?

The rate at which new test failures are detected decreases as we execute more tests. This is what we visualized with the convergence index data, using a step chart. In all the convergence graphs the step is 10% of the test suite executed.

Mean convergence index

The above figure indicates that, at the mean, all the criteria find a substantial percentage of faults after running only 50% of the test suite, but the default and failure_rate prioritization criteria stand out.

For the mean case, executing 50% of the test suite finds 50% of the failures using the default prioritization and 60% using the failure_rate. The failure_rate criterion is able to detect 80% of the failures after running only 60% of the test suite.

How Much Can We Shrink the Test Suite Given a Time Constraint?

The p20 and p5 visualizations of the convergence quantify how reliably we could detect faults within the time budget. We use the p20 and p5 visualizations because a higher value of convergence is better. The time budget is an upper bound. The CI system executes the tests up to that time bound.

Convergence index p20

For example, looking at the p20 (80% of builds) plot in the figure above, we need to execute 60% of the test-selection tests (the test-selection suite is 40% of the whole test suite at the median) to detect an acceptable number of failures. The time budget is then the time it takes to execute 60% of the selected tests.

Convergence index p5

Looking at the 5th percentile plot (95% of the builds) in the figure above, we notice that we need to execute 70% of the already-reduced test-selection suite to detect 50% of the failures.

The Future of Test Budget Prioritization

Looking at our convergence and TTFF results, if we want to emphasize discovering a faulty commit (that is, the first failure), we could execute less than 70% of the test-selection suite.

The results of the data analysis suggest several alternatives for future work. First, deep learning models could use the time budget as a constraint while building the prioritized sets. Second, a feedback mechanism could be the next prioritization approach to explore, where tests that never run could be automatically deleted from the codebase, or failures that result in problems during production testing could be given a higher priority.

Finally, one possible potential for a Test Budget prioritization system could be outside the scope of the Continuous Integration environment: the development environment. Another way of looking at the ordered sets is that the first tests are more impactful or more susceptible to failures. Then we could use such data to inform developers during the development phase that parts of the codebase are more likely to have failing tests in CI. A message such as “this part of the codebase is covered by a high priority test which breaks in 1% of the builds” would give feedback to developers immediately while they’re writing the code. It would shift testing to the left by giving code suggestions during development, and eventually reduce the costs and time of executing tests in the CI environment.

If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.


Adding the V8 CPU Profiler to v8go


V8 is Google’s open source high-performance JavaScript and WebAssembly engine written in C++. v8go is a library written in Go and C++ allowing users to execute JavaScript from Go using V8 isolates. Using Cgo bindings allows us to run JavaScript in Go at native performance.

The v8go library, developed by Roger Chapman, aims to provide an idiomatic way for Go developers to interface with V8. As it turns out, this can be tricky. For the past few months, I’ve been contributing to v8go to expose more V8 functionality. In particular, I’ve been adding support for the V8 CPU Profiler.

From the start, I wanted this new API to be:

  • easy for the library's Go users to reason about
  • easy to extend for other profiler functionality eventually
  • aligned closely with the V8 API
  • as performant as possible

The point about performance is especially interesting. I theorized that my first iteration of the implementation was less performant than a proposed alternative. Without benchmarking them, I proceeded to rewrite. That second implementation was merged, and I moved on with my life. Then I was like, “Hey! I should write a post about the PR and benchmark the results,” only to actually see the benchmarks and reconsider everything.

If you’re interested in API development, Go/Cgo/C++ performance or the importance of good benchmarks, this is a story for you.

Backing Up to the Starting Line: What Was My Goal?

The goal of adding the V8 CPU Profiler to v8go was so users of the library could measure the performance of any JavaScript being executed in a given V8 context. Besides providing insight on the code being executed, the profiler returns information about the JavaScript engine itself including garbage collection cycles, compilation and recompilation, and code optimization. While virtual machines and the like can run web applications incredibly fast, code should still be performant, and it helps to have data to understand when it's not. 

If we have access to a CPU profiler, we can ask it to start profiling before we start executing any code. The profiler samples the CPU stack frames at a preconfigured interval until it's told to stop. Sufficient sampling helps show the hot code paths whether that be in the source code or in the JavaScript engine. Once the profiler has stopped, a CPU profile is returned. The profile comes in the form of a top-down call tree composed of nodes. To walk the tree, you get the root node and then follow its children all the way down.
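The top-down walk can be sketched generically. This is illustrative Python, not v8go's actual Go API, and the node shape here is an assumption; it just shows the "start at the root, follow the children" traversal described above:

```python
from dataclasses import dataclass, field

@dataclass
class ProfileNode:
    function_name: str
    children: list = field(default_factory=list)

def walk(node, depth=0, out=None):
    """Depth-first, top-down walk of a CPU profile call tree."""
    if out is None:
        out = []
    out.append("  " * depth + (node.function_name or "(anonymous)"))
    for child in node.children:
        walk(child, depth + 1, out)
    return out

# A toy profile: root -> main -> {fib, an anonymous function}.
root = ProfileNode("(root)", [
    ProfileNode("main", [ProfileNode("fib"), ProfileNode("")]),
])
print("\n".join(walk(root)))
```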

Here’s an example of some JavaScript code we can profile:

Using v8go, we start by creating the V8 isolate, context, and CPU profiler. Before running the above code, the profiler is told to start profiling:

After the code has finished running, the profiling is stopped and the CPU profile returned. A simplified profile in a top-down view for this code looks like:

Each of these lines corresponds to a node in the profile tree. Each node comes with plenty of details including:

  • name of the function (empty for anonymous functions)
  • id of the script where the function is located
  • name of the script where the function originates
  • number of the line where the function originates
  • number of the column where the function originates
  • whether the script where the function originates is flagged as being shared cross-origin
  • count of samples where the function was currently executing
  • child nodes of this node
  • parent node of this node
  • and more found in the v8-profiler.h file.

For the purposes of v8go, we don’t need to have opinions about how the profile should be formatted, printed, or used since this can vary. Some may even turn the profile into a flame graph. It’s more important to focus on the developer experience of trying to generate a profile in a performant and idiomatic way.

Evolving the API Implementation

Given the focus on performance and an idiomatic-to-Go API, the PR went through a few different iterations. These iterations can be categorized into two distinct rounds: the first where the profile was lazily loaded and the second where the profile was eagerly loaded. Let’s start with lazy loading.

Round 1: Lazy Loading

The initial approach I took aligned v8go with V8's API as closely as possible. This meant introducing a Go struct for each V8 class we needed and their respective functions (that is, CPUProfiler, CPUProfile, and CPUProfileNode).

This is the Go code that causes the profiler to stop profiling and return a pointer to the CPU profile:

This is the corresponding C++ code that translates the request in Go to V8's C++:

With access to the profile in Go, we can now get the top-down root node:

The root node exercises this C++ code to access the profiler pointer and its corresponding GetTopDownRoot() method:

With the top-down root node, we can now traverse the tree. Each call to get a child, for instance, is its own Cgo call as shown here:

The Cgo call exercises this C++ code to access the profile node pointer and its corresponding GetChild() method:

The main differentiator of this approach is that to get any information about the profile and its nodes, we have to make a separate Cgo call. For a very large tree, this makes at least kN more Cgo calls where k is the number of properties queried, and N is the number of nodes. The value for k will only increase as we expose more properties on each node.
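To make the k×N concern concrete, here's a rough back-of-the-envelope model in Python. The node and property counts are hypothetical, and the ~54 ns/op Go-to-C overhead figure is the one benchmarked later in this post:

```python
GO_TO_C_NS = 54  # approximate Go -> C crossing overhead per call (ns/op)

def lazy_cgo_overhead_ns(num_nodes, props_per_node):
    # Lazy loading pays the crossing cost on every one of the ~k*N calls:
    # k properties queried per node, N nodes in the profile tree.
    return num_nodes * props_per_node * GO_TO_C_NS

# Hypothetical profile: 10,000 nodes, 9 properties queried per node.
ns = lazy_cgo_overhead_ns(10_000, 9)
print(ns / 1e6, "ms of pure call overhead")  # 4.86 ms
```

Eager loading collapses those 90,000 crossings into a single Cgo call, at the cost of building (and traversing) the whole graph up front.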

How Go and C Talk to Each Other

At this point, I should explain more clearly how v8go works. v8go uses Cgo to bridge the gap between Go and V8's C code. Cgo allows Go programs to interoperate with C libraries: calls can be made from Go to C and vice versa.

If you do some research on Cgo’s performance, you’ll find Sean Allen’s Gophercon 2018 talk, where he made the following recommendation:

“Batch your CGO calls. You should know this going into it, since it can fundamentally affect your design. Additionally once you cross the boundary, try to do as much on the other side as you can. So for go => “C” do as much as you can in a single “C” call. Similarly for “C” => go do as much as you can in a single go call. Even more so since the overhead is much higher.”

Similarly, you’ll find Dave Cheney’s excellent “cgo is not go” that explains the implications of using cgo: 

“C doesn’t know anything about Go’s calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.

The take-away is that the transition between the C and Go world is non trivial, and it will never be free from overhead.”

When we talk about “overhead,” the actual cost can vary by machine, but benchmarks run by another v8go contributor (Dylan Thacker-Smith) show an overhead of about 54 nanoseconds per operation (ns/op) for Go-to-C calls and 149 ns/op for C-to-Go calls:

Given this information, the concern about lazy loading is justified: when a user needs to traverse the tree, they’ll make many more Cgo calls, incurring the overhead cost each time. After reviewing the PR, Dylan suggested building the entire profile graph in C code and then passing a single pointer back to Go, so that Go could rebuild the same graph using Go data structures loaded with all the information, ready to be passed to the user. This dramatically reduces the number of Cgo calls. This brings us to round #2.

Round 2: Eager Loading

To build out a profile for visualization, users need access to most, if not all, of the nodes of the profile. We also know that, for performance, I want to limit the number of Cgo calls required to do so. So we move the heavy lifting of getting the entire call graph inside our C++ function StopProfiling, so that the pointer we return to the Go code points to a call graph fully loaded with all the nodes and their properties. Our Go CPUProfile and CPUProfileNode objects still match V8’s API in that they have the same getters, but now, internally, they just return the values from the structs’ private fields instead of reaching back into the C++ code.

This is what the StopProfiling function in C++ does now: once the profiler returns the profile, the function traverses the graph starting at the root node and builds out the C data structures, so that a single pointer to the profile can be returned to the Go code, which can then traverse the graph to build the corresponding Go data structures.

The corresponding function in Go, StopProfiling, uses Cgo to call the above C function (CPUProfilerStopProfiling) to get the pointer to our C struct CPUProfile. By traversing the tree, we can build the Go data structures so the CPU profile is completely accessible from the Go side:

With this eager loading, the rest of the Go calls to get profile and node data are as simple as returning the values from the private fields on the struct.

Round 3 (Maybe?): Lazy or Eager Loading

There’s the potential for a variation where both of the above implementations are options, allowing users to decide whether they want to lazily or eagerly load everything on the profile. It’s another reason why, in the final implementation of the PR, the getters were kept instead of just making all of the Node and Profile fields public. With the getters and private fields, we can change what’s happening under the hood based on how the user wants the profile to load.

Speed is Everything, So Which One's Faster?

Comparing lazy and eager loading required a test that executed a JavaScript program with a decently sized tree, so we could exercise a number of Cgo calls on many nodes. We would measure whether there was a performance gain from building the tree eagerly in C and returning the complete call graph as a pointer back to Go.

For quite a while, I ran benchmarks using the JavaScript code from earlier. From those tests, I found that:

  1. When lazy loading the tree, the average duration to build it is ~20 microseconds.
  2. When eagerly loading the tree, the average duration to build it is ~25 microseconds.

It's safe to say these results were unexpected. As it turns out, the eager approach wasn’t faster than lazy loading; in fact, it was the opposite, even though lazy loading makes more Cgo calls for a tree of this size.

However, because these results were unexpected, I decided to try a much larger tree using the Hydrogen starter template. From testing this, I found that:

  1. When lazy loading the tree, the average duration to build it is ~90 microseconds.
  2. When eagerly loading the tree, the average duration to build it is ~60 microseconds.

These results aligned better with our understanding of the performance implications of making numerous Cgo calls. It seems that, for a tiny tree, traversing it three times (twice to eagerly load information and once to print it) costs more than the single walk to print it, even though that single walk includes numerous Cgo calls. The benefit of the upfront graph traversal only shows itself on a much larger tree, where it greatly speeds up the eventual walkthrough that prints the tree. If I hadn’t tried a different-sized input, I would never have seen that the value of eager loading eventually shows itself. If I drew the growth curves of the respective approaches on a graph, it would look something like:

Simple graph with time to build profile on the y axis and size of javascript on x axis. 2 lines indicating eager and lazy are plotted on the graph with lazy being higher

Looking Back at the Finish Line

As a long-time Go developer, there are plenty of things I take for granted about memory management and performance. Working on the v8go library has forced me to learn about Cgo and C++ in such a way that I can understand where the performance bottlenecks might be, how to experiment around them, and how to find ways to optimize for them. Specifically, contributing the CPU profiling functionality to the library reminded me that:

  1. I should benchmark code when performance is critical rather than just going with my (or another’s) gut. It absolutely takes time to flesh out a sufficient alternative code path to do fair benchmarking, but chances are there are discoveries made along the way. 
  2. Designing a benchmark matters. If the variables in the benchmark aren’t reflective of the average use case, then the benchmarks are unlikely to be useful and may even be confusing.

Thank you to Cat Cai, Oliver Fuerst, and Dylan Thacker-Smith for reviewing, clarifying, and generally just correcting me when I'm wrong.

About the Author:

Genevieve is a Staff Developer at Shopify, currently working on Oxygen.



RubyConf 2021: The Talks You Might Have Missed


Shopify loves Ruby and opportunities to get together with other engineers who love Ruby to learn, share, and build relationships. In November, Rubyists from Shopify’s Ruby and Rails infrastructure teams gathered in Denver at RubyConf 2021 to immerse themselves in all things Ruby with a community of their peers. If you weren’t there or want to revisit the content, we’ve compiled a list of the talks from our engineers. 

A History of Compiling Ruby by Chris Seaton

Love Ruby compilers? Chris does.

“Why is it worth looking at Ruby compilers? Why is it worth looking at compilers at all? Well, I think compilers are fascinating. I’ve been working on them for a couple of decades. I think one of the great things about compilers, you can talk to anyone who’s a developer about compilers, because we all use compilers. Everyone’s got an opinion on how the languages should be designed. You can have conversations with anyone at every level about compilers, and compilers are just really fun. They may seem like a deeply technical topic, but they’re conceptually fairly simple. They take a file as input, they do something internally, and they produce a file as output.”

In this talk, Chris dives into the history of Ruby compilers, the similarities and differences, and what we can learn from them.

Learn more about Chris’ work on TruffleRuby:

Some Assembly Required by Aaron Patterson 

In typical Aaron style, this talk is filled with puns and humor while being educational and thought-provoking. Aaron shares why he wrote a JIT compiler for Ruby. Why did he write a JIT compiler? 

To see if he could.

“I wanted to see if I could build this thing. For me, programming is a really creative and fun endeavor. I love to program. And many times I’ll just write a project just to see if I can do it. And this is one of those cases. So, I think maybe people are asking, ‘does this thing actually work?’” 

Watch Aaron’s talk to find out if it does work and learn how to build a JIT compiler in pure Ruby. 

Learn more about TenderJIT on GitHub

Building a New JIT Compiler Inside CRuby by Maxime Chevalier Boisvert

In this talk, Maxime talks about YJIT, an open-source project led by a small team of developers at Shopify to incrementally build a new JIT compiler inside CRuby. She discusses the key advantages of YJIT, the approach the team is taking to implement YJIT, and early performance results.

“The objective is to produce speedups on real-world software. For us, real-world software means large web workloads, such as Ruby on Rails. The benefits of our approach is we’re highly compatible with all existing Ruby code and we’re able to support all of the latest Ruby features.”

Check out YJIT in Ruby 3.1!

Learn more about YJIT:

Gradual Typing in Ruby–A Three Year Retrospective by Ufuk Kayserilioglu and Alexandre Terrasa 

Ufuk and Alexandre share a retrospective of adopting Sorbet at Shopify, why you don’t have to go full-in on types out of the gate, and why gradual typing might be a great middle-ground for your team. They also share lessons learned from a business and technical perspective. 

“You shouldn’t be getting in the way of people doing work. If you want adoption to happen, you need to ramp up gently. We’re doing gradual type adoption. And because this is gradual-type adoption, it’s totally okay to start slow, to start at the lowest strictness levels, and to gradually turn it up as people are more comfortable and as you are more comfortable using the tools.”

Check out the following posts from Ufuk and Alexandre to learn more about static typing for Ruby and adopting Sorbet at scale at Shopify.

Building Native Extensions. This Could Take A While... by Mike Dalessio 

At RubyKaigi 2021, Mike did a deep dive into the techniques and toolchain used to build and ship native C extensions for Ruby. In his latest talk at RubyConf 2021, Mike expands upon the conversation to explore why Nokogiri evolved to use more complex techniques for compilation and installation over the years and touches upon human trust and security. 

“Nokogiri is web-scale now. Since January (2021), precompiled versions of Nokogiri have been downloaded 60 million times. It’s a really big number. If you do back of the envelope power calculations, assuming some things about your core, 2.75 megawatts over 10 months have been saved.”

Mike has provided companion material to the talk on GitHub.

Parsing Ruby by Kevin Newton

Kevin digs into the topic of Ruby parsers with a thorough deep dive into the technical details and tradeoffs of different tools and implementations. While parsing is a technically challenging topic, Kevin delivers a talk that speaks to junior and senior developers, so there’s something for everyone! 

“Parser generators are complicated technologies that use shift and reduce operations to build up syntax trees. Parser generators are difficult to maintain across implementations of languages. They’re not the most intuitive of technologies and it’s difficult to maintain upstream compatibility. It’s a good thing that Ruby is going to slow down on syntax and feature development because it’s going to give an opportunity for all the other Ruby implementations to catch up.”

Problem Solving Through Pair Programming by Emily Harber

We love pair programming at Shopify. In this talk, Emily explores why pair programming is a helpful tool for getting team members up to speed and writing high-quality code, allowing your team to move faster and build for the long term. Emily also provides actionable advice to get started to have more productive pairing sessions.

“Pair programming is something that should be utilized at all levels and not exclusively as a part of your onboarding or mentorship processes. Some of the biggest benefits of pairing carry through all stages of your career and through all phases of development work. Pairing is an extremely high fidelity way to build and share context with your colleagues and to keep your code under constant review and to combine the strengths of multiple developers on a single piece of a shared goal.”


Achieving Fast Method Metaprogramming: Lessons from MemoWise by Jemma Issroff

In this talk, Jemma and Jacob share the journey of developing MemoWise, Ruby’s most performant memoization gem. The presentation digs into benchmarking, unexpected object allocations, performance problems common to Ruby metaprogramming, and their experimentation to develop techniques to overcome these concerns.

“So we were really critically concerned with optimizing our performance as much as possible. And like any good scientist, we followed the scientific method to ensure this happens. So four steps: Observation, hypothesis, experiment, and analysis. Benchmarks are one of the best ways to measure performance and to an experiment that we can use over and over again to tell us exactly how performant our code is or isn’t.” 

Programming with Something by Tom Stuart

In this talk, Tom explores how to store executable code as data in Ruby and write different kinds of programs that process it. He also tries to make “fasterer” and “fastererer” words, but we’ll allow it because he shares a lot of great content.

“A simple idea like the SECD machine is the starting point for a journey of iterative improvement that lets us eventually build a language that’s efficient, expressive, and fast.”

If you are interested in exploring the code shown in Tom’s talk, it’s available on GitHub.

The Audacious Array by Ariel Caplan

Do you love Arrays? In this talk, Ariel explores the “powerful secrets” of Ruby arrays by using…cats! Join Ariel on a journey through his game, CatWalk, which he uses to discuss the basics of arrays, adding and removing elements, creating randomness, interpretation, arrays as sets, and more. 

“When we program, many of the problems that we solve fall into the same few categories. We often need to create constructs like a randomizer, a 2D representation of data like a map, some kind of search mechanism, or data structures like stacks and queues. We might need to take some data and use it to create some kind of report, And sometimes we even need to do operations that are similar to those we do on a mathematical set. It turns out, to do all of these things, and a whole lot more, all we need is a pair of square brackets. All we need is one of Ruby’s audacious arrays.” 

If you want to explore the code for Ariel’s “nonsensical” game, CatWalk, check it out on GitHub

Ruby Archaeology by Nick Schwaderer

In this talk, Nick “digs” into Ruby archeology to run old code and explore Ruby history and interesting gems from the past and shares insights into what works and what’s changed from these experiments.  

“So why should you become a Ruby archeologist? There are hundreds of millions, if not billions, of lines of valid code, open source for free, on the internet that you can access today. In the Ruby community today, sometimes it feels like we’re converging.”

Keeping Developers Happy With a Fast CI by Christian Bruckmayer

As a member of Shopify’s test infrastructure team, Christian ensures that the continuous integration (CI) systems are scalable, robust, and usable. In this talk, Christian shares techniques such as monitoring, test selection, timeouts, and the 80/20 rule to speed up test suites. 

“The reason we have a dedicated team is just the scale of Shopify. So the Rails core monolith has approximately 2.8 million lines of code, over a thousand engineers work on it, and in terms of testing we have 210,000 Ruby tests. If you execute them it would take around 40 hours. We run around 1,000 builds per day, which means we run around 100 million test runs per day. So that’s a lot.”

Read more about keeping development teams happy with fast CI on the blog.

Note: The first 1:40 of Christian’s talk has minor audio issues, but don’t bail on the talk because the audio clears up quickly, and it’s worth it!

Parallel Testing With Ractors–Putting CPU's to Work by Vinicius Stock

Vini talks about using Ractors to parallelize test execution, builds a test framework built on Ractors, compares current solutions, and discusses the advantages and limitations.

“Fundamentally, tests are just pieces of code that we want to organize and execute. It doesn’t matter if in Minitest they are test methods and in RSpec they are Ruby blocks, they’re just blocks of code that we want to run in an organized manner. It then becomes a matter of how fast we can do it in order to reduce the feedback loop for our developers. Then we start getting into strategies for parallelizing the execution of tests.”

Optimizing Ruby's Memory Layout by Peter Zhu & Matt Valentine-House

Peter and Matt discuss how their variable width allocation project can move system heap memory into Ruby heap memory, reducing system heap allocations, and providing finer control of the memory layout to optimize for performance.

“We’re confident about the stability of variable width allocation. Variable width allocation passes all tests on CI on Shopify’s Rails monolith, and we ran it for a small portion of production traffic of a Shopify service for a week, where it served over 500 million requests.”

Bonus: Meet Shopify's Ruby and Rails Infrastructure Team (AMA)

There were a LOT of engineers from the Ruby and Rails teams at Shopify at RubyConf 2021. Attendees had the opportunity to sit with them at a meet and greet session to ask questions about projects, working at Shopify, “Why Ruby?”, and more.

Jennie Lundrigan is a Senior Engineering Writer at Shopify. When she's not writing nerd words, she's probably saying hi to your dog.

We want your feedback! Take our reader survey and tell us what you're interested in reading about this year.



How to Get an Engineering Internship at Shopify: A Complete Guide


An important component of being an engineer is getting hands-on experience in the real world, which internships can provide. This is especially true for engineering internships, which are critical for helping students develop real-life skills they can’t learn in the classroom. Sadly the internship market has suffered heavily since the pandemic with internship opportunities dropping by 52%, according to Glassdoor.

The silver lining is that many companies are now transitioning to incorporate virtual internships like we did. Whether you are a student, recent graduate, career switcher, bootcamp graduate, or another type of candidate, our virtual engineering internships are designed to kickstart your career and impact how entrepreneurs around the world do business.

What is it like to be a Shopify intern? During our latest intern satisfaction survey, 98% of respondents said they would recommend the program to friends. There are many opportunities to build your career, make an impact, and gain real-world experience at Shopify. But don’t just take our word for it! Keep reading to see what our interns had to say about their experience and learn how you can apply.

How to Get an Internship at Shopify

If you’re looking to jumpstart your personal growth and start an internship that can help lay the foundation for a successful career, we can help. Interning at Shopify allows you to work on real projects, solve hard problems, and gain practical feedback along the way. We provide the tools you need to succeed and trust you to take ownership and make great decisions. Here are the steps to get started.

Step 1: Review Available Opportunities

At Shopify, our engineering internships vary in length from three to eight months, with disciplines such as front-end and back-end development, infrastructure engineering, data engineering, mobile development, and more. Currently, we run three intern application cycles a year. Applicants for the Fall 2022 cohort will be able to apply in May 2022. Join our Shopify Early Talent Community, and we'll notify you. We also list available internships on our Early Careers page; these include a variety of three-, four-, and eight-month paid programs.

Step 2: Apply Online

Getting a Shopify engineering internship starts with an online application. We'll ask you for your resume, cover letter, contact information, education status, LinkedIn profile, and personal website. You will also be asked to complete an Intern Challenge to demonstrate your interest in the internship topic. This is a great place to show off your love for engineering. Perhaps you built your site using Ruby on Rails. We’d love to hear about it!

Step 3: Get Ready for the Skills Challenge

Depending on your specialization, you may be asked to submit a personal project (such as a link to a GitHub repository) so that the recruiter can assess your skills. Challenges differ by category, but you might be asked to design a Shopify store or to use a language like Python or a framework like Ruby on Rails to solve a problem. We want to see that you care about the subject, so be specific and put effort into your challenges to make your skills stand out.

Step 4: Prepare for the Interview Process

Shopify's interview process is divided into two stages. The first allows us to get to know you better through a conversation we call the Life Story: a two-sided discussion covering both your professional and personal experiences so far. The second assesses your technical skills: you'll be presented with a challenge and asked to propose a technical solution.

Top Skills for Engineering Interns

In a series of recent Twitter discussions from August and January, we asked about the most important skills for an engineering intern. More than 100 hiring managers, engineering professionals, and thought leaders responded. Here’s a summary of the skills they look for, along with how our very own interns have learned and applied them.

[Image: Top skills for engineering interns: collaboration, lifelong learning, curiosity, GitHub experience, remote work experience, communication, interviewing, and accountability]


Collaboration

When you are working with a team, as most interns do, you need to be able to work together smoothly and effectively. Collaboration can encompass several characteristics, including communication, group brainstorming, emotional intelligence, and more. According to one follower on Twitter: “tech is a small part of software engineering, the valuable part is working well in teams.”

Our interns collaborate with talented people around the world. Emily Liu, a former intern and upcoming UX designer at Shopify, said her core team was spread out across five countries, but the time differences didn’t stop them from working toward a common goal. “Teamwork makes the dream work,” says Emily.

Lifelong Learning

Being a constant learner is one of Shopify's values and is considered a measure of success. As one Twitter follower pointed out, this is especially important in engineering since you should "always be willing to learn, to adapt, and to accept help" and that “even the most senior staff developer can learn something from an intern.”

This is echoed by former intern Andrea Herscovich, who says “if you are looking to intern at a company which values impact over everything, lifelong learning, and entrepreneurship, apply to Shopify!”


Curiosity

Without curiosity, an intern can stagnate and fail to stay on top of the latest tools and technologies. That lack can hinder an intern's career in engineering, where technological developments are rapid. One hiring manager responded that curiosity is among the key things he looks for in engineering interns, but that it's hard to find.

Andrea Herscovich also says she was encouraged to "be curious." This curiosity allowed her to build her own path for the internship. A particularly memorable project involved contributing to Polaris, Shopify's open-source design system, says Andrea. When working on adding a feature to a component in Polaris, Andrea learned how to develop for a more general audience.   

GitHub Experience

GitHub is an essential tool for collaborating with other developers in most engineering environments. As one Twitter user says: “I don't care if you got an A+ or C- in compilers; I'm going to look at your GitHub (or other public work) to see if you've been applying what you learned.” At Shopify, GitHub plays an important role in collaboration.

Former Shopify intern Kelly Ma says her mentor used GitHub to give her a list of challenges to solve instead of clearly defined work. Along the way, Kelly had the chance to ask questions and learn more about her team's work. As a result, she interacted with Shopifolk outside of her team and forged new relationships.

Remote Work Experience

A growing number of engineers now work remotely, a trend that COVID-19 will likely extend well into the future. As an intern, you will have the opportunity to gain experience working remotely, which can prepare you for the growing virtual workforce. Perhaps you're wondering whether a remote internship can deliver the same experience as an in-person one.

One former Shopify intern, Alex Montague, was anxious about how a remote internship would work. After completing the program, he told us that "working from home was pretty typical for a normal day at work," that the tools he used made remote work easy, and that he was "just as productive, if not more so, than if I was in the office." Alex is now a front-end developer on our App Developer Experience team, which provides insights and tools to help partners and merchants build and maintain apps.


Communication

Today, communication is one of the most important skills engineers can have—and one that they sometimes lack. As one Twitter follower puts it: "nothing in CS you learn will be more important than how to communicate with humans." Fortunately, as an intern, you get the chance to improve on these skills even before you enter the workforce.

Meeting over Google Hangouts, pair programming on Tuple, brainstorming together on Figma, communicating through Slack, and discussing on GitHub are just a few ways that Shopify interns communicate, says Alex Montague. Interns can take advantage of these opportunities to develop core communication skills such as visual communication, written communication, and nonverbal communication.


Interviewing

“Practice interviewing. This is a skill,” one Twitter follower advises. Interviewing well is key to a successful internship search, and it can set you apart from other candidates. At Shopify, the interview process is divided into two different phases. We begin with a Life Story to learn more about you, what motivates you, and how we can help you grow. Our later rounds delve into your technical skills.

As part of his preparation for his Life Story internship interview, Elio Hasrouni noted all the crucial events in his life that have shaped who he is today, starting from his childhood. Among other things, he mentioned his first job, his first coding experience, and what led him into software engineering. Elio is now a full-time developer within our Retail and Applications division, which helps power our omnichannel commerce.


Accountability

Accountability involves taking responsibility for actions, decisions, and failures. For an engineering intern, accountability might mean accepting responsibility for your mistakes (and you’ll make plenty of them) and figuring out how to improve. Acknowledging your mistakes demonstrates the self-awareness that enables you to identify a problem, address it, and avoid repeating it.

How do you keep yourself accountable? Kelly Ma credits stretch goals, targets designed to be difficult to achieve, as one way to remain accountable. Other ways she stays accountable include exploring new technological frontiers and taking on new challenges. One way that Shopify challenged her to be accountable was by asking her to own a project goal for an entire cycle (six weeks). This included bringing stakeholders together via ad hoc meetings, updating GitHub issues to convey the state of the goal, and learning how to find the right context.

Tips to Help You Succeed

As you might expect, great internship opportunities like this are highly competitive. Applicants must stand out in order to increase their chances of being selected. In addition to the core skills discussed above, there are a few other things that can make you stand out from the crowd.

Practice Our Sample Intern Challenges

It is likely that we will ask you to participate in an Intern Challenge to showcase your skills and help us better understand your knowledge. To help you prepare, you can practice some of our current and previous intern challenges below.

Showcase Your Past Projects

You don’t need prior experience to apply as a Shopify intern, but compiling your previous projects relevant to the position you're applying for will certainly make your profile stand out. Your portfolio is the perfect place to show us what you can do.

Research the Company

It's a good idea to familiarize yourself with Shopify before applying. Take the time to learn about our product, values, mission, and vision, and find a connection with them. For you to succeed here, our goals and values should align with your own.

Additional Resources

Want to learn more about Shopify's Engineering intern program? Check out these posts:

Want to learn more about the projects our interns work on? Check out these posts:

About the Author:

Nathan Quarrie is a Digital Marketing Lead at Shopify based in Toronto, Ontario. Before joining Shopify, he worked in the ed-tech industry where he developed content on topics such as Software Engineering, UX & UI, Cloud Computing, Data Analysis, and Web Development. His content and articles have been published by more than 30 universities including Columbia, Berkeley, Northwestern, and University of Toronto.

If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by default.
