Mohammed Ridwanul Islam: How Mentorship, the T Model and a Pen Are the Keys to His Success

Mohammed’s feature is part of our series called Behind The Code, where we share the stories of our employees and how they’re solving meaningful problems at Shopify and beyond.

Mohammed Ridwanul is a software engineer on the Eventscale team and joined Shopify a year and a half ago.

Mohammed grew up in Dubai but was born in Noakhali, a small village in Bangladesh, where he lived until he was five. The village was far removed from technology: most areas had no electricity, and you could count the number of TVs on one hand. The people of Noakhali were extremely practical and had ingenious solutions to the problems that would arise. Adults who had an engineering education or background were highly regarded for how they improved the quality of life in the village. This inspired and motivated Mohammed to pursue a career in engineering, and he hopes eventually to impact communities the way those individuals impacted his.

What has your career path looked like?
I’ve had the opportunity to work in different industries including sales, advertising, and design. Also, I’m an avid musician and love making my own music and doing shows with my band. With all these different skills, I thought perhaps I could make my own game. While trying to learn everything I could about game development, I wrote my first line of code, which was in C#.

All my experiences have one thing in common: I love to face tough challenges and see a rapid manifestation of the things I do or build. So, I studied engineering and got an internship at Shopify during my undergrad, which turned into my current full-time role.

What type of Engineering did you study?
I went to the University of Waterloo and took a Bachelor of Applied Science in Electrical Engineering.

What does your team do at Shopify?
The Eventscale team is part of the Data Platform Engineering organization. Shopify receives an immense amount of data. Acquiring data at that scale so that we can clean it, process it, store it reliably, and provide easy access for analysis requires highly performant, specialized tools and infrastructure. The Data Platform Engineering team is responsible for building these tools.

The Eventscale team builds the tools, libraries, and infrastructure to collect event-oriented streaming data. This data is used for both internal and merchant analytics and other operational needs. We build for all platforms at Shopify including web, backend, and mobile.

What was something difficult for you to learn, and how did you go about acquiring it?
During my first time leading a team project, I had some challenges learning useful team management principles. Understanding the needs of each team member and aligning everyone to a shared vision and goal to get the work done required a different set of skills, which took time and experience to learn. Luckily, my senior co-workers consistently mentored me and taught me concepts such as project cost estimates, team management strategies, success metrics, and other fundamental project management principles. My team lead also guided me towards several books and whitepapers from other companies which have helped me develop strong opinions related to project management and strategy. Check out my Goodreads profile for a list of those books and read Ben Thompson's work on Stratechery.com.

How does your daily routine help you cultivate a good work ethic?
Mohammed Ridwanul Islam's Daily Routine
Habits, in my opinion, are useful in navigating life. I believe humans are creatures of habit; it’s challenging to carry the constant cognitive load of telling yourself to do x, y, and z tasks that are good for you. Instead, by building a habit, you reduce the load as your body and mind start to realize that this is a way of life. My daily routine helped me achieve this habit formation.

What’s your favorite dev tool?
Vim. It has a learning curve, but you can have so much fun with it once you learn it. Vim is an editor you can mold into your own little product, personalized for you with custom configurations using dotfiles. You can pretty much make it behave however you want. I love it! If you’re interested, feel free to check out my custom Vim settings.

What’s your favorite language and why?
Java, mostly because it’s a strongly typed language, and to this day I prefer explicitly defining types without having the language make assumptions on types.

Are you working on any side projects?
Yes, I’m working on enterprise project management software that can be used by a consulting team to manage a large number of projects in parallel. Essentially, it’s a centralized repository for all the current projects that the consultants are handling, along with the cost breakdown and timeline details. It also allows the user to dig into each project further and keep records of how human resources are applied. The software tries to enforce a framework for thinking about resource management and project strategy, which I have developed over the years.

What are some ways you think through challenging work?
Writing things down on paper has been my go-to method to work through challenging things. I don’t start writing code until I’ve designed the overall larger components on paper. Similarly, for any other situations in life, writing has always helped me tackle challenges.

What book(s) are you currently reading?
Designing Data-Intensive Applications by Martin Kleppmann and The Essential Rumi by Rumi.

What is the best career advice you’ve gotten?
It doesn’t matter what you do as long as it meets two criteria: 1. It positively impacts society and is aligned with your values, and 2. It allows you to push and grow yourself by doing work to the best of your abilities.

What kind of advice would you give to someone trying to break into the technology industry?
I’m a big fan of the “T” model of learning, which essentially states that you should try and be competent in a few different things (small horizontal line), but you should strive to be the authoritative figure for at least one thing (longer vertical line). Programming might be the tool used to solve tough engineering problems, but the ability to solve problems is the more critical skill. So focus on chiseling that ability which comes with exposure and specialization in one specific area.

If you’d like to get in touch with Mohammed, check out his website www.mohammedri.com.

We’re hiring! If you’re interested in joining our team, check out our Engineering career page for available opportunities.

Dev Degree - A Big Bet on Software Education

“Tell me and I forget, teach me and I may remember, involve me and I learn.”
- Benjamin Franklin


When I decided to study computer science at university, my parents were skeptical. They didn’t know anyone who had chosen this as a career. Computer science was, and still is, in its infancy. Software development isn’t pure science or pure engineering; it’s a combination of the two, mixed with a remarkable amount of artistic flair. It's a profession where you grow by learning the theory and then doing. A lot of doing. It’s a profession that’s increasingly in demand. And it’s a profession so new that schools are still learning how to teach it. The supply isn’t matching the demand; not even close.

Our industry is fraught with critical shortages of skills and diversity, and software developers are more valuable to companies than money [1]. It’s pretty obvious that we have to invest aggressively in growing and developing software professionals, now more than ever.

Shopify has figured out an important part of how to solve these problems. We call it Dev Degree — a work-integrated learning (WIL) program that combines an accredited university degree with double the experience of a traditional co-op. The program is already in its 3rd year, and it’s time to talk about why it’s a big deal to us.

The Beginnings of Dev Degree

While living and working in Australia, my company invested in hiring hundreds of graduate developers. The graduates were intelligent and knew their theory, but they lacked the fundamental skills and experience required for software development. This held them back from making a quick impact on our small but growing company.

To fill in the gaps, we developed an internal training program for new graduates. It helped them level up faster and turned the best practices they learned in school into practical skills for the world of software development. It wasn’t long before I recognized that this knowledge gap wasn't an isolated incident. There wasn’t just one university churning out students ill-prepared for the workforce; it was a systemic issue.

I decided to tour Australian universities and talk to their Computer Science departments. I pitched the idea of adding pieces of our training program to their curriculum to better prepare students for their careers. My company even offered to pay to develop the program. The universities loved the idea, but they didn't know how to make it a reality within their academic frameworks. I saw many nods of agreement on that tour, but no action.

Dev Degree started, in earnest, when I returned to Canada and joined Shopify. The main lesson I learned from Australia was that universities couldn’t implement a WIL curriculum without industry partners in a true long-term arrangement. Shopify seemed born to step into that role. When I approached Tobi with this embryo of an idea, he was on board to make it a reality. Tobi had his own positive experience with apprenticeships in Germany. Our shared passion for software development and Canada motivated us to give this idea another shot, and we started searching for a university partner.

Canadian universities were eager to get involved, but again, most weren’t sure how to make it happen. For many, the question was: how is this different from our co-op program?

The co-op model is straightforward. Students alternate between a school term and a work term throughout their program. In this structure, students are thrown over the wall of academia into an industry with no connection to their curriculum. WIL, on the other hand, requires a structural change to the education system that creates a fully integrated and deep learning experience for the students. To do this properly, we needed to make changes to the curriculum and assessments, fully integrate universities and companies, launch new learning programs, and provide additional student support. This was a multi-dimensional problem.

Carleton University rose to the challenge, becoming the first and founding university partner of Dev Degree. Their team understood the value of WIL and were already exploring ways to incorporate this style of learning when we met. It was clear to both sides that we had found the perfect partner to make WIL a successful reality. We were both eager to innovate and weren’t afraid to make huge structural changes to our operations.

Carleton didn’t just talk about being involved; they developed an entirely separate stream of their Bachelor of Computer Science program that allocated over 20% of credits to student practicums. This required Carleton’s Senate approval, which was granted after thoughtful debate. Our first strong partnership was formed and we were ready to get started.

Inside Dev Degree

The Dev Degree Family


The core of the Dev Degree model is building tighter feedback loops between theory and practice while layering programming and personal growth skills early on. Each semester students take 3 courses at University and spend 25 hours a week at Shopify.

Because K-12 software education is lacking, we wanted to turbo-boost students to be able to write and deploy production software, solving real problems, before they even graduate. Our bet was that this model would better engage a more diverse set of students, empower deeper understanding, and foster more critical thought when building software.

Dev Degree - Hands-On Learning

These types of challenges are not part of the university curriculum; students can only get this experience in an industry setting. Thomas Edison said genius is 1% inspiration and 99% perspiration. By that measure, Dev Degree is a real-time training program in experimental perspiration.

But there’s also a strong link to validating that competencies are acquired. The partner university allocates at least 20% of the degree’s credits to the work students do with Shopify development teams. Students write a practicum report at the end of every term (every four months) and submit it to the university. In the report, the student describes how they have achieved specific learning outcomes. The learning outcomes used in the Dev Degree program were influenced by standards from the Association for Computing Machinery (ACM) and the IEEE Computer Society.

During the first two years, we learned a lot. It wasn’t a smooth ride as we ironed out how best to deliver this program with the university, the students, and the teams at Shopify. Here are some of the most important lessons we’ve learned.

Key Lesson #1: Re-Learn True Collaboration

During our school career, we learn that the final mark is most important. We strive to deliver the perfect assignment to get that A+. This is the complete opposite of how to get good results in the real world. The best students, and the most successful people, are the ones who share their ideas early, get feedback, experiment, explore, re-compose, and iterate. They embrace failure and keep trying.

The end result is important, but you have to cheat to get the best version of it. Sounds counterintuitive, I know. But by “cheating,” I mean asking people for help and incorporating the lessons they teach you into your own work. Collaboration is a prerequisite for true learning and growth. The Lone Wolf mentality instilled in students from years of schooling is more difficult to change than we anticipated, but working directly alongside other developers, pairing regularly, allowed us to break down those habits over time.

Key Lesson #2: Start with Development Skills

Our first cohort joined Shopify after three months of Developer Skills Training, based on the ACM framework I mentioned. This was quite ambitious on our end, but we hoped it was enough time to prepare them for the real-world work they would do with our teams.

It wasn’t. After the three months, our students still didn’t have enough knowledge to make a strong impact at Shopify. To better support them, our Dev Degree team hosted additional workshops on various developer tools and technologies to get them up to speed, but we knew there was more to be done.

It was clear that we needed to pivot the first year of our program to focus more heavily on Developer Skills Training. Our students needed to be better prepared to enter a fast-paced team building impactful products. Now, Dev Degree students participate in Developer Skills Training for their entire first year at Shopify. By tripling the amount of time they spend in training, we’ve seen Dev Degree students create earlier and more positive impacts on Shopify teams.

Key Lesson #3: Mentorship Comes in Many Forms

In 2016, students were paired with technical mentors once they joined a development team. The technical mentor is a software developer who guides their mentee on a daily basis by giving direction, reviewing work, offering feedback, and answering questions. While this was successful, we identified a gap where we weren’t equipping students with the tools and support they needed to transition into the workforce. We were giving them tons of technical support, but that didn’t necessarily help them conquer the social aspects of the job.

Now, Dev Degree students receive an additional layer of mentorship. Each student is paired with two people: a technical mentor and a Life@Shopify mentor. The Life@Shopify mentor is a trusted supporter, friend, and guide who provides a listening ear and supports the student’s growth. It’s a big leap to go from high school to being a trusted member of a company. We’ve found that this combination provides students with a diverse range of support throughout their time at Shopify.

The Results

To put it bluntly, the Dev Degree model works.

We see above-average retention rates compared to traditional academia. Generally, 20-50% of students drop out of their initial program or from postsecondary programs completely. In Dev Degree, our retention rate is 95%. We’ve increased gender diversity in the program, with women accounting for over 50% of Shopify Dev Degree students, a dramatic rise from the 19% of computer science graduates who are women.

Companies have been focusing 66% of their philanthropic tech education on K-12 programs, with only 3% on post-secondary programs. But we need to look at the entire education system to solve the skills shortage and lack of diversity in STEM programs. And it needs to happen faster.

Traditionally, new graduates hired at Shopify take anywhere from six months to two years to fully complete onboarding and start making an impact on development teams. Skill acquisition in our WIL program happens three times faster than the average developer education: Dev Degree students become productive members of their teams after only nine months into the program, instead of up to two years after graduation.

We have a lot more to learn, and we’re not done yet. While we’re excited by our early results, a true measure of success will be seeing more universities and industry partners adopt this model. We’re working to scale the program with our partners so that the Dev Degree model starts popping up all over Canada.

That’s why we’re excited to announce the expansion of our Dev Degree program to York University’s Lassonde School of Engineering! Our first Toronto-based students have started their journey with Dev Degree, and we’re excited to see what challenging problems they’ll solve.

None of this would be possible without our academic partners at Carleton and York who worked relentlessly to get Senate approval for new WIL computer science streams and design the model itself. We truly believe that if more universities worked hand-in-hand with industry to better prepare students for the workforce, Canada would become the leader in talent development for years to come.

Introducing the Deprecation Toolkit

Shopify is happy to announce that we’ve open sourced the Deprecation Toolkit, a ruby gem that keeps track of deprecations in your codebase in an efficient way.


At Shopify, the leading cloud-based, multi-channel commerce platform with 600,000+ merchants in over 175 countries, upgrading our dependencies is a frequently applied best practice. We even have bots that automatically upgrade dependencies when a minor version is released.

However, more complex upgrades require human intervention and the time required varies from dependency to dependency, some even taking years. We realized that we could speed up this process if our application were using as little deprecated code as possible.

The motivation for building the Deprecation Toolkit came after a few unsuccessful attempts to prevent the hundreds of developers working on our monolith from accidentally using deprecated code, both in libraries and in our own codebase.

Why Should You Use This Gem and How Can It Help?

“Did I just call a new deprecated method?” 🤔

If you are the creator/maintainer of a library or if you’d like to deprecate methods in your application, you have a couple of options to notify consumers of your code about a future API change. The most common option is to output a warning message on the standard output explaining the change happening in the next release.

This approach has a major caveat: it doesn’t prevent developers from using the deprecated code by accident. The only warning is the deprecation message, which is very easy to miss and becomes impossible to spot if there are already a lot of them.

The second option is to provide a callback mechanism that runs whenever a deprecation is triggered. If you are familiar with Ruby on Rails or Active Support, you might have heard about the ActiveSupport::Deprecation module, which lets you configure the behavior that gets called whenever a deprecation is triggered. Active Support provides a few behavior options by default; the two most common are log and raise.
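
To make the callback idea concrete, here is a minimal plain-Ruby sketch. It deliberately does not use Active Support: the TinyDeprecator class, its warn_deprecated method, and its :log and :raise behaviors are hypothetical stand-ins that only mimic the idea of a configurable behavior.

```ruby
# Hypothetical stand-in for a deprecation module with a configurable behavior.
# This is NOT Active Support's implementation, only an illustration of the idea.
class TinyDeprecator
  class DeprecationError < StandardError; end

  BEHAVIORS = {
    # Print the warning to stderr and carry on.
    log: ->(message) { warn("DEPRECATION WARNING: #{message}") },
    # Turn the warning into an exception.
    raise: ->(message) { raise DeprecationError, message },
  }.freeze

  attr_reader :behavior

  def initialize(behavior: :log)
    self.behavior = behavior
  end

  # Accepts a named behavior or any callable that takes the message.
  def behavior=(value)
    @behavior = value.respond_to?(:call) ? value : BEHAVIORS.fetch(value)
  end

  # Called by library code whenever a deprecated code path is hit.
  def warn_deprecated(message)
    behavior.call(message)
  end
end

deprecator = TinyDeprecator.new(behavior: :log)
deprecator.warn_deprecated("`old_method` will be removed in 2.0")

# A custom callback: collect messages instead of printing them.
collected = []
deprecator.behavior = ->(message) { collected << message }
deprecator.warn_deprecated("`old_method` will be removed in 2.0")
```

With the log behavior the message is easy to miss, and with the raise behavior every existing deprecation becomes a failure, which is the trade-off discussed next.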


Raising an error when deprecated code is triggered looked like a solution, but it would mean we’d have to fix every single deprecation before activating the configuration; otherwise, our CI wouldn’t pass and that would block developers from doing their daily tasks. We needed a different way to solve this problem, one that didn’t require fixing all deprecations at once and that treated existing deprecations as “acceptable,” giving us time to fix them gradually. New deprecations, however, should be handled differently and be the ones that raise errors. This is the approach we took with the Deprecation Toolkit.

Internally, we called this process “Shitlist-driven development.” My colleague Flo gave an amazing talk on it at the Red Dot Ruby Conference in 2017, called "Shitlist-driven development and other tricks for working on large codebases."

How Does It Work?

The Deprecation Toolkit uses a whitelist approach. First, you need to record all existing deprecations in your application by running your test suites, either locally or on CI. The toolkit writes each deprecation that gets triggered for a given test to YAML files. These YAML files make up your whitelist of acceptable deprecations.

The next time your tests run, the toolkit compares all the deprecations triggered in the test run against the ones marked as acceptable. A mismatch means a deprecation was either introduced or removed. Either way, the Deprecation Toolkit triggers the behavior of your choice; by default, it raises an error.
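
The record-then-compare flow can be sketched in a few lines of plain Ruby. This is only an illustration under assumptions: the DeprecationAllowlist class, the single-file YAML layout, and the method names are hypothetical and are not taken from the gem, whose actual file layout and classes may differ.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical sketch of the record-then-compare idea; not the gem's code.
class DeprecationAllowlist
  class Mismatch < StandardError; end

  def initialize(path)
    @path = path
  end

  # Recording run: persist the deprecations triggered by each test.
  def record(deprecations_by_test)
    File.write(@path, deprecations_by_test.to_yaml)
  end

  # Later runs: compare what was triggered against what was recorded.
  def check!(test_name, triggered)
    recorded = File.exist?(@path) ? YAML.load_file(@path) : {}
    accepted = recorded.fetch(test_name, [])
    introduced = triggered - accepted
    removed = accepted - triggered
    return if introduced.empty? && removed.empty?

    raise Mismatch, "#{test_name}: introduced=#{introduced.inspect} removed=#{removed.inspect}"
  end
end

Dir.mktmpdir do |dir|
  allowlist = DeprecationAllowlist.new(File.join(dir, "deprecations.yml"))
  allowlist.record("test_checkout" => ["`old_total` is deprecated"])

  # Same deprecations as recorded: passes silently.
  allowlist.check!("test_checkout", ["`old_total` is deprecated"])

  # A new deprecation appears: this sketch raises, mirroring the default behavior.
  begin
    allowlist.check!("test_checkout", ["`old_total` is deprecated", "`old_tax` is deprecated"])
  rescue DeprecationAllowlist::Mismatch => error
    puts error.message
  end
end
```

Treating a removed deprecation as a mismatch too keeps the recorded files honest: once a deprecation is fixed, the list has to shrink with it.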

The toolkit has many configuration options; however, if the default configuration suits your needs, all you need to do is add the gem to your Gemfile. The Deprecation Toolkit README has a detailed configuration reference to help you set up the toolkit the way you need. You can, for example, configure the toolkit to ignore some deprecations, dynamically determine where deprecations should be recorded, or even create custom behaviors when new deprecations are introduced.

Deprecation Toolkit in Action

Keeping your system free of deprecations is part of having a sane codebase, whether that means fixing deprecations from libraries or from your own code. We’ve used the Deprecation Toolkit in our core application for about a year now. It helped us significantly reduce the number of deprecations in our system and sped up our dependency upgrade process. It’s been instrumental in getting every developer involved in fixing deprecations, since pull requests can’t be merged if the code introduces new deprecations.

Last but not least, we gamified fixing existing deprecations amongst developers. All deprecations were grouped by component and assigned an owner, usually a team lead, to help fix them. Over time, we counted the failures and progression of each team. All participating teams viewed their results in a shared Google sheet. Splitting the deprecated code into chunks and assigning each one to a different owner made the process super smooth and even faster.

Give the Deprecation Toolkit a try; we are looking forward to hearing if it helped you and how we can improve it! If the current workflow doesn’t work for you or if you’d like to see a new feature in this gem, feel free to open an issue in our issue tracker.

Mobile Tophatting at Shopify

At Shopify, the leading cloud-based, multi-channel commerce platform for 600,000+ merchants in over 175 countries, it’s crucial to test and verify the functionality of the new features that get introduced in the platform. Since the company doesn’t have a QA team by design, testing features is the developer's responsibility. To do so, we set up a project to contain automated test steps which execute via our continuous integration infrastructure (CI) and additional manual checks are performed by developers.


One of those manual checks is trying out the changes before merging them into the codebase. We call this process “tophatting” after the 🎩 emoji. Back when GitHub didn’t have support for code review requests, Shopify relied on emojis to easily communicate the state of the code review process. 🎩 indicates that the reviewer not only looked at the code but also ran it locally to make sure everything works as expected, especially when the changes affected the user interface.

The tophat process requires the developer to save their current work, check out a different git branch, set up their local environment for that branch, and build the app. For mobile developers, this process is tedious because changing the git branch often invalidates the cache, increasing the build time in Xcode and Android Studio. Depending on the project, it can take up to 15 minutes to build the app, during which developers can’t do any other work in the same project.

To eliminate their pain points and facilitate best practices, we’ve created a fast and frictionless tophatting process which integrates seamlessly with our CI infrastructure and dev, Shopify's all-purpose development tool that all mobile developers have running in their environments. In this post, I’ll describe how we built our frictionless tophatting process and show you an example of what it looks like.

Setting up Projects for Tophat

The slowest part of the mobile tophatting process is compilation. To speed this up for mobile developers, we skipped the compilation step. We already build the apps on CI, so the application binaries are available in the disposable environments we created for running the PR builds. We updated the projects’ pipelines to export the binaries so that we can list and access them through the CI API. Depending on the platform (iOS or Android), the exported app has a different format:

  • iOS: Apps are folders, so we zip the folder using a naming convention that includes the name of the app and its version. For example, an exported Shopify app version 3.2.1 would be named Shopify-3.2.1.app.zip
  • Android: APK files are already zip archives, so we export them with their existing names.

Once the apps are exported we leverage GitHub commit statuses to let developers know that their PRs have tophattable builds:

Tophat GitHub Commit Status

Command line interface

Dev is an internal tool that provides a set of standard commands across all the projects at the company (you can read more about it on devproductivity.io). One of the commands that backend developers use is tophat and we extended its use to support mobile projects.

The command looks like:

dev platform tophat resource


Where platform can be either ios or android and the resource can be any of the following:

  • Pull request URL: For tophatting another developer’s work
  • Repository URL: For tophatting the main branch of a repository
  • Repository branch URL: For tophatting a specific branch
  • Build URL: For tophatting a specific build from CI

For example, if a developer would like to tophat the pull request 35 of the project android, they could run the command:

dev android tophat https://github.com/shopify/android/pulls/35
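
As an illustration of how a single resource argument can cover all four cases, here is a small Ruby sketch that classifies a URL. It is not dev's actual parser: the classify_resource function is hypothetical, and the URL shapes it accepts (GitHub pull and tree paths, a Buildkite builds path) are assumptions based on how those services commonly structure their URLs.

```ruby
require "uri"

# Hypothetical classifier for the `resource` argument. The URL shapes are
# assumptions about GitHub and Buildkite URLs, not dev's real parsing rules.
def classify_resource(url)
  uri = URI.parse(url)
  segments = uri.path.split("/").reject(&:empty?)

  case uri.host
  when "github.com"
    owner, repo, kind, *rest = segments
    return nil unless owner && repo

    if kind.nil?
      { type: :repository, owner: owner, repo: repo }
    elsif %w[pull pulls].include?(kind) && rest.first&.match?(/\A\d+\z/)
      { type: :pull_request, owner: owner, repo: repo, number: rest.first.to_i }
    elsif kind == "tree" && rest.any?
      # Branch names may contain slashes, so keep the remaining segments together.
      { type: :branch, owner: owner, repo: repo, branch: rest.join("/") }
    end
  when "buildkite.com"
    org, pipeline, kind, number = segments
    if kind == "builds" && number&.match?(/\A\d+\z/)
      { type: :build, org: org, pipeline: pipeline, number: number.to_i }
    end
  end
end

# Classified as a pull request with number 35.
classify_resource("https://github.com/shopify/android/pulls/35")
# Anything unrecognized comes back as nil so the caller can show a usage message.
classify_resource("https://example.com/not-a-resource")
```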

Under the Hood

When the tophat command is run, the following steps are executed:

  1. The user is authenticated on the Buildkite and GitHub API if they aren’t already authenticated. The access token is persisted in the macOS keychain to be reused in future API calls.
  2. If the given resource is a GitHub URL, we use commit statuses to get the URL of the build.
  3. Since the list of artifacts might contain resources that can’t go through tophatting, we filter them out and only show the valid ones. If there’s more than one in the repository, the developer can select which app they’d like to tophat.
  4. After selecting the app:
    1. For iOS projects, we list the system simulators and boot into the one the user selects. Developers tend to use the same simulator most of the time, so the command remembers it and suggests it as the default.
    2. For Android projects, we list the emulators available in the environment and a few more default ones in case the developer doesn’t have any emulators configured locally yet.
  5. Once the simulator is booted, we install the app and launch it. 
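
Steps 3 and 4 above can be sketched as two small Ruby helpers. Everything here is illustrative: the helper names, the file patterns (derived from the export formats described earlier), and the sample artifact and device names are assumptions, not the tool's real code.

```ruby
# Hypothetical sketch of steps 3 and 4: keep only tophattable artifacts and
# offer the previously used device first. Names and patterns are illustrative.
TOPHATTABLE_PATTERNS = {
  ios: /\.app\.zip\z/, # zipped .app folders, per the export convention
  android: /\.apk\z/,  # APKs are exported with their existing names
}.freeze

# Step 3: drop artifacts (reports, logs, and so on) that can't be installed.
def tophattable_artifacts(artifact_names, platform)
  artifact_names.grep(TOPHATTABLE_PATTERNS.fetch(platform))
end

# Step 4: put the remembered device first so it can be suggested as the default.
def order_devices(available, remembered)
  return available unless available.include?(remembered)

  [remembered] + (available - [remembered])
end

artifacts = ["Shopify-3.2.1.app.zip", "test-report.xml", "coverage.html"]
tophattable_artifacts(artifacts, :ios)
# => ["Shopify-3.2.1.app.zip"]

order_devices(["iPhone 8", "iPhone X", "iPad Pro"], "iPhone X")
# => ["iPhone X", "iPhone 8", "iPad Pro"]
```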

The example below shows the process of tophatting Shopify Mobile for iOS:

An example of the mobile tophatting process

Future Improvements

We’re thrilled with the response received from our mobile developers; they love the feature. Since we launched, Shopifolks enthusiastically submitted bug reports and proposals with many ideas about how we can keep improving the tophatting process. Some of the improvements we’re currently incorporating are:

  • Caching: Every time we start the tophat process, we pull artifacts from Buildkite, even if we already tophatted the build. Adding a local cache will avoid downloading the artifact again by copying it from the cache instead.
  • Real devices: Developers usually try the apps on real devices and we’d like to facilitate this. For iOS, the builds need to be signed with a valid certificate that allows installing the app on the testing devices.
  • Tophat from commit and branch: Rather than passing the whole GitHub URL, we’ll simplify the input by letting developers specify the repository and the branch/commit they’d like to tophat.
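
To illustrate the caching improvement, here is one possible shape for a local artifact cache in Ruby. It is a sketch under assumptions: the ArtifactCache class and its directory layout are hypothetical, and the download is passed in as a block so that no real Buildkite API call is implied.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical local cache for downloaded artifacts, keyed by build and name.
class ArtifactCache
  def initialize(root)
    @root = root
  end

  # Returns the cached file path, running the block only on a cache miss.
  def fetch(build_id, artifact_name)
    path = File.join(@root, build_id.to_s, artifact_name)
    unless File.exist?(path)
      FileUtils.mkdir_p(File.dirname(path))
      File.binwrite(path, yield)
    end
    path
  end
end

Dir.mktmpdir do |dir|
  cache = ArtifactCache.new(dir)
  downloads = 0
  2.times do
    cache.fetch(1234, "Shopify-3.2.1.app.zip") do
      downloads += 1
      "binary contents" # stand-in for the downloaded bytes
    end
  end
  puts downloads # the block ran once; the second call was served from the cache
end
```

Keying the cache by build means a new build of the same pull request is downloaded fresh, while re-tophatting an unchanged build skips the download.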

Testing someone else’s work is now easier than ever. Our developers don’t need to know how to set up the environment or compile the apps they are tophatting. They can run a single command and the tool does the rest. The Mobile Tooling team is committed to gathering feedback and working with our mobile developers to add improvements and bug fixes that facilitate their workflows.

Shaping the Future of Payments in the Browser

Part 1: Setting up Our Experiment with the Payment Request API

By Anna Gyergyai and Krystian Czesak

At Shopify, the leading multi-channel commerce platform that powers over 600,000 businesses in approximately 175 countries, we aim to make commerce better for everyone. This sometimes means investing in new technologies and giving back what we learned to the community, especially if it’s a technology we think will drastically change the status quo. To that end, we joined the World Wide Web Consortium's (W3C) Web Payments Working Group in 2016 to take part in shaping the future of native browser payments. Since then, we’ve engaged in opinionated discussions and participated in a few hackathons (Interledger Payment App as an example) as a result of this collaborative and innovative working environment.

The W3C aims to develop protocols and guidelines that ensure the long-term growth of the Web. The Web Payments Working Group’s goal is to make payments easier and more secure on the Web. The first specification they introduced was Payment Request: a JavaScript API that replaces traditional checkout forms and vastly simplifies the checkout experience for users. The first iteration of this specification was recently finalized and integrated into a few browsers, most notably Chrome.

Despite being in Candidate Recommendation, Payment Request’s adoption by platforms and developers alike is still in the early stages. We found this to be a perfect opportunity to test it out and explore this new technology. The benefits of such an initiative are threefold. We gather data that helps the W3C and browser vendors grow this technology, continue to contribute to the working group, and encourage participation through further experimentation.

Defining the Project

To present detailed findings to the community, we first needed a properly formulated hypothesis. We wanted to have at least one qualitative and one quantitative success metric, and we came up with the following:

We believe that Payment Request is market ready for all users of our platform (of currently supported browsers). We’ll know this to be true when we see that the checkout completion rate for select merchants remains unchanged or gets better, and the purchase experience is better and faster.

This was our main driving success metric. We define checkout completion rate (CCR) as the number of people that completed a purchase vs the total number of people that demonstrated an intent to purchase. An intent to purchase is indicated by buyers who clicked the “checkout” button on the cart page. In addition, we monitored time to completion of the purchase and drop-off rates.
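As a rough illustration of the metric, here is a minimal JavaScript sketch of the CCR calculation. The function name and the sample counts are hypothetical and are not experiment data.

```javascript
// Checkout completion rate (CCR): completed purchases divided by the number of
// buyers who showed an intent to purchase (clicked "checkout" on the cart page).
// The function name and the sample numbers are illustrative assumptions.
function checkoutCompletionRate(completedPurchases, checkoutClicks) {
  if (checkoutClicks === 0) {
    return 0; // No intent to purchase was recorded, so report a rate of zero.
  }
  return completedPurchases / checkoutClicks;
}

// Hypothetical counts for the two groups of an A/B experiment.
const controlCcr = checkoutCompletionRate(450, 1000);
const experimentCcr = checkoutCompletionRate(420, 1000);
console.log(`control: ${(controlCcr * 100).toFixed(1)}%`);
console.log(`experiment: ${(experimentCcr * 100).toFixed(1)}%`);
```

Comparing the two rates is what tells us whether the experiment group completes checkout as often as the control group.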

For our qualitative metric, we spent time comparing Payment Request’s checkout experience with Shopify’s existing purchase experience. This metric was mostly driven by user experience research and was less of a data-driven comparison. We’ll cover this in a follow-up post.

We set off to launch an A/B experiment with a group of select merchants that showed interest in the potential this technology had to offer. We built this experiment as an app outside of our core platform, which allowed us to:

  • Iterate fast and in isolation
  • Leverage our own platform’s existing APIs
  • Release the app to our app marketplace for everyone to use, if valuable

Payment Request API Terminology

The Payment Request API has interesting value propositions: it surfaces more than one payment method, it’s agnostic of the payment method used, and it can indicate back upstream whether the buyer is able to proceed with a purchase. This last feature is exposed as the canMakePayment() method call, which returns a promise that resolves to a boolean value indicating whether the browser supports any of the desired payment methods indicated by the merchant.

Most browsers that implement Payment Request allow processing credit card payments through it (this payment method is referenced as basic-card in the specification). At the time of writing, basic-card was the only payment method widely implemented in browsers, and as a result, we ran our experiment with credit card transactions in mind only.

In the case of basic-card, canMakePayment() would return true if the end user already had a credit card provisioned. As an example, on Chrome, the method returning true would mean that the user already had a credit card on file in their browser, either through one of Chrome’s services, autofill, or from having already gone through the Payment Request experience once.
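The check described above could look roughly like the following sketch. It isn't the code from our experiment: the function name, the `env` parameter (used so the function can run outside a browser) and the placeholder total are assumptions, and browser support for the basic-card identifier has changed over time.

```javascript
// Sketch: ask the browser whether the buyer can pay with a supported method.
// "basic-card" is the payment method named in this post; treat the identifier
// as an assumption because browser support for it has changed over time.
// The function name, `env` parameter and placeholder total are hypothetical.
async function buyerCanMakePayment(env = globalThis) {
  // Feature detection: Payment Request isn't available in every browser.
  if (typeof env.PaymentRequest !== "function") {
    return false;
  }
  const methods = [{ supportedMethods: "basic-card" }];
  const details = {
    total: { label: "Total", amount: { currency: "USD", value: "1.00" } },
  };
  try {
    const request = new env.PaymentRequest(methods, details);
    // canMakePayment() returns a promise that resolves to a boolean.
    return await request.canMakePayment();
  } catch (error) {
    // Constructing or querying the request can throw; treat that as "no".
    return false;
  }
}
```

In a browser, calling `buyerCanMakePayment()` with no argument uses the real `PaymentRequest` constructor when it exists.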

Payment Request demo on Chrome Android

Finally, the UI presented to the buyer during their purchase journey through Payment Request is called the payment sheet. Its implementation depends on the browser vendor, which means that the experience might differ from one browser to another. As seen in the demo above, it usually contains the buyer’s contact information, shipping address and payment method. Once the shipping address is selected, the buyer is allowed to select their shipping method (if applicable).

Defining our A/B Experiment

Our A/B experiment ran on select merchants and tested buyer behaviour. The conditions of the experiment are as follows:

Merchant Qualification

Since most Payment Request implementations in browsers only support the basic-card payment method, we were limited to merchants who accept direct credit card payments as their primary method of payment. With this limitation, one of the primary merchant qualifications was the use of a credit card based payment processor.

Audience Eligibility

Our experiment audience is buyers. A buyer is eligible to be part of the experiment if their browser supports Payment Request. At the time of writing, Payment Request is available with the latest Chrome (on all devices), Microsoft Edge on desktop and Samsung Browser (available on Samsung mobile devices). We were only able to gather experiment data on Chrome. We experienced minimal browser traffic through Samsung Browser, and Microsoft Edge's Payment Request implementation only supports North American credit cards.

Experiment Segmentation

When qualified buyers clicked the “checkout” button on the cart page, 50% of them were placed in a control group and the other 50% in an experiment group. Buyers in the control group didn’t see the payment sheet and continued through our regular checkout. Buyers in the experiment group went through the Payment Request purchase experience and saw the payment sheet.
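One common way to get a stable 50/50 split is to hash a buyer identifier into buckets. The sketch below illustrates that technique only. This post doesn't describe how buyers were actually assigned, so the hashing approach and the `buyerToken` identifier (for example, a cart or session token) are assumptions.

```javascript
// Sketch of a deterministic 50/50 split using a 32-bit FNV-1a hash.
// The hashing approach and `buyerToken` are assumptions for illustration.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i); // mixes in UTF-16 code units (bytes for ASCII)
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32 bits
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

function assignGroup(buyerToken) {
  // Buckets 0-49 go to the control group, buckets 50-99 to the experiment group.
  return fnv1a(buyerToken) % 100 < 50 ? "control" : "experiment";
}

console.log(assignGroup("example-cart-token"));
```

Because the group is derived from the token, the same buyer lands in the same group on every visit without any stored state.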

Payment Request Platform Integration

In order to build our experiment in an isolated manner, we leveraged our current app ecosystem. The experiment ran in a simple Ruby app that uses our existing Rails engine for Shopify Apps. We used our existing infrastructure to quickly deploy to Google Cloud (more on our move to the cloud here). In conjunction with our existing ShipIt deployment tool, we were able to set up a pipeline in a matter of minutes, making deployment a breeze.

After setting up our continuous delivery, we shifted our focus towards the app lifecycle, which is best explained in two phases: the merchant-facing app installation and the buyer’s storefront experience.

App Installation

The installation process is pretty straightforward: once the merchant gives permission to run the experiment on their storefront, we install our app in their backend. Upon installation, our app injects a script tag on the merchant’s storefront. This JavaScript file contains our experiment logic and runs for every buyer visiting that merchant’s shop.

Storefront Experience

The buyer’s storefront experience is split into two processes: binding the experiment logic and surfacing the right purchase experience.

Binding the Experiment Logic

Every time a buyer visits the cart page, our front-end logic first determines if the buyer is eligible for our experiment. If so, the JavaScript code pings our app backend, which in turn gathers the shop’s information through our REST Admin API. This ping determines if the shop still has a credit card based processor and if the merchant supports discount codes or gift cards. This information determines the shop’s eligibility for the experiment and displays the proper alternative flow if gift cards or discount codes are accepted. When both the buyer and the merchant are eligible for the experiment, we override the “checkout” button on the cart page. We usually discourage this practice, as it can adversely affect the checkout experience. For our purposes, we allowed it for the duration of the experiment only.
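The decision described above can be summarized in a small sketch. All field and function names here are hypothetical. The real logic relied on shop data gathered through the REST Admin API, which isn't modelled in this example.

```javascript
// Sketch of the cart-page decision. Every name below is a hypothetical
// stand-in for data the experiment gathered from the browser and the shop.
function decideCartBehaviour({ browserSupportsPaymentRequest, shop }) {
  const buyerEligible = Boolean(browserSupportsPaymentRequest);
  const shopEligible = Boolean(shop.hasCreditCardProcessor);
  if (!buyerEligible || !shopEligible) {
    // Leave the storefront untouched: the buyer uses the regular checkout.
    return { overrideCheckoutButton: false, showDiscountLink: false };
  }
  return {
    overrideCheckoutButton: true,
    // Offer the alternative flow only when the shop accepts these items.
    showDiscountLink: Boolean(shop.acceptsDiscountCodes || shop.acceptsGiftCards),
  };
}

console.log(
  decideCartBehaviour({
    browserSupportsPaymentRequest: true,
    shop: { hasCreditCardProcessor: true, acceptsDiscountCodes: true, acceptsGiftCards: false },
  })
);
```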

Surfacing the Purchase Experience

Upon clicking the “checkout” button, buyers in our control group would get redirected to Shopify’s existing web checkout. Buyers in our experiment group would enter the Payment Request experimental flow via the payment sheet, and the JavaScript would interact with Shopify’s Checkout API to complete a payment.

Alternative Payment Flows

The majority of merchants on the Shopify platform accept discount codes and gift cards as part of their purchase flow, and the Payment Request API doesn’t support discount code entry. It was therefore important not to negatively impact the merchants’ business during this experiment.

Shopify only supports this feature on the regular checkout flow, and implementing this feature on the cart page prior to checkout would involve a non-trivial effort. Therefore, we needed to provide an ability for buyers to opt out of the experiment if they wanted to provide a discount code. We included a link under the checkout button that read: “Discount or gift card?”. Clicking this link would redirect the buyer to our normal checkout flow, where they could use those items, and they would never see the payment sheet.

Finally, if the buyer cancelled the payment sheet purchase flow or an exception occurred, we’d show a link under the checkout button that reads: “Continue with regular checkout”.
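The cancel and exception fallback could be sketched as follows. This is an illustration rather than the experiment’s code: `request` is expected to behave like a PaymentRequest, `submitPayment` is a hypothetical callback standing in for the call to the checkout backend, and the handling of a failed payment is an assumption.

```javascript
// Sketch: run the payment sheet and fall back to the regular checkout link.
// `request` should behave like a PaymentRequest (show() resolves to a response
// that has complete()). `submitPayment` is a hypothetical callback that
// resolves to true when the payment went through.
async function payWithPaymentSheet(request, submitPayment) {
  try {
    // show() rejects when the buyer cancels the sheet or an error occurs.
    const response = await request.show();
    const succeeded = await submitPayment(response);
    // Tell the browser how the payment ended so it can close the sheet.
    await response.complete(succeeded ? "success" : "fail");
    // Assumption: a failed payment is reported separately from a cancellation.
    return { outcome: succeeded ? "paid" : "payment_failed" };
  } catch (error) {
    return { outcome: "fallback", linkText: "Continue with regular checkout" };
  }
}
```

The caller would render `linkText` under the checkout button whenever the outcome is `"fallback"`.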

What’s Next

The Payment Request API can provide a better purchase experience by eliminating checkout forms. Shopify is extremely interested in this technology and ran an experiment to see if Payment Request was market ready. Now that we've talked about how the experiment was set up, we’re excited to share experiment data points and lessons in the second part of Shaping the Future of Payments in the Browser. It will include breakdowns of time to completion, user flow learnings from buyer interactions, and Payment Request’s overall effect on the purchase experience (both quantitative and qualitative).

Part 2: The Results of Our Experiment with Payment Request API

In Part 1, we dove into how we ran an experiment in order to test the readiness of Payment Request. The goal was to invest in this new technology and share what we learned back to the W3C and browser vendors, in order to improve web payments as a whole. Regardless of the conclusion of the experiment on our platform, we continue to invest in the current and future web payments specifications.

As a reminder, our hypothesis was as follows:

We believe that Payment Request is market ready for all users on our platform (of currently supported browsers). We’ll know this to be true when we see that the checkout completion rate for select merchants remains unchanged or gets better, and the purchase experience is better and faster.

We define checkout completion rate (CCR) as the number of people that completed a purchase vs the total number of people that demonstrated an intent to purchase. An intent to purchase is indicated by buyers who clicked the “checkout” button on the cart page.

In this post, we investigate and analyze the data gathered during the experiment, including checkout completion rates, checkout completion times, and drop-off rates. This data provides insight on future Payment Request features, UX guidelines, and buyer behaviour.

Data Insights

We ran our experiment for over 2 months with 30 merchants participating. At its peak, there were around 15,000 payment sheet opens per week. The sample size allowed us to have high confidence in our data and our standard error is ±1%.

Time to Completion

Form Factor   canMakePayment()   10th percentile   Median time   90th percentile
Desktop       true               0:54              2:16          6:23
Desktop       false              1:33              3:13          7:57
Mobile        true               0:56              2:35          6:29
Mobile        false              1:35              3:22          8:08

Time to completion by device form factor

The time to completion is defined as the time between the buyer clicking the “checkout” button and their purchase being completed (i.e. they’re on the order status page). The value of canMakePayment() indicates whether the buyer has a credit card provisioned. As an example, on Chrome, the method returning true would mean that the buyer already had a credit card on file in their browser, either through one of Chrome’s services, autofill, or from having already gone through the Payment Request experience once.

The median time for buyers with canMakePayment() = false is 3:17 whereas the median time for buyers with canMakePayment() = true is 2:25. This is promising, as both medians are faster than our standard checkout. We can also take a look at the 10th percentile with canMakePayment() = true and see that the checkout completion times are under a minute.
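For readers who want to produce a similar summary from their own data, here is a minimal sketch that computes percentiles from per-purchase durations. The nearest-rank percentile method and the sample durations are assumptions. This post doesn't state which percentile method was used.

```javascript
// Sketch: summarize time to completion from durations measured in seconds.
// The nearest-rank method and the sample durations are assumptions.
function percentile(durations, p) {
  const sorted = [...durations].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of the data at or below it.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Formats whole seconds as minutes:seconds, for example 136 becomes "2:16".
function formatDuration(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return `${minutes}:${seconds}`;
}

const sampleDurations = [150, 60, 90, 120, 70, 140, 80, 110, 100, 130];
for (const p of [10, 50, 90]) {
  console.log(`p${p}: ${formatDuration(percentile(sampleDurations, p))}`);
}
```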

Checkout Completion Rates

As mentioned previously, we define checkout completion rate (CCR) as the number of people that completed a purchase vs the total number of people that demonstrated an intent to purchase. Comparing the control group to the experiment group, we saw an average 7% drop in CCR (with a standard error of ±1%), regardless of canMakePayment().

It is important to put this 7% into perspective. The Payment Request API is still in its infancy: the purchase experience it’s leveraging (through the payment sheet) is something buyers are still getting accustomed to. A CCR drop in the context of our experiment is to be expected, as buyers on our platform are familiar with a very specific and tailored process.

Our experiment did not adversely affect the merchants’ overall CCR, because it only ran on a very small subset of buyer traffic. Looking at all eligible merchants, the experiment represented roughly 5% of their traffic, as seen in the following graph:

Overall experiment traffic relative to normal site traffic

We started by slowly ramping up the experiment to select eligible merchants. This explains the low traffic percentage at the beginning of the graph above.

User Flow Analysis

The graph below documents the buyer’s journey through the payment sheet by listing all possible events, in the order they occurred during the purchase session. An event is a user interaction, like the user clicking the checkout button or selecting a shipping method. All the possible events can be seen on the right side of the graph below. Not shown on the graph is that 10% of buyers prefer clicking the provided “Discount or gift card?” link rather than the “checkout” button, before entering into the experiment.

The ideal user flow for the experiment is:

  1. The buyer clicks the “checkout” button
  2. The payment sheet opens
  3. The buyer selects a shipping address
  4. The buyer selects a shipping method
  5. The buyer clicks “pay”
  6. The payment is successful

The numbers at the top of the bars indicate the percentage of events that occurred at each step relative to step 1. For example, by step 6, a total of 43% of events were emitted compared to step 1.
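The percentages on the bars follow a simple calculation, sketched below. The function name is hypothetical and the event counts are made up for illustration. They aren't experiment data.

```javascript
// Sketch: express each step's event count as a percentage of step 1.
// The counts used below are made up; they aren't experiment data.
function relativeToFirstStep(eventCounts) {
  const first = eventCounts[0];
  return eventCounts.map((count) => Math.round((count / first) * 100));
}

console.log(relativeToFirstStep([2000, 1500, 900]));
```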

Payment sheet event breakdown by order of occurrence

Here are some ways the user flows break down:

  • [Step #1 to Step #2] Not all buyers who click the button will see the payment sheet. This is due to conflicting JavaScript on some merchants’ storefronts, which leads to exceptions
  • [Step #3] Upon seeing the payment sheet, 60% of buyers will drop out without interacting with their shipping contact information or provided shipping methods
  • [Step #4] Once they exit the sheet, 35% of buyers prefer clicking on one of the other links provided. 84% of these will click the “Discount or gift card?” link while the rest will click on the “Continue with regular checkout” link. A small percentage of buyers will retry the payment sheet.
  • [Step #5] 32% of buyers will initiate a payment in the payment sheet by clicking the “Pay” call to action
  • [Step #6] At this point, 28% of buyers are able to complete their checkout. The rest will have to retry a payment because of a processing error such as an invalid credit card, insufficient funds, etc.

Of the buyers that don’t go through the payment sheet, only 30% of them will retry one or two times to go through Payment Request again and 7% of buyers will retry two or more times.

Furthermore, we don't know why 60% of buyers drop out of the payment sheet, as the Payment Request API doesn’t provide event listeners on all sheet interactions. However, we think that the payment sheet being fairly foreign to buyers might be part of the cause. This 60% drop out rate certainly accounts for the 7% CCR drop we mentioned earlier. This is not to say that the purchase experience is subpar; rather, that it will take time for buyers to get accustomed to. As this new specification gains traction and adoption broadens, we think that the number of buyers that drop out will significantly decrease. Our merchant feedback seems to support our hypothesis:

“I found the pop-up really surprising and confusing because it doesn't go with the rest of our website.”

“[...] it comes up when you are still on the cart page even though you expect to be taken to checkout. It's just not what you are used to seeing as a standard checkout process [...]”

“My initial thoughts on it is that the UI/UX is harshly different than the rest of our site and shopify [...]”

Merchants were definitely apprehensive of Payment Request, but were quite excited by the prospect of a streamlined purchase experience that could leverage the buyer’s information securely stored in the browser. This is best reflected in the nuanced feedback we received after our experiment ended:

"I just wanted to check in and see if there was any update with this. We’d really love to try out the new checkout again."

“[...] I love the test, it’s just a pretty drastic change from what online shoppers are used to in terms of checkout process.”

Finally, to better understand merchant feedback, we performed user experience research on the different payment sheet UIs implemented by browser vendors. We’ll share specific research insights with the concerned browser vendors, but the lessons listed below can be applied to all payment sheets and are recurring throughout implementations.

We found that in order to create more familiarity for the buyer as they navigate from the storefront to the payment sheet, it’s useful to surface the merchant’s name or logo as part of it. Furthermore, it’s important to keep calls to action with negative connotations (e.g. cancel or abort) in the same area on every payment sheet screen. This helps to set the proper expectations for the buyer. An example of what to avoid is having the “Pay” call to action in the bottom right of the very first screen, then having a “Cancel” call to action in the bottom right of the next screen.

As for the user experience, it’s preferred not to surface grayed out fields unless they are preselected. An example is surfacing a grayed out shipping address to the buyer on the very first screen of the payment sheet, without it being preselected. The buyer might think that they don’t have to select a shipping address as it’s already presented to them. This leads to confusion for the buyer and relates well to merchant feedback we’ve received:

“When this pops up, it's really unclear how to proceed so much so that it was jarring to see "Pay" as the CTA button [...]”

Finally, to prevent unnecessary back and forth between screens, surface validation errors as soon as possible in the flow (ideally in the form, near the fields).

Experiment Conclusion

Reiterating our initial hypothesis:

We believe that Payment Request is market ready for all users on our platform (of currently supported browsers). We will know this to be true when we see that the checkout completion rate for select merchants remains unchanged or gets better, and the purchase experience is better and faster.

Even though merchants were interested in the prospect of Payment Request, we don’t believe that Payment Request is a good fit for them yet. We pride ourselves on offering a highly optimized checkout across all browsers. We constantly tweak it by running extensive UX research, testing it against multiple devices, and regularly offering new features and interesting value propositions for merchants and buyers alike. These include Google Autocomplete for Shopify, Shopify Pay and Dynamic Checkout, which allow us to streamline the purchase experience.

As buyer recognition of the feature grows and browsers tweak their UI to improve the payment sheet, we believe that the aforementioned 7% checkout completion rate drop and the 60% drop of buyers at the payment sheet will greatly diminish. Paired with the very promising time to completion medians, we are excited to see how the specification will grow in the upcoming months.

What’s next

Payment Request has a bright future ahead of it as both the W3C and browser vendors show interest in pushing this technology forward. The next major milestone for Payment Request is to accept third party payment methods through the new Payment Handler API, which will definitely help adoption of this technology. It was, up until recently, only available behind a feature flag in Chrome but Google has officially rolled it out as part of v68. We’ve already started experimenting with this next specification and are quite excited by its possibilities. You can find several demos we recorded for the W3C here: Shopify Web Payments Demos. We chose Affirm and iDeal as payment methods for the exploration, and the results are promising.

Shopify’s excited to be part of the Web Payments Working Group and thrilled to hear your comments. We invite you to explore the specification by implementing it on your own website. Then join the discussion over at the Web Payments Slack group or over at W3C’s wiki page, where you’ll find resources to comment, discuss and help us in developing this new standard.

We do believe Payment Request has great potential and will shift the status quo in web payments. We’re excited to see the upcoming changes to Payment Request. Shopify is very keen on the technology and remains active in W3C discussions regarding web payments.

 

Iterating Towards a More Scalable Ingress

Shopify, the leading cloud-based, multi-channel commerce platform, is growing at an incredibly fast pace. Since the beginning of 2016, the number of merchants on the platform increased from 375,000 to 600,000+. As the platform scales, we face new and exciting challenges such as implementing Shopify’s Pod architecture and future proofing our cloud storage usage. Shopify’s infrastructure relies heavily on Kubernetes to serve millions of requests every minute. An essential component of any Kubernetes cluster is its ingress, the first point of entry in a cluster that routes incoming requests to the corresponding services. The ingress controller implementation we adopted at the beginning of the year is ingress-nginx, an open source project.

Before ingress-nginx, we used Google Cloud Load Balancer Controller (glbc). We moved away from glbc because it underperformed for Shopify: we observed poor load balancing and request queueing, particularly during deployments. Shopify currently deploys around 40 times per day without scheduling downtime. At the time we identified these problems, glbc wasn’t endpoint aware while ingress-nginx was. Having endpoint awareness allows the ingress to implement alternative load balancing solutions and not rely on the solution offered by Kubernetes Services through kube-proxy. The above reasons, together with the NGINX expertise Shopify acquired through running and maintaining its NGINX (supercharged with Lua) edge load balancers, made the Edgescale team migrate the ingress on our Kubernetes clusters from glbc to ingress-nginx.

Even though we now leverage endpoint awareness through ingress-nginx to enhance our load balancing solution, there are still additional performance issues that arise at our scale. The Edgescale team, which is in charge of architecting, building and maintaining Shopify’s edge infrastructure, began contributing optimizations to the ingress-nginx project to ensure it performs well at Shopify’s scale and as a way to give back to the ingress-nginx community. This post focuses on the dynamic configuration optimization we contributed to the project which allowed us to reduce the number of NGINX reloads throughout the day.

Now’s the perfect time to introduce myself 😎— my name is Francisco Mejia, and I’m a Production Engineering Intern on the Edgescale team. One of my major goals for this internship was to learn and become familiar with Kubernetes at scale, but little did I know that I would spend most of my internship contributing to a Kubernetes project!

One of the first performance bottlenecks we identified when using ingress-nginx was the high frequency of NGINX reloads during application deployments. Whenever application deployments occurred on the cluster, we observed increased latencies for end users, which led us to investigate and find a solution to this problem.

NGINX uses a configuration file to store the active endpoints for every service it routes traffic to. During deployments to our clusters, Pods running the older version are killed and replaced with Pods running the updated version. It’s possible that a single deployment may trigger multiple reloads, as the controller receives updates for the endpoint changes. Any time NGINX reloads, it reads an NGINX configuration file into memory, starts new worker processes and signals the old worker processes to shut down gracefully.

Although NGINX reloads gracefully, reloads are still detrimental from a performance perspective. Old worker processes being shut down results in increased memory consumption, and the reset of keepalive connections and load balancing state. Clients that previously had open keepalive connections with the old worker processes now need to open new connections with the new worker processes. In addition, opening connections at a faster rate means that the server will need to allocate more resources to handle connection requests. We addressed this issue by introducing dynamic configuration to the ingress controller.

To reduce the number of NGINX reloads when deployments occur, we added the ability for ingress-nginx to update application endpoints by maintaining them in memory, thereby eliminating the need for NGINX to regenerate the configuration file and issue a reload. We accomplished this by creating an HTTP endpoint inside NGINX using lua-nginx-module that receives endpoint configuration updates from the ingress controller and modifies an internal Lua shared dictionary that stores the endpoint configuration for all services. This mechanism enabled us to both skip NGINX reloads during deployments and significantly improve request latencies, especially during deploys.

Here’s a more granular look at the general flow when we instruct the controller to dynamically configure endpoints:

  1. A Kubernetes resource is modified, created or deleted.
  2. The ingress controller sees the changes and sends a POST request to /configuration/backends containing the up to date list of endpoints for every service.
  3. NGINX receives a POST request to /configuration/backends which is served by our Lua configuration module.
  4. The module handles the request by receiving the list of endpoints for all services and updates a shared dictionary that keeps track of the endpoints for all backends.
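The flow above can be modelled in a few lines of plain JavaScript. This is only a model: in ingress-nginx the handler is written in Lua and the store is an NGINX shared dictionary. The payload shape, the function name and the status codes below are assumptions.

```javascript
// Model of the dynamic configuration flow. The real handler is Lua running
// inside NGINX; the payload shape and status codes here are assumptions.
const backendEndpoints = new Map(); // service name -> list of "ip:port" strings

function handleRequest(method, path, body) {
  if (path !== "/configuration/backends") return { status: 404 };
  if (method !== "POST") return { status: 405 };
  let backends;
  try {
    backends = JSON.parse(body);
  } catch (error) {
    return { status: 400 }; // The body wasn't valid JSON.
  }
  if (!Array.isArray(backends)) return { status: 400 };
  // Replace the stored endpoints in memory. No configuration file is
  // regenerated and no reload is triggered.
  backendEndpoints.clear();
  for (const { name, endpoints } of backends) {
    backendEndpoints.set(name, endpoints);
  }
  return { status: 200 };
}

// The controller would send the full, up-to-date list after every change.
const update = JSON.stringify([
  { name: "default-web-80", endpoints: ["10.0.0.1:8080", "10.0.0.2:8080"] },
]);
console.log(handleRequest("POST", "/configuration/backends", update));
```

Routing code would then read the endpoints for a service from the in-memory store on each request instead of from the static configuration.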

My team carried out tests to compare the latency of requests between glbc and ingress-nginx with dynamic configuration enabled. The test consisted of the following:

  1. Find a request rate for the load generator where the average request latency is under 100ms when using glbc to access an endpoint.
  2. Use the same rate to generate load on an endpoint behind ingress-nginx and compare latencies, standard deviation and throughput.
  3. Repeat step 1, but this time carry out application deploys while load is being generated to endpoints.

The latencies were distributed as follows:

Latency by percentile distribution glbc vs dynamic

Up until the 99.9th percentile of request latencies, both ingresses perform very similarly, but at the 99.99th percentile or greater, ingress-nginx outperforms glbc by multiple orders of magnitude. It’s vital to minimize request latency as much as possible, as it strongly affects merchants’ success.

We also compared the request latencies when running the ingress controller with and without dynamic configuration. The results were the following:

Latency by percentile distribution - Dynamic configuration enabled vs disabled

From the graph, we can see that the 99th percentile of latencies when using dynamic configuration is comparable to the 99th percentile when using the vanilla ingress controller - with roughly similar results.

We also carried out the previous test, but this time during application deploys - here’s where we really get to see the impact of the dynamic configuration feature. The results are depicted below:

Latency by percentile distribution deploys - dynamic vs vanilla

It’s clear from the graph that there was a huge increase in performance after the 80th percentile from ingress-nginx with dynamic configuration.

When operating at Shopify’s scale, a whole new world of engineering challenges and opportunities arises. Together with my teammates, I have the opportunity to find creative ways to solve optimization problems involving both Kubernetes and NGINX. We contributed our NGINX expertise to the ingress-nginx project and will continue doing so. The contribution explained throughout this post wouldn’t have been possible without the support of the ingress-nginx community. Massive kudos to them 🎉! Keep an eye out for more ingress-nginx updates on its GitHub page!

E-Commerce at Scale: Inside Shopify's Tech Stack - Stackshare.io

Before 2015, we had an Operations and Performance team. Around this time, we decided to create the Production Engineering department and merge the teams. The department is responsible for building and maintaining common infrastructure that allows the rest of product development teams to run their code. Both Production Engineering and all the product development teams share responsibility for the ongoing operation of our end user applications. This means all technical roles share monitoring and incident response, with escalation happening laterally to bring in any skill set required to restore service in case of problems.  

Behind The Code: Jennice Colaco, Backend Developer

Behind the Code is an initiative with the purpose of sharing the various career stories of our engineers at Shopify, to show that the path to development is non-linear and quite interesting. The features will showcase people just starting their careers, those who made career switches, and those who've been in the industry for many years. Enjoy!

Scaling iOS CI with Anka

Shopify has a growing number of software developers working on mobile apps such as Shopify, Shopify POS and Frenzy. As a result, the demand for a scalable and stable build system increased. Our Developer Acceleration team decided to invest in creating a single unified build system for all continuous integration and delivery (CI/CD) pipelines across Shopify, which includes support for Android and iOS.

We want our developers to build and test code in a reliable way, as often as they want. Having a CI system that makes this effortless means we can deliver new features quickly and with confidence, without sacrificing the stability of our products.

Shopify’s Build System

We have built our own CI system at Shopify, which we call Shopify Build. It’s based on Buildkite, and we run it on our own infrastructure. We’ve deployed our own version of the job bootstrap script that sets up the CI environment, rather than the one that ships with Buildkite. This allows us to accomplish the following goals:

  • Provide a standard way to define general purpose build pipelines
  • Ensure the build environment integrates well with our other developer tools and is consistent with our production environment
  • Ensure builds are resilient against infrastructure failures and flakiness of third-party dependencies
  • Provide disposable build environments so that subsequent jobs can’t interfere with each other
  • Support selective builds for monorepos, or repositories with multiple projects in them

Initially, Shopify Build only supported Linux environments using Docker to provide disposable environments, and it works extremely well for backend and Android projects. Previously, we had separate CI systems for iOS projects, but we wanted to provide our iOS developers with the same benefits as our other developers by integrating iOS into Shopify Build.

Build Infrastructure for iOS

Building infrastructure for iOS comes with its unique set of challenges. It’s the only piece of infrastructure at Shopify that doesn’t run on top of Linux. For our Android build nodes, we can leverage the same Google Cloud infrastructure we already use in production. Unfortunately, cloud providers such as Amazon Web Services (AWS) and Google Cloud Platform (GCP) don’t provide infrastructure that can run macOS. The only feasible option for us is using a non-cloud provider like MacStadium, but the tradeoff is that we can’t auto-scale the infrastructure based on demand.

2017: VMware ESXi and a Storage Area Network

Since we published our blog post on the VMware-based CI for Android and iOS, we’ve learned many lessons. We had a cluster of Mac Pros running ephemeral VMs on top of VMware ESXi. Although it served us well in 2017, it was a maintenance burden on the small team. We relied on tools such as Packer and ovftool, but we built many custom provisioning scripts to build and distribute VMware virtual machines.

On top of being difficult to maintain, the setup had a single point of failure: the Storage Area Network (SAN). Each Mac Pro shared this solid-state based infrastructure. By the end of 2017, we exceeded the write throughput, degrading build stability and speed for all of our mobile developers. Due to our write-heavy CI workload, the only solution was to upgrade to a substantially more expensive dedicated storage solution. Dedicated storage would push us a bit farther, but the system would not be horizontally scalable.

2018: Disposable Infrastructure with Anka

While we were facing these challenges with VMware, Veertu released a new virtualization technology called Anka. Anka provides a Docker-like command line interface for spinning up lightweight macOS virtual machines, built on top of Apple’s Hypervisor.framework.

Anka has the concept of a container registry similar to Docker with push and pull functionality, fast boot times, and easy provisioning provided through a command line interface. With Anka, we can quickly provision a virtual machine with the preferred macOS version, disk, memory, CPU configuration and Xcode version.

Mac Minis Versus Mac Pros

Our VMware-based setup was running a small cluster of 12-core Mac Pros in MacStadium. The Mac Pros provided high bandwidth to the shared storage and ran multiple VMs in parallel. For that reason, they were the only viable choice for a SAN-based setup. However, Anka runs on local storage, and therefore it doesn’t require a SAN.

After further experimentation, we realized a cluster of Core i7 Mac Minis would be a better fit to run with Anka. They are more cost-effective than Mac Pros while providing the same or higher per-core CPU performance. For the price of a single Mac Pro, we could run about 6 Mac Minis. Mac Minis don’t provide 10 Gbit networking, but that isn’t a deal breaker in our Anka setup as we no longer need a SAN. We’re running only one Anka VM per Mac Mini, giving us four cores and up to 16 GB memory per build node. Running a single VM also avoids the performance degradation that we found when running multiple VMs on the same host, as they need to share resources.

Distributing Anka Images to Nodes in Different Regions

We use a separate Mac Mini as a controller node that provisions an Anka VM with all dependencies such as Xcode, iOS simulators and Ruby. The command anka create generates the base macOS image in about 30 minutes and only needs a macOS installer (.app) from the Mac App Store as input.

Anka’s VM image management optimizes disk space usage and data transfer times when pushing and pulling the VMs on the Mac nodes. Our images build automatically in multiple layers to benefit from this mechanism. Multiple layers allow us to make small changes to an image quickly. By re-using previous layers, changing a small number of files in an image across our nodes can be done in under 10 minutes, and upgrading the Xcode version in about an hour.

After the provisioning completes, our controller node continues by suspending the VM and pushes it to our Anka registries. The image is tagged with its unique git revision. We host the Anka Registry on machines with 10 Gbps networking. Since all nodes run Anka independently, we can run our cluster in two MacStadium data centers in parallel. If a regional outage occurs, we offload builds to just one of the two clusters, giving us extra resiliency.

The final step of the image distribution is a parallel pull performed on the Mac Minis. To speed up the process, each Mac Mini pulls only the new layers from the images available in its respective Anka Registry. Each Mac Mini has 500 GB of SSD storage, which is enough to store all our macOS image variants. We allow build pipelines to specify images with both name and tags, such as macos-xcode92:latest or macos-xcode93:<git-revision>, similar to how Docker manages images.

The Anka Image Distribution Process
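To illustrate the name-and-tag convention, here is a minimal Python sketch that splits an image reference into its two parts. The function name, the fallback to a latest tag when no tag is given, and the sample git revision are assumptions made for illustration; they are not taken from Anka or Shopify Build.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    name: str
    tag: str


def parse_image_ref(ref: str, default_tag: str = "latest") -> ImageRef:
    """Split 'name:tag' into its parts.

    Falling back to default_tag when no tag is given is an assumption
    borrowed from Docker's convention.
    """
    name, sep, tag = ref.partition(":")
    if not name:
        raise ValueError(f"image reference has no name: {ref!r}")
    return ImageRef(name, tag if sep and tag else default_tag)


print(parse_image_ref("macos-xcode92:latest"))
# "3f2a9c1" is a made-up git revision used only as sample input.
print(parse_image_ref("macos-xcode93:3f2a9c1"))
```

Pinning a pipeline to a git revision tag keeps its builds reproducible, while a moving tag such as latest picks up new images automatically.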

Running Builds With Anka and Buildkite

We use Buildkite as the scheduler and front-end for CI at Shopify. It allows for fine-grained customization of pipelines and build scripts, which makes it a good fit for our needs.

We run a single Buildkite Agent on each Mac Mini and keep our git repositories cached on each of the hosts, for a fast git checkout. We also support shallow clones. We found that with a single large repository and many git submodules, a local cache gives the best performance. As mentioned before, we maintain copies of suspended Anka images on each of the Mac Minis. Suspended Anka VMs, rather than stopped ones, can boot in under a second, which is a significant improvement over our VMware VMs, which took about one minute to boot even from a suspended state.

As part of running a build, a sequence of Anka commands is invoked. First, we clone the base image to a temporary snapshot. This is done using anka clone. We then start the VM, wait for it to be booted and continue by mounting volumes to expose artifacts. With anka run we execute the command corresponding to the Buildkite step and wait for it to finish. Artifacts are uploaded to cloud storage and the Anka VM is deleted afterwards with anka delete.

The Lifecycle of a Build Job Using Anka Containers
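The lifecycle above can be sketched as an ordered list of command lines. This Python sketch only builds the commands and does not execute them. The clone naming scheme, the job id, and the make test step are hypothetical, and the exact arguments and flags of each Anka subcommand are assumptions, so check the Anka CLI documentation before relying on them. Waiting for boot, mounting volumes, and uploading artifacts are left out.

```python
import shlex
from typing import List


def build_job_commands(base_image: str, step_command: str, job_id: str) -> List[List[str]]:
    """Return the Anka invocations for one build job, in the order they run."""
    vm = f"{base_image}-{job_id}"  # hypothetical name for the temporary clone
    return [
        ["anka", "clone", base_image, vm],                 # clone the base image
        ["anka", "start", vm],                             # boot the suspended clone
        ["anka", "run", vm, *shlex.split(step_command)],   # run the Buildkite step
        ["anka", "delete", vm],                            # throw the VM away
    ]


for cmd in build_job_commands("macos-xcode92", "make test", "job-42"):
    print(shlex.join(cmd))
```

Because every job starts from a fresh clone and ends with a delete, nothing a job writes inside the VM can leak into the next job.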

We monitor the demand for build nodes and work with MacStadium to scale the number of Mac Minis in both data centers. It’s easier than managing Macs ourselves, but it’s still a challenge as we can’t scale our cluster dynamically. In the graph below, you can see the green line indicating the total number of available build nodes and the required agent count in orange.

Our workload is quite spiky, with high load exceeding our capacity at moments during the day. During those moments, our queue time will increase. We expect to add more Mac Minis to our cluster as we grow our developer teams to keep our queue times under control.

 A Graph Showing Shopify's iOS CI Workload over 4 Hours During Our Work Day

Summary

It took us about four months to implement the new infrastructure on top of Anka with a small team. Building your own CI system requires an investment in engineering time and infrastructure, and at Shopify, we believe it’s worth it for companies that plan to scale while continuing to iterate at a high pace on their iOS apps.

By using Anka, we substantially improved the maintainability and scalability of our iOS build infrastructure. We recommend it to anyone looking for macOS virtualization in a Docker-like fashion. During the day, our team of about 60 iOS developers runs about 350 iOS build jobs per hour. Anka’s superior boot times reduce the setup time of each build step. Upgrading to new versions of macOS and Xcode is easier than before. We have eliminated shared storage as a single point of failure, thereby increasing the reliability of our CI system. It also means the system is horizontally scalable, so we can easily scale with the growth of our engineering team.

Finally, the system is easier to use for our developers by being part of Shopify Build, sharing the same interface we use for CI across Shopify.

If you would like to chat with us about CI infrastructure or other developer productivity topics, join us at Developer Productivity on Slack.


Introducing the Merge Queue


Scaling a Majestic Monolith

Shopify’s primary application is a monolithic Rails application that powers our cloud-based, multi-channel commerce platform for 600,000+ merchants in over 175 countries. As we continue to grow the number of developers working on the app, our tooling has grown with them. At Shopify, we mostly follow a trunk-based development workflow, and every week more developers write more code, open more pull requests, and merge more commits to master. Occasionally, master merges can go wrong. For example, two unrelated merges can affect one another, a merge can introduce a new flaky test, or work in progress can be merged by accident. Even a low percentage of a growing number of merges will eventually produce too many failures to ignore, so we needed to improve our tooling around merging pull requests.

Shipit is our open source deployment coordination tool. It’s our source of truth for what is deployed, what’s being deployed (if anything), and what’s about to be deployed. There are times we don’t want any more commits merged to master (e.g. if CI on master is failing; if there’s an ongoing incident and we don’t want any more changes introduced to the master branch; or if the batch size of undeployed commits is too high) and Shipit is also the source of truth for this. Originally, we expected developers to check the status of master by hand before merging. This quickly became unsustainable, so Shipit has a browser extension that tells the developer the status of the stack right on their pull request:

Introducing the Merge Queue - Stack Status Clear

If for some reason, it’s unsafe to merge, then the developer is asked to hold off:

Introducing the Merge Queue - Please Hold Off Merging

Developers had to manually revisit their pull request to see if it was safe to merge. Large batches of undeployed commits are also considered unsafe for more merges, a condition Shipit considers ‘backlogged’:

Introducing the Merge Queue - Backlogged Status

A rapidly growing development team brings scaling challenges (and lots of frustration) because when a stack returned to a mergeable state, developers rushed to get their changes merged before the pipeline became backlogged again. As we continued to grow, this became more and more disruptive, and so the merge queue idea was born.

The Merge Queue

Shipit was the obvious candidate to house this new automation: it’s the source of truth for the state of master and deploys, and is already integrated with GitHub. We added the ability to enqueue pull requests for merge directly within Shipit (you can see how it’s configured here in the Shipit GitHub repo). Once a pull request is queued and the state of master is OK, it is merged very quickly. We didn’t want our developers to have to leave GitHub to enqueue pull requests, so we looked to the browser extension to solve that problem!

Introducing the Merge Queue - Merge Pull Request

If a stack has the merge queue enabled, we inject an ‘Add to merge queue’ button. Integrating the button with the normal development flow was important for developer adoption. During testing, we discovered that people still merged directly to master for routine merging, and interviews revealed that they instinctively ‘pressed the big green button to merge’. We wanted the merge queue to become the default mechanism for merges, so we tweaked our extension to de-emphasise the default ‘Merge pull request’ button by turning it gray, and we saw a further boost in adoption.

By bringing the merge event into the regular deploy pipeline, we’re able to codify some other things we consider best practices. For example, the merge queue can be configured to reject a pull request if it has diverged from its merge base beyond configurable thresholds. Short-lived branches are very important for trunk-based development, so old branches (both in terms of date and number of commits diverged) represent an increased risk, and need to be discouraged. The merge queue is configured inside shipit.yml, so the discussions that inform these decisions are all traceable back to a pull request!
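A divergence check of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not Shipit’s implementation: the PullRequest fields, the function name, and the default thresholds of 14 days and 500 commits are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    merge_base_date: datetime  # commit date of the branch's merge base
    commits_behind: int        # commits added to master since the merge base


def divergence_rejection(
    pr: PullRequest,
    now: datetime,
    max_age: timedelta = timedelta(days=14),  # hypothetical threshold
    max_commits: int = 500,                   # hypothetical threshold
) -> Optional[str]:
    """Return a rejection reason if the branch has diverged too far, else None."""
    age = now - pr.merge_base_date
    if age > max_age:
        return f"merge base is {age.days} days old (limit: {max_age.days})"
    if pr.commits_behind > max_commits:
        return f"branch is {pr.commits_behind} commits behind (limit: {max_commits})"
    return None


now = datetime(2018, 6, 1)
print(divergence_rejection(PullRequest(datetime(2018, 5, 30), 40), now))
print(divergence_rejection(PullRequest(datetime(2018, 4, 1), 40), now))
```

Returning a reason rather than a bare boolean lets the tool tell the developer why the pull request was rejected and what to do about it, typically rebasing onto a recent master.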

It’s important to stress that the merge queue is highly encouraged, but not enforced. At Shopify, we trust our developers to override the automation, if they feel it’s required, and merge directly to master.

After launching the merge queue, we quickly learned that the queue wasn’t always behaving as developers expected. We configured the queue to require certain CI statuses before merging and if a pull request wasn’t ready, Shipit would eject it from the queue, making the developer re-enqueue the pull request later. There are some common situations where this causes frustrations for developers. For example, after a code review, some small tweaks are made to satisfy reviewers, and the pull request is ready to merge pending CI passing. The developer wants to queue the pull request for merging and move on to their next task but needs to monitor CI. Similarly, this also happened with minor changes (readme updates and the like) and developers would save a lot of time if they could queue-and-forget, so that’s what we did! If CI is pending on a queued pull request, Shipit will wait for CI to pass or fail, and merge or reject as appropriate.
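The queue-and-forget behaviour boils down to a small decision table. The Python sketch below is an illustration under assumptions: the enum names and the next_action function are hypothetical and do not mirror Shipit’s code.

```python
from enum import Enum


class CIStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


class Action(Enum):
    WAIT = "wait"      # keep the pull request in the queue
    MERGE = "merge"
    REJECT = "reject"  # eject the pull request from the queue


def next_action(ci_status: CIStatus, master_mergeable: bool) -> Action:
    """Decide what to do with a queued pull request."""
    if ci_status is CIStatus.FAILURE:
        return Action.REJECT
    if ci_status is CIStatus.PENDING or not master_mergeable:
        return Action.WAIT  # queue-and-forget: no need to re-enqueue later
    return Action.MERGE


print(next_action(CIStatus.PENDING, master_mergeable=True))
print(next_action(CIStatus.SUCCESS, master_mergeable=True))
```

The earlier behaviour corresponds to returning REJECT for a pending status, which is what forced developers to come back and re-enqueue.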

We received a lot of positive feedback for that small adjustment, and for the merge queue in general. By getting automation involved earlier in the pipeline, we’re able to take some of the load off our developers, make them happier, and more productive. Over 90% of pull requests to Shopify’s core application are using Shipit with the merge queue! That makes Shipit the largest contributor to our monolith.

Unsafe Commits

A passing, regularly exercised CI pipeline gives you high confidence that a given changeset won’t cause any negative impacts once it reaches production. Ultimately, the only way to see the impact of your changes is to ship them, and sometimes that results in a breaking change reaching your users. You quickly roll back the deploy, stop the shipping pipeline, and investigate what caused the break. Once you identify the bad commit, it can be reverted on master, and the pipeline can resume, right? Consider this batch of commits on master, waiting to deploy:

  • Good commit A
  • Bad commit B
  • Good commit C
  • Good commit D
  • Revert of commit B

How does your deployment tool know that deploying commit C or D is unsafe? Up until recently, we were relying on our developers to manage this situation by hand, manually deploying the first safe commit before unlocking the pipeline. We’d rather our developers focus on adding value elsewhere and decided to have Shipit manage this automatically where possible. Our solution comes in 2 parts:

Marking Commits as Unsafe for Deploy

Introducing the Merge Queue - Marking Commits as Unsafe for Deploy

If a commit is marked as unsafe, Shipit will not deploy that ref in isolation. In the above example, the bottom (oldest) commit might be deployed, followed by the remaining two commits together. This is the functionality we want, but it still requires manual configuration, so we complement this with automatic revert detection.

Automatic Revert Detection

If Shipit detects a revert of an undeployed commit, it will mark the original commit (and any intermediate commits between it and the revert) as unsafe for deploy:

Introducing the Merge Queue - Automatic Revert Detection

This removes the need for any manual intervention when doing git revert as Shipit can continue to deploy automatically and safely.
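Here is a minimal Python sketch of revert detection, applied to the batch of five commits listed earlier. It relies on the line "This reverts commit <sha>." that git revert writes into its default commit message. Whether Shipit detects reverts this way is an assumption, and the Commit class, the mark_unsafe function, and the short SHAs are hypothetical. For simplicity, the sketch requires the SHA in the message to match the stored SHA exactly.

```python
import re
from dataclasses import dataclass
from typing import List

# git revert's default message body contains this line.
REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})")


@dataclass
class Commit:
    sha: str
    message: str
    unsafe: bool = False


def mark_unsafe(undeployed: List[Commit]) -> List[Commit]:
    """Flag reverted commits and everything between them and their revert.

    `undeployed` must be ordered oldest to newest.
    """
    index_by_sha = {c.sha: i for i, c in enumerate(undeployed)}
    for i, commit in enumerate(undeployed):
        match = REVERT_RE.search(commit.message)
        if not match:
            continue
        start = index_by_sha.get(match.group(1))
        if start is None or start >= i:
            continue  # the reverted commit is not in this undeployed batch
        for c in undeployed[start:i]:
            c.unsafe = True
    return undeployed


batch = [
    Commit("a1a1a1a", "Good commit A"),
    Commit("b2b2b2b", "Bad commit B"),
    Commit("c3c3c3c", "Good commit C"),
    Commit("d4d4d4d", "Good commit D"),
    Commit("e5e5e5e", 'Revert "Bad commit B"\n\nThis reverts commit b2b2b2b.'),
]
mark_unsafe(batch)
print([c.sha for c in batch if not c.unsafe])  # ['a1a1a1a', 'e5e5e5e']
```

Only commit A and the revert itself remain deployable on their own. Commits B, C, and D can still ship, but only as part of a deploy that also includes the revert.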

In Conclusion

These new Shipit features allow us to ship faster and more safely, and they hand valuable time back to our developers. Shipit is open source, so you can benefit from these features yourself: check out the setup guide to get started. We’re actively exploring open sourcing the browser extension mentioned above, so stay tuned for more updates on that!


Shopify Interns Share Their Tips for Success


At Shopify, we pride ourselves on our people. Shopifolk come from all kinds of backgrounds and experiences — our interns are no exception. So, we gathered some of our current and past interns to chat with them about their careers so far. They share insights about work, education, and tips for interviewing and succeeding at Shopify.

Natalie Dunbar (Backend Developer Intern, Marketing Technology)

Office: Toronto

Education: University of Toronto
Get to know Natalie:
  • Studied as a philosophy major for three years, then switched to Computer Science
  • Former camp counselor and sailing instructor
  • Best tip when stuck on a problem? Pair programming.

What does your day-to-day look like?
Once I get to work I immediately open GitHub and Slack. Our team does a daily stand-up through Slack to review our tasks from yesterday and today. My morning is usually responding to PR comments, Slack messages, and working on my assigned tasks. After lunch, I work on projects and usually try to merge my work from earlier in the afternoon to monitor it in production. Finally, before I leave I try to update my PRs so that our team in San Francisco can view them before the end of their day.

What do you feel was the hardest part of your interview?
I've done many technical interviews before, and the “Life Story” step in the Shopify interview process is unique from other companies. I was unsure what to expect. Looking back, I realize it’s not something to worry about because it's an incredibly comfortable conversation with your recruiter that gave them the knowledge to place me on a team that was the best possible fit.

Dream career if you weren’t working in tech?
Philosophy professor (specializing in either logic, philosophy of language, or continental philosophy).

Best piece of advice you’ve ever gotten?
Always be open with your mentor/lead. They want to make your internship experience great so always help them do this for you. This means both requesting and giving feedback frequently.

What are your tips for future Shopify applicants?
Be yourself! And if you are applying for a role that requires a personal project, show one that is targeted at what you’re interested in working on. I made a completely new project over the few days before my internship, which is in no way necessary, and my interviewer (and now lead) was able to determine my technical fit from that.

Gurpreet Gill (Developer Intern, Products)

Office: Ottawa

Education: University of Waterloo
Get to know Gurpreet:
  • No experience of technical stack used at Shopify when hired
  • Can move ears on command
  • Best tip when stuck on a problem? Take a break.

What does your day-to-day look like?
I’m usually in the office by 8:30, and I try not to miss breakfast. I typically avoid coding in the morning. Instead, I review and address feedback on my PRs; read emails; and catch up on work. My team and I head to lunch, then I start coding in the afternoon. I like taking a coffee break in the afternoon at Cody’s Cafe (yes, Ottawa has its own cafe) and make myself a latte with terrible latte art. I also like to play ping-pong as a break!

Dream career if you weren’t in tech?
A chef, or police officer.

Best piece of advice you’ve ever gotten?
Asking for help is okay - it doesn’t make you look weaker, and it’s never too late to reach out for it.

What are your tips for future Shopify applicants?
I believe Shopify is a unique company. Having “people skills” is as important as having technical skills here. So just be yourself during interviews. Don’t pretend to be someone you are not. Be passionate about what you do. Ask questions, don’t be afraid to crack jokes, and be ready to meet some dope people.

Joyce Lee (Solutions Engineering Intern, Shopify Plus)

Office: Waterloo

Education: University of Western Ontario
Get to know Joyce:
  • Started interning at Shopify in September 2017
  • Spent 8 months at Shopify in a sales-focused role, but will spend next 4 in a technical one
  • Once tried to sell composting worms online, but inventory sourcing and fulfilment ended up being really complicated.

What’s your day to day like?
Grab a bottle of cold-pressed juice, then go to a full day of meetings selling Shopify Plus to prospective merchants. On days with fewer meetings, I’m building proofs-of-concept for merchants, and working on small projects to level up the revenue organization.

The hardest part of your interview?
I had a slightly different technical interview than other engineers. I was given a business problem and asked to propose a technical solution for it. Then explain it twice to two different audiences: a CEO and a developer.

Any tips for future Shopify applicants?
Complete every part of the application. For interns, it’s typically quite long so start early, but the application actually helps you know Shopify better, which is a great experience. Shopify is worth the long application process, trust me.

How do you succeed within Shopify?
Ask dumb questions, and ask them quickly. The more you ask, the less dumb they’ll get.

Yash Mathur (Developer Intern, Shopify Exchange)

Office: Toronto

Education: University of Waterloo
Get to know Yash:
  • Has done two work terms with Shopify
  • Demoed an Android app when interviewing for a front-end developer role at Shopify (Spoiler alert: it all worked out!)
  • Favourite language is C++ but has learned to love Ruby for its simplicity (and because Shopify has great Ruby on Rails developers to learn from!)

What does your day-to-day look like?
I come in to the office around 10am. I prefer that because I like to spend my mornings running or swimming, and the rest of my team usually comes in around then too. I start off the day by grabbing breakfast and going through emails and messages. Our team does a daily stand-up where we review what we’ll be working on that day. Then, I like to grab lunch with my team or the other interns. Each day is a mix of coding and meetings to discuss projects or pair-programming. During my breaks, I love playing FIFA or ping pong with others.

Dream career if you weren’t in tech?
Astronaut.

Any tips for future Shopify applicants?
Shopify looks for people who are passionate and willing to expand their skill set. Make sure you get that point across in each phase of the interview.

How do you succeed within Shopify?
Take initiative. Shopify has a startup culture - people won’t tell you what to do, so you have to look for ways to contribute and be valuable. Also, talk to people outside your team. It’s important to understand how your team fits within the rest of the company.

Jenna Blumenthal (Developer, Channels Apps)

Office: Toronto
Education: McGill University
Get to know Jenna:
  • Former Shopify intern
  • Started as an intern in January 2017, and was hired full-time in May 2017
  • Studied Physics and Physiology in undergrad, later completing a master’s degree in Industrial Engineering

What’s your day-to-day look like?
Most of the day is spent working on net-new code that will contribute to whatever feature or project we are building. The rest is spent on reviewing other team members’ code, investigating bugs that come in through support and pairing with others (devs or not) on issues.

Any tips for future Shopify applicants?
Play up your non-traditional background. Whoever you meet with, explain why your experiences have shaped the person you are and the way you work. Shopify thrives on people with diverse skills and opinions.

How do you succeed within Shopify?
One of the core tenets you hear a lot at Shopify is, “strong opinions, weakly held.” Don’t think that because you’re just an intern, or new, that you don’t have a valuable opinion. Sometimes fresh eyes see the root of a problem the fastest. Be confident, but also be willing to drive consensus even if it doesn’t turn out your way.

Jack McCracken (Developer Intern, Application Security)

Office: Ottawa
Education: Carleton University
Get to know Jack:
  • Has a red-belt-black-stripe in Taekwondo. Almost a black belt!
  • Sells snacks and drinks to students at Carleton using Shopify’s POS system
  • Has worked at Shopify consistently since May 2015 and as of April 2018 is now full time...that’s six internships!

The hardest part of your interview?
The hardest part of my interview was admitting what I didn’t know. When I got to my second interview, I was so nervous that I completely blanked! After a while of trying to graciously “fake it till I made it,” I worked up the courage to tell the person I was interviewing with that I totally had no idea. That was hard, but I still believe it got me the job to this day.

Dream career if you weren’t in tech?
If I wasn’t working in tech, I like to think I’d be an author. I love writing stories and explaining complex things to people.

Best piece of advice you’ve ever gotten?
If you ask a question that saves you an hour of hard work and takes the person you’re asking 5 minutes to explain, you just saved your company 55 minutes in development time.

How do you succeed at Shopify?
Succeeding at Shopify is slightly different than your average tech company. It’s very self-driven, so you need to ask questions. It’s hard to succeed (actually pretty much impossible) in a large organization without any context, and it’s much easier to learn by talking to your team lead, a mentor, or some random person you found on Slack than to laboriously read through code or wiki pages.

Ariel Scott-Dicker (iOS Developer, Mobile Foundations)

Office: Ottawa 

Education: Flatiron School
Get to know Ariel:
  • Was doing a degree in Cultural Anthropology, with a minor in Music. He didn’t finish the university degree but did a software development bootcamp and developer internship, before coming to Shopify
  • Never wrote in Swift before coming to Shopify. Now, it’s his favourite programming language!

What’s your day-to-day like?
We release new versions and updates to our iOS app every three weeks. This makes our day-to-day consist of working our way through various tasks that we’ve designated for the current three week period. Sometimes for me, that’s one large task or several smaller ones.

The hardest part of your interview?
I didn’t progress past Life Story the first time. I think it was because I didn’t relate the course of my life thus far to how I could be successful at Shopify. Other than that, the hardest part (which I thought was really fun) was solving conceptual problems verbally, not through coding terms.

Any tips for future Shopify applicants?
During your interview, be yourself, stay calm and confident, and breathe. Make sure whatever you mention speaks for itself, and that it demonstrates how you can succeed at and contribute to Shopify.

Dream career if not in tech?
Working in a big cat sanctuary or experimental agriculture.

How do you succeed at Shopify?
For me, a huge tip for succeeding at Shopify is being selfish with your education and development. This means, asking questions, using the smart people around you as resources, and taking the time to understand something practically or theoretically.

A huge thanks to our Winter 2018 interns for all they have contributed this term. We’re so proud of the work you’ve done and can’t wait to see what’s next for all of you! Think you can see yourself as one of our interns? We’re currently hiring for the Fall 2018 term. Find the application at shopify.com/careers/interns and make sure you apply before the deadline on May 11, 2018 at 9:00 AM EST.



Solving the N+1 Problem for GraphQL through Batching


Authors: Leanne Shapton, Dylan Thacker-Smith, & Scott Walkinshaw

When Shopify merchants build their businesses on our platform, they trust that we’ll provide them with a seamless experience. A huge part of that is creating scalable back-end solutions that allow us to manage the millions of requests reaching our servers each day.

When a storefront app makes a request to our servers, it’s interacting with the Storefront API. Historically, REST has been the architectural style of choice when designing APIs, but Shopify uses GraphQL.

GraphQL is an increasingly popular query language in the developer community, because it avoids the classic over-fetching problem associated with REST. In REST, the endpoint determines the type and amount of data returned. GraphQL, however, permits highly specific client-side queries that return only the data requested.

Over-fetching occurs when the server returns more data than needed. REST is especially prone to it, due to its endpoint design. Conversely, if a particular endpoint does not yield enough information (under-fetching), clients need to make additional queries to reach nested data. Both over-fetching and under-fetching waste valuable computing power and bandwidth.

In this REST example, the client requests all ‘authors’, and receives a response, including fields for name, id, number of publications, and country. The client may not have originally wanted all that information; the server has over-fetched the data.

REST Query and Response

Conversely, in this GraphQL version, the client makes a query specifically for all authors’ names, and receives only that information in the response.

GraphQL Query

GraphQL Response

GraphQL queries are made to a single endpoint, as opposed to multiple endpoints in REST. Because of this, clients need to know how to structure their requests to reach the data, rather than simply targeting endpoints. GraphQL back-end developers share this information by creating schemas. Schemas are like maps; they describe all the data and their relationships within a server.

A schema for the above example might look as follows.

The schema defines the type ‘author’, for which two fields of information are available: name and id. The schema indicates that for each author, there’s a non-nullable string value for the ‘name’ field, and a unique, non-nullable identifier for the ‘id’ field. For more information, visit the schema section on the official GraphQL website.

How does GraphQL return data to those fields? It uses resolvers. A resolver is a field-specific function that hunts for the requested data on the server. The server processes the query and the resolvers return data for each field, until it has fetched all the data in the query. Data is returned as JSON, in the same shape and order as the query.
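To make the idea of per-field resolvers concrete, here is a toy Python sketch. It is not a GraphQL server: there is no query parsing, and a plain list of requested field names stands in for the query. The author records, the resolver table, and the function name are made up for illustration.

```python
import json

# Made-up sample records; the server stores more than the client asks for.
AUTHORS = [
    {"id": "1", "name": "Author One", "publications": 12, "country": "CA"},
    {"id": "2", "name": "Author Two", "publications": 3, "country": "UK"},
]

# One resolver function per field of the 'author' type.
AUTHOR_RESOLVERS = {
    "id": lambda author: author["id"],
    "name": lambda author: author["name"],
}


def resolve_authors(requested_fields):
    """Run only the resolvers for the requested fields, in the requested order."""
    return {
        "data": {
            "authors": [
                {field: AUTHOR_RESOLVERS[field](author) for field in requested_fields}
                for author in AUTHORS
            ]
        }
    }


print(json.dumps(resolve_authors(["name"])))
# {"data": {"authors": [{"name": "Author One"}, {"name": "Author Two"}]}}
```

The response carries only the name field even though each record holds more, which is how GraphQL avoids over-fetching.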

GraphQL’s major benefits are its straightforwardness and ease of use. It has solved our biggest problems by reducing the bandwidth used and the latency incurred while retrieving data for our apps.

As great as GraphQL is, it’s prone to an issue known as the n+1 problem. The n+1 problem arises because GraphQL executes a separate resolver function for every field, whereas REST has one resolver per endpoint. These additional resolvers mean that GraphQL runs the risk of making more round trips to the database than are necessary for requests.

The n+1 problem means that the server executes multiple unnecessary round trips to datastores for nested data. In the above case, the server makes 1 round trip to a datastore to fetch the authors, then makes N round trips to a datastore to fetch the address for N authors. For example, if there were fifty authors, then it would make fifty-one round trips for all the data. It should be able to fetch all the addresses together in a single round trip, so only two round trips to datastores in total, regardless of the number of authors. The computing expenditure of these extra round trips is massive when applied to large requests, like asking for fifty different colours of fifty t-shirts.
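The arithmetic is easy to check with a small Python sketch that counts round trips. The CountingStore class is a hypothetical in-memory stand-in for a datastore, and each method call represents one round trip.

```python
class CountingStore:
    """In-memory stand-in for a datastore that counts round trips."""

    def __init__(self, addresses):
        self.addresses = addresses  # author id -> address
        self.round_trips = 0

    def fetch_author_ids(self):
        self.round_trips += 1
        return list(self.addresses)

    def fetch_address(self, author_id):
        self.round_trips += 1
        return self.addresses[author_id]

    def fetch_addresses(self, author_ids):
        self.round_trips += 1
        return {i: self.addresses[i] for i in author_ids}


def naive(store):
    # One trip for the authors, then one trip per author: N + 1.
    return {i: store.fetch_address(i) for i in store.fetch_author_ids()}


def batched(store):
    # One trip for the authors, one trip for all the addresses: 2.
    return store.fetch_addresses(store.fetch_author_ids())


data = {i: f"address {i}" for i in range(1, 51)}
slow, fast = CountingStore(data), CountingStore(data)
assert naive(slow) == batched(fast)
print(slow.round_trips, fast.round_trips)  # 51 2
```

Both strategies return the same data. Only the number of round trips differs, and the gap widens as the number of authors grows.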

The n+1 problem is further exacerbated in GraphQL, because neither clients nor servers can predict how expensive a request is until after it’s executed. In REST, costs are predictable because there’s one trip per endpoint requested. In GraphQL, there’s only one endpoint, and it’s not indicative of the potential size of incoming requests. At Shopify, where thousands of merchants interact with the Storefront API each day, we needed a solution that allowed us to minimize the cost of each request.

Facebook previously introduced a solution to the n+1 problem by creating DataLoader, a JavaScript library that batches requests. Dylan Thacker-Smith, a developer at Shopify, used DataLoader as inspiration and built the GraphQL Batch Ruby library specifically for the GraphQL Ruby library. This library reduces the overall number of datastore queries required when fulfilling requests with the GraphQL Ruby library. Instead of the server expecting each field resolver to return a value, the library allows the resolver to request data and return a promise for that data. For GraphQL, a promise represents the eventual, rather than immediate, resolution of a field. Therefore, instead of resolver functions executing immediately, the server waits before returning the data.

GraphQL Batch allows applications to define batch loaders that specify how to group and load similar data. The field resolvers can use one of the loaders to load data, which is grouped with similar loads, and returns a promise for the result. The GraphQL request executes by first trying to resolve all the fields, which may be resolved with promises. GraphQL Batch iterates through the grouped loads, uses their corresponding batch loader to load all the promises together, and replaces the promises with the loaded result. When an object field loads, fields nested on those objects resolve using their field resolvers (which may themselves use batch loaders), and then they’re grouped with similar loads that haven't executed. The benefits for Shopify are huge, as it massively reduces the amount of computing power required to process the same requests.
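
The grouping idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the GraphQL Batch API: `Promise`, `BatchLoader`, `load`, and `flush` are hypothetical names, and the address data is made up.

```python
class Promise:
    """Placeholder for a value that is filled in later."""

    def __init__(self):
        self.resolved = False
        self.value = None

    def fulfill(self, value):
        self.value = value
        self.resolved = True


class BatchLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # takes a list of keys, returns {key: value}
        self.pending = {}          # key -> Promise, for loads not yet executed
        self.batch_calls = 0

    def load(self, key):
        # Group identical keys onto one promise instead of fetching immediately.
        if key not in self.pending:
            self.pending[key] = Promise()
        return self.pending[key]

    def flush(self):
        # Load every pending key in a single call, then fulfill the promises.
        if not self.pending:
            return
        pending, self.pending = self.pending, {}
        self.batch_calls += 1
        results = self.batch_fn(list(pending))
        for key, promise in pending.items():
            promise.fulfill(results.get(key))


ADDRESSES = {1: "1 Elm St", 2: "2 Oak Ave", 3: "3 Pine Rd"}  # made-up data
loader = BatchLoader(lambda keys: {k: ADDRESSES[k] for k in keys})

# Each "field resolver" asks the loader for data and gets a promise back.
promises = [loader.load(author_id) for author_id in (1, 2, 3, 1)]
loader.flush()
print([p.value for p in promises], loader.batch_calls)
```

Four loads, including one repeated key, are satisfied by a single call to the batch function.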

GraphQL Batch is now considered a general best practice for all GraphQL work at Shopify. We believe great tools should be shared with peers. The GraphQL Batch library is simple, but solves a major complaint within the GraphQL Ruby community. We believe the tool is flexible and has the potential to solve problems beyond just Shopify’s scope. As such, we chose to make GraphQL Batch open-source.

Many Shopify developers are already active individual GraphQL contributors, but Shopify is still constantly exploring ways to interact more meaningfully with the vibrant GraphQL developer community. Sharing the source code for GraphQL Batch is just a first step. As GraphQL adoption increases, we look forward to sharing our learnings and collaborating externally to build tools that improve the GraphQL developing experience.

Shopify’s Infrastructure Collaboration with Google

We’re always working to deliver the best commerce experience to our merchants and their customers. We provide a seamless merchant experience while shaping the future of retail by building a platform that can handle the traffic of a Kylie Cosmetics flash sale (they sell out in 20 seconds), ship new features into production hundreds of times a day, and process more than double the number of orders year over year.

For Production Engineering to meet these needs, we regularly review our technology stack to ensure we are using the best tools for the job, and our journey to the Cloud is a perfect example. That’s why we are excited to share that Shopify is now building our Cloud with Google. Before sharing the details of this announcement, we want to provide some context on our journey.

Shopify has been a cloud company since day one. We provide a commerce cloud to our merchants, solving their worries about hiring full-time IT staff to manage the infrastructure side of the business. Cloud is part of our DNA and our public cloud connection goes back to 2006, the same year both Shopify and Amazon Web Services (AWS) launched. Early on, we leveraged the public cloud as a small piece of our commerce cloud. It was great for hosting some of our smaller services, but we found the public cloud wasn’t a great fit for our main Rails monolith.

We’re pragmatic about how to evolve and invest in our infrastructure. In our startup days, with a small team, we valued simplicity and chose to focus on shipping the foundations of a commerce platform by deferring more complex infrastructure like database sharding. As we grew in scale and engineering expertise, we took on solving more complex patterns. With each major infrastructure scalability feature we shipped, like database sharding, application sharding, and production load testing, we continued to revisit how to horizontally scale our Rails application across thousands of servers. Over the years, we moved more and more of our supporting services to the Cloud, gaining additional context which fed into our developing monolith Cloud strategy.

Our latest push to the Cloud started over two years ago. Google launched Google Kubernetes Engine (GKE) (formerly Google Container Engine) as we had just finished production-hardening Docker. In 2014, Shopify invested in Docker to capitalize on the benefits of immutable infrastructure: predictable, repeatable builds and deployments; simpler and more robust rollbacks; and elimination of configuration management drift. Once you’re running containers, the next natural step is to take inspiration from Google’s Borg and start building out a dynamic container management and orchestration system. Being early adopters of Docker meant there weren’t many open-source options available, so we decided to build minimal container management features ourselves. The Docker community and codebase were in their infancy and changing rapidly. Building these features allowed us to focus on application scalability and resilience while avoiding additional complexity as the Docker community matured.

In 2016, internal discussions began around what Shopify would look like in the future. The infrastructure changes from 2012 to 2016 allowed us to lay the foundation for using the Cloud in a pragmatic way via database sharding, application sharding, perf testing and automated failovers, but we were still missing an orchestration solution. Luckily, several exciting developments were happening, and the most promising one for Shopify was Kubernetes, an open-source container management system created by the teams at Google that built Borg and GKE.

After 12 years of building and running the foundation of our own commerce cloud with our own data centers, we are excited to build our Cloud with Google. We are working with a company who shares our values in open-source, security, performance and scale. We are better positioned to change the face of global commerce while providing more opportunities to the 600,000+ merchants on our platform today.

Since we began our Google Cloud migration, we have:

  • Built our Shop Mover, a selective database data migration tool, that lets us rebalance shops between database shards with an average of 2.5s of downtime per shop
  • Migrated over 50% of our data center workloads, and counting, to Google Cloud
  • Contributed to, and leveraged, Grafeas, Google’s open source initiative to define a uniform way for auditing and governing the modern software supply chain
  • Grown to over 400 production services and built a platform as a service (PaaS) to consolidate all production services on Kubernetes
  • Joined the Cloud Native Computing Foundation (CNCF) and participated in the Kubernetes Apps Special Interest Group and Application Definition Working Group

By leveraging Google’s deep understanding of global infrastructure at scale, we’re able to ensure that every engineer we hire focuses on building and shaping the future of commerce on a global scale.

Stay tuned. We’re excited to share more stories about Shopify’s journey to Google Cloud with you.

Dale Neufeld, VP of Production Engineering

A Pods Architecture To Allow Shopify To Scale

In 2015, it was no longer possible to continue buying a larger database server for Shopify. We finally had no choice but to shard the database, which allowed us to horizontally scale our databases and continue our growth. However, what we gained in performance and scalability we lost in resilience. Throughout the Shopify codebase was code like this:

Sharding.with_each_shard do
  some_action
end

If any of our shards went down, that entire action would be unavailable across the platform. We realized this would become a major problem as the number of shards continued to increase. In 2016 we sat down to reorganize Shopify’s runtime architecture.
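
The failure mode is easy to reproduce in miniature. The following Python sketch is an analogy to the Ruby loop above, not Shopify’s code: the shard names, the `ShardUnavailable` exception, and the helper functions are all hypothetical.

```python
SHARDS = ["shard-1", "shard-2", "shard-3"]
DOWN = {"shard-2"}  # pretend one shard is unavailable


class ShardUnavailable(Exception):
    pass


def with_each_shard(action):
    # All-or-nothing: the first failing shard aborts the whole loop.
    return [action(shard) for shard in SHARDS]


def some_action(shard):
    if shard in DOWN:
        raise ShardUnavailable(shard)
    return f"done on {shard}"


try:
    results = with_each_shard(some_action)
    failed_shard = None
except ShardUnavailable as exc:
    # The work already done on healthy shards is discarded with the error.
    results = None
    failed_shard = str(exc)

print(results, failed_shard)
```

One unavailable shard out of three is enough to make the action fail for every shop, including those on healthy shards.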

Accelerating Android Talent Through Community Bootcamps

6 minute read

The mobile team knew they needed developers, particularly Android developers. A few years ago, Shopify pivoted to mobile-first, which led to the launches of Shopify Mobile, Shopify Pay, Frenzy, and others. To maintain momentum, Shopify had to keep building up its mobile talent.

Back when Shopify's mobile teams spun up, many of our early mobile developers had never done any mobile development before and instead taught themselves on the job. From this observation, we had an insight: what if we could teach developers how to build an Android app, via a Shopify-hosted workshop?

The benefits were obvious: this educational initiative could help our local developer community pick up some new skills, while potentially allowing us to meet exciting new talent. The idea for Android Bootcamp was born.

Future Proofing Our Cloud Storage Usage

How we reduced error rates, and dropped latencies across merchants’ flows

Reading Time: 6 Minutes

Shopify merchants trust that when they build their stores on our platform, we’ve got their back. They can focus on their business, while we handle everything else. Any failures or degradations that happen put our promise of a sturdy, battle-tested platform at risk.

To keep that promise, we need to ensure that the platform stays up and stays reliable. Since 2016, Shopify has grown from 375,000 merchants to over 600,000. As of today, an average of 450,000 S3 operations per second are being made through our platform. However, that rapid growth also came with an increased S3 error rate and increased read and write latencies.

While we use S3 at Shopify, if your application uses any flavor of cloud storage, and its use of cloud storage strongly correlates with the growth of your user base—whether it’s storing user or event data—I’m hoping this post provides some insight into how to optimize your cloud storage!

2017 Bug Bounty Year in Review

7 minute read

At Shopify, our bounty program complements our security strategy and allows us to leverage a community of thousands of researchers who help secure our platform and create a better Shopify user experience. We first launched the program in 2013 and moved to the HackerOne platform in 2015 to increase hacker awareness. Since then, we've continued to see increasing value in the reports submitted, and 2017 was no exception.

Implementing ChatOps into our Incident Management Procedure

8 minute read

Production engineers (PE) are expected to be incident management experts. Still, incident handling is difficult, often messy, and exhausting. We encounter new incidents, search high and low for possible explanations, sometimes tunnel on symptoms, and, under pressure, forget some best practices.

At Shopify, we care not only about handling incidents quickly and efficiently, but also about PE well-being. We have a special IMOC (incident manager on call) rotation and an incident chatbot to assist IMOCs. This post provides an overview of incident management at Shopify, the responsibility of different roles during an incident, and how our chatbot works to support our team.

How Shopify Merchants can Measure Retention

At Shopify, our business depends upon understanding the businesses of the more than 500,000 merchants who rely on our platform. Customers are at the heart of any business, and deciphering their behavior helps entrepreneurs to effectively allocate their time and money. To help our merchants, we set out to tackle the nontrivial problem of determining customer retention.

When a customer stops buying from a business, we call that churn. In a contractual business (like software as a service), it’s easy to see when a customer leaves because they dissolve their contract. By comparison, in a non-contractual business (like a clothing store), it’s more difficult as the customer simply stops purchasing without any direct notification. The business has no direct signal that the customer has left, so churn can’t be described deterministically. Entrepreneurs running non-contractual businesses can better define churn using probability.

Correctly describing customer churn is important: picking the wrong churn model means your analysis will be either full of arbitrary assumptions or misguided. Far too often, businesses define churn as no purchases after N days; typically N is a multiple of 7 or 30 days. This time limit arbitrarily buckets customers into two states: active or inactive. Two customers in the active state may look incredibly different and have different propensities to buy, so it’s unnatural to treat them the same. For example, a customer who buys groceries in bulk should be treated differently than a customer who buys groceries every day. This binary model has clear limitations.

Our Data team recognized the limitation of defining churn incorrectly, and that we had to do better. Using probability, we have a new way to think about customer churn. Imagine a few hypothetical customers visit a store, visualized in the below figure. Customer A is reliable. They are a long-time customer and buy from your store every week. It’s been three days since you last saw them in your store but chances are they’ll be back. Customer B’s history is short-lived. When they first found your store, they made purchases almost daily, but now you haven’t seen them in months, so there’s a low chance of them still being considered active. Customer C has a slower history. They buy something from your store a couple times a year, and you last saw them 10 months ago. What can you say about Customer C’s probability of being active? It’s likely somewhere in the middle.

We can formalize this intuition of probabilistic customers in a model. We’ll consider a simple model for now. Suppose each customer has two intrinsic parameters: a rate of purchasing, \(\lambda\), and a probability of a churn event, \(p\). From the business point of view, even if a customer churns, we don’t see the churn event; we can only infer churn from their purchase history. Given a customer’s rate of purchase, the times between their purchases are exponentially distributed with rate \(\lambda\), which means purchasing looks like a Poisson process. After each purchase, the customer has a \(p\) chance of churning. Rather than trying to estimate every customer’s parameters individually, we can think of an individual customer’s parameters as coming from a probability distribution. Thus we can estimate the distributions that generate the parameters, and hence the customers’ behavior. Altogether this is known as a hierarchical model, where there are unobservables (the customer behaviors) being created from probability distributions.

The probability distributions for \(\lambda\) and \(p\) are different for each business. The first step in applying this model is to estimate your specific business’s distributions for these quantities. Let’s assume that a customer’s \(\lambda\) comes from a Gamma distribution (with currently unknown parameters), and \(p\) comes from a Beta distribution (also with currently unknown parameters). This is the model the authors of “Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model” propose. They call it the BG/NBD (Beta Geometric / Negative Binomial Distribution) model.

BG/NBD (Beta Geometric / Negative Binomial Distribution) model

Further detail on implementing the BG/NBD model is given below, but what’s interesting is that after writing down the likelihood of the model, the sufficient statistics turn out to be:

  • Age: the duration between the customer’s first purchase and now
  • Recency: what was the Age of the customer at their last purchase?
  • Frequency: how many repeat purchases have they made?

Because the above statistics (age, frequency, recency) contain all the relevant information needed, we only need to know these three quantities per customer as input to the model. These three statistics are easily computed from the raw purchase data. Using these new statistics, we can redescribe our customers above:

  • Customer A has a large Age, Frequency, and Recency.
  • Customer B has a large Age and Frequency, but much smaller Recency.
  • Customer C has a large Age, low Frequency, and moderate Recency.
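
As a rough illustration of how little is needed, the three statistics can be computed from a list of purchase dates with the Python standard library. The `summarize` function is a hypothetical helper, and counting one purchase per distinct day and measuring in days are assumptions of this sketch rather than something the post specifies.

```python
from datetime import date


def summarize(purchase_dates, today):
    """Return (age, recency, frequency) for one customer, in days."""
    first, last = min(purchase_dates), max(purchase_dates)
    age = (today - first).days       # time since the first purchase
    recency = (last - first).days    # the customer's age at their last purchase
    # Repeat purchases: distinct purchase days after the first one.
    frequency = len(set(purchase_dates)) - 1
    return age, recency, frequency


purchases = [date(2017, 1, 1), date(2017, 1, 15), date(2017, 3, 1)]
print(summarize(purchases, today=date(2017, 4, 1)))  # (90, 59, 2)
```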

Being able to statistically determine the behaviors of Customers A, B and C means an entrepreneur can better run targeted ad campaigns, introduce discount codes, and project customer lifetime value.

The individual-customer data can be plugged into a likelihood function and fed to a standard optimization routine to find the Gamma distribution and Beta distribution parameters \((r, \alpha)\), and \((a, b)\), respectively. You can use the likelihood function derived in the BG/NBD paper for this:
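
For reference, the individual-customer likelihood from that paper is reproduced here from memory, so check it against the original before relying on it. The symbols \(x\) for Frequency, \(t_x\) for Recency, and \(T\) for Age are the paper’s notation rather than this post’s; \(B\) is the Beta function, and \(\delta_{x>0}\) is 1 when the customer has made at least one repeat purchase and 0 otherwise.

\[
L(r, \alpha, a, b \mid x, t_x, T) = \frac{B(a, b + x)}{B(a, b)} \, \frac{\Gamma(r + x)\, \alpha^{r}}{\Gamma(r)\, (\alpha + T)^{r + x}} + \delta_{x > 0} \, \frac{B(a + 1, b + x - 1)}{B(a, b)} \, \frac{\Gamma(r + x)\, \alpha^{r}}{\Gamma(r)\, (\alpha + t_x)^{r + x}}
\]

The first term covers a customer who is still active at time \(T\); the second covers a customer who churned immediately after their last purchase at \(t_x\).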


We use optimization routines in Python, but the paper describes how to do this in a spreadsheet if you prefer.
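
A minimal sketch of the objective such a routine would minimize, using only the standard library. It implements the standard BG/NBD likelihood as I understand it from the paper, not Shopify’s modified model; the function names and the example parameter values are made up. In practice `negative_log_likelihood` would be handed to an optimizer (for example `scipy.optimize.minimize`, an assumption not shown here) with all four parameters constrained to be positive.

```python
import math


def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def bgnbd_log_likelihood(r, alpha, a, b, frequency, recency, age):
    """Log-likelihood of one customer under the standard BG/NBD model."""
    x, t_x, T = frequency, recency, age
    # Shared factor: Gamma(r + x) * alpha^r / Gamma(r), in logs.
    common = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
    # Term for a customer who is still active at time T.
    alive = (log_beta(a, b + x) - log_beta(a, b)
             + common - (r + x) * math.log(alpha + T))
    if x == 0:
        return alive
    # Term for a customer who churned right after their last purchase.
    churned = (log_beta(a + 1, b + x - 1) - log_beta(a, b)
               + common - (r + x) * math.log(alpha + t_x))
    # log(exp(alive) + exp(churned)), computed stably.
    hi, lo = max(alive, churned), min(alive, churned)
    return hi + math.log1p(math.exp(lo - hi))


def negative_log_likelihood(params, customers):
    """customers: iterable of (frequency, recency, age) tuples."""
    r, alpha, a, b = params
    return -sum(bgnbd_log_likelihood(r, alpha, a, b, *c) for c in customers)


customers = [(2, 59, 90), (0, 0, 30)]  # made-up (frequency, recency, age)
print(negative_log_likelihood((0.25, 4.0, 0.8, 2.5), customers))
```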

Once these distribution parameters \((r, \alpha, a, b)\) are known, we can look at metrics like the probability of a customer being active given their purchase history. Organizing this as a distribution is useful as a proxy for the health of a customer base. Another view is to look at the heatmap of the customer base. As a customer’s recency increases, we expect the probability of being active to increase. And as frequency increases, we expect the probability to increase too, given a high recency. Below we plot the probability of being active given varying frequency and recency:

How Shopify Merchants Can Measure Retention - Probability of Being Active, by Frequency and Recency

The figure reassures us that the model behaves as we expect. Similarly, we can look at the expected number of future purchases in a single unit of time: 

Expected Number of Future Purchases for 1 Unit of Time

At Shopify, we’re using a modified BG/NBD model implemented in lifetimes, an open-source package maintained by the author and the Shopify Data team. The resulting analysis is sent to our reporting infrastructure to display in customer reports. We have over 500K merchants that we can train the BG/NBD model on, all in under an hour. We do this by using Apache Spark’s DataFrames to pick up the raw data, group rows by the shop, and apply a Python user-defined function (UDF) to each partition.  The UDF contains the lifetimes estimation algorithm. For performance reasons, we subsample to 50k customers per shop because the estimation beyond this yielded diminishing returns. After fitting the data to the BG/NBD model’s parameters, we apply the model to each customer in that shop, and yield the results again. In all, we infer churn probabilities and expected values for the over 500 million historical merchant customers.
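
The shape of that per-shop pipeline can be sketched without Spark. This is a plain-Python analogy, not the production job: `group_by_shop`, `fit_and_score`, and the toy `fit` and `score` functions are hypothetical stand-ins for the Spark grouping, the UDF, and the lifetimes estimation.

```python
import random
from collections import defaultdict

MAX_CUSTOMERS_PER_SHOP = 50_000  # subsampling cap mentioned in the post


def group_by_shop(rows):
    """rows: iterable of (shop_id, customer_summary) pairs."""
    shops = defaultdict(list)
    for shop_id, summary in rows:
        shops[shop_id].append(summary)
    return shops


def fit_and_score(customers, fit, score, cap=MAX_CUSTOMERS_PER_SHOP, seed=0):
    # Fit on a subsample of at most `cap` customers...
    rng = random.Random(seed)
    sample = customers if len(customers) <= cap else rng.sample(customers, cap)
    params = fit(sample)
    # ...then apply the fitted model to every customer in the shop.
    return [score(params, c) for c in customers]


# Toy stand-ins: "fit" learns the mean, "score" compares a customer to it.
def fit(sample):
    return sum(sample) / len(sample)


def score(params, customer):
    return customer / params


rows = [("shop-a", 1), ("shop-a", 3), ("shop-b", 10)]
results = {shop: fit_and_score(customers, fit, score)
           for shop, customers in group_by_shop(rows).items()}
print(results)  # {'shop-a': [0.5, 1.5], 'shop-b': [1.0]}
```

Each shop is fitted independently, which is what lets the work be spread across partitions.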

One reason for choosing the BG/NBD model is its easy interpretability. Because we are displaying the end results to shop owners, we didn’t want the model to be a black box that left them struggling to explain why a customer was at risk or loyal. Recall that the variables the BG/NBD model requires are age, frequency and recency. Each of these variables is easily understood by even non-technical individuals. The BG/NBD model codifies the interactions between these three variables and provides quantitative measures based on them. On the other hand, the BG/NBD model does suffer from oversimplicity. It doesn’t handle seasonal trends well. For example, the frequency term collapses all purchases into a single value, ignoring any seasonality in the purchase behaviour. Another limitation is that you cannot easily add additional customer variables to the model (e.g., country or products purchased).

Once we’ve fitted a model for a store, we rank customers from highest to lowest probability of being active. The highest-ranked customers are the reliable customers. The lowest-ranked customers are unlikely to come back. The customers around 50% probability are at risk of churning, so targeted campaigns could be made to entice them back, possibly reviving the relationship and potentially gaining a life-long customer. By providing these statistics, we put our merchants in a position to drive smarter marketing campaigns, order fulfillment prioritization, and customer support.
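
A small sketch of that ranking step, assuming the standard BG/NBD expression for the probability of being active rather than Shopify’s modified model. The parameter values and the three customers (measured in weeks, loosely modelled on Customers A, B, and C) are made up for illustration.

```python
def probability_active(r, alpha, a, b, frequency, recency, age):
    """P(active) under the standard BG/NBD model."""
    x, t_x, T = frequency, recency, age
    if x == 0:
        # In the standard model a customer can only churn after a repeat
        # purchase, so a customer with no repeat purchases counts as active.
        return 1.0
    odds_churned = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_churned)


PARAMS = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)  # made-up fitted values

CUSTOMERS = {  # name -> (frequency, recency, age), in weeks
    "A": (100, 103.5, 104),  # weekly buyer, seen three days ago
    "B": (30, 6, 52),        # intense early burst, then silence
    "C": (4, 60, 104),       # occasional buyer, not seen for ten months
}

scores = {name: probability_active(**PARAMS, frequency=f, recency=t, age=T)
          for name, (f, t, T) in CUSTOMERS.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['A', 'C', 'B']
```

With these numbers Customer A scores close to 1, Customer B close to 0, and Customer C lands in between, matching the intuition described earlier.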

How We Enable Our Interns to Make an Impact

Making an Impact

When interns join Shopify for their internship term, they work on projects that will impact our merchants, partners, and even their fellow developers. Some of these projects will alleviate a merchant's pain points, like the ability to sell their products on different channels, or simplify a complicated process for our developers. We want interns to leave knowing they worked on real projects with real impact.

Tell Your Stories: The Benefits of Strategic Engineering Communications

In early 2016, we faced a problem at Shopify. We were growing quickly, and decisions could no longer be made across the room, so to speak. Four offices became five, and accommodating that growth raised interesting questions: how would new people know the history of the company, and how could existing Shopifolk keep up with new developments? In addition to sharing knowledge inside the company, we also wanted to let people outside Shopify know what we were working on to give back to the community and to support recruitment efforts.

Engineering communications was born to solve a specific problem. A valued saying here is “do things, tell people,” but, while we’re very good at the first part, we weren’t living up to expectations on the second. Ad hoc worked when we were smaller, but with technical stories now coming from teams as varied as production engineering, mobile, front-end development, and data engineering, we needed something more formalized. Strong communications inside the engineering team could help prevent the overlap of work by different teams or the duplication of mistakes, and it could support cross-pollination of ideas.

How Shopify Governs Containers at Scale with Grafeas and Kritis

Today, Google and its contributors launched Grafeas, an open source initiative to define a uniform way for auditing and governing the modern software supply chain. At Shopify, we’re excited to be part of this announcement.

Grafeas, or “scribe” in Greek, enables us to store critical software component metadata during our build and integration pipelines. With over 6,000 container builds per day and 330,000 images in our primary container registry, the security team was eager to implement an appropriate auditing strategy to be able to answer questions such as:

  • Is this container deployed to production?
  • When was this container pulled (downloaded) from our registry?
  • What packages are installed in this container?
  • Does this container contain any security vulnerabilities?
  • Does this container meet our security controls?

Using Grafeas as the central source of truth for container metadata has allowed the security team to answer these questions and flesh out appropriate auditing and lifecycling strategies for the software we deliver to users at Shopify.

Here’s a sample of some of the container introspection we gain from Grafeas. In this example we have details surrounding the origin of this container including its build details, base image and the operations that resulted in the container layers.

Build Details:

Image Basis:

As part of Grafeas, Google also introduced Kritis, or “judge” in Greek, which allows us to use the metadata stored in Grafeas to build and enforce real-time deployment policies with Kubernetes. During CI, a number of audits are performed against the containers and attestations are generated. These attestations make up the policies we can enforce with Kritis on Kubernetes.

At Shopify we use PGP to digitally sign our attestations, ensuring the identity of our builder and other attestation authorities.

Here’s an example of a signed attestation:

The two key concepts of Kritis are attestation authorities and policies. An attestation authority is a named entity that has the capability to create attestations. A policy then names one or more attestation authorities whose attestations are required to deploy a container to a particular cluster. Here’s an example of what that might look like:

Given the above attestation authorities (built-by-us and tested) we can deploy a policy similar to this example:

This policy would preclude the deployment of any container that does not have signed attestations from both authorities.
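
The rule itself is simple enough to sketch in Python. This is not the Kritis API or its policy format: the data layout and function names are hypothetical, and signature verification is reduced to a boolean flag where a real check would verify the PGP signature.

```python
REQUIRED_AUTHORITIES = ["built-by-us", "tested"]


def missing_authorities(required, attestations):
    """Return the required authorities that have no validly signed attestation."""
    signed_by = {att["authority"] for att in attestations if att["signature_valid"]}
    return [name for name in required if name not in signed_by]


def may_deploy(required, attestations):
    # Deployment is allowed only when every required authority has attested.
    return not missing_authorities(required, attestations)


fully_attested = [
    {"authority": "built-by-us", "signature_valid": True},
    {"authority": "tested", "signature_valid": True},
]
untested = [{"authority": "built-by-us", "signature_valid": True}]
bad_signature = [
    {"authority": "built-by-us", "signature_valid": True},
    {"authority": "tested", "signature_valid": False},
]

print(may_deploy(REQUIRED_AUTHORITIES, fully_attested))     # True
print(missing_authorities(REQUIRED_AUTHORITIES, untested))  # ['tested']
print(may_deploy(REQUIRED_AUTHORITIES, bad_signature))      # False
```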

Given this model, we can create a number of attestation authorities which satisfy particular security controls.

Attestation Examples:

  • This container has been built by us
  • This container comes from our (or a trusted) container repository
  • This container does not run as root
  • This container passes CI tests
  • This container does not introduce any new vulnerabilities (scanned)
  • This container is deployed with the appropriate security context

Given the attestation examples above, we can enable Kritis enforcement on our Kubernetes clusters that ensures we only run containers which are free from known vulnerabilities, have passed our CI tests, do not run as root, and have been built by us!

In addition to build time container security controls we can also generate Kritis attestations for the Kubernetes workload manifests with the results of kubeaudit during CI. This means we can ensure there are no regressions in the runtime security controls before the container is even deployed.

Using tools like Grafeas and Kritis has allowed us to inject security controls into the DNA of Shopify’s cloud platform to provide software governance techniques at scale alongside our developers, unlocking the velocity of all the teams.

We’re really excited about these new tools and hope you are too! Here are some of the ways you can learn more about the projects and get involved:

Try Grafeas now and join the GitHub project: https://github.com/Grafeas

Attend Shopify’s talks at Google Cloud Summit in Toronto on 10/17 and KubeCon in December.

See grafeas.io for documentation and examples.

Building Shopify Mobile with Native and Web Technology

For mobile apps to have an excellent user experience, they should be fast, use the network sparingly, and use visual and behavioural conventions native to the platform. To achieve this, the Shopify Mobile apps are native iOS and Android, and they're powered by GraphQL. This ensures our apps are consistent with the platforms they run on, are performant, and use the network efficiently.

This essentially means developing Shopify on each platform: iOS, Android, and web. As Shopify has far more web developers than mobile developers, it’s almost impossible to keep pace with the feature releases on the web admin. Since Shopify has invested in making the web admin responsive, we often leverage parts of the web to maintain feature parity between mobile and desktop platforms.

Core parts of the app that are used most are native to give the best experience on a small screen. A feature that is data-entry intensive or has high information density is also a good candidate for a native implementation that can be optimized for a smaller screen and for reduced user input. For secondary activities in the app, web views are used. Several of the settings pages, as well as reports, which are found in the Store tab, are web views.  This allows us to focus on creating a mobile-optimized version of the most used parts of our product, while still allowing our users to have access to all of Shopify on the go.

With this mixed-architecture approach, a user can go from a native view to a web view and, through deep links, can also be presented a native view from a web view. For example, tapping a product link in a web view will present the native product detail view.

At Unite, our developer conference, Shopify announced Polaris, a design language that we use internally for our web and mobile applications. A common design language ensures our products are familiar to our users, as well as helping to facilitate a mixed architecture where web pages can be used in conjunction with native views.

Third Party Apps

In addition to the features that are built internally, Shopify has an app platform, which allows third party developers to create (web) applications that extend the functionality of Shopify. In fact, we have an entire App Store dedicated to showcasing these apps. These apps authenticate to Shopify using OAuth, and consume our REST APIs. We also offer a JavaScript SDK called the Embedded App SDK (EASDK) that allows apps to be launched within an iframe of the Shopify Admin (instead of opening the app in another tab), and to use Shopify’s navigation bars, buttons, pop ups, and status messages. Apps that use the EASDK are called "embedded apps," and most of the applications developed for Shopify today are embedded.

Our users rely on these third party apps to run their business, and they are doing so increasingly from their mobile devices. When our team was tasked with bringing these apps to mobile, we quickly found these apps use too much vertical real-estate for their navigation when loaded in a web view. Also, it would introduce inconsistencies between the native app navigation bars and their web counterparts. It was clear that this would be a sub-par experience. Additionally since these apps are maintained by third-party developers, it would not be possible to update them to be responsive.

Our goal was to have apps optimize their screen usage, and have them look and behave like the rest of the mobile app. We wanted to achieve this without requiring existing apps to make any code change.  This approach means our users would have all of the apps they currently use, in addition to access to the thousands of apps available on the Shopify App Store on the day we released the feature.

Content size highlighted in an app rendered in a web view (left) vs. in Shopify Mobile (right).  


The screenshots above illustrate what an app would look like rendered in a web view as-is, vs. how it looks now, optimized for mobile. Many of the navigation bar elements have been collapsed into the native nav bar, which allows the app to reclaim the vertical space for content instead of displaying a redundant navigation bar. Also, the web back button has been combined into the native navigation back stack, so tapping back through the web app is the same as navigating back through native views. These changes allowed the apps to reclaim more than 40% more vertical real estate.

I'll now go through how we incorporated each element.

Building the JavaScript bridge

The EASDK is what apps use to configure their UI within Shopify. We wanted to position the Shopify Mobile app to be on the receiving end of this API, much like the Shopify web admin is today. This would allow existing apps to use the EASDK with no changes. The EASDK contains several methods to configure the navigation bar, which can consist of buttons, a title, breadcrumbs, and pagination. We looked at reducing the number of items that the navigation bar needed to render, and started pruning. We found that the breadcrumbs and pagination buttons were not necessary, and not a common pattern for mobile apps. They were the first to be cut. The next step was to collapse the web navigation bar into the native bar. To do this, we had to intercept the JavaScript calls to the EASDK methods.

To allow interception of calls to the EASDK, we created a middleware system on Shopify web that could be injected by the mobile apps. This allows Shopify Mobile to augment the messages before they hit their final destination or suppress them entirely. This approach is very flexible, and generic; clients can natively implement features piecemeal without the need for versioning between client and server.
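As a rough illustration of the idea, here is a minimal TypeScript sketch of a middleware chain in which each step can augment a message or suppress it. The `BridgeMessage` shape, the `MessagePipeline` class, and the method names `setPagination` and `setTitle` are assumptions made for this example; the post does not show the real middleware API or message format.

```typescript
// Hypothetical message shape; the post does not show the real EASDK wire format.
interface BridgeMessage {
  method: string;
  payload: Record<string, unknown>;
}

// A middleware returns the (possibly augmented) message, or null to suppress it.
type Middleware = (message: BridgeMessage) => BridgeMessage | null;

class MessagePipeline {
  private readonly middlewares: Middleware[] = [];
  private readonly deliver: (message: BridgeMessage) => void;

  constructor(deliver: (message: BridgeMessage) => void) {
    this.deliver = deliver;
  }

  use(middleware: Middleware): void {
    this.middlewares.push(middleware);
  }

  // Runs every middleware in order. Returns false if the message was suppressed.
  dispatch(message: BridgeMessage): boolean {
    let current = message;
    for (const middleware of this.middlewares) {
      const next = middleware(current);
      if (next === null) {
        return false;
      }
      current = next;
    }
    this.deliver(current);
    return true;
  }
}

// Example middlewares a mobile client might inject (method names are illustrative).
const dropPagination: Middleware = (message) =>
  message.method === "setPagination" ? null : message;

const tagAsMobile: Middleware = (message) => ({
  method: message.method,
  payload: { ...message.payload, client: "mobile" },
});
```

Because each middleware is independent, a client could add or remove one without the other side needing to know, which is in the spirit of the piecemeal approach described above.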

This middleware is implemented in JavaScript and bundled with the mobile apps. A single shared JavaScript file contains the common methods for both platforms, and separate platform-specific files contain the iOS- and Android-specific native bridging.

High-level overview of the data flow from an embedded app to native code in Shopify Mobile

The shared JavaScript file injects itself into the main context, extends the Shopify.EmbeddedApp class, and overrides the methods that are to be intercepted on the mobile app. The methods in this shared file simply forward the calls to another object, Mobile, which is implemented in the separate files for iOS and Android.
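The override-and-forward pattern can be sketched roughly as follows. This is a simplified TypeScript model, not the shipped code: `EmbeddedAppBase` stands in for `Shopify.EmbeddedApp`, the two methods are illustrative, and `RecordingBridge` is a test double that plays the role of the platform-specific `Mobile` object.

```typescript
// Stand-in for the web SDK class named in the post (Shopify.EmbeddedApp).
// The two methods are illustrative; the real method list is not shown here.
class EmbeddedAppBase {
  readonly renderedOnWeb: string[] = [];

  setTitle(title: string): void {
    this.renderedOnWeb.push("title:" + title);
  }

  setLoading(isLoading: boolean): void {
    this.renderedOnWeb.push("loading:" + String(isLoading));
  }
}

// The platform-specific object the shared file forwards to ("Mobile" in the post).
interface MobileBridge {
  setTitle(title: string): void;
  setLoading(isLoading: boolean): void;
}

// Shared layer: overrides the intercepted methods and forwards each call.
class MobileEmbeddedApp extends EmbeddedAppBase {
  private readonly mobile: MobileBridge;

  constructor(mobile: MobileBridge) {
    super();
    this.mobile = mobile;
  }

  setTitle(title: string): void {
    this.mobile.setTitle(title);
  }

  setLoading(isLoading: boolean): void {
    this.mobile.setLoading(isLoading);
  }
}

// Test double standing in for a platform file. On iOS the equivalent would hand the
// serialized message to window.webkit.messageHandlers.<handlerName>.postMessage(...);
// on Android it would call a method on the object exposed via addJavascriptInterface.
class RecordingBridge implements MobileBridge {
  readonly posted: string[] = [];

  setTitle(title: string): void {
    this.posted.push(JSON.stringify({ type: "setTitle", title: title }));
  }

  setLoading(isLoading: boolean): void {
    this.posted.push(JSON.stringify({ type: "setLoading", isLoading: isLoading }));
  }
}
```

Keeping the shared layer ignorant of the platform means only the small bridge object differs between iOS and Android.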


Shared JS File

On iOS, WKWebView relies on postMessage to allow the web page to communicate with native Swift code. The two JavaScript files are injected into the WKWebView using WKUserScript. The iOS-specific JavaScript file turns the method calls from the EASDK into postMessage calls that are received by the WKScriptMessageHandler.


iOS JS File


iOS native message handling

On Android, a Java Object can be injected into the WebView, which gives the JavaScript access to its methods.



Android JS File

 

Android native message handling

When an embedded app is launched from the mobile app, we inject a URL parameter to inform Shopify not to render the web nav bar since we will be doing so natively. As calls to the EASDK methods are intercepted, the mobile apps render titles, buttons, and activity indicators natively. This makes better use of the screen space and required no changes to the third-party apps, so all the existing apps work as-is!

Communicating from native to web

App with native primary button, and secondary buttons in the overflow menu


In addition to intercepting calls from the web, the mobile apps need to communicate user interactions back to the web.  For instance, when a user taps a native button, we need to trigger the appropriate behaviour as defined in the embedded app.  The middleware facilitates communicating from native to web via HTML postMessages.  Buttons have an associated message name, which we use when a button is tapped.

Alternatively, a button can be defined to load a URL, in which case we can simply load the target URL in the web view. A button can also be configured to emit a postMessage.
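The two button behaviours can be modelled as a small dispatch function. The sketch below is written in TypeScript for illustration only (the real handlers are native iOS and Android code), and the `NativeButton` and `WebViewPort` types, along with their field names, are hypothetical.

```typescript
// Hypothetical button model: a button either posts a named message or loads a URL.
type NativeButton =
  | { label: string; kind: "message"; messageName: string }
  | { label: string; kind: "url"; href: string };

// The two things native code needs from the web view in this sketch.
interface WebViewPort {
  postMessageToWeb(messageName: string): void;
  loadUrl(href: string): void;
}

function handleButtonTap(button: NativeButton, webView: WebViewPort): void {
  if (button.kind === "message") {
    // The embedded app listens for this message and runs its own handler.
    webView.postMessageToWeb(button.messageName);
  } else {
    // No web-side handler is needed: just navigate the web view.
    webView.loadUrl(button.href);
  }
}
```

Modelling the button as a tagged union keeps the two cases mutually exclusive, so a button cannot accidentally carry both a message name and a URL.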


 

iOS implementation of button handling

Android implementation of button handling

Summary

By embracing the web in our mobile apps, we are able to keep pace with the feature releases of the rest of Shopify while complementing it with native versions for features that merchants use most. This also allows us to extend Shopify Mobile with apps that were created by our third-party developers with no change to the EASDK. By complementing the web view with a JavaScript bridge, we were able to optimize the real estate and make the apps more consistent with the rest of the mobile app.

With multiple teams contributing features to Shopify Mobile concurrently, our mobile app is the closest it’s been to reaching feature parity with the web admin, while ensuring that the frequently used parts of the app are optimized for mobile by writing them natively.

To learn more about creating apps for Shopify Mobile, check out our developer resources.

Code Style Consistency for Shopify’s Decade-Old Codebase

5 minute read

Over the course of Shopify's 13-year codebase history, the core platform has never been rewritten. That meant a slew of outdated code styles, piling atop one another without much consistency. By 2012, our CEO Tobi created a draft Ruby style guide to keep up with the growth. Unfortunately, it never became embedded in our programming culture and many people didn't even know it existed.

Integrating with Amazon: How We Bridged Two Different Commerce Domain Models

Over the past decade, the internet and mobile devices became the dominant computing platforms. In parallel, the family of software architecture styles that support distributed computing has become the way we build systems to tie these platforms together. Styles fall in and out of favor as technologies evolve and as we, the community of software developers, gain experience building ever more deeply connected systems.

If you’re building an app to integrate two or more systems, you’ll need to bridge between two different domain models, communication protocols, and/or messaging styles. This is the situation that our team found itself in as we were building an application to integrate with Amazon’s online marketplace. This post talks about some of our experiences integrating two well-established but very different commerce platforms.

Shopify is a multi-channel commerce platform enabling merchants to sell online, in stores, via social channels (Facebook, Messenger and Pinterest), and on marketplaces like Amazon from within a single app. Our goals for the Amazon channel were to enable merchants to use Shopify to:

  • Publish products from Shopify to Amazon
  • Automatically sync orders that were placed on Amazon back to Shopify
  • Manage synced orders by pushing updates such as fulfillments and refunds back to Amazon

At Shopify, we deal with enormous scale as the number of merchants on our platform grows. In the beginning, to limit the scale that our Amazon app would face, we set several design constraints including:

  • Ensure the data is in sync to enable merchants to meet Amazon’s SLAs
  • Limit the growth of the data our app stores by not saving order data

In theory, the number of orders our app processes is unbounded and increases with usage. By not storing order data, we believed that we could limit the rate of growth of our database, deferring the need to build complex scaling solutions such as database sharding.

That was our plan, but we discovered during implementation that the differences between the Amazon and Shopify systems required our app to do more work and store more data. Here’s how it played out.

Integrating Domain Woes

In an ideal world, where both systems use a similar messaging style (such as REST with webhooks for event notification) the syncing of an order placed on Amazon to the Shopify system might look something like this:

Integrating with Amazon: How we bridged two different commerce domain models

Each system notifies our app, via a webhook, of a sale or fulfillment. Our app transforms the data into the format required by Amazon or Shopify and creates a new resource on that system by using an HTTP POST request.

Reality wasn’t this clean. While Shopify and Amazon have mature APIs, each has a different approach to the design of these APIs. The following chart lists the major differences:

Shopify API
  • uses representational state transfer (REST)
  • synchronous data write requests
  • uses webhooks for event notification

Amazon’s Marketplace Web Service (MWS) API
  • uses remote procedure call (RPC) messaging style
  • asynchronous data write requests
  • uses polling for event discovery, including completion of asynchronous write operations

To accommodate, the actual sequence of operations our app makes is:

  1. Request new orders from Amazon
  2. Request order items for new orders
  3. Create an order on Shopify
  4. Acknowledge receipt of the order to Amazon
  5. Confirm that the acknowledgement was successfully processed

When the merchant subsequently fulfills the order on Shopify, we receive a webhook notification and post the fulfillment to Amazon. The entire flow looks like this:

Integrating with Amazon: How we bridged two different commerce domain models

When our app started to receive an odd error from Amazon when posting fulfillment requests, we knew the design wasn’t totally figured out. It turned out that our app received the fulfillment webhook from Shopify before the order acknowledgement was sent to Amazon. Therefore, when we attempted to send the fulfillment to Amazon, it failed.

Shopify has a rich ecosystem of third-party apps for merchants’ shops. Many of these apps help automate fulfillment by watching for new orders and automatically initiating a shipment. We had to be careful because one of these apps could trigger a fulfillment request before our app sends the order acknowledgement back to Amazon.

Shopify uses a synchronous messaging protocol requiring two messages for order creation and fulfillment. Amazon’s messaging protocol is a mix of synchronous (retrieving the order and order items) and asynchronous messages (acknowledging and then fulfilling the order), which requires four messages. All six of these messages need to be sent and processed in the correct sequence. This is a message ordering problem: we can’t send the fulfillment request to Amazon until the acknowledgement request has been sent and successfully processed even if we get a fulfillment notification from Shopify. We solved the message ordering problem by holding the fulfillment notification from Shopify until the order acknowledgement is processed by Amazon.
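One way to picture the hold-until-acknowledged rule is a small gate that queues early fulfillments per order. The TypeScript sketch below is an assumption-laden illustration: the `FulfillmentGate` class and its method names are invented for this example, and it keeps state in memory, whereas the app described here persisted it in a database.

```typescript
type SendFulfillment = (orderId: string, fulfillmentId: string) => void;

class FulfillmentGate {
  // Orders whose acknowledgement Amazon has confirmed as processed.
  private readonly acknowledged: Record<string, boolean> = {};
  // Fulfillments that arrived early, keyed by order ID.
  private readonly held: Record<string, string[]> = {};
  private readonly send: SendFulfillment;

  constructor(send: SendFulfillment) {
    this.send = send;
  }

  // Called when a Shopify fulfillment webhook arrives.
  fulfillmentReceived(orderId: string, fulfillmentId: string): void {
    if (this.acknowledged[orderId] === true) {
      this.send(orderId, fulfillmentId);
      return;
    }
    const queue = this.held[orderId] || [];
    queue.push(fulfillmentId);
    this.held[orderId] = queue;
  }

  // Called once polling shows that Amazon processed the acknowledgement.
  acknowledgementProcessed(orderId: string): void {
    this.acknowledged[orderId] = true;
    const queue = this.held[orderId] || [];
    delete this.held[orderId];
    for (const fulfillmentId of queue) {
      this.send(orderId, fulfillmentId);
    }
  }

  heldCount(orderId: string): number {
    return (this.held[orderId] || []).length;
  }
}
```

A fulfillment that arrives before the acknowledgement is processed waits in the queue; one that arrives afterwards is sent straight through.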

Another issue cropped up when we started processing refunds. The commerce domain model implemented by Amazon requires refunds to be associated with an item sold while Shopify allows for more flexibility. Neither model is wrong; they simply reflect the different choices made by the respective teams when they chose the commerce use-cases to support.

To illustrate, consider a simplified representation of an order received from Amazon.

This order contains two items, a jersey and a cap. The item and shipping prices for each are just below the item title. When creating the order in Shopify, we send this data with the same level of detail, transformed to JSON from the XML received from Amazon.

Shopify is flexible and allows the merchant to submit the refund either quickly by entering a refund amount, or with a more detailed method specifying the individual items and prices. If the merchant takes the quicker approach, Shopify sends the following data to our app when the refund is created:

Notice that we didn’t get an item-by-item breakdown of the item or shipping prices from Shopify. This causes a problem because we’re required to send Amazon values for price, shipping costs, and taxes for each item. We solved this by retaining the original order detail retrieved from Amazon and using this to fill in missing data when sending the refund details back.
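To make the fill-in step concrete, here is a minimal TypeScript sketch that spreads a single refund amount across the retained order items. The allocation rule (proportional to the stored amounts, in integer cents, with cumulative rounding) is an assumption chosen for illustration and is not necessarily the rule the team used. The item amounts are made up, and taxes are left out for brevity; they could be treated as a third component in the same way.

```typescript
// Order detail retained from Amazon (amounts in cents; values below are made up).
interface StoredItem {
  sku: string;
  priceCents: number;
  shippingCents: number;
}

interface ItemRefund {
  sku: string;
  priceCents: number;
  shippingCents: number;
}

// Splits one refund amount across items in proportion to the stored amounts.
// Cumulative rounding keeps the parts summing exactly to refundCents.
function allocateRefund(items: StoredItem[], refundCents: number): ItemRefund[] {
  let totalCents = 0;
  for (const item of items) {
    totalCents += item.priceCents + item.shippingCents;
  }
  if (refundCents < 0 || refundCents > totalCents) {
    throw new Error("refund must be between 0 and the order total");
  }

  let runningCents = 0; // stored amount covered so far
  let allocatedCents = 0; // refund amount handed out so far
  const take = (componentCents: number): number => {
    runningCents += componentCents;
    const target =
      totalCents === 0 ? 0 : Math.floor((refundCents * runningCents) / totalCents);
    const share = target - allocatedCents;
    allocatedCents = target;
    return share;
  };

  const result: ItemRefund[] = [];
  for (const item of items) {
    const priceCents = take(item.priceCents);
    const shippingCents = take(item.shippingCents);
    result.push({ sku: item.sku, priceCents: priceCents, shippingCents: shippingCents });
  }
  return result;
}

// A jersey and a cap, as in the order above, with hypothetical amounts.
const sampleOrder: StoredItem[] = [
  { sku: "JERSEY", priceCents: 5000, shippingCents: 500 },
  { sku: "CAP", priceCents: 2000, shippingCents: 300 },
];
const halfRefund = allocateRefund(sampleOrder, 3900);
```

Whatever rule is used, the key point from the text holds: the per-item breakdown can only be produced because the original order detail was kept.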

Lessons Learned

Our choices violated the design constraint that we initially set to not persist order data. Deciding to persist orders and all the detail retrieved from Amazon in our app’s database enabled us to solve our problems integrating the different domain models. Looking back, here are a few things we learned:

  • It’s never wrong to go back and re-visit assumptions, decisions, or constraints put in place early in a project. You’ll learn something more about your problem with every step you take towards shipping a feature. This is how we work at Shopify, and this project highlighted why this flexibility is important
  • Understand the patterns and architectural style of the systems with which you’re integrating. When you don’t fully account for these patterns, it can cause implementation difficulties later on. Keep an eye open for this
  • Common integration problems include message ordering and differences in message granularity. A persistence mechanism can be used to overcome these. In our case, we needed the durability of an on-disk database

By revisiting assumptions, being flexible, and taking into account the patterns and architectural style of Amazon, the team successfully integrated these two very different commerce domains in a way that benefits our merchants and makes their lives easier.

How Shopify Capital Uses Quantile Regression To Help Merchants Succeed

6 minute read

Shopify Capital provides funding to help merchants on Shopify grow their businesses. But how does Shopify Capital award these merchant cash advances? In this post, I'll dive deep into the machine-learning technique our Risk-Algorithms team uses to decide eligibility for cash advances.

The exact features that go into the predictive model that powers Shopify Capital are secret, but I can share the key technique we use: quantile regression.

Upgrading Shopify to Rails 5

Today, Shopify runs on Rails 5.0, the latest version. It’s important to us to stay on the latest version so we can improve the performance and stability of the application without having to increase the maintenance cost of applying monkey patches. This ensures that we are always on a version maintained by the community, and that we get access to new features sooner.

Upgrading the Shopify monolith, one of the oldest and largest Rails applications in the industry, from Rails 4.2 to 5.0 took us nearly a year. In this post, I’ll share our upgrade story and the lessons we learned. If you're wondering what Shopify's scale looks like or you're planning a major Rails upgrade, this post is for you.

Maintaining a Swift and Objective-C Hybrid Codebase

6 minute read

Swift is gaining popularity among iOS developers, which is no surprise. It's strictly typed, which means you can prove the correctness of your program at compile time, given that your type system describes the domain well. It's a modern language offering syntax constructs encouraging developers to write better architecture using fewer lines of code, making it expressive. It's more fun to work with, and all the new Cocoa projects are being written in Swift.

At Shopify, we want to adopt Swift where it makes sense, while understanding that many existing projects have an extensive codebase (some of it written years ago) in Objective-C (OBJC) that is still actively supported. It's tempting to write new code in Swift, but we can't migrate all the existing OBJC codebase quickly. And sometimes it just isn't worth the effort.

How 17 Lines of Code Improved Shopify.com Loading by 50%

3 minute read

Big improvements don't have to be hard or take a long time to implement. It took, for example, only 17 lines of code to decrease the time to display text on Shopify.com by 50%. That saved visitors 1.2 seconds: each second matters given that 40% of users expect a website to load within two seconds and those same users will abandon a site if it takes longer than three.

Bootsnap: Optimizing Ruby App Boot Time

8 minute read

Hundreds of Shopify developers work on our largest codebase, the monolithic Rails application that powers most of our product offering. There are various benefits to having a “majestic monolith,” but also a few downsides. Chief among them is the amount of time people spend waiting for Rails to boot.

During development, two of the most common tasks are running a development server and running a unit test file. By improving the performance of these tasks, we will also improve the experience for developers working on this codebase and achieve higher iteration speed. We started measuring and profiling the following code paths:

  • Development server: time to first request
  • Unit testing: time to first unit test

Building a Dynamic Mobile CI System

18 minute read

The mobile space has changed quickly, even within the past few years. At Shopify, home to the world’s largest Rails application, we have seen the growth and potential of the mobile market and set a goal of becoming a mobile-first company. Today, over 130,000 merchants are using Shopify Mobile to set up and run their stores from their smartphones. Through the inherent simplicity and flexibility of the mobile platform, many mobile-focused products have found success.

 

This post was co-written with Arham Ahmed. Shout-outs to Sean Corcoran of MacStadium and Tim Lucas of Buildkite.

The Side Hustle: Building a Quadcopter Controller for iOS

Our engineering blog is home to our stories sharing technical knowledge and lessons learned. But that's only part of the story: we hire passionate people who love what they do and are invested in mastering their craft. Today we launch "The Side Hustle," an occasional series highlighting some side projects from our devs while off the Shopify clock.

When Gabriel O'Flaherty-Chan noticed quadcopter controllers on mobile mostly translated analog controls to digital, he took it upon himself to find a better design.

7 minute read

For under $50, you can get ahold of a loud little flying piece of plastic from Amazon, and they’re a lot of fun. Some of them even come with cameras and Wi-Fi for control via a mobile app.

Unfortunately, these apps are pretty low quality — they’re unreliable and frustrating to use, and look out of place in 2017. The more I used these apps, the more frustrated I got, so I started thinking about ways I could provide a better solution, and two months later I emerged with two things:

1. An iOS app for flying quadcopters called SCARAB, and

2. An open-source project for building RC apps called QuadKit

Sharing the Philosophy Behind Shopify's Bug Bounty

2 minute read

Bug bounties have become commonplace as companies realize the advantages of distributing the hunt for flaws and vulnerabilities among talented people around the world. We're no different, launching a security response program in 2012 before evolving it into a bug bounty with HackerOne in 2015. Since then, we've seen meaningful results including nearly 400 fixes from 250 researchers, to the tune of bounties totalling over half a million dollars.

Security is vital for us. With the number of shops and volume of info on our platform, it's about maintaining trust with our merchants. Entrepreneurs are running their businesses and they don't want to worry about security, so anything we can do to protect them is how we measure our success. As Tobi recently mentioned on Hacker News, “We host the livelihoods of hundreds of thousands of other businesses. If we are down or compromised all of them can't make money.” So, we have to ensure any issue gets addressed.

Surviving Flashes of High-Write Traffic Using Scriptable Load Balancers (Part II)

7 minute read

In the first post of this series, I outlined Shopify’s history with flash sales, our move to Nginx and Lua to help manage traffic, and the initial attempt we made to throttle traffic that didn’t account sufficiently for customer experience. We had underestimated the impact of not giving preference to customers who’d entered the queue at the beginning of the sale, and now we needed to find another way to protect the platform without ruining the customer experience.

 

Surviving Flashes of High-Write Traffic Using Scriptable Load Balancers (Part I)

7 minute read

This Sunday, over 100 million viewers will watch the Super Bowl. Whether they’re catching the match-up between the Falcons and the Patriots, or there for the commercials between the action, that’s a lot of eyeballs, and that’s only counting America. But all that attention doesn’t just stay on the screen; it gets directed to the web, and if you’re not prepared, curious visitors could be rewarded with a sad error page.

The Super Bowl makes us misty-eyed because our first big flash sale happened in 2007, after the Colts beat the Bears. Fans rushed online for T-shirts celebrating the win, giving us a taste of what can happen when a flood of people converge on one site in a very short period of time. Since then, we’ve been continually levelling up our ability to handle flash sales, and our merchants have put us to the test: on any given day, they’ll hurl Super Bowl-sized traffic at us, often without notice.

 

Why Shopify Moved to The Production Engineering Model

6 minute read

The traditional model of running large-scale computer systems divides work into Development and Operations as distinct and separate teams. This split works reasonably well for computer systems that are changed or updated very rarely, and organizations sometimes require this if they’re deploying and operating software built by a different company or organization. However, this rigid divide fails for large-scale web applications that are undergoing frequent or even continuous change. DevOps is the term for a movement that’s gathered steam in the past decade to bring together these disciplines.

 

Automatic Deployment at Shopify

6 minute read

Hi, I'm Graeme Johnson, and I work on Shopify's Developer Acceleration team. Our mission is to provide tools that let developers ship fast and safely. Recently we began shipping Shopify automatically as developers hit the merge button in GitHub. This removes the final manual step in our deploy pipeline, which now looks like this:

Merge → Build container → Run CI → Hit deploy button (removed) → Ship to production

We have invested a lot of engineering effort to make this pipeline fast enough to run end-to-end in about 15 minutes (still too slow for our taste) and robust enough to allow cancellation at any stage in the process. Automating the actual deploy trigger was the next logical step.

How We're Thinking About Commerce and VR With Our First VR App, Thread Studio

3 minute read

Hey everyone! I’m Daniel and I lead our VR efforts at Shopify.

When I talk to people about VR and commerce, the first idea that usually pops into their heads is about all the possibilities of walking around a virtual shopping mall. While that could be an enjoyable experience for some, I find it’s a very limiting view of how virtual reality can actually improve retail.

If VR gave you the superpowers to do anything, create anything, and go anywhere you want, would you really want to go shopping in a regular mall?

More than a virtual mall

It’s easy to take a new medium and try to shoehorn in what already exists and is familiar. What’s hard is figuring out what content makes the medium truly shine and worthwhile to use. VR offers an amazing storytelling platform for brands. For the first time, brands can put people in the stories that their products tell.

If you’re selling scuba gear, why not show what it’d look like underwater with jellyfish passing by? Or a tent on a windy, chilly cliff, reflecting the light of a scrappy fire? It sure would beat being in a fluorescent-lit camping store. In VR, you could explore inside a tent before you buy it, or change the environment around you at a press of a button.

Shopify Merchants Will Soon Get AMP'd

1 minute read

Today we're excited to share our involvement with the AMP Project.

Life happens on mobile. (In fact, there are over seven billion small screens now!) We're not only comfortable with shopping online, but increasingly we're buying things using our mobile devices. Delays can mean the difference between a sale or no sale, so it's important to make things run as quickly as possible.

AMP, or Accelerated Mobile Pages, is an open source, Google-led initiative aimed at improving the mobile web experience and solving the issue of slow loading content. (You can learn more about the tech here.) Starting today, Google is pointing to AMP’d content beyond their top stories carousel to include general web search results.

How Our UX Team's Approaching Accessibility

Last updated: September 9, 2016

2 minute read

At Shopify, our mission is to make commerce better for everyone. When we say better, we’re talking about caring deeply about making quality products. To us, a quality web product means a few things: certainly beautiful design, engaging copy, and a fantastic user experience, but just as important are inclusivity and the principles of universal design.

“Everyone” is a pretty big group. It includes our merchants, their customers, our developer partners, our employees, and the greater tech community at large, where we love to lead by example. “Everyone” also includes:

We take our mission to heart, so it’s important that Shopify products are useable and useful to all our users. This is something we’ve been thinking about and working on for a few years, but it’s an ongoing, difficult challenge. Luckily, we love tackling challenging problems and we’re constantly chipping away at this one. We’ve learned a lot from the community and think it’s important to contribute back, so — in celebration of Global Accessibility Awareness Day — we’re thrilled to announce a series of posts on accessibility.

 

How to Set Up Your Own Mobile CI System

1 minute read

Editor's note: a more recent post on this topic is now up! Check out Sander Lijbrink's "Building a Dynamic Mobile CI System."

Over the past few years the mobile development community has seen a dramatic shift towards the use of continuous integration (CI) systems similar to changes present in other communities — particularly web developers. This shift has been a particularly powerful moment for mobile developers, as they’re able to focus on their apps and code rather than spending their time on provisioning, code signing, deployment, and running tests.

I’m a software developer at Shopify currently working on our Developer Acceleration Mobile team. My job is to design, create, and manage an automated system to provide an accelerated development experience for our developers.

Based on our experiences at Shopify, we will be talking about “hosted” vs “BYOH” systems, how to provision Mac OS X and Ubuntu machines for iOS and Android, and the caveats we ran into throughout this series. By the end, you should be ready to go build your very own CI setup.

 

    Adventures in Production Rails Debugging

    5 minute read

    At Shopify we frequently need to debug production Rails problems. Adding extra debugging code takes time to write and deploy, so we’ve learned how to use tools like gdb and rbtrace to quickly track down these issues. In this post, we’ll explain how to use gdb to retrieve a Ruby call stack, inspect environment variables, and debug a really odd warning message in production.

    We recently ran into an issue where we were seeing a large number of similar warning messages spamming our log files:

    
    /artifacts/ruby/2.1.0/gems/rack-1.6.4/lib/rack/utils.rb:92: warning: regexp match /.../n against to UTF-8 string
    

    This means we are trying to match an ASCII regular expression on a UTF-8 source string.

    Developer Onboarding at Shopify

    5 minute read

    Hi there! We’re Kat and Omosola and we’re software developers at Shopify. We both started working at Shopify back in May, and we felt both excited and a little nervous before we got here. You never know exactly what to expect when you start at a new company and no matter what your previous experience is, there are always a lot of new skills you need to learn. Thankfully, Shopify has an awesome onboarding experience for its new developers, which is what we want to talk about today.

    Introducing Shipit

    3 minute read

    After a year of internal use, we’re excited to open-source our deployment tool, Shipit.

    With dozens of teams pushing code multiple times a day to a variety of different targets, fast and easy deploys are key to developer productivity (and happiness) at Shopify. Along with key improvements to our infrastructure, Shipit plays a central role in making this happen.

    Secrets at Shopify - Introducing EJSON

    This is a continuation of our series describing our evolution of Shopify toward a Docker-powered, containerized data centre. Read the last post in the series here.

    One of the challenges along the road to containerization has been establishing a way to move application secrets like API keys, database passwords, and so on into the application in a secure way. This post explains our solution, and how you can use it with your own projects.

    Announcing go-lua

    Today, we’re excited to release go-lua as an Open Source project. Go-lua is an implementation of the Lua programming language written purely in Go. We use go-lua as the core execution engine of our load generation tool. This post outlines its creation, provides examples, and describes some challenges encountered along the way.

    There's More to Ruby Debugging Than puts()

    "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan

    Debugging is always challenging, and as programmers we can easily spend a good chunk of every day just trying to figure out what is going on with our code. Where exactly has a method been overwritten or defined in the first place? What does the inheritance chain look like for this object? Which methods are available to call from this context?

    This article will take you through some under-utilized convenience methods in Ruby which will make answering these questions a little easier.

      Building Year in Review 2014 with SVG and Rails

      As we have for the past 3 years, Shopify released a Year in Review to highlight some of the exciting growth and change we’ve observed over the past year. Designers James and Veronica had ambitious ideas for this year’s review, including strong, bold typographic treatments and interactive data visualizations. We’ve gotten some great feedback on the final product, as well as some curious developers wondering how we pulled it off, so we’re going to review the development process for Year in Review and talk about some of the technologies we leveraged to make it all happen.

      Building and Testing Resilient Ruby on Rails Applications

      Black Friday and Cyber Monday are the biggest days of the year at Shopify with respect to every metric. As the Infrastructure team started preparing for the upcoming seasonal traffic in the late summer of 2014, we were confident that we could cope, and determined resiliency to be the top priority. A resilient system is one that functions with one or more components being unavailable or unacceptably slow. Applications quickly become intertwined with their external services if not carefully monitored, leading to minor dependencies becoming single points of failure.

      For example, the only part of Shopify that relies on the session store is user sign-in - if the session store is unavailable, customers can still purchase products as guests. Any other behaviour would be an unfortunate coupling of components. This post is an overview of the tools and techniques we used to make Shopify more resilient in preparation for the holiday season.
