Today, Shopify runs on Rails 5, the latest major version. Staying up to date matters to us: it lets us improve the performance and stability of the application without the growing maintenance cost of monkey patches. It also ensures that we always run a version maintained by the community and that we get early access to new features.
Upgrading the Shopify monolith, one of the oldest and largest Rails applications in the industry, from Rails 4.2 to 5.0 took us nearly a year. In this post, I’ll share our upgrade story and the lessons we learned. If you’re wondering what Shopify’s scale looks like, or you’re planning a major Rails upgrade, this post is for you.
We prepared for the upgrade in two steps. First, we rewrote our code so it no longer used deprecated features. Second, we made sure that every gem we use supports the Rails version we were upgrading to.
This work was nontrivial because Shopify relied heavily on protected_attributes, an old feature that was deprecated in Rails 4.0 in 2013. We had never prioritized removing it, because it kept working on our current Rails version, 4.2.
We also have 370 gem dependencies, and many of them had to be upgraded to support Rails 5. Some of these gems were developed by us, so we had to update them ourselves (for example, activerecord_typed_store and activerecord-databasevalidations). Others were no longer maintained, and we had to take over their maintenance to make them ready for Rails 5.
Once the preparation was done, the app could boot on the new Rails version, and we were able to run the test suite and fix the tests that failed on Rails 5.
For a small app, you can do the Rails upgrade in a branch, open a PR, and merge it as soon as it’s ready. But at Shopify, hundreds of developers work in the same repo and merge more than 100 pull requests per day. The only option was to perform the upgrade step by step on the master branch while preserving compatibility with Rails 4.2. We agreed that the app must stay bootable on both Rails versions, while allowing some tests to fail on Rails 5. We set up a “dual” Gemfile, with a separate lockfile for each Rails version:
gem 'rails', github: 'rails/rails', branch: '5-0-stable'
gem 'rack', github: 'rack/rack'
gem 'rails', '~> 4.2.7'
gem 'rack', '~> 1.5', '>= 1.5.5'
gem 'responders', '~> 2.2'
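The two groups of declarations above are alternatives: only one of them can be active for a given bundle. A minimal sketch of how both can live in a single Gemfile, assuming (as our illustration, not necessarily Shopify’s exact setup) that `Gemfile_next` is a symlink to `Gemfile` and that the hypothetical `next_rails?` helper checks which of the two names Bundler loaded:

```ruby
# Gemfile (sketch). Create the second entry point once with:
#   ln -s Gemfile Gemfile_next
# Bundler names the lockfile after the Gemfile it loads, so the Rails 5
# bundle is recorded in Gemfile_next.lock and Rails 4.2 stays in Gemfile.lock.
source 'https://rubygems.org'

# Hypothetical helper: true when Bundler loaded this file via the symlink.
def next_rails?
  File.basename(__FILE__) == 'Gemfile_next'
end

if next_rails?
  gem 'rails', github: 'rails/rails', branch: '5-0-stable'
  gem 'rack', github: 'rack/rack'
else
  gem 'rails', '~> 4.2.7'
  gem 'rack', '~> 1.5', '>= 1.5.5'
  gem 'responders', '~> 2.2'
end

# All other dependencies are declared once and shared by both bundles.
```

Prefixing a command with `BUNDLE_GEMFILE=Gemfile_next` then runs it against the Rails 5 bundle, while the same command without the prefix keeps using Rails 4.2.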
This was not the best experience from a developer’s perspective, because anyone adding a new gem had to update both Gemfile.lock and Gemfile_next.lock. Our own GitHub bot reminded developers not to forget the second lockfile.
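A check like the bot’s reminder only needs the list of files a pull request changes. The sketch below is hypothetical (the `lockfile_reminder` method, its message, and the assumption that paths are relative to the repo root are ours): it stays quiet when both lockfiles change or when no dependency files change, and otherwise names the lockfiles still missing.

```ruby
# Hypothetical lockfile check in the spirit of the reminder bot.
LOCKFILES = %w[Gemfile.lock Gemfile_next.lock].freeze

# changed_files: paths touched by a pull request, relative to the repo root.
# Returns a reminder string, or nil when nothing needs attention.
def lockfile_reminder(changed_files)
  touched = LOCKFILES & changed_files
  gemfile_changed = changed_files.include?('Gemfile')

  return nil if touched.size == LOCKFILES.size        # both lockfiles updated
  return nil if touched.empty? && !gemfile_changed    # no dependency changes

  missing = LOCKFILES - touched
  "Please also update #{missing.join(' and ')}: each Rails version has its own lockfile."
end

puts lockfile_reminder(['Gemfile', 'Gemfile.lock'])
```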
We used the same dual pattern for CI. We created a separate CI pipeline for Rails 5 that was allowed to fail until all tests had been fixed. Throughout the upgrade project, every submitted PR was tested on both Rails versions. Fun fact: this doubled the demand on our CI infrastructure, so we had to provision extra resources for it.
At the beginning, more than 1,000 tests were failing on Rails 5. That sounds like a massive number, so we formed a SWAT team of senior Rails developers to complete this phase of the project as quickly as possible. Our pace of fixing tests decayed roughly exponentially: at first, you could fix 100 tests with a one-line change, but by the last hundred you could spend a few days on a single test. This was a real challenge, and you never knew whether the issue was in the Shopify codebase, in Rails, or in a third-party gem.
The upgrade had a massive archaeological component: we had to dig into parts of the monolith written almost a decade ago. It’s fascinating how little Rails has changed, and that this code still works.
The Shopify codebase has over a million lines of code, which means it exercises plenty of edge cases that the Rails test suite may not cover. The upgrade helped us discover and fix a dozen bugs in gems across the Rails ecosystem: rack, bundler, activerecord, actionpack, and activesupport.
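Another challenge was fixing these issues while keeping the codebase backward compatible with Rails 4.2. We had to put conditional statements all over the place to control behavior on each Rails version. For example, FALSE_VALUES lives in ActiveModel::Type::Boolean on Rails 5.0 but in ActiveRecord::ConnectionAdapters::Column on Rails 4.2, so each version needs its own assignment: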
FALSE_VALUES = ActiveModel::Type::Boolean::FALSE_VALUES
FALSE_VALUES = ActiveRecord::ConnectionAdapters::Column::FALSE_VALUES
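The two assignments above are alternatives, and only one may run on a given boot. A minimal sketch of how such a branch can be expressed, assuming a hypothetical `DualBoot.rails5?` predicate (the helper name and its version-string argument are our illustration, not Shopify’s code). It takes the version as a string so the sketch runs without Rails loaded; inside the app, the argument would be `Rails::VERSION::STRING`.

```ruby
# Hypothetical helper: a single predicate for "booted on Rails 5 or later".
module DualBoot
  def self.rails5?(version_string)
    # #release drops prerelease tags, so "5.0.0.beta4" counts as Rails 5.
    Gem::Version.new(version_string).release >= Gem::Version.new('5.0')
  end
end

# Usage inside the app, choosing a constant whose location moved in Rails 5:
#
#   FALSE_VALUES =
#     if DualBoot.rails5?(Rails::VERSION::STRING)
#       ActiveModel::Type::Boolean::FALSE_VALUES
#     else
#       ActiveRecord::ConnectionAdapters::Column::FALSE_VALUES
#     end
```

Routing every check through one predicate also keeps the dual-boot branches easy to find, which matters once the old version is gone and they all have to be deleted.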
Finally, once all the tests had been fixed, we made both CI statuses (Rails 4.2 and Rails 5.0) required for merging a PR. This meant developers at Shopify had to write code compatible with both Rails versions until we fully rolled out the new one. It was painless in 99% of cases, and for the rest, our team was there to help.
Thanks to Docker containers and our shipping pipeline, we could safely roll out Shopify on Rails 5 to a small percentage of servers and see how it behaved in production. After each deploy, we kept that percentage running for a while to collect performance data and discover any incompatibilities our test suite had missed. Afterward, we’d revert to Rails 4.2, fix the uncovered issues, and try again. We iterated on this several times until we were comfortable rolling out to 100%.
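How the servers for the partial rollout were chosen isn’t described here. A minimal sketch of one common approach, deterministic hash-based bucketing, assuming a hypothetical `canary_host?` helper and made-up hostnames:

```ruby
require 'zlib'

# Hypothetical helper: is this host in the Rails 5 canary group?
# Zlib.crc32 maps the hostname to a stable integer, and % 100 turns it into a
# bucket from 0 to 99, so a host's membership never changes between deploys.
def canary_host?(hostname, percent)
  Zlib.crc32(hostname) % 100 < percent
end

hosts = (1..200).map { |i| "app-#{i}.example" }
five_percent = hosts.select { |h| canary_host?(h, 5) }
ten_percent  = hosts.select { |h| canary_host?(h, 10) }
```

Because a host’s bucket is fixed, raising the percentage only adds servers, and the ones already on the new version keep producing comparable performance data.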
One of the problematic issues we uncovered this way was YAML- and Marshal-serialized objects in the cache. This became a big deal for internal Rails classes that had changed between versions: an object dumped with Rails 4.2 couldn’t be loaded with Rails 5.0, and vice versa. We had to invalidate the caches (which degraded performance) and roll out Rails 5 to 100% of customers, so that two versions wouldn’t be running at the same time and creating cache conflicts.
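A self-contained Ruby sketch of the failure mode, using a hypothetical `CachedSetting` class in place of a Rails internal. It shows both ways a marshaled cache entry goes wrong when a class changes: loading fails outright if the class no longer exists, and loading silently succeeds with stale state if the class exists but its instance variables were renamed.

```ruby
# Hypothetical class standing in for a Rails internal (old definition).
class CachedSetting
  def initialize(value)
    @value = value
  end
end

# What the old version writes to the cache.
blob = Marshal.dump(CachedSetting.new(42))

# Case 1: the new version no longer defines the class.
Object.send(:remove_const, :CachedSetting)
begin
  Marshal.load(blob)
  load_error = nil
rescue ArgumentError => e
  load_error = e # Marshal reports the class/module it cannot find
end

# Case 2: the class exists again, but its internals were renamed.
class CachedSetting
  attr_reader :amount

  def initialize(amount)
    @amount = amount
  end
end

restored = Marshal.load(blob)   # no error is raised...
stale_amount = restored.amount  # ...but @amount was never set, so this is nil
```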
Another anecdote relates to deprecations. As soon as we began running Rails 5 in production, our performance reports showed that ActiveSupport::Deprecation#warn was allocating so many objects that it accounted for 2% of the runtime, which at Shopify’s scale is a lot. We had started running code paths that triggered new Rails 5 deprecations we hadn’t fixed yet, and they generated a huge number of warnings. Every warning allocated a message string and logged it, which added up to that overhead. The solution was to disable deprecation reporting in production, which sounds like the right thing to do.
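A simplified, hypothetical deprecator makes the cost model concrete: each warning means formatting a string and capturing a backtrace location, so the cheapest fix is a check that returns before any of that work happens. One Rails 5 switch that behaves this way is `ActiveSupport::Deprecation.silenced = true`, because `warn` checks it first; configuring the `:silence` behavior instead still builds each message before discarding it. The `TinyDeprecator` class below is our illustration, not ActiveSupport’s implementation.

```ruby
# Hypothetical, simplified deprecator; not ActiveSupport's implementation.
class TinyDeprecator
  attr_accessor :silenced
  attr_reader :log

  def initialize
    @silenced = false
    @log = []
  end

  def warn(message)
    # Checked first, so a silenced call returns before the full message
    # string or the caller location is built.
    return if silenced

    location = caller_locations(1, 1).first
    @log << "DEPRECATION WARNING: #{message} (called from #{location})"
  end
end

deprecator = TinyDeprecator.new
deprecator.warn('old_method is deprecated')                 # recorded
deprecator.silenced = true
1_000.times { deprecator.warn('old_method is deprecated') } # all skipped
```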
Finally, January 11th was the happiest day for our team.
Once the upgrade had shipped, we were left with a codebase full of dead branches that existed only to support both Rails versions, so we had to clean up. Thanks to a custom RuboCop rule, we automated that with a single `rubocop --autocorrect`.
We learned many lessons during this upgrade.
- Get rid of deprecated features as soon as possible. The longer you wait, the larger your codebase grows and the harder it is to clean up.
- Keep your gems up to date. Keeping the major versions of the gems in your Gemfile current makes Rails upgrades less painful. At Shopify, we’ve started running a service that watches our repos and suggests when it’s time to upgrade a gem.
- Dual boot and dual CI are the only practical option for upgrading larger applications. They were the key tools that let our team collaborate, pick up a failing test to fix, and track progress. Upgrading Rails outside the master branch wouldn’t have been possible, because the branch would have gone stale too quickly.
- Avoid storing serialized Ruby objects in the database or cache. We use Marshal to serialize some Ruby objects into the cache. The internals of these objects can change between Rails versions, which is why Rails 4.2 broke when it tried to read cache entries written by Rails 5.
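As a minimal sketch of the alternative, assuming a hypothetical `Price` value object: cache only plain data (strings, numbers, hashes) and rebuild the object in application code. A JSON payload like this doesn’t reference any Ruby class, so it stays readable by both Rails versions even if the object’s internals change.

```ruby
require 'json'

# Hypothetical value object cached as plain JSON instead of via Marshal.
Price = Struct.new(:amount_cents, :currency) do
  # Serialize only primitive fields; no Ruby class names end up in the cache.
  def to_cache
    JSON.generate('amount_cents' => amount_cents, 'currency' => currency)
  end

  # Rebuild the object explicitly, so renaming internals only means
  # updating this method, not invalidating every cached entry.
  def self.from_cache(payload)
    data = JSON.parse(payload)
    new(data.fetch('amount_cents'), data.fetch('currency'))
  end
end

price = Price.new(1999, 'CAD')
payload = price.to_cache
```

The trade-off is an explicit encode and decode step per cached type, and any change to the payload’s keys still has to be readable by every version that might fetch the entry.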
The next big step for the Rails team at Shopify is to start running Rails from upstream. For us, this could solve the problem of long-running upgrades that require a lot of effort, and at the same time help us contribute back to the community by discovering bugs and edge cases in upstream Rails and gems.