Reading Time: 7 minutes
At Shopify, the leading multi-channel commerce platform that powers over 600,000 businesses in approximately 175 countries, we aim at making commerce better for everyone, everywhere. Since Shopify’s early days, it’s been possible to provide customers with a localized translated experience, but merchants had to understand English to use the platform. Fortunately, things have started to change. For the past few months, my team and I focused on international expansion bringing new shipping patterns, new payment paradigms, compliance with local laws and much more to explore. However, the biggest challenge is preparing the platform for our translation efforts.
I speak French. Growing up, I learned that things have genders. A pencil is masculine, but a feather is feminine. A wall is a he, but a chair is a she. Even countries have genders, too — le Canada, but la France. It’s a construct of the language native speakers learned to deal with. It’s so natural, one can usually guess the gender of unknown or new things without even knowing what they are.
Did you know that in English, zero dictates the plural form? We’d say zero cars, car being plural. But in French, zero is always singular, as in, zéro voiture. Singular, no s. Why? I don’t know but each language has their quirks. Sometimes it might be obvious, like genders, or more subtle like a special pluralization rule.
Shopify employs hundreds of developers working on millions of lines of code. For the past twelve years, we collectively hardcoded thousands and thousands of English strings scattered across all our products oblivious to our future international growth. It would be great if we could simply replace words from one language with another, but unfortunately, differences like gender and pluralizations force us to rethink established patterns.
We had to educate ourselves, build new tools, and refactor entire parts of our codebase. We made mistakes, tried different things, and failed many times. But now, six months after we started, Shopify is available in a variety of languages. What you’ll find below is a small collection of thoughts and patterns that have helped us succeed.
Stop the Bleeding
The first step, like with any significant refactoring effort, is to stop the bleeding. At Shopify, we deploy hundreds of hardcoded English words daily. If we were to translate everything that exists today, we’d have to do it again tomorrow and again the day after because we’re always deploying new hardcoded words. As brilliantly explained by my colleague Simon Hørup Eskildsen, it’s unrealistic to think you can align everyone with an email or to fix everything with a single pull request.
Fortunately, Shopify relies on automated tooling (cops, linters, and tests) to communicate best practices and correct violations. It’s the perfect medium to tell developers about new patterns and guide them with contextual insights as they learn about new practices. We built cops and linters to detect a variety of violations:
- Hardcoded strings in HTML files
- Hardcoded strings in specific method arguments
- Hardcoded date and time formats
How we built the cops and linters could be a post on its own, but the concept is what matters here: we knew a pattern to avoid, so we built tools to inform and correct. These tools gave developers a strong feedback loop, prevented the addition of new violations, and gave an estimate of the size of the task we had in front of us.
Automate the Mundane
Shopify has, relatively speaking, quite a big codebase. Due to our cops and linters, we build all new features with translation in mind. However, all the hard-coded content that existed before our intervention still had to be extracted and moved to dictionaries. So we did what any engineer would do; we built tools.
Linters made identifying violations easy. We ran them against every single file of our application and found a significant number of items in need of translation. After identification, we opted for the simplest approach; create a file named after the current module, move the actual content in there, and reference it through a key created from a combination of file path and the content itself. Slowly but surely, all the content was moved to dictionaries. The results weren’t perfect. There was duplicated content and the reference names weren’t always intuitive, but despite this, we extracted most of the basic and mundane stuff, like static content and documentation. What was left were edge cases like complex interpolations — I like to call them, fun challenges.
Pseudolocalization to the Rescue
Identifying the extracted content from everything else immediately became a challenging issue. Yes, some sentences were now in dictionaries, but the product looked exactly the same as before. We needed to distinguish between hardcoded and extracted content, all while keeping the product in a usable state so that translators, content writers, and product managers could stay informed about our progress. Enter pseudolocalization.
Pseudolocalization (or pseudo-localization, or pseudo-translation) is a software testing method used for examining internationalization aspects of software. Instead of translating the text of the software into a foreign language, as in the process of localization, an altered version of the original language replaces the textual elements of an application.
We created a new backend built on top of Rails I18n, the default Rails framework for internationalization, that hijacked all translation calls and swapped resulting characters with an altered yet similar alternative: a became α, b became ḅ, and so on.
Word lengths differ from one language to another. On average, German words are 30% longer and has the potential to seriously mess up a UI built without this knowledge. In French, a simple word like “Save” translates to “Sauvegarder”, which is almost 200% longer. Our pseudotranslation module intercepted all translation calls, so we took the opportunity to double all vowels in an attempt to mimic languages with longer words. The end result was a remarkable achievement in readability. We easily distinguished between content and performed visual testing on the UI against longer words.
ASCII is Dead, Long Live UTF8
Character sets also prove to be a fun challenge. Shopify runs on MySQL. Unfortunately, MySQL’s default utf8 isn’t really UTF-8. It only stores up to three bytes per code point, which means no support for hentaigana, emoji, and other characters outside of the Basic Multilingual Plane. This means that unless explicitly told otherwise, most of our tables didn’t support emoticons characters, and thus needed migration.
On the application side, Rails isn’t perfect neither. Popular methods such as parameterize and ordinalize don’t come with international support built-in.
Identifying and fixing all of these broken behaviors wasn’t an easy task, and we’re still finding occurrences here and there. There is no secret sauce or real generic approach. Some bugs were fixed right away, others were simply deprecated, and some were only rolled out to new customers.
If anything, one trick to try is to introduce UTF8 characters in your fixtures and other data seeds. The more exposed you are to other character sets, the more likely you are to stumble on broken behavior.
Preparing content for translation is one thing, but getting it actually translated is another. Now that everything was in dictionaries, we had to find a way for developers and product managers to request new translations and to talk to translators in a lean, simple, and automated way.
Managing translations isn’t part of our core expertise and other companies do this more elegantly than we ever could. Translators and other linguists rely on specialized tools that empower them with glossaries, memories, automated suggestions, and so on.
So, on one side of this equation, we have Github and our developers, and on the other are translators and their translation management system. Could GitHub’s API, coupled with our translation management system API help bridge the gap between developers and translators? We bet that it could.
Leveraging APIs from both sides, we built an internal tool called “Translation Platform”. It’s a simple and efficient way for developers and translators to collaborate in a streamlined and automated manner. The concept is quite simple; each repository defines their configuration file that indicates where to find the language files, what’s the source language, and what are the targeted languages. A basic example would look as follows:
Once the configuration file in place, the Translation Platform starts listening to Github’s webhooks and automatically detects if a change impacts one of the repository’s language file. If it does, it uses the translation management system API to issue a new translation request, one per targeted language. From a translator standpoint, the tool works similarly. It listens to the translation management system webhooks, detects when translations are ready and approved, then automatically creates a new commit or a new pull request with the newly translated content.
Translation Platform made gathering translations a seamless process, similar to running tests on our continuous integration environment. It gives us visibility of the entire flow while allowing us to gather logs, metrics, and data we can later use to provide SLAs and guarantees on translation requests. The simplicity of the Translation Platform was key to successfully introducing our new translation processes across the company.
Localization challenges don’t stop with words. Every single UX element needs examination through an international lens. For example, shipping and payment are two concepts that vary significantly from one market to another. The iconography that accompanies them must acknowledge these differences and cultural gaps that may exist. A mailbox doesn’t look the same in Japan as it does in France. A credit card isn’t used as much in India as it is in North America.
Maps and geography represent another intriguing challenge. Centering a world map over Japan instead of the UK can go a long way with our Japanese merchants. The team needs to take special care of regions like Taiwan and Macau, which can lead to important conflicts if not labeled correctly, especially when what is considered “correct” changes depending on whom we ask.
Number formatting, addresses, and phone numbers are all things that change from one region or language to another. If something requires formatting for display purposes, the format will change with the country or the language.
We’re only at the beginning of our journey. The internationalization and globalization of a platform isn’t a small task but an ongoing effort. The same way our security experts never sleep, we expect to always be around, informing our peers about language specificities, market subtleties, and local requirements.
My name is Christian and I lead the engineering team responsible for internationalization and localization at Shopify. If these types of challenges are appealing to you, feel free to reach out to me on twitter or through our career page.