Search at Shopify—Range in Data and Engineering is the Future
One thing I’ve always appreciated about Shopify is the emphasis on range: the ability to navigate across areas of expertise. Range isn’t just a book we love at Shopify; it’s built into our entire outlook. If you’re a developer at Shopify, you could start your career building data science infrastructure, but decide a few years later to pivot to Ruby internals.
The emphasis on range inspires me. In my coding journey, I’ve loved ranging. I started building AppleBasic programs in 4th grade. Years later my high school friends would try to one-up each other, obsessed with the math behind 3D games.
What does any of this have to do with search?
While most would see search and discovery as some kind of deep specialty, it actually requires an intense amount of range. Many search teams focus too much on specialists. In the words of my former colleague Charlie Hull, teams always wanted to hire “magical search unicorns” that often don’t exist. Instead, they tended to have siloed data and engineering teams working on search.
I’ve taken these painful experiences to heart when helping build Shopify’s search team. I want to share why range is a core team principle that separates us from the herd and sets us up for long-term success. (And, of course, why you should join, even if you’re not a magical search unicorn!)
Lack of Range: Dysfunction between Data and Engineering
In reality, nobody on our search team is an “engineer” or “data scientist”. Instead they have the range to be both at the same time. In fact, most of the team has a wide range when it comes to past jobs or hobbies: from linguists to physicists! After all, good decisions require fitting both data science and engineering skills into one brain.
Why? Because of the trade-offs.
Pure data scientists or engineers waste time making poor decisions because they lack full context. They won’t see the other competency’s constraints. That’s why generalizing beyond our expertise is a major part of how Shopifolk work on every project. And that’s precisely why we’ve brought this value to the search domain.
Consider life in the data silo: without engineering context, data scientists could easily chase bleeding-edge machine learning research without considering how to deliver it to production. They develop a new model, decide shipping to production isn’t their job, and instead hand the new model to engineers to translate.
In the engineering silo, engineers don’t have the context needed to make the important trade-offs. Can they know where to tweak the model to remove bloat without hurting relevance? Can pure engineers make the dozens of minute-by-minute decisions they need to optimize relevance, performance, and stability? Without the data context in their brain, they’ll fail, leading to suboptimal solutions!
Great engineering is about making the best decision given the constraints. So when an engineer lacks one crucial piece of know-how (data and relevance), they won’t arrive at the optimal solution between relevance, performance, stability, and other product factors. They’ll blindly implement the model, unsure where to tweak, leading to disastrous results in one of these dimensions.
That leads me to the other end of the trade-off spectrum: the data team creates a reasonable solution, but the infrastructure won’t bend. Unfortunately the engineers, specifically skilled in performance and reliability, might not see the full search quality spectrum of relevance, experience, and performance. Their incentives focus on questions like: Does search satisfy a service-level agreement? Does it keep me from being woken up at 3 AM when I’m on call? With only those constraints, why would an engineer care to build a complicated-looking search relevance model that only runs the risk of creating more complexity and instability?
Coordination between two groups—each with only half of the skills needed to make decisions—creates dysfunction. It adds needless time to production deployment and creates politics.
Silos like these only lead to the dark side.
The solution? RANGE!
Range: The Solution to Dysfunction between Data and Engineering
At Shopify, we have one team with members from both competencies. We draw very few lines between “data” and “engineering” work. Instead we have “search” work.
Engineers on our team must grow data science skills: they learn to build and run experiments, think scientifically, and evaluate the quality of a model. Data scientists find themselves pushed to become good engineers. They must build high-quality, performant, and testable code. When they build a model, it’s not just a random idea in a notebook; it’s on them to get it to production and create a maintainable system.
Why does this matter? Because search, like all software development, requires making dozens of deeply intricate tradeoffs between correctness, scalability, performance, and maintainability. Good decisions require fitting both data science and engineering skills in one brain. An elegant solution to a problem is the simplest one that satisfies all of the constraints. If you can only fit half the constraints in your head, you’ll fail to see the best solution that makes search smart, fast, and scalable.
A close partnership between data and engineering organizations makes this possible. Management on both sides has experience and commitment to close collaboration and partnership. At the level of individual contributors, we don’t think of ourselves as two teams. We’re one team, with individuals that report to a few different leads. We organize, plan, and execute together. We don’t carve out territorial fiefdoms.
Data and Engineering Range is the Future
When you look at the problems of tomorrow, they’ll increasingly be less about point-and-click interactivity. They’ll frequently include some “smart” user interaction. The user wants to:
- talk to the system
- start with a curated set of possibilities tailored to them and fine tune them with their preferences
- be given options or taken on a journey that filters out obvious paths they won’t care about.
This isn’t just the cool stuff people add on to an existing application: it’s increasingly the core part of what’s being built.
I see search and discovery at Shopify as just the beginning. The more personalized or conversational products we build, like those listed above, the more engineers must have the range to push into data (and vice versa). The future isn’t specialization within data science and engineering—it’s having the range to move between both.
Doug Turnbull is a Sr. Staff Engineer at Shopify working on search and discovery. Doug wrote Relevant Search and contributed to AI Powered Search. Doug also blogs heavily at Shopify and his personal site. Currently Doug’s passion includes incubating search and discovery skills at Shopify, planning technical initiatives in search and discovery, and collaborating with peers to make commerce better for everyone through search!