Modelling Developer Infrastructure Teams
I’ve been managing Developer Infrastructure teams (alternatively known as Developer Acceleration, Developer Productivity, and other such names) for almost a decade now. Developer Infrastructure (which we usually shorten to “Dev Infra”) covers a lot of ground, especially at a company like Shopify that’s invested heavily in the area, from developer environments and continuous integration/continuous deployment (CI/CD) to frameworks, libraries, and various productivity tools.
When I started managing multiple teams, I realized I’d benefit from creating a mental model of how they all fit together. This exercise has a number of advantages, both for me and for all team members, managers and individual contributors alike.
First, a model helps clarify the links and dependencies between the various teams and domains. This in turn allows a more holistic approach to designing systems, one that mitigates the siloed thinking that affects both our users and our developers. Seeing these links also lets us identify gaps in our solutions and rough transitions between them.
Second, it helps everyone feel more connected to a larger vision, which is important for engagement. Many people feel more motivated if they can see how their work fits into the big picture.
There’s no single perfect model. Indeed, it’s helpful to have different models to highlight different relationships. Team structures also change, and that can require rethinking connections. I’m going to discuss one way that I thought about my area last year, reflecting the org structure at that time. In fact, constructing this model helped me think through other ways of organizing teams and led to us implementing a new structure. Before we get into the model, though, here’s a very brief description of the teams that reported into me last year:
- Local Environments: The team responsible for the tooling that helps get new and existing projects set up on a local machine (that is, a MacBook Pro). This includes cloning repositories, installing dependencies, and running backing services, amongst various other common tasks.
- Cloud Environments: A relatively new team that was created to explore development on remote, on-demand systems.
- Test Infrastructure: They’re in charge of our CI systems, continually improving them and trying new ideas to accommodate Shopify’s growth.
- Deploys: These folks handle the final steps in the development process: merging commits into our main branches (we’ve outgrown GitHub’s standard process!), validating them on our canary systems, and promoting them out to production.
- Web Foundations: We’ve got some big front-end codebases and thus a team dedicated to accelerating the development of React-based apps through various tools and libraries.
- React Native Foundations: Similar to Web Foundations, but focused specifically on standardizing and improving how we build React Native apps.
- Mobile Tooling: Mobile apps have quite a few differences from web apps, so this team specializes in building tools for our mobile devs.
The Development Workflow
One way to look at the Developer Infrastructure teams is as parts of the development workflow (or pipeline), which can be split into three discrete phases:
- Develop: Setting up local dependencies, creating patches, and local testing
- Validate: Building on CI and running test suites
- Deploy: Merging patches into our main branch and sending the final images to our production infrastructure
The Local Environments, Cloud Environments, Test Infrastructure, and Deploys teams each map to a single phase: Local Environments and Cloud Environments to Develop, Test Infrastructure to Validate, and Deploys to Deploy. The scope of these teams remains broad, although the default support is for Ruby on Rails apps. See above for a graphical representation.
By contrast, the applications and systems developed and supported by the Mobile Tooling, Web Foundations, and React Native Foundations teams span multiple phases. In the case of Web Foundations, much of this work focuses on the development phase (frameworks, tools, and libraries), but the team also maintains one application that’s executed as part of the validate phase, to monitor bundle sizes.
Web Foundations builds on the systems supported by the Local and Cloud Environments, Test Infrastructure, and Deploys teams. Their work complements those systems by adding specialized tooling for front-end development.
The work of the Mobile Tooling and React Native Foundations teams spans all three phases. In this case, though, as seen in the image above, the deploy phase is independent from that of the generic workflow, given the very different release process for mobile apps.
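The mapping described above can also be written down as a small lookup table. This is only a minimal sketch for illustration, not a tool we run: the names `Phase`, `TEAM_PHASES`, and `teams_in_phase` are hypothetical, and the phase assignments are simply read off the descriptions in this section.

```python
from enum import Enum


class Phase(Enum):
    """The three phases of the development workflow."""
    DEVELOP = "develop"
    VALIDATE = "validate"
    DEPLOY = "deploy"


# Hypothetical table: which workflow phases each team's systems touch,
# as described in the text (illustrative only).
TEAM_PHASES = {
    "Local Environments": {Phase.DEVELOP},
    "Cloud Environments": {Phase.DEVELOP},
    "Test Infrastructure": {Phase.VALIDATE},
    "Deploys": {Phase.DEPLOY},
    "Web Foundations": {Phase.DEVELOP, Phase.VALIDATE},
    "React Native Foundations": {Phase.DEVELOP, Phase.VALIDATE, Phase.DEPLOY},
    "Mobile Tooling": {Phase.DEVELOP, Phase.VALIDATE, Phase.DEPLOY},
}


def teams_in_phase(phase):
    """Return the alphabetically sorted names of teams active in a phase."""
    return sorted(team for team, phases in TEAM_PHASES.items() if phase in phases)


if __name__ == "__main__":
    for phase in Phase:
        print(f"{phase.value}: {', '.join(teams_in_phase(phase))}")
```

Listing teams per phase like this makes it easy to see where several teams meet in the same phase, which is where rough transitions and duplicated effort tend to appear.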
Horizontal and Vertical Integration
We can further extend the workflow model by borrowing a concept from the business world to look at the relationships in these teams. In a manufacturing industry, horizontal integration means that the different points in the supply chain have specific, often large companies behind them. The producer, supplier, manufacturer, and so on are all separate entities, providing deep specialization in a particular area.
One could view Local and Cloud Environments, Test Infrastructure, and Deploys as similarly horizontally integrated. The generic development workflow is the supply chain, and each of these teams is responsible for one part of it, that is, one phase of the workflow. Each specializes in the specific problem area involved in that phase by maintaining the relevant systems, implementing workflow optimizations, and scaling up solutions to meet the increasing amount of development activity.
By contrast, vertical integration involves one company handling multiple parts of the supply chain. IKEA is an example of this model, as they own everything from forests to retail stores. Their entire supply chain specializes in a particular industry (furniture and other housewares), meaning they can take a holistic approach to their business.
Mobile Tooling, Web Foundations, and React Native Foundations can be seen as similarly vertically integrated. Each is responsible for systems that collectively span two or all three phases of the workflow. As noted, these teams also rely on systems supported by the generic workflow, with their own specific solutions being either built on or sitting adjacent to them. So, they aren’t fully vertically integrated, but instead of being specialized in a phase of the development pipeline, these teams are subject matter experts in the development workflow of a particular technology. They build solutions along the workflow as required when the generic solutions are insufficient on their own.
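The horizontal/vertical distinction reduces to a simple rule of thumb: a team whose systems sit in one phase is horizontal, and a team whose systems span several is vertical. The sketch below encodes that rule under stated assumptions. The names `PHASES_BY_TEAM`, `integration_style`, and `group_by_style` are hypothetical, the phase lists are my reading of the descriptions above, and the rule deliberately ignores the nuance that the vertical teams still rely on the generic systems.

```python
# Hypothetical phase assignments, read off the team descriptions above.
PHASES_BY_TEAM = {
    "Local Environments": ["develop"],
    "Cloud Environments": ["develop"],
    "Test Infrastructure": ["validate"],
    "Deploys": ["deploy"],
    "Web Foundations": ["develop", "validate"],
    "React Native Foundations": ["develop", "validate", "deploy"],
    "Mobile Tooling": ["develop", "validate", "deploy"],
}


def integration_style(phases):
    """Classify a team as horizontal (one phase) or vertical (several)."""
    return "horizontal" if len(set(phases)) == 1 else "vertical"


def group_by_style(phases_by_team):
    """Group team names by integration style, sorted within each group."""
    groups = {"horizontal": [], "vertical": []}
    for team, phases in phases_by_team.items():
        groups[integration_style(phases)].append(team)
    return {style: sorted(teams) for style, teams in groups.items()}


if __name__ == "__main__":
    for style, teams in group_by_style(PHASES_BY_TEAM).items():
        print(f"{style}: {', '.join(teams)}")
```

A real classification would need more than a phase count, but even this crude rule reproduces the split described here: four phase specialists and three technology specialists.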
Analyzing Our Model
Now, we can use the idea of a development workflow and the framework of horizontally and vertically integrated teams as a lens to pull together some interesting observations. First let’s look at the commonalities and contrasts.
The work of each team in Dev Infra generally fits into one or more of the phases of the development workflow. This gives us a good scope for Dev Infra as a whole and helps distinguish us from other teams in our parent team, Accelerate. This in turn allows us to focus by pushing back on work that doesn’t really fit into this model. We made this Dev Infra’s mission statement: “Improving and scaling the develop–validate–deploy cycle for Shopify engineering.”
An interesting contrast is that the horizontal teams have broad scale, while the vertical teams have broad scope. Our horizontal teams have to support engineering as a whole: virtually every developer interacts with our development environments, test infrastructure, and deploy systems. Because Shopify is a growing company, this means an ever-increasing amount of usage and traffic. On the other side, our vertical teams specialize in smaller segments of the engineering population: those that develop mainly front-end and mobile apps. However, they’re responsible for specific improvements to the entire development workflow for those technologies, hence a broader scope.
Further to this point, vertical teams have more opportunities for collaboration given their broad scope. However, there are also more situations where product teams go in their own directions to solve specific problems that Dev Infra can’t prioritize at a given moment. Therefore, it’s imperative for us to stay in close contact with product teams to ensure we aren’t duplicating work and to act as long-term stewards for infra projects that outgrow their teams. On the other side, horizontal teams get fewer outside contributions because the infrastructure needed to support our scale is so deep and complex. However, there’s more consistency in its use, as there are fewer, if any, ways around these systems.
From Analysis to Action
As a result of our study, we’ve started to categorize the work we’re doing and plan to do. For any phase in the development pipeline, there are three avenues for development:
- Concentration: solidifying and improving systems, improving user experience, and incremental or linear scaling
- Expansion: pushing outwards, identifying new opportunities within the problem domain, and step-change or exponential scaling
- Interfacing: improving the points of contact between the development phases, both in terms of data flow and user experience, and identifying gaps into which an existing team could expand or for which a new team could be created
Horizontal and vertical teams will naturally approach development differently:
- Horizontal teams have a more clearly defined scope, and hence prioritization can be easier, but impact is limited to a particular area. Interface development is harder because it spans teams.
- Vertical teams have a much vaguer scope with more possibilities for impact, but determining where we can have the most impact is thus more difficult. Interface improvement can be more straightforward if it’s between pieces owned by that team.
We also used this analysis to inform the organizational structure. As I mentioned, we made some changes earlier this year within Accelerate. This included starting a Client Foundations team, which is made up essentially of vertically integrated, technology-focused teams that specialize in front-end and mobile development. Back in Dev Infra, we have the possibility of pulling in teams that currently exist in other organizations if they help us extend our development workflow model and provide new horizontal integrations. We’re also starting to experiment with more active collaboration between teams to expand the context that developers have about our entire workflow.
Finally, we plan to engage in some user research that spans the development workflow. Most of the time any in-depth research we do is at the team level: what repetitive tasks our mobile devs face, what annoys people about our test infrastructure, or how to make our deploy systems more intuitive. Now we have a way to talk about the journey a developer takes from first writing a patch all the way to getting it out into production. This helps us understand how we can make a more holistic solution and provide the smoothest experience to our developers.
Mark Côté manages Developer Infrastructure at Shopify. He has worked in the software industry for 20 years, as a developer at a number of start ups and later at the Mozilla Corporation, where he went into management. For half of his career he has been involved in software tooling and developer productivity, leading efforts to bring a product-management mindset into the space.