We’re off to Germany for SREcon 2018 Europe/Middle East/Africa!

3 minute read

In just a couple of weeks, we will be off to Germany for SREcon18 Europe/Middle East/Africa, a gathering of engineers who care deeply about site reliability, systems engineering, and working with complex distributed systems at scale. The conference runs from August 29 to 31, and our developers Felix Glaser, Daniel Turner, and Niko Kurtti will be presenting talks at the event. The conference has a culture of critical thought, deep technical insights, and continuous improvement, and we hope to see you there!

Know Your Kubernetes Deploys - Felix Glaser

Containers changed the way we develop and package our code. Kubernetes made it easy to deploy and orchestrate our workloads. Now that those steps are well understood, it is time to draw attention to securing the software supply chain. This talk shows how Shopify secures and tracks its workloads.

We secure our software supply chain by creating signatures on our containers which state that they originate from the correct deploy pipeline, were tested, and contain no known vulnerabilities or outdated software.

During deployment, an admission controller enforces deploy-time policies that check for the presence of those previously created signatures, which prevents privilege escalation via code deployment.
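
As a rough illustration of the deploy-time check described above, here is a minimal Python sketch. It is not Shopify's implementation and does not use the Kubernetes admission API; the attestation names, the `Image` type, and the `review` function are all hypothetical. It only shows the policy idea: a deploy is rejected unless every image carries every required signature.

```python
from dataclasses import dataclass, field

# Hypothetical attestation names. The post only says that signatures state an
# image came from the correct pipeline, was tested, and has no known vulnerabilities.
REQUIRED_ATTESTATIONS = frozenset(
    {"built-by-pipeline", "tests-passed", "vulnerability-scan-clean"}
)


@dataclass(frozen=True)
class Image:
    """A container image and the set of attestations signed for it (assumed shape)."""
    digest: str
    attestations: frozenset = field(default_factory=frozenset)


def review(images, required=REQUIRED_ATTESTATIONS):
    """Return (allowed, reasons) for a deploy request that contains `images`."""
    reasons = []
    for image in images:
        # Any required attestation that the image lacks blocks the deploy.
        missing = sorted(required - image.attestations)
        if missing:
            reasons.append(f"{image.digest}: missing {', '.join(missing)}")
    return (not reasons, reasons)


# Example request: one fully signed image and one that only passed tests.
signed = Image("sha256:aaa", REQUIRED_ATTESTATIONS)
partial = Image("sha256:bbb", frozenset({"tests-passed"}))
allowed, reasons = review([signed, partial])
```

In this sketch the request is denied as a whole, and `reasons` names each image with the attestations it lacks, which is the kind of message an operator would want to see when a deploy is blocked.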

Since new exploits show up all the time, we need one more piece of the puzzle to secure containers: a place to track all the metadata created during the lifetime of a container. For example, we track where a container is deployed so that, if it becomes vulnerable, it can be pulled out of production, fixed, and redeployed.

Thursday, August 30, 2018 (09:55 – 10:30)

What Medicine Can Teach Us about Being On-Call - Daniel Turner

Being on-call is a critical and stressful part of being an SRE. While most organizations want to reduce the on-call burden and are willing to take steps to do so, few have used quantitative research methods to try to optimize being on-call.

At the same time, being on-call is a part of most physicians' practice. This is especially true for medical residents (postgraduate doctors in training), who can be on-call as often as once every three days. The field of medicine has undertaken numerous studies and research projects to optimize the handling of on-call duties. These studies have explored work-life balance, ways to decrease the number of critical incidents (which can literally mean life or death), and ways to reduce mistakes.

This talk breaks down the techniques and research that have led to practices that can be adopted for SREs. It also looks at issues that remain unsolved in both fields, like pages sent to the wrong team or those that shouldn’t have been sent at all. Finally, it concludes with words of warning that SREs are not physicians, and as with any interdisciplinary study, we must be mindful of these differences when borrowing techniques.

Friday, August 31, 2018 (12:15 – 12:40)

Keep Building Fresh: Shopify's Journey to Kubernetes - Niko Kurtti

Shopify, in 2014, was one of the first large-scale users of Docker in production. We ran 100% of our production traffic in hundreds of containers. We saw the value of containerization and aspired to also introduce a real orchestration layer.

Fast forward two years to 2016, when instead we had a clumsy and fragile homemade middleware for controlling containers. We started looking at orchestration solutions again and the technology behind Kubernetes intrigued us.

In this talk I'll briefly go over challenges we saw in moving from a traditional host-based infrastructure to a cloud native one, moving not only our core app to Kubernetes but also hundreds of our other apps at the same time. I'll focus on the cluster tooling solutions we've built like controllers, cluster creators, and deploy tools. We've automated things ranging from our DNS to certificates and even complex cluster creations—and all with a real programming language and projects rather than a handful of random scripts.

The ability to extend Kubernetes to fit our needs has been the greatest reward of this project. It's given us a new paradigm on which to build rather than relying on old patterns.

Friday, August 31, 2018 (14:00 – 14:50)

Shopify to Attend Percona Live 2018!

2 min read

We’re excited to announce that we’ll be attending Percona Live 2018 in Santa Clara this coming April! Running from April 23-25, the Percona Live Open Source Database Conference is the premier event for anyone that develops and uses open source database software.

We’ll be sending three Shopify speakers to the event. Check out their conference topics below!

The role of the DBA is evolving as more companies and products move to a hybrid model where, along with their traditional work, DBAs are expected to code. We’ve scaled our Data Stores team to bring in pure developers, pure DBAs, and a mix of both, and we found creative ways to help our DBAs adapt to the changes in the industry. We’re discovering how increasingly difficult it is to find candidates with this mix of experience, and our struggles have taught us the best methods to find DBAs and to help them evolve. We want DBAs to be prepared for this new world and would love to share our industry findings with this community.

Monday, April 23 (4:30pm - 6:30pm)

In this session, we will discuss our fully automated failover solution running in containers on Kubernetes. The system uses Orchestrator for MySQL failovers, ProxySQL to route queries, and Taiji, a Zookeeper-backed application we wrote, for service discovery, so database failures and topology changes are handled without any human intervention. This system is tolerant of network partitions and connectivity issues, node failures, and even full-region outages.

After adding functionality to Orchestrator, we have it deployed with the Raft consensus protocol and automatic failovers enabled. ProxySQL is deployed alongside a Taiji container that watches for changes in Zookeeper. All topology changes are automatically pushed to Zookeeper via Orchestrator callback scripts and a Taiji agent that performs health checks on databases. In less than a second, these changes are pushed to ProxySQL, so our application will seamlessly begin sending read and write queries to the proper database.
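
To make the flow above more concrete, here is a small Python sketch of the last step: turning a topology snapshot into read and write targets and pushing it only when something changed. This is an assumption-laden illustration, not Taiji, Orchestrator, or ProxySQL code; `Node`, `routing_table`, `apply_change`, and the `push` callback are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One database in a topology snapshot (assumed shape, not a real API)."""
    host: str
    role: str       # "primary" or "replica"
    healthy: bool   # result of a health check


def routing_table(topology):
    """Derive write and read targets from a list of Node entries."""
    primaries = [n.host for n in topology if n.role == "primary" and n.healthy]
    if len(primaries) != 1:
        # Refuse to route writes when the snapshot is ambiguous.
        raise ValueError(f"expected exactly one healthy primary, found {len(primaries)}")
    replicas = sorted(n.host for n in topology if n.role == "replica" and n.healthy)
    # If no replica is healthy, reads fall back to the primary.
    return {"writer": primaries[0], "readers": replicas or primaries}


def apply_change(previous_table, topology, push):
    """Recompute the table and call `push` only if it differs from the previous one."""
    table = routing_table(topology)
    if table != previous_table:
        push(table)
    return table


# Simulated failover: db1 fails and db2 is promoted to primary.
before = [Node("db1", "primary", True), Node("db2", "replica", True)]
after = [Node("db1", "primary", False), Node("db2", "primary", True)]
pushed = []
t1 = apply_change(None, before, pushed.append)
t2 = apply_change(t1, after, pushed.append)
t3 = apply_change(t2, after, pushed.append)  # unchanged snapshot, nothing pushed
```

Comparing against the previous table keeps repeated health-check results from triggering redundant updates, which is one plausible way to keep propagation fast.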

Tuesday, April 24 (3:50pm - 4:40pm)

Existing tools like mysqldump and replication cannot migrate data between GTID-enabled MySQL and non-GTID-enabled MySQL, a common configuration across multiple cloud providers that cannot be changed. These tools are also cumbersome to operate and error-prone, thus requiring a DBA’s attention for each data migration. We introduced a tool that allows for easy migration of data between MySQL databases with constant downtime on the order of seconds.

Inspired by gh-ost, our tool is named Ghostferry and allows application developers at Shopify to migrate data without assistance from DBAs. It has been used to rebalance sharded data across databases. We plan to open source Ghostferry at the conference so that anyone can migrate their own data with minimal hassle and downtime. Since Ghostferry is written as a library, you can use it to build specialized data movers that move arbitrary subsets of data from one database to another.

Tuesday, April 24 (4:50pm - 5:15pm)

Interested in working at Shopify? Talk to Kayla at the conference, or reach out on LinkedIn! You can also check out our careers page.

Shopify is Heading to Pittsburgh for RailsConf 2018!

2 min read

We’re excited to announce that we’ll be attending RailsConf 2018 in Pittsburgh from April 17 - 19! RailsConf is hosted annually by Ruby Central, Inc., and features the world’s largest gathering of Rails developers and Rubyists!

This year, we’re sending three Shopify engineers to speak about their experiences and best practices for developing on Rails for one of the world’s largest commerce sites. Check out their conference topics below!

Tracking down bugs can be hard. Tracking down bugs in a codebase touched by five thousand contributors is even harder. Making heads or tails of an inheritance-happy codebase like Rails can be a nightmare. How do you find where the bug in save is when save is overridden by 15 different modules?

In this talk, we’ll look at the process that goes into fixing a bug in Rails itself. You’ll learn about every step from the initial report, to when the fix is eventually committed. We’ll share tips on navigating Rails’ internals, and how to find the source of problems - even if you’re a complete newcomer to Rails.

Tuesday, April 17 (11:40am - 12:20pm)

Moving from operations powered by scripts like Capistrano, to containers orchestrated by Kubernetes, requires a shift in practices. In this talk, we’ll go beyond the operational basics of Kubernetes, and cover more advanced aspects, such as gradual deployments, capacity planning, job workers and their safety, and solving problems for unique cloud environments like Kubernetes.

This presentation is about the lessons we learned while migrating hundreds of Rails apps within the Shopify organization to Kubernetes.

Tuesday, April 17 (2:30pm - 3:10pm)

Upgrading Rails at Shopify has always been a tedious and slow process. A full upgrade cycle took as much time as releasing a new version of Rails - this wasn’t working for us. We realized that having a full-time team dedicated to working on Rails wasn’t the solution; instead, it was to build a proper toolkit and process for each upgrade. In this talk, you’ll learn about the different techniques and strategies that enabled Shopify to perform its fastest, smoothest Rails upgrade ever.

Wednesday, April 18 (10:50am - 11:30am)

Interested in working at Shopify? Talk to Mackenzie, Alexa, or Jane about our engineering opportunities at our booth, or reach out to them on LinkedIn! You can also head over to our careers page.

We're off to Santa Clara for SREcon 2018!

3 minute read

Get pumped! This March, Shopify engineers will be speaking at SREcon Americas in Santa Clara, CA, USA! This three-day conference runs from March 27-29, 2018, and is dedicated to highlighting excellence, best practices, and thought leadership in the areas of engineering resilience, reliability, and scalability. Shopify engineers will be giving presentations on several topics, ranging from software engineer lesso

We're Headed to the Velocity San Jose Conference

If you’re headed to Velocity, happening June 20 to 22 in San Jose, come say hello! We’re excited to have two talks from our production engineering team.  

Scriptable load balancers - Emil Stolarsky and Justin Li

  • Flash sales are a prominent marketing tool for attracting a high volume of customers in a short period of time, from people rushing to the Kylie Cosmetics store to the much-anticipated Cyber Monday deals. One tool we use to manage these events is the scriptable load balancer, which uses Nginx and LuaJIT via OpenResty to quickly handle the difficult infrastructure problems that can occur.

Emil and Justin will run through how scriptable load balancers allow us to deal with flash sales and DDoS attacks, and to solve the problem of sharding across data centers.

Wednesday, June 21, 11:25 am to 12:05 pm - Location: LL21 C/D 

Genesis: Automation data center management - David Radcliffe

  • Automation is essential to increasing the speed of capacity management. In previous years a Google sheet was used to keep track of our data centers, until we adopted Tumblr’s Collins, an inventory management tool. A hack day project During the presentation, David will explore some of the tools, such as Genesis, for a faster automation of our data centers.  

Thursday, June 22, 1:15 pm to 1:55 pm - Location: LL21 C/D 

See you there! And to whet your appetite, here's a video from one of our speakers last year, Flo Weingarten, talking about our multitenant architecture across multiple data centres!

Five Shopify Talks at RailsConf 2016

Updated June 9, 2016

3 minute read

RailsConf is tomorrow! For the first time, the conference will be in Kansas City, known for jazz and barbeque and home to the Royals. If you're heading down, here are the details for the five presentations we'll be giving:

  • How We Deploy Shopify - Kat Drobnjakovic

Shopify is one of the largest Rails apps in the world and yet remains massively scalable and reliable. The platform is able to manage large spikes in traffic that accompany events such as new product releases, holiday shopping seasons, and flash sales, and has been benchmarked to process over 25,000 requests per second, all while powering more than 243,000 businesses. Even at such a large scale, all our developers still get to push to master and deploy Shopify in 3 minutes. Let's break down everything that can happen when deploying Shopify or any really big Rails app.

Wednesday, May 4, 11:40 am to 12:20 pm, Room 3501 G
