Cam Davidson-Pilon 11/14/2017

How Shopify Merchants can Measure Retention

At Shopify, our business depends upon understanding the businesses of the more than 500,000 merchants who rely on our platform. Customers are at the heart of any business, and deciphering their behavior helps entrepreneurs allocate their time and money effectively. To help our merchants, we set out to tackle a nontrivial problem: determining customer retention.

When a customer stops buying from a business, we call that churn. In a contractual business (like software as a service), it’s easy to see when a customer leaves because they dissolve their contract. By comparison, in a non-contractual business (like a clothing store), it’s more difficult because the customer simply stops purchasing without any direct notification. The business never observes the moment of churn, so churn can’t be described deterministically. Entrepreneurs running non-contractual businesses can better define churn using probability.

Correctly describing customer churn is important: picking the wrong churn model means your analysis will be either full of arbitrary assumptions or misguided. Far too often, businesses define churn as no purchases after N days; typically N is a multiple of 7 or 30 days. This cutoff arbitrarily buckets customers into two states: active or inactive. Two customers in the active state may look incredibly different and have different propensities to buy, so it’s unnatural to treat them the same. For example, a customer who buys groceries in bulk should be treated differently than a customer who buys groceries every day. This binary model has clear limitations.

Our Data team recognized the limitations of this binary definition of churn, and that we had to do better. Using probability, we have a new way to think about customer churn. Imagine a few hypothetical customers visit a store, visualized in the figure below. Customer A is reliable. They are a long-time customer and buy from your store every week. It’s been three days since you last saw them in your store, but chances are they’ll be back. Customer B’s history is short-lived. When they first found your store, they made purchases almost daily, but now you haven’t seen them in months, so there’s a low chance that they are still active. Customer C has a slower history. They buy something from your store a couple of times a year, and you last saw them 10 months ago. What can you say about Customer C’s probability of being active? It’s likely somewhere in the middle.

Figure: Purchase histories of hypothetical Customers A, B, and C
We can formalize this intuition of probabilistic customers in a model. We’ll consider a simple model for now. Suppose each customer has two intrinsic parameters: a rate of purchasing, \(\lambda\), and a probability of a churn event, \(p\). From the business point of view, even if a customer churns, we don’t see the churn event and we can only infer churn from their purchase history. Given a customer’s rate of purchase, the times between their purchases are exponentially distributed with rate \(\lambda\), which means purchases arrive as a Poisson process. After each future purchase, the customer has a \(p\) chance of churning. Rather than trying to estimate every customer’s parameters individually, we can think of an individual customer’s parameters as coming from a probability distribution. Thus we can estimate the distribution that generates the parameters, and hence, the customers’ behavior. Altogether this is known as a hierarchical model, where there are unobservables (the customer behaviors) being created from probability distributions.
The probability distributions for \(\lambda\) and \(p\) are different for each business. The first step in applying this model is to estimate your specific business’s distributions for these quantities. Let’s assume that a customer’s \(\lambda\) comes from a Gamma distribution (with currently unknown parameters), and \(p\) comes from a Beta distribution (also with currently unknown parameters). This is the model the authors of “Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model” propose. They call it the BG/NBD (Beta Geometric / Negative Binomial Distribution) model.
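
To make the generative story concrete, here is a minimal simulation sketch of one customer under the model described above. The function name, the parameter values, and the `horizon` argument are illustrative assumptions, not part of our production code; the Gamma distribution is assumed to be parameterized by shape \(r\) and rate \(\alpha\).

```python
import random


def simulate_customer(r, alpha, a, b, horizon, rng):
    """Simulate one customer's repeat-purchase times up to `horizon`.

    Hypothetical helper for illustration only.
    """
    # Draw the purchase rate from a Gamma with shape r and rate alpha.
    # random.gammavariate expects a scale, so we pass 1 / alpha.
    lam = rng.gammavariate(r, 1.0 / alpha)
    # Draw the per-purchase churn probability from a Beta(a, b).
    p = rng.betavariate(a, b)

    purchases = []
    t = 0.0  # time 0 is the customer's first purchase
    while lam > 0:
        # Time to the next purchase is exponential with rate lam.
        t += rng.expovariate(lam)
        if t > horizon:
            break  # still active, but outside the observation window
        purchases.append(t)
        # After each purchase the customer churns with probability p.
        if rng.random() < p:
            break
    return lam, p, purchases


rng = random.Random(42)
lam, p, purchases = simulate_customer(r=2.0, alpha=4.0, a=1.0, b=3.0,
                                      horizon=52.0, rng=rng)
print(f"rate={lam:.3f}, churn probability={p:.3f}, "
      f"repeat purchases={len(purchases)}")
```

Note that the churn event itself never appears in the returned purchase times, which is exactly why it has to be inferred.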

Further detail on implementing the BG/NBD model is given below, but what’s interesting is that after writing down the likelihood of the model, the sufficient statistics turn out to be:

  • Age: the duration between the customer’s first purchase and now
  • Recency: the customer’s Age at the time of their last purchase
  • Frequency: the number of repeat purchases they have made

Because the above statistics (age, frequency, recency) contain all the relevant information needed, we only need to know these three quantities per customer as input to the model. These three statistics are easily computed from the raw purchase data. Using these new statistics, we can redescribe our customers above:

  • Customer A has a large Age, Frequency, and Recency.
  • Customer B has a large Age and Frequency, but much smaller Recency.
  • Customer C has a large Age, low Frequency, and moderate Recency.

Being able to statistically determine the behaviors of Customers A, B and C means an entrepreneur can better run targeted ad campaigns, introduce discount codes, and project customer lifetime value.
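
As a rough sketch of how Age, Recency, and Frequency could be computed from raw purchase dates, consider the helper below. The function name `summarize` and the choice to count several purchases on the same day as one are assumptions made for this illustration.

```python
from datetime import date


def summarize(purchase_dates, today):
    """Return (age, recency, frequency) for one customer, in days.

    Hypothetical helper: assumes at least one purchase and treats
    multiple purchases on the same day as a single purchase.
    """
    days = sorted(set(purchase_dates))
    first, last = days[0], days[-1]
    age = (today - first).days      # first purchase until now
    recency = (last - first).days   # age at the time of the last purchase
    frequency = len(days) - 1       # repeat purchases only
    return age, recency, frequency


history = [date(2017, 1, 1), date(2017, 1, 8), date(2017, 1, 8), date(2017, 3, 1)]
age, recency, frequency = summarize(history, today=date(2017, 11, 14))
print(age, recency, frequency)
```

A customer with a single purchase has a Recency and Frequency of zero under this definition.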

The individual-customer data can be plugged into a likelihood function and fed to a standard optimization routine to find the Gamma distribution and Beta distribution parameters \((r, \alpha)\), and \((a, b)\), respectively. You can use the likelihood function derived in the BG/NBD paper for this:


$$L(r, \alpha, a, b | X=x, t_x, T) = A_1 A_2 (A_3 + \delta_{x > 0} A_4) $$

where

$$A_1 = \frac{\Gamma(r + x)\alpha^r}{\Gamma(r)}$$

$$A_2 = \frac{\Gamma(a + b)\Gamma(b+x)}{\Gamma(b)\Gamma(a+b+x)}$$

$$A_3 = \left(\frac{1}{\alpha+T}\right)^{r+x}$$

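$$A_4 = \left(\frac{a}{b+x-1}\right)\left(\frac{1}{\alpha+t_x}\right)^{r+x}$$

Here \(x\) is the customer’s Frequency, \(t_x\) is their Recency, \(T\) is their Age, and \(\delta_{x > 0}\) equals 1 when \(x > 0\) and 0 otherwise.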

We use optimization routines in Python, but the paper describes how to do this in a spreadsheet if you prefer.
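
For readers who want to see the likelihood as code, here is a minimal sketch of its logarithm for a single customer, using only the Python standard library. It is not the lifetimes implementation; the function names are hypothetical, and working in log space is a choice made here to avoid numerical underflow.

```python
import math


def bgnbd_log_likelihood(r, alpha, a, b, x, t_x, T):
    """Log of L(r, alpha, a, b | x, t_x, T) for one customer."""
    log_a1 = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
    log_a2 = (math.lgamma(a + b) + math.lgamma(b + x)
              - math.lgamma(b) - math.lgamma(a + b + x))
    log_a3 = -(r + x) * math.log(alpha + T)
    if x > 0:
        log_a4 = (math.log(a) - math.log(b + x - 1)
                  - (r + x) * math.log(alpha + t_x))
        # log(A3 + A4), computed stably via the log-sum-exp trick.
        m = max(log_a3, log_a4)
        log_sum = m + math.log(math.exp(log_a3 - m) + math.exp(log_a4 - m))
    else:
        # The indicator delta_{x > 0} removes the A4 term.
        log_sum = log_a3
    return log_a1 + log_a2 + log_sum


def negative_log_likelihood(params, customers):
    """Objective to minimize; customers is a list of (x, t_x, T) tuples."""
    r, alpha, a, b = params
    return -sum(bgnbd_log_likelihood(r, alpha, a, b, x, t_x, T)
                for x, t_x, T in customers)
```

An objective of this shape can be handed to a general-purpose minimizer such as `scipy.optimize.minimize`, provided all four parameters are kept positive (for example by optimizing over their logarithms).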

Once the distribution parameters \((r, \alpha, a, b)\) are known, we can look at metrics like the probability of a customer being active given their purchase history. The distribution of this probability across customers is a useful proxy for the health of a customer base. Another view is a heatmap of the customer base. As a customer’s recency increases, we expect the probability of being active to increase. And as frequency increases, we expect the probability to increase too, provided recency is high. Below we plot the probability of being active given varying frequency and recency:

Figure: Probability of Being Active, by Frequency and Recency

The figure reassures us that the model behaves as we expect. Similarly, we can look at the expected number of future purchases in a single unit of time: 

Figure: Expected Number of Future Purchases for 1 Unit of Time
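
The probability of being active can be read off the likelihood: the \(A_3\) term corresponds to a customer who is still active, and the \(A_4\) term to one who churned right after their last purchase. Taking the active share of the total gives

$$P(\text{active} \mid x, t_x, T) = \frac{1}{1 + \delta_{x > 0}\,\frac{a}{b+x-1}\left(\frac{\alpha+T}{\alpha+t_x}\right)^{r+x}}$$

A minimal sketch of this formula follows. The function name is hypothetical and this is not the lifetimes implementation.

```python
import math


def probability_active(r, alpha, a, b, x, t_x, T):
    """P(customer is still active | frequency x, recency t_x, age T)."""
    if x == 0:
        # In this model churn can only happen after a repeat purchase,
        # so a customer with no repeat purchases is always counted as active.
        return 1.0
    # log of (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    log_ratio = (math.log(a) - math.log(b + x - 1)
                 + (r + x) * (math.log(alpha + T) - math.log(alpha + t_x)))
    if log_ratio > 700:
        return 0.0  # avoid overflow in exp; the probability is effectively zero
    return 1.0 / (1.0 + math.exp(log_ratio))
```

Holding frequency and age fixed, a larger recency shrinks the ratio in the denominator and pushes the probability toward 1, which matches the pattern in the heatmap.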

At Shopify, we’re using a modified BG/NBD model implemented in lifetimes, an open-source package maintained by the author and the Shopify Data team. The resulting analysis is sent to our reporting infrastructure to display in customer reports. We can train the BG/NBD model for each of our more than 500,000 merchants, all in under an hour. We do this by using Apache Spark’s DataFrames to pick up the raw data, group rows by shop, and apply a Python user-defined function (UDF) to each partition. The UDF contains the lifetimes estimation algorithm. For performance reasons, we subsample to 50,000 customers per shop because estimating on more than this yielded diminishing returns. After fitting the BG/NBD model’s parameters to the data, we apply the model to each customer in that shop and yield the results. In all, we infer churn probabilities and expected values for over 500 million historical merchant customers.
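
The per-shop flow (group, subsample, fit, then score every customer) can be sketched in plain Python. This is only a single-machine stand-in for the Spark job, not our pipeline: `fit_fn` and `score_fn` are hypothetical callables standing in for the estimation and scoring steps, and the row layout is an assumption.

```python
import random
from collections import defaultdict

MAX_CUSTOMERS_PER_SHOP = 50_000  # subsampling cap described above


def fit_and_score_per_shop(rows, fit_fn, score_fn,
                           max_customers=MAX_CUSTOMERS_PER_SHOP, seed=0):
    """rows: iterable of (shop_id, customer_id, frequency, recency, age)."""
    # Group rows by shop, as the Spark job does before applying the UDF.
    by_shop = defaultdict(list)
    for shop_id, customer_id, x, t_x, T in rows:
        by_shop[shop_id].append((customer_id, x, t_x, T))

    rng = random.Random(seed)
    results = {}
    for shop_id, customers in by_shop.items():
        # Fit on a random subsample when the shop has too many customers.
        if len(customers) > max_customers:
            sample = rng.sample(customers, max_customers)
        else:
            sample = customers
        params = fit_fn([(x, t_x, T) for _, x, t_x, T in sample])
        # Score every customer in the shop, not only the subsample.
        results[shop_id] = {cid: score_fn(params, x, t_x, T)
                            for cid, x, t_x, T in customers}
    return results
```

Because each shop is handled independently, the work parallelizes naturally across partitions.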

One reason for choosing the BG/NBD model is its easy interpretability. Because we are displaying the end results to shop owners, we didn’t want the model to be a black box that left them struggling to explain why a customer was at-risk or loyal. Recall that the variables the BG/NBD model requires are age, frequency, and recency. Each of these variables is easily understood, even by non-technical individuals. The BG/NBD model codifies the interactions between these three variables and provides quantitative measures based on them. On the other hand, the BG/NBD model does suffer from oversimplicity. It doesn’t handle seasonal trends well. For example, the frequency term collapses all purchases into a single value, ignoring any seasonality in the purchase behaviour. Another limitation is that you cannot easily add additional customer variables (e.g., country, products purchased) to the model.

Once we have fitted a model for a store, we rank customers from highest to lowest probability of being active. The highest-ranked customers are the reliable customers. The lowest-ranked customers are unlikely to come back. The customers around 50% probability are at risk of churning, so targeted campaigns could be made to entice them back, possibly reviving the relationship and potentially gaining a life-long customer. By providing these statistics, we put our merchants in a position to make smarter decisions about marketing campaigns, order fulfillment prioritization, and customer support.
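
The ranking step could look something like the sketch below. The cutoffs of 0.8 and 0.2 and the segment labels are arbitrary assumptions chosen for illustration; they are not values we use in production.

```python
def rank_customers(p_active_by_customer, reliable_above=0.8, unlikely_below=0.2):
    """Sort customers by probability of being active and label each one.

    Hypothetical helper; thresholds are illustrative only.
    """
    def label(p):
        if p >= reliable_above:
            return "reliable"
        if p <= unlikely_below:
            return "unlikely to return"
        return "at risk"

    ranked = sorted(p_active_by_customer.items(),
                    key=lambda item: item[1], reverse=True)
    return [(customer, p, label(p)) for customer, p in ranked]


for customer, p, segment in rank_customers({"A": 0.99, "B": 0.01, "C": 0.57}):
    print(customer, p, segment)
```

The "at risk" group in the middle is the natural audience for a targeted campaign.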