How to Measure Subscriber Loyalty
When You Have Incomplete Data


By Stephen H. Yu.

Properly measuring customer loyalty is often difficult in a multi-channel B-to-B marketing environment. The first question is often, “Where should we start digging when there are so many data silos?” Before embarking on a massive data consolidation project, we suggest organizations begin by identifying customer loyalty categories. This exercise helps narrow down the list of data assets you will need to work with.

Who are your valuable customers? What will their value be over the next several years? How long will they continue to do business with you? Which ones are vulnerable, and which are likely to churn in the next three months? Wouldn’t it be great if you could identify the vulnerable among your valuable customers before they stop doing business with you?

Marketers often rely on surveys to measure loyalty. Net Promoter Score, for example, is a good way to measure customer loyalty at the brand level. But if you want to be proactive about each customer, you need a loyalty score for everyone in your base. Surveying everyone is cost-prohibitive and impractical. Moreover, respondents may not be completely honest about their intentions, especially when it comes to monetary transactions.

That’s where modeling techniques come in. Without asking direct questions, what are the leading indicators of loyalty or churn? What specific behaviors lead to relationship longevity versus complete attrition? In answering these questions, past behavior has often proven to be a better predictor of future behavior than survey data. That’s because what people say they “would do” and what they “actually do” are, in fact, different.

Modeling is also beneficial because it fills inevitable data gaps. No matter how much data you’ve collected, you’ll never know everything about everyone in your base. Models make the most of available data assets, summarizing complex datasets into answers to specific questions. For instance, how loyal is Company XYZ? A loyalty model indicates the answer in numeric form, such as a score between 1 and 10, for every entity in question. That is a much simpler option than setting up rules by digging through a long data dictionary.


Maximizing the power of data

No matter the form, modeling is useful for maximizing the power of available data. Remember: marketers must begin small using readily available data assets, then gradually improve them over time.

eClerx Digital recently developed a loyalty model for a leading US computing service company. The purpose of the modeling exercise was two-fold: (1) identify the group likely to be loyal customers, and (2) pinpoint the “vulnerable” segment in the base. This allowed our client to enhance treatment of potentially loyal customers even before they showed obvious signs of loyalty. At the opposite end of the spectrum, the client could proactively contact vulnerable customers if their present or future value (which requires a customer value model) was high. We call that the “valuable-vulnerable” segment.

We could have also built a separate churn model. However, that would have required extensive historical data in the form of time-series variables, and the processes to create those can be time-consuming and costly. To arrive at the answer quickly with minimal data access, we chose to build one loyalty model, making sure that the bottom scores measured vulnerability while the top scores indicated loyalty.

What did we need to build this model? Again, to provide usable answers in the shortest time, we used only transaction history from the past three years along with third-party firmographic data. We also considered promotion and response history, technical support data, non-transactional engagement data, and client-initiated activity data, but all of these were deferred to future enhancements due to difficulties in data procurement.

Loyalty has several different meanings. For our purposes, we considered multiple options to define “loyal” as a mathematical term for modeling. Depending on the purpose, it could mean high value, frequent buyers, tenured customers, or other measurements of loyalty and levels of engagement. Since we were starting with basic transaction data, we examined many possible combinations of RFM (recency, frequency, monetary) data.

In doing so, we observed that many indicators of loyalty behave radically differently among distinct segments defined by spending levels, a clear sign that separate models were required. In other cases, such overarching segments can also be defined by region, product line, or target group.

We divided the base into small, medium, and large segments based on annual spending levels, then examined other types of loyalty indicators for a target definition. If we had survey data, we could also have used them to define the meaning of “loyal.” In this project, we mixed combinations of recency and frequency factors, and each segment ended up with a different target definition. For the first round, we defined loyal customers as those with a last transaction date within the past 12 months and total transaction counts within the top 10-15% range, the governing idea being that the target universe should be neither too big nor too small. During this exercise, we concluded that the small segment of big spenders could be deemed loyal outright, and no model was needed to differentiate them further.
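The segmentation and first-round target definition described above can be sketched in code. This is an illustrative sketch only, not the client's actual implementation: the spending thresholds, the reference date, and the transaction-count cutoff are all assumptions for demonstration.

```python
from datetime import date, timedelta

def spend_segment(annual_spend, small_max=50_000, large_min=500_000):
    """Assign a spending-level segment; dollar thresholds are illustrative."""
    if annual_spend < small_max:
        return "small"
    if annual_spend >= large_min:
        return "large"
    return "medium"

def is_loyal(last_txn_date, txn_count, txn_count_cutoff, as_of=date(2024, 1, 1)):
    """First-round target definition: last transaction within the past
    12 months AND transaction count at or above a per-segment cutoff
    (chosen so roughly the top 10-15% of the segment qualifies)."""
    recent = (as_of - last_txn_date) <= timedelta(days=365)
    frequent = txn_count >= txn_count_cutoff
    return recent and frequent
```

In practice, the cutoff would be computed per segment from the distribution of transaction counts, so each segment's target universe lands in that "neither too big nor too small" range.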


As expected, models built for small and medium level spenders were quite different in terms of data usage as well as the weight assigned to each variable. For example, even for the same product category purchases, a recency variable (i.e. weeks since last transaction within the category) showed up as a leading indicator for one model, while various bands of categorical spending level were important factors for the other. A common variable such as industry classification code (SIC code) also behaved very differently, validating our decision to build separate models for each spending level segment.

The following is the efficiency curve for one of the resultant models:


This is a typical method of measuring the predictive power of a model in terms of the “cumulative gains” realized by the exercise. Here, the top model group displays over a 4x gain in loyalty measurement over the general population, while the tail end of the curve indicates “not-so-loyal” or “vulnerable” customers.
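The numbers behind such an efficiency curve can be computed with a short routine. A minimal sketch, assuming each record carries a model score and a 0/1 flag for the loyalty target; the ten-group split mirrors the usual decile view, but the data and group count are illustrative.

```python
def cumulative_gains(scores, is_target, n_groups=10):
    """Rank records by model score (descending), split into equal-depth
    groups, and report the cumulative lift at each depth: the target
    rate among the top records so far divided by the overall base rate."""
    ranked = sorted(zip(scores, is_target), key=lambda p: p[0], reverse=True)
    base_rate = sum(t for _, t in ranked) / len(ranked)
    gains = []
    for g in range(1, n_groups + 1):
        depth = len(ranked) * g // n_groups
        captured = sum(t for _, t in ranked[:depth])
        gains.append((captured / depth) / base_rate)
    return gains
```

A top-group value of 4.0, for example, means the best-scoring group contains loyal customers at four times the rate of the general population, which is the “over 4x gain” read off the curve above.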
Could this model have been more effective with richer sets of input data? Yes. Would it have significantly changed the way marketers rank their customers in terms of a loyalty proxy? Not really.
That is why moving quickly with readily usable data is important. Models can be improved, but generally speaking, rankings do not shift drastically. In other words, a company that scored 3 or 4 on a loyalty scale won’t jump into the top group just because new types of data were introduced into the mix.

Now that we have proxy scores of loyalty (not carved in stone) for everyone in the base, here are our recommendations following this exercise:

  • Marketers can engage “likely to be loyal” customers (generally the top 2-3 model groups) more proactively and with special care.
  • At the bottom end of the curve (generally the bottom 3-4 model groups), marketers can identify “valuable, but vulnerable” customers by combining the loyalty model score with present value or, preferably, with a separate customer value model score. The organization can then proactively address those valuable-vulnerable customers to prevent churn.
  • Test, test, test. Modeling is an iterative exercise. Set up control groups for “no-treatment” segments, and continuously measure the effectiveness of the prediction. Tweak the models periodically and enhance them over time by adding other available data.
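The first two recommendations can be combined into a simple classification rule. This sketch assumes loyalty model groups numbered 1 (most loyal) through 10 and a 1-10 value score; the specific cutoffs are illustrative assumptions, not the article’s exact thresholds.

```python
def classify(loyalty_group, value_score, loyal_top=3, vulnerable_bottom=4,
             high_value=7, n_groups=10):
    """Cross loyalty model groups (1 = most loyal) with a value score
    to surface the 'valuable-vulnerable' segment; cutoffs are illustrative."""
    if loyalty_group <= loyal_top:
        return "likely-loyal"          # top 2-3 groups: engage with special care
    if loyalty_group > n_groups - vulnerable_bottom:
        if value_score >= high_value:
            return "valuable-vulnerable"  # priority for proactive retention
        return "low-value-vulnerable"
    return "middle"
```

The point of the rule is prioritization: retention effort goes first to the valuable-vulnerable cell, not to every customer with a low loyalty score.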

This example is only one of many possible ways to create proxies of loyalty. Depending on the business model and a number of factors, including immediate challenges, channel usage, and available data, the definition of loyalty and the appropriate modeling exercise can take different forms.

Ultimately, it is not about building the most mathematically sound model, but about treating customers properly in the order of their importance to your business. For that, the proxy score in hand now is much better than a perfect set of data that may never come.

About the Author:

Stephen H. Yu is the Practice Head, Advanced Analytics & Insights for eClerx. He is a world-class database marketer with a proven track record in comprehensive strategic planning and complete tactical execution, from data modeling to targeting and personalization based on advanced analytics.