Minimizing Churn Rate Through Analysis of Financial Habits
Background
Subscription products are often the main source of revenue for companies across industries. These products can take the form of a “one size fits all” all-encompassing subscription, or of multi-level memberships. Regardless of how memberships are structured, or what industry a company is in, it will almost always try to minimize customer churn (i.e., subscription cancellations).
To retain their customers, companies first need to identify behavioral patterns that act as catalysts for disengagement with the product.
- Market: The target audience is the entirety of a company’s subscription base. They are the customers the company wants to keep.
- Product: The subscription products that customers are already enrolled in can provide value that users may not have imagined, or that they may have forgotten.
Objective
The objective of this model is to find out which users are likely to churn, so that the company can focus on re-engaging those users with the product. These efforts can include email reminders about the benefits of the product, especially highlighting features that are new or that the user has shown they value.
In this case study we will be working for a fintech company that provides a subscription product which allows users to manage their bank accounts (savings accounts, credit cards, etc.), provides them with personalized coupons, informs them about the latest low-APR loans available in the market, and educates them on the best available ways to save money (e.g., free courses on financial health).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import random
dataset = pd.read_csv('churn_data.csv')  # users enrolled for 60 days; churn measured over the following 30 days
Exploratory data analysis
Now we will do some Exploratory Data Analysis (EDA), an approach to data analysis that employs a variety of techniques to:
- maximize insight into a data set
- uncover underlying structure
- extract important variables
- detect outliers and anomalies
- test underlying assumptions
- develop parsimonious models
- determine optimal factor settings
dataset.columns
Index(['user', 'churn', 'age', 'housing', 'credit_score', 'deposits',
'withdrawal', 'purchases_partners', 'purchases', 'cc_taken',
'cc_recommended', 'cc_disliked', 'cc_liked', 'cc_application_begin',
'app_downloaded', 'web_user', 'app_web_user', 'ios_user',
'android_user', 'registered_phones', 'payment_type', 'waiting_4_loan',
'cancelled_loan', 'received_loan', 'rejected_loan', 'zodiac_sign',
'left_for_two_month_plus', 'left_for_one_month', 'rewards_earned',
'reward_rate', 'is_referred'],
dtype='object')
dataset.head(5)
user | churn | age | housing | credit_score | deposits | withdrawal | purchases_partners | purchases | cc_taken | ... | waiting_4_loan | cancelled_loan | received_loan | rejected_loan | zodiac_sign | left_for_two_month_plus | left_for_one_month | rewards_earned | reward_rate | is_referred | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 55409 | 0 | 37.0 | na | NaN | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | Leo | 1 | 0 | NaN | 0.00 | 0 |
1 | 23547 | 0 | 28.0 | R | 486.0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | Leo | 0 | 0 | 44.0 | 1.47 | 1 |
2 | 58313 | 0 | 35.0 | R | 561.0 | 47 | 2 | 86 | 47 | 0 | ... | 0 | 0 | 0 | 0 | Capricorn | 1 | 0 | 65.0 | 2.17 | 0 |
3 | 8095 | 0 | 26.0 | R | 567.0 | 26 | 3 | 38 | 25 | 0 | ... | 0 | 0 | 0 | 0 | Capricorn | 0 | 0 | 33.0 | 1.10 | 1 |
4 | 61353 | 1 | 27.0 | na | NaN | 0 | 0 | 2 | 0 | 0 | ... | 0 | 0 | 0 | 0 | Aries | 1 | 0 | 1.0 | 0.03 | 0 |
5 rows × 31 columns
dataset.describe()
user | churn | age | credit_score | deposits | withdrawal | purchases_partners | purchases | cc_taken | cc_recommended | ... | registered_phones | waiting_4_loan | cancelled_loan | received_loan | rejected_loan | left_for_two_month_plus | left_for_one_month | rewards_earned | reward_rate | is_referred | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 27000.000000 | 27000.000000 | 26996.000000 | 18969.000000 | 27000.000000 | 27000.000000 | 27000.000000 | 27000.000000 | 27000.000000 | 27000.000000 | ... | 27000.000000 | 27000.000000 | 27000.000000 | 27000.000000 | 27000.000000 | 27000.000000 | 27000.000000 | 23773.000000 | 27000.000000 | 27000.000000 |
mean | 35422.702519 | 0.413852 | 32.219921 | 542.944225 | 3.341556 | 0.307000 | 28.062519 | 3.273481 | 0.073778 | 92.625778 | ... | 0.420926 | 0.001296 | 0.018815 | 0.018185 | 0.004889 | 0.173444 | 0.018074 | 29.110125 | 0.907684 | 0.318037 |
std | 20321.006678 | 0.492532 | 9.964838 | 61.059315 | 9.131406 | 1.055416 | 42.219686 | 8.953077 | 0.437299 | 88.869343 | ... | 0.912831 | 0.035981 | 0.135873 | 0.133623 | 0.069751 | 0.378638 | 0.133222 | 21.973478 | 0.752016 | 0.465723 |
min | 1.000000 | 0.000000 | 17.000000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 |
25% | 17810.500000 | 0.000000 | 25.000000 | 507.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 10.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9.000000 | 0.200000 | 0.000000 |
50% | 35749.000000 | 0.000000 | 30.000000 | 542.000000 | 0.000000 | 0.000000 | 9.000000 | 0.000000 | 0.000000 | 65.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 25.000000 | 0.780000 | 0.000000 |
75% | 53244.250000 | 1.000000 | 37.000000 | 578.000000 | 1.000000 | 0.000000 | 43.000000 | 1.000000 | 0.000000 | 164.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 48.000000 | 1.530000 | 1.000000 |
max | 69658.000000 | 1.000000 | 91.000000 | 838.000000 | 65.000000 | 29.000000 | 1067.000000 | 63.000000 | 29.000000 | 522.000000 | ... | 5.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 114.000000 | 4.000000 | 1.000000 |
8 rows × 28 columns
A few quick things to note:
- About 41% of users have churned
- The average user age is about 32
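These figures can be read straight off the raw columns; a quick check using the `dataset` already loaded:
# Quick confirmation of the headline numbers from describe()
print("Churn rate:  %.1f%%" % (dataset['churn'].mean() * 100))
print("Average age: %.1f" % dataset['age'].mean())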
dataset.shape
(27000, 31)
dataset.dtypes
user int64
churn int64
age float64
housing object
credit_score float64
deposits int64
withdrawal int64
purchases_partners int64
purchases int64
cc_taken int64
cc_recommended int64
cc_disliked int64
cc_liked int64
cc_application_begin int64
app_downloaded int64
web_user int64
app_web_user int64
ios_user int64
android_user int64
registered_phones int64
payment_type object
waiting_4_loan int64
cancelled_loan int64
received_loan int64
rejected_loan int64
zodiac_sign object
left_for_two_month_plus int64
left_for_one_month int64
rewards_earned float64
reward_rate float64
is_referred int64
dtype: object
**Data Cleaning:** Next, we will clean the data and then continue with more EDA.
# Inspect rows with invalid credit scores (standard scores start at 300)
dataset[dataset.credit_score < 300]
# Keep only rows with valid scores; note this also drops rows where credit_score is NaN
dataset = dataset[dataset.credit_score >= 300]
# Check null values
dataset.isna().sum()
user 0
churn 0
age 0
housing 0
credit_score 0
deposits 0
withdrawal 0
purchases_partners 0
purchases 0
cc_taken 0
cc_recommended 0
cc_disliked 0
cc_liked 0
cc_application_begin 0
app_downloaded 0
web_user 0
app_web_user 0
ios_user 0
android_user 0
registered_phones 0
payment_type 0
waiting_4_loan 0
cancelled_loan 0
received_loan 0
rejected_loan 0
zodiac_sign 0
left_for_two_month_plus 0
left_for_one_month 0
rewards_earned 1190
reward_rate 0
is_referred 0
dtype: int64
Credit score and rewards_earned had a significant number of null values in the original data. Rather than impute them, we will drop both columns from our model.
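Before dropping them, we can quantify how much is still missing at this point; a small sketch:
# Share of missing values per column, highest first (only columns with any nulls)
missing_frac = dataset.isna().mean().sort_values(ascending=False)
print(missing_frac[missing_frac > 0])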
dataset = dataset.drop(columns = ['credit_score', 'rewards_earned'])
# Features Histograms
fig, ax = plt.subplots(3,3, figsize=(20, 14))
sns.distplot(dataset.age, bins = 20, ax=ax[0,0])
sns.distplot(dataset.purchases_partners, bins = 20, ax=ax[0,1])
sns.distplot(dataset.app_downloaded, bins = 20, ax=ax[0,2])
sns.distplot(dataset.deposits, bins = 20, ax=ax[1,0])
sns.distplot(dataset.withdrawal, bins = 20, ax=ax[1,1])
sns.distplot(dataset.cc_application_begin, bins = 20, ax=ax[1,2])
sns.distplot(dataset.cc_recommended, bins = 20, ax=ax[2,0])
sns.distplot(dataset.cancelled_loan, bins = 20, ax=ax[2,1])
sns.distplot(dataset.reward_rate, bins = 20, ax=ax[2,2])
plt.show()
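Note that `sns.distplot` is deprecated in newer seaborn releases; on recent versions each panel can be drawn with `histplot` instead, for example:
# Equivalent single panel on newer seaborn, where distplot is deprecated
fig, ax = plt.subplots(figsize=(6, 4))
sns.histplot(dataset['age'], bins=20, kde=True, ax=ax)
plt.show()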
A few things to note:
- Age: The distribution is right-skewed, which makes intuitive sense, as older people are less likely to use the service
- Deposits/withdrawals: The majority of users have no deposits or withdrawals (the data covers only the first couple of months of enrollment, so activity can still be low)
## Pie Plots
dataset2 = dataset[['housing', 'is_referred', 'app_downloaded',
                    'web_user', 'app_web_user', 'ios_user',
                    'android_user', 'registered_phones', 'payment_type',
                    'waiting_4_loan', 'cancelled_loan',
                    'received_loan', 'rejected_loan', 'zodiac_sign',
                    'left_for_two_month_plus', 'left_for_one_month']]
fig = plt.figure(figsize=(15, 12))
#plt.suptitle('Pie Chart Distributions', fontsize=20)
for i in range(1, dataset2.shape[1] + 1):
    plt.subplot(6, 3, i)
    f = plt.gca()
    f.axes.get_yaxis().set_visible(False)
    f.set_title(dataset2.columns.values[i - 1])
    values = dataset2.iloc[:, i - 1].value_counts(normalize = True).values
    index = dataset2.iloc[:, i - 1].value_counts(normalize = True).index
    plt.pie(values, labels = index, autopct='%1.1f%%')
    plt.axis('equal')
fig.tight_layout(rect=[0, 0.03, 0.9, 2.1])
plt.show()
A few things we notice:
- Housing: The majority of users are not homeowners; there is a good share of renters, and most users are unclassified ('na')
- Payment type: Bi-weekly is the most common
- Zodiac sign: Fairly evenly distributed, except perhaps for Capricorn
It is also worth noting that features such as 'waiting_4_loan', 'cancelled_loan', 'received_loan', 'rejected_loan' and 'left_for_one_month' are very unevenly distributed. We will explore them further to make sure these features will be useful for building our models.
## Exploring Uneven Features
dataset[dataset2.waiting_4_loan == 1].churn.value_counts()
0 15
1 3
Name: churn, dtype: int64
dataset[dataset2.cancelled_loan == 1].churn.value_counts()
0 194
1 187
Name: churn, dtype: int64
dataset[dataset2.received_loan == 1].churn.value_counts()
1 233
0 162
Name: churn, dtype: int64
dataset[dataset2.rejected_loan == 1].churn.value_counts()
1 64
0 17
Name: churn, dtype: int64
dataset[dataset2.left_for_one_month == 1].churn.value_counts()
1 207
0 184
Name: churn, dtype: int64
Within each of these subsets, both churn outcomes are reasonably well represented, so we do not see a strong reason to believe these fields are biased.
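The same check can be written as a single loop over the sparse flags, showing the group size and churn rate for each; a compact sketch:
# Churn rate among users with each rare flag set, plus the group size
rare_flags = ['waiting_4_loan', 'cancelled_loan', 'received_loan',
              'rejected_loan', 'left_for_one_month']
for col in rare_flags:
    subset = dataset[dataset[col] == 1]
    print("%-25s n=%4d  churn rate=%.2f" % (col, len(subset), subset['churn'].mean()))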
Next, we will check the correlation of each feature with the response variable.
## Correlation with Response Variable
dataset_corr = dataset.drop(columns = ['churn', 'user'])  # drop the response and the user id before computing correlations
dataset_corr.corrwith(dataset.churn).plot.bar(figsize=(20,10),
title = 'Correlation with Response variable',
fontsize = 15, rot = 45,
grid = True)
plt.show()
Age is negatively correlated with churn: the younger the user, the more likely they are to churn.
The same holds for deposits and withdrawals: the fewer deposits or withdrawals a user makes, the more likely they are to churn. This makes sense, because the less activity a user has, the more likely they are to leave.
Interestingly, 'cc_taken' is positively correlated with churn, meaning users who have taken a credit card are more likely to churn (i.e., they may not be happy with the credit card). This will be interesting to explore further.
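As a first pass at that follow-up, we can compare churn rates for users who did and did not take a credit card; a minimal sketch:
# Churn rate and group size, split by whether the user took a credit card
print(dataset.groupby(dataset['cc_taken'] > 0)['churn'].agg(['mean', 'count']))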
#Correlation matrix
corr=dataset_corr.corr()
sns.set(font_scale=1.3)
plt.figure(figsize=(24, 27))
sns.heatmap(corr, vmax=.8, linewidths=0.01,
square=True,annot=True,cmap='YlGnBu',linecolor="black")
plt.title('Correlation between features', fontsize = '32')
plt.show()
Ideally, every feature would be independent of the others and the matrix above would be close to 0 everywhere, meaning the features are not linearly related.
However, that is not the case here. As we see in the matrix, the correlation between 'android_user' and 'ios_user' is strongly negative. This makes sense: if you are an Android user, you are most likely not an iOS user. The correlation is not exactly -1 because some users probably use both platforms. We will drop one of these columns, as it does not bring any new information.
Additionally, the 'app_web_user' field is 1 only when 'web_user' and 'app_downloaded' are both 1 (i.e., it is a function of the two fields), so it is not an independent variable. Since we want independent variables, we will drop this field as well.
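That relationship is easy to verify before dropping the column; a quick check (run before the drops below):
# Check that app_web_user is exactly the AND of app_downloaded and web_user
derived = dataset['app_downloaded'] & dataset['web_user']
print((dataset['app_web_user'] == derived).all())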
# Removing Correlated Fields
dataset = dataset.drop(columns = ['app_web_user'])
dataset = dataset.drop(columns = ['ios_user'])
## Data Preparation
user_identifier = dataset['user']
dataset = dataset.drop(columns = ['user'])
One-hot encoding
We will use one-hot encoding, a simple process by which categorical variables are converted into binary indicator columns that ML algorithms can work with directly.
# One-Hot Encoding
dataset.housing.value_counts()  # distribution of the housing categories ('na', 'R', 'O')
dataset.groupby('housing')['churn'].nunique().reset_index()  # check that every housing category contains both churn outcomes
dataset = pd.get_dummies(dataset)  # one-hot encode all remaining object columns
dataset.columns
# Drop the dummy columns for the unknown ('na') categories
dataset = dataset.drop(columns = ['housing_na', 'zodiac_sign_na', 'payment_type_na'])
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(dataset.drop(columns = 'churn'), dataset['churn'],
test_size = 0.2,
random_state = 0)
# Balancing the Training Set (downsample the majority class)
y_train.value_counts()
pos_index = y_train[y_train.values == 1].index
neg_index = y_train[y_train.values == 0].index
if len(pos_index) > len(neg_index):
    higher = pos_index
    lower = neg_index
else:
    higher = neg_index
    lower = pos_index
np.random.seed(0)  # seed numpy's RNG, which np.random.choice below actually uses
higher = np.random.choice(higher, size=len(lower), replace=False)  # sample majority class without replacement
lower = np.asarray(lower)
new_indexes = np.concatenate((lower, higher))
X_train = X_train.loc[new_indexes]
y_train = y_train[new_indexes]
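A quick check confirms the two classes now have equal counts. An alternative worth noting (a sketch, not what is used here) would be to keep all rows and let the model reweight the classes instead:
# Verify the downsampled training set is balanced
print(y_train.value_counts())
# Alternative sketch: skip downsampling and reweight classes inside the model
# classifier = LogisticRegression(class_weight='balanced', random_state=0)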
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train2 = pd.DataFrame(sc_X.fit_transform(X_train))
X_test2 = pd.DataFrame(sc_X.transform(X_test))
X_train2.columns = X_train.columns.values
X_test2.columns = X_test.columns.values
X_train2.index = X_train.index.values
X_test2.index = X_test.index.values
X_train = X_train2
X_test = X_test2
Model building
# Fitting Model to the Training Set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=0, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
# Predicting Test Set
y_pred = classifier.predict(X_test)
# Evaluating Results
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, precision_score, recall_score
cm = confusion_matrix(y_test, y_pred)
accuracy_score(y_test, y_pred)
precision_score(y_test, y_pred) # tp / (tp + fp)
recall_score(y_test, y_pred) # tp / (tp + fn)
f1_score(y_test, y_pred)
df_cm = pd.DataFrame(cm, index = (0, 1), columns = (0, 1))
plt.figure(figsize = (10,7))
sns.set(font_scale=1.4)
sns.heatmap(df_cm, annot=True, fmt='g')
print("Test Data Accuracy: %0.4f" % accuracy_score(y_test, y_pred))
Test Data Accuracy: 0.6399
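The other metrics computed above are worth printing alongside accuracy, since accuracy alone gives an incomplete picture; a small sketch:
# Report the full set of test metrics, not just accuracy
print("Precision: %0.4f" % precision_score(y_test, y_pred))
print("Recall:    %0.4f" % recall_score(y_test, y_pred))
print("F1 score:  %0.4f" % f1_score(y_test, y_pred))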
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("SVM Accuracy: %0.3f (+/- %0.3f)" % (accuracies.mean(), accuracies.std() * 2))
SVM Accuracy: 0.652 (+/- 0.023)
# Analyzing Coefficients
pd.concat([pd.DataFrame(X_train.columns, columns = ["features"]),
pd.DataFrame(np.transpose(classifier.coef_), columns = ["coef"])
],axis = 1)
features | coef | |
---|---|---|
0 | age | -0.164204 |
1 | deposits | 0.034083 |
2 | withdrawal | 0.031360 |
3 | purchases_partners | -0.761356 |
4 | purchases | -0.188282 |
5 | cc_taken | 0.043743 |
6 | cc_recommended | 0.069329 |
7 | cc_disliked | 0.035589 |
8 | cc_liked | -0.003735 |
9 | cc_application_begin | 0.100384 |
10 | app_downloaded | -0.057729 |
11 | web_user | 0.146670 |
12 | android_user | -0.051212 |
13 | registered_phones | 0.098973 |
14 | waiting_4_loan | -0.024642 |
15 | cancelled_loan | 0.098474 |
16 | received_loan | 0.102405 |
17 | rejected_loan | 0.124107 |
18 | left_for_two_month_plus | 0.025957 |
19 | left_for_one_month | 0.056692 |
20 | reward_rate | -0.282190 |
21 | is_referred | 0.041676 |
22 | housing_O | -0.038284 |
23 | housing_R | 0.046987 |
24 | payment_type_Bi-Weekly | -0.064121 |
25 | payment_type_Monthly | -0.054434 |
26 | payment_type_Semi-Monthly | -0.038202 |
27 | payment_type_Weekly | 0.043886 |
28 | zodiac_sign_Aquarius | 0.004534 |
29 | zodiac_sign_Aries | 0.042943 |
30 | zodiac_sign_Cancer | 0.041694 |
31 | zodiac_sign_Capricorn | 0.066223 |
32 | zodiac_sign_Gemini | 0.032779 |
33 | zodiac_sign_Leo | 0.016060 |
34 | zodiac_sign_Libra | 0.005363 |
35 | zodiac_sign_Pisces | 0.056116 |
36 | zodiac_sign_Sagittarius | 0.032343 |
37 | zodiac_sign_Scorpio | 0.002033 |
38 | zodiac_sign_Taurus | 0.013963 |
39 | zodiac_sign_Virgo | 0.041674 |
# Recursive Feature Elimination
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
# Model to Test
classifier = LogisticRegression()
# Select the best 20 features
rfe = RFE(classifier, n_features_to_select = 20)
rfe = rfe.fit(X_train, y_train)
# summarize the selection of the attributes
print(rfe.support_)
[ True False False True True True True False False True True True
True True False True True True False True True True True True
False False False True False False False True False False False False
False False False False]
print(rfe.ranking_)
[ 1 9 7 1 1 1 1 5 18 1 1 1 1 1 15 1 1 1 12 1 1 1 1 1
3 2 4 1 20 8 10 1 13 16 19 6 14 21 17 11]
X_train.columns[rfe.support_]
Index(['age', 'purchases_partners', 'purchases', 'cc_taken', 'cc_recommended',
'cc_application_begin', 'app_downloaded', 'web_user', 'android_user',
'registered_phones', 'cancelled_loan', 'received_loan', 'rejected_loan',
'left_for_one_month', 'reward_rate', 'is_referred', 'housing_O',
'housing_R', 'payment_type_Weekly', 'zodiac_sign_Capricorn'],
dtype='object')
# New Correlation Matrix
sns.set(style="white")
# Compute the correlation matrix
corr = X_train[X_train.columns[rfe.support_]].corr()
# Generate a mask for the upper triangle
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(18, 15))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
plt.show()
# Fitting Model to the Training Set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train[X_train.columns[rfe.support_]], y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
# Predicting Test Set
y_pred = classifier.predict(X_test[X_train.columns[rfe.support_]])
# Evaluating Results
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, precision_score, recall_score
cm = confusion_matrix(y_test, y_pred)
accuracy_score(y_test, y_pred)
precision_score(y_test, y_pred) # tp / (tp + fp)
recall_score(y_test, y_pred) # tp / (tp + fn)
f1_score(y_test, y_pred)
0.6318327974276529
df_cm = pd.DataFrame(cm, index = (0, 1), columns = (0, 1))
plt.figure(figsize = (10,7))
sns.set(font_scale=1.4)
sns.heatmap(df_cm, annot=True, fmt='g')
print("Test Data Accuracy: %0.4f" % accuracy_score(y_test, y_pred))
Test Data Accuracy: 0.6378
#Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier,
X = X_train[X_train.columns[rfe.support_]],
y = y_train, cv = 10)
print("SVM Accuracy: %0.3f (+/- %0.3f)" % (accuracies.mean(), accuracies.std() * 2))
SVM Accuracy: 0.651 (+/- 0.027)
# Analyzing Coefficients
pd.concat([pd.DataFrame(X_train[X_train.columns[rfe.support_]].columns, columns = ["features"]),
pd.DataFrame(np.transpose(classifier.coef_), columns = ["coef"])
],axis = 1)
features | coef | |
---|---|---|
0 | age | -0.164930 |
1 | purchases_partners | -0.753390 |
2 | purchases | -0.139449 |
3 | cc_taken | 0.050407 |
4 | cc_recommended | 0.071737 |
5 | cc_application_begin | 0.104003 |
6 | app_downloaded | -0.057866 |
7 | web_user | 0.147560 |
8 | android_user | -0.051496 |
9 | registered_phones | 0.099842 |
10 | cancelled_loan | 0.098161 |
11 | received_loan | 0.101879 |
12 | rejected_loan | 0.122173 |
13 | left_for_one_month | 0.056976 |
14 | reward_rate | -0.287096 |
15 | is_referred | 0.042179 |
16 | housing_O | -0.038413 |
17 | housing_R | 0.048422 |
18 | payment_type_Weekly | 0.089569 |
19 | zodiac_sign_Capricorn | 0.051997 |
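Because the features were standardized before fitting, the coefficient magnitudes give a rough ranking of feature influence; a quick sketch to sort them:
# Rank features of the reduced model by the absolute size of their coefficients
coef_df = pd.DataFrame({'features': X_train.columns[rfe.support_],
                        'coef': classifier.coef_[0]})
coef_df['abs_coef'] = coef_df['coef'].abs()
print(coef_df.sort_values('abs_coef', ascending=False).drop(columns='abs_coef'))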
# Formatting Final Results
final_results = pd.concat([y_test, user_identifier], axis = 1).dropna()
# Assign predictions as a Series indexed like y_test so they align with the correct users
final_results['predicted_churn'] = pd.Series(y_pred, index = y_test.index)
final_results = final_results[['user', 'churn', 'predicted_churn']].reset_index(drop=True)
final_results
user | churn | predicted_churn | |
---|---|---|---|
0 | 20839 | 0.0 | 1 |
1 | 15359 | 1.0 | 0 |
2 | 34210 | 1.0 | 0 |
3 | 57608 | 1.0 | 1 |
4 | 11790 | 0.0 | 0 |
5 | 1826 | 1.0 | 1 |
6 | 8508 | 0.0 | 1 |
7 | 50946 | 1.0 | 1 |
8 | 50130 | 1.0 | 0 |
9 | 55422 | 0.0 | 0 |
10 | 259 | 1.0 | 1 |
11 | 17451 | 0.0 | 0 |
12 | 41909 | 0.0 | 0 |
13 | 38825 | 0.0 | 1 |
14 | 19314 | 1.0 | 1 |
15 | 26916 | 0.0 | 0 |
16 | 30614 | 0.0 | 1 |
17 | 30329 | 1.0 | 1 |
18 | 38853 | 0.0 | 1 |
19 | 15592 | 1.0 | 1 |
20 | 40888 | 0.0 | 1 |
21 | 17918 | 1.0 | 0 |
22 | 52613 | 0.0 | 0 |
23 | 725 | 0.0 | 1 |
24 | 51797 | 0.0 | 0 |
25 | 2601 | 0.0 | 0 |
26 | 33990 | 0.0 | 1 |
27 | 10006 | 0.0 | 0 |
28 | 19296 | 1.0 | 0 |
29 | 12135 | 1.0 | 1 |
... | ... | ... | ... |
3763 | 64494 | 1.0 | 0 |
3764 | 1185 | 0.0 | 0 |
3765 | 17908 | 0.0 | 1 |
3766 | 52426 | 0.0 | 1 |
3767 | 41552 | 0.0 | 0 |
3768 | 52762 | 1.0 | 0 |
3769 | 35892 | 1.0 | 1 |
3770 | 28025 | 1.0 | 0 |
3771 | 55416 | 0.0 | 0 |
3772 | 14997 | 0.0 | 1 |
3773 | 25667 | 0.0 | 1 |
3774 | 44166 | 0.0 | 1 |
3775 | 50893 | 1.0 | 0 |
3776 | 10975 | 1.0 | 1 |
3777 | 38184 | 0.0 | 0 |
3778 | 31601 | 0.0 | 0 |
3779 | 31167 | 0.0 | 0 |
3780 | 51126 | 0.0 | 1 |
3781 | 58440 | 0.0 | 0 |
3782 | 65088 | 0.0 | 1 |
3783 | 26821 | 0.0 | 0 |
3784 | 25599 | 0.0 | 1 |
3785 | 3369 | 0.0 | 1 |
3786 | 33587 | 1.0 | 0 |
3787 | 22318 | 0.0 | 1 |
3788 | 67681 | 0.0 | 1 |
3789 | 49145 | 1.0 | 0 |
3790 | 47206 | 0.0 | 0 |
3791 | 22377 | 0.0 | 0 |
3792 | 47663 | 1.0 | 0 |
3793 rows × 3 columns
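These predictions can feed directly into the re-engagement effort described in the conclusion; for example, the list of users flagged as likely churners could be pulled with something like:
# Users the model flags as likely to churn: candidates for re-engagement outreach
at_risk_users = final_results.loc[final_results['predicted_churn'] == 1, 'user']
print("%d users flagged for re-engagement" % len(at_risk_users))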
Conclusion
Our model has given us an indication of which users are likely to churn. We have purposefully left the date of the expected churn open-ended, because we are focused on gauging the features that indicate disengagement with the product, not the exact manner in which users will disengage. We chose this open-ended emphasis to capture even users who are only slightly likely to churn: we are not aiming to create new products for people who are certain to leave, but for people who are starting to lose interest in the app.
If, after creating new product features, our model starts predicting that fewer of our users are going to churn, we can assume our customers are feeling more engaged with what we are offering them.
We can move forward with these new efforts by asking users for their opinion of the new features (e.g., via a survey). If we want to transition into predicting churn more accurately, in order to put emphasis directly on the users who are about to leave the product, we can add a time dimension to churn, which would make the model more precise.