
[大数据部落] Predicting telecom customer churn in R with a k-nearest-neighbors (KNN) model

Published: 2022-12-11 17:37:43

 

Data background

A telephone company wants to determine which customer characteristics are useful for predicting churn, that is, which customers will leave its service.

The data set is Churn. The fields are as follows:

 

State: discrete
account length: continuous
area code: continuous
phone number: discrete
international plan: discrete
voice mail plan: discrete
number vmail messages: continuous
total day minutes: continuous
total day calls: continuous
total day charge: continuous
total eve minutes: continuous
total eve calls: continuous
total eve charge: continuous
total night minutes: continuous
total night calls: continuous
total night charge: continuous
total intl minutes: continuous
total intl calls: continuous
total intl charge: continuous
number customer service calls: continuous
churn: discrete

Data Preparation and Exploration 

 

View a summary of the data:

## state account.length area.code phone.number
## WV : 158 Min. : 1.0 Min. :408.0 327-1058: 1
## MN : 125 1st Qu.: 73.0 1st Qu.:408.0 327-1319: 1
## AL : 124 Median :100.0 Median :415.0 327-2040: 1
## ID : 119 Mean :100.3 Mean :436.9 327-2475: 1
## VA : 118 3rd Qu.:127.0 3rd Qu.:415.0 327-3053: 1
## OH : 116 Max. :243.0 Max. :510.0 327-3587: 1
## (Other):4240 (Other) :4994
## international.plan voice.mail.plan number.vmail.messages
## no :4527 no :3677 Min. : 0.000
## yes: 473 yes:1323 1st Qu.: 0.000
## Median : 0.000
## Mean : 7.755
## 3rd Qu.:17.000
## Max. :52.000
##
## total.day.minutes total.day.calls total.day.charge total.eve.minutes
## Min. : 0.0 Min. : 0 Min. : 0.00 Min. : 0.0
## 1st Qu.:143.7 1st Qu.: 87 1st Qu.:24.43 1st Qu.:166.4
## Median :180.1 Median :100 Median :30.62 Median :201.0
## Mean :180.3 Mean :100 Mean :30.65 Mean :200.6
## 3rd Qu.:216.2 3rd Qu.:113 3rd Qu.:36.75 3rd Qu.:234.1
## Max. :351.5 Max. :165 Max. :59.76 Max. :363.7
##
## total.eve.calls total.eve.charge total.night.minutes total.night.calls
## Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 87.0 1st Qu.:14.14 1st Qu.:166.9 1st Qu.: 87.00
## Median :100.0 Median :17.09 Median :200.4 Median :100.00
## Mean :100.2 Mean :17.05 Mean :200.4 Mean : 99.92
## 3rd Qu.:114.0 3rd Qu.:19.90 3rd Qu.:234.7 3rd Qu.:113.00
## Max. :170.0 Max. :30.91 Max. :395.0 Max. :175.00
##
## total.night.charge total.intl.minutes total.intl.calls total.intl.charge
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. :0.000
## 1st Qu.: 7.510 1st Qu.: 8.50 1st Qu.: 3.000 1st Qu.:2.300
## Median : 9.020 Median :10.30 Median : 4.000 Median :2.780
## Mean : 9.018 Mean :10.26 Mean : 4.435 Mean :2.771
## 3rd Qu.:10.560 3rd Qu.:12.00 3rd Qu.: 6.000 3rd Qu.:3.240
## Max. :17.770 Max. :20.00 Max. :20.000 Max. :5.400
##
## number.customer.service.calls churn
## Min. :0.00 False.:4293
## 1st Qu.:1.00 True. : 707
## Median :1.00
## Mean :1.57
## 3rd Qu.:2.00
## Max. :9.00
##


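The summary shows that the data contain no missing values. It also shows that phone number and area code carry no predictive value, so both can be dropped.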
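A minimal R sketch of this step, assuming the data sit in a data frame called `churn` with the column names printed in the summary. The four-row data frame below is made up purely so the code runs on its own; it is not the real data.

```r
# Hypothetical stand-in for the real data; only the column names follow the summary above
churn <- data.frame(
  state             = factor(c("WV", "MN", "AL", "ID")),
  account.length    = c(101, 73, 127, 100),
  area.code         = c(408, 415, 510, 415),
  phone.number      = c("327-1058", "327-1319", "327-2040", "327-2475"),
  total.day.minutes = c(180.1, 143.7, 216.2, 351.5),
  churn             = factor(c("False.", "False.", "True.", "False."))
)

summary(churn)                                # per-column overview

# Count missing values in every column; all zeros means nothing is missing
missing_per_column <- colSums(is.na(churn))
missing_per_column

# Drop the identifier-like columns that are not useful as predictors
drop_cols   <- c("phone.number", "area.code")
churn_clean <- churn[, !(names(churn) %in% drop_cols)]
names(churn_clean)
```

Dropping by name rather than by position keeps the step valid if the column order changes.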

 

Examine the variables graphically 

 


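The plots above show that customers who did not churn far outnumber those who did (4,293 versus 707 in the summary), so non-churners make up the large majority of the sample.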
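The class balance can be checked with a few lines of R. The counts below are copied from the `churn` column of the summary output; the variable names are only illustrative.

```r
# Class counts taken from the summary output above
churn_counts <- c("False." = 4293, "True." = 707)

# Share of each class in the 5000 records
churn_share <- churn_counts / sum(churn_counts)
round(churn_share, 4)

# With the full data frame loaded (assumed to be called `churn`), the same
# shares would come from: prop.table(table(churn$churn))
```

Roughly one customer in seven churned, which matters later when reading accuracy figures: a model that always predicts "no churn" is already right most of the time.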

 


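The plots above show that, apart from emailcode (presumably number.vmail.messages) and area.code, the numeric variables are approximately normally distributed.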

##  account.length    area.code     number.vmail.messages total.day.minutes
## Min. : 1.0 Min. :408.0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 73.0 1st Qu.:408.0 1st Qu.: 0.000 1st Qu.:143.7
## Median :100.0 Median :415.0 Median : 0.000 Median :180.1
## Mean :100.3 Mean :436.9 Mean : 7.755 Mean :180.3
## 3rd Qu.:127.0 3rd Qu.:415.0 3rd Qu.:17.000 3rd Qu.:216.2
## Max. :243.0 Max. :510.0 Max. :52.000 Max. :351.5
## total.day.calls total.day.charge total.eve.minutes total.eve.calls
## Min. : 0 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 87 1st Qu.:24.43 1st Qu.:166.4 1st Qu.: 87.0
## Median :100 Median :30.62 Median :201.0 Median :100.0
## Mean :100 Mean :30.65 Mean :200.6 Mean :100.2
## 3rd Qu.:113 3rd Qu.:36.75 3rd Qu.:234.1 3rd Qu.:114.0
## Max. :165 Max. :59.76 Max. :363.7 Max. :170.0
## total.eve.charge total.night.minutes total.night.calls total.night.charge
## Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.000
## 1st Qu.:14.14 1st Qu.:166.9 1st Qu.: 87.00 1st Qu.: 7.510
## Median :17.09 Median :200.4 Median :100.00 Median : 9.020
## Mean :17.05 Mean :200.4 Mean : 99.92 Mean : 9.018
## 3rd Qu.:19.90 3rd Qu.:234.7 3rd Qu.:113.00 3rd Qu.:10.560
## Max. :30.91 Max. :395.0 Max. :175.00 Max. :17.770
## total.intl.minutes total.intl.calls total.intl.charge
## Min. : 0.00 Min. : 0.000 Min. :0.000
## 1st Qu.: 8.50 1st Qu.: 3.000 1st Qu.:2.300
## Median :10.30 Median : 4.000 Median :2.780
## Mean :10.26 Mean : 4.435 Mean :2.771
## 3rd Qu.:12.00 3rd Qu.: 6.000 3rd Qu.:3.240
## Max. :20.00 Max. :20.000 Max. :5.400
## number.customer.service.calls
## Min. :0.00
## 1st Qu.:1.00
## Median :1.00
## Mean :1.57
## 3rd Qu.:2.00
## Max. :9.00


Relationships between variables


The results show a significant positive linear relationship between the two variables.


Using the statistics node, report

##                               account.length    area.code
## account.length 1.0000000000 -0.018054187
## area.code -0.0180541874 1.000000000
## number.vmail.messages -0.0145746663 -0.003398983
## total.day.minutes -0.0010174908 -0.019118245
## total.day.calls 0.0282402279 -0.019313854
## total.day.charge -0.0010191980 -0.019119256
## total.eve.minutes -0.0095913331 0.007097877
## total.eve.calls 0.0091425790 -0.012299947
## total.eve.charge -0.0095873958 0.007114130
## total.night.minutes 0.0006679112 0.002083626
## total.night.calls -0.0078254785 0.014656846
## total.night.charge 0.0006558937 0.002070264
## total.intl.minutes 0.0012908394 -0.004153729
## total.intl.calls 0.0142772733 -0.013623309
## total.intl.charge 0.0012918112 -0.004219099
## number.customer.service.calls -0.0014447918 0.020920513
## number.vmail.messages total.day.minutes
## account.length -0.0145746663 -0.001017491
## area.code -0.0033989831 -0.019118245
## number.vmail.messages 1.0000000000 0.005381376
## total.day.minutes 0.0053813760 1.000000000
## total.day.calls 0.0008831280 0.001935149
## total.day.charge 0.0053767959 0.999999951
## total.eve.minutes 0.0194901208 -0.010750427
## total.eve.calls -0.0039543728 0.008128130
## total.eve.charge 0.0194959757 -0.010760022
## total.night.minutes 0.0055413838 0.011798660
## total.night.calls 0.0026762202 0.004236100
## total.night.charge 0.0055349281 0.011782533
## total.intl.minutes 0.0024627018 -0.019485746
## total.intl.calls 0.0001243302 -0.001303123
## total.intl.charge 0.0025051773 -0.019414797
## number.customer.service.calls -0.0070856427 0.002732576
## total.day.calls total.day.charge
## account.length 0.0282402279 -0.001019198
## area.code -0.0193138545 -0.019119256
## number.vmail.messages 0.0008831280 0.005376796
## total.day.minutes 0.0019351487 0.999999951
## total.day.calls 1.0000000000 0.001935884
## total.day.charge 0.0019358844 1.000000000
## total.eve.minutes -0.0006994115 -0.010747297
## total.eve.calls 0.0037541787 0.008129319
## total.eve.charge -0.0006952217 -0.010756893
## total.night.minutes 0.0028044650 0.011801434
## total.night.calls -0.0083083467 0.004234934
## total.night.charge 0.0028018169 0.011785301
## total.intl.minutes 0.0130972198 -0.019489700
## total.intl.calls 0.0108928533 -0.001306635
## total.intl.charge 0.0131613976 -0.019418755
## number.customer.service.calls -0.0107394951 0.002726370
## total.eve.minutes total.eve.calls
## account.length -0.0095913331 0.009142579
## area.code 0.0070978766 -0.012299947
## number.vmail.messages 0.0194901208 -0.003954373
## total.day.minutes -0.0107504274 0.008128130
## total.day.calls -0.0006994115 0.003754179
## total.day.charge -0.0107472968 0.008129319
## total.eve.minutes 1.0000000000 0.002763019
## total.eve.calls 0.0027630194 1.000000000
## total.eve.charge 0.9999997749 0.002778097
## total.night.minutes -0.0166391160 0.001781411
## total.night.calls 0.0134202163 -0.013682341
## total.night.charge -0.0166420421 0.001799380
## total.intl.minutes 0.0001365487 -0.007458458
## total.intl.calls 0.0083881559 0.005574500
## total.intl.charge 0.0001593155 -0.007507151
## number.customer.service.calls -0.0138234228 0.006234831
## total.eve.charge total.night.minutes
## account.length -0.0095873958 0.0006679112
## area.code 0.0071141298 0.0020836263
## number.vmail.messages 0.0194959757 0.0055413838
## total.day.minutes -0.0107600217 0.0117986600
## total.day.calls -0.0006952217 0.0028044650
## total.day.charge -0.0107568931 0.0118014339
## total.eve.minutes 0.9999997749 -0.0166391160
## total.eve.calls 0.0027780971 0.0017814106
## total.eve.charge 1.0000000000 -0.0166489191
## total.night.minutes -0.0166489191 1.0000000000
## total.night.calls 0.0134220174 0.0269718182
## total.night.charge -0.0166518367 0.9999992072
## total.intl.minutes 0.0001320238 -0.0067209669
## total.intl.calls 0.0083930603 -0.0172140162
## total.intl.charge 0.0001547783 -0.0066545873
## number.customer.service.calls -0.0138363623 -0.0085325365


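Keeping highly correlated variables in the model can cause multicollinearity, so one variable from each highly correlated pair should be removed. In the matrix above, each minutes column and its matching charge column (for example total.day.minutes and total.day.charge) correlate at more than 0.9999.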
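One way to find and drop such pairs is sketched below. The data are synthetic, the 0.9 threshold is an arbitrary choice, and the rate of 0.17 per minute is only what the summary implies (mean charge 30.65 divided by mean minutes 180.3); none of this is taken from the original code.

```r
set.seed(1)
n <- 200
minutes <- runif(n, 0, 350)

# Synthetic stand-in: charge is a fixed (assumed) rate times minutes
toy <- data.frame(
  total.day.minutes = minutes,
  total.day.charge  = 0.17 * minutes,
  total.day.calls   = rpois(n, 100)
)

cor_mat <- cor(toy)

# Blank out the lower triangle and diagonal so each pair is reported once
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA
high <- which(abs(cor_mat) > 0.9, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
high_pairs

# Keep the first variable of each pair and drop the second
reduced <- toy[, !(names(toy) %in% high_pairs$var2), drop = FALSE]
names(reduced)
```

Which member of a pair to keep is a judgment call; here the minutes column stays simply because it comes first.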

Data Manipulation


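The plots suggest some association between total.day.calls and total.day.charge, although their overall correlation in the matrix above is only about 0.002. In particular, the relationship appears negative among customers without a voice mail plan.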

 

 Discretize (make categorical) a relevant numeric variable  

 


 

 

Discretize the variable.
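A sketch of how a numeric column could be binned with `cut()`. The regression output later in the article lists the levels `medium` and `short` for `total.day.minutes1`; the third level name `long` and the cut points 120 and 240 are assumptions made here for illustration, since the original thresholds are not given.

```r
# A few values of total.day.minutes taken from the summary quantiles
total.day.minutes <- c(0, 95.5, 143.7, 180.1, 216.2, 351.5)

# Hypothetical cut points; intervals are (-Inf,120], (120,240], (240,Inf]
breaks <- c(-Inf, 120, 240, Inf)

total.day.minutes1 <- cut(total.day.minutes,
                          breaks = breaks,
                          labels = c("short", "medium", "long"))

table(total.day.minutes1)   # number of observations per bin
```

The result is a factor, so it can go straight into a model formula or a cross-tabulation against churn.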

 

Construct a distribution of the variable with a churn overlay


Construct a histogram of the variable with a churn overlay

 


 


 Find a pair of numeric variables which are interesting with respect to churn. 


The results suggest some association between total.day.calls and total.day.charge, particularly among customers who did not churn.

Model Building

##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.3082150 0.0735760 4.189 2.85e-05 ***
## stateAL 0.0151188 0.0462343 0.327 0.743680
## stateAR 0.0894792 0.0490897 1.823 0.068399 .
## stateAZ 0.0329566 0.0494195 0.667 0.504883
## stateCA 0.1951511 0.0567439 3.439 0.000588 ***
## international.plan yes 0.3059341 0.0151677 20.170 < 2e-16 ***
## voice.mail.plan yes -0.1375056 0.0337533 -4.074 4.70e-05 ***
## number.vmail.messages 0.0017068 0.0010988 1.553 0.120402
## total.day.minutes 0.3796323 0.2629027 1.444 0.148802
## total.day.calls 0.0002191 0.0002235 0.981 0.326781
## total.day.charge -2.2207671 1.5464583 -1.436 0.151056
## total.eve.minutes 0.0288233 0.1307496 0.220 0.825533
## total.eve.calls -0.0001585 0.0002238 -0.708 0.478915
## total.eve.charge -0.3316041 1.5382391 -0.216 0.829329
## total.night.minutes 0.0083224 0.0695916 0.120 0.904814
## total.night.calls -0.0001824 0.0002225 -0.820 0.412290
## total.night.charge -0.1760782 1.5464674 -0.114 0.909355
## total.intl.minutes -0.0104679 0.4192270 -0.025 0.980080
## total.intl.calls -0.0063448 0.0018062 -3.513 0.000447 ***
## total.intl.charge 0.0676460 1.5528267 0.044 0.965254
## number.customer.service.calls 0.0566474 0.0033945 16.688 < 2e-16 ***
## total.day.minutes1medium 0.0502681 0.0160228 3.137 0.001715 **
## total.day.minutes1short 0.2404020 0.0322293 7.459 1.02e-13 ***


 

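The results show that state (for example stateCA), total.intl.calls, number.customer.service.calls, total.day.minutes1medium, and total.day.minutes1short have significant effects. The table also marks international.plan and voice.mail.plan as highly significant (***).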
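The `t value` column in the table suggests an ordinary linear model fitted with `lm()` to a 0/1 churn indicator; that is an inference, not something the article states, and a logistic `glm()` would be the more common choice for a binary outcome. The sketch below fits such a model to synthetic data with made-up coefficients and pulls out the terms whose p-value is below 0.05.

```r
set.seed(42)
n <- 500

# Synthetic predictors; names follow the article, values are invented
toy <- data.frame(
  international.plan = factor(sample(c("no", "yes"), n, replace = TRUE,
                                     prob = c(0.9, 0.1))),
  number.customer.service.calls = rpois(n, 1.5)
)

# Synthetic churn probability rising with both predictors (made-up coefficients)
p <- plogis(-2.5 + 1.5 * (toy$international.plan == "yes") +
              0.4 * toy$number.customer.service.calls)
toy$churn <- rbinom(n, 1, p)

# Linear model on the 0/1 outcome, as the t statistics in the table suggest
fit <- lm(churn ~ international.plan + number.customer.service.calls, data = toy)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients
coef_table

# Names of the terms with p-value below 0.05
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]
significant
```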

Use K-Nearest-Neighbors (K-NN) algorithm to develop a model for predicting Churn 

##         Direction.2005
## knn.pred 1 2
## 1 760 97
## 2 100 43


[1] 0.803


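A confusion matrix is a tool for summarizing classifier results, used mainly in supervised learning; in unsupervised learning it is usually called a matching matrix. In the tables shown here, each row holds the instances predicted as a class (knn.pred) and each column holds the instances that actually belong to a class.
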
##         Direction.2005
## knn.pred 1 2
## 1 827 104
## 2 33 36



[1] 0.863
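The two accuracy figures follow directly from the confusion matrices: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total. The sketch below re-enters the two tables by hand and also computes the share of the larger actual class, which is the accuracy of always predicting class 1. The names `cm1`, `cm2`, and `accuracy` are illustrative.

```r
# Confusion matrices copied from the output above (rows: predicted, columns: actual)
labels <- list(knn.pred = c("1", "2"), actual = c("1", "2"))
cm1 <- matrix(c(760, 100, 97, 43),  nrow = 2, dimnames = labels)
cm2 <- matrix(c(827, 33, 104, 36),  nrow = 2, dimnames = labels)

# Accuracy = correctly classified cases (diagonal) / all cases
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

accuracy(cm1)   # 0.803
accuracy(cm2)   # 0.863

# Accuracy of always predicting the majority class, for comparison
baseline <- max(colSums(cm2)) / sum(cm2)
baseline        # 0.86
```

Comparing against this baseline gives a sense of how much the classifier adds beyond the class imbalance.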


 

On the test set, accuracy reaches 86%.
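For readers who want to reproduce this kind of result, here is a minimal KNN sketch using `knn()` from the `class` package. Everything about the data is an assumption: the two features, their distributions, the 200/100 split, and `k = 5` are invented for illustration, and the article does not say which settings were used.

```r
library(class)   # provides knn()

set.seed(123)
n <- 300

# Synthetic labels with roughly the class imbalance seen in the article
y <- factor(sample(c("1", "2"), n, replace = TRUE, prob = c(0.85, 0.15)))

# Synthetic features; class "2" is shifted so the classes partly separate
x <- data.frame(
  total.day.minutes = rnorm(n, mean = ifelse(y == "2", 230, 175), sd = 40),
  number.customer.service.calls = rpois(n, ifelse(y == "2", 3, 1.5))
)

train_idx <- sample(n, size = 200)   # 200 rows for training, 100 for testing

# Standardize with training-set statistics so no feature dominates the distance
mu  <- colMeans(x[train_idx, ])
sdv <- apply(x[train_idx, ], 2, sd)
train_x <- scale(x[train_idx, ],  center = mu, scale = sdv)
test_x  <- scale(x[-train_idx, ], center = mu, scale = sdv)

# Classify each test row by majority vote among its 5 nearest training rows
knn.pred <- knn(train = train_x, test = test_x, cl = y[train_idx], k = 5)

conf_mat      <- table(knn.pred, actual = y[-train_idx])
test_accuracy <- mean(knn.pred == y[-train_idx])
conf_mat
test_accuracy
```

Scaling matters for KNN because the method relies on distances: without it, a column measured in hundreds of minutes would outweigh a column that ranges from 0 to 9.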

 

Findings  

 

We find some association between total.day.calls and total.day.charge, particularly among customers who did not churn. State, total.intl.calls, number.customer.service.calls, total.day.minutes1medium, and total.day.minutes1short have significant effects. Finally, the KNN model reaches about 80% accuracy on the training set and 86% on the test set, which suggests that the model predicts well.

 

From: https://blog.51cto.com/u_14293657/5928409
