
Customer Value Analysis

Date: 2024-06-10 12:54:58


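I. Objectives and requirements
1. Learn the basics of processing data with numpy and pandas.
2. Learn how to extract features from customer records with the RFM model.
3. Learn how to standardize feature data.
4. Learn how to run K-Means clustering with sklearn and how to evaluate the result.
5. Learn how to visualize the analysis with matplotlib and pandas.
II. Experiment content
1. Preprocess the data with pandas and related Python libraries, compute the three feature indicators R, F, and M, and save the processed file.
2. Standardize the data with pandas and related libraries.
3. Build a clustering model with sklearn and the RFM method, cluster the customers by value, and evaluate the clustering result.
4. Visualize the clustering result with pandas and matplotlib.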
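
As a rough sketch of what the three RFM indicators measure, the snippet below computes them on a tiny made-up order table. The column names `buyer`, `amount`, and `paid_at`, the values, and the snapshot date are illustrative assumptions rather than fields of the real dataset. F and M are plain totals here, whereas the procedure in this report uses per-month averages.

```python
import pandas as pd

# Hypothetical toy orders; names and values are made up for illustration.
orders = pd.DataFrame({
    "buyer": ["a", "a", "b", "c", "c", "c"],
    "amount": [100.0, 50.0, 20.0, 10.0, 10.0, 30.0],
    "paid_at": pd.to_datetime([
        "2018-01-05", "2018-03-01", "2017-06-10",
        "2018-10-01", "2018-11-15", "2018-12-20",
    ]),
})
snapshot = pd.Timestamp("2018-12-31")  # assumed data-collection date

rfm = orders.groupby("buyer").agg(
    last_paid=("paid_at", "max"),
    F=("paid_at", "count"),   # frequency: number of paid orders
    M=("amount", "sum"),      # monetary: total amount paid
)
# Recency: days between the snapshot and the latest payment.
rfm["R"] = (snapshot - rfm["last_paid"]).dt.days
rfm = rfm[["R", "F", "M"]]
print(rfm)
```

A small R means a recent purchase, while large F and M mean frequent and high-value purchases.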
III. Procedure

1. Data preprocessing

(1) Import the required packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
from sklearn.cluster import KMeans
from datetime import datetime

(2) Read the file

datafile="/data/bigfiles/data2.csv"
data = pd.read_csv(datafile)

(3) Inspect basic statistics of the data

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2832 entries, 0 to 2831
Data columns (total 54 columns):
买家会员名           2660 non-null object
买家实际支付积分        2660 non-null float64
买家实际支付金额        2660 non-null float64
买家应付货款          2660 non-null float64
买家应付邮费          2660 non-null float64
买家支付宝账号         2658 non-null object
买家支付积分          2660 non-null float64
买家服务费           2660 non-null object
买家留言            163 non-null object
修改后的sku         0 non-null float64
修改后的收货地址        358 non-null object
分阶段订单信息         0 non-null float64
卖家服务费           2660 non-null float64
发票抬头            0 non-null float64
含应开票给个人的个人红包    0 non-null float64
天猫卡券抵扣          0 non-null float64
定金排名            0 non-null float64
宝贝总数量           2660 non-null float64
宝贝标题            2397 non-null object
宝贝种类            2660 non-null float64
店铺Id            1581 non-null float64
店铺名称            2660 non-null object
异常信息            0 non-null float64
总金额             2660 non-null float64
打款商家金额          2660 non-null object
支付单号            1560 non-null object
支付详情            1560 non-null object
收货人姓名           2660 non-null object
收货地址            2660 non-null object
新零售交易类型         2660 non-null object
新零售发货门店id       0 non-null float64
新零售发货门店名称       0 non-null float64
新零售导购门店id       0 non-null float64
新零售导购门店名称       0 non-null float64
是否上传合同照片        2660 non-null object
是否上传小票          2660 non-null object
是否代付            2660 non-null object
是否手机订单          1838 non-null object
是否是O2O交易        0 non-null float64
物流公司            1425 non-null object
物流单号            1425 non-null object
特权订金订单id        0 non-null float64
确认收货时间          1876 non-null object
联系手机            2659 non-null object
联系电话            130 non-null object
订单付款时间          2148 non-null object
订单关闭原因          2660 non-null object
订单创建时间          2660 non-null object
订单备注            695 non-null object
订单状态            2660 non-null object
运送方式            2660 non-null object
返点积分            2660 non-null float64
退款金额            2660 non-null float64
数据采集时间          2660 non-null object
dtypes: float64(25), object(29)
memory usage: 1.2+ MB
len(data)
2832
data.describe()
买家实际支付积分 买家实际支付金额 买家应付货款 买家应付邮费 买家支付积分 修改后的sku 分阶段订单信息 卖家服务费 发票抬头 含应开票给个人的个人红包 ... 异常信息 总金额 新零售发货门店id 新零售发货门店名称 新零售导购门店id 新零售导购门店名称 是否是O2O交易 特权订金订单id 返点积分 退款金额
count 2660.0 2660.000000 2660.000000 2660.000000 2660.0 0.0 0.0 2660.0 0.0 0.0 ... 0.0 2660.000000 0.0 0.0 0.0 0.0 0.0 0.0 2660.0 2660.000000
mean 0.0 155.113094 181.193241 1.257519 0.0 NaN NaN 0.0 NaN NaN ... NaN 182.450759 NaN NaN NaN NaN NaN NaN 0.0 10.436218
std 0.0 350.332509 366.871965 4.408725 0.0 NaN NaN 0.0 NaN NaN ... NaN 366.806966 NaN NaN NaN NaN NaN NaN 0.0 131.244263
min 0.0 0.000000 0.100000 0.000000 0.0 NaN NaN 0.0 NaN NaN ... NaN 0.100000 NaN NaN NaN NaN NaN NaN 0.0 0.000000
25% 0.0 43.890000 50.860000 0.000000 0.0 NaN NaN 0.0 NaN NaN ... NaN 51.870000 NaN NaN NaN NaN NaN NaN 0.0 0.000000
50% 0.0 62.860000 89.700000 0.000000 0.0 NaN NaN 0.0 NaN NaN ... NaN 90.130000 NaN NaN NaN NaN NaN NaN 0.0 0.000000
75% 0.0 199.000000 268.000000 0.000000 0.0 NaN NaN 0.0 NaN NaN ... NaN 268.000000 NaN NaN NaN NaN NaN NaN 0.0 0.000000
max 0.0 13246.800000 13246.800000 55.000000 0.0 NaN NaN 0.0 NaN NaN ... NaN 13246.800000 NaN NaN NaN NaN NaN NaN 0.0 3950.730000

8 rows × 25 columns

(4) Select the attribute columns

data.columns
Index(['买家会员名', '买家实际支付积分', '买家实际支付金额', '买家应付货款', '买家应付邮费', '买家支付宝账号',
       '买家支付积分', '买家服务费', '买家留言', '修改后的sku', '修改后的收货地址', '分阶段订单信息', '卖家服务费',
       '发票抬头', '含应开票给个人的个人红包', '天猫卡券抵扣', '定金排名', '宝贝总数量', '宝贝标题 ', '宝贝种类 ',
       '店铺Id', '店铺名称', '异常信息', '总金额', '打款商家金额', '支付单号', '支付详情', '收货人姓名',
       '收货地址', '新零售交易类型', '新零售发货门店id', '新零售发货门店名称', '新零售导购门店id', '新零售导购门店名称',
       '是否上传合同照片', '是否上传小票', '是否代付', '是否手机订单', '是否是O2O交易', '物流公司', '物流单号 ',
       '特权订金订单id', '确认收货时间', '联系手机', '联系电话 ', '订单付款时间', '订单关闭原因', '订单创建时间',
       '订单备注', '订单状态', '运送方式', '返点积分', '退款金额', '数据采集时间'],
      dtype='object')
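
Several names in the column index above end with a space (for example '宝贝标题 ' and '联系电话 '), which makes lookups by the clean name fail. A minimal sketch of one way to normalize such headers, using a made-up two-row CSV rather than the real file:

```python
import io
import pandas as pd

# Hypothetical two-row CSV whose header has trailing spaces, mimicking
# names such as '宝贝标题 ' in the output above.
raw = "买家会员名,宝贝标题 ,联系电话 \nalice,guitar,123\nbob,piano,456\n"
df = pd.read_csv(io.StringIO(raw))

print(list(df.columns))                # some names end with a space
df.columns = df.columns.str.strip()    # remove leading/trailing whitespace
print(list(df.columns))
```

The three columns used later in this report have no trailing spaces, so this step is optional here.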
data.订单状态.unique()
array(['买家已付款,等待卖家发货', '等待买家付款', '卖家已发货,等待买家确认', '交易关闭', '交易成功', nan],
      dtype=object)
data = data[data.订单状态 == '交易成功']
data
买家会员名 买家实际支付积分 买家实际支付金额 买家应付货款 买家应付邮费 买家支付宝账号 买家支付积分 买家服务费 买家留言 修改后的sku ... 联系电话 订单付款时间 订单关闭原因 订单创建时间 订单备注 订单状态 运送方式 返点积分 退款金额 数据采集时间
25 gang_2015 0.0 143.64 143.64 0.0 18104860223 0.0 0元 NaN NaN ... NaN 2018/1/27 订单未关闭 2018-01-27 09:57:23 NaN 交易成功 快递 0.0 0.0 2018/12/31
26 tb6683844_2011 0.0 55.86 55.86 0.0 17743451991 0.0 0元 NaN NaN ... NaN 2018/1/26 订单未关闭 2018-01-26 22:55:46 NaN 交易成功 快递 0.0 0.0 2018/12/31
30 dlzslv 0.0 90.72 90.72 0.0 zs-lv@sohu.com 0.0 0元 NaN NaN ... NaN 2018/1/26 订单未关闭 2018-01-26 13:37:22 NaN 交易成功 快递 0.0 0.0 2018/12/31
31 劳什子2010 0.0 48.86 48.86 0.0 tangzhai2010@163.com 0.0 0元 NaN NaN ... NaN 2018/1/26 订单未关闭 2018-01-26 10:12:18 v6 交易成功 快递 0.0 0.0 2018/12/31
32 李氏江江48 0.0 103.74 103.74 0.0 849694657@qq.com 0.0 0元 NaN NaN ... NaN 2018/1/26 订单未关闭 2018-01-26 06:48:35 NaN 交易成功 快递 0.0 0.0 2018/12/31
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2655 旋光精灵 0.0 999.00 999.00 0.0 20323624@qq.com 0.0 0元 NaN NaN ... NaN 2017/1/4 订单未关闭 2017/1/4 15:05 NaN 交易成功 快递 0.0 0.0 2018/12/31
2656 leryang 0.0 268.00 268.00 0.0 9722165@163.com 0.0 0元 NaN NaN ... NaN 2017/1/3 订单未关闭 2017/1/3 16:51 '中通快递:728773317678 交易成功 虚拟物品 0.0 0.0 2018/12/31
2657 leryang 0.0 134.00 134.00 0.0 9722165@163.com 0.0 0元 NaN NaN ... NaN 2017/1/3 订单未关闭 2017/1/3 16:51 NaN 交易成功 虚拟物品 0.0 0.0 2018/12/31
2658 crazy283 0.0 268.00 268.00 0.0 crazy355@126.com 0.0 0元 NaN NaN ... NaN 2017/1/3 订单未关闭 2017/1/3 16:01 '中通:728773317331 【月月 01-04 08:58】 交易成功 虚拟物品 0.0 0.0 2018/12/31
2659 zhangyang52058 0.0 63.70 48.70 15.0 13693516433 0.0 0元 NaN NaN ... NaN 2017/1/2 订单未关闭 2017/1/2 23:28 NaN 交易成功 快递 0.0 0.0 2018/12/31

1876 rows × 54 columns

# Select the columns we need:
# the buyer ID, the amount paid, and the payment time (used later to find each buyer's last payment)
data=data.filter(items=['买家会员名','打款商家金额','订单付款时间'])

(5) Handle abnormal data

# Count missing values in each column
datas=data.isnull().sum()
datas
买家会员名     0
打款商家金额    0
订单付款时间    0
dtype: int64
# Find rows that are exact duplicates of an earlier row
result=data.duplicated()
df=data[result]
df
买家会员名 打款商家金额 订单付款时间
71 qufan_xiao 100.00元 2018/1/20
119 kangfengtj 55.86元 2018/1/16
207 waterli2005 55.86元 2018/1/3
211 时尚乐器 268.00元 2018/1/3
584 猪头luing 254.00元 2018/6/22
... ... ... ...
2308 bill163com 200.00元 2017/6/21
2320 南山熊00340 268.00元 2017/6/13
2354 铭铭猪是的念倒 201.00元 2017/6/1
2446 夜沉晨 201.00元 2017/4/15
2533 chenzh3664951 201.00元 2017/3/14

103 rows × 3 columns

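# Drop exact duplicate rows
data=data.drop_duplicates()
# Drop rows with no payment
data.drop(data.loc[data['打款商家金额']=='0.00元'].index, inplace=True)
# Parse the payment date (format such as 2018/1/20)
data['订单付款时间'] = data.订单付款时间.map(lambda x: datetime.strptime(x, '%Y/%m/%d'))
# Strip the currency suffix '元' and convert the amount to float
data.打款商家金额 = data.打款商家金额.map(lambda x: re.sub('元','',x))
data.打款商家金额 = data.打款商家金额.map(lambda x: float(x))
# print(data)
# Per buyer: total amount paid, date of the last payment, number of payments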
data =data.groupby("买家会员名").agg({"打款商家金额":"sum","订单付款时间":"max","买家会员名":"count"})
data = data.rename(columns = {'打款商家金额':'总金额','买家会员名':'付款次数'})
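
The cleaning and aggregation above can also be written with vectorized string methods and named aggregation. The sketch below is an alternative on made-up rows shaped like the three selected columns; it is not the code that produced the outputs in this report.

```python
import pandas as pd

# Hypothetical rows in the same shape as the three selected columns.
df = pd.DataFrame({
    "买家会员名": ["a", "a", "b", "b"],
    "打款商家金额": ["100.00元", "55.86元", "0.00元", "268.00元"],
    "订单付款时间": ["2018/1/20", "2018/3/5", "2017/6/1", "2017/6/13"],
})

# Strip the currency suffix and convert to float in one vectorized step.
df["打款商家金额"] = df["打款商家金额"].str.replace("元", "", regex=False).astype(float)
# Parse dates such as 2018/1/20.
df["订单付款时间"] = pd.to_datetime(df["订单付款时间"], format="%Y/%m/%d")
# Keep only rows where money was actually paid.
df = df[df["打款商家金额"] > 0]

# Named aggregation produces the final column names directly.
summary = df.groupby("买家会员名").agg(
    总金额=("打款商家金额", "sum"),
    订单付款时间=("订单付款时间", "max"),
    付款次数=("打款商家金额", "count"),
)
print(summary)
```

Named aggregation avoids the separate rename step and does not rely on aggregating the grouping column itself.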

(6) Compute R

# Compute R:
# data-collection date minus the date of the last paid order
exdata_date=datetime(2018,12,31)
start_date=datetime(2017,1,2)
data['R(最后一次消费时间)']=exdata_date-data['订单付款时间']
data
总金额 订单付款时间 付款次数 R(最后一次消费时间)
买家会员名
00牛哥哥00 402.00 2017-02-06 2 693 days
020luo 74.70 2017-11-18 1 408 days
0587xueguangju 268.00 2017-04-14 1 626 days
0o秋天de童话 411.50 2018-10-09 2 83 days
0残缺0 48.86 2018-01-19 1 346 days
... ... ... ... ...
黑河市2013 47.88 2018-01-11 1 354 days
黑瑾瞳 158.44 2018-07-26 2 158 days
鼠标右键点 51.87 2018-12-12 1 19 days
龙星宇1018 198.00 2017-11-17 1 409 days
龙魂爱上凤灵 43.86 2017-12-13 1 383 days

1483 rows × 4 columns
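
The R column above has a timedelta dtype, which is why it prints as "693 days". If plain integers are easier to work with downstream, `.dt.days` converts it. A minimal sketch with three of the dates shown in the table, under hypothetical buyer labels:

```python
import pandas as pd
from datetime import datetime

# Three last-payment dates taken from the table above; the index labels are made up.
last_paid = pd.Series(
    pd.to_datetime(["2017-02-06", "2018-10-09", "2018-12-12"]),
    index=["buyer_a", "buyer_b", "buyer_c"],
)
snapshot = datetime(2018, 12, 31)  # data-collection date used in the text

r_timedelta = snapshot - last_paid   # timedelta dtype, prints as "693 days"
r_days = r_timedelta.dt.days         # plain integers
print(r_days)
```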

(7) Compute F

from math import ceil
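# Time from the start date to each customer's last purchase
period_day=data['订单付款时间']-start_date
# Empty list that will hold the number of months
period_month=[]
for i in period_day:
    period_month.append(ceil(i.days/30))
# First printout of the month counts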
print(period_month)
[2, 11, 4, 22, 13, 3, 9, 15, 8, 7, 17, 12, 24, 23, 24, 17, 22, 17, 17, 5, 24, 13, 18, 18, 11, 24, 13, 13, 9, 22, 8, 22, 22, 11, 11, 12, 15, 13, 6, 20, 17, 13, 13, 22, 8, 15, 4, 24, 11, 10, 24, 13, 12, 18, 13, 15, 13, 13, 13, 9, 12, 23, 11, 12, 24, 24, 23, 24, 10, 17, 11, 24, 24, 6, 22, 24, 19, 8, 12, 18, 12, 2, 19, 25, 6, 6, 10, 17, 12, 10, 5, 25, 15, 12, 9, 18, 8, 7, 18, 23, 18, 8, 22, 9, 3, 17, 3, 9, 7, 5, 3, 10, 9, 20, 12, 11, 24, 23, 18, 17, 23, 1, 15, 8, 9, 4, 24, 22, 13, 20, 22, 11, 15, 10, 15, 22, 11, 5, 12, 12, 19, 1, 13, 6, 9, 9, 15, 19, 19, 19, 9, 10, 17, 15, 17, 5, 24, 10, 9, 3, 23, 22, 13, 15, 15, 12, 24, 11, 9, 15, 22, 11, 8, 22, 12, 12, 22, 6, 22, 11, 18, 8, 22, 2, 4, 13, 23, 23, 23, 23, 15, 9, 23, 24, 23, 24, 9, 13, 7, 23, 12, 8, 10, 12, 23, 22, 10, 10, 23, 9, 19, 3, 15, 13, 12, 13, 13, 15, 10, 17, 9, 15, 13, 13, 15, 17, 12, 13, 13, 19, 11, 17, 3, 3, 18, 12, 13, 15, 15, 19, 9, 15, 10, 8, 13, 12, 22, 17, 17, 15, 5, 12, 15, 23, 18, 13, 17, 24, 11, 22, 13, 5, 14, 5, 5, 13, 15, 11, 7, 11, 24, 9, 7, 13, 13, 17, 15, 6, 14, 18, 23, 11, 24, 19, 8, 25, 11, 17, 13, 7, 23, 13, 22, 15, 24, 3, 22, 7, 17, 7, 19, 24, 12, 12, 22, 1, 12, 17, 12, 24, 10, 17, 6, 19, 15, 12, 18, 15, 12, 19, 19, 19, 23, 24, 17, 3, 13, 11, 12, 12, 6, 13, 13, 6, 18, 18, 20, 19, 4, 1, 18, 11, 17, 13, 7, 8, 18, 19, 12, 23, 13, 23, 10, 23, 24, 11, 10, 15, 19, 19, 11, 17, 2, 21, 13, 22, 15, 3, 13, 24, 20, 15, 17, 11, 1, 7, 4, 12, 12, 12, 17, 12, 18, 3, 22, 23, 8, 23, 18, 10, 17, 13, 12, 23, 13, 7, 24, 23, 21, 18, 10, 24, 7, 18, 23, 5, 22, 8, 11, 13, 7, 8, 9, 7, 13, 18, 15, 9, 8, 5, 3, 7, 15, 15, 5, 20, 22, 25, 9, 19, 15, 24, 24, 14, 11, 13, 4, 13, 19, 2, 7, 13, 24, 8, 12, 11, 12, 11, 11, 12, 10, 3, 17, 3, 11, 7, 17, 6, 12, 11, 8, 12, 11, 15, 4, 17, 22, 3, 11, 13, 19, 3, 18, 12, 20, 13, 2, 10, 12, 12, 13, 2, 21, 24, 12, 24, 23, 15, 17, 13, 17, 15, 22, 25, 24, 2, 3, 3, 11, 11, 9, 18, 13, 22, 7, 17, 11, 12, 24, 5, 19, 8, 9, 10, 23, 12, 19, 24, 12, 24, 17, 24, 17, 24, 22, 19, 13, 19, 22, 21, 22, 
7, 12, 17, 1, 23, 24, 11, 22, 3, 13, 12, 21, 19, 8, 21, 18, 6, 6, 24, 23, 23, 19, 23, 10, 13, 7, 22, 5, 12, 19, 23, 24, 19, 17, 23, 4, 12, 12, 3, 9, 13, 13, 1, 15, 11, 11, 22, 8, 12, 18, 3, 15, 13, 13, 18, 9, 17, 2, 17, 18, 13, 23, 4, 13, 15, 6, 22, 4, 13, 3, 24, 5, 12, 7, 24, 14, 15, 15, 9, 15, 3, 4, 7, 18, 13, 11, 24, 24, 9, 23, 13, 22, 15, 12, 7, 6, 24, 19, 20, 12, 9, 2, 14, 18, 4, 10, 24, 3, 18, 22, 12, 20, 8, 11, 13, 21, 23, 13, 21, 9, 23, 13, 19, 24, 18, 23, 12, 1, 8, 23, 4, 5, 24, 2, 17, 23, 2, 24, 15, 13, 10, 13, 6, 15, 12, 21, 1, 9, 6, 9, 24, 11, 23, 8, 19, 19, 10, 9, 7, 8, 12, 18, 7, 17, 3, 10, 22, 3, 3, 1, 13, 11, 12, 18, 15, 19, 22, 17, 8, 24, 24, 4, 5, 22, 14, 13, 5, 6, 19, 12, 22, 23, 8, 11, 11, 15, 24, 13, 15, 10, 12, 12, 22, 12, 24, 2, 7, 12, 23, 10, 18, 12, 12, 5, 24, 12, 8, 15, 17, 24, 13, 17, 18, 9, 24, 9, 23, 14, 18, 8, 4, 10, 7, 21, 19, 17, 23, 15, 22, 22, 18, 23, 18, 18, 22, 23, 8, 23, 7, 6, 22, 4, 12, 19, 24, 18, 8, 15, 1, 23, 11, 11, 17, 12, 15, 15, 19, 8, 0, 18, 11, 25, 22, 18, 7, 19, 2, 4, 24, 13, 9, 19, 19, 10, 11, 19, 13, 2, 12, 24, 19, 17, 12, 15, 17, 25, 23, 18, 12, 12, 15, 21, 14, 15, 22, 6, 24, 15, 19, 18, 15, 24, 23, 23, 6, 7, 7, 2, 8, 9, 5, 12, 8, 7, 2, 10, 8, 14, 13, 15, 18, 3, 10, 23, 6, 8, 24, 22, 15, 2, 2, 2, 2, 10, 13, 3, 9, 13, 22, 5, 23, 8, 18, 12, 8, 13, 22, 3, 11, 13, 22, 19, 13, 2, 2, 19, 25, 18, 25, 17, 15, 9, 17, 18, 13, 24, 8, 23, 13, 13, 15, 8, 12, 19, 13, 13, 4, 22, 10, 21, 13, 24, 8, 8, 9, 11, 22, 1, 6, 15, 15, 13, 13, 22, 25, 19, 23, 12, 8, 13, 12, 24, 4, 15, 19, 10, 7, 24, 4, 17, 12, 17, 24, 13, 24, 13, 18, 22, 12, 2, 15, 12, 19, 11, 1, 23, 13, 24, 15, 13, 13, 23, 22, 13, 15, 8, 15, 12, 13, 13, 22, 11, 4, 19, 12, 18, 6, 3, 23, 4, 8, 12, 13, 7, 23, 12, 17, 22, 22, 15, 24, 11, 22, 8, 18, 22, 13, 15, 9, 7, 24, 23, 10, 5, 23, 1, 23, 9, 10, 15, 10, 25, 25, 13, 15, 14, 12, 18, 13, 11, 8, 18, 11, 12, 15, 22, 10, 25, 2, 6, 17, 14, 15, 14, 11, 12, 13, 11, 8, 24, 15, 23, 17, 10, 23, 23, 10, 17, 4, 7, 13, 3, 14, 12, 22, 19, 
23, 25, 15, 21, 22, 12, 10, 1, 5, 21, 13, 15, 22, 22, 8, 18, 12, 1, 13, 23, 2, 18, 12, 4, 7, 13, 13, 24, 14, 12, 13, 13, 15, 4, 24, 13, 25, 12, 15, 24, 4, 4, 7, 18, 2, 12, 22, 15, 11, 23, 15, 15, 13, 23, 17, 1, 23, 13, 21, 19, 8, 13, 11, 15, 18, 24, 17, 22, 24, 10, 15, 18, 10, 5, 3, 17, 15, 17, 17, 23, 12, 24, 14, 12, 10, 23, 15, 12, 13, 1, 8, 17, 13, 2, 5, 19, 25, 12, 15, 13, 13, 24, 12, 17, 8, 15, 22, 2, 18, 12, 17, 18, 17, 18, 8, 12, 22, 8, 15, 19, 20, 12, 5, 17, 22, 12, 24, 7, 8, 13, 12, 7, 11, 8, 19, 15, 23, 12, 18, 19, 5, 19, 24, 19, 18, 2, 4, 7, 17, 19, 15, 8, 13, 15, 12, 23, 13, 24, 3, 11, 17, 15, 22, 15, 15, 22, 20, 24, 13, 5, 1, 19, 14, 5, 15, 18, 24, 24, 11, 22, 15, 3, 4, 9, 13, 3, 3, 23, 19, 19, 22, 17, 18, 18, 18, 7, 13, 24, 13, 5, 17, 24, 22, 24, 15, 24, 23, 5, 12, 22, 22, 19, 15, 12, 23, 24, 19, 19, 13, 15, 18, 15, 12, 3, 18, 15, 19, 3, 17, 24, 9, 8, 22, 8, 17, 8, 15, 25, 11, 19, 18, 15, 23, 14, 19, 18, 12, 19, 2, 19, 9, 14, 22, 24, 12, 14, 3, 4, 21, 19, 17, 21, 3, 9, 23, 23, 24, 15, 13, 11, 10, 12, 9, 18, 22, 24, 16, 7, 4, 24, 3, 12, 24, 18, 12, 13, 19, 18, 8, 2, 8, 9, 6, 17, 19, 2, 12, 7, 23, 17, 20, 13, 12, 24, 5, 18, 9, 13, 24, 9, 13, 18, 23, 24, 18, 22, 13, 6, 12, 9, 15, 5, 9, 13, 19, 19, 23, 3, 10, 19, 15, 3, 15, 25, 5, 12, 3, 10, 10, 13, 23, 1, 13, 22, 17, 17, 15, 8, 20, 22, 3, 5, 24, 11, 18, 17, 5, 13, 15, 24, 24, 23, 10, 23, 13, 13, 22, 22, 18, 7, 3, 10, 18, 9, 22, 2, 8, 24, 8, 3, 13, 13, 24, 12, 12, 23, 17, 23, 10, 8, 18, 22, 18, 11, 15, 15, 13, 17, 12, 25, 22, 7, 23, 24, 23, 19, 13, 23, 18, 13, 13, 13, 19, 24, 11, 12]
# Replace every 0 with 1 so the later division never divides by zero
for i in range(0,len(period_month)):
    if period_month[i]==0:
        period_month[i]=1
# Second printout of the month counts
print(period_month)
[2, 11, 4, 22, 13, 3, 9, 15, 8, 7, 17, 12, 24, 23, 24, 17, 22, 17, 17, 5, 24, 13, 18, 18, 11, 24, 13, 13, 9, 22, 8, 22, 22, 11, 11, 12, 15, 13, 6, 20, 17, 13, 13, 22, 8, 15, 4, 24, 11, 10, 24, 13, 12, 18, 13, 15, 13, 13, 13, 9, 12, 23, 11, 12, 24, 24, 23, 24, 10, 17, 11, 24, 24, 6, 22, 24, 19, 8, 12, 18, 12, 2, 19, 25, 6, 6, 10, 17, 12, 10, 5, 25, 15, 12, 9, 18, 8, 7, 18, 23, 18, 8, 22, 9, 3, 17, 3, 9, 7, 5, 3, 10, 9, 20, 12, 11, 24, 23, 18, 17, 23, 1, 15, 8, 9, 4, 24, 22, 13, 20, 22, 11, 15, 10, 15, 22, 11, 5, 12, 12, 19, 1, 13, 6, 9, 9, 15, 19, 19, 19, 9, 10, 17, 15, 17, 5, 24, 10, 9, 3, 23, 22, 13, 15, 15, 12, 24, 11, 9, 15, 22, 11, 8, 22, 12, 12, 22, 6, 22, 11, 18, 8, 22, 2, 4, 13, 23, 23, 23, 23, 15, 9, 23, 24, 23, 24, 9, 13, 7, 23, 12, 8, 10, 12, 23, 22, 10, 10, 23, 9, 19, 3, 15, 13, 12, 13, 13, 15, 10, 17, 9, 15, 13, 13, 15, 17, 12, 13, 13, 19, 11, 17, 3, 3, 18, 12, 13, 15, 15, 19, 9, 15, 10, 8, 13, 12, 22, 17, 17, 15, 5, 12, 15, 23, 18, 13, 17, 24, 11, 22, 13, 5, 14, 5, 5, 13, 15, 11, 7, 11, 24, 9, 7, 13, 13, 17, 15, 6, 14, 18, 23, 11, 24, 19, 8, 25, 11, 17, 13, 7, 23, 13, 22, 15, 24, 3, 22, 7, 17, 7, 19, 24, 12, 12, 22, 1, 12, 17, 12, 24, 10, 17, 6, 19, 15, 12, 18, 15, 12, 19, 19, 19, 23, 24, 17, 3, 13, 11, 12, 12, 6, 13, 13, 6, 18, 18, 20, 19, 4, 1, 18, 11, 17, 13, 7, 8, 18, 19, 12, 23, 13, 23, 10, 23, 24, 11, 10, 15, 19, 19, 11, 17, 2, 21, 13, 22, 15, 3, 13, 24, 20, 15, 17, 11, 1, 7, 4, 12, 12, 12, 17, 12, 18, 3, 22, 23, 8, 23, 18, 10, 17, 13, 12, 23, 13, 7, 24, 23, 21, 18, 10, 24, 7, 18, 23, 5, 22, 8, 11, 13, 7, 8, 9, 7, 13, 18, 15, 9, 8, 5, 3, 7, 15, 15, 5, 20, 22, 25, 9, 19, 15, 24, 24, 14, 11, 13, 4, 13, 19, 2, 7, 13, 24, 8, 12, 11, 12, 11, 11, 12, 10, 3, 17, 3, 11, 7, 17, 6, 12, 11, 8, 12, 11, 15, 4, 17, 22, 3, 11, 13, 19, 3, 18, 12, 20, 13, 2, 10, 12, 12, 13, 2, 21, 24, 12, 24, 23, 15, 17, 13, 17, 15, 22, 25, 24, 2, 3, 3, 11, 11, 9, 18, 13, 22, 7, 17, 11, 12, 24, 5, 19, 8, 9, 10, 23, 12, 19, 24, 12, 24, 17, 24, 17, 24, 22, 19, 13, 19, 22, 21, 22, 
7, 12, 17, 1, 23, 24, 11, 22, 3, 13, 12, 21, 19, 8, 21, 18, 6, 6, 24, 23, 23, 19, 23, 10, 13, 7, 22, 5, 12, 19, 23, 24, 19, 17, 23, 4, 12, 12, 3, 9, 13, 13, 1, 15, 11, 11, 22, 8, 12, 18, 3, 15, 13, 13, 18, 9, 17, 2, 17, 18, 13, 23, 4, 13, 15, 6, 22, 4, 13, 3, 24, 5, 12, 7, 24, 14, 15, 15, 9, 15, 3, 4, 7, 18, 13, 11, 24, 24, 9, 23, 13, 22, 15, 12, 7, 6, 24, 19, 20, 12, 9, 2, 14, 18, 4, 10, 24, 3, 18, 22, 12, 20, 8, 11, 13, 21, 23, 13, 21, 9, 23, 13, 19, 24, 18, 23, 12, 1, 8, 23, 4, 5, 24, 2, 17, 23, 2, 24, 15, 13, 10, 13, 6, 15, 12, 21, 1, 9, 6, 9, 24, 11, 23, 8, 19, 19, 10, 9, 7, 8, 12, 18, 7, 17, 3, 10, 22, 3, 3, 1, 13, 11, 12, 18, 15, 19, 22, 17, 8, 24, 24, 4, 5, 22, 14, 13, 5, 6, 19, 12, 22, 23, 8, 11, 11, 15, 24, 13, 15, 10, 12, 12, 22, 12, 24, 2, 7, 12, 23, 10, 18, 12, 12, 5, 24, 12, 8, 15, 17, 24, 13, 17, 18, 9, 24, 9, 23, 14, 18, 8, 4, 10, 7, 21, 19, 17, 23, 15, 22, 22, 18, 23, 18, 18, 22, 23, 8, 23, 7, 6, 22, 4, 12, 19, 24, 18, 8, 15, 1, 23, 11, 11, 17, 12, 15, 15, 19, 8, 1, 18, 11, 25, 22, 18, 7, 19, 2, 4, 24, 13, 9, 19, 19, 10, 11, 19, 13, 2, 12, 24, 19, 17, 12, 15, 17, 25, 23, 18, 12, 12, 15, 21, 14, 15, 22, 6, 24, 15, 19, 18, 15, 24, 23, 23, 6, 7, 7, 2, 8, 9, 5, 12, 8, 7, 2, 10, 8, 14, 13, 15, 18, 3, 10, 23, 6, 8, 24, 22, 15, 2, 2, 2, 2, 10, 13, 3, 9, 13, 22, 5, 23, 8, 18, 12, 8, 13, 22, 3, 11, 13, 22, 19, 13, 2, 2, 19, 25, 18, 25, 17, 15, 9, 17, 18, 13, 24, 8, 23, 13, 13, 15, 8, 12, 19, 13, 13, 4, 22, 10, 21, 13, 24, 8, 8, 9, 11, 22, 1, 6, 15, 15, 13, 13, 22, 25, 19, 23, 12, 8, 13, 12, 24, 4, 15, 19, 10, 7, 24, 4, 17, 12, 17, 24, 13, 24, 13, 18, 22, 12, 2, 15, 12, 19, 11, 1, 23, 13, 24, 15, 13, 13, 23, 22, 13, 15, 8, 15, 12, 13, 13, 22, 11, 4, 19, 12, 18, 6, 3, 23, 4, 8, 12, 13, 7, 23, 12, 17, 22, 22, 15, 24, 11, 22, 8, 18, 22, 13, 15, 9, 7, 24, 23, 10, 5, 23, 1, 23, 9, 10, 15, 10, 25, 25, 13, 15, 14, 12, 18, 13, 11, 8, 18, 11, 12, 15, 22, 10, 25, 2, 6, 17, 14, 15, 14, 11, 12, 13, 11, 8, 24, 15, 23, 17, 10, 23, 23, 10, 17, 4, 7, 13, 3, 14, 12, 22, 19, 
23, 25, 15, 21, 22, 12, 10, 1, 5, 21, 13, 15, 22, 22, 8, 18, 12, 1, 13, 23, 2, 18, 12, 4, 7, 13, 13, 24, 14, 12, 13, 13, 15, 4, 24, 13, 25, 12, 15, 24, 4, 4, 7, 18, 2, 12, 22, 15, 11, 23, 15, 15, 13, 23, 17, 1, 23, 13, 21, 19, 8, 13, 11, 15, 18, 24, 17, 22, 24, 10, 15, 18, 10, 5, 3, 17, 15, 17, 17, 23, 12, 24, 14, 12, 10, 23, 15, 12, 13, 1, 8, 17, 13, 2, 5, 19, 25, 12, 15, 13, 13, 24, 12, 17, 8, 15, 22, 2, 18, 12, 17, 18, 17, 18, 8, 12, 22, 8, 15, 19, 20, 12, 5, 17, 22, 12, 24, 7, 8, 13, 12, 7, 11, 8, 19, 15, 23, 12, 18, 19, 5, 19, 24, 19, 18, 2, 4, 7, 17, 19, 15, 8, 13, 15, 12, 23, 13, 24, 3, 11, 17, 15, 22, 15, 15, 22, 20, 24, 13, 5, 1, 19, 14, 5, 15, 18, 24, 24, 11, 22, 15, 3, 4, 9, 13, 3, 3, 23, 19, 19, 22, 17, 18, 18, 18, 7, 13, 24, 13, 5, 17, 24, 22, 24, 15, 24, 23, 5, 12, 22, 22, 19, 15, 12, 23, 24, 19, 19, 13, 15, 18, 15, 12, 3, 18, 15, 19, 3, 17, 24, 9, 8, 22, 8, 17, 8, 15, 25, 11, 19, 18, 15, 23, 14, 19, 18, 12, 19, 2, 19, 9, 14, 22, 24, 12, 14, 3, 4, 21, 19, 17, 21, 3, 9, 23, 23, 24, 15, 13, 11, 10, 12, 9, 18, 22, 24, 16, 7, 4, 24, 3, 12, 24, 18, 12, 13, 19, 18, 8, 2, 8, 9, 6, 17, 19, 2, 12, 7, 23, 17, 20, 13, 12, 24, 5, 18, 9, 13, 24, 9, 13, 18, 23, 24, 18, 22, 13, 6, 12, 9, 15, 5, 9, 13, 19, 19, 23, 3, 10, 19, 15, 3, 15, 25, 5, 12, 3, 10, 10, 13, 23, 1, 13, 22, 17, 17, 15, 8, 20, 22, 3, 5, 24, 11, 18, 17, 5, 13, 15, 24, 24, 23, 10, 23, 13, 13, 22, 22, 18, 7, 3, 10, 18, 9, 22, 2, 8, 24, 8, 3, 13, 13, 24, 12, 12, 23, 17, 23, 10, 8, 18, 22, 18, 11, 15, 15, 13, 17, 12, 25, 22, 7, 23, 24, 23, 19, 13, 23, 18, 13, 13, 13, 19, 24, 11, 12]
# Compute F: number of payments per month
data['F(月平消费次数)']=data['付款次数']/period_month
data
总金额 订单付款时间 付款次数 R(最后一次消费时间) F(月平消费次数)
买家会员名
00牛哥哥00 402.00 2017-02-06 2 693 days 1.000000
020luo 74.70 2017-11-18 1 408 days 0.090909
0587xueguangju 268.00 2017-04-14 1 626 days 0.250000
0o秋天de童话 411.50 2018-10-09 2 83 days 0.090909
0残缺0 48.86 2018-01-19 1 346 days 0.076923
... ... ... ... ... ...
黑河市2013 47.88 2018-01-11 1 354 days 0.076923
黑瑾瞳 158.44 2018-07-26 2 158 days 0.105263
鼠标右键点 51.87 2018-12-12 1 19 days 0.041667
龙星宇1018 198.00 2017-11-17 1 409 days 0.090909
龙魂爱上凤灵 43.86 2017-12-13 1 383 days 0.083333

1483 rows × 5 columns
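
The month count and the zero replacement above can be done without explicit loops. The following is a sketch of an equivalent vectorized form on made-up values; the 30-day "month" and the start date follow the text, and the sample dates are chosen to include the zero-day edge case.

```python
import numpy as np
import pandas as pd

start_date = pd.Timestamp("2017-01-02")
# Hypothetical last-payment dates; the first equals the start date (0 days).
last_paid = pd.Series(pd.to_datetime(["2017-01-02", "2017-02-06", "2018-12-12"]))
pay_count = pd.Series([1, 2, 1])

days = (last_paid - start_date).dt.days
# Round up to whole 30-day "months", then force a minimum of 1
# so the division below never divides by zero.
months = np.ceil(days / 30).clip(lower=1).astype(int)
f = pay_count / months
print(months.tolist())
print(f.tolist())
```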

(8) Compute M and standardize the data

data['m(月平均消费金额)']=data['总金额']/period_month
data
总金额 订单付款时间 付款次数 R(最后一次消费时间) F(月平消费次数) m(月平均消费金额)
买家会员名
00牛哥哥00 402.00 2017-02-06 2 693 days 1.000000 201.000000
020luo 74.70 2017-11-18 1 408 days 0.090909 6.790909
0587xueguangju 268.00 2017-04-14 1 626 days 0.250000 67.000000
0o秋天de童话 411.50 2018-10-09 2 83 days 0.090909 18.704545
0残缺0 48.86 2018-01-19 1 346 days 0.076923 3.758462
... ... ... ... ... ... ...
黑河市2013 47.88 2018-01-11 1 354 days 0.076923 3.683077
黑瑾瞳 158.44 2018-07-26 2 158 days 0.105263 8.338947
鼠标右键点 51.87 2018-12-12 1 19 days 0.041667 2.161250
龙星宇1018 198.00 2017-11-17 1 409 days 0.090909 18.000000
龙魂爱上凤灵 43.86 2017-12-13 1 383 days 0.083333 3.655000

1483 rows × 6 columns

# Standardization: pick the three feature columns
cdata=data[['R(最后一次消费时间)','F(月平消费次数)','m(月平均消费金额)']]
# Keep the buyer name as the index
cdata.index = data.index
cdata
R(最后一次消费时间) F(月平消费次数) m(月平均消费金额)
买家会员名
00牛哥哥00 693 days 1.000000 201.000000
020luo 408 days 0.090909 6.790909
0587xueguangju 626 days 0.250000 67.000000
0o秋天de童话 83 days 0.090909 18.704545
0残缺0 346 days 0.076923 3.758462
... ... ... ...
黑河市2013 354 days 0.076923 3.683077
黑瑾瞳 158 days 0.105263 8.338947
鼠标右键点 19 days 0.041667 2.161250
龙星宇1018 409 days 0.090909 18.000000
龙魂爱上凤灵 383 days 0.083333 3.655000

1483 rows × 3 columns

z_cdata=(cdata-cdata.mean())/cdata.std()
# Rename the columns
z_cdata.columns=['R(标准化)','F(标准化)','m(标准化)']
z_cdata
R(标准化) F(标准化) m(标准化)
买家会员名
00牛哥哥00 1.926851 4.432309 2.781167
020luo 0.469973 -0.211456 -0.304766
0587xueguangju 1.584357 0.601203 0.651941
0o秋天de童话 -1.191378 -0.211456 -0.115461
0残缺0 0.153038 -0.282899 -0.352951
... ... ... ...
黑河市2013 0.193933 -0.282899 -0.354149
黑瑾瞳 -0.807990 -0.138134 -0.280168
鼠标右键点 -1.518537 -0.462993 -0.378330
龙星宇1018 0.475085 -0.211456 -0.126656
龙魂爱上凤灵 0.342177 -0.250154 -0.354595

1483 rows × 3 columns
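
One detail worth knowing about the z-score above: pandas `std()` uses the sample standard deviation (ddof=1) by default, while sklearn's `StandardScaler` divides by the population standard deviation (ddof=0). The sketch below compares the two on made-up values. With about 1,500 rows the difference is tiny, so this is a consistency note rather than a correction.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical R/F/M values for four customers.
df = pd.DataFrame({"R": [693.0, 408.0, 83.0, 19.0],
                   "F": [1.0, 0.09, 0.09, 0.04],
                   "M": [201.0, 6.79, 18.7, 2.16]})

z_pandas = (df - df.mean()) / df.std()           # sample std, ddof=1
z_pop = (df - df.mean()) / df.std(ddof=0)        # population std, ddof=0
z_sklearn = StandardScaler().fit_transform(df)   # also uses ddof=0

print(np.allclose(z_pop.to_numpy(), z_sklearn))     # same convention
print(np.allclose(z_pandas.to_numpy(), z_sklearn))  # differs by sqrt((n-1)/n)
```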

(9) Save the preprocessed file

data.to_csv('/data/bigfiles/client.csv')

2. Data analysis

(1) Read the preprocessed file

data=pd.read_csv('/data/bigfiles/client.csv')
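
Because the frame was saved with the buyer name as its index, the CSV stores that index as the first column. A minimal round-trip sketch on a made-up two-row frame, using an in-memory buffer instead of the real path, shows how `index_col=0` restores it:

```python
import io
import pandas as pd

# Hypothetical frame indexed by buyer name, as after the groupby step.
df = pd.DataFrame({"总金额": [402.0, 74.7]},
                  index=pd.Index(["buyer_a", "buyer_b"], name="买家会员名"))

buffer = io.StringIO()
df.to_csv(buffer)          # the index is written as the first column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # use that column as the index again
print(restored)
```

Without `index_col=0`, the buyer name comes back as an ordinary column and the index becomes 0, 1, 2, and so on.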

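(2) Choose k with the elbow method (plot)


# SSE stores, for each k, the sum of squared Euclidean distances from the samples to their nearest cluster center
SSE=[]
# Cluster into 1 to 8 clusters (range(1,9) stops before 9)
for k in range(1,9):
    estimator =KMeans(n_clusters=k)
    estimator.fit(z_cdata)
    # Sum of squared distances from each sample to its nearest cluster center
    SSE.append(estimator.inertia_)
# x-axis values
X=range(1,9)
# Use a font that can render the Chinese title
plt.rcParams['font.sans-serif']=['SimHei']
# Draw the plot
plt.plot(X,SSE,'o-')
plt.xlabel('k')
plt.ylabel('SSE')
plt.title("肘部图")  # "Elbow plot"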
plt.show()

[Figure: elbow plot of SSE versus k]
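
The elbow plot is read by eye. As a complementary, numeric way to compare values of k, the silhouette coefficient can be computed for each candidate. The sketch below runs on synthetic blobs standing in for the standardized RFM matrix; the data, the range of k, and `n_init=10` are assumptions for illustration, and this check was not part of the original experiment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the standardized RFM matrix: three separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [-5.0, 5.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

scores = {}
for k in range(2, 7):  # the silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On real customer data the maximum is usually less clear-cut, so the score is best read together with the elbow plot.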



(3) Build the KMeans model

# Cluster analysis with k=4
kmodel=KMeans(n_clusters=4,n_init=4,max_iter=100,random_state = 0)
kmodel.fit(z_cdata)
KMeans(max_iter=100, n_clusters=4, n_init=4, random_state=0)

(4) Output the centroid of each cluster

# Cluster label assigned to each row
kmodel.labels_
# Coordinates of the cluster centers
kmodel.cluster_centers_
array([[ 1.57670505,  1.17239812,  0.98112868],
       [-1.03013307, -0.37085365, -0.28728299],
       [ 0.43389504, -0.13530733, -0.18934963],
       [ 1.70098269,  4.71659247,  5.44718135]])
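
The centers above are in standardized units, which are hard to interpret as days or amounts. A sketch of mapping centers back to the original scale by undoing the z-score, on made-up data with two obvious groups (the values and `n_clusters=2` are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical unstandardized features for six customers.
raw = pd.DataFrame({"R": [10.0, 20.0, 15.0, 400.0, 420.0, 410.0],
                    "F": [1.0, 0.9, 1.1, 0.1, 0.2, 0.15],
                    "M": [200.0, 180.0, 220.0, 10.0, 12.0, 8.0]})
mean, std = raw.mean(), raw.std()
z = (raw - mean) / std

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(z)
# Undo the z-score: original = z * std + mean, applied column by column.
centers = pd.DataFrame(model.cluster_centers_, columns=raw.columns) * std + mean
print(centers.round(2))
```

Each row of `centers` is then the average R, F, and M of one cluster in the original units.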

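(5) Store the customer type data

# Count the number of customers in each cluster
r1=pd.Series(kmodel.labels_).value_counts()
r2=pd.DataFrame(kmodel.cluster_centers_)
# Join the cluster centers with the cluster sizes
result=pd.concat([r2,r1],axis=1)
# Rename the columns; the last column, '类别', holds the size of each cluster
result.columns=['R','F','M']+['类别']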
result
R F M 类别
0 1.576705 1.172398 0.981129 157
1 -1.030133 -0.370854 -0.287283 587
2 0.433895 -0.135307 -0.189350 712
3 1.700983 4.716592 5.447181 27
# Attach the cluster labels to z_cdata and to data
KM_data=pd.concat([z_cdata,pd.Series(kmodel.labels_,index=z_cdata.index)],axis=1)
data1=pd.concat([data,pd.Series(kmodel.labels_,index=data.index)],axis=1)
# Rename the columns; '类别' now holds the cluster label
data1.columns=list(data.columns)+['类别']
KM_data.columns=['R','F','M']+['类别']
KM_data.head()
# Add the buyer name as a column so each name sits next to its cluster label
KM_data['买家会员名']=KM_data.index

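3. Data visualization (plot the standardized R, F, and M values for each customer type)

# Group by cluster and take the mean; numeric_only skips the text column 买家会员名
kmeans_analysis =KM_data.groupby('类别').mean(numeric_only=True)
# Rename the columns
kmeans_analysis.columns=['R','F','M']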
kmeans_analysis
R F M
类别
0 1.580417 1.183741 0.988757
1 -1.030133 -0.370854 -0.287283
2 0.436287 -0.134135 -0.187744
3 1.700983 4.716592 5.447181
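# Draw the bar chart
kmeans_analysis.plot(kind ='bar',rot=0,yticks=range(-1,9))
# Finish the chart
plt.title("聚类结果统计柱状图")  # "Bar chart of clustering results"
plt.xticks(range(0,4),['第0类','第1类','第2类','第3类'])  # "Cluster 0" to "Cluster 3"
plt.grid(axis='y',color='grey',linestyle='--',alpha=0.5)
plt.ylabel("R,F,M 3个指标均值")  # "Mean of the three indicators R, F, M"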
plt.savefig("聚类结果统计柱状图",dpi=128)

[Figure: bar chart of the mean standardized R, F, and M values for each cluster]

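4. Analysis and evaluation

Summary:
In this experiment we processed data with numpy and pandas and used the RFM model to extract features from customer records.
We also standardized the feature data and ran the K-Means clustering algorithm with sklearn.
Finally, we visualized the analysis with matplotlib and pandas.
In the procedure, we first preprocessed the data with pandas, computed the three feature indicators R, F, and M, and saved the processed file.
We then standardized the data, built a clustering model with sklearn and the RFM method, and clustered the customers by value.
We examined the result through the elbow plot, the cluster centers, and the cluster sizes, and visualized the clusters with pandas and matplotlib.
The experiment gave us a better understanding of customer value analysis and practice with the data processing and analysis methods it relies on.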

From: https://www.cnblogs.com/Xqiao/p/18240317
