
4.2 - Data Cleaning

Posted: 2023-10-23 14:56:24  Views: 26
Tags: ... No 4.2 NaN check 7043 cleaning data Yes

In [ ]:

import pandas as pd
  In [ ]:
# 4.2.1 Import and preview the data: read_csv, head
data = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
data
  Out[ ]:  
 customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity ... DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 7795-CFOCW Male 0 No No 45 No No phone service DSL Yes ... Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 9237-HQITU Female 0 No No 2 Yes No Fiber optic No ... No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 6840-RESVB Male 0 Yes Yes 24 Yes Yes DSL Yes ... Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No
7039 2234-XADUH Female 0 Yes Yes 72 Yes Yes Fiber optic No ... Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No
7040 4801-JZAZL Female 0 Yes Yes 11 No No phone service DSL Yes ... No No No No Month-to-month Yes Electronic check 29.60 346.45 No
7041 8361-LTMKD Male 1 Yes No 4 Yes Yes Fiber optic No ... No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes
7042 3186-AJIEK Male 0 No No 66 Yes No Fiber optic Yes ... Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No

7043 rows × 21 columns
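The comment for 4.2.1 mentions head, but the cell above displays the whole frame (pandas truncates the display to the first and last five rows). Here is a minimal sketch of head() and tail(). The CSV file is not available here, so it assumes a small in-memory excerpt built from the first five rows and six of the columns shown above.

```python
import io

import pandas as pd

# In-memory excerpt: first five rows and six columns from the output above,
# standing in for WA_Fn-UseC_-Telco-Customer-Churn.csv.
csv_text = """customerID,gender,tenure,MonthlyCharges,TotalCharges,Churn
7590-VHVEG,Female,1,29.85,29.85,No
5575-GNVDE,Male,34,56.95,1889.5,No
3668-QPYBK,Male,2,53.85,108.15,Yes
7795-CFOCW,Male,45,42.30,1840.75,No
9237-HQITU,Female,2,70.70,151.65,Yes
"""

# read_csv accepts any file-like object, so StringIO behaves like a file path here.
data = pd.read_csv(io.StringIO(csv_text))

# head(n) returns only the first n rows (default 5); tail(n) returns the last n.
print(data.head(3))
print(data.tail(1))
```

On the full dataset, data.head() shows five rows and avoids printing the truncated 7043-row display.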

  In [ ]:
# 4.2.2 Data structure and column labels: shape, columns
data.shape
  Out[ ]:
(7043, 21)
  In [ ]:
data.columns
  Out[ ]:
Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')
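Besides shape and columns, dtypes quickly shows how each column was parsed. The sketch uses a hypothetical three-row frame with values taken from the output above. TotalCharges is deliberately stored as text to show how a numeric-looking column that is not held as numbers stands out.

```python
import pandas as pd

# HYPOTHETICAL three-row frame using values from the output above; TotalCharges is
# deliberately stored as text to mimic a column pandas did not parse as numbers.
data = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK"],
    "tenure": [1, 34, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "TotalCharges": ["29.85", "1889.5", "108.15"],
})

n_rows, n_cols = data.shape        # shape is a (rows, columns) tuple
print(n_rows, n_cols)              # 3 4
print(data.columns.tolist())       # column labels as a plain list
print(data.dtypes)                 # TotalCharges shows a non-numeric (text) dtype
```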
  In [ ]:
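# Rename all 21 columns to Chinese, in the original column order (English meaning in each trailing comment)
data.columns = ['用户ID', '性别', '是否老人', '是否有伴侣', '是否有孩子',  # customer ID, gender, senior citizen, has partner, has dependents
       '合同期限', '通话服务', '多线程', '网络服务',  # tenure (months), phone service, multiple lines, internet service
       '在线安全', '在线备份', '设备安全', '技术支持',  # online security, online backup, device protection, tech support
       '流媒体电视', '流媒体电影', '合同类型', '电子账单',  # streaming TV, streaming movies, contract type, paperless billing
       '支付方式', '月消费', '总消费', '是否流失']  # payment method, monthly charges, total charges, churned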
data
  Out[ ]:  
 用户ID 性别 是否老人 是否有伴侣 是否有孩子 合同期限 通话服务 多线程 网络服务 在线安全 ... 设备安全 技术支持 流媒体电视 流媒体电影 合同类型 电子账单 支付方式 月消费 总消费 是否流失
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 7795-CFOCW Male 0 No No 45 No No phone service DSL Yes ... Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 9237-HQITU Female 0 No No 2 Yes No Fiber optic No ... No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 6840-RESVB Male 0 Yes Yes 24 Yes Yes DSL Yes ... Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No
7039 2234-XADUH Female 0 Yes Yes 72 Yes Yes Fiber optic No ... Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No
7040 4801-JZAZL Female 0 Yes Yes 11 No No phone service DSL Yes ... No No No No Month-to-month Yes Electronic check 29.60 346.45 No
7041 8361-LTMKD Male 1 Yes No 4 Yes Yes Fiber optic No ... No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes
7042 3186-AJIEK Male 0 No No 66 Yes No Fiber optic Yes ... Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No

7043 rows × 21 columns
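Assigning a full list to data.columns works only if the list has exactly one name per column, in the same order. A name-based alternative is DataFrame.rename with a dict. That is a swapped-in technique, not the one used above. The sketch applies it to a hypothetical frame with three of the original columns.

```python
import pandas as pd

# HYPOTHETICAL frame with three of the original English column names.
data = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE"],
    "gender": ["Female", "Male"],
    "Churn": ["No", "No"],
})

# Old name -> new name. Matching is by name, not position, and columns left out of
# the dict keep their current name.
mapping = {"customerID": "用户ID", "gender": "性别", "Churn": "是否流失"}

# rename returns a new DataFrame; the original `data` is left unchanged.
renamed = data.rename(columns=mapping)
print(renamed.columns.tolist())    # ['用户ID', '性别', '是否流失']
```

Because matching is by name, a reordered source file or a missing column does not silently shift every label, which can happen with positional assignment.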

  In [ ]:
# 4.2.3 Summary statistics (counts and distribution measures): describe
data.describe(include='all')
  Out[ ]:  
 用户ID 性别 是否老人 是否有伴侣 是否有孩子 合同期限 通话服务 多线程 网络服务 在线安全 ... 设备安全 技术支持 流媒体电视 流媒体电影 合同类型 电子账单 支付方式 月消费 总消费 是否流失
count 7043 7043 7043.000000 7043 7043 7043.000000 7043 7043 7043 7043 ... 7043 7043 7043 7043 7043 7043 7043 7043.000000 7043 7043
unique 7043 2 NaN 2 2 NaN 2 3 3 3 ... 3 3 3 3 3 2 4 NaN 6531 2
top 7590-VHVEG Male NaN No No NaN Yes No Fiber optic No ... No No No No Month-to-month Yes Electronic check NaN   No
freq 1 3555 NaN 3641 4933 NaN 6361 3390 3096 3498 ... 3095 3473 2810 2785 3875 4171 2365 NaN 11 5174
mean NaN NaN 0.162147 NaN NaN 32.371149 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 64.761692 NaN NaN
std NaN NaN 0.368612 NaN NaN 24.559481 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 30.090047 NaN NaN
min NaN NaN 0.000000 NaN NaN 0.000000 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 18.250000 NaN NaN
25% NaN NaN 0.000000 NaN NaN 9.000000 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 35.500000 NaN NaN
50% NaN NaN 0.000000 NaN NaN 29.000000 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 70.350000 NaN NaN
75% NaN NaN 0.000000 NaN NaN 55.000000 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 89.850000 NaN NaN
max NaN NaN 1.000000 NaN NaN 72.000000 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 118.750000 NaN NaN

11 rows × 21 columns
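This describe output suggests that 总消费 (TotalCharges) was not parsed as a number. It has unique/top/freq entries instead of a mean, its top value appears to be a blank, and that value occurs 11 times. The sketch below converts such a column with pd.to_numeric(..., errors="coerce"). It assumes a hypothetical frame whose third row stands in for one of those blank entries.

```python
import pandas as pd

# HYPOTHETICAL rows: the first two TotalCharges values come from the output above,
# the third (" ") stands in for one of the blank 总消费 entries describe() reports.
data = pd.DataFrame({
    "用户ID": ["7590-VHVEG", "5575-GNVDE", "HYPO-0001"],
    "总消费": ["29.85", "1889.5", " "],
})

# errors="coerce" turns any value that cannot be parsed as a number into NaN
# instead of raising a ValueError.
data["总消费"] = pd.to_numeric(data["总消费"], errors="coerce")

print(data["总消费"].isna().sum())   # 1: the blank entry became NaN
print(data.describe())              # 总消费 now gets count/mean/std/quantiles
```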

  In [ ]:
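# 4.2.4 Missing values, detection and handling: isnull().sum(), dropna
data.isnull   # without parentheses this shows the bound method, not a result; call data.isnull() instead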
  Out[ ]:
<bound method DataFrame.isnull of             用户ID      性别  是否老人 是否有伴侣 是否有孩子  合同期限 通话服务               多线程  \
0     7590-VHVEG  Female     0   Yes    No     1   No  No phone service   
1     5575-GNVDE    Male     0    No    No    34  Yes                No   
2     3668-QPYBK    Male     0    No    No     2  Yes                No   
3     7795-CFOCW    Male     0    No    No    45   No  No phone service   
4     9237-HQITU  Female     0    No    No     2  Yes                No   
...          ...     ...   ...   ...   ...   ...  ...               ...   
7038  6840-RESVB    Male     0   Yes   Yes    24  Yes               Yes   
7039  2234-XADUH  Female     0   Yes   Yes    72  Yes               Yes   
7040  4801-JZAZL  Female     0   Yes   Yes    11   No  No phone service   
7041  8361-LTMKD    Male     1   Yes    No     4  Yes               Yes   
7042  3186-AJIEK    Male     0    No    No    66  Yes                No   

             网络服务 在线安全  ... 设备安全 技术支持 流媒体电视 流媒体电影            合同类型 电子账单  \
0             DSL   No  ...   No   No    No    No  Month-to-month  Yes   
1             DSL  Yes  ...  Yes   No    No    No        One year   No   
2             DSL  Yes  ...   No   No    No    No  Month-to-month  Yes   
3             DSL  Yes  ...  Yes  Yes    No    No        One year   No   
4     Fiber optic   No  ...   No   No    No    No  Month-to-month  Yes   
...           ...  ...  ...  ...  ...   ...   ...             ...  ...   
7038          DSL  Yes  ...  Yes  Yes   Yes   Yes        One year  Yes   
7039  Fiber optic   No  ...  Yes   No   Yes   Yes        One year  Yes   
7040          DSL  Yes  ...   No   No    No    No  Month-to-month  Yes   
7041  Fiber optic   No  ...   No   No    No    No  Month-to-month  Yes   
7042  Fiber optic  Yes  ...  Yes  Yes   Yes   Yes        Two year  Yes   

                           支付方式     月消费      总消费 是否流失  
0              Electronic check   29.85    29.85   No  
1                  Mailed check   56.95   1889.5   No  
2                  Mailed check   53.85   108.15  Yes  
3     Bank transfer (automatic)   42.30  1840.75   No  
4              Electronic check   70.70   151.65  Yes  
...                         ...     ...      ...  ...  
7038               Mailed check   84.80   1990.5   No  
7039    Credit card (automatic)  103.20   7362.9   No  
7040           Electronic check   29.60   346.45   No  
7041               Mailed check   74.40    306.6  Yes  
7042  Bank transfer (automatic)  105.65   6844.5   No  

[7043 rows x 21 columns]>
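The cell above printed "<bound method DataFrame.isnull ...>" because data.isnull without parentheses refers to the method rather than calling it. The difference is shown here on a hypothetical frame with one missing value.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL frame with one genuinely missing value (NaN).
data = pd.DataFrame({"月消费": [29.85, np.nan, 53.85]})

# Without parentheses this is just the method object, as displayed in the cell above.
method = data.isnull
print(callable(method))            # True

# Calling it returns a boolean DataFrame of the same shape: True marks a missing value.
mask = data.isnull()
print(mask["月消费"].tolist())      # [False, True, False]

# sum() counts True per column; a second sum() gives the total for the whole frame.
print(mask.sum())
print(mask.sum().sum())            # 1
```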
  In [ ]:
data.isnull().sum()
  Out[ ]:
用户ID     0
性别       0
是否老人     0
是否有伴侣    0
是否有孩子    0
合同期限     0
通话服务     0
多线程      0
网络服务     0
在线安全     0
在线备份     0
设备安全     0
技术支持     0
流媒体电视    0
流媒体电影    0
合同类型     0
电子账单     0
支付方式     0
月消费      0
总消费      0
是否流失     0
dtype: int64
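isnull().sum() reports 0 for every column, yet the describe output above suggests that 总消费 holds 11 blank entries. A blank string counts as a value, not as missing, so it has to be converted first. This sketch assumes whitespace-only strings should be treated as missing and then drops those rows with dropna. The frame and its HYPO-0001 row are hypothetical.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL rows; the " " entry stands in for the blank 总消费 values.
data = pd.DataFrame({
    "用户ID": ["7590-VHVEG", "5575-GNVDE", "HYPO-0001"],
    "总消费": ["29.85", "1889.5", " "],
})

print(data.isnull().sum().sum())   # 0: a blank string is a value, not a missing one

# Assumption: empty or whitespace-only strings should count as missing.
with_nan = data.replace(r"^\s*$", np.nan, regex=True)
print(with_nan.isnull().sum())     # 总消费 now reports 1 missing value

# dropna removes every row that contains at least one missing value;
# fillna would impute a value instead of dropping the row.
cleaned = with_nan.dropna()
print(cleaned["用户ID"].tolist())  # ['7590-VHVEG', '5575-GNVDE']
```

Whether to drop or impute such rows depends on the analysis. Dropping 11 of 7043 rows is a small loss.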
  In [ ]:
# 4.2.5 Duplicates, detection and handling: duplicated, drop_duplicates
data.duplicated().sum()
  Out[ ]:
0
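Here duplicated().sum() finds no fully identical rows. The describe output agrees at the ID level, since 用户ID has 7043 unique values for 7043 rows. The sketch below shows drop_duplicates and the subset parameter on a hypothetical frame that does contain duplicates.

```python
import pandas as pd

# HYPOTHETICAL frame: row 1 copies row 0 exactly; row 2 repeats the ID with a different charge.
data = pd.DataFrame({
    "用户ID": ["7590-VHVEG", "7590-VHVEG", "7590-VHVEG", "5575-GNVDE"],
    "月消费": [29.85, 29.85, 31.00, 56.95],
})

# duplicated() flags rows identical to an earlier row across all columns.
print(data.duplicated().sum())                    # 1
print(len(data.drop_duplicates()))                # 3: keeps the first copy of each row

# subset= compares only the listed columns, e.g. one row per customer ID.
print(data.duplicated(subset=["用户ID"]).sum())    # 2
print(len(data.drop_duplicates(subset=["用户ID"], keep="first")))  # 2
```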

From: https://www.cnblogs.com/mlzxdzl/p/17782417.html
