whystea1

Tags: 0.8 df dropna isnull whystea1 missing-values axis

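Imports assumed by the snippets below
import math
import pandas as pd
import matplotlib.pyplot as plt

Reorder columns (insert, pop)
# Move column 'a' to the front: pop removes it, insert puts it back at position 0.
df.insert(0, 'a', df.pop('a'))

Rank within groups
# '班级' is the class column and 'Python成绩' the Python score; rank 1 is the highest score in each class.
df['班级Python成绩排名'] = df.groupby('班级')['Python成绩'].rank(method='min', ascending=False)

Discretization
# cut and qcut are top-level pandas functions (pd.cut, pd.qcut), not DataFrame methods, and the keyword is labels.
# Labels: '一类' = class one, '二类' = class two.
# Two equal-width bins:
df['class'] = pd.cut(df['data'], bins=2, labels=['一类', '二类'])
# Explicit bin edges, giving the intervals (1, 5] and (5, 10]:
df['class'] = pd.cut(df['data'], bins=[1, 5, 10], labels=['一类', '二类'])
# Two equal-frequency bins (split at the median):
df['class'] = pd.qcut(df['data'], q=2, labels=['一类', '二类'])

Grouping without turning the by column into the index
# Fill in by= with the grouping column(s).
df.groupby(by=..., as_index=False)
# as_index defaults to True. pandas has no groupkey parameter; the similarly named group_keys (also True by default) controls whether apply() adds the group keys to the result index.
# With as_index=False an aggregation keeps the grouping column as an ordinary column, which matches aggregating with the default and then calling reset_index().

Missing rate
# Columns
# Missing rate per column
df.isnull().mean()
# Number of missing values per column
df.isnull().sum()
# Number of columns whose missing rate is above 0.8
(df.isnull().mean() > 0.8).sum()
# Rows
# Missing rate per row
df.isnull().mean(axis=1)
# Number of missing values per row
df.isnull().sum(axis=1)
# Number of samples (rows) whose missing rate is above 0.8
(df.isnull().mean(axis=1) > 0.8).sum()

Dropping missing data
# Rows
# Drop rows in which every value is missing
df.dropna(how='all', inplace=True)
# Drop rows that contain any missing value
df.dropna(how='any', inplace=True)
# Drop rows whose missing rate is above 20%, i.e. keep rows with at least 80% non-missing values.
# thresh is the minimum number of non-missing values and is documented as an int, hence math.ceil.
df.dropna(thresh=math.ceil(0.8 * df.shape[1]))
# Drop rows that have a missing value in column A or B
df.dropna(subset=['A', 'B'], how='any')
# Columns (axis=1)
# Drop columns in which every value is missing
df.dropna(how='all', inplace=True, axis=1)
# Drop columns that contain any missing value
df.dropna(how='any', inplace=True, axis=1)
# Drop columns whose missing rate is above 20%, i.e. keep columns with at least 80% non-missing values
df.dropna(thresh=math.ceil(0.8 * df.shape[0]), axis=1)
# Drop columns that have a missing value in the rows labelled 5, 6 or 7 (with axis=1, subset refers to index labels)
df.dropna(subset=[5, 6, 7], how='any', axis=1, inplace=True)

View the data distribution
df.hist()
plt.tight_layout()
plt.show()

Change a column's data type
df['A'] = df['A'].astype(float)

Workflow
drop missing values -> impute missing values -> remove outliers -> normalize -> select features / reduce dimensionality -> discretize

Set the display format
# Show two decimal places when printing (display only, the stored values are unchanged)
pd.set_option('display.precision', 2)
# Format a column as strings with two decimals (the column is no longer numeric afterwards)
df['A'] = df['A'].map('{:.2f}'.format)
# Round the stored values to two decimals
df = df.round(2)
# Render the frame as text without the index
df_str = df.to_string(index=False)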
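The missing-rate and dropna(thresh=...) notes above can be checked on a small table. The following is only a minimal sketch: the DataFrame, its column names ("group", "A" to "D") and its values are made up for illustration, and rounding the threshold up with math.ceil is an assumption about the intended "at least 80% non-missing" rule. It also compares groupby(..., as_index=False) with reset_index(), which the notes leave as something to verify.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this sketch.
df = pd.DataFrame(
    {
        "group": ["x", "x", "y", "y", "y"],
        "A": [1.0, np.nan, 3.0, np.nan, 5.0],
        "B": [np.nan, np.nan, 3.0, 4.0, 5.0],
        "C": [1.0, np.nan, np.nan, 4.0, 5.0],
        "D": [1.0, np.nan, 3.0, 4.0, 5.0],
    }
)

# Missing rate per column and per row.
col_missing_rate = df.isnull().mean()
row_missing_rate = df.isnull().mean(axis=1)

# Keep rows with at least 80% non-missing values.
# thresh counts non-missing cells, so it is derived from the number of columns.
row_thresh = math.ceil(0.8 * df.shape[1])
rows_kept = df.dropna(thresh=row_thresh)

# Keep columns with at least 80% non-missing values.
col_thresh = math.ceil(0.8 * df.shape[0])
cols_kept = df.dropna(thresh=col_thresh, axis=1)

# Aggregating with as_index=False keeps "group" as an ordinary column;
# the default followed by reset_index() is built for comparison.
agg_as_index = df.groupby("group", as_index=False)["D"].mean()
agg_reset = df.groupby("group")["D"].mean().reset_index()

print(col_missing_rate)
print(row_missing_rate)
print(rows_kept)
print(cols_kept)
print(agg_as_index.equals(agg_reset))
```

Neither dropna call uses inplace=True here, so df itself stays unchanged and the row and column filters can be compared side by side.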

From： https://www.cnblogs.com/whystea/p/18328509
