
Data Mining 2

Date: 2023-03-05 14:22:21
Tags: loc, np, data mining, new, csv, data, reg

1.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

inputfile = 'D:/WeixinWenjian/WeChat Files/wxid_5onnacvxxvpj22/FileStorage/File/2023-03/data.csv'
data = pd.read_csv(inputfile)

# Descriptive statistics: one row per column of data, one column per statistic
description = [data.min(), data.max(), data.mean(), data.std()]
description = pd.DataFrame(description, index=['Min', 'Max', 'Mean', 'STD']).T
print('Descriptive statistics:\n', np.round(description, 2))

# Pearson correlation matrix between all columns
corr = data.corr(method='pearson')
print('Correlation matrix:\n', np.round(corr, 2))

# Heatmap of the correlation matrix
# SimHei is only needed if labels contain Chinese characters; set it before drawing
plt.rcParams['font.sans-serif'] = 'SimHei'
plt.subplots(figsize=(10, 10))
sns.heatmap(corr, annot=True, vmax=1, square=True, cmap="Blues")
plt.title('Correlation heatmap (3023)')
plt.show()
plt.close()
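
For reference, the Pearson coefficient that data.corr(method='pearson') reports for each pair of columns is the covariance divided by the product of the standard deviations. A minimal sketch of that definition follows. The function name `pearson` and the sample values are made up for illustration and are not taken from data.csv.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length sequences, from the definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Illustrative values only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.1, 3.9, 6.2, 8.1, 9.8]    # rises with x
y_down = [5.0, 4.0, 3.0, 2.0, 1.0]  # falls exactly linearly with x

print(round(pearson(x, y_up), 4))
print(round(pearson(x, y_down), 4))
```

A value close to 1 or -1 indicates a strong linear relationship, which is what the darker cells of the heatmap highlight.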

 

 

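2.

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

inputfile = 'D:/WeixinWenjian/WeChat Files/wxid_5onnacvxxvpj22/FileStorage/File/2023-03/data.csv'
data = pd.read_csv(inputfile)

# Fit a Lasso regression on the 13 feature columns; alpha is the L1 penalty strength
lasso = Lasso(alpha=1000)
lasso.fit(data.iloc[:, 0:13], data['y'])
print('Lasso coefficients:', np.round(lasso.coef_, 5))

# Features with a non-zero coefficient are kept
print('Number of non-zero coefficients:', np.sum(lasso.coef_ != 0))
mask = lasso.coef_ != 0
print('Coefficient is non-zero:', mask)
mask = np.append(mask, True)  # also keep the target column y

outputfile = 'D:/WeixinWenjian/WeChat Files/wxid_5onnacvxxvpj22/FileStorage/File/2023-03/data2.csv'
new_reg_data = data.iloc[:, mask]
# index=False avoids writing the row index as an extra unnamed column
new_reg_data.to_csv(outputfile, index=False)
print('Shape of the output data:', new_reg_data.shape)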
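
The penalty alpha decides how many coefficients Lasso shrinks to exactly zero, so the set of selected columns depends on it. The sketch below shows this effect on synthetic data. The data, the helper `count_selected`, and the alpha values are assumptions for illustration; they say nothing about which alpha suits data.csv.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Synthetic target: only the first two columns influence y
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

def count_selected(alpha):
    """Number of features whose Lasso coefficient is non-zero for this alpha."""
    model = Lasso(alpha=alpha).fit(X, y)
    return int(np.sum(model.coef_ != 0))

for alpha in (0.01, 0.5, 10.0):
    print('alpha =', alpha, '-> selected features:', count_selected(alpha))
```

A very large alpha removes every feature, so it is worth checking the count of non-zero coefficients, as the code above does, before writing the reduced data set.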

 

 

3.

import sys
# Folder that contains GM11.py
sys.path.append('D:/WeixinWenjian/WeChat Files/wxid_5onnacvxxvpj22/FileStorage/File/2023-03')
import numpy as np
import pandas as pd
from GM11 import GM11  # local grey forecasting module, not shown in this post

inputfile1 = 'D:/WeixinWenjian/WeChat Files/wxid_5onnacvxxvpj22/FileStorage/File/2023-03/data2.csv'
inputfile2 = 'D:/WeixinWenjian/WeChat Files/wxid_5onnacvxxvpj22/FileStorage/File/2023-03/data.csv'
new_reg_data = pd.read_csv(inputfile1)
data = pd.read_csv(inputfile2)

# The 20 observations cover 1994-2013; add empty rows for the 2014-2016 forecasts
new_reg_data.index = range(1994, 2014)
new_reg_data = new_reg_data.reindex(range(1994, 2017))

features = ['x1', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x13']
for i in features:
    # The first element returned by GM11 is the forecasting function
    f = GM11(new_reg_data.loc[range(1994, 2014), i].to_numpy())[0]
    # len(new_reg_data) is 23, so the arguments are 21, 22 and 23
    new_reg_data.loc[2014, i] = f(len(new_reg_data) - 2)
    new_reg_data.loc[2015, i] = f(len(new_reg_data) - 1)
    new_reg_data.loc[2016, i] = f(len(new_reg_data))
    new_reg_data[i] = new_reg_data[i].round(2)

# Writing .xls needs the xlwt engine, which pandas 2.0 and later no longer support
outputfile = 'D:/WeixinWenjian/WeChat Files/wxid_5onnacvxxvpj22/FileStorage/File/2023-03/data_GM21.xls'
# y is known for 1994-2013 only; the three forecast years stay empty
y = list(data['y'].values)
y.extend([np.nan, np.nan, np.nan])
new_reg_data['y'] = y

new_reg_data.to_excel(outputfile)
print('Forecast results:\n', new_reg_data.loc[2014:2016, :])
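
The GM11 module imported above is not included in the post. As a rough guide to what such a module might contain, here is a minimal sketch of a GM(1,1) grey forecasting model. The name `gm11_sketch`, its return values, and the sample series are assumptions; the original GM11 may differ, for example in what else it returns besides the forecasting function.

```python
import numpy as np

def gm11_sketch(x0):
    """Fit a GM(1,1) model; return (predict, a, b), where predict(k) is 1-based."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                      # accumulated series (1-AGO)
    z1 = 0.5 * (x1[:-1] + x1[1:])           # background values (adjacent means)
    B = np.column_stack((-z1, np.ones_like(z1)))
    # Least-squares fit of x0[k] = -a * z1[k] + b
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)

    def x1_hat(k):
        # Fitted accumulated value at step k (k = 1 is the first observation)
        return (x0[0] - b / a) * np.exp(-a * (k - 1)) + b / a

    def predict(k):
        # Restore the original series by differencing the accumulated fit
        if k == 1:
            return x0[0]
        return x1_hat(k) - x1_hat(k - 1)

    return predict, a, b

# Made-up series with roughly 10% growth per step
series = 100.0 * 1.1 ** np.arange(10)
predict, a, b = gm11_sketch(series)
for k in (11, 12, 13):  # three steps beyond the 10 observations
    print(k, round(float(predict(k)), 2))
```

With 20 observations, as in the post, the first three out-of-sample steps are 21, 22 and 23, which matches the arguments len(new_reg_data)-2, len(new_reg_data)-1 and len(new_reg_data) used in the loop. The sketch does not guard against a fitted a of zero, which a constant increment series would produce.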

 

From: https://www.cnblogs.com/clef-xc/p/17180393.html
