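I want to perform multivariate linear regression in Python, with several dependent (output) data arrays and several independent (input) variables.
I have seen plenty of multiple linear regression examples with several independent inputs, and almost everyone treats multiple = multivariate, but they are not the same thing. I cannot find a genuinely multivariate tutorial anywhere online. What I want is multiple outputs + multiple inputs.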
from pandas import DataFrame
from sklearn import linear_model
import tkinter as tk
import statsmodels.api as sm
Stock_Market = {'Year': [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
'Agriculture': [1, 0.8965517282485962, 0.4350132942199707, 0.5384615659713745, 1.1071428582072258, 0.1071428582072258, 0.1290322244167328, -0.07096776366233826, -0.37857140600681305, -0.439440980553627, -0.2020460031926632, -0.16339869424700737, 2.277777746319771],
'Demand_risk':[1,0.015701416,0.638652235,0.744531459,0.630988038,0.787568771,1.796302615,1.708789548,1.897916832,1.643077606,1.579785002,2.444568612,2.626896547],
'International_risk':[1,1.609574468,1.225836431,1.30566937,1.771415837,1.737162303,2.156292933,2.365513975,2.502820771,2.660719511,2.468833192,2.624733983,2.577283326],
'Production_risk': [1,0.76346912,1.421097464,1.423616355,1.434009229,1.307186577,1.378837063,1.3577073,1.744395371,1.744281735,1.559044776,1.570226289,1.116485043],
'Technology_risk': [1,1.029845201,1.042711964,1.053634438,1.038367263,0.659816279,0.90179752,1.448686704,1.836091216,1.644680334,1.413661748,1.089683923,1.191047799]
}
df = DataFrame(Stock_Market,columns=['Year','Agriculture','Demand_risk','International_risk','Production_risk', 'Technology_risk'])
X = df[['Demand_risk','International_risk','Production_risk', 'Technology_risk']] # four input (independent) variables; add or remove column names inside the brackets to change the inputs
Y = df['Year', 'Agriculture'] # output variable (what we are trying to predict)
# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
# compute with statsmodels, by adding intercept manually
import statsmodels.api as sm
X1 = sm.add_constant(X)
result = sm.OLS(Y, X1).fit()
#print dir(result)
print (result.rsquared, result.rsquared_adj)
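I want to change the output variable Y so that it can handle multiple arrays rather than just a single one (right now it throws an error).
You can try the following. You are trying to predict several target variables at once. In scikit-learn, LinearRegression (or another regressor that supports multiple outputs) can do this. Here is an example using your data: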
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
Stock_Market = {'Year': [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
'Agriculture': [1, 0.8965517282485962, 0.4350132942199707, 0.5384615659713745, 1.1071428582072258, 0.1071428582072258, 0.1290322244167328, -0.07096776366233826, -0.37857140600681305, -0.439440980553627, -0.2020460031926632, -0.16339869424700737, 2.277777746319771],
'Demand_risk':[1,0.015701416,0.638652235,0.744531459,0.630988038,0.787568771,1.796302615,1.708789548,1.897916832,1.643077606,1.579785002,2.444568612,2.626896547],
'International_risk':[1,1.609574468,1.225836431,1.30566937,1.771415837,1.737162303,2.156292933,2.365513975,2.502820771,2.660719511,2.468833192,2.624733983,2.577283326],
'Production_risk': [1,0.76346912,1.421097464,1.423616355,1.434009229,1.307186577,1.378837063,1.3577073,1.744395371,1.744281735,1.559044776,1.570226289,1.116485043],
'Technology_risk': [1,1.029845201,1.042711964,1.053634438,1.038367263,0.659816279,0.90179752,1.448686704,1.836091216,1.644680334,1.413661748,1.089683923,1.191047799]
}
df = pd.DataFrame(Stock_Market)
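# Use all the risk columns as features
X = df[['Demand_risk', 'International_risk', 'Production_risk', 'Technology_risk']]
# Use Year and Agriculture as the target variables
Y = df[['Year', 'Agriculture']]
# Create a linear regression model
model = LinearRegression()
# Fit the model to the data
model.fit(X, Y)
# Make predictions
predictions = model.predict(X)
# Print the predicted values
print(predictions)
# Evaluate the model (with two targets, this MSE is averaged over both by default)
mse = mean_squared_error(Y, predictions)
print(f"Mean Squared Error: {mse}")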
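This creates a multivariate regression model that tries to predict the Year and Agriculture values from the risk factors given as input. The code then predicts on the same data it was trained on and prints the predicted values. Note that this example does not split the data into training and test sets, although doing so is recommended in practice for a reliable evaluation of the model.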
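For reference, here is a minimal sketch of what such a split could look like with the data from the question. The values test_size=0.3 and random_state=0 are arbitrary choices of mine, not something from the question, and with only 13 rows the test scores will be very noisy, so treat this as an illustration of the mechanics rather than a meaningful evaluation.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Data copied from the question
Stock_Market = {
    'Year': [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
    'Agriculture': [1, 0.8965517282485962, 0.4350132942199707, 0.5384615659713745, 1.1071428582072258, 0.1071428582072258, 0.1290322244167328, -0.07096776366233826, -0.37857140600681305, -0.439440980553627, -0.2020460031926632, -0.16339869424700737, 2.277777746319771],
    'Demand_risk': [1, 0.015701416, 0.638652235, 0.744531459, 0.630988038, 0.787568771, 1.796302615, 1.708789548, 1.897916832, 1.643077606, 1.579785002, 2.444568612, 2.626896547],
    'International_risk': [1, 1.609574468, 1.225836431, 1.30566937, 1.771415837, 1.737162303, 2.156292933, 2.365513975, 2.502820771, 2.660719511, 2.468833192, 2.624733983, 2.577283326],
    'Production_risk': [1, 0.76346912, 1.421097464, 1.423616355, 1.434009229, 1.307186577, 1.378837063, 1.3577073, 1.744395371, 1.744281735, 1.559044776, 1.570226289, 1.116485043],
    'Technology_risk': [1, 1.029845201, 1.042711964, 1.053634438, 1.038367263, 0.659816279, 0.90179752, 1.448686704, 1.836091216, 1.644680334, 1.413661748, 1.089683923, 1.191047799],
}
df = pd.DataFrame(Stock_Market)

X = df[['Demand_risk', 'International_risk', 'Production_risk', 'Technology_risk']]
Y = df[['Year', 'Agriculture']]

# Hold out part of the rows for testing.
# test_size and random_state are arbitrary, illustrative values.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# Fit on the training rows only, then predict the held-out rows
model = LinearRegression().fit(X_train, Y_train)
test_predictions = model.predict(X_test)

# multioutput='raw_values' returns one MSE per target instead of their average,
# which matters here because Year and Agriculture are on very different scales
mse_per_target = mean_squared_error(Y_test, test_predictions, multioutput='raw_values')
for name, mse in zip(Y.columns, mse_per_target):
    print(f"{name}: test MSE = {mse}")
```

Reporting the error per target avoids the averaged MSE being dominated by the Year column.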
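You can also assess the model's performance by looking at metrics such as R-squared. A higher R-squared value means the model fits the data it is computed on more closely.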
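As a rough sketch of how R-squared could be read off for this model, the snippet below refits on the question's data and computes one R-squared per target with r2_score, plus the averaged value from model.score. It is computed on the training data, so it only describes the in-sample fit. The variable names r2_per_target and overall_r2 are mine.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Data copied from the question
Stock_Market = {
    'Year': [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
    'Agriculture': [1, 0.8965517282485962, 0.4350132942199707, 0.5384615659713745, 1.1071428582072258, 0.1071428582072258, 0.1290322244167328, -0.07096776366233826, -0.37857140600681305, -0.439440980553627, -0.2020460031926632, -0.16339869424700737, 2.277777746319771],
    'Demand_risk': [1, 0.015701416, 0.638652235, 0.744531459, 0.630988038, 0.787568771, 1.796302615, 1.708789548, 1.897916832, 1.643077606, 1.579785002, 2.444568612, 2.626896547],
    'International_risk': [1, 1.609574468, 1.225836431, 1.30566937, 1.771415837, 1.737162303, 2.156292933, 2.365513975, 2.502820771, 2.660719511, 2.468833192, 2.624733983, 2.577283326],
    'Production_risk': [1, 0.76346912, 1.421097464, 1.423616355, 1.434009229, 1.307186577, 1.378837063, 1.3577073, 1.744395371, 1.744281735, 1.559044776, 1.570226289, 1.116485043],
    'Technology_risk': [1, 1.029845201, 1.042711964, 1.053634438, 1.038367263, 0.659816279, 0.90179752, 1.448686704, 1.836091216, 1.644680334, 1.413661748, 1.089683923, 1.191047799],
}
df = pd.DataFrame(Stock_Market)

X = df[['Demand_risk', 'International_risk', 'Production_risk', 'Technology_risk']]
Y = df[['Year', 'Agriculture']]

model = LinearRegression().fit(X, Y)

# With two targets and four features, coef_ has one row per target
print("coef_ shape:", model.coef_.shape)            # (n_targets, n_features)
print("intercept_ shape:", model.intercept_.shape)  # (n_targets,)

# One in-sample R-squared per target column
r2_per_target = r2_score(Y, model.predict(X), multioutput='raw_values')
for name, r2 in zip(Y.columns, r2_per_target):
    print(f"{name}: R^2 = {r2}")

# model.score returns R-squared averaged uniformly over the targets
overall_r2 = model.score(X, Y)
print("Average R^2:", overall_r2)
```

Looking at the per-target values rather than only the average shows which of the two outputs the risk factors actually explain.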