
Linear Regression NEKN96


Homework Assignment 1

Andreas Johansson✯, Ioannis Tzoumas❸

Guidelines

  1. Upload the HWA in .zip format to Canvas before the 2nd of October, 23:59, and only upload one HWA for each group. The .zip file should contain two parts:
     - A report in .pdf format, which will be corrected.
     - The code you used to create the output/estimates for the report. The code itself will not be graded/corrected and is only required to confirm your work. The easiest is to add the whole project folder you used to the zip file.¹ However, if you have used online tools, sharing a link to your work is also fine.²
  2. The assignment should be done in groups of 3-4 people; pick groups at Canvas → People → Groups.³
  3. Double-check that each group member's name and ID number are included in the .pdf file.
  4. To receive your final grade on the course, a PASS is required on this HWA.

- If a revision is required, the comments must be addressed, and an updated version should be mailed to [email protected]. However, you are only guaranteed an additional evaluation of the assignment in connection to an examination period.⁴

You will have a lot of flexibility in how you want to solve each part of the assignment, and all things that are required to get a PASS are denoted in bullet points:

• . . .

Beware, some things require a lot of work, but you should still only include the final table or figure and not all intermediary steps. If uncertain, add a sentence or two about how you reached your conclusions, but do not add supplementary material. Only include the tables/figures explicitly asked for in the bullet points.

Good Luck!

✯ [email protected]
❸ [email protected]
¹ Before uploading the code, copy-paste the project folder to a new directory and try to re-run it. Does it still work?
² Make sure the repository/link is public/working before sharing it.
³ Rare exceptions can be made if required. In that case, mail [email protected] as soon as possible.
⁴ Next is the retake on December 12th, 2024.

Assignment

Our goal is to put into practice the separation of population vs. sample using a linear regression model. This hands-on approach will allow us to generate a sample from a known Population Regression Function (PRF) and observe how breakages of the Gauss-Markov assumptions can affect our sample estimates.

We will assume that the PRF is:

Y = α + β₁X₁ + β₂X₂ + β₃X₃ + ε    (1)

However, to break the assumptions, we need to add:

A0: Non-linearities

A2: Heteroscedasticity

A4: Endogeneity

A7: Non-normality in a small sample

A3, autocorrelation, will be covered in HWA2 on time-series modelling.

Q1 - All Assumptions Fulfilled

Let's generate a "correct" linear regression model. Generate a PRF with the parameters:

α = 0.7, β₁ = 1, β₂ = 2, β₃ = 0.5, ε ~ N(0, 4), Xᵢ ~ iid N(0, 1).    (2)

The example code is also available in Canvas.

Setup parameters:

n = 30
p = 3
beta = np.array([1, 2, 0.5])
alpha = 0.7

Simulate X and Y, using normally distributed errors:⁵

np.random.seed(seed=96)
X = np.random.normal(loc=0, scale=1, size=(n, p))
eps = np.random.normal(loc=0, scale=2, size=n)
y = alpha + X @ beta + eps

Run the correctly specified linear regression model:

result_OLS = OLS(endog=y, exog=add_constant(X)).fit()
result_OLS.summary()

• Add a well-formatted summary table

• Interpret the estimate of β̂₂ and the R².

⁵ Important: The np.random.seed() will ensure that we all get the same result. In other words, ensure that we are using the "correct" seed and that we don't generate anything else "random" before this simulation.

• In a paragraph, discuss if the estimates are consistent with the population regression function. Why, why not?

• Re-run the model, increasing the sample size to n = 10000. In a paragraph, explain what happens to the parameter estimates, and why doesn't R² get closer and closer to 1 as n increases?

Q2 - Endogeneity

What if we (wrongly) assume that the PRF is:

Y = α + β₁X₁ + β₂X₂ + ε    (3)

Use the same seed and setup as in Q1, and now estimate both the "correct" and the "wrong" model:

result_OLS = OLS(endog=y, exog=add_constant(X)).fit()
result_OLS.summary()

result_OLS_endog = OLS(endog=y, exog=add_constant(X[:, 0:2])).fit()
result_OLS_endog.summary()

• Shouldn't this imply an omitted variable bias? Show mathematically why it won't be a problem in this specific setup (see lecture notes "Part 2 - Linear Regression").
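As a pointer for the derivation, the standard omitted-variable result can be sketched in the notation of Equations (1) and (3) (a sketch only; the lecture-note notation may differ):

```latex
% Short regression omits X_3. Its slope estimates converge to
\hat{\beta}_j \;\xrightarrow{p}\; \beta_j + \beta_3\,\delta_j, \qquad j = 1, 2,
% where \delta_1, \delta_2 are the population coefficients from
% regressing X_3 on (1, X_1, X_2). In this simulation the X_i are
% independent N(0,1), so
\operatorname{Cov}(X_1, X_3) = \operatorname{Cov}(X_2, X_3) = 0
\;\Rightarrow\; \delta_1 = \delta_2 = 0,
% and the omitted term only inflates the error variance, leaving the
% slope estimates consistent.
```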

Q3 - Non-Normality and Non-Linearity

Let's simulate a sample of n = 3000, keeping the same parameters, but adding kurtosis and skewness to the error terms:⁶

n = 3000
X = np.random.normal(loc=0, scale=1, size=(n, p))
eps = np.random.normal(loc=0, scale=2, size=n)
eps_KU = np.sign(eps) * eps**2
eps_SKandKU_tmp = np.where(eps_KU > 0, eps_KU, eps_KU * 2)
eps_SKandKU = eps_SKandKU_tmp - np.mean(eps_SKandKU_tmp)

Now make the dependent variable into a non-linear relationship:

y_exp = np.exp(alpha + X @ beta + eps_SKandKU)
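As a quick check that the transformation does what it claims, the sample skewness and kurtosis can be compared before and after. This is our addition, not required by the assignment; note the errors here are drawn directly after seeding, so the numbers differ from the assignment's sample where X is drawn first — only the shape matters:

```python
import numpy as np

np.random.seed(seed=96)
n = 3000
eps = np.random.normal(loc=0, scale=2, size=n)
eps_KU = np.sign(eps) * eps**2                               # fattens the tails
eps_SKandKU_tmp = np.where(eps_KU > 0, eps_KU, eps_KU * 2)   # stretches the left tail
eps_SKandKU = eps_SKandKU_tmp - np.mean(eps_SKandKU_tmp)     # re-center to mean zero

def skewness(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

def kurtosis(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**4)

print(skewness(eps), kurtosis(eps))                  # roughly 0 and 3: normal
print(skewness(eps_SKandKU), kurtosis(eps_SKandKU))  # negative skew, fat tails
```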

• Create three figures:

  1. Scatterplot of y_exp against x₁
  2. Scatterplot of ln(y_exp) against x₁
  3. plt.plot(eps_SKandKU)

The figure(s) should have a descriptive caption, and all labels and titles should be clear to the reader.

Estimate two linear regression models:
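For reference, the three figures above could be produced with a sketch like the following (matplotlib assumed; the filenames, marker sizes, and exact titles are our choices, and the descriptive captions would be written in the report itself):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: figures are written to disk
import matplotlib.pyplot as plt

# Re-create the Q3 sample
np.random.seed(seed=96)
n, p = 3000, 3
alpha, beta = 0.7, np.array([1, 2, 0.5])
X = np.random.normal(loc=0, scale=1, size=(n, p))
eps = np.random.normal(loc=0, scale=2, size=n)
eps_KU = np.sign(eps) * eps**2
eps_SKandKU_tmp = np.where(eps_KU > 0, eps_KU, eps_KU * 2)
eps_SKandKU = eps_SKandKU_tmp - np.mean(eps_SKandKU_tmp)
y_exp = np.exp(alpha + X @ beta + eps_SKandKU)

# Figure 1: y_exp against x_1
plt.figure()
plt.scatter(X[:, 0], y_exp, s=5)
plt.xlabel("$x_1$")
plt.ylabel("$y_{exp}$")
plt.title("Scatterplot of $y_{exp}$ against $x_1$")
plt.savefig("fig1_yexp_vs_x1.png")

# Figure 2: ln(y_exp) against x_1
plt.figure()
plt.scatter(X[:, 0], np.log(y_exp), s=5)
plt.xlabel("$x_1$")
plt.ylabel("$\\ln(y_{exp})$")
plt.title("Scatterplot of $\\ln(y_{exp})$ against $x_1$")
plt.savefig("fig2_log_yexp_vs_x1.png")

# Figure 3: the transformed error series
plt.figure()
plt.plot(eps_SKandKU)
plt.xlabel("Observation")
plt.ylabel("Error")
plt.title("Skewed, fat-tailed error terms")
plt.savefig("fig3_eps_series.png")
```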

⁶ The manual addition of kurtosis and skewness will make E[ε] ≠ 0, so we need to remove the average from the errors to ensure that the exogeneity assumption is still fulfilled.


res_OLS_nonLinear = OLS(endog=y_exp, exog=add_constant(X)).fit()

res_OLS_transformed = OLS(endog=np.log(y_exp), exog=add_constant(X)).fit()

• Add the regression tables of the non-transformed and transformed regressions

• In a paragraph, does the transformed model fit the population regression function?

Finally, re-run the simulations and transformed estimation with a small sample, n = 30

• Add the regression table of the transformed small-sample estimate

• Now, re-do this estimate several times⁷ and observe how the parameter estimates behave. Do the non-normal errors seem to be a problem in this setup?

Hint: Do the parameters seem centered around the population values? Do we reject H₀: βᵢ = 0?

• In a paragraph, discuss why assuming a non-normal distribution makes it hard to find the distributional form under a TRUE null hypothesis, H₀?

Hint: Why is the central limit theorem key for most inferences?
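To see the hint in action, one can simulate how the distribution of a sample mean of the Q3-style errors becomes more symmetric as the sample size grows (a sketch; the number of replications and the two sample sizes are arbitrary choices):

```python
import numpy as np

np.random.seed(seed=96)

def draw_errors(size):
    """Skewed, fat-tailed errors built as in Q3 (before re-centering)."""
    eps = np.random.normal(loc=0, scale=2, size=size)
    eps_KU = np.sign(eps) * eps**2
    return np.where(eps_KU > 0, eps_KU, eps_KU * 2)

def skewness(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

reps = 5000
means_n5 = np.array([draw_errors(5).mean() for _ in range(reps)])
means_n500 = np.array([draw_errors(500).mean() for _ in range(reps)])

print(skewness(means_n5))    # means of 5 draws: still clearly skewed
print(skewness(means_n500))  # means of 500 draws: close to symmetric
```

This is the central limit theorem at work: averages (and hence OLS estimates) of even badly non-normal errors become approximately normal as n grows, which is what standard inference leans on.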

Q4 - Heteroscedasticity

Suggest a way to create heteroscedasticity in the population regression function.⁸

• Write down the updated population regression function in mathematical notation

• Estimate the regression function assuming homoscedasticity (as usual)

• Adjust the standard errors using a Heteroscedasticity and Autocorrelation Consistent (HAC) estimator (clearly state which HAC estimator you use)

• Add the tables of both the unadjusted and adjusted estimates

• In a paragraph, discuss if the HAC adjustment to the standard errors makes sense given the way you created the heteroscedasticity. Did the HAC adjustment seem to fix the problem?

Hint: Bias? Efficient?

⁷ Using a random seed for each estimate.

⁸ Tip: Double-check by simulating the model and plotting the residuals against one of the regressors. Does it look heteroscedastic?
