SME Notes 1
Simple linear regression model
\[y_i=\beta_1+\beta_2 x_i+\epsilon_i\]
There are a number of assumptions required to formulate the simple linear regression model (a simulation sketch follows the list):
- The value of \(y_i\), at each value of \(x_i\), is \(y_i=\beta_1+\beta_2 x_i+\epsilon_i\).
- The independent variables \(x_i\) are not random, and must take at least two different values.
- The expected value of the random errors, \(\epsilon_i\), is \(\mathbb{E}\left(\epsilon_i\right)=0\), or equivalently \(\mathbb{E}\left(y_i\right)=\beta_1+\beta_2 x_i\).
- The variances of the random errors, \(\epsilon_i\), and the random variables, \(y_i\), are equal to each other: \(\operatorname{var}\left(\epsilon_i\right)=\sigma^2=\operatorname{var}\left(y_i\right)\).
In fact, \(\epsilon_i\) and \(y_i\), both of which are random, differ only by the constant \(\beta_1+\beta_2 x_i\), so they share the same variance.
- The covariance between any pair of the random errors, \(\epsilon_i\) and \(\epsilon_j\) \((i \neq j)\), is zero: \(\operatorname{cov}\left(\epsilon_i, \epsilon_j\right)=0\).
A covariance of 0 does not necessarily imply that two random variables are (statistically) independent.
Definition (Statistically independent)
Two events are independent if the occurrence of one event does not affect the chances of the occurrence of the other event.
- (Optional) The values of the random errors, \(\epsilon_i\), are normally distributed about their means if the values of the random variables, \(y_i\), are normally distributed, and vice versa.
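To make the setup concrete, here is a minimal simulation sketch in Python (the coefficient values, sample size, and error scale are made up for illustration): it generates data satisfying the assumptions above and recovers the line by least squares.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameter values (not from any dataset)
beta1, beta2, sigma = 2.0, 0.5, 1.0
n = 100

# x is fixed (not random) and takes more than two distinct values
x = np.linspace(0, 10, n)

# Errors: mean zero, constant variance, uncorrelated (here iid normal)
eps = rng.normal(0.0, sigma, size=n)
y = beta1 + beta2 * x + eps

# Least squares estimates: b2 = cov(x, y) / var(x), b1 = ybar - b2 * xbar
b2 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b1 = y.mean() - b2 * x.mean()
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")  # close to beta1 and beta2
```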
POE version:
ASSUMPTIONS OF THE SIMPLE LINEAR REGRESSION MODEL
SR1. The value of \(y\), for each value of \(x\), is
\[y=\beta_1+\beta_2 x+e\]
SR2. The expected value of the random error \(e\) is
\[E(e)=0\]
which is equivalent to assuming that
\[E(y)=\beta_1+\beta_2 x\]
SR3. The variance of the random error \(e\) is
\[\operatorname{var}(e)=\sigma^2=\operatorname{var}(y)\]
The random variables \(y\) and \(e\) have the same variance because they differ only by a constant.
SR4. The covariance between any pair of random errors \(e_i\) and \(e_j\) is
\[\operatorname{cov}\left(e_i, e_j\right)=\operatorname{cov}\left(y_i, y_j\right)=0\]
The stronger version of this assumption is that the random errors \(e\) are statistically independent, in which case the values of the dependent variable \(y\) are also statistically independent.
SR5. The variable \(x\) is not random and must take at least two different values.
SR6. (Optional) The values of \(e\) are normally distributed about their mean,
\[e \sim N\left(0, \sigma^2\right)\]
if the values of \(y\) are normally distributed, and vice versa.
Uncorrelated vs independent
Two random variables \(X\) and \(Y\) are uncorrelated when their correlation coefficient \(\rho\) is zero:
\[\rho(X, Y)=\frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X) \operatorname{Var}(Y)}}=0 .\]
Moreover, having zero correlation coefficient is the same as having zero covariance:
\[\operatorname{Cov}(X, Y)=\mathbb{E}(X Y)-\mathbb{E}(X) \mathbb{E}(Y)=0\]
which leads to
\[\mathbb{E}(X Y)=\mathbb{E}(X) \mathbb{E}(Y)\]
Definition
If \(\rho(X, Y) \neq 0\), then \(X\) and \(Y\) are correlated.
Definition
Two random variables are (statistically) independent when their joint probability distribution is the product of their marginal probability distributions: for all \(x\) and \(y\),
\[p_{X, Y}(x, y)=p_X(x) p_Y(y) .\]
Equivalently, the conditional distribution is the same as the marginal distribution:
\[p_{Y \mid X}(y \mid x)=p_Y(y)\]
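A standard counterexample makes the distinction concrete: if \(X \sim N(0,1)\) and \(Y=X^2\), then \(\operatorname{Cov}(X, Y)=\mathbb{E}(X^3)=0\), so \(X\) and \(Y\) are uncorrelated, yet \(Y\) is a deterministic function of \(X\) and the two are clearly not independent. A quick numerical check (a sketch using only numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = x**2  # completely determined by x, hence not independent of x

# Sample covariance is essentially 0: x and y are uncorrelated
print(np.cov(x, y)[0, 1])          # ~0
# But x**2 predicts y perfectly, exposing the dependence
print(np.corrcoef(x**2, y)[0, 1])  # exactly 1.0
```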
Some Questions
exam 3b
(b) True or False? Explain your answer for the following statements.
(i) When the errors in a regression model have \(\mathrm{AR}(1)\) serial correlation, the ordinary least squares (OLS) standard errors tend to correctly estimate the sampling variation in the estimators.
[3]
F
With AR(1) errors \(e_t=\rho e_{t-1}+v_t\),
\[E\left(e_t\right)=0, \quad \operatorname{var}\left(e_t\right)=\sigma_e^2=\frac{\sigma_v^2}{1-\rho^2}, \quad \operatorname{cov}\left(e_t, e_{t-k}\right)=\rho^k \sigma_e^2 \neq 0,\]
so the zero-covariance assumption fails and the usual OLS standard error formulas no longer estimate the sampling variation in the estimators correctly (with \(\rho>0\) they typically understate it).
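A small Monte Carlo sketch (all parameter values are illustrative assumptions) shows the effect: with positively autocorrelated errors, the average OLS-reported standard error understates the true sampling variability of the slope estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho, beta1, beta2 = 100, 2000, 0.8, 1.0, 0.5
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])

slopes, reported_se = [], []
for _ in range(reps):
    # AR(1) errors: e_t = rho * e_{t-1} + v_t, stationary start
    v = rng.normal(size=n)
    e = np.empty(n)
    e[0] = v[0] / np.sqrt(1 - rho**2)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + v[t]
    y = beta1 + beta2 * x + e

    b, rss = np.linalg.lstsq(X, y, rcond=None)[:2]
    sigma2_hat = rss[0] / (n - 2)
    cov_b = sigma2_hat * np.linalg.inv(X.T @ X)  # usual OLS formula
    slopes.append(b[1])
    reported_se.append(np.sqrt(cov_b[1, 1]))

print("true sd of b2:      ", np.std(slopes))
print("avg OLS-reported se:", np.mean(reported_se))  # noticeably smaller
```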
(ii) The weighted least squares method is preferred to OLS when an important variable is omitted from the model.
[3]
F
Weighted least squares method: we take advantage of the (known form of the) heteroskedasticity to improve parameter estimation. It does nothing to address an omitted variable.
The Ramsey Regression Equation Specification Error Test (RESET) is designed to detect omitted relevant variables and an incorrect functional form.
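For reference, statsmodels ships a RESET implementation (linear_reset in statsmodels.stats.diagnostic); a sketch on made-up data with a deliberately misspecified model:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x**2 + rng.normal(size=200)  # true relation is quadratic

# Deliberately misspecified linear model
res = sm.OLS(y, sm.add_constant(x)).fit()

# RESET: add powers of the fitted values and test their joint significance
print(linear_reset(res, power=3, use_f=True))  # large F, tiny p-value
```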
(iii) The OLS estimators are no longer BLUE (best linear unbiased estimators) under the situation of the heteroskedasticity.
[3]
T
BLUE:
- holds under assumptions 1-5
- smallest variance among linear unbiased estimators
- linear and unbiased
When heteroskedasticity exists (see the sketch below):
- The least squares estimator is still a linear and unbiased estimator, but it is no longer best: there is another estimator with a smaller variance.
- The standard errors usually computed for the least squares estimator are incorrect. Confidence intervals and hypothesis tests that use these standard errors may be misleading.
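A sketch of the "no longer best" point (parameters are illustrative): under heteroskedastic errors both OLS and weighted least squares are unbiased, but WLS, which reweights by the inverse error standard deviation, shows a visibly smaller sampling variance for the slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 2000
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])
sigma = 0.5 * x   # error sd grows with x: heteroskedasticity
w = 1.0 / sigma   # WLS weights (variance function assumed known)

b_ols, b_wls = [], []
for _ in range(reps):
    y = 1.0 + 0.5 * x + rng.normal(size=n) * sigma
    b_ols.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    # WLS = OLS on the reweighted data
    b_wls.append(np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0][1])

# Both unbiased, but WLS has the smaller variance: OLS is not "best"
print("sd(b2) OLS:", np.std(b_ols))
print("sd(b2) WLS:", np.std(b_wls))
```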
(iv) The adjusted \(R^2\) will not decrease if an additional explanatory variable is introduced into the model.
F
Unlike \(R^2\), the adjusted \(\bar{R}^2=1-\frac{SSE/(N-K)}{SST/(N-1)}\) imposes a penalty for extra explanatory variables, so it can decrease when an added variable's contribution to the fit does not offset the lost degree of freedom.
(v) We impose assumptions on the dependent variable and the random error term in linear regression models using the least squares principle. We do not need to impose assumptions on the explanatory variables since they are random variables.
[3]
F
The independent variables \(x_i\) are not random, and must take at least two different values.
(vi) For linear models, it is always appropriate to use \(R^2\) as a measure of how well the estimated regression equation fits the data because it shows the proportion of total variation that is explained by the regression.
[3]
F
- It is not always appropriate.
- When comparing models with the same number of explanatory variables, choosing the one with the highest \(R^2\) is appropriate.
- Problem: by adding more and more explanatory variables, \(R^2\) can be made larger and larger, even when the added variables are irrelevant.
- It shows the proportion of variation in the dependent variable explained by variation in the explanatory variables.
(vii) Interval estimates based on the least squares principle incorporate both the point estimate and the standard error of the estimate, and the sample size as well, so a true parameter is actually certain to be included in such an interval.
[3]
F
We can only say that a \(100(1-\alpha)\%\) confidence interval contains the true parameter with confidence level \(1-\alpha\): in repeated sampling, the interval fails to cover the true parameter with probability \(\alpha\), so inclusion is never certain.
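A coverage simulation sketch (illustrative values) makes the point: nominal 95% intervals for the slope cover the true value in roughly 95% of repeated samples, not in all of them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps, alpha, beta2 = 50, 5000, 0.05, 0.5
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)

covered = 0
for _ in range(reps):
    y = 1.0 + beta2 * x + rng.normal(size=n)
    b, rss = np.linalg.lstsq(X, y, rcond=None)[:2]
    se_b2 = np.sqrt(rss[0] / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])
    covered += (b[1] - tcrit * se_b2 <= beta2 <= b[1] + tcrit * se_b2)

print("empirical coverage:", covered / reps)  # ~0.95, not 1.0
```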
6.6 POE Carter 3e
(a) Least squares estimation of \(y_i=\beta_1+\beta_2 x_i+\beta_3 w_i+e_i\) gives \(b_3=0.4979, \operatorname{se}\left(b_3\right)=0.1174\) and \(t=0.4979 / 0.1174=4.24\). This result suggests that \(b_3\) is significantly different from zero, and therefore \(w_i\) should be included in the model. Additionally, the RESET test based on the equation \(y_i=\beta_1+\beta_2 x_i+e_i\) gives \(F\)-values of \(17.98\) and \(8.72\), which are much higher than the \(5 \%\) critical values of \(F_{(0.95,1,32)}=4.15\) and \(F_{(0.95,2,31)}=3.30\), respectively. Thus, the model omitting \(w_i\) is inadequate.
(b) Let \(b_2^*\) be the least squares estimator for \(\beta_2\) in the model that omits \(w_i\). The omitted-variable bias is given by
\[\operatorname{bias}\left(b_2^*\right)=E\left(b_2^*\right)-\beta_2=\beta_3 \frac{\widehat{\operatorname{cov}}(x, w)}{\widehat{\operatorname{var}}(x)}\]
Now, \(\widehat{\operatorname{cov}}(x, w)>0\) because \(r_{x w}>0\). Thus, the omitted-variable bias will be positive. This result is consistent with what we observe: the estimated coefficient for \(\beta_2\) changes from \(-0.9985\) to \(4.1072\) when \(w_i\) is omitted from the equation.
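A simulation sketch of the mechanism (coefficients are made up, with \(\operatorname{cov}(x, w)>0\) and \(\beta_3>0\) as in the exercise): the short regression's slope estimate centres on \(\beta_2+\beta_3 \operatorname{cov}(x, w) / \operatorname{var}(x)\), above the true \(\beta_2\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2000
beta2, beta3 = -1.0, 0.5

b2_short = []
for _ in range(reps):
    x = rng.normal(size=n)
    w = 0.9 * x + 0.3 * rng.normal(size=n)  # cov(x, w) = 0.9 > 0
    y = 1.0 + beta2 * x + beta3 * w + rng.normal(size=n)
    # Short regression that omits w
    X = np.column_stack([np.ones(n), x])
    b2_short.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

print("mean of b2*:", np.mean(b2_short))  # ~ -0.55, biased upward
print("beta2 + beta3 * cov(x,w)/var(x) =", beta2 + beta3 * 0.9)
```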
(c) The high correlation between \(x_i\) and \(w_i\) suggests the existence of collinearity. The observed outcomes that are likely to be a consequence of the collinearity are the sensitivity of the estimates to omitting \(w_i\) (the large omitted variable bias) and the insignificance of \(b_2\) when both variables are included in the equation.
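Collinearity like this is usually diagnosed with pairwise correlations and variance inflation factors; a sketch using statsmodels' variance_inflation_factor on made-up data with a highly correlated regressor pair:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
w = 0.95 * x + 0.1 * rng.normal(size=n)  # highly correlated with x

X = sm.add_constant(np.column_stack([x, w]))
print("corr(x, w) =", np.corrcoef(x, w)[0, 1])
# VIF_j = 1 / (1 - R^2_j), from regressing regressor j on the others;
# values far above 10 are a common rule-of-thumb warning sign
for j, name in [(1, "x"), (2, "w")]:
    print("VIF", name, "=", variance_inflation_factor(X, j))
```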
6.10 POE Carter 4e
beer.def
Q PB PL PR I
Obs: 30 annual observations from a single household
1. Q = litres of beer consumed
2. PB = Price of beer ($)
3. PL = price of other liquor ($)
4. PR = price of remaining goods and services (an index)
5. I = income ($)
| Variable | Obs | Mean | Std. Dev. | Min | Max |
|----------|-----|------|-----------|-----|-----|
| Q | 30 | 56.11333 | 7.857381 | 44.3 | 81.7 |
| PB | 30 | 3.08 | .6421945 | 1.78 | 4.07 |
| PL | 30 | 8.367333 | .7696347 | 6.95 | 9.52 |
| PR | 30 | 1.251333 | .298314 | .67 | 1.73 |
| I | 30 | 32601.8 | 4541.966 | 25088 | 41593 |
Use the sample data for beer consumption in the file beer.dat to
(a) Estimate the coefficients of the demand relation (6.14) using only sample information. Compare and contrast these results to the restricted coefficient results given in (6.19).
(b) Does collinearity appear to be a problem?
(c) Test the validity of the restriction that implies that demand will not change if prices and income go up in the same proportion.
(d) Use model (6.19) to construct a 95% prediction interval for \(Q\) when \(P B=3.00, P L=10, P R=2.00\), and \(I=50000\). (Hint: Construct the interval for \(\ln (Q)\) and then take antilogs.)
(e) Repeat part (d) using the unconstrained model from part (a). Comment.
Solution
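The worked solution is not reproduced in these notes, but part (d)'s mechanics can be sketched with statsmodels, under two stated assumptions: beer.dat is whitespace-delimited with the columns listed in beer.def, and (6.14) is the unconstrained log-log demand equation in \(\ln(PB), \ln(PL), \ln(PR), \ln(I)\). Following the hint, the prediction interval is built for \(\ln(Q)\) and then antilogged.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: beer.dat is whitespace-delimited with columns Q PB PL PR I
df = pd.read_csv("beer.dat", sep=r"\s+", names=["Q", "PB", "PL", "PR", "I"])
for c in ["Q", "PB", "PL", "PR", "I"]:
    df["l" + c] = np.log(df[c])

# Assumed form of the unconstrained demand relation (6.14)
res = smf.ols("lQ ~ lPB + lPL + lPR + lI", data=df).fit()

new = pd.DataFrame({"lPB": [np.log(3.00)], "lPL": [np.log(10.0)],
                    "lPR": [np.log(2.00)], "lI": [np.log(50000.0)]})
pred = res.get_prediction(new).summary_frame(alpha=0.05)

# 95% prediction interval for ln(Q); take antilogs to get one for Q
lo, hi = pred["obs_ci_lower"][0], pred["obs_ci_upper"][0]
print("95% prediction interval for Q:", np.exp(lo), np.exp(hi))
```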
6.20 POE Carter 4e
rice.def
firm year prod area labor fert
Obs: a panel with 44 firms over 8 years (1990-1997)
total observations = 352
firm Firm number ( 1 to 44)
year Year = 1990 to 1997
prod Rice production (tonnes)
area Area planted to rice (hectares)
labor Hired + family labor (person days)
fert Fertilizer applied (kilograms)
Data source: These data were used by O’Donnell, C.J. and W.E. Griffiths (2006),
"Estimating State-Contingent Production Frontiers", American Journal of
Agricultural Economics, 88(1), 249-266.
| Variable | Obs | Mean | Std. Dev. | Min | Max |
|----------|-----|------|-----------|-----|-----|
| firm | 352 | 22.5 | 12.7165 | 1 | 44 |
| year | 352 | 1993.5 | 2.294549 | 1990 | 1997 |
| prod | 352 | 6.466392 | 5.076672 | .09 | 31.1 |
| area | 352 | 2.117528 | 1.451403 | .2 | 7 |
| labor | 352 | 107.2003 | 76.6456 | 8 | 436 |
| fert | 352 | 187.0545 | 168.5852 | 3.4 | 1030.9 |
Reconsider the production function for rice estimated in Exercise \(5.24\) using data in the file rice.dat:
\[\ln (\text{PROD})=\beta_1+\beta_2 \ln (\text{AREA})+\beta_3 \ln (\text{LABOR})+\beta_4 \ln (\text{FERT})+e\]
(a) Using a 5% level of significance, test the hypothesis that the elasticity of production with respect to land is equal to the elasticity of production with respect to labor.
(b) Using a \(10 \%\) level of significance, test the hypothesis that the production function exhibits constant returns to scale, that is, \(H_0: \beta_2+\beta_3+\beta_4=1\).
(c) Using a 5% level of significance, jointly test the two hypotheses in parts (a) and (b), that is, \(H_0: \beta_2=\beta_3\) and \(\beta_2+\beta_3+\beta_4=1\).
(d) Find restricted least squares estimates for each of the restricted models implied by the null hypotheses in parts (a), (b) and (c). Compare the different estimates and their standard errors.
Solution
(a) Testing \(H_0: \beta_2=\beta_3\) against \(H_1: \beta_2 \neq \beta_3\), the calculated \(F\)-value is \(0.342\). We do not reject \(H_0\) because \(0.342<3.868=F_{(0.95,1,348)}\). The \(p\)-value of the test is \(0.559\). The hypothesis that the land and labor elasticities are equal cannot be rejected at a \(5 \%\) significance level.
Using a \(t\)-test, we fail to reject \(H_0\) because \(t=-0.585\) and the critical values are \(t_{(0.025,348)}=-1.967\) and \(t_{(0.975,348)}=1.967\). The \(p\)-value of the test is \(0.559\).
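These statistics could be reproduced along the following lines (a sketch, assuming rice.dat is whitespace-delimited with the columns listed in rice.def):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: rice.dat is whitespace-delimited with columns as in rice.def
cols = ["firm", "year", "prod", "area", "labor", "fert"]
df = pd.read_csv("rice.dat", sep=r"\s+", names=cols)
for c in ["prod", "area", "labor", "fert"]:
    df["l" + c] = np.log(df[c])

res = smf.ols("lprod ~ larea + llabor + lfert", data=df).fit()

# (a) H0: beta2 = beta3 (equal land and labor elasticities)
print(res.f_test("larea = llabor"))
# (b) H0: beta2 + beta3 + beta4 = 1 (constant returns to scale)
print(res.f_test("larea + llabor + lfert = 1"))
# (c) joint test of both restrictions
print(res.f_test("larea = llabor, larea + llabor + lfert = 1"))
```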