STAT802 – Assignment 1, Part A
STAT802: Advanced Topics in Analytics - Semester 1 2023
STAT802 Assignment 1 – Part A
Due: 5pm on Friday 24 March 2023
Outline: Assignment 1 – Part A comprises three questions worth 15% of your final grade.
Total: 50 marks.
Only documents in Portable Document Format (PDF) will be accepted. You can use, e.g., Word,
knitr or Sweave to create your report, and RStudio as the editor of the source files.
Formats other than PDF will be ignored, and the author will be asked to re-submit the
assignment within 24 hours of the due date & time, at a cost of 5% of the total marks.
If the assignment is not resubmitted within this time frame, it will be assigned a mark
of zero and deemed a non-submission.
Any SAS code required to complete this assignment, especially the code that supports your
conclusions & answers, must be self-explanatory and must be embedded in the corresponding
answer as text (not as an image). SAS code submitted in separate files will be ignored and
not considered for marking.
Optionally, you may submit only your answers and avoid copying & pasting each question
in the PDF document. If this is the case, then just make reference to each question, e.g.,
Answer Question 1 (a), Answer Question 1 (b), ... , etc.
Read carefully – Answer all the questions as requested. Any material or information
unrelated to the correct answer may result in a significant reduction of marks for that
question.
Several questions will come to light while solving these tasks. You may need to visit
the SAS–support website for additional information about specific statements/steps to
complete them.
Finally, fill in and sign the cover sheet, which must be the very first page in the PDF. Use,
e.g., Adobe Acrobat Pro on Uni computers. Do not submit the cover sheet separately.
If you need an extension because your performance has been impacted by extenuating,
unexpected circumstances, you can submit an SCA along with relevant evidence
using the submission link on our STAT802 Home page. Bear in mind that SCA processing
may take up to 5 working days. If you have questions, contact [email protected].
Question 1. The file binary.csv contains information on 400 students who applied to graduate
school last year. The file can be downloaded from the Week-2 Lab Canvas webpage.
There are four variables, as follows:
admit, which is equal to 1 if the individual was admitted to graduate school, and 0
otherwise,
gre, the student’s GRE score when the application was submitted,
gpa, the student’s GPA when the application was submitted, and
rank, which takes the values 1 through 4 and indicates the prestige of the institution
where the student obtained their bachelor’s degree. Institutions with a rank of 1 have the
highest prestige, while those with a rank of 4 have the lowest.
Using regression models, your manager (Cathy) wants to explore gre, gpa, and institution
rank as factors that may influence students’ chances of being admitted to graduate
school. Specifically, she believes that gpa has the strongest influence on predicting the
admission (and non-admission) of these students to graduate school. Cathy also believes
that the effect of institution prestige on the chances of being ‘admitted’ versus
‘not admitted’ differs depending on the gre score. Is your manager correct with both
assumptions? These results will be used in the next Executive Board meeting.
a) (1 mark - model + 4 marks - justification = 5 marks) Propose and EXPLAIN
an appropriate modelling framework to deal with your manager’s concern. Name the
model (e.g., ordinary regression, logistic regression, etc.)
b) (3 marks) Write down the full (theoretical) model. Derive the reduced models, if
any. If no reduced models are to be considered, then write down a short paragraph
explaining this point.
c) You should be aware by now of the exceedingly large difference between the GPA and
GRE supports. While GPA ranges from 1 to 5 points, GRE’s minimum is 220 units.
Interpreting regression output with GRE or GPA as response and the other as predictor
may be far from intuitive. Before going through d) - f), you are required to re-scale
GRE or GPA in a suitable and appropriate way. To complete this task, read the
following report:
https://scc.ms.unimelb.edu.au/resources/reporting-statistical-inference/rescaling-explanatory-variables-in-linear-regression
(5 marks) Write down 2-3 sentences outlining the approach you have adopted to deal
with this matter. Do not go through d) - f) with this issue still unresolved.
Hint: You can re-scale GRE to, say, ‘tens’ or ‘hundreds’.
d) (3 marks) Generate SAS code to estimate your model, AND appropriately address
any issue related to OVERDISPERSION, if any.
e) (6 marks) For the following students, your manager wants to know how likely (or
unlikely) it is for them to be admitted to graduate school. See Slide 12 (predicted
probabilities) from the Week 2-Lecture Slides Part II deck!
Teresa: gre = 680, gpa = 3.5, and rank = 2.
Johanna: gre = 530, gpa = 4.18, and rank = 3.
Tim: gre = 600, gpa = 4.34, and rank = 4.
f) (8 marks) Write down an executive summary (avoid technical jargon). Focus on the
question ‘Is your manager correct with both assumptions?’ Include a short
discussion on Part d) - Overdispersion and Part e).
NOTE: Present the output relevant to this question in an Appendix, correctly cited
and with captions!
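The re-scaling hint in c) and the predicted probabilities in e) can be illustrated with a minimal Python sketch, assuming a logistic framework with main effects only. Everything named here is hypothetical: the function names, the dictionary layout and, above all, the coefficient values are placeholders chosen for illustration and are NOT estimates from binary.csv. The model actually required may differ (for instance, it may include an interaction term), and the assignment itself must be completed in SAS.

```python
import math


def rescale_gre(gre, unit=100.0):
    """Express a raw GRE score in multiples of `unit` (100 -> 'hundreds')."""
    return gre / unit


def inverse_logit(eta):
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL coefficients, for illustration only (not fitted to any data).
# Rank 1 is treated as the reference level, so its effect is fixed at 0.
placeholder_coef = {
    "intercept": -4.0,
    "gre_hundreds": 0.2,
    "gpa": 0.8,
    "rank": {1: 0.0, 2: -0.7, 3: -1.3, 4: -1.6},
}


def predicted_probability(gre, gpa, rank, coef):
    """Predicted admission probability under a main-effects logistic model."""
    eta = (
        coef["intercept"]
        + coef["gre_hundreds"] * rescale_gre(gre)
        + coef["gpa"] * gpa
        + coef["rank"][rank]
    )
    return inverse_logit(eta)


# Student profile taken from part e); the result depends entirely on the
# placeholder coefficients above and carries no real meaning.
p = predicted_probability(gre=680, gpa=3.5, rank=2, coef=placeholder_coef)
print(f"predicted probability: {p:.3f}")
```

With GRE expressed in hundreds, the GRE coefficient reads as the change in log-odds per 100 GRE points, which is easier to compare with the change per GPA point.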
Question 2. The data set testScores.sas7bdat contains data from 200 high school students.
These are scores on various tests, including science, math, reading and social studies. The
variable female is coded as ‘1’ if the student is female and ‘0’ otherwise.
Your client claims (beyond all doubt) that the ‘math’ scores are a good predictor of the
student’s results in their ‘science’ test. Moreover, your client is convinced that
separate modelling frameworks can be found for this purpose based on the variable female.
a) (0 marks) Run a PROC CONTENTS on this data set and carefully look at the
attributes and labels for each variable. Then, read and understand the regression
analysis conducted on this data presented at
https://stats.oarc.ucla.edu/sas/output/regression-analysis/
Make sure you understand the entries of the ANOVA table: Source, DF, etc.
b) (5 marks) Write 5-7 sentences describing the variables you will use in this question.
Use, e.g., PROC BOXPLOT or PROC SUMMARY. Present the output in an
Appendix, correctly labelled and cited.
c) (5 marks) Propose and EXPLAIN a suitable regression model to look into your
client’s claims. Write down the full and reduced models, if applicable.
NOTE: The model shown on the website in a) is just an example and may be completely
different from the model that must be proposed and used in this question.
d) (10 marks) Write down an executive summary. Using plain English, you are required
to make use of goodness-of-fit metrics, including but not limited to the ‘F test’ (F-value),
the adjusted R-squared and the estimated coefficients. Present the output in
an Appendix, correctly labelled and cited.
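As a reminder of how the goodness-of-fit metrics named in d) relate to each other, here is a minimal Python sketch for a regression with a single predictor. The function name and the five toy observations are made up for illustration; they are not taken from testScores.sas7bdat, and the model required in this question may contain more terms. The quantities correspond to what a regression ANOVA table typically reports (model and error sums of squares, degrees of freedom, F-value).

```python
def simple_ols_summary(x, y):
    """Fit y = b0 + b1 * x by least squares and return fit statistics."""
    n = len(x)
    p = 1  # number of predictors (excluding the intercept)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Sums of squares: total, error (residual) and model.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssm = sst - sse

    r2 = ssm / sst
    # Adjusted R-squared penalises R-squared for the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # F-value: model mean square divided by error mean square.
    f_value = (ssm / p) / (sse / (n - p - 1))

    return {
        "intercept": intercept,
        "slope": slope,
        "r2": r2,
        "adj_r2": adj_r2,
        "f_value": f_value,
    }


# Toy numbers for illustration only (NOT the assignment data).
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
summary = simple_ols_summary(toy_x, toy_y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the adjusted R-squared corrects for the number of predictors, it never exceeds the plain R-squared, which makes it the safer figure to quote when comparing a full model with a reduced one.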
** END OF ASSIGNMENT 1A **
Would you like to increase your chances of successfully completing this assignment?
Read the following online documents:
a) A toy problem (interaction):
https://www.theanalysisfactor.com/interaction-dummy-variables-in-linear-regression/
b) Section 11.2 (you will also need to read Section 11.1) of
https://book.stat420.org/categorical-predictors-and-interactions.html
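For orientation before reading these documents, the generic form of a linear regression with one numeric predictor, one dummy variable and their interaction can be sketched as follows. The symbols are generic placeholders (x for a numeric predictor, d for a 0/1 dummy) and are not meant as the model required in either question.

```latex
% Full model with interaction between x and the dummy d
y_i = \beta_0 + \beta_1 x_i + \beta_2 d_i + \beta_3 x_i d_i + \varepsilon_i

% Implied regression line in each group
d_i = 0:\quad y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
d_i = 1:\quad y_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x_i + \varepsilon_i

% Reduced models
\beta_3 = 0:\quad y_i = \beta_0 + \beta_1 x_i + \beta_2 d_i + \varepsilon_i
  \quad\text{(parallel lines, different intercepts)}
\beta_2 = \beta_3 = 0:\quad y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
  \quad\text{(a single line for both groups)}
```

Here the interaction coefficient measures how much the slope differs between the two groups, so testing whether it equals zero is one way to compare the full model with the parallel-lines reduced model.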
Edited by Victor Miranda; March 2023.