
STAT802 Analysis

Posted: 2023-03-25 09:55:37


STAT802 – Assignment 1, Part A. 1

STAT802: Advanced Topics in Analytics - Semester 1 2023
STAT802 Assignment 1 – Part A Due: 5pm on Friday 24 March 2023
Outline: Assignment 1 – Part A comprises three questions worth 15% of your final grade.
Total: 50 marks.

Only documents in Portable Document Format (PDF) will be accepted. You can use, e.g., Word, knitr
or Sweave to create your report, with RStudio as the editor of the source files.
Formats other than PDF will be ignored and the author will be asked to re–submit the
assignment within 24 hours after the due date & time at the cost of 5% of the total marks.
If the assignment is not resubmitted within this time frame, then it will be assigned a mark
of zero and deemed as non–submission.
Any SAS code required to complete this assignment, especially the code to support your
conclusions & answers, must be self-explanatory and must be embedded in the corresponding
answer as text (not image). SAS code submitted in separate files will be ignored and
not considered for marking.
Optionally, you may submit only your answers and avoid copying & pasting each question
in the PDF document. If this is the case, then just make reference to each question, e.g.,
Answer Question 1 (a), Answer Question 1 (b), ... , etc.
Read carefully – Answer all the questions as requested. Any material or information
unrelated to the correct answer may result in a significant reduction of marks for that
question.
Several questions will come to light while solving these tasks. You may need to visit
the SAS–support website for additional information about specific statements/steps to
complete them.
Finally, fill in and sign the cover sheet which must be the very first page in the PDF. Use,
e.g., Adobe Acrobat Pro on Uni computers. Do not submit the cover sheet separately.
If you need an extension because your performance has been affected by extenuating,
unexpected circumstances, you can submit an SCA along with relevant evidence
using the submission link on our STAT802 Home page. Bear in mind that SCA processing
may take up to 5 working days. If you have questions, contact [email protected].
Question 1. The file binary.csv contains information on 400 students who applied to graduate
school last year. The file can be downloaded from the Week-2 Lab Canvas webpage.
There are four variables, as follows:
admit, which is equal to 1 if the individual was admitted to graduate school, and 0
otherwise,
gre, the student’s gre score when the application was submitted,
gpa, the student’s gpa when the application was submitted, and
rank, which takes on the values 1 through 4 and indicates the prestige of the institution
from which the student obtained their bachelor’s degree. Institutions with a rank of 1 have
the highest prestige, while those with a rank of 4 have the lowest.
Using regression models, your manager (Cathy) wants to explore gre, gpa, and institution
rank as factors that may influence the chance of students being admitted to graduate
school. Specifically, she believes that gpa has the strongest influence when predicting the
admission (and non-admission) of these students to graduate school. Cathy also believes
that the differences in the chances of being ‘admitted’ or ‘not admitted’ across levels of
institution prestige depend on the gre scores. Is your manager correct with both
assumptions? These results will be used in the next Executive Board meeting.
a) (1 mark - model + 4 marks - justification = 5 marks) Propose and EXPLAIN
an appropriate modelling framework to deal with your manager’s concern. Name the
model (e.g., ordinary regression, logistic regression, etc.)
b) (3 marks) Write down the full (theoretical) model. Derive the reduced models, if
any. If no reduced models are to be considered, then write down a short paragraph
explaining this point.
c) You should be aware by now of the exceedingly large difference between the GPA and
GRE supports. While GPA ranges from 1 to 5 points, GRE’s minimum is 220 units.
Interpreting regression output with GRE or GPA as response and the other as predictor
may be hardly intuitive. Before going through d) - f), you are required to re-scale
GRE or GPA in a suitable and appropriate way. To complete this task, read the
following report:
https://scc.ms.unimelb.edu.au/resources/reporting-statistical-inference/rescaling-explanatory-variables-in-linear-regression
(5 marks) Write down 2-3 sentences outlining the approach you have adopted to deal
with this matter. Don’t go through d) - f) with this issue still unresolved.
Hint: You can re-scale GRE to, say, ‘tens’ or ‘hundreds’.
d) (3 marks) Generate SAS code to estimate your model, AND appropriately address
any issue related to OVERDISPERSION, if any.
e) (6 marks) For the following students, your manager wants to know how likely (or
unlikely) it is for them to be admitted to graduate school. See Slide 12 (predicted
probabilities) from the Week 2-Lecture Slides Part II deck!
Teresa: gre = 680, gpa = 3.5, and rank = 2.
Johanna: gre = 530, gpa = 4.18, and rank = 3.
Tim: gre = 600, gpa = 4.34, and rank = 4.
f) (8 marks) Write down an executive summary (avoid technical jargon). Focus on the
question ‘Is your manager correct with both assumptions?’ Include a short
discussion on Part d) - Overdispersion and Part e).
NOTE: Present output relevant to this question correctly cited and including captions
in an Appendix!
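As an illustration of parts c) and e), the following is a minimal Python sketch (not the SAS code the assignment requires) of how a GRE score re-scaled to hundreds feeds into a predicted admission probability through the inverse logit. Every coefficient in it is a made-up placeholder rather than an estimate from binary.csv, the helper names (rescale_gre, inv_logit, predicted_probability) are hypothetical, and the main-effects-only structure with rank 1 as the reference level is an assumption; the model you propose in a) and b) may well differ, for instance by including an interaction.

```python
import math

def rescale_gre(gre, unit=100.0):
    """Express a GRE score in multiples of `unit` (hundreds by default)."""
    return gre / unit

def inv_logit(eta):
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# HYPOTHETICAL coefficients, chosen only to make the sketch run.
# They are NOT estimates from binary.csv.
COEF = {
    "intercept": -4.0,
    "gre100": 0.2,   # effect of a 100-point increase in GRE
    "gpa": 0.8,
    "rank": {1: 0.0, 2: -0.7, 3: -1.3, 4: -1.6},  # rank 1 assumed as reference
}

def predicted_probability(gre, gpa, rank, coef=COEF):
    """Predicted admission probability under an assumed main-effects logistic model."""
    eta = (coef["intercept"]
           + coef["gre100"] * rescale_gre(gre)
           + coef["gpa"] * gpa
           + coef["rank"][rank])
    return inv_logit(eta)

# The three students listed in part e).
students = {
    "Teresa": (680, 3.5, 2),
    "Johanna": (530, 4.18, 3),
    "Tim": (600, 4.34, 4),
}

for name, (gre, gpa, rank) in students.items():
    print(f"{name}: {predicted_probability(gre, gpa, rank):.3f}")
```

Re-scaling does not change the fitted probabilities: a coefficient of 0.2 per hundred GRE points corresponds to 0.002 per single point, so only the readability of the output changes.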
Question 2. The data set testScores.sas7bdat contains data from 200 high school students.
These are scores on various tests, including science, math, reading and social studies. The
variable female is coded as ‘1’ if the student is female and ‘0’ otherwise.
Your client claims (beyond all doubt) that the ‘math’ scores are a good predictor of the
student’s results in their ‘science’ test. Moreover, your client is convinced that they can
find segregated modelling frameworks for this purpose based on the variable female.
a) (0 marks) Run a PROC CONTENTS on this data set and carefully look at the
attributes and labels for each variable. Then, read and understand the regression
analysis conducted on this data presented at
https://stats.oarc.ucla.edu/sas/output/regression-analysis/
Make sure you understand the inputs from the Anova Table: Source, DF, etc.
b) (5 marks) Write 5-7 sentences describing the variables you will use in this question.
Use, e.g., PROC BOXPLOT or PROC SUMMARY. Present the output in an
Appendix correctly labelled and cited.
c) (5 marks) Propose and EXPLAIN a suitable regression model to look into your
client’s claims. Write down the full and reduced models, if applicable.
NOTE: The model shown in a)-website is just an example and may be completely
different from the model that must be proposed and used in this question.
d) (10 marks) Write down an executive summary. Using plain English, you are required
to make use of goodness of fit metrics. These include, but are not limited to, the ‘F test’
(F-value), the adjusted R-squared and the estimated coefficients. Present the output in
an Appendix correctly labelled and cited.
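For orientation on the goodness of fit metrics named in d), the following is a minimal Python sketch of the standard formulas linking R-squared to the adjusted R-squared and to the overall F-value of a linear regression with an intercept. The value n = 200 comes from the data set description; p = 3 (for example math, female, and their interaction) and R-squared = 0.40 are assumptions made purely for illustration, not results from testScores.sas7bdat, and the function names are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with intercept, p predictors, n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def overall_f(r2, n, p):
    """Overall F statistic (p and n - p - 1 degrees of freedom) computed from R-squared."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# n is taken from the assignment text; p and r2 are HYPOTHETICAL placeholders.
n, p, r2 = 200, 3, 0.40

print(f"adjusted R-squared: {adjusted_r2(r2, n, p):.4f}")
print(f"overall F-value:    {overall_f(r2, n, p):.2f}")
```

The adjusted R-squared penalises extra predictors, so it is the fairer of the two when comparing a full model against a reduced one; the F-value tests whether the predictors jointly explain more variation than an intercept-only model.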
** END OF ASSIGNMENT 1A **
Would you like to increase your chances of successfully completing this assignment?
Read the following online documents:
a) A toy problem (interaction):
https://www.theanalysisfactor.com/interaction-dummy-variables-in-linear-regression/
b) Section 11.2 (you will also need to read Section 11.1) of
https://book.stat420.org/categorical-predictors-and-interactions.html
Edited by Victor Miranda; March 2023.

From: https://www.cnblogs.com/sonjava/p/17254167.html