

Date: 2023-05-03 10:45:56

STAT3010/6075 Statistical Methods in Insurance
Assignment 2
• This assignment is worth 10% of the overall mark for STAT3010/6075.
• The deadline for submission is 16.00 on Thursday 4 May 2023.
• Standard University policies and procedures will be followed for late submission, extensions and
academic integrity (see the Module Outline for details).
• Submission is via Blackboard. You must submit a report of at most six pages (in PDF format),
containing your answers, and a separate R script, containing the code that you used to obtain
your results.
– You should submit your report via TurnitinUK on Blackboard (see the Module Outline for
details) in a file called report-ID.pdf, where ID is your student ID number, for example
report-12345678.pdf. In the Assignments folder, click on Assignment 2 report submission
to submit your report. Please enter this file name as the Submission Title.
– You should not include the R code used in your analysis in your report, but you must submit
a separate R script via Blackboard containing your code, called code-ID.R, for example
code-12345678.R. Please rename and use the R template code-yyy.R provided. In the
Assignments folder, click on Assignment 2 code submission to submit your code.
• The page limit is strict and is easily sufficient to receive full credit. If your report is more than
six pages of A4, only the first six pages will be marked.
Recall from Assignment 1 that a health insurance company is developing a model to assess the risk of
its policy holders having diabetes based on the following data from the file diabetes.csv:
Diabetes   Binary variable indicating diabetes diagnosis, either positive (pos) or negative (neg)
Age        Age of individual, recorded in years
BMI        Body mass index (weight in kg/(height in m)²)
Glucose    Plasma glucose concentration
Pressure   Diastolic blood pressure (mm Hg)
Pregnant   Number of times pregnant
Use the code in the R template to:
(a) Set the seed to be your student ID number with the command set.seed(ID), for example
set.seed(12345678).
(b) Select a random training data set (train=1) of size 450 and test data set (train=0) of size 274
with the command train <- sample(c(rep(0,274), rep(1,450))).
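A quick sketch of steps (a) and (b). Because diabetes.csv is not reproduced on this page, a hypothetical stand-in data frame `dat` with 724 rows is used; in the assignment it would be the data read from the file. `sample()` randomly permutes the vector of 274 zeros and 450 ones, so the same seed always gives the same split.

```r
set.seed(12345678)  # example ID from the brief; replace with your own student ID
train <- sample(c(rep(0, 274), rep(1, 450)))  # command given in the brief

# Hypothetical stand-in for the 724-row data frame read from diabetes.csv
dat <- data.frame(id = seq_along(train))

train_set <- dat[train == 1, , drop = FALSE]  # 450 rows
test_set  <- dat[train == 0, , drop = FALSE]  # 274 rows

table(train)  # counts: 274 zeros, 450 ones
```

Every row lands in exactly one of the two subsets, so the training and test sets never overlap.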
1. Calculate the diabetes rate in the test and training data sets, and hence calculate the classification
rate of the naïve classifier. Comment on the usefulness of this classifier for identifying cases of
diabetes.
[4 marks]
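A minimal sketch of question 1, assuming the usual meaning of the naïve classifier: every observation is assigned the majority class of the training set. The labels `y` are simulated stand-ins for the Diabetes column, and the 67/33 class mix is an arbitrary assumption, not a property of the real data.

```r
set.seed(12345678)  # example ID from the brief
# Hypothetical labels standing in for the Diabetes column (67/33 mix is assumed)
y <- factor(sample(c("neg", "pos"), 724, replace = TRUE, prob = c(0.67, 0.33)),
            levels = c("neg", "pos"))
train <- sample(c(rep(0, 274), rep(1, 450)))

rate_train <- mean(y[train == 1] == "pos")  # diabetes rate in the training set
rate_test  <- mean(y[train == 0] == "pos")  # diabetes rate in the test set

# Naive classifier: give everyone the majority class of the training set
majority   <- names(which.max(table(y[train == 1])))
naive_rate <- mean(y[train == 0] == majority)  # classification rate on the test set

round(c(train = rate_train, test = rate_test, naive = naive_rate), 3)
```

If the majority class is neg, this classifier's rate is simply one minus the test-set diabetes rate, and it never flags a single policyholder as diabetic.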
2. Fit a logistic regression model to predict Diabetes from Age, BMI, Glucose, Pressure and
Pregnant using the training data set and calculate its classification rate using the test data
set.
[4 marks]
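A sketch of question 2 with base R's `glm()` and `family = binomial`. The data frame `dat` is simulated (the coefficients in `eta` are made up), and the 0.5 probability cut-off is an assumption, since the brief does not state a threshold.

```r
set.seed(12345678)  # example ID from the brief
n <- 724
# Simulated stand-in for diabetes.csv; the coefficients in 'eta' are made up
dat <- data.frame(
  Age      = round(runif(n, 21, 81)),
  BMI      = rnorm(n, 32, 7),
  Glucose  = rnorm(n, 121, 30),
  Pressure = rnorm(n, 72, 12),
  Pregnant = rpois(n, 4)
)
eta <- -8.5 + 0.035 * dat$Glucose + 0.08 * dat$BMI + 0.02 * dat$Age
dat$Diabetes <- factor(ifelse(runif(n) < plogis(eta), "pos", "neg"),
                       levels = c("neg", "pos"))
train <- sample(c(rep(0, 274), rep(1, 450)))

# Logistic regression on the training set; glm models P(Diabetes = "pos"),
# the second factor level
fit <- glm(Diabetes ~ Age + BMI + Glucose + Pressure + Pregnant,
           family = binomial, data = dat[train == 1, ])

# Predicted probabilities on the test set, classified with a 0.5 cut-off
p_test <- predict(fit, newdata = dat[train == 0, ], type = "response")
pred   <- ifelse(p_test > 0.5, "pos", "neg")
logit_rate <- mean(pred == dat$Diabetes[train == 0])
logit_rate
```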
3. Fit ridge regression models with λ = 0.1, 0.2, 0.3 and 0.4 to predict Diabetes from Age, BMI,
Glucose, Pressure and Pregnant using the training data set and calculate their classification
rates using the test data set.
[8 marks]
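A sketch of question 3 using the glmnet package, assuming it is installed and that the λ values in the brief are on glmnet's scale; `alpha = 0` selects the ridge penalty. The data are simulated as a stand-in for diabetes.csv. `predict(..., s = l, type = "class")` returns class labels at the requested λ, so the order in which glmnet stores its λ sequence does not matter.

```r
library(glmnet)  # assumed installed; alpha = 0 gives the ridge penalty

set.seed(12345678)  # example ID from the brief
n <- 724
# Simulated stand-in for diabetes.csv (made-up coefficients in 'eta')
dat <- data.frame(
  Age      = round(runif(n, 21, 81)),
  BMI      = rnorm(n, 32, 7),
  Glucose  = rnorm(n, 121, 30),
  Pressure = rnorm(n, 72, 12),
  Pregnant = rpois(n, 4)
)
eta <- -8.5 + 0.035 * dat$Glucose + 0.08 * dat$BMI + 0.02 * dat$Age
dat$Diabetes <- factor(ifelse(runif(n) < plogis(eta), "pos", "neg"),
                       levels = c("neg", "pos"))
train <- sample(c(rep(0, 274), rep(1, 450)))

# glmnet needs a numeric predictor matrix; [, -1] drops the intercept column
x <- model.matrix(Diabetes ~ Age + BMI + Glucose + Pressure + Pregnant, data = dat)[, -1]
y <- dat$Diabetes

lambdas <- c(0.1, 0.2, 0.3, 0.4)
ridge <- glmnet(x[train == 1, ], y[train == 1], family = "binomial",
                alpha = 0, lambda = lambdas)

# Test-set classification rate at each lambda
ridge_rates <- sapply(lambdas, function(l) {
  pred <- predict(ridge, newx = x[train == 0, ], s = l, type = "class")
  mean(as.vector(pred) == as.character(y[train == 0]))
})
names(ridge_rates) <- lambdas
ridge_rates
```

glmnet standardises the predictors internally by default (`standardize = TRUE`), so no manual scaling is done here.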
4. Fit logistic regression models using LASSO with λ = 0.01, 0.02, 0.03 and 0.04 to predict Diabetes
from Age, BMI, Glucose, Pressure and Pregnant using the training data set and calculate their
classification rates using the test data set.
[8 marks]
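A LASSO sketch for question 4, under the same assumptions (glmnet installed, λ on glmnet's scale, simulated data); `alpha = 1` selects the LASSO penalty. Because the LASSO can set coefficients exactly to zero, the sketch also tabulates the fitted coefficients at each λ to show which predictors drop out.

```r
library(glmnet)  # assumed installed; alpha = 1 gives the LASSO penalty

set.seed(12345678)  # example ID from the brief
n <- 724
# Simulated stand-in for diabetes.csv (made-up coefficients in 'eta')
dat <- data.frame(
  Age      = round(runif(n, 21, 81)),
  BMI      = rnorm(n, 32, 7),
  Glucose  = rnorm(n, 121, 30),
  Pressure = rnorm(n, 72, 12),
  Pregnant = rpois(n, 4)
)
eta <- -8.5 + 0.035 * dat$Glucose + 0.08 * dat$BMI + 0.02 * dat$Age
dat$Diabetes <- factor(ifelse(runif(n) < plogis(eta), "pos", "neg"),
                       levels = c("neg", "pos"))
train <- sample(c(rep(0, 274), rep(1, 450)))

x <- model.matrix(Diabetes ~ Age + BMI + Glucose + Pressure + Pregnant, data = dat)[, -1]
y <- dat$Diabetes

lambdas <- c(0.01, 0.02, 0.03, 0.04)
lasso <- glmnet(x[train == 1, ], y[train == 1], family = "binomial",
                alpha = 1, lambda = lambdas)

# Test-set classification rate at each lambda
lasso_rates <- sapply(lambdas, function(l) {
  pred <- predict(lasso, newx = x[train == 0, ], s = l, type = "class")
  mean(as.vector(pred) == as.character(y[train == 0]))
})
names(lasso_rates) <- lambdas

# Coefficients (intercept + 5 predictors) at each lambda; zeros mark dropped predictors
lasso_coefs <- as.matrix(coef(lasso, s = lambdas))
colnames(lasso_coefs) <- lambdas
round(lasso_coefs, 4)
lasso_rates
```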
5. Calculate the classification rates on the test data set for the K-nearest neighbours classifiers with
K = 1 to 15 to predict Diabetes from Age, BMI, Glucose, Pressure and Pregnant trained on
the training data set.
[8 marks]
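A sketch of question 5 with `knn()` from the class package, which ships with standard R installations. The data are simulated, and the predictors are used on their raw scales; standardising them first (for example with `scale()` using training-set means and standard deviations) is a common alternative that the brief neither requires nor rules out. `knn()` breaks voting ties at random, so the rates depend on the random number generator state.

```r
library(class)  # knn(); recommended package included with standard R

set.seed(12345678)  # example ID from the brief
n <- 724
# Simulated stand-in for diabetes.csv (made-up coefficients in 'eta')
dat <- data.frame(
  Age      = round(runif(n, 21, 81)),
  BMI      = rnorm(n, 32, 7),
  Glucose  = rnorm(n, 121, 30),
  Pressure = rnorm(n, 72, 12),
  Pregnant = rpois(n, 4)
)
eta <- -8.5 + 0.035 * dat$Glucose + 0.08 * dat$BMI + 0.02 * dat$Age
dat$Diabetes <- factor(ifelse(runif(n) < plogis(eta), "pos", "neg"),
                       levels = c("neg", "pos"))
train <- sample(c(rep(0, 274), rep(1, 450)))

vars    <- c("Age", "BMI", "Glucose", "Pressure", "Pregnant")
x_train <- dat[train == 1, vars]
x_test  <- dat[train == 0, vars]
y_train <- dat$Diabetes[train == 1]
y_test  <- dat$Diabetes[train == 0]

# Test-set classification rate for K = 1, ..., 15 (raw, unscaled predictors)
knn_rates <- sapply(1:15, function(k) {
  pred <- knn(x_train, x_test, cl = y_train, k = k)
  mean(pred == y_test)
})
names(knn_rates) <- 1:15
round(knn_rates, 3)
```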
6. Produce a classification tree to predict Diabetes from Age, BMI, Glucose, Pressure and Pregnant
grown on the training data set.
[4 marks]
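One way to grow the tree for question 6, assuming the rpart package (a recommended package shipped with R); the course may instead use the tree package, whose `tree()` function also takes a model formula. The data are simulated as a stand-in for diabetes.csv.

```r
library(rpart)  # assumed choice of package; 'tree' is a common alternative

set.seed(12345678)  # example ID from the brief
n <- 724
# Simulated stand-in for diabetes.csv (made-up coefficients in 'eta')
dat <- data.frame(
  Age      = round(runif(n, 21, 81)),
  BMI      = rnorm(n, 32, 7),
  Glucose  = rnorm(n, 121, 30),
  Pressure = rnorm(n, 72, 12),
  Pregnant = rpois(n, 4)
)
eta <- -8.5 + 0.035 * dat$Glucose + 0.08 * dat$BMI + 0.02 * dat$Age
dat$Diabetes <- factor(ifelse(runif(n) < plogis(eta), "pos", "neg"),
                       levels = c("neg", "pos"))
train <- sample(c(rep(0, 274), rep(1, 450)))

# Classification tree grown on the training set
tree_fit <- rpart(Diabetes ~ Age + BMI + Glucose + Pressure + Pregnant,
                  data = dat[train == 1, ], method = "class")

print(tree_fit)               # text listing of splits and node class labels
plot(tree_fit, margin = 0.1)  # draw the tree
text(tree_fit, use.n = TRUE)  # label nodes, including class counts
```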
7. The R function predict can be used on a classification tree to classify new observations contained
in a dataframe unseen: predict(tree, unseen, type="class"). Use this function to calculate
the classification rate for the tree produced in part 6.
[4 marks]
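A sketch of question 7. The tree is fitted here with rpart (an assumption, since the brief does not name the package) so the block runs on its own, and the simulated test set plays the role of `unseen`. `predict()` with `type = "class"` returns one predicted class per row.

```r
library(rpart)  # assumed choice of package

set.seed(12345678)  # example ID from the brief
n <- 724
# Simulated stand-in for diabetes.csv (made-up coefficients in 'eta')
dat <- data.frame(
  Age      = round(runif(n, 21, 81)),
  BMI      = rnorm(n, 32, 7),
  Glucose  = rnorm(n, 121, 30),
  Pressure = rnorm(n, 72, 12),
  Pregnant = rpois(n, 4)
)
eta <- -8.5 + 0.035 * dat$Glucose + 0.08 * dat$BMI + 0.02 * dat$Age
dat$Diabetes <- factor(ifelse(runif(n) < plogis(eta), "pos", "neg"),
                       levels = c("neg", "pos"))
train <- sample(c(rep(0, 274), rep(1, 450)))

tree_fit <- rpart(Diabetes ~ Age + BMI + Glucose + Pressure + Pregnant,
                  data = dat[train == 1, ], method = "class")

# Same pattern as predict(tree, unseen, type = "class") in the brief
tree_pred <- predict(tree_fit, dat[train == 0, ], type = "class")
tree_rate <- mean(tree_pred == dat$Diabetes[train == 0])

table(Predicted = tree_pred, Actual = dat$Diabetes[train == 0])  # confusion matrix
tree_rate
```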
8. Which of the above classifiers would you recommend the company uses? Justify your answer.
Start by selecting a value for λ for the ridge regression model and logistic regression model using
LASSO, and a value for K for the K-nearest neighbours classifier.
[10 marks]


From: https://www.cnblogs.com/tongu1/p/17368763.html

