
STAT3010 Statistical Methods

Date: 2023-05-03 10:45:56
Tags: STAT3010, set, methods, code, marks, report, data, your, statistics


STAT3010/6075 Statistical Methods in Insurance
Assignment 2
• This assignment is worth 10% of the overall mark for STAT3010/6075.
• The deadline for submission is 16.00 on Thursday 4 May 2023.
• Standard University policies and procedures will be followed for late submission, extensions and
academic integrity (see the Module Outline for details).
• Submission is via Blackboard. You must submit a report of at most six pages (in PDF format),
containing your answers, and a separate R script, containing the code that you used to obtain
your results.
– You should submit your report via TurnitinUK on Blackboard (see Module Outline for
details) in a file called report-ID.pdf, where ID is your student ID number, for example
report-12345678.pdf. In the Assignments folder, click on Assignment 2 report submission
to submit your report. Please enter this file name as the Submission Title.
– You should not include R code used in your analysis in your report, but you must submit
a separate R script via Blackboard containing your code called code-ID.R, for example
code-12345678.R. Please rename and use the R template code-yyy.R provided. In the
Assignments folder, click on Assignment 2 code submission to submit your code.
• The page limit is strict and is easily sufficient to receive full credit. If your report is more than
six pages of A4, only the first six pages will be marked.
Recall from Assignment 1 that a health insurance company is developing a model to assess the risk of
its policy holders having diabetes based on the following data from the file diabetes.csv:
Diabetes   Binary variable indicating diabetes diagnosis, either positive (pos) or negative (neg)
Age        Age of individual, recorded in years
BMI        Body mass index (weight in kg / (height in m)^2)
Glucose    Plasma glucose concentration
Pressure   Diastolic blood pressure (mm Hg)
Pregnant   Number of times pregnant
Use the code in the R template to:
(a) Set the seed to be your student ID number with the command set.seed(ID), for example
set.seed(12345678).
(b) Select a random training data set (train=1) of size 450 and test data set (train=0) of size 274
with the command train <- sample(c(rep(0,274), rep(1,450))).
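Steps (a) and (b) can be sketched as follows. This is only an illustration: the seed is the example ID from the brief, and the names train_idx and test_idx are hypothetical helpers that do not come from the R template. It also assumes the data set has exactly 274 + 450 = 724 rows, as the two set sizes imply.

```r
# Step (a): fix the random number stream (example ID from the brief).
set.seed(12345678)

# Step (b): randomly permute 274 zeros (test) and 450 ones (training),
# giving one 0/1 label per row of the data.
train <- sample(c(rep(0, 274), rep(1, 450)))

# Hypothetical helpers: row indices of each subset, which could be used
# as diabetes[train_idx, ] and diabetes[test_idx, ] once the data is loaded.
train_idx <- which(train == 1)
test_idx  <- which(train == 0)

# Check the sizes of the two subsets.
table(train)
```

Because the seed is set immediately before sample(), rerunning the script reproduces the same split.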
Tasks
1. Calculate the diabetes rate in the test and training data sets, and hence calculate the classification
rate of the naïve classifier. Comment on the usefulness of this classifier for identifying cases of
diabetes.
[4 marks]
2. Fit a logistic regression model to predict Diabetes from Age, BMI, Glucose, Pressure and
Pregnant using the training data set and calculate its classification rate using the test data
set.
[4 marks]
3. Fit ridge regression models with λ = 0.1, 0.2, 0.3 and 0.4 to predict Diabetes from Age, BMI,
Glucose, Pressure and Pregnant using the training data set and calculate their classification
rates using the test data set.
[8 marks]
4. Fit logistic regression models using LASSO with λ = 0.01, 0.02, 0.03 and 0.04 to predict Diabetes
from Age, BMI, Glucose, Pressure and Pregnant using the training data set and calculate their
classification rates using the test data set.
[8 marks]
5. Calculate the classification rates on the test data set for the K-nearest neighbours classifiers with
K = 1 to 15 to predict Diabetes from Age, BMI, Glucose, Pressure and Pregnant trained on
the training data set.
[8 marks]
6. Produce a classification tree to predict Diabetes from Age, BMI, Glucose, Pressure and Pregnant
grown on the training data set.
[4 marks]
7. The R function predict can be used on a classification tree to classify new observations contained
in a dataframe unseen: predict(tree, unseen, type="class"). Use this function to calculate
the classification rate for the tree produced in part 6.
[4 marks]
8. Which of the above classifiers would you recommend the company uses? Justify your answer.
Start by selecting a value for λ for the ridge regression model and logistic regression model using
LASSO, and a value for K for the K-nearest neighbours classifier.
[10 marks]
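
The shared pattern behind Tasks 1, 2, 5, 6 and 7 (fit on the training set, then compute the proportion of test cases classified correctly) might look like the sketch below. Several things here are assumptions rather than part of the brief: the data frame sim is simulated as a stand-in for diabetes.csv, so its rates say nothing about the real data; class_rate is a hypothetical helper; the 0.5 probability cut-off, the standardisation of predictors before K-nearest neighbours, and the use of the rpart package for the tree are choices made for illustration, and the module's template may use different ones. Ridge regression and LASSO are left out because they need an add-on package such as glmnet.

```r
library(class)   # knn()
library(rpart)   # rpart(); an assumed choice of tree package

# Simulated stand-in for diabetes.csv (NOT the real data).
set.seed(12345678)
n <- 724
sim <- data.frame(
  Age      = round(runif(n, 21, 80)),
  BMI      = rnorm(n, 32, 7),
  Glucose  = rnorm(n, 120, 30),
  Pressure = rnorm(n, 72, 12),
  Pregnant = rpois(n, 3)
)
# Simulated outcome with arbitrary coefficients on Glucose and BMI.
p <- plogis(-8 + 0.04 * sim$Glucose + 0.08 * sim$BMI)
sim$Diabetes <- factor(ifelse(runif(n) < p, "pos", "neg"),
                       levels = c("neg", "pos"))

# Training/test split as in step (b).
train <- sample(c(rep(0, 274), rep(1, 450)))
tr <- sim[train == 1, ]
te <- sim[train == 0, ]

# Hypothetical helper: proportion of cases classified correctly.
class_rate <- function(pred, truth) {
  mean(as.character(pred) == as.character(truth))
}

# Task 1: naive classifier, predicting the training majority class for everyone.
majority   <- names(which.max(table(tr$Diabetes)))
rate_naive <- class_rate(rep(majority, nrow(te)), te$Diabetes)

# Task 2: logistic regression, classifying as "pos" when P(pos) > 0.5.
fit  <- glm(Diabetes ~ Age + BMI + Glucose + Pressure + Pregnant,
            data = tr, family = binomial)
prob <- predict(fit, newdata = te, type = "response")
rate_logit <- class_rate(ifelse(prob > 0.5, "pos", "neg"), te$Diabetes)

# Task 5: K-nearest neighbours for K = 1, ..., 15, with predictors
# standardised using the training means and standard deviations.
x_cols <- c("Age", "BMI", "Glucose", "Pressure", "Pregnant")
ctr  <- colMeans(tr[, x_cols])
sds  <- apply(tr[, x_cols], 2, sd)
x_tr <- scale(tr[, x_cols], center = ctr, scale = sds)
x_te <- scale(te[, x_cols], center = ctr, scale = sds)
rate_knn <- sapply(1:15, function(k) {
  class_rate(knn(x_tr, x_te, cl = tr$Diabetes, k = k), te$Diabetes)
})

# Tasks 6 and 7: classification tree and its test classification rate.
tree <- rpart(Diabetes ~ Age + BMI + Glucose + Pressure + Pregnant,
              data = tr, method = "class")
rate_tree <- class_rate(predict(tree, te, type = "class"), te$Diabetes)

round(c(naive = rate_naive, logistic = rate_logit, tree = rate_tree), 3)
round(rate_knn, 3)
```

Keeping every classifier behind the same class_rate helper makes the rates directly comparable, which is what the recommendation in Task 8 relies on.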

From: https://www.cnblogs.com/tongu1/p/17368763.html
