EBIS4043 Big Data Analysis and Applications

Posted: 2024-10-26 18:47:11

The purpose of this assignment is to check that you have picked up the R-based analytics skills introduced in this class (please do not use other tools to generate the answers!). (Total 50 marks)
1. Use the dataset available on iSpace.
2. Make sure your submission covers the entire process, from data loading to analysis and interpretation.
3. All your answers, including your identity, code, and interpretation, must be in one file: an HTML document generated from an R Markdown (.Rmd) file. Submissions consisting of multiple files will receive zero marks.
4. You may discuss the coding for this assignment with your friends. However, any visible overlap in your interpretation will be considered plagiarism.
5. The use of any generative AI tool is strictly prohibited for this assignment. Any detected use will be treated as an attempt at plagiarism.
6. There can be more than one correct answer to each question. Use any technique you learned in class.
7. If needed, use 20240614 as the random number seed.

Data Description
This dataset is originally from the Orange Telecom’s churn dataset, which consists of customer information known to the telecom company, along with a churn indicator (“TRUE” = canceled the subscription, “FALSE” = otherwise). Regarding the customer information, the dataset contains customers’ location, extra service plans (e.g., international roaming and voice mail services), usage (in terms of minutes, no. of calls, charged fees, …), and so on. All customers in the dataset are from the United States.

Questions
1. Write and execute R code to build and test the regression equation below for predicting the value of the Churn variable, using 1) a Linear Probability Model (LPM) and 2) a logistic regression model. Transform and create variables as needed. Which model has a better fit? (Total 10 marks)

 

Where CS.contacted = 0 if the customer has never contacted customer service and 1 otherwise; Total.all.charge = the sum of all fees charged to the customer for calls, excluding customer service calls; and Total.all.time = the sum of all time (in minutes) the customer spent on calls, excluding customer service calls.
2. Using the LPM estimated in question 1, plot the effect of Total.all.charge on Churn for CS.contacted = 0 and CS.contacted = 1, holding the other predictors at their mean values. (Total 10 marks)

3. Write and execute R code to build and test the regression equation below for predicting the value of the Churn variable using all predictors in the dataset, with 1) a Linear Probability Model (LPM) and 2) a logistic regression model. Use 5-fold cross-validation for both models. (Total 10 marks)
Hint 1: use the caret package.
Hint 2: use the as.factor() function to convert a variable into a factor.

 

4. Based on the results of question 3, which model is preferred for prediction in terms of accuracy at a classification threshold of 0.3? (Total 10 marks)
Hint: use the data.frame() function to convert the list output of predict() into a data frame.

5. Do you think the LPM developed in question 3 can be used to predict whether a Canadian customer will churn? Give at least two reasons for your answer, based on this document and the answers you have generated so far. (Total 10 marks)

From: https://www.cnblogs.com/goodlunn/p/18504353