
MSE 609 Quantitative Data Analysis

Posted: 2024-11-09 19:09:35

Midterm 3

Instructions:

  1. Prepare your answers using Jupyter Notebook or R Markdown, and submit as a PDF or HTML document. Ensure your submission is clear, organized, and well-formatted.
  2. Use complete sentences when explaining, commenting, or discussing. Provide thorough answers within the context of the problem for full credit.
  3. Show all work and reasoning in your submission. Your grade will depend on the clarity, detail, and correctness of your answers.
  4. The exam is open book and open notes. You may use textbooks, course notes, and approved coding tools (e.g., Jupyter Notebook or R Studio). However, using generative AI tools (e.g., large language models) is not permitted.
  5. Total points = 100.
  6. The exam duration is 1 week. Submit your completed exam by Thursday, November 14, at 11:59
  7. Late submissions will not be accepted.
  8. Upload your submission to Crowdmark in PDF or HTML format.

Question 1 (42 points total)

The data file question1.csv contains information about the economies of 366 metropolitan areas (MSAs) in the United States for the year 2006. The dataset includes variables such as the population, the total value of all goods and services produced for sale in the city that year per person ("per capita gross metropolitan product", pcgmp), and the share of economic output coming from four selected industries.

  a. (1 point) Load the data file and confirm that it contains 366 rows and 7 columns. Explain why there are seven columns when only six variables are described in the dataset.
  b. (1 point) Compute summary statistics for the six numerical columns.
  c. (4 points) Create univariate exploratory data analysis (EDA) plots for population and per capita GMP. Use histograms and boxplots, and describe the distributions of these variables.
  d. (4 points) Generate a bivariate EDA plot showing per capita GMP as a function of population. Describe the relationship observed in the plot.
  e. (3 points) Using only basic functions like mean, var, cov, sum, and arithmetic operations, calculate the slope and intercept of the least-squares regression line for predicting per capita GMP based on population.
  f. (3 points) Compare the slope and intercept from your calculations to those returned by R's built-in regression function. Are they the same? Should they be?
  g. (3 points) Add both regression lines to the bivariate EDA plot. Comment on the fit and whether the assumptions of the simple linear regression model appear to hold. Are there areas where the fit seems particularly good or poor?
  h. (3 points) Identify Pittsburgh in the dataset. Report its population, per capita GMP, the per capita GMP predicted by your model, and the residual for Pittsburgh.
  i. (2 points) Calculate the mean squared error (MSE) of the regression model. That is, compute the mean of the squared residuals.
  j. (2 points) Discuss whether the residual for Pittsburgh is large, small, or typical relative to the MSE.
  k. (4 points) Create a plot of residuals (vertical axis) against population (horizontal axis). What pattern should you expect if the assumptions of the simple linear regression model are valid? Does the plot you generated align with these assumptions? Explain.
  l. (3 points) Create a plot of squared residuals (vertical axis) against population (horizontal axis). What pattern should you expect if the assumptions of the simple linear regression model are valid? Does the plot you generated align with these assumptions? Explain.
  m. (3 points) Carefully interpret the estimated slope in the context of the actual variables involved in this problem, rather than using abstract terms like "predictor variable" or "X".
  n. (3 points) Using the model, predict the per capita GMP for a city with a population that is 10^5 higher than Pittsburgh's.
  o. (3 points) Discuss what the model predicts would happen to Pittsburgh's per capita GMP if a policy intervention were to increase its population by 10^5 people.

Question 2 (40 points total)

In real-world data analysis, the process goes beyond simply generating a model and reporting the results. It is essential to accurately frame the problem, select appropriate analytical methods, interpret the findings, and communicate them in a way that is accessible to an audience that may not be familiar with advanced statistical methods.

Research Scenario: Coral shells, known scientifically as Lithoria crusta, are marine mollusks that inhabit rocky coastal areas. Their meat is highly valued as a delicacy, eaten raw or cooked in many cultures. Estimating the age of Lithoria crusta, however, is difficult since their shell size is influenced not only by age but also by environmental factors, such as food supply. The traditional method for age estimation involves applying stain to a shell sample and counting rings under a microscope. A team of researchers is exploring whether certain physical characteristics of Lithoria crusta, particularly their height, might serve as indicators of age. They propose using a simple linear regression model with normally distributed errors to examine the association between shell height and age, positing that taller shells are generally older. The dataset for this research is available at question2.csv.

  a. (3 points) Load the data. Describe the research hypothesis.
  b. (4 points) Examine the two variables individually (univariate). Find summary measures for each (mean, variance, range, etc.). Graphically display each and describe your graphs. What is the unit of height?
  c. (4 points) Generate a labeled scatterplot of the data. Describe interesting features or trends you observe.
  d. (points) Fit a simple linear regression to the data, predicting the number of rings using the height of the Lithoria crusta.
  e. (4 points) Generate a labeled scatterplot that displays the data and the estimated regression function line (you may add this to the previous scatterplot). Describe the fit of the line.
  f. (5 points) Perform diagnostics to assess whether the model assumptions are met. If not, appropriately transform the height and/or number of rings and re-fit your model. Justify your decisions and re-check your diagnostics.
  g. (4 points) Interpret your final parameter estimates in context. Provide 95% confidence intervals for β0 and β1, and interpret these in the context of the problem.
  h. (points) Determine whether there is a statistically significant relationship between the height and the number of rings (and hence, the age) of Lithoria crusta. Explain your findings in the context of the problem.
  i. (4 points) Find the point estimate and the 95% confidence interval for the average number of rings for a Lithoria crusta with a height of 0.128 (in the same unit as other observations of height). Interpret this in the context of the problem.
  j. (4 points) We are interested in predicting the number of rings for a Lithoria crusta with a height of 0.132 (in the same unit as other observations of height). Find the predicted value and a 99% prediction interval.
  k. (3 points) What are your conclusions? Identify a key finding and discuss its validity. Can you come up with any reasons for what you observe? Do you have any suggestions or recommendations for the researchers? How could this analysis be improved? (Provide 6–8 sentences in total.)

Question 3 (18 points total)

Load the stackloss data:

    data(stackloss)
    names(stackloss)
    help(stackloss)

  a. (3 points) Plot the data and describe any noticeable patterns or trends.
  b. (5 points) Fit a multiple regression model to predict stack loss from the three other variables. The model is Y = β0 + β1X1 + β2X2 + β3X3 + ϵ, where Y is stack loss, X1 is airflow, X2 is water temperature, and X3 is acid concentration. Summarize the results of the regression analysis, including the estimated coefficients and their interpretation.
  c. (3 points) Construct 90 percent confidence intervals for the coefficients of the linear regression model. Interpret these intervals in the context of the problem.
  d. (3 points) Construct a 99 percent prediction interval for a new observation when Airflow = 58, Water temperature = 20, and Acid = 86. Interpret the prediction interval.
  e. (4 points) Test the null hypothesis H0 : β3 = 0. What is the p-value? Based on a significance level of α = 0.10, what is your conclusion? Explain your reasoning.
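
For the Question 1 part that asks for the slope and intercept using only basic functions such as mean, var, and cov, the least-squares line follows from sample moments: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The block below is a minimal sketch of that calculation in Python with NumPy, offered as an illustration rather than a model answer. The arrays are made-up toy numbers, not values from question1.csv, and the helper names `ols_from_moments` and `mean_squared_error` are hypothetical.

```python
import numpy as np


def ols_from_moments(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.cov returns the 2x2 sample covariance matrix (n - 1 denominator);
    # np.var needs ddof=1 to use the same denominator.
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


def mean_squared_error(x, y, intercept, slope):
    """Mean of the squared residuals y - (intercept + slope * x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (intercept + slope * x)
    return float(np.mean(residuals ** 2))


# Toy data for illustration only (assumed values, NOT from question1.csv).
population = np.array([1.0e5, 2.5e5, 5.0e5, 1.0e6, 2.0e6])
pcgmp = np.array([28000.0, 30500.0, 31000.0, 36000.0, 41000.0])

intercept, slope = ols_from_moments(population, pcgmp)

# Cross-check against a library fit; np.polyfit returns [slope, intercept]
# for a degree-1 polynomial.
ref_slope, ref_intercept = np.polyfit(population, pcgmp, deg=1)

mse = mean_squared_error(population, pcgmp, intercept, slope)
print("slope:", slope, "intercept:", intercept, "MSE:", mse)
print("library fit:", ref_slope, ref_intercept)
```

The two estimates should agree up to floating-point error, because both minimize the same sum of squared residuals. The same check applies when comparing a hand calculation against a regression routine in R.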

From: https://www.cnblogs.com/comp9321/p/18536476
