
MATH38161 Multivariate Statistics and Machine Learning

Posted: 2024-11-23 17:55:16


Coursework, November 2024

Overview

The coursework is a data analysis project with a written report. You will apply skills and techniques acquired from Week 1 to Week 8 to analyse a subset of the FMNIST dataset. In completing this coursework, you should primarily use the techniques and methods introduced during the course. The assessment will focus on your understanding and demonstration of these techniques in alignment with the learning outcomes, rather than on the accuracy or exactness of the final results. The project report will be marked out of 30; the marking scheme is detailed below.

  • Software: You should mainly use R to perform the data analysis. You may use built-in functions from R packages or implement the algorithms with your own code.

  • Report: You may use any document preparation system of your choice, but the final document must be a single PDF in A4 format. Ensure that the text in the PDF is machine-readable.
  • Content: Your report must include the complete analysis in a reproducible format, integrating the computer code, figures, text, etc. in one document.
  • Title Page: Show your full name and your University ID on the title page of your report.

  • Length: Recommended length is 8 pages of content (single-sided) plus the title page.
  • Deadline: The deadline for submission is 11:59pm, Friday 29 November 2024.
  • Submission is online on Blackboard (through Gradescope).

Academic Integrity and Use of AI Tools

This is an individual coursework. Your analysis and report must be completed independently, including all computer code. Note that, according to University guidance, output generated by AI tools is considered work created by another person.

  • Citations: Acknowledge all sources, including AI tools used to support text and code writing.
  • Ethics: Use sources in an academically appropriate and ethical manner. Do not copy verbatim, and cite the original authors rather than second- or third-level sources.
  • Accuracy: Be mindful that sources, including Wikipedia and AI tools, may contain non-obvious errors.

Copying and plagiarism (= passing off someone else's work as your own) is a very serious offence and will be strictly prosecuted. For more details see the "Guidance to students on plagiarism and other forms of academic malpractice" available at https://documents.manchester.ac.uk/display.aspx?DocID=2870 .

Analysis of the FMNIST data using principal component analysis (PCA) and Gaussian mixture models (GMMs)

The Fashion MNIST (FMNIST) dataset contains 70,000 grayscale images of fashion products from 10 categories. The subset used in this coursework contains 10,000 images, each with dimensions of 28 by 28 pixels, resulting in a total of 784 pixels per image. Each pixel is represented by an integer value ranging from 0 to 255.
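Each image is stored as one row of 784 pixel variables, i.e. the 28 × 28 grid flattened into a vector. Assuming the pixels are stored row by row (which is what the plotting code below relies on when it rebuilds each image with `byrow = TRUE`), pixel (i, j) sits in column (i − 1) · 28 + j of the data matrix. A small check in R; `pixel_index` is a hypothetical helper, not part of the data set:

```r
# Mapping between a flattened 784-pixel row and a 28 x 28 image,
# assuming row-major storage (byrow = TRUE).
n_side <- 28
n_pix  <- n_side * n_side                 # 784 pixel variables per image

# hypothetical helper: column of the data matrix holding pixel (row i, column j)
pixel_index <- function(i, j, side = n_side) (i - 1) * side + j

x <- seq_len(n_pix)                       # stand-in image whose values equal their position
m <- matrix(x, nrow = n_side, byrow = TRUE)

m[3, 5]                                   # row 3, column 5 of the image
pixel_index(3, 5)                         # same position in the flattened row
```

Both calls give 61, because rows 1 and 2 occupy the first 56 entries of the vector.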

You can download this data subset as “fmnist.rda” (7.4 MB) from Blackboard.

```r
load("fmnist.rda")   # load sampled FMNIST data set
dim(fmnist$x)        # dimension of features data matrix (10000, 784)
## [1] 10000   784
range(fmnist$x)      # range of feature values (0 to 255)
## [1]   0 255
```

Here is a plot of the first 15 images:

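```r
par(mfrow = c(3, 5), mar = c(1, 1, 1, 1))
for (k in 1:15)   # first 15 images
{
  m = matrix(fmnist$x[k, ], nrow = 28, byrow = TRUE)
  # display the image upright in greyscale (0 = black, 255 = white)
  image(t(m)[, 28:1], col = gray.colors(256, start = 0, end = 1), axes = FALSE)
}
```

Each sample is assigned one label, represented by an integer from 0 to 9 (stored as an R factor with 10 levels):

```r
fmnist$label[1:15]   # first 15 labels
## [1] 7 1 4 8 1 4 7 1 2 0 7 0 8 1 6
## Levels: 0 1 2 3 4 5 6 7 8 9
```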

Task 1: Dimension reduction for FMNIST data using principal components analysis (PCA)

The following steps are suggested guidelines to help structure your analysis but are not meant as assignment-style questions. Integrate your work as part of a cohesive report with a logical narrative.

  • Do some research to learn more about the FMNIST data.
  • Compute the 784 principal components from the 784 original pixel variables.
  • Compute and plot the proportion of variation attributed to each principal component.
  • Create a scatter plot of the first two principal components. Use the known labels to colour the scatter plot.
  • Construct the correlation loadings plot.
  • Interpret and discuss the result.
  • Save the first 10 principal components of all 10,000 images to a data file for Task 2.
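The steps above map onto a handful of base-R calls. The sketch below runs them on simulated data (a stand-in for `fmnist$x`, with hypothetical labels) so it is self-contained; `prcomp` and `cor` are base R, while the variable names and the output file location are illustrative assumptions. Correlation loadings are computed as the correlations between each original variable and the PC scores.

```r
# Minimal PCA sketch on simulated data (a stand-in for fmnist$x).
set.seed(1)
n <- 300; p <- 20
group <- factor(rep(0:2, each = n / 3))             # hypothetical class labels
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[group == "1", 1:5]  <- X[group == "1", 1:5] + 3   # give classes different means
X[group == "2", 6:10] <- X[group == "2", 6:10] + 3

# all p principal components of the centred data (variables not rescaled)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# proportion of total variance attributed to each component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
plot(prop_var, type = "b", xlab = "Principal component",
     ylab = "Proportion of variance")

# scores on PC1 and PC2, coloured by the known labels
plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 20,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group), col = 1:3, pch = 20)

# correlation loadings: correlation of each original variable with PC1 and PC2
corr_load <- cor(X, pca$x[, 1:2])
plot(corr_load, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "Correlation with PC1", ylab = "Correlation with PC2")
symbols(0, 0, circles = 1, inches = FALSE, add = TRUE)   # unit circle

# keep the first 10 PC scores for the clustering step
pcs10 <- pca$x[, 1:10]
out_file <- file.path(tempdir(), "pcs10.rds")   # hypothetical file location
saveRDS(pcs10, out_file)
```

Leaving `scale. = FALSE` keeps all variables on their common scale, which suits pixel intensities. If any pixel happens to be constant across the sample, `scale. = TRUE` would make `prcomp` stop with an error, and `cor` would return `NA` for that pixel, so such columns may need to be dropped first.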

Task 2: Analysis of the FMNIST data set using Gaussian mixture models (GMMs)

Using all 784 pixel variables for cluster analysis is computationally impractical. In this task, use the first 10 (or fewer) principal components instead of the original 784 pixel variables. Again, these steps serve as guidelines. Integrate this work into your report so that it follows logically from Task 1.
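One way to see why the pixel space is impractical is to count free parameters. A K-component GMM with unrestricted (full) covariance matrices in d dimensions has K − 1 mixing weights, K·d mean entries and K·d(d + 1)/2 covariance entries. The count below assumes full covariances and K = 10 purely for illustration; constrained models (e.g. diagonal covariances) need fewer parameters. `gmm_n_params` is a hypothetical helper:

```r
# Number of free parameters of a K-component Gaussian mixture with full
# covariance matrices in d dimensions:
#   (K - 1) weights + K * d means + K * d * (d + 1) / 2 covariance entries
gmm_n_params <- function(d, K) {
  (K - 1) + K * d + K * d * (d + 1) / 2
}

gmm_n_params(784, 10)   # all pixel variables:        3085049
gmm_n_params(10, 10)    # first 10 principal components:  659
```

Each full 784 × 784 covariance matrix alone has 307,720 free entries, far more than the 10,000 images available to estimate it.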

Cluster the data using Gaussian mixture models (GMMs).

  • Find out how many clusters can be identified.
  • Interpret and discuss the results.
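A common way to decide how many clusters a GMM supports is to fit models with different numbers of components and compare them by BIC. The sketch below assumes the `mclust` package is installed (`install.packages("mclust")`) and uses simulated two-dimensional data in place of the saved principal-component scores; with the real data you would pass the matrix of PC scores from Task 1 instead of `Z`. Note that mclust defines BIC as 2·log-likelihood − (number of parameters)·log(n), so larger values are better.

```r
# Minimal GMM sketch with mclust (assumed installed).
library(mclust)

set.seed(2)
# simulated stand-in for the principal-component scores: three groups in 2D
Z <- rbind(
  cbind(rnorm(100, mean = 0),  rnorm(100, mean = 0)),
  cbind(rnorm(100, mean = 5),  rnorm(100, mean = 5)),
  cbind(rnorm(100, mean = 10), rnorm(100, mean = 0))
)
true_group <- rep(1:3, each = 100)   # known labels, used only for comparison

# fit GMMs with 1 to 9 components across mclust's covariance models;
# the combination with the highest BIC is selected
fit <- Mclust(Z, G = 1:9)
summary(fit)
plot(fit, what = "BIC")              # BIC against number of components

fit$G                                            # number of clusters chosen by BIC
table(true_group, fit$classification)            # clusters versus known labels
adjustedRandIndex(true_group, fit$classification)
```

Cross-tabulating the cluster assignments against the known labels is one way to interpret the clusters; the labels play no part in the fit itself.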

Structure of the report

Your report should be structured into the following sections:

  1. Dataset
  2. Methods
  3. Results and Discussion
  4. References

In Section 1, provide some background and describe the data set. In Section 2, briefly introduce the method(s) you are using to analyse the data. In Section 3, run the analyses and present and interpret the results. Show all your R code so that your results are fully reproducible. In Section 4, list all journal articles, books, Wikipedia entries, GitHub pages and other sources you refer to in your report.

Marking scheme

The project report will be assessed out of 30 points based on the following rubrics.

Description of data (6 marks)
  • Excellent (5-6 marks): Provides a clear and thorough overview of the FMNIST dataset, detailing the image structure, pixel data, and its context within multivariate analysis.
  • Good (3-4 marks): Provides a clear overview of the dataset with some context; minor details may be missing.
  • Adequate (1-2 marks): Basic description of the dataset with limited context; lacks important details.
  • Insufficient (0 marks): Little to no description provided.

Description of Methods (6 marks)
  • Excellent (5-6 marks): Clearly and thoroughly explains PCA and GMMs, their purposes, and how they apply to this dataset.
  • Good (3-4 marks): Provides a clear explanation of PCA and GMMs, with minor gaps in clarity or relevance.
  • Adequate (1-2 marks): Basic explanation of methods with limited detail or relevance to the course techniques.
  • Insufficient (0 marks): Lacks clear explanations of the methods.

Results and Discussion (12 marks)
  • Excellent (10-12 marks): Correctly applies PCA and GMMs, presents clear and informative visualisations, and provides a coherent and insightful interpretation of the results.
  • Good (7-9 marks): Accurately applies PCA and GMMs with mostly clear visuals and reasonable interpretation; minor improvements needed.
  • Adequate (4-6 marks): Basic application of techniques, limited or unclear visuals, minimal interpretation.
  • Insufficient (0-3 marks): Incorrect application of techniques, with little to no interpretation.

Overall Presentation of Report (6 marks)
  • Excellent (5-6 marks): Report is well-organised, clear, and professionally formatted, with a logical narrative and adherence to page limits.
  • Good (3-4 marks): Report is generally clear and organised, with minor structural or formatting issues.
  • Adequate (1-2 marks): Report lacks coherence or has significant formatting issues; may not meet all format requirements.
  • Insufficient (0 marks): Report lacks structure and clarity and does not meet formatting requirements.

From: https://www.cnblogs.com/CSE231/p/18564219
