CA Data Classification algorithm

Posted: 2024-06-10 10:22:54
Tags: code, classification, algorithm, Data, CA, should, data, class

CA Assignment 1. Data Classification: Implementing the Perceptron Algorithm

Assessment Information

Assignment Number: 1 (of 2)

Weighting: 15%

Assignment Circulated: 10 Feb 2023

Deadline: 3 March 2023 at 17:00

Submission Mode: Electronic via Canvas

Purpose of assessment: The purpose of this assignment is to demonstrate: (1) an understanding of the Perceptron algorithm; (2) the ability to implement the Perceptron algorithm for binary classification; (3) the ability to evaluate a classification algorithm; (4) the ability to turn a binary classification algorithm into a multi-class classification algorithm using the 1-vs-rest approach; (5) the ability to incorporate regularisation into a classification algorithm.

Learning outcome assessed: (1) A critical awareness of current problems and research issues in data mining. (3) The ability to consistently apply knowledge concerning current data mining research issues in an original manner and produce work which is at the forefront of current developments in the sub-discipline of data mining.

Objectives

This assignment requires you to implement the Perceptron algorithm using the Python programming language.

Assignment description

Download the CA1data.zip file. Inside, you will find two files: train.data and test.data, containing the train and test data, respectively, to be used in this assignment. Each line in a file represents a different train/test instance. The first four values (separated by commas) are the values of four features. The last element is the class label (class-1, class-2 or class-3).
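The file layout described above can be read with a few lines of Python. This is only a sketch: the function name `load_data` and the sample values are illustrative assumptions, not taken from the real data files.

```python
import csv
import io


def load_data(source):
    """Read instances from an iterable of text lines.

    Each non-empty line is expected to hold four comma-separated
    feature values followed by a label such as 'class-1'.
    """
    features, labels = [], []
    for row in csv.reader(source):
        if not row:  # skip blank lines
            continue
        features.append([float(v) for v in row[:4]])
        labels.append(row[4].strip())
    return features, labels


# Made-up sample in the described format (NOT real assignment data).
sample = io.StringIO("5.1,3.5,1.4,0.2,class-1\n6.3,2.5,4.9,1.5,class-2\n")
X, y = load_data(sample)
```

With the real files, the same function could be called on an open file handle, for example `with open("train.data", newline="") as f: X, y = load_data(f)`.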

Questions/Tasks

  1. (15 marks) Explain the Perceptron algorithm (both the training and the test procedures) for the binary classification case. Provide the pseudo code of the algorithm. It should be the most basic version of the Perceptron algorithm, i.e. the one that was discussed in the lectures.
  2. (30 marks) Implement a binary perceptron. The implementation should be consistent with the pseudo code in the answer to Question 1.
  3. (15 marks) Use the binary perceptron to train classifiers to discriminate between
  • class 1 and class 2,
  • class 2 and class 3, and
  • class 1 and class 3.
     Report the train and test classification accuracies for each of the three classifiers after training for 20 iterations. Which pair of classes is most difficult to separate?
  4. (30 marks) Explain in your own words what the 1-vs-rest approach consists of. Extend the binary perceptron that you implemented in part 3 above to perform multi-class classification using the 1-vs-rest approach. Report the train and test classification accuracies for the multi-class classifier after training for 20 iterations.
  5. (10 marks) Add an ℓ2 regularisation term to your multi-class classifier implemented in part 4. Set the regularisation coefficient to 0.01, 0.1, 1.0, 10.0 and 100.0, and compare the train and test classification accuracies. What can you conclude from the results?
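As a rough orientation for the tasks above, here is a minimal sketch of a binary perceptron and a 1-vs-rest wrapper. It is not a model answer. It assumes the common mistake-driven update rule, which may differ in detail from the version covered in the lectures, and it folds the ℓ2 term in as a weight shrinkage applied on each mistake, which is only one of several possible formulations. The names `train_perceptron`, `train_one_vs_rest`, `predict_one_vs_rest`, `accuracy`, the parameter `l2`, and the toy data are all illustrative.

```python
import numpy as np


def train_perceptron(X, y, epochs=20, l2=0.0):
    """Train a binary perceptron. X: (n, d) array, y: labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            # Update only when the instance is misclassified (or on the boundary).
            if y_i * (np.dot(w, x_i) + b) <= 0:
                # Assumed l2 variant: shrink w, then apply the usual update.
                # With l2 = 0 this is the plain perceptron rule.
                w = (1.0 - 2.0 * l2) * w + y_i * x_i
                b += y_i  # the bias is left unregularised here
    return w, b


def train_one_vs_rest(X, labels, epochs=20, l2=0.0):
    """Train one binary perceptron per class: that class (+1) vs the rest (-1)."""
    labels = np.asarray(labels)
    models = {}
    for c in sorted(set(labels.tolist())):
        y = np.where(labels == c, 1.0, -1.0)
        models[c] = train_perceptron(X, y, epochs, l2)
    return models


def predict_one_vs_rest(X, models):
    """Pick, for each instance, the class whose perceptron gives the highest score."""
    classes = list(models)
    scores = np.column_stack([X @ w + b for w, b in models.values()])
    return np.array(classes)[np.argmax(scores, axis=1)]


def accuracy(predicted, truth):
    """Fraction of instances whose predicted label matches the true label."""
    return float(np.mean(np.asarray(predicted) == np.asarray(truth)))


# Toy, well-separated 2-D data (NOT the assignment data).
X_toy = np.array([[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]])
labels_toy = np.array(["a", "b", "c"])
models = train_one_vs_rest(X_toy, labels_toy)
toy_accuracy = accuracy(predict_one_vs_rest(X_toy, models), labels_toy)
```

Comparing raw scores across the per-class perceptrons, rather than their signs, avoids ties when several classifiers (or none) claim the same instance.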

Submission Instructions

Submit via Canvas the following two files (please do NOT zip files into an archive):

  1. the source code for all your programs (do not provide ipython/jupyter/colab notebooks; instead submit standalone code in a single .py file), and
  2. a PDF file (report) of no more than 3 pages providing the answers to the questions.

It is extremely important that you provide the two files described above and not just the source code!

Important notes

(read carefully and double check compliance before submission)

  1. No credit will be given for implementing any other type of classification algorithm or using an existing library for classification instead of implementing it by yourself. However, you are allowed to use
  • numpy library for accessing data structures such as numpy.array;
  • random module; and
  • pandas.read_csv, csv.reader, or similar modules only for reading data from the files.

However, it is not a requirement of the assignment to use any of those modules.

  2. Your program
  • should run and produce all results for Questions 3, 4, and 5 in one click without requiring any changes to the code;
  • should output only the required data in a clearly structured way; it should NOT output any intermediate steps;
  • should assume that the input files are named ‘test.data’ and ‘train.data’, and are located in the same folder as the program; in particular, it should NOT use absolute paths.
  3. Programs that do not run will result in a mark of zero!
  4. Your code should be as clear as possible and should contain only the functionality needed to answer the questions. Provide as many comments as needed to make sure that the logic of the code is clear enough to a marker. Marks may be deducted if the code is obscure, implements unnecessary functionality, or is overly complicated.
  5. You are allowed to shuffle the data. If you use module random to shuffle the data, use a fixed seed value so that your program always produces the same output. This output should be exactly the one that you provide in the PDF report.
  6. Your answers in the PDF report should be succinct, but complete and clear. The clarity and presentation of the report will be assessed.
  7. Your submission should be your own work. Do not copy or share! Make sure that you clearly understand the severity of penalties for academic misconduct (https://www.liverpool.ac.uk/media/livacuk/tqsd/code-of-practice-on-assessment/appendix_L_cop_assess.pdf).
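The note on shuffling with a fixed seed can be illustrated with a short sketch. The seed value 42 and the helper name `shuffled` are arbitrary assumptions; any fixed seed serves the same purpose.

```python
import random

SEED = 42  # arbitrary fixed value (assumption); any constant works


def shuffled(instances, seed=SEED):
    """Return a shuffled copy of `instances`; the same seed gives the same order."""
    rng = random.Random(seed)   # private generator, independent of global state
    result = list(instances)    # copy so the caller's data is left untouched
    rng.shuffle(result)
    return result


first_run = shuffled(range(10))
second_run = shuffled(range(10))
```

Using a dedicated `random.Random(seed)` instance rather than the module-level `random.seed` keeps the order reproducible even if other code also draws random numbers.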

From: https://www.cnblogs.com/qq99515681/p/18240439
