UFCFVQ-15-M Programming for Data Science

Posted: 2024-07-06 12:42:12

College of Arts, Technology and Environment
Academic Year 2023/24

Resit Assessment Brief

Submission and feedback dates

Submission deadline: Before 14:00 on 15th July 2024
This assessment is eligible for the 48-hour late submission window.
Marks and feedback due on: 12th August 2024
N.B. All times are 24-hour clock, current local time (at time of submission) in the UK.

Submission details

Module title and code: UFCFVQ-15-M Programming for Data Science
Assessment type: Practical Skill Assessment
Assessment title: Practical Coursework
Assessment weighting: 100% of total module mark
Size or length of assessment: No word limit; development time 20 hours

Module learning outcomes assessed by this task:
1. Apply the principles of programming and data management to solve problems.
2. Demonstrate the use of an object-oriented paradigm when solving software problems.
3. Design and implement algorithms for numerical analysis.
4. Demonstrate the use of proactive error handling techniques to address software reliability and program vulnerability issues.
5. Critique and reflect on alternative solutions to a given problem or on their own work in a constructive way.
6. Undertake independent research activities with relation to innovative approaches to data science problem solving.
7. Demonstrate the use of Data Visualisation techniques for supporting numerical data analysis.
8. Demonstrate the use of a version control system (such as Git) as part of an integrated development process.

Completing your assessment

What am I required to do on this assessment?

For this assessment, you are required to complete three different tasks. A brief outline is given below. Exact details of what is required are given in Appendix 1.
1. Develop a set of functions to solve a programming problem using ONLY built-in Python functions and data structures.
2. Perform basic data analysis of a given dataset and identify an “interesting” pattern or trend within the data.
3. Write a reflective report about the process you followed while developing solutions to the two main programming tasks (i.e., 1 & 2 above).

Where should I start?

To demonstrate your understanding and programming skills, it is important that you develop sufficient knowledge of the module materials and gain practical experience of coding in Python before you begin this assessment. You should read the detailed description of each task given in Appendix 1.

Firstly, you should create a GitHub account and follow the instructions given by the tutor for accessing the GitHub Classroom that has been set up for this assessment. How to complete this will be covered during one of your workshops. In addition, there is a pre-recorded explanation of how to do this available in the Assessment folder on Blackboard.

Secondly, you need to clone your GitHub repository to your local machine. You should then open a Jupyter Notebook console from Anaconda Navigator and load the Resit Programming Task 1 Template. You can now begin working through the programming requirements set out in Section A.

What do I need to do to pass?

To pass this coursework assessment you will need to achieve an overall mark of 50% or above. Realistically, this will not be possible without at least attempting both programming tasks. However, you should also attempt the remaining task to ensure that you have maximised your mark for this assessment.

How do I achieve high marks in this assessment?

High marks can be achieved by carefully following the requirements set out in Appendix 1. Marks will be deducted for solutions which do not follow the requirements precisely. In addition, you should make sure that you demonstrate good coding standards, write an insightful reflective (rather than descriptive) process report, and follow all naming conventions set out in this assessment.

How does the learning and teaching relate to the assessment?
Week 1 focuses on Git, so following this material is important for accessing the assessment materials and submitting your work. Weeks 2 through 6 focus on basic Python programming. You should pay particular attention to Week 6 to identify built-in functions. These are important for the first task. Weeks 7 through 9 focus on how to use Python for data analysis and are important for the second task. Week 11’s Data Science demonstration may also be useful for the second task.

What additional resources may help me complete this assessment?

Additional resources that you might find useful for completing this assessment include:
• Reflective Writing course at https://xerte.uwe.ac.uk/play_4988
• Referencing information at https://www.uwe.ac.uk/study/study-support/study-skills/referencing
• Module Discussion Boards: Coursework Queries and FAQs

The Module Leader and Module Tutors will also be available via email to clarify any issues you may be having with the assessment. Formative feedback can be requested during the tutorial sessions.

What do I do if I am concerned about completing this assessment?

UWE Bristol offers a range of Assessment Support Options, and both Academic Support and Wellbeing Support are available. For further information, please see the Academic Survival Guide.

How do I avoid an Assessment Offence on this module?

Use the support above if you feel unable to submit your own work for this module.

The most common form of Assessment Offence for this type of assessment is copying code from another source (such as a forum, webpage or another student) without referencing and citing it correctly. Referencing is an important part of academia, and you should be clear about when you need to reference an external source and how to reference it (more information is available in the study skills link above). However, it should be made clear that any copied code may result in partial marks for any sub-task in which it is used. During the marking phase, an analysis of submissions will be made across the cohort to identify any evidence of collusion and/or plagiarism.

UWE Bristol’s Assessment Offences Policy requires that you submit work that is entirely your own and reflects your own learning, so it is important to:
• Ensure you reference all sources used, using UWE Harvard and the guidance available on UWE’s Study Skills referencing pages.
• Avoid copying and pasting any work into this assessment, including your own previous assessments, work from other students or internet sources.
• Develop your own style, arguments, and wording, so avoid copying sources and changing individual words while keeping essentially the same sentences and/or structures from other sources.
• Never give your work to others who may copy it.
• If this is an individual assessment, develop your own work and preparation, and do not allow anyone to make amendments to your work (including proof-readers, who may highlight issues but not edit the work).

When submitting your work, you will be required to confirm that the work is your own. Text-matching software and other methods are routinely used to check submissions against other submissions to the university and internet sources. Details of what constitutes plagiarism and how to avoid it can be found on UWE’s Study Skills pages about avoiding plagiarism.

Marks and Feedback

Your assessment will be marked according to the marking criteria set out in each task in Appendix 1. You can use these to evaluate your own work before you submit.

Appendix 1 – Assessment Overview

This single coursework assessment involves three separate tasks. The requirements for each task are detailed below together with deliverables, submission details and grading criteria.
Below is a breakdown of percentage weighting per task:

Task                          % Weighting
Programming Task 1            48
Programming Task 2            38
Process Development Report    14
Total                         100

Section A. Programming Task 1

This programming task focuses on using Python to calculate a set of Student’s t-test statistics for a given dataset using ONLY built-in functions and data structures.
• For Programming Task 1, you MUST NOT import any Python library functions. This means you cannot use Python modules such as math, SciPy, csv or libraries such as Pandas or NumPy.

To calculate the Student’s t-test statistic for a given pair of Python Lists, it would be very easy to use the ttest_rel() function provided in the SciPy library. However, this programming task is designed to assess your coding abilities, and by preventing you from using this function you are forced to gain a deeper understanding of how to complete that task. To do this, you will need to develop your own algorithm. Try typing “calculate Student’s t-test statistic by hand” into your favourite search engine. For your information, a t-test statistic value greater than 1.972 indicates a statistically significant result at a level of 5% (assuming a paired two-tailed test).

There is a single data file available in your resit GitHub repository for use in this programming task. The file contains data about the prevalence of mental health disorders in countries around the world in 2017 based on different age groups.
• The data file is called resit_task1.csv. This CSV file includes a header row with multiple named data values.
• This file is available in the Resit Materials section on Blackboard.

Students are expected to follow appropriate coding standards such as code commenting, docstrings, consistent identifier naming, code readability, and appropriate use of data structures.

A.1. Requirements

ID: FR1
Requirement: Develop a function to read a single specified column of data from a CSV file
Description: The function should accept two parameters: the data file name and a column number. The column number specifies which of the columns to read. It can range between 0 and n-1 (where n is the number of columns). The function should return two values: the column name and a List containing all the specified column’s data values. You should use the resit_task1.csv data file to test your function, but your function should also work for other CSV files. An illustration of this is given in Appendix 2.
Marks available: 6

ID: FR2
Requirement: Develop a function to read CSV data from a file into memory
Description: The resit_task1.csv data file contains several columns of data values. This function should accept a single parameter: the data file name. It should make use of the function developed in FR1 to read all columns of data from the data file and add them to a Dictionary data structure. The Dictionary should contain one entry for each column in the CSV data file. An illustration of this is given in Appendix 3.
Marks available: 6

ID: FR3
Requirement: Develop a function to calculate a paired Student’s t-test statistic for two lists of data
Description: This function should calculate a paired Student’s t-test statistic for two lists of data. The function should take two lists of data (of equal length) as parameters. The function should check that the lists are of equal length and raise an error if they are not. The function should return the calculated statistic value.
Marks available: 12

ID: FR4
Requirement: Develop a function to generate a set of paired Student’s t-test statistics for a given data file
Description: The function should accept one parameter: the Dictionary data structure generated in FR2. This function should make use of the function developed in FR3 to generate a paired Student’s t-test statistic for every pair of columns in the input data structure parameter. The function should return a list of tuples, each tuple containing the two column names and the associated statistic value. An illustration of this is given in Appendix 4.
Marks available: 10

ID: FR5
Requirement: Develop a function to print a custom table
Description: This function should output the paired Student’s t-test statistics for a subset of the column pairs generated in FR4. The function should take three parameters: the list of Student’s t-test statistic tuples, the border character to use and which columns to include. You should indicate values which are statistically significant (at the level of 5%) using stars, e.g., *2.43*. High marks will be given for good use of padding in the table cells to improve readability. An illustration of this is given in Appendix 5.
Marks available: 9

A.2. Deliverables

A Jupyter Notebook file (in .ipynb format) containing a complete solution to this Programming Task.
• You must use the template provided. (There is a Jupyter Notebook template available in your GitHub repository: UFCFVQ-15-M_Resit_Programming_Task 1_Template.ipynb)

A.3. Submission

You should commit your completed Jupyter Notebook file to your resit GitHub repository with an appropriate commit message.

A.4. Grading Criteria

Marks are allocated as follows:
• up to 43 marks for the Python code solution. Marks will be awarded for each requirement according to the level of completion. To gain high marks you must follow the requirement instruction precisely.
• up to 5 marks for adherence to good coding standards.

Section B. Programming Task 2

This programming task focuses on using NumPy/SciPy, Pandas, and Matplotlib/Seaborn to combine and analyse two datasets related to bike sharing in London between 2015 and 2017.

Two data files have been provided in your GitHub repository for this task.
• The resit_task2a.csv data file contains the number of bike shares per hour between January 2015 and January 2017.
• The resit_task2b.csv data file contains the temperature, “feels like” temperature, humidity and wind speed for every hour between 2015 and 2017.

Students are expected to follow appropriate coding standards such as code commenting, consistent identifier naming, code readability, and appropriate use of data structures.

B.1. Requirements

ID: FR6
Requirement: Read CSV data from two files and merge it into a single Data Frame
Description: For this task you should use the resit_task2a.csv and resit_task2b.csv data files.
Marks available: 4

ID: FR7
Requirement: Explore the dataset to identify an "interesting" pattern or trend
Description: Use an appropriate visualisation tool (such as Matplotlib or Seaborn) to illustrate your exploration. You should include at least three visualisations as part of your exploration. You could consider other ways to explore the data such as data summaries or transformations. You must include an explanation of the dataset exploration, your selected "interesting" pattern or trend and your reasons for selecting it. (An “interesting” pattern or trend might include a correlation between two columns of data, equality of two columns of data or estimating a linear or non-linear relationship between columns of data.)
Marks available: 10

ID: FR8
Requirement: Detect and remove any outliers in the data used for your "interesting" pattern or trend
Description: Use an appropriate technique to detect and remove any outliers in the data used for your "interesting" pattern or trend. You must include an explanation of the detection method used, how it works, and any outliers detected. NOTE: there may not be any detectable outliers using the selected detection method. If this is the case, please state this clearly in the explanation given.
Marks available: 6

ID: FR9
Requirement: Define a hypothesis to test your “interesting” pattern or trend
Description: Use an appropriate hypothesis testing formulation to define a hypothesis and provide an explanation for your choices.
Marks available: 6

ID: FR10
Requirement: Test your hypothesis with a statistical significance level of 0.05
Description: Using an appropriate Python library, test the hypothesis stated in FR9. You must include a detailed explanation of your findings to achieve good marks for this task.
Marks available: 7

B.2. Deliverables

A Jupyter Notebook file (in .ipynb format) containing a complete solution to this Programming Task.
• You must use the template provided. (There is a Jupyter Notebook template available in your GitHub repository: UFCFVQ-15-M_Resit_Programming_Task_2_Template.ipynb)

B.3. Submission

You should commit your completed Jupyter Notebook file to your GitHub repository with an appropriate commit message.

B.4. Grading Criteria

Marks are allocated as follows:
• up to 33 marks for the Python code solution. Marks will be awarded for each requirement according to the level of completion. To gain high marks you must follow the requirement instruction precisely.
• up to 5 marks for adherence to good coding standards.

Section C. Process Development Report

You are expected to identify the strengths/weaknesses of your approach to your coding tasks. For this coursework, you must write a reflective report which focuses on the process you took to develop a solution to the two programming tasks described in Section A and Section B above. Please reflect on your experiences rather than simply describing what you did.

The report must be split into TWO different sections, one for each programming task. Each section should:
• include an explanation of how you approached the task: describe your thought process. Did you find it easy or difficult? Why? What problems did you encounter? How did you overcome them?
• identify any strengths/weaknesses of the approaches used.
• consider how the approaches used could be improved.
• suggest alternative approaches that could have been taken instead of the ones you used.

C.1. Requirements

The development process report MUST be submitted in .docx format – pdf, pages, or any other file format will NOT be accepted for this task. The report must not exceed 800 words. Please indicate the word count at the end of the document.

C.2. Deliverables

A development process report written in .docx format.

C.3. Submission

You should commit the report to your GitHub repository with an appropriate commit message.

C.4. Grading Criteria

There are 14 marks available for the report – 7 marks per section.
• Marks will be awarded for appropriate use of technical language, critical reflection on the development process and quality of engagement with the reflective process.

Appendix 2 – Example Column Extraction

For the following illustration, you should assume that the column number parameter is equal to 1 for the data file. There are 9 columns in this file, so the column number can range between 0 and 8. For this data, the function would return two values: “Glucose” and [148,85,183,89,137,116,78,115,197,125,110,168,139]

Appendix 3 – In-Memory Data Structure

Using the file illustrated in Appendix 2, the Dictionary produced in FR2 should look something like the illustration below. However, you must ensure that your function can work for any CSV file with a similar structure (such as a file with 5 columns and 100 rows or with 20 columns and 1000 rows).
{
  "Pregnancies" : [6,1,8,1,0,5,3,10,2,8,4,10,10],
  "Glucose" : [148,85,183,89,137,116,78,115,197,125,110,168,139],
  "BloodPressure" : [72,66,64,66,40,74,50,0,70,96,92,74,80],
  "SkinThickness" : [35,29,0,23,35,0,32,0,45,0,0,0,0],
  "Insulin" : [0,0,0,94,168,0,88,0,543,0,0,0,0],
  "BMI" : [33.6,26.6,23.3,28.1,43.1,25.6,31,35.3,30.5,0,37.6,38,27.1],
  "DiabetesPedigreeFunction" : [0.627,0.351,0.672,0.167,2.288,0.201,
                                0.248,0.134,0.158,0.232,0.191,0.537,1.441],
  "Age" : [50,31,32,21,33,30,26,29,53,54,30,34,57],
  "Outcome" : [1,0,1,0,1,0,1,0,1,1,0,1,0]
}

Appendix 4 – Statistical data based on In-Memory Data Structure

Using the in-memory data structure illustrated in Appendix 3, the List of Tuples produced in FR4 should look something like the illustration below. The full data output is too large to include here and so only some of the data has been included to help illustrate what is required. Remember that different CSV data files will result in different data being stored. The data file you have been provided with does not include any of the data shown below. Don’t be tempted to simply copy the result below into your Jupyter Notebook.

[
  ("Pregnancies", "Glucose", 0.337),
  ("Pregnancies", "BloodPressure", -0.0025),
  ("Pregnancies", "SkinThickness", -0.7481),
  ("Pregnancies", "Insulin", -0.4772),
  ("Pregnancies", "BMI", -0.2313),
  ("Pregnancies", "DiabetesPedigreeFunction", -0.0872),
  ("Pregnancies", "Age", 0.3428),
  ("Pregnancies", "Outcome", 0.0167),

  ("Glucose", "Pregnancies", 0.337),
  ("Glucose", "BloodPressure", 0.1429),
  ("Glucose", "SkinThickness", -0.0028),
  ("Glucose", "Insulin", 0.4304),
  ("Glucose", "BMI", 0.0584),
  ("Glucose", "DiabetesPedigreeFunction", 0.2192),
  ("Glucose", "Age", 0.5328),
  ("Glucose", "Outcome", 0.5465),

  +++++++ More data would be included here ++++++++

  ("Outcome", "Pregnancies", 0.0167),
  ("Outcome", "Glucose", 0.5465),
  ("Outcome", "BloodPressure", 0.0755),
  ("Outcome", "SkinThickness", 0.3585),
  ("Outcome", "Insulin", 0.3355),
  ("Outcome", "BMI", -0.0768),
  ("Outcome", "DiabetesPedigreeFunction", 0.2185),
  ("Outcome", "Age", 0.314)
]

Appendix 5 – Output table for Statistics

Using the output from the function produced in FR4, the following table outputs a subset of the available columns (as defined by the function parameter) using the border character * and padding within the cells to ensure the table is readable:

                ***********************************************
                * Glucose * BloodPressure *   BMI   *   Age   *
***************************************************************
*    Glucose    *    -    *     0.1429    *  0.0584 *  0.5328 *
* BloodPressure *  0.1429 *       -       * -0.4522 *  0.4194 *
*      BMI      *  0.0584 *    -0.4522    *    -    * -0.3847 *
*      Age      *  0.5328 *     0.4194    * -0.3847 *    -    *
***************************************************************
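
The brief suggests working out the paired Student's t-test statistic by hand. As a rough illustration of that calculation, and not the expected solution, the sketch below uses the standard formula t = mean(d) / (s_d / sqrt(n)), where d is the list of pairwise differences and s_d is their sample standard deviation. It uses no imports, in line with the Task 1 restriction. The function names `paired_t_statistic` and `format_statistic`, the extra error checks, and the use of the absolute value against the 1.972 threshold are assumptions made for this example and are not part of the brief.

```python
def paired_t_statistic(sample_a, sample_b):
    """Return the paired Student's t statistic for two equal-length lists.

    Hypothetical helper for illustration only; uses built-ins exclusively.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("Both lists must have the same length.")
    n = len(sample_a)
    if n < 2:
        raise ValueError("At least two pairs of values are required.")

    # Work on the pairwise differences d_i = a_i - b_i.
    differences = [a - b for a, b in zip(sample_a, sample_b)]
    mean_diff = sum(differences) / n

    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean_diff) ** 2 for d in differences) / (n - 1)
    if variance == 0:
        raise ValueError("The differences have zero variance; t is undefined.")

    # Standard error of the mean difference; ** 0.5 avoids importing math.
    standard_error = (variance / n) ** 0.5
    return mean_diff / standard_error


def format_statistic(value, threshold=1.972):
    """Wrap a statistic in stars when its magnitude exceeds the threshold.

    The threshold default comes from the brief; comparing the absolute
    value is an assumption based on the test being two-tailed.
    """
    text = f"{value:.2f}"
    return f"*{text}*" if abs(value) > threshold else text


t_value = paired_t_statistic([1, 2, 3, 4], [0, 0, 0, 0])
print(format_statistic(t_value))
```

For the small example above the differences are 1, 2, 3 and 4, so the mean difference is 2.5, the sample variance is 5/3, and the statistic is the square root of 15 (about 3.87), which exceeds the 1.972 threshold quoted in Section A.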

From: https://www.cnblogs.com/qq-99515681/p/18286541
