COMP5310: Developing a Predictive Model

Posted: 2023-05-03 13:22:22


COMP5310 Project Stage 2B
Develop and Evaluate Predictive Model
Due: 11:59pm on 14th of May 2023 (end of Week 11)
Value: 15% of the unit

This stage is usually done with the same group members as you worked with for Stage 2A.
However, under exceptional circumstances an alternative group may be created by the unit
coordinator when a group is reduced in size due to a member discontinuing this unit. If this
applies to you, please urgently email [email protected] to discuss this.

DISPUTE RESOLUTION
If, during the course of the assignment work, there is a dispute among group members that
you can’t resolve, or that will impact your group’s capacity to complete the task well, you
need to inform the unit coordinator, [email protected]. Make sure that your
email includes your group number and tutorial session, and is explicit about the difficulty.
Also, make sure this email is copied to your tutor and all the members of the group (including
anyone you are complaining about). We need to know about problems in time to help fix
them and deal with non-performance promptly (don’t wait until a few days before the work
is due to complain that someone is not delivering on their tasks). If necessary, the unit
coordinator will split a group, and leave anyone who didn’t participate effectively in a group
by themselves (they will need to achieve all the outcomes on their own). This option is only
available up until Monday May 1st, which is the last day with time to resolve the issue before
the due date. For any group issues that arise after this time, you will need to try to resolve
the problem on your own, and you will continue to be treated as a single group. If someone
doesn’t provide the material required for the report, or their material is not of the agreed
standard, you should still have the report show what that person did. Their section of the
report may be empty if they don’t produce anything, or it may have material but not enough.
In such cases, please put a “Note to marker” on the front page of the report, which describes
the circumstances. That way, we can consider how best to apply the marking scheme. Note
that it is not expected or sensible for other members to do the work that someone failed to
deliver.

TASKS
GROUP TASKS:
1. Identify an attribute that you will all make predictions about and find a dataset that
contains this attribute. The attribute you are predicting may be quantitative or nominal.
The dataset may be one from the previous stages of this project.
2. Decide on the measure of success for the predictive models you will be producing. You
will need to justify your choice of measure and describe its strengths and limitations.
3. Divide the dataset into a training set and a test set. We suggest having at least
one-tenth of the original dataset in the test dataset.
4. Coordinate in choosing the methods you will use, to each produce a predictive model for
this attribute, using the training dataset (the coordination is needed to avoid duplication
between members, and to enable a good conclusion for your report).
5. Write Part B of the report, that discusses the different models and their strengths and
weaknesses. This should be written for a reader who is interested in your research or
business question.

Note: The models created in this Stage must ALL be predicting (in different ways) one
common attribute in the one common dataset. You are allowed to use a dataset you already
have from Stage 1 or 2A, but you are equally free to change dataset and even domain. However,
keep in mind that many machine learning techniques do not work well unless the dataset is
large enough and quite clean. We recommend that you do some preliminary data analysis to
convince yourself that there is some relationship between the other attributes and the one you
are going to predict (otherwise predictions will not be very effective). You also need to choose
how you will measure the effectiveness of predicting. We recommend that you use one of the
measures that is built-in for scikit-learn to calculate, given the test data and the predictions
made for those items. For higher levels than pass, you need more than one measure that you
will calculate on each model.
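
As an illustration of the split and of scikit-learn's built-in measures, here is a minimal sketch. It assumes a nominal target and uses synthetic data from `make_classification` as a stand-in for your group's dataset. The 20% test fraction, the logistic regression model and the two measures shown are illustrative choices, not requirements of this assignment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the group's dataset (assumption: nominal target).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% as the common test set; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit on the training set only, then predict the held-out items.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Two built-in measures computed from the test labels and the predictions.
accuracy = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")
print(f"accuracy={accuracy:.3f}, macro F1={macro_f1:.3f}")
```

Fixing `random_state` and sharing the resulting split is one way for every member to train and test on exactly the same rows.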

INDIVIDUAL TASKS:
1. Use Python (for example, the scikit-learn library) to produce a predictive model for the
chosen attribute from the training dataset, using the kind of model and training method
allocated to you by the group. If your method for training has hyper-parameters, you
should adjust them as well as possible, but only using parts of the training dataset in
doing so (you must not use any of the test dataset for this).
2. Evaluate the quality of the predictive model you produced, in terms of the measure of
success that the group chose.
3. Write your section in Part A of the report, in which you present the work you have done
individually.
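
One possible way to adjust hyper-parameters without touching the test set is cross-validation inside the training set. The sketch below assumes a decision tree classifier, a small hypothetical parameter grid and synthetic data. Your allocated method and grid will differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the group's dataset (assumption).
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Hypothetical grid; each candidate is scored by 5-fold cross-validation
# on the training set, so the test set plays no part in the selection.
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    cv=5,
    scoring="f1_macro",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# The test set is used once, after tuning, to evaluate the refitted best model.
test_score = search.score(X_test, y_test)
print(best_params, round(test_score, 3))
```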

WHAT TO SUBMIT
There are TWO deliverables in this stage of the project, and both should be submitted by
ONE PERSON on behalf of the whole group.
1. A written report on your work, as a PDF document. There is a maximum length for the
report of 2500 words for groups of 2 and 3000 words for groups of 3. The report
should have a front page, that gives the group name and lists the members involved
(giving their SID and unikey, not their name), and then the body of the report has a
structure as follows (this corresponds to the marking scheme):
Part A: It should be targeted at a tutor or lecturer whose goal is to see what you
achieved, so they can allocate a mark. In this section you must:
a. State your research or business question.
b. State the domain and the dataset you are using.
c. Indicate how you split your dataset into training and test data.
d. Then, there should be one section for each member (the section should state the
SID/unikey of the group member who did the work reported in this section). In
this section, there should be the following sub-sections:
o A description of the way you produced the predictive model, including the
Python code you wrote that produces the model and any pre-processing
(e.g., rescaling some attributes). If possible, you should also give the
predictive model itself (e.g., for a linear regression, you would report what
coefficients each attribute has in the model; for a decision tree you would
state the different decision points).
o The evaluation of how well your predictive model does in predicting. This
must include the Python code you wrote that calculates some measure of
effectiveness (on the test data), as well as stating the actual value of this
measure for your predictive model. For higher marks, textual discussion
is also needed (see the mark scheme below). For example, you may
consider using significance testing, confidence intervals, regression
R-squared, clustering V-measure, classification F1-score, etc.
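
To illustrate what "giving the predictive model itself" could look like, the sketch below prints the coefficients of a linear regression and the decision points of a shallow regression tree, then computes R-squared on the test data. The data is synthetic and the attribute names `attr_a`, `attr_b` and `attr_c` are hypothetical placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical attribute names and synthetic data (assumptions).
feature_names = ["attr_a", "attr_b", "attr_c"]
X, y = make_regression(n_samples=300, n_features=3, noise=10.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Linear regression: report one coefficient per attribute plus the intercept.
linear = LinearRegression().fit(X_train, y_train)
coefficients = dict(zip(feature_names, linear.coef_))
print("intercept:", linear.intercept_)
print("coefficients:", coefficients)

# Decision tree: export_text lists the decision points as indented rules.
tree = DecisionTreeRegressor(max_depth=2, random_state=2).fit(X_train, y_train)
tree_rules = export_text(tree, feature_names=feature_names)
print(tree_rules)

# R-squared of the linear model, measured on the held-out test data.
linear_r2 = r2_score(y_test, linear.predict(X_test))
print("test R-squared:", round(linear_r2, 3))
```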

Part B: Targeted at someone who is interested in your research or business question,
and wants to understand how well various machine learning approaches work for
producing predictive models in the context of your research or business question. This
part is written as a group, and you must:
a. Describe the different ways the members produced predictive models.
b. Comment on the evaluations to draw conclusions about the strengths and
limitations of the different approaches, tying this back to your business question
(see the marking scheme for more guidance on what is expected here).

2. The code and dataset you used to produce your predictive model and calculate some
measure of effectiveness of the model. If you have done any further transforms on
attributes before training/testing, this code should also be included. The code should be
submitted as a single zip or tar.gz file which contains a subfolder for each group
member.

MARKING
Here is the mark scheme for this assignment. The score (out of 15) is the sum of separate
scores for each of four components. Note that there is an individual and a group component
to each member’s mark.

Predictive Models [3 points] [Individual Mark]
This component is assessed based on the corresponding subsection of the separate member
section in Part A of the report; the uploaded data and code may be checked by the marker as
supporting evidence for claims made in the report.

[Full marks]: The Distinction criteria hold, and also there is a clear explanation of any
method that is not presented in the tutorials, including an argument for why this is a
reasonable approach to consider for the task (this discussion should go well beyond simply
reporting that the model predicts well, to argue that one could reasonably hope that it might
be good, in several ways).

[Distinction]: The Pass criteria hold, and also at least one of the methods used must go
beyond what is covered in the tutorials.

[Pass]: The group member uses Python and the agreed training dataset and correctly
produces a predictive model for the agreed attribute. The code that each member wrote to
produce their model (including doing any preliminary attribute transformations) must be
explicitly shown in the report. The ways in which the various members’ models are produced
should all be different from one another (this could be different algorithmic training
techniques, different choice of hyper-parameters, different scaling, or choice of input
attributes, etc.).

[Flawed]: Some predictive model is produced using Python.

Evaluation of Predictive Models [4 points] [Individual Mark]
This component is assessed based on the corresponding sub-section of the separate member
section in Part A of the report. The uploaded data and code may be checked by the marker as
supporting evidence for claims made in the report.

[Full marks]: The Distinction criteria hold, and also, for each approach, there is a reasonable
discussion relating the outcome of the measurements to the nature of the training approach,
characteristics of the dataset and any transformations done.

[Distinction]: The group member has correctly reported on more than one measure of
performance of the model on the test dataset. The code that does this measurement must be
explicitly shown in the report. Also, for each approach there is a sensible discussion of the
interpretation of the measurements (for example, whether it is indicating overfitting or
underfitting, whether the accuracy/precision/recall/F1 score differs between different
classes in your data).
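
The kind of evidence this discussion can draw on might be gathered as in the following sketch: per-class precision, recall and F1 from `classification_report`, plus the gap between the training and test scores as a rough overfitting signal. The imbalanced synthetic data and the random forest model are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in dataset (assumption: roughly 80% / 20% classes).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=3
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=3
)

model = RandomForestClassifier(n_estimators=50, random_state=3)
model.fit(X_train, y_train)

# Per-class measures on the test set; differences between classes are worth discussing.
report = classification_report(
    y_test, model.predict(X_test), output_dict=True, zero_division=0
)
for label in ("0", "1"):
    scores = report[label]
    print(label, scores["precision"], scores["recall"], scores["f1-score"])

# A training score far above the test score suggests overfitting.
train_f1 = f1_score(y_train, model.predict(X_train), average="macro")
test_f1 = f1_score(y_test, model.predict(X_test), average="macro")
gap = train_f1 - test_f1
print("train/test macro F1 gap:", round(gap, 3))
```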

[Pass]: The group member has correctly reported on some measure of performance of the
model on the test dataset. The code that does this measurement must be explicitly shown in
the report. The ways in which the various members’ models are produced should all be
different from one another (this could be different algorithmic training techniques, different
choice of hyper-parameters, different scaling or choice of input attributes, etc).

[Flawed]: Some reasonable attempt to evaluate the effectiveness of a predictive model.

Discussion [7 points] [Group Mark]
This component is assessed based on Part B of the report. Material in Part A, or the submitted
data and code may be checked by the marker as supporting evidence for claims made in this
part of the report.

[Full marks]: The Discussion section meets the Distinction criteria and suggests at least one
reasonable improvement that can be made to each member’s predictive model. The structure
needs to be logical and well-organised.

[Distinction]: The Discussion section provides some accurate and clear information about the
different machine learning methods that were used for this task, and provides useful insight
into strengths and weaknesses of the different machine learning methods for answering the
business or research question. It also indicates features of the dataset that impact on the
outcomes. It also discusses honestly and with insight, the strengths, limitations and
uncertainties about the comparisons made between different machine learning techniques
(for example, what are strengths and limitations of the measurements which were used).

[Pass]: The Discussion section provides some accurate and clear information about the
machine learning techniques that were used for this task, and how the resulting predictive
models performed.

[Flawed]: The Discussion section describes the machine learning techniques that were used.

Conclusion [1 point] [Group Mark]
This component is assessed based on Part B (group component) of the report. Material in
Part A, or the submitted data and code, may be checked by the marker as supporting evidence
for claims made in the report.

[Full marks]: The Conclusion section meets the Distinction criteria and makes reasonable
suggestions for future work on your analysis and predictive models that can help achieve the
recommended course of action.

[Distinction]: In addition to the Pass criteria, the Conclusion section describes the extent of
support for this course of action, based on the information in the Discussion section,
identifying what risks, limitations and caveats apply.

[Pass]: The Conclusion section describes a recommended course of action in relation to your
research or business question, that is supported by the information in the Discussion section.

[Flawed]: The Conclusion section describes a recommended course of action in relation to
your research or business question.

Penalties
10% of the overall mark will be deducted if your report is unnecessarily long-winded and
does not address the marking criteria within the word limits.

Late Work
As announced in the unit outline, late work (without approved special consideration or other
arrangements) suffers a penalty of 5% of the maximum marks, for each calendar day after
the due date. No late work will be accepted more than 10 calendar days after the due date.

From: https://www.cnblogs.com/tongu1/p/17368956.html
