
University of South Australia INFS 2042 Data Structures Advanced Assignment 2 – Contact Tracing

Date: 2024-06-02 20:31:58


1. Introduction

To track and reduce the spread of a disease during an epidemic or pandemic situation it is critical that authorities and health experts can trace who has been in contact with whom, when the contact occurred and where. This is known as contact tracing. Efficiently searching potentially millions of people and where they have been will require an efficient way to store and navigate through the data.
In this assignment, you are tasked with building a basic contact tracing system. You must use your knowledge of data structures and search algorithms to efficiently store and process large quantities of contact tracing data. You are not restricted to the data structures and algorithms explored in this course. You may also make use of structures and algorithms from the Data Structures Essentials course.


2. Requirements
    Your client has provided you with a strict set of system requirements that the program must meet. How you achieve those requirements and which algorithms or data structures you use are up to you. You must implement the program in Java using OpenJDK 11 or newer. You should also aim to make the program as efficient as possible. For example, exhaustively searching lists in nested loops would not be the most efficient implementation in many cases.
    Generally, it is easier to design with optimisation in mind. When using any of the following data structures (Binary Search Tree, Self-Balancing Search Tree, Graph, Skip List, Blockchain, Hash Map, Hash Set etc.), you must implement the data structure yourself. It is expected that a selection of these structures will be required to meet the client requirements as efficiently as possible.
    You may use data structures provided by the Java libraries (such as Linked List, Queue, Stack etc.) only if they are not part of the content covered in this course, and only to support the implementation of other structures and to store data where necessary. Be wary of functions that are built into the provided data structures; if you do use them, ensure you consider their performance impact.
    You are also required to provide supporting documentation. In it, you must explain each data structure you used, what it was used for and why. This includes cases where you have used Java’s built-in data structures. Consider your implementation in the context of a real contact tracing application. The data provided for this assignment, as described below, is for 40 people, with 80 visits to 6 locations. In a real application we would likely have millions of people, with tens or hundreds of millions of visits to hundreds of thousands of locations. Your implementation should be efficient for storage and processing of large amounts of data.
    Remember, it is not enough that your system implements the requirements, it must implement them efficiently.
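    To illustrate what implementing a structure yourself might look like, here is a minimal sketch of a hash map using separate chaining. The class name `SimpleHashMap`, its method names, the initial capacity of 16 and the 0.75 load factor are all assumptions made for this illustration; none of them come from the provided base code.

```java
// Hypothetical sketch: a hash map with separate chaining that does not use java.util.HashMap.
// Keys must be non-null; a null key causes a NullPointerException in indexFor.
class SimpleHashMap<K, V> {
    // One entry in a bucket's singly linked chain.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final double MAX_LOAD = 0.75; // assumed resize threshold
    private Node<K, V>[] buckets = newTable(16);  // assumed initial capacity
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    // Maps a key to a bucket index; the mask clears the sign bit of the hash.
    private int indexFor(Object key, int capacity) {
        return (key.hashCode() & 0x7fffffff) % capacity;
    }

    // Inserts or updates a mapping; returns the previous value or null.
    V put(K key, V value) {
        int i = indexFor(key, buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]);
        size++;
        if (size > MAX_LOAD * buckets.length) {
            resize();
        }
        return null;
    }

    // Returns the value for the key, or null if the key is absent.
    V get(K key) {
        for (Node<K, V> n = buckets[indexFor(key, buckets.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    // Doubles the table and relinks every node into its new bucket.
    private void resize() {
        Node<K, V>[] old = buckets;
        buckets = newTable(old.length * 2);
        for (Node<K, V> head : old) {
            while (head != null) {
                Node<K, V> next = head.next;
                int i = indexFor(head.key, buckets.length);
                head.next = buckets[i];
                buckets[i] = head;
                head = next;
            }
        }
    }
}
```

    With a reasonable hash function, lookups by name in such a map take expected constant time, which is the kind of property the supporting documentation is expected to discuss.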


2.1 System Requirements
Below is a set of requirements for the operation of the program as provided by your client.
• The system administrator would like the ability to load existing data from the provided .csv files. The code to read the files is already provided by the client; however, they have not implemented a method to store the data.
• In addition, public health officials need the ability to add a new Person, Location or Visit to the data. The client has provided the input command parsing code to support this; however, they have not implemented the functionality.
• Public health officials need the ability to search for a Person by name. This should show them all details about the person. This includes listing all visits the Person has made.
  o Hint: This would require an efficient means of searching for the Person and all Visits in which the Person has visited any Location.
  o If a startDate and endDate are provided, this should also filter the list of Visits to only include those between these times.
• Public health officials need the ability to search for a Location by name. This should show them all details about the location. This includes listing all people that have visited the location.
  o If a startDate and endDate are provided, this should also filter the list of Visits to only include those between these times.
• The public health officials would like the ability to produce a list of potential contacts up to (n) levels away from a given person (including known contacts).
  o If n = 1, the list will contain only direct contacts of the given person.
  o If n = 2, the list will contain all direct contacts (n=1) of the given person and all contacts of those contacts (n=2).
  o If n = r, the list will contain all n=1 to n=r-1 contacts of the given person and all contacts of those contacts (n=r).
  o Hint: This would require an efficient method of identifying contacts of a given person based on their visits.
• Public health officials also need the ability to specify if the person is a new Active Case (i.e., they have become infected with the virus).
  o When an Active Case is added, they also need to see an estimation of where, when and from whom the person likely contracted the virus. Your program should output the most likely contact source including the location and time of contact. Note: The most likely contact source is the pair of people with the highest Chance of Spread (C) as defined later in this document.
  o If a new Active Case has no immediate contacts that are also an Active Case, the program should instead find the nearest or most likely Active Case. That is, the existing Active Case for which the contacts between them and the new Active Case have the highest total Chance of Spread (C).
  o Hint: This would require a method for identifying the person from whom, and the visit during which, the person most likely contracted the virus.
• The public health officials would like to output a trace of the transmission of the virus from its original source to a target person. In this process, the trace should ensure that the date each person along the path was infected is correct by verifying that the start date of their infection is the day after the contact with the highest Chance of Spread (C). In a ‘real world’ data set this would be useful for identifying different branches of the virus as it spreads and tracing the virus back to its original source.
  o Hint: This would require a method for tracing the path through each person backwards from the given person until no previous source case can be found (in the provided data).
• The public health officials would like to be able to produce a list of all active cases.
• The program must be robust and user friendly, so that it does not crash and instead prints appropriate messages.
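The n-level contact requirement above can be read as a level-limited breadth-first search over a graph whose edges join people who were in contact. The following is a minimal sketch of that idea, not the client's design. The class `ContactGraph` and its methods are hypothetical, people are identified by plain name strings, and the edges are assumed to have already been derived from overlapping visits. It uses `java.util` collections purely for brevity; under the rules of this assignment the graph, map and set would need to be your own implementations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: undirected contact graph with a level-limited BFS.
class ContactGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Records that two people were in contact (assumed to be derived from visits elsewhere).
    void addContact(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Returns everyone reachable from start in at most n contact steps, excluding start.
    List<String> contactsWithin(String start, int n) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        // Expand one level per iteration so the search stops after n levels.
        for (int level = 1; level <= n && !frontier.isEmpty(); level++) {
            Queue<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String contact : adjacency.getOrDefault(person, Set.of())) {
                    if (seen.add(contact)) { // add returns true only for a first visit
                        result.add(contact);
                        next.add(contact);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

Because each person is added to `seen` once, the search visits every person and contact edge at most once, so its cost grows with the size of the reachable neighbourhood rather than with the whole data set.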
2.2 Supporting Documentation
You must provide a document to support your program design and demonstrate that your program meets the requirements. This must include:
• A one-page summary of your program design and the reasoning behind your design decisions. Explain all data structures and algorithms you used, what they were used for, and your reasoning for selecting them (e.g., an estimate of overall performance, space and time efficiency).
• Sample outputs from your program (no page limit). These demonstrate that your program meets the requirements. Provide headings to clarify which requirement each sample output demonstrates.

3.2 Provided Code
The client has provided the basic interface commands they wish to use to handle the data. You are free to add commands for your testing purposes if you wish, however you must keep the commands listed here the same. The provided base code handles the parsing of these commands and provides some supporting types and functions. It is recommended that you retain the command functionality and build upon it, however you are free to modify the base code however you want/need to meet the requirements. See testfiles/test.txt in the provided code for a set of example commands.
The program is configured with an artificial “CURRENT_DATE” variable that relates to the provided data files. You should use it whenever referring to the current date. This is configured by an initialization command in the test files.
For simplicity, a limited data set is provided. A person is only considered infectious if they are currently an active case, and only the dates between which they are infectious are recorded. All of the data is artificial data that has been procedurally generated. A person is only considered an active case if they have an activeStartDate, and they either don’t have an activeEndDate or the activeEndDate is after the “current date”.
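The active-case rule can be written as a single predicate. The sketch below assumes dates are held as `java.time.LocalDate` values and that a missing date is represented by `null`; the class name `ActiveCaseCheck` and the method signature are hypothetical, and the provided base code may represent dates differently.

```java
import java.time.LocalDate;

// Hypothetical sketch of the active-case rule; null stands for "no date recorded".
class ActiveCaseCheck {
    static boolean isActive(LocalDate activeStartDate, LocalDate activeEndDate, LocalDate currentDate) {
        if (activeStartDate == null) {
            return false; // never recorded as a case
        }
        // Active when there is no end date, or the end date is strictly after the current date.
        return activeEndDate == null || activeEndDate.isAfter(currentDate);
    }
}
```

The third argument would be the value of the configured CURRENT_DATE rather than the system clock. The rule as stated does not compare activeStartDate against the current date, so the sketch does not either.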

3.3 Calculating the Chance of Spread
For this assignment, we have an imaginary virus that has a high chance of spreading and becomes detectable and contagious the following day. That is, if John is detected as an active case on 5/1/2021, they must have caught the virus some day before 5/1/2021.
For this virus the chance of contact between an active case and another person resulting in a spread to that person is based on the overlap in time spent by two people at a given location, the time since the active case contracted the virus and the incubation time. The chance is the percentage of one hour spent in contact (in the same location).
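The overlap in time spent by two people at a location can be computed from the start and end times of their two visits. The sketch below covers only that overlap, in minutes, and does not compute the Chance of Spread itself. It assumes visit times are held as `java.time.LocalDateTime` values and that the caller has already checked that both visits are to the same location; the class name `VisitOverlap` and the method signature are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical sketch: minutes during which two visits to the same location overlap.
class VisitOverlap {
    static long overlapMinutes(LocalDateTime startA, LocalDateTime endA,
                               LocalDateTime startB, LocalDateTime endB) {
        // The shared interval runs from the later start to the earlier end.
        LocalDateTime latestStart = startA.isAfter(startB) ? startA : startB;
        LocalDateTime earliestEnd = endA.isBefore(endB) ? endA : endB;
        if (!earliestEnd.isAfter(latestStart)) {
            return 0; // the visits do not overlap
        }
        return Duration.between(latestStart, earliestEnd).toMinutes();
    }
}
```

For example, visits from 10:00 to 11:00 and from 10:30 to 12:00 on the same day share the interval 10:30 to 11:00, giving an overlap of 30 minutes.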
Let D be the time spent by two people in the same location (in minutes). The Chance of Spread (C) is:

From: https://blog.csdn.net/telnet3000/article/details/139265592
