
CSE 158/258 Design and Implementation

Date: 2023-11-22 19:48:07



CSE 158/258, DSC 256, MGTA 461, Fall 2023: Assignment 1
Instructions
In this assignment you will build recommender systems to make predictions related to video game reviews
from Steam.
Submissions will take the form of prediction files uploaded to gradescope, where their test set performance
will be evaluated on a leaderboard. Most of your grade will be determined by ‘absolute’ cutoffs;
the leaderboard ranking will only determine enough of your assignment grade to make the
assignment FUN.
The assignment is due Monday, Nov 20, though make sure you upload solutions to the leaderboard
regularly.
You should submit two files:
writeup.txt: A brief, plain-text description of your solutions to each task; please prepare this adequately in
advance of the submission deadline; this is only intended to help us follow your code and does not need
to be detailed.
assignment1.py: A Python file containing working code for your solutions. The autograder will not execute
your code; this file is required so that we can assign partial grades in the event of incorrect solutions,
check for plagiarism, etc. Your solution should clearly document which sections correspond to
each task. We may occasionally run code to confirm that your outputs match submitted answers, so
please ensure that your code generates the submitted answers (see footnote 1).
Along with two files corresponding to your predictions:
predictions_Played.csv, predictions_Hours.csv: Files containing your predictions for each (test) instance
(you should submit both of the above files). The provided baseline code demonstrates how to
generate valid output files.
To begin, download the files for this assignment from:
https://cseweb.ucsd.edu/classes/fa23/cse258-a/files/assignment1.tar.gz
Files
train.json.gz: 175,000 instances to be used for training. This data should be used for both the ‘play prediction’
and ‘time played prediction’ tasks. It is not necessary to use all observations for training, for example if
doing so proves too computationally intensive. Each instance has the following fields:
  userID: The ID of the user. This is a hashed user identifier from Steam.
  gameID: The ID of the game. This is a hashed game identifier from Steam.
  text: Text of the user’s review of the game.
  date: Date when the review was entered.
  hours: How many hours the user played the game.
  hours_transformed: log2(hours + 1). This transformed value is the one we are trying to predict.
pairs_Played.csv: Pairs on which you are to predict whether a game was played.
pairs_Hours.csv: Pairs (userIDs and gameIDs) on which you are to predict time played.
baselines.py: A simple baseline for each task, described below.
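As a rough illustration of how the training file and the transformed target might be handled, the sketch below reads a gzip-compressed file one record per line and applies log2(hours + 1). The one-record-per-line layout, the fallback to Python dict literals, and the helper names `read_records`, `transform_hours`, and `inverse_transform` are assumptions made for this example; the provided baselines.py remains the reference for the actual file format.

```python
import ast
import gzip
import json
import math


def read_records(path):
    """Yield one dict per non-empty line of a gzip-compressed text file.

    Assumption: each line holds a single record, either as JSON or as a
    Python dict literal. Check baselines.py for the real format.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Fall back to a safe literal parser for dict-literal lines.
                yield ast.literal_eval(line)


def transform_hours(hours):
    """Return log2(hours + 1), the target used for time played prediction."""
    return math.log2(hours + 1)


def inverse_transform(value):
    """Map a transformed value back to raw hours."""
    return 2 ** value - 1
```

For instance, 3 hours of play maps to log2(4) = 2, and 0 hours maps to 0, so the transform compresses large play times while keeping the target non-negative.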
Please do not try to collect these reviews from Steam, or to reverse-engineer the hashing function I used to
anonymize the data. Doing so will not be easier than successfully completing the assignment. We will run
the code of any solution suspected of violating the competition rules, and you may be penalized
if your code does not produce your submitted solution.
Footnote 1: Don’t worry too much about dependencies if importing non-standard libraries.
Tasks
You are expected to complete the following tasks:
Play prediction: Given a (user, game) pair from ‘pairs_Played.csv’, predict whether the user would play the
game (0 or 1). Accuracy will be measured in terms of the categorization accuracy (fraction of correct
predictions). The test set has been constructed such that exactly 50% of the pairs correspond to played
games and the other 50% do not.
Time played prediction: Predict how long a person will play a game (transformed as log2(hours + 1)) for
those (user, game) pairs in ‘pairs_Hours.csv’. Accuracy will be measured in terms of the mean-squared
error (MSE).
A competition page has been set up on Kaggle to keep track of your results compared to those of other
members of the class. The leaderboard will show your results on half of the test data, but your ultimate score
will depend on your predictions across the whole dataset.
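The two evaluation metrics named above can be sketched as follows, which may be useful for scoring predictions on a held-out split before uploading. The function names `accuracy` and `mean_squared_error` are chosen for this example and are not taken from the provided files.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty list")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot score an empty list")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return squared / len(targets)


if __name__ == "__main__":
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))         # 0.75
    print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))   # 2.0
```

Because the play prediction test set is split exactly 50/50, a constant prediction of all 0s or all 1s scores an accuracy of 0.5 on it.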
Grading and Evaluation
This assignment is worth 22% of your grade. You will be graded on the following aspects. Each of the two
tasks is worth 10 marks (i.e., 10% of your grade), plus 2 marks for the written report.
• Your ability to obtain a solution which outperforms the leaderboard baselines on the unseen portion of
the test data (5 marks for each task). Obtaining full marks requires a solution which is substantially
better than baseline performance.
• Your ranking for each of the tasks compared to other students in the class (3 marks for each task).
• Your ability to obtain a solution which outperforms the baselines on the seen portion of the test data (i.e., the leaderboard). This is a consolation prize in case you overfit to the leaderboard (2 marks for each task).
Finally, your written report should describe the approaches you took to each of the tasks. To obtain good
performance, you should not need to invent new approaches (though you are more than welcome to!) but
rather you will be graded based on your decision to apply reasonable approaches to each of the given tasks (2
marks total).
Baselines
Simple baselines have been provided for each of the tasks. These are included in ‘baselines.py’ among the files
above. They are mostly intended to demonstrate how the data is processed and prepared for submission to
Gradescope. These baselines operate as follows:
Play prediction: Find the most popular games that account for 50% of interactions in the training data.
Return ‘1’ whenever such a game is seen at test time, ‘0’ otherwise.
Time played prediction: Return the global average time, or the user’s average if we have seen them before
in the training data.
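The two baselines described above could be sketched roughly as below. This is an illustration written from the description, not the contents of baselines.py: the function names, the in-memory tuple inputs, and the tie-breaking at the 50% boundary are all assumptions, so its outputs may differ slightly from the provided code.

```python
from collections import defaultdict


def popular_games(interactions, fraction=0.5):
    """Return the most popular games that together cover `fraction` of interactions.

    `interactions` is a list of (user_id, game_id) pairs.
    """
    counts = defaultdict(int)
    for _, game in interactions:
        counts[game] += 1
    total = len(interactions)
    popular, covered = set(), 0
    # Visit games from most to least played (ties broken by ID for determinism).
    for game, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= fraction * total:
            break
        popular.add(game)
        covered += count
    return popular


def predict_played(game, popular):
    """Play prediction: 1 if the game is in the popular set, else 0."""
    return 1 if game in popular else 0


def fit_time_baseline(records):
    """Compute the global average and per-user averages of the target.

    `records` is a list of (user_id, game_id, hours_transformed) triples.
    """
    if not records:
        raise ValueError("need at least one training record")
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for user, _, value in records:
        user_sum[user] += value
        user_n[user] += 1
    global_avg = sum(user_sum.values()) / len(records)
    user_avg = {u: user_sum[u] / user_n[u] for u in user_sum}
    return global_avg, user_avg


def predict_time(user, global_avg, user_avg):
    """Time played prediction: the user's average if seen, else the global average."""
    return user_avg.get(user, global_avg)
```

With interactions where game "a" is played three times, "b" twice, and "c" once, the popular set is just {"a"}, since it already covers half of the six interactions. A user absent from training falls back to the global average for time played.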
Running ‘baselines.py’ produces files containing predicted outputs (these outputs can be uploaded to Gradescope). Your submission files should have the same format.

From: https://www.cnblogs.com/whenjava/p/17850119.html
