
ARS Reinforcement Learning using Gymnasium

Posted: 2024-11-20 18:41:00

ARS - Coursework Guide – 24/25

Version History

1.0 (29/09/24): First version.
1.1: Fleshed out marking criteria for task 2 report.

Summary

Title: Reinforcement Learning using Gymnasium environments

Hand-in: Programs AND a written report will need to be submitted online via Moodle. Check the module’s Moodle page for the precise deadline.

Late policy: The coursework deadlines (task 1 and task 2) are absolute. Late submissions are subject to a 5% deduction of the overall coursework mark per day.

Informal Description

The coursework consists of two tasks as described below. Your aim is to build several reinforcement learning agents and to design, implement and run several basic research-based experiments. You will hand in software and a report that discusses your work on these tasks. Briefly, task 1 is about implementing some basic RL prototypes (with noise injection and basic modularity) for your chosen environment(s) and identifying key literature, gaps, and research questions, whereas task 2 is about designing, developing and running experiments based on the research questions identified in task 1.

Aims and Outcomes

  • If you take the labs seriously, at the end of the semester you should be:
    o comfortable with implementing and modifying reinforcement learning agents,
    o capable of adapting your RL solutions to different kinds of robotic problems with well-defined states, actions and rewards,
    o comfortable with neural network approaches for the mapping of complex high-dimensional states to actions (if you choose to use neural-network-based RL solutions),
    o comfortable with setting up experiments pertaining to noise, and studying and mitigating its impact,
    o comfortable with designing modular AI solutions,
    o capable of scanning the literature in order to understand modern RL techniques, and incorporating/extending these in your own solutions,
    o capable of identifying gaps and/or weaknesses/limitations in state-of-the-art research, and using these to define research questions for guiding your research,
    o capable of studying and evaluating algorithm performance objectively,
    o capable of designing innovative algorithms and experiments, and reporting the results of these in a clear and well-structured manner.

Rough Timetable

Laboratory notes

You will work individually.
  • We need to start working hard from the very first day to make the most of the lab sessions. In the first week you will learn the basics of Gymnasium, experiment with several environments, and even try some small heuristics on simple control problems (e.g. CartPole).
  • Rough time estimation:
    o Total hours: 20 credits ≈ 200 hours.
    o Subtract lectures (22 hours) and labs (20 hours): 200 - 42 = 158 hours.
    o Divide the remainder by 12 weeks: 158 / 12 ≈ 13 hours per week for everything else, e.g. studying, researching, reading, thinking, coding, testing, analyzing, writing.

Getting Started

Preliminary steps
  • Check the following three main Gymnasium resources:
    o Farama’s general documentation page for Gymnasium.
    o The Basic Usage page in the above documentation.
    o The Gymnasium GitHub page, which includes installation instructions.
  • Install Gymnasium. For the purpose of the coursework it is sufficient to work with the “classic control” set of environments; however, do feel free to install and use other categories of environments (e.g. MuJoCo and Atari), if you wish.
  • Go through the Basic Usage page.
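As a reminder of the interaction loop described on the Basic Usage page, the sketch below runs one episode with any policy and returns its total reward. It assumes the current Gymnasium API, in which `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`; the function name `run_episode` and the `max_steps` safety cap are illustrative choices, not part of Gymnasium or of the module's Moodle material.

```python
from typing import Any, Callable, Optional


def run_episode(env: Any, policy: Callable[[Any], int],
                seed: Optional[int] = None, max_steps: int = 500) -> float:
    """Run one episode and return the undiscounted sum of rewards.

    Assumes the Gymnasium API: reset() -> (observation, info) and
    step(action) -> (observation, reward, terminated, truncated, info).
    """
    observation, _info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):  # hypothetical cap in case an env never ends
        action = policy(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        # terminated: the task itself ended (e.g. the pole fell);
        # truncated: the episode was cut off (e.g. a time limit was reached).
        if terminated or truncated:
            break
    return total_reward
```

With Gymnasium installed (`import gymnasium as gym`), `env = gym.make("CartPole-v1")` followed by `run_episode(env, lambda obs: env.action_space.sample())` gives the score of a random agent, a handy baseline before trying heuristics or RL; call `env.close()` when finished.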

You can install Gym on your own machines, or in your local directory on UNM’s HPC, or you can use Google Colaboratory. Please note that in the past there were ways to render environments properly in Colab (e.g. have a look at this tutorial); however, this may change from time to time. For an example of a Jupyter notebook for the cartpole example, refer to the module’s Moodle page. I suggest not bothering with rendering, except for some exercises, since performance metrics are the key concern. As mentioned, if you want to use any of the MuJoCo environments you can. DeepMind recently bought MuJoCo and made it open source, which means there are no more licensing issues. You are not required to use MuJoCo, but if you really want to, you are […], and get the environments set up. To see what environments are available use:

import gymnasium as gym
print(gym.envs.registry.keys())

To better understand some Gymnasium environments, consult this Wiki or scroll to “environments” in Gymnasium’s GitHub page, and search for your environment. For example, for the cart pole environment have a look at this page.

Try to come up with some heuristic solutions for Cart Pole

Try to come up with some simple heuristics to keep the pole up based on your understanding of the environment. You can start from and modify the (failing) heuristic example provided on the Moodle page (i.e. sol-H1-cart-pole-v0).
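One classic starting point for such a heuristic is to push the cart towards the side the pole is tipping to. The sketch below assumes the CartPole observation layout [cart position, cart velocity, pole angle, pole angular velocity] and the action convention 0 = push left, 1 = push right; the function name `lean_heuristic` and the weight `k` on the angular velocity are hypothetical choices to experiment with, not the Moodle solution.

```python
from typing import Sequence


def lean_heuristic(observation: Sequence[float], k: float = 0.5) -> int:
    """Hand-crafted CartPole policy (no learning involved).

    observation = [cart position, cart velocity, pole angle, pole angular velocity]
    Returns 0 (push cart left) or 1 (push cart right).
    k is a hypothetical weight on the angular velocity; k = 0 uses the angle only.
    """
    _position, _velocity, angle, angular_velocity = observation
    # If the pole is tilted (or rotating) towards positive angles, push right
    # to move the cart back underneath it; otherwise push left.
    return 1 if angle + k * angular_velocity > 0 else 0
```

Inside an episode loop, call `lean_heuristic(observation)` to choose each action. Comparing several values of `k` (including `k = 0`) is a quick way to see how much the velocity term matters before moving on to learning-based solutions.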

Difficult? Let's see whether reinforcement learning helps. Have a look at a Q-learning solution (example: s1cart-pole-v0-sol1). Try to run the code.
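Alongside the Moodle example, the sketch below shows the core ingredients of a tabular Q-learning solution for an environment with continuous observations such as CartPole: discretizing observations into bins, epsilon-greedy action selection, and the update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. It also includes a helper for the "number of episodes needed before solving" measure, with the solving criterion (mean return over a moving window reaching a threshold) made explicit as parameters. All names, bin settings and hyperparameter values are illustrative assumptions, not the Moodle code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[int, ...]


def discretize(observation: Sequence[float],
               bounds: Sequence[Tuple[float, float]], n_bins: int) -> State:
    """Map a continuous observation to a tuple of bin indices (one per component)."""
    indices = []
    for value, (low, high) in zip(observation, bounds):
        clipped = min(max(value, low), high)  # out-of-range values go to edge bins
        ratio = (clipped - low) / (high - low)
        indices.append(min(int(ratio * n_bins), n_bins - 1))  # ratio == 1.0 -> last bin
    return tuple(indices)


class QLearningAgent:
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""

    def __init__(self, n_actions: int, alpha: float = 0.1, gamma: float = 0.99,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Unseen states start with all action values at zero.
        self.q: Dict[State, List[float]] = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state: State) -> int:
        """Explore with probability epsilon, otherwise pick a greedy action."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state: State, action: int, reward: float,
               next_state: State, terminated: bool) -> None:
        """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        target = reward
        if not terminated:  # no bootstrapping from a true terminal state
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def episodes_to_solve(returns: Sequence[float], threshold: float,
                      window: int = 100) -> Optional[int]:
    """Number of episodes run when the mean return over the last `window`
    episodes first reaches `threshold`; None if that never happens."""
    for end in range(window, len(returns) + 1):
        if sum(returns[end - window:end]) / window >= threshold:
            return end
    return None
```

In a training loop, discretize each observation with bounds chosen for your environment (for CartPole the two velocity components are unbounded, so their bounds have to be picked by hand), call `act`, step the environment, then call `update` with the `terminated` flag rather than `truncated`, since a time-limit cut-off is not a true terminal state. Record each episode's return and pass the list to `episodes_to_solve`, stating the threshold and window you used.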

  • Read the code. Try to understand it as much as possible, although note that it will only fully make sense once we have done Q-Learning in the lectures.

Task Description
  • Requirements for Task 1:

    o Title. Prototypes, literature, gaps, and research questions.
    o Prototypes:
      ▪ Environment selection. Select two environments to work on throughout the whole assignment: one environment from within the classic control category (e.g. CartPole-v1) and one environment from any category (including the classic control one). Please recall that different environments may impose significant changes to your reinforcement learning solutions since, for example, they may involve continuous action spaces or other representational differences. To simplify matters you might want to constrain yourself to environments with discrete action spaces.

      ▪ Core method required: reinforcement learning. If you want to use other methods for other integrated modules, that is fine.
      ▪ Additional requirements: (1) noise injection at the inputs and/or outputs, (2) some modularity (e.g. an RL component and a denoising component).
      ▪ Aim: for each environment, develop at least one viable proof of concept based on RL.
    o Literature:
      ▪ Steps: Explore the recent RL literature in relation to the topic of noise […] papers will be your “core/seed” papers, you should still study the literature more broadly (i.e. your report should cite other papers apart from the core papers). Select your gaps for further investigation, and justify your choices.

      ▪ Design at least 2 research questions based on your selected gaps.
      ▪ Aim: clearly outline 1-3 selected papers, overall gaps, selected gaps, and research questions. Note that it is crucial for the papers, gaps and research questions to be 100% credible, i.e.: (1) the papers must be recent and good, (2) the gaps must be genuine open problems, and (3) the research questions must sit squarely in the gaps and must point in useful directions.
      ▪ Constraint 1: Every student must have a different set of core papers and/or a different set of gaps and/or a different set of research questions (RQs). Once a student has defined their selected papers, gaps, and RQs, they must email them to me, in order for me to check and approve them. Please note that this process will operate on a “first come, first served” basis. Please also note that if two students share the same papers, they can still be different in terms of the chosen gaps or RQs; however, it is preferable if all elements are distinct.
      ▪ Constraint 2: The selected research questions must include, or focus on,
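The two prototype requirements, noise injection and modularity, can be kept apart in code. The sketch below is one possible arrangement under stated assumptions: input noise is modelled as zero-mean Gaussian noise added to each observation component, output noise as replacing the agent's action by a uniformly random one with probability p, and the denoising component as a simple exponential moving average filter. These noise models and the filter are illustrative choices, not ones prescribed by the coursework, and all names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def add_gaussian_noise(observation: Sequence[float], sigma: float,
                       rng: random.Random) -> List[float]:
    """Input noise: add zero-mean Gaussian noise (std. dev. sigma) to every component."""
    return [value + rng.gauss(0.0, sigma) for value in observation]


def perturb_action(action: int, n_actions: int, p: float,
                   rng: random.Random) -> int:
    """Output noise: with probability p, replace the action by a uniformly random one."""
    if rng.random() < p:
        return rng.randrange(n_actions)
    return action


class EMADenoiser:
    """Denoising module: exponential moving average of successive observations.

    alpha close to 1 trusts the newest reading; a smaller alpha smooths more
    but also reacts more slowly to genuine changes in the state.
    """

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.estimate: Optional[List[float]] = None

    def reset(self) -> None:
        """Forget the previous episode's history."""
        self.estimate = None

    def __call__(self, observation: Sequence[float]) -> List[float]:
        if self.estimate is None:
            # First reading of an episode: nothing to average with yet.
            self.estimate = list(observation)
        else:
            self.estimate = [self.alpha * new + (1.0 - self.alpha) * old
                             for new, old in zip(observation, self.estimate)]
        return list(self.estimate)
```

Each piece can live in its own module: for example, a subclass of `gymnasium.ObservationWrapper` can apply `add_gaussian_noise` in its `observation` method, a subclass of `gymnasium.ActionWrapper` can apply `perturb_action` in its `action` method, and the denoiser sits between the environment and the RL agent, which lets you switch noise sources or denoisers on and off per experimental condition. Call `reset()` on the denoiser at the start of every episode.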

  • Requirements for Task 2:
    o Title. Research questions and experiments.
    o Environment selection. You must use the same two environments you selected for task 1.

    o Core method required: reinforcement learning. As before, if you want to use other methods for other integrated modules, that is fine.

    o Goals. Keywords: novel experiments and insights. The aim of this task is for you to design, develop, run, and analyze experiments that address the research questions you listed in task 1. The main tasks would be: (1) design experiments […] assess […] answered the research questions, (6) either proceed back to step 1 with adjustments to the experiments/solutions, or proceed with additional experiments (depending on time and completion status). Document your findings.
  • Requirements for all tasks (i.e. tasks 1 and 2):
    o Performance. Define one or more valid performance measures, apart from the default/compulsory one, i.e. the average number of episodes needed before solving a problem (see below for more information).

    o Evaluation. Run your experiments and report your results for both of your chosen environments consistently.

    o Four I’s. Try to maximize your work along the following dimensions: (1) informedness (i.e. it is based on a solid understanding of the literature), (2) innovativeness (i.e. novel), (3) inventiveness (i.e. not technically trivial), (4) impactfulness (e.g. generates new knowledge).
    o Core themes. The core themes for both tasks are: (1) reinforcement learning, (2) noise, (3) modularity. Please note that the research questions can be exclusively about noise, or modularity, or both; however, the models must always include elements of noise and modularity.

    o Demo. Show and explain the performance of your solutions, and the results of your experiments.

Performance Evaluation
  • As you will be injecting noise into your sensor data and/or actions, your results are not directly comparable to solutions on external leaderboards (e.g. https://github.com/openai/gym/wiki/Leaderboard). Your focus will be on internal comparisons (i.e. your own experimental conditions) and innovation.
  • One key performance measure that you should recall is the number of episodes required before solving the problem. In other words, here you are interested in the speed of learning. Care must be taken in being explicit and consistent regarding what constitutes having solved the problem.

Assessment – Overall

Component | Marks (100) | Description | Main criteria
Task 1 - demo | 5 | Demo of work so far. | […]
Task 1 - report | […] | Report (1-2 pages) summarizing task 1. | Are the core papers (1-3) well explained? Are the overall gaps well identified and explained? Are the selected gaps justified properly? Are the research questions grounded in the gaps, and are they clear, concrete, and heading in the right direction?
Task 2 - demo | 5 | Demo of work so far. | Evidence of understanding of the base code. Good explanation of gaps, questions, experimental design, results, analyses, and conclusions. Solid argumentation vis-à-vis the 4 I’s. Strong justifications and arguments. Clear communication.
Task 2 - paper | 50 | Mini-conference paper (4 pages) summarizing all of the work done on both tasks. | Are the structure, grammar and argumentation of the paper/report good? Are the introduction, background, methods, results and analyses clear, comprehensive and insightful? Does the paper show critical and creative thinking?
Task 2 - software | 20 | Multiple files organized with a clear structure. | Is the code complete? Is the code well-designed, clean, elegant, and well commented? Is the code complex/challenging enough?

Assessment Criteria for the Report (task 1) and Paper (task 2)

1st an excellent, well-written report/paper demonstrating extensive understanding and good insight.
2:1 a comprehensive, well-written report/paper demonstrating thorough understanding and some insight.
2:2 a competent report/paper demonstrating good understanding of the implementation.

3rd an adequate report/paper covering all specified topics at a basic level of understanding.

F an inadequate report/paper failing to cover the specified topics.

Report guide (task 1)
  • The report for task 1 has no fixed format, as long as it is well structured and well organized. The only constraint is that it should be 1-2 pages long. No appendices are allowed, and, to be fair to all, no material on page 3 onwards (if you exceed 2 pages) will be included in the assessment. The font size of the main text should not be smaller than 11.

This report will exclusively focus on: (1) a very brief summary of your prototypes, (2) brief summaries of your selected core papers, and why they were chosen, (3) lengthier explanations of the weaknesses/gaps of the papers, (4) an explanation and justification of your selected gaps, and (5) an explanation and justification of your research questions, and how they are grounded in the gaps.

Paper Guide (task 2)

You should design your final report as a conference paper. The paper should contain:

[8 marks] Introduction (about 1 page). Brief explanation of the motivation and main concepts, a problem statement, an extremely brief overview of the key papers and their gaps, the research questions, and a brief summary of your main contributions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

[8 marks] Background (about 0.5 pages). Brief overview of the field and the key papers closely related to your work (this will include the core 1-3 papers and other relevant papers). The core selected papers, their gaps, and why they were selected must be clearly explained. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

[8 marks] Methods (about 1 page). A detailed and concise description of how you implemented task 2 (e.g. algorithms and experimental design). Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation.

[10 marks] Results (about 1 page). An overview of your key results, encompassing performance measures and other results leading to insights about the problem and/or your solutions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness.

[10 marks] Discussion (about 0.5 pages). Your interpretation of the results, your conclusions, and proposed future work. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

[6 marks] References & Appendices (not included in the word count). Key marking criteria: (1) Consistency of references, (2) Comprehensiveness of references, (3) Structure and clarity of appendices, (4) Insightfulness of appendices.

Note: Writing a concise report/paper is a core part of the assignment. The total number of pages for the paper (i.e. main sections, excluding references and appendices) cannot exceed 4 pages (with a minimum page margin of 2.5 cm on each side), using single line spacing, a two-column format, and a minimum font size of 11.

