
COMP9414 Assignment Writing, Python Programming Help

Date: 2024-07-06



COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement Q-learning and SARSA methods for a taxi
navigation problem. To run your experiments and test your code, you should
make use of the Gym library, an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer by running the following command in your command
prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:

import gym

env = gym.make("Taxi-v3", render_mode="ansi").env
state = env.reset()  # Gym >= 0.26 returns an (observation, info) tuple here
rendered_env = env.render()  # in "ansi" mode this is a string
print(rendered_env)
The environment can be rendered in three modes, known as
“human”, “rgb_array”, and “ansi”. The “human” mode visualizes the
environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
“rgb_array” mode provides the environment’s state as an RGB image, and
the output is a numpy array representing the RGB image of the environment.
The “ansi” mode provides a text-based representation of the environment’s
state, and the output is a string that represents the current state of the
environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
Figure 2: “ansi” mode presentation for the taxi navigation problem in Gym
library. Gold represents the taxi location, blue is the pickup location, and
purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action               Number of the action
Move South           0
Move North           1
Move East            2
Move West            3
Pickup Passenger     4
Drop off Passenger   5
For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective of this
assignment is for the agent (taxi) to learn how to navigate the grid world
and deliver the passenger in the minimum possible number of steps. To
accomplish the learning task, you should empirically determine
hyperparameters, e.g., the learning rate α, exploration parameters (such as
ε or T), and discount factor γ for your algorithm. Your agent should be
penalized -1 per step it takes, receive a +20 reward for delivering the
passenger, and incur a -10 penalty for executing “pickup” and “drop-off”
actions illegally. You should try different exploration parameters to find
the best balance between exploration and exploitation.
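The difference between the two algorithms lies in the update target. The following is a minimal sketch, not a reference solution: the function names and the list-of-lists Q-table are assumptions made for illustration, and the 500-state, 6-action table size is that of Taxi-v3. Q-learning bootstraps from the best action in the next state, whereas SARSA bootstraps from the action the agent actually selects next.

```python
import random

N_STATES, N_ACTIONS = 500, 6  # Taxi-v3: 500 discrete states, 6 actions


def make_q_table(n_states=N_STATES, n_actions=N_ACTIONS):
    """Q-table as a list of per-state rows, all values initialised to 0."""
    return [[0.0] * n_actions for _ in range(n_states)]


def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_table[state]))
    row = q_table[state]
    return row.index(max(row))


def q_learning_update(q_table, s, a, r, s_next, done, alpha, gamma):
    """Off-policy target: bootstrap from the best action in s_next."""
    target = r if done else r + gamma * max(q_table[s_next])
    q_table[s][a] += alpha * (target - q_table[s][a])


def sarsa_update(q_table, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy target: bootstrap from the action actually chosen in s_next."""
    target = r if done else r + gamma * q_table[s_next][a_next]
    q_table[s][a] += alpha * (target - q_table[s][a])
```

In a SARSA loop the next action is chosen with `epsilon_greedy` before the update and then reused as the action executed in the following step; in Q-learning the action can be chosen afresh at every step.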
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
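To produce the two curves, the training loop needs to record the accumulated reward and the step count of every episode. The sketch below shows one way to do this for Q-learning. It assumes an environment following the Gym >= 0.26 interface, where `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. The function name `train_q_learning` and all default hyperparameter values are placeholders, not recommended settings.

```python
import random


def train_q_learning(env, n_states, n_actions, episodes=1000, max_steps=200,
                     alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Train with Q-learning; return (q_table, episode_rewards, episode_steps)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    episode_rewards, episode_steps = [], []
    for _ in range(episodes):
        state, _info = env.reset()
        total, steps = 0.0, 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = q[state].index(max(q[state]))
            next_state, reward, terminated, truncated, _info = env.step(action)
            # Q-learning update; no bootstrapping from a terminal state.
            target = reward if terminated else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, total, steps = next_state, total + reward, steps + 1
            if terminated or truncated:
                break
        # One data point per episode for each of the two plots.
        episode_rewards.append(total)
        episode_steps.append(steps)
    return q, episode_rewards, episode_steps
```

The two returned lists can be passed directly to any plotting library, with the episode index on the horizontal axis.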
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.
After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment for
both the Q-learning and SARSA algorithms using the greedy action selection
method. Therefore, your Q-table will not be updated during testing for the
new steps.
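Since the file format for the Q-table is left open, one simple option is JSON from the standard library. The sketch below assumes the Q-table is stored as a list of per-state rows; the helper names `save_q_table`, `load_q_table`, and `greedy_action`, and the demo file name, are illustrative only.

```python
import json
import os
import tempfile


def save_q_table(q_table, path):
    """Write the Q-table (list of per-state rows) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(q_table, f)


def load_q_table(path):
    """Read a Q-table previously written by save_q_table."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def greedy_action(q_table, state):
    """Test-time policy: always take the highest-valued action; no updates."""
    row = q_table[state]
    return row.index(max(row))


# Round-trip demonstration with a tiny 2-state, 3-action table.
demo_table = [[0.0, 1.5, -2.0], [4.0, 0.25, 0.0]]
demo_path = os.path.join(tempfile.gettempdir(), "q_table_demo.json")
save_q_table(demo_table, demo_path)
restored = load_q_table(demo_path)
```

If the Q-table is kept as a NumPy array instead, it would need converting with `.tolist()` before `json.dump`, or could be stored with NumPy's own save and load functions.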
Your code should be able to visualize the trained agent for both the Q-
learning and SARSA algorithms. This means you should render the “Taxi-
v3” environment (you can use the “ansi” mode) and run your trained agent
from a random position. You should present the steps your agent is taking
and how the reward changes from one state to another. An example of the
visualized agent is shown in Fig. 7, where only the first six steps of the taxi
are displayed.
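A trace of this kind can be built by stepping the environment with the greedy policy and collecting one text frame per step. The sketch below assumes the Gym >= 0.26 interface (`reset()` returns `(observation, info)`, `step()` returns a 5-tuple) and an environment created in “ansi” mode, so that `render()` returns a string. The function name `trace_episode` is hypothetical; the action names follow Table 1.

```python
ACTION_NAMES = ["South", "North", "East", "West", "Pickup", "Drop off"]


def trace_episode(env, q_table, max_steps=100):
    """Run one greedy episode and return a list of printable frames."""
    state, _info = env.reset()
    frames, total = [], 0
    for step in range(1, max_steps + 1):
        row = q_table[state]
        action = row.index(max(row))  # greedy action, Q-table is not updated
        state, reward, terminated, truncated, _info = env.step(action)
        total += reward
        frames.append(
            f"Step {step}: action={ACTION_NAMES[action]} "
            f"reward={reward} total={total}\n{env.render()}"
        )
        if terminated or truncated:
            break
    return frames


# Possible usage with the Taxi environment (requires Gym to be installed):
# import gym
# env = gym.make("Taxi-v3", render_mode="ansi")
# for frame in trace_episode(env, q_table):
#     print(frame)
```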
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by tutors
along with you in a discussion carried out in the tutorial session in week 10.
The assignment has a total of 25 marks. The discussion is mandatory and,
therefore, we will not mark any assignment not discussed with tutors.
Figure 7: The first six steps of a trained agent (taxi) based on Q-learning
algorithm.
Before your discussion session, you should prepare the necessary code for
this purpose by loading your Q-table and the “Taxi-v3” environment. You
should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps for each episode)
for the test episodes (using the greedy action selection method).
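The two test metrics can be computed with a short evaluation loop. This is a sketch under the same interface assumption as Gym >= 0.26 (`reset()` returns `(observation, info)`, `step()` returns `(observation, reward, terminated, truncated, info)`); the function name `evaluate_greedy` is hypothetical, and the Q-table is assumed to be a list of per-state rows.

```python
def evaluate_greedy(env, q_table, episodes=100, max_steps=100):
    """Return (average steps, average accumulated reward) under the greedy policy."""
    total_steps, total_reward = 0, 0.0
    for _ in range(episodes):
        state, _info = env.reset()
        for _ in range(max_steps):  # cap each test episode at max_steps
            row = q_table[state]
            action = row.index(max(row))  # greedy, no exploration, no updates
            state, reward, terminated, truncated, _info = env.step(action)
            total_steps += 1
            total_reward += reward
            if terminated or truncated:
                break
    return total_steps / episodes, total_reward / episodes


# Possible usage with the Taxi environment (requires Gym to be installed):
# import gym
# env = gym.make("Taxi-v3", render_mode="ansi")
# avg_steps, avg_reward = evaluate_greedy(env, q_table)
```

An episode that never reaches the drop-off contributes the full `max_steps` to the step average, so a poorly trained table shows up clearly in both numbers.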
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of these subsections as shown
in Table 2. Beyond what has been mentioned in the previous section, you are
welcome to include any other outcome that highlights particular aspects
when testing and discussing your code with your tutor.
For both Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps taken over the
test episodes in the environment, with a maximum of 100 steps for each
episode. For your Q-learning algorithm, the agent should take at most 13
steps per episode on average and obtain an average accumulated reward of at
least 7. Numbers worse than that will result in a score of 0 marks for that
specific section. For your SARSA algorithm, the agent should take at most 15
steps per episode on average and obtain an average accumulated reward of at
least 5. Numbers worse than that will result in a score of 0 marks for that
specific section.
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also give you a maximum of 5 marks for each task depending
on the level of code understanding as follows: 5. Outstanding, 4. Great,
3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.
Task                                                                  Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning
  algorithm.                                                          2 marks
  Accumulated rewards and steps per episode plots for SARSA
  algorithm.                                                          2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for
  Q-learning algorithm.                                               2.5 marks
  Average accumulated rewards and average steps per episode for
  SARSA algorithm.                                                    2.5 marks
  Visualizing the trained agent for Q-learning algorithm.             2 marks
  Visualizing the trained agent for SARSA algorithm.                  2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                           1 mark
  Code readability for SARSA algorithm                                1 mark
  Code understanding and discussion for Q-learning algorithm          5 marks
  Code understanding and discussion for SARSA algorithm               5 marks
Total marks                                                           25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. This will consist of a single .zip file containing three
files: the .ipynb Jupyter notebook and your two saved Q-tables, one for
Q-learning and one for SARSA (you can choose the format for the Q-tables).
Remember that your Q-table files will be loaded during your discussion
session to run the test episodes. Therefore, you should also provide a script
in your Python code at submission to perform these tests. Additionally, your
code should include short text descriptions to help markers better understand
your code. Please be mindful that providing clean and easy-to-read code is a
part of your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline;
later submissions overwrite earlier ones. After submitting your file, a good
practice is to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission
penalty of 5% of your mark per day, capped at five days from the
assessment deadline. After that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code there,
to avoid making it public and enabling possible plagiarism. If your question
requires sharing code, use the course email cs9414@cse.unsw.edu.au instead.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply. Therefore, last-moment questions might
not be answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. You can find your tutor's email address in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.