
Date: 2024-03-11



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you will use only the Google Books bigram (1-gram) datasets. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram    year    match_count    volume_count
An example for 1-grams would be:
circumvallate    1978    335    91
circumvallate    1979    261    95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, in 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is (335 + 261) / 2 = 298, instead of (335 + 261) / 100 = 5.96.
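The averaging rule in the note can be checked on the two example records with a minimal Python sketch. It only illustrates the arithmetic and is not the Pig script the question asks for; the function name `average_per_year` and the in-memory record list are assumptions made for illustration.

```python
from collections import defaultdict

def average_per_year(records):
    """Average match_count per word over the years in which the word appears.

    `records` is an iterable of (word, year, match_count, volume_count) tuples,
    assumed to hold at most one record per (word, year).
    """
    totals = defaultdict(int)   # word -> sum of match_count
    years = defaultdict(set)    # word -> distinct years in which it appears
    for word, year, match_count, _volume_count in records:
        totals[word] += match_count
        years[word].add(year)
    # Divide by the number of years with data, not by a fixed span such as 100.
    return {word: totals[word] / len(years[word]) for word in totals}

records = [
    ("circumvallate", 1978, 335, 91),
    ("circumvallate", 1979, 261, 95),
]
print(average_per_year(records))  # {'circumvallate': 298.0}
```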
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year, along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the lecture notes of Pig. You can use the code there and just make some minor changes to perform this task.
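The selection step in part (d) can be sketched in Python as follows. This is an illustration of "top N by average, descending" under stated assumptions, not the required Pig script: the helper name `top_n_by_average` and the toy averages for "alpha" and "gamma" are made up for the example.

```python
import heapq

def top_n_by_average(averages, n=20):
    """Return the n (word, average) pairs with the largest averages, descending.

    Ties are resolved arbitrarily, which the question explicitly allows.
    """
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])

# Toy input: only "beta" mirrors the 298 value from the example in the text.
averages = {"alpha": 12.5, "beta": 298.0, "gamma": 40.0}
for word, avg in top_n_by_average(averages, n=2):
    print(f"{word}\t{avg}")
```

`heapq.nlargest` avoids sorting the full table when only a small number of rows is needed, which is the same idea as ordering and then limiting the output.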
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 on the master node of your Hadoop cluster:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in the HDFS. Rerun the Pig script in this cluster and compare the performance between Pig and Hive in terms of overall run-time and explain your observation.
Hints:
● Hive stores its tables on HDFS, and those locations need to be bootstrapped (-p creates missing parent directories and does not fail if the directory already exists):
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
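One way to prepare such a small subset is to copy the first few lines of a downloaded file before uploading it to HDFS. The Python sketch below is a minimal illustration; the function name `write_subset`, the file names, and the line limit are all assumptions, and taking leading lines is only one of several reasonable sampling choices.

```python
import itertools
import os
import tempfile

def write_subset(src_path, dst_path, max_lines=1000):
    """Copy at most `max_lines` leading lines from src_path to dst_path.

    Returns the number of lines actually copied.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in itertools.islice(src, max_lines):
            dst.write(line)
            written += 1
    return written

# Demonstration on a throwaway file holding the two example records.
with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "full.tsv")      # hypothetical file name
    small = os.path.join(tmp, "small.tsv")    # hypothetical file name
    with open(full, "w", encoding="utf-8") as f:
        f.write("circumvallate\t1978\t335\t91\n")
        f.write("circumvallate\t1979\t261\t95\n")
    copied = write_subset(full, small, max_lines=1)
    with open(small, encoding="utf-8") as f:
        subset_text = f.read()

print(copied)
```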
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection has drawn a lot of attention in the machine learning field. It aims to group users with similar interests, behaviors, actions, or general patterns. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar scores to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
where |S| means the cardinality of set S.
(Note: if |M(A) ∪ M(B)| = 0, we set the similarity to be 0.)
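The definition in (**), including the zero-union rule from the note, can be written directly with Python sets. This is a minimal sketch for checking small cases by hand, not the Pig program the question asks for; the function name `similarity` and the sample movie IDs are assumptions.

```python
def similarity(movies_a, movies_b):
    """Similarity of two users as defined in (**), given their sets of movie IDs."""
    union = movies_a | movies_b
    if not union:
        # Neither user has watched anything: defined as 0 by the note.
        return 0.0
    return len(movies_a & movies_b) / len(union)

print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
print(similarity(set(), set()))          # 0.0
```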
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in the dataset [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
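The counting in part (a) can be sketched in Python by grouping users per movie and then counting every user pair within each group. This is an illustration of the idea under stated assumptions, not the required Pig script: the function name `co_watch_counts` and the toy `ratings` list are made up, and pairs with no movie in common simply do not appear in the result.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_watch_counts(ratings):
    """Count, for each user pair, the movies both users have watched.

    `ratings` is an iterable of (user_id, movie_id) pairs.
    Each pair is keyed as (smaller_id, larger_id) so it is counted once.
    """
    users_by_movie = defaultdict(set)
    for user_id, movie_id in ratings:
        users_by_movie[movie_id].add(user_id)

    counts = Counter()
    for users in users_by_movie.values():
        for a, b in combinations(sorted(users), 2):
            counts[(a, b)] += 1
    return counts

# Toy data in the <userID>, <movieID> format.
ratings = [(1, 10), (1, 20), (2, 10), (2, 20), (3, 20)]
for (a, b), n in co_watch_counts(ratings).most_common(10):
    print(f"{a}, {b}, {n}")
```

Grouping by movie first means only pairs that share at least one movie are ever generated, which is usually far fewer than all possible user pairs.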
(b) [20 marks] By modifying/extending part of your code in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as
defined in (**), you can use the inclusion-exclusion principle, i.e.
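For two sets, the inclusion-exclusion principle states that |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|, so the denominator of (**) can be obtained from per-user movie counts and the per-pair common count. The Python sketch below shows one possible way to use this for Top-K selection; it is not the required Pig script, and the function name `top_k_similar` and the toy ratings are assumptions. Users who share no movie with anyone have similarity 0 with every other user and are omitted from the result here.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations

def top_k_similar(ratings, k=3):
    """Map each user to up to k (similarity, other_user) tuples, best first."""
    movies_by_user = defaultdict(set)
    users_by_movie = defaultdict(set)
    for user_id, movie_id in ratings:
        movies_by_user[user_id].add(movie_id)
        users_by_movie[movie_id].add(user_id)

    # |M(A) ∩ M(B)| for every pair sharing at least one movie.
    common = Counter()
    for users in users_by_movie.values():
        for a, b in combinations(sorted(users), 2):
            common[(a, b)] += 1

    candidates = defaultdict(list)
    for (a, b), both in common.items():
        # Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B| (always >= 1 here).
        union = len(movies_by_user[a]) + len(movies_by_user[b]) - both
        sim = both / union
        candidates[a].append((sim, b))
        candidates[b].append((sim, a))

    # Ties are broken arbitrarily (here: by the larger user ID).
    return {user: heapq.nlargest(k, scored) for user, scored in candidates.items()}

toy_ratings = [(1, 10), (1, 20), (2, 10), (2, 20), (3, 20)]
print(top_k_similar(toy_ratings, k=3))
```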