
IEMS 5730 assignment help (C++, Java programming)

Date: 2024-03-12



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the
submitted homework.
I declare that the assignment submitted on Elearning system is original
except for source material explicitly acknowledged, and that the same or
related material has not been previously submitted for another course. I
also acknowledge that I am aware of University policy and regulations on
honesty in academic work, and of the disciplinary guidelines and
procedures applicable to breaches of such policy and regulations, as
contained in the website
http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must
be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has
discussed or worked with. If the answer includes content from any other source, the
student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
the homework.
If there is information you need to solve a problem, but the information is not stated in the
problem, try to find the data somewhere. If you cannot find it, state what data you need,
make a reasonable estimate of its value, and justify any assumptions you make. You will be
graded not only on whether your answer is correct, but also on whether you have done an
intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.
Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of
Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
books from books.google.com along with some statistics.
In this question, you will use only the Google Books 1-grams (the field named "bigram" below
therefore holds a single word). Please go to References [1] and [2] to download the two
datasets. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall,
in 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop
cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per
year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared.
Assume the data set contains all the 1-grams in the last 100 years, and the above
records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
per year, along with their corresponding average values, sorted in descending order. If
multiple bigrams have the same average value, output any of them (that is,
break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the lecture notes
of Pig. You can use the code there and just make some minor changes to perform
this task.
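As an illustration of the computation described in (c) and (d), the following is a minimal Python sketch, not the required Pig script. It assumes TAB-separated input lines in the format given above; the function names `average_per_year` and `top_k` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def average_per_year(lines):
    """Return {word: average match_count over the years in which the word appears}."""
    totals = defaultdict(int)  # word -> sum of match_count
    years = defaultdict(set)   # word -> distinct years in which the word appears
    for line in lines:
        word, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        totals[word] += int(match_count)
        years[word].add(int(year))
    # The denominator is the number of years with a record, not a fixed span.
    return {word: totals[word] / len(years[word]) for word in totals}


def top_k(averages, k=20):
    """Return the k (word, average) pairs with the largest averages, descending."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:k]


# The two example records from the problem statement.
sample = [
    "circumvallate\t1978\t335\t91",
    "circumvallate\t1979\t261\t95",
]
averages = average_per_year(sample)
print(top_k(averages))  # [('circumvallate', 298.0)]
```

Ties are broken by whatever order the sort happens to produce, which the problem statement allows.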
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance
between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive
2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with
the same datasets stored in the HDFS. Rerun the Pig script in this cluster and
compare the performance between Pig and Hive in terms of overall run-time and
explain your observation.
Hints:
● Hive will store its tables on HDFS and those locations need to be bootstrapped:
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small
subset of the data instead of the whole data set. Once your Hive commands/scripts
work as desired, you can then run them on the complete data set.
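For the run-time comparison asked for in (b), one possible approach is to measure wall-clock time around each job from a small driver script. The sketch below is only an assumption about how this could be done. The helper name `time_command` and the script names in the comment are hypothetical, and a placeholder command is timed so that the example runs without a cluster.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv)
    return time.perf_counter() - start, completed.returncode


# Placeholder command so the sketch runs anywhere. On the cluster, replace it
# with e.g. ["pig", "-f", "q1.pig"] or ["hive", "-f", "q2.hql"]
# (hypothetical script names).
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"exit={code} elapsed={elapsed:.2f}s")
```

Wall-clock time measured this way includes client start-up and job scheduling overhead, so repeating each run a few times on the same cluster and the same data gives a fairer comparison.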
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors,
actions, or general patterns, has drawn a lot of attention in the machine learning field. In this
homework, you will implement a similar-users-detection algorithm for an online movie rating
system. Basically, users who give similar scores to the same movies may have common
tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this
homework, the similarity between a given pair of users (e.g. A and B) is measured as the
total number of movies both A and B have watched divided by the total number of
movies watched by either A or B. The following is the formal definition of similarity: Let
M(A) be the set of all the movies user A has watched. Then the similarity between user A
and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
where $|S|$ means the cardinality of set $S$.
(Note: if $|M(A) \cup M(B)| = 0$, we set the similarity to be 0.)
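The definition (**) can be sketched in a few lines of Python, using built-in sets to stand in for M(A) and M(B). The function name `similarity` and the sample movie IDs are made up for this illustration.

```python
def similarity(movies_a, movies_b):
    """Similarity (**): |M(A) ∩ M(B)| / |M(A) ∪ M(B)|, or 0 when the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)


# Two movies in common out of four distinct movies overall.
print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
print(similarity(set(), set()))          # 0.0
```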
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
by its unique userID and each movie is represented by its unique movieID. The format of the
data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the
cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google
Cloud/AWS [5, 6, 7].
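To make the task concrete, here is a small Python sketch of one way to obtain the common-movie counts and the top K similar users. It is not the required Pig program. It assumes input lines of the form `<userID>, <movieID>` and groups users by movie first, so only pairs sharing at least one movie are enumerated (all other pairs have similarity 0 and are omitted). All function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def parse(lines):
    """Build {userID: set of movieIDs} from '<userID>, <movieID>' lines."""
    watched = defaultdict(set)
    for line in lines:
        user, movie = (field.strip() for field in line.split(","))
        watched[user].add(movie)
    return watched


def common_counts(watched):
    """Count the movies watched by both users, for every pair sharing a movie."""
    viewers = defaultdict(set)  # movieID -> users who watched it
    for user, movies in watched.items():
        for movie in movies:
            viewers[movie].add(user)
    counts = defaultdict(int)
    for users in viewers.values():
        for a, b in combinations(sorted(users), 2):
            counts[(a, b)] += 1
    return counts


def top_k_similar(watched, k):
    """Return {user: up to k most similar users}, ranked by similarity (**)."""
    scored = defaultdict(list)
    for (a, b), common in common_counts(watched).items():
        # |A ∪ B| = |A| + |B| - |A ∩ B|, and it is at least 1 here.
        score = common / (len(watched[a]) + len(watched[b]) - common)
        scored[a].append((score, b))
        scored[b].append((score, a))
    return {
        user: [other for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
        for user, pairs in scored.items()
    }


# Made-up sample data, not taken from the MovieLens datasets.
sample = ["1,10", "1,20", "1,30", "2,20", "2,30", "3,30", "3,40"]
watched = parse(sample)
counts = common_counts(watched)
top = top_k_similar(watched, 1)
print(dict(counts))
print(top)
```

In this sample, users 1 and 2 share two movies, so each is the other's most similar user. Grouping by movie before pairing mirrors a self-join on movieID, which is one common way to express the same idea in a dataflow language.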
(a) [10 marks] For each pair of users in each of the datasets [3] and [4], output the number of
movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the
list of the 10 pairs of users having the largest number of movies watched by
both users in the pair within the corresponding dataset. The format of your
answer should be as follows: