IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-grams. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB-separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978(1979), the word "circumvallate" occurred 335(261) times overall, from 91(95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 over the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298
instead of
(335 + 261) / 100 = 5.96
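The averaging rule can be sanity-checked outside the cluster. Below is a plain-Python sketch of the computation on the two sample records (the required answer is still a Pig script; this only verifies the logic):

```python
from collections import defaultdict

# The two sample records in (bigram, year, match_count, volume_count) order.
records = [
    ("circumvallate", 1978, 335, 91),
    ("circumvallate", 1979, 261, 95),
]

# Sum match_count per word and count the years in which it appears;
# the denominator is the number of years the word occurs in, not 100.
totals = defaultdict(lambda: [0, 0])  # word -> [total match_count, year count]
for word, year, match_count, volume_count in records:
    totals[word][0] += match_count
    totals[word][1] += 1

averages = {word: total / years for word, (total, years) in totals.items()}
print(averages["circumvallate"])  # (335 + 261) / 2 = 298.0
```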
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word-counting example shown in the lecture notes of Pig. You can use the code there and just make some minor changes to perform this task.
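Part (d) is a descending sort plus a limit; in Pig this corresponds to ORDER ... BY ... DESC followed by LIMIT 20. The selection logic can be sanity-checked in plain Python with hypothetical per-bigram averages:

```python
import heapq

# Hypothetical averages from part (c); ties may be broken arbitrarily.
averages = {"the": 1200.5, "of": 990.0, "and": 990.0, "circumvallate": 298.0}

# Take the 20 entries with the highest average, highest first.
top20 = heapq.nlargest(20, averages.items(), key=lambda kv: kv[1])
print(top20[0])  # ('the', 1200.5)
```

With fewer than 20 distinct bigrams, `nlargest` simply returns all of them in descending order.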
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 over the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in the HDFS. Rerun the Pig script in this cluster and compare the performance between Pig and Hive in terms of overall run-time and explain your observation.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
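One way to build such a small test input is to keep only the first N lines of a local copy of the data. A minimal Python sketch (the file paths are hypothetical):

```python
def make_sample(src_path, dst_path, n=1000):
    """Copy the first n lines of src_path to dst_path for small-scale testing."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for i, line in enumerate(src):
            if i >= n:
                break
            dst.write(line)
```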
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims at grouping users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine learning field. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar ratings to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)| ..........(**)
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
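As a reference for checking your output, Equation (**) can be sketched directly in Python (a plain-Python helper for verification only, not part of the required Pig program):

```python
def similarity(movies_a, movies_b):
    """Equation (**): Jaccard similarity between two users' movie sets."""
    union = movies_a | movies_b
    if not union:  # per the note above, an empty union means similarity 0
        return 0.0
    return len(movies_a & movies_b) / len(union)

print(similarity({1, 2, 3}, {2, 3, 4}))  # |{2,3}| / |{1,2,3,4}| = 0.5
```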
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2, or you can use the IE DIC or a cluster provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in datasets [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
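The co-watched counts in part (a) are typically computed by grouping the ratings by movie and emitting every user pair within each movie's group. Below is a Python sketch of that dataflow on hypothetical toy records (the actual submission must be a Pig script):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (userID, movieID) records in the dataset's format.
ratings = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20)]

# Group users by movie, the analogue of Pig's GROUP ... BY movieID.
users_by_movie = defaultdict(set)
for user, movie in ratings:
    users_by_movie[movie].add(user)

# Emit each unordered user pair once per shared movie and count.
co_watched = defaultdict(int)  # (userA, userB) with userA < userB -> count
for users in users_by_movie.values():
    for a, b in combinations(sorted(users), 2):
        co_watched[(a, b)] += 1

print(co_watched[(1, 2)])  # users 1 and 2 share movies 10 and 20 -> 2
```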
(b) [20 marks] By modifying/ extending part of your codes in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e.,
|M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|.
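With inclusion-exclusion, the union size needed by (**) follows from each user's total watch count plus the pairwise intersection counts from part (a), so no explicit union has to be materialized. A Python sketch on hypothetical toy data (again, the required answer is a Pig script):

```python
from collections import defaultdict
import heapq

# Hypothetical per-user watch counts and pairwise intersection sizes
# (the intersections are exactly the part (a) output).
watch_count = {1: 2, 2: 2, 3: 1}
co_watched = {(1, 2): 2, (1, 3): 1, (2, 3): 1}

# Inclusion-exclusion: |M(A) u M(B)| = |M(A)| + |M(B)| - |M(A) n M(B)|.
sims = defaultdict(dict)
for (a, b), inter in co_watched.items():
    union = watch_count[a] + watch_count[b] - inter
    s = inter / union if union else 0.0
    sims[a][b] = s
    sims[b][a] = s

# Keep the K most similar users for every user (K = 3 in part (b)).
K = 3
top_k = {u: heapq.nlargest(K, nbrs.items(), key=lambda kv: kv[1])
         for u, nbrs in sims.items()}
print(top_k[1])  # user 2 first: similarity 2 / (2 + 2 - 2) = 1.0
```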