CITS5508 Machine Learning Semester 1, 2024
Assignment 2
Assessed, worth 15%. Due: 8pm, Friday 3rd May 2024
Discussion is encouraged, but all work must be done and submitted individually. This assignment has 21 tasks, of which 20 are assessed, totalling 45 marks.
You will develop Python code for classification tasks. You will use Grid Search and cross-validation to find the optimal hyperparameters of the model, and discuss and interpret the different decisions and their impact on the model’s performance and interpretability.
1 Submission
Your submission consists of two files. The first file is a report describing your analysis/results. Your analysis should provide the requested plots, tables and your reflections about the results. Each deliverable task is indicated as D and a number. Your report should be submitted as a “.PDF” file. Name your file as assig1_<student id>.pdf (where you should replace <student id> with your student ID).
The second file is your Python notebook with the code supporting your analysis/results. Your code should be submitted as assig1_<student id>.ipynb, the Jupyter notebook extension.
Submit your files to LMS before the due date and time. You can submit them multiple times. Only the latest version will be marked. Your submission will follow the rules provided in LMS.
Important:
• You must submit the first part of your assignment as an electronic file in PDF format (do not send DOCX, ZIP or any other file format). Only PDF format is accepted, and any other file format will receive a zero mark.
• You should provide comments on your code.
• You must deliver parts one and two to have your assignment assessed. That is, your submission should contain your analysis and your Jupyter notebook with all coding, both with appropriate formatting.
• By submitting your assignment, you acknowledge you have read all instructions provided in this document and LMS.
• There is a general FAQ section and a section in your LMS, Assignments - Assignment 2 - Updates, where you will find updates or clarifications about the tasks when necessary. It is your responsibility to check this page regularly.
• You will be assessed on your thinking and process, not only on your results. A perfect performance without demonstrating an understanding of what you have done won’t earn you marks.
• Your answers must be concise. A few sentences (2-5) should be enough to answer most of the open questions. You will be graded on thoughtfulness. If you are writing long answers, rethink what you are doing; it is probably the wrong path.
• You can ask in the lab or during consultation if you need clarification about the assignment questions.
• You should be aware that some algorithms can take a while to run. A good approach to improving their speed in Python is to use the vectorised forms discussed in class. In this case, it is strongly recommended that you start your assignment soon to accommodate the computational time.
• For the functions and tasks that require a random procedure (e.g. splitting the data into 80% training and 20% validation set), you should set the seed of the random generator to the value “5508” or the one(s) specified in the question.
2 Dataset
In this assignment, you are asked to train a few decision tree classifiers on the Breast cancer wisconsin (diagnostic) dataset available on Scikit-Learn and compare their performances.
A description of this dataset can be found on the Scikit-Learn web page:
https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-wisconsin-diagnostic-dataset
There are two classes in the dataset:
• malignant (212 instances, class value 0) and
• benign (357 instances, class value 1).
Follow the example code given on the web page
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer
to read the dataset and separate it into a feature matrix and a class vector. Your feature matrix should be 569 (rows) × 30 (columns) and your class vector should have 569 elements.
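For illustration, a minimal sketch of one way to load and separate the data, assuming the as_frame option of load_breast_cancer (available in recent Scikit-Learn versions):

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset as pandas objects for easier column handling
data = load_breast_cancer(as_frame=True)
X = data.data    # 569 x 30 feature matrix (DataFrame)
y = data.target  # 569-element class vector (0 = malignant, 1 = benign)
print(X.shape, y.shape)  # expected: (569, 30) (569,)
```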
In all requested implementations using Decision Trees, Random Forests or data splits (such as when using train_test_split()), you should set random_state as specified for results reproducibility. You should aim to round your results to the second decimal place.
3 Tasks
First inspections on the dataset and preprocessing
D1 2 marks
Re-order the columns in your feature matrix (or dataframe) based on the column name. Provide a scatter plot to inspect the relationship between the first 10 features in the dataset. Use different colours in the visualisation to show instances coming from each class. (Hint: use a grid plot.)
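A minimal sketch of one way to produce such a grid plot, assuming seaborn's pairplot (a pandas scatter_matrix would work equally well):

```python
import seaborn as sns

# Re-order columns alphabetically by name, as required
X = X.reindex(sorted(X.columns), axis=1)

# Pairwise scatter plots of the first 10 features, coloured by class;
# the "target" column is used only for the hue, not plotted as a feature
df = X.iloc[:, :10].copy()
df["target"] = y
sns.pairplot(df, hue="target", plot_kws={"s": 10})
```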
D2 2 marks
Provide a few comments about what you can observe from the scatter plot:
• What can be observed regarding the relationship between these features?
• Can you observe the presence of clusters or groups? How do they relate to the target variable?
• Are there any instances that could be outliers?
• Are there features that could be removed? Why or why not?
D3 1 mark
Compute and show the correlation matrix where each cell contains the correlation coefficient between the corresponding pair of features (Hint: you may use the heatmap function from the seaborn package).
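A minimal sketch using the suggested seaborn heatmap:

```python
import matplotlib.pyplot as plt
import seaborn as sns

corr = X.corr()  # pairwise Pearson correlation coefficients
plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", center=0)  # annot=True would print each value
plt.show()
```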
D4 1 mark
Do the correlation coefficients support your previous observations?
D5
In a data science project, it’s crucial not just to remove highly correlated features but to consider the context and implications of feature selection carefully. Blindly dropping features may lead to the loss of valuable information or unintended bias in the model’s performance. Here, for the assignment context, we will drop a few features to simplify the classification tasks and speed up the computational time. Write code that drops the features: mean perimeter, mean radius, worst radius, worst perimeter and radius error. These are features with a linear correlation higher than 0.97 in magnitude with some other features kept in the data.
After this process, your data matrix should be updated accordingly and contain 25 features. Task D5 must be performed; otherwise, your other deliverable tasks will be incorrect. However, there are no marks for task D5.
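A minimal sketch of the required drop, using the column names as they appear in the Scikit-Learn frame:

```python
# Features correlated > 0.97 in magnitude with retained features
to_drop = ["mean perimeter", "mean radius", "worst radius",
           "worst perimeter", "radius error"]
X = X.drop(columns=to_drop)
assert X.shape[1] == 25  # 30 original features minus the 5 dropped
```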
Fitting a Decision Tree model with default hyperparameters
D6 3 marks
Fit a decision tree classifier with default hyperparameters using 80% of the data. Remember to set the random generator’s state to the value “5508” (for both the split and the classifier). Use the trained classifier to perform predictions on the training and test sets. Provide the accuracy, precision and recall scores for both sets and the confusion matrix for the test set.
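A minimal sketch of one way to produce these numbers, assuming the metric helpers from sklearn.metrics:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix)

# 80/20 split; both the split and the classifier are seeded with 5508
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5508)
clf = DecisionTreeClassifier(random_state=5508).fit(X_train, y_train)

# Scores on both sets, rounded to two decimal places as required
for name, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = clf.predict(Xs)
    print(name,
          round(accuracy_score(ys, pred), 2),
          round(precision_score(ys, pred), 2),
          round(recall_score(ys, pred), 2))
print(confusion_matrix(y_test, clf.predict(X_test)))
```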
D7 2 marks
Comment on these results. Do you think your classifier is overfitting? If so, why is this happening? If not, why not?
D8 2 marks
Display the decision tree built from the training process (like the one shown in Figure 6.1 of the textbook for the iris dataset).
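A minimal sketch, assuming plot_tree (the textbook figure uses export_graphviz, which also works):

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(20, 10))
plot_tree(clf, feature_names=list(X.columns),
          class_names=["malignant", "benign"], filled=True)
plt.show()
```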
D9 2 marks
Study the tree diagram and comment on the following:
• How many levels resulted from the model?
• Did the diagram help you to confirm whether the classifier has an overfitting issue?
• What can you observe from the leaves?
• Is this an interpretable model?
D10 3 marks
Repeat the data split another four times, each using 80% of the data to train the model and the remaining 20% for testing. For these splits, set the seed of the random state to the values “5509”, “5510”, “5511” and “5512”. The random state of the model can be kept at “5508”.
For each of these four splits, fit the decision tree classifier and use the trained classifier to perform predictions on the test set. Provide three plots to show the accuracy, precision and recall scores for the test set for each split and comment on the consistency of the results in the five splits (including the original split with random state “5508”).
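A minimal sketch of the repeated-split loop; the three plots can then be drawn from the collected lists:

```python
seeds = [5508, 5509, 5510, 5511, 5512]
acc, prec, rec = [], [], []
for s in seeds:
    # Only the split seed varies; the model seed stays at 5508
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=s)
    model = DecisionTreeClassifier(random_state=5508).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    acc.append(accuracy_score(y_te, pred))
    prec.append(precision_score(y_te, pred))
    rec.append(recall_score(y_te, pred))
# e.g. plt.plot(seeds, acc, marker="o"), and likewise for prec and rec
```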
D11 3 marks
Investigate the impact of the training size on the performance of the model. You will do five different splits: 50%-50% (training the model on 50% of the data and testing on the remaining 50%), 60%-40%, 70%-30%, 80%-20% and 90%-10%. For each of these data splits, set back the seed of the random state to the value “5508”.
Provide three plots to show the accuracy, precision and recall scores for the test set for each data split and comment on the results. Did the performance behave as you expected?
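A minimal sketch of the varying-split-size loop, with everything seeded back to 5508:

```python
test_sizes = [0.5, 0.4, 0.3, 0.2, 0.1]  # 50/50 ... 90/10 splits
results = []
for ts in test_sizes:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=ts, random_state=5508)
    model = DecisionTreeClassifier(random_state=5508).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results.append((accuracy_score(y_te, pred),
                    precision_score(y_te, pred),
                    recall_score(y_te, pred)))
```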
Fitting a Decision Tree model with optimal hyperparameters
D12 4 marks
Create a training set using 80% of the data and a test set with the remaining 20%. Use 10-fold cross-validation and grid-search to find the optimal combination of the decision tree hyperparameters max_depth (using values [2, 3, 4, 5]), min_samples_split (using values [2, 4, 5, 10]) and min_samples_leaf (using values [2, 5]). Remember to set the seed of the random state of the data split function and model class to the value “5508”. For the cross-validation, set the value of the random state to “42”. Use accuracy for the scoring argument of the grid-search function.
With the optimal hyperparameters obtained, retrain the model and report:

• The optimal hyperparameters;
• The obtained accuracy, precision and recall on the training set;
• The obtained accuracy, precision and recall on the test set;
• The confusion matrix on the test set.
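A minimal sketch of the grid-search workflow, assuming a shuffled KFold so that the random state of 42 takes effect when the CV object is passed to GridSearchCV:

```python
from sklearn.model_selection import GridSearchCV, KFold

param_grid = {"max_depth": [2, 3, 4, 5],
              "min_samples_split": [2, 4, 5, 10],
              "min_samples_leaf": [2, 5]}
cv = KFold(n_splits=10, shuffle=True, random_state=42)
gs = GridSearchCV(DecisionTreeClassifier(random_state=5508),
                  param_grid, cv=cv, scoring="accuracy")
gs.fit(X_train, y_train)
print(gs.best_params_)
best = gs.best_estimator_  # refit on the full training set by default
```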
D13 2 marks
Comment: What was the impact of fine-tuning the hyperparameters as opposed to what you obtained in D6? Has fine-tuning done what you expected?
D14 3 marks
Repeat the training of task D12 twice: once with the scoring argument of the grid-search function set to precision and once set to recall.
For each of the scoring options (accuracy, precision, recall), provide the optimal hyperparame- ters according to the 10-fold cross-validation and grid-search, and, after retraining each model accordingly, provide the confusion matrix on the test set. Comment on the results, considering the problem.
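A minimal sketch of the three-scoring comparison, reusing param_grid and cv from the D12 sketch above:

```python
for scoring in ["accuracy", "precision", "recall"]:
    gs = GridSearchCV(DecisionTreeClassifier(random_state=5508),
                      param_grid, cv=cv, scoring=scoring)
    gs.fit(X_train, y_train)
    pred = gs.best_estimator_.predict(X_test)
    print(scoring, gs.best_params_)
    print(confusion_matrix(y_test, pred))
```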
Fitting a Decision Tree with optimal hyperparameters and a reduced feature set
D15 1 mark
Using the model with fine-tuned hyperparameters based on accuracy (the one you obtained in D12), display the feature importance for each feature obtained from the training process. You should sort the feature importances in descending order.
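A minimal sketch, assuming the tuned model `best` from the D12 sketch:

```python
import pandas as pd

# Importance of each feature in the tuned tree, sorted descending
importances = pd.Series(best.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).round(2))
```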
D16 3 marks
Using the feature importance you calculated in the previous task, trim the feature dimension of the data. That is, you should retain only those features whose importance values are above 1% (i.e., 0.01). You can either write your own Python code or use the function SelectFromModel from the sklearn.feature_selection package to work out which feature(s) can be removed.
Report what features were retained and removed in the above process. Also report the total feature importance value that is retained after your dimension reduction step.
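A minimal sketch using the importance values directly (SelectFromModel with threshold=0.01 and a prefit estimator is the library alternative):

```python
mask = importances > 0.01          # keep features above 1% importance
kept = importances[mask].index
removed = importances[~mask].index
X_train_red, X_test_red = X_train[kept], X_test[kept]
print("kept:", list(kept))
print("removed:", list(removed))
print("retained importance:", round(importances[mask].sum(), 2))
```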
D17 3 marks
Compare the model’s performance (accuracy, precision, recall) on training and test sets when using the reduced set of features and the model trained on the complete set of features. Also, report the corresponding confusion matrices on the test sets. (You will need to consider whether you should repeat the cross-validation process to find the optimal hyperparameters).
D18 1 mark
Comment on your results. What was the impact (if any) of reducing the number of features?
Fitting a Random Forest
D19 3 marks
Considering all features and the 80%-20% data split you did before, use 10-fold cross-validation and grid-search to find a Random Forest classifier’s optimal hyperparameters n_estimators (number of estimators) and max_depth. Remember to set the seed of the random state of the data split function and model class to the value “5508”. Use n_estimators: [10, 20, 50, 100, 1000] and max_depth: [2, 3, 4, 5]. For the cross-validation, set the value of the random state to “42”. Use accuracy for the scoring argument of the grid-search function.
Keep the other hyperparameters at their default values. Use the optimal values for the n_estimators and max_depth hyperparameters to retrain the model and report:
• The obtained optimal number of estimators and max_depth;
• The obtained accuracy, precision and recall on the training set;
• The obtained accuracy, precision and recall on the test set;
• The confusion matrix on the test set.
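A minimal sketch of the Random Forest grid search, reusing the split and cv object from the earlier sketches:

```python
from sklearn.ensemble import RandomForestClassifier

rf_grid = {"n_estimators": [10, 20, 50, 100, 1000],
           "max_depth": [2, 3, 4, 5]}
gs_rf = GridSearchCV(RandomForestClassifier(random_state=5508),
                     rf_grid, cv=cv, scoring="accuracy")
gs_rf.fit(X_train, y_train)
print(gs_rf.best_params_)
rf_best = gs_rf.best_estimator_
```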
D20
How do these performances compare with the ones you obtained in D12? What changed with the use of a Random Forest model? Is this result what you would expect?
D21 2 marks
Thinking about the application and the different models you created, discuss:
• Do you think these models are good enough and can be trusted for real-world use?
• Do you think a more complex model is necessary?
• Do you think using a machine learning algorithm for this task is a good idea? That is, should this decision process be automated? Justify.
• Are there any considerations regarding the dataset used?

