
TCS3393 DATA MINING assignment help: Python/Java programming

Date: 2024-03-24  Source: 合肥網 (hfw.cc)  Author: hfw.cc



FACULTY OF ENGINEERING, BUILT-ENVIRONMENT, AND INFORMATION
TECHNOLOGY (FOEBEIT)
BACHELOR OF INFORMATION TECHNOLOGY (HONS)
JANUARY-MAY 2024 INTAKE
TCS3393 DATA MINING
GROUP ASSIGNMENT [2-3 members per group]
This assignment is worth 25% of the overall marks available for this module. This assignment
aims to help the student explore and analyse a set of data and reconstruct it into meaningful
representations for decision-making.
Introduction
The online landscape is ever-evolving, with websites serving as crucial assets for businesses,
organizations, and individuals. As the internet continues to grow, the need for accurate and
efficient website classification becomes paramount. Understanding the nature of websites, their
content, and the user experience they provide is vital for various purposes, including online
security, marketing strategies, and content filtering.
Embarking on a data science project, you collaborate with a cybersecurity firm dedicated to
enhancing web security measures. The firm provides you with a rich dataset encompassing
various attributes of websites, including their URLs, user comments, and assigned categories.
Your objective is to develop a classification model capable of accurately categorizing websites
based on these variables.
The dataset includes information on the URLs of different websites, user comments associated
with those websites, and pre-existing categories assigned to them. The challenge lies in creating
a model that not only accurately classifies websites but also adapts to the dynamic nature of the
online environment, where new types of websites constantly emerge.
Your goal is to implement advanced data analysis techniques to train a model that enhances the
efficiency of web classification.
Techniques
The data exploration, manipulation, transformation, and visualization techniques needed to
explore the dataset are covered in the course. As an additional feature, you must explore
further concepts that can improve the retrieval results. The dataset provided for this
assignment relates to website classification.
Dataset
This dataset contains information on 1,407 website URLs. It includes 3 variables that describe
various categories of websites. The dataset will be analyzed using subsets of these variables for
descriptive and quantitative analyses, depending on the specific models used.
Assignment Task: Websites Classification
Objective:
Develop a classification model to categorize websites using advanced data science techniques. The
model should robustly classify each website based on the comments stated in the dataset.
Tasks:
1. Data Exploration:
• Conduct an initial exploration of the dataset to understand its structure, size, and
variables.
• Examine the distribution of website categories to identify any imbalances in the
dataset.
• Explore the distribution of URL and user comment lengths to gain insights into the
data.
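The initial exploration in Task 1 could be sketched as follows, using only the Python standard library. The sample rows and the `website_url` column name are assumptions made for illustration (the brief names only 'Category' and 'cleaned_website_text'), and `explore` is a hypothetical helper rather than a required function.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset has its own rows and column names.
SAMPLE_ROWS = [
    {"website_url": "https://example-news.com",
     "cleaned_website_text": "daily world news politics", "Category": "News"},
    {"website_url": "https://example-shop.com/sale",
     "cleaned_website_text": "buy shoes online discount", "Category": "E-Commerce"},
    {"website_url": "https://example-times.org",
     "cleaned_website_text": "breaking news headlines", "Category": "News"},
    {"website_url": "https://example-trip.net",
     "cleaned_website_text": "cheap flights hotels", "Category": "Travel"},
]


def explore(rows, label_column="Category"):
    """Summarise the structure, size, and class balance of a list of row dicts."""
    if not rows:
        return {"n_rows": 0, "columns": [], "category_counts": [], "imbalance_ratio": None}
    counts = Counter(row[label_column] for row in rows)
    return {
        "n_rows": len(rows),
        "columns": list(rows[0].keys()),
        "category_counts": counts.most_common(),
        # Largest class size divided by smallest; 1.0 means perfectly balanced.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


summary = explore(SAMPLE_ROWS)
print(summary["n_rows"], "rows; columns:", summary["columns"])
for category, count in summary["category_counts"]:
    print(f"{category}: {count} ({count / summary['n_rows']:.0%})")
```

To run this against the real file, the rows could be loaded with `csv.DictReader` from the standard library; the file name would depend on how the dataset is distributed.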
2. Descriptive Analysis:
A. Basic Exploration:
• Describe the structure of the dataset. How many observations and variables
does it contain?
• What are the data types of the variables in the dataset?
B. Statistical Summary:
• Provide a statistical summary of the 'Category' variable. What are the most
common website categories?
• Calculate basic descriptive statistics (mean, median, standard deviation) for
relevant numeric variables.
C. URL Analysis:
• Analyze the distribution of website URLs. Are there any patterns or
commonalities?
• Are there any outlier URLs that need special attention?
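One possible way to approach the statistical summary and the outlier question in Task 2 is sketched below. Treating string length as the "relevant numeric variable" and using the 1.5 x IQR rule (Tukey fences) to flag outlier URLs are both assumptions; the brief does not prescribe either. The helper names are hypothetical.

```python
from statistics import mean, median, quantiles, stdev


def describe_lengths(values):
    """Mean, median, and sample standard deviation of the lengths of the given strings."""
    lengths = [len(v) for v in values]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        # The sample standard deviation needs at least two data points.
        "stdev": stdev(lengths) if len(lengths) > 1 else 0.0,
    }


def length_outliers(urls, k=1.5):
    """Return URLs whose length falls outside Q1 - k*IQR or Q3 + k*IQR."""
    if len(urls) < 2:
        return []
    lengths = [len(u) for u in urls]
    q1, _, q3 = quantiles(lengths, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [u for u in urls if len(u) < low or len(u) > high]


# Made-up URLs of length 10, 11, 12, 12, 13 and one very long one (95 characters).
demo_urls = ["u" * n for n in (10, 11, 12, 12, 13, 95)]
print(describe_lengths(demo_urls))
print([len(u) for u in length_outliers(demo_urls)])
```

The same two functions can be applied to the comment text to compare URL lengths with comment lengths.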
3. Data Preprocessing:
A. Cleaning Text Data:
• Explore the 'cleaned_website_text' variable. What preprocessing steps would
you take to clean text data for analysis?
• Implement text cleaning techniques and explain their importance in preparing
data for text-based analysis.
B. Handling Missing Values:
• Identify if there are any missing values in the dataset. Propose strategies for
handling missing values, specifically in the 'cleaned_website_text' column.
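The cleaning and missing-value steps in Task 3 might look like the sketch below. The specific cleaning steps, the tiny stop-word list, and the choice to drop (rather than impute) rows with missing text are illustrative assumptions, not requirements of the brief; `clean_text` and `drop_missing_text` are hypothetical names.

```python
import re

# Small illustrative stop-word list (an assumption); a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "or", "our", "the", "to"}


def clean_text(text):
    """Lowercase, strip HTML tags, URLs, and non-letters, then drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                 # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links
    text = re.sub(r"[^a-z\s]", " ", text)                # digits and punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS and len(t) > 1]
    return " ".join(tokens)


def drop_missing_text(rows, column="cleaned_website_text"):
    """Return (rows with usable text, number of rows dropped)."""
    kept = [r for r in rows if isinstance(r.get(column), str) and r[column].strip()]
    return kept, len(rows) - len(kept)


print(clean_text("Visit <b>OUR</b> site at https://example.com NOW!!! 24/7"))
```

Dropping rows is the simplest strategy; replacing missing text with an empty string or a placeholder token is an alternative when every row must be kept.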
4. Visualization:
A. Category Distribution Visualization:
• Create a bar chart or pie chart to visually represent the distribution of website
categories.
• How does the visualization help in understanding the balance or imbalance of
the dataset?
B. Text Data Visualization:
• Generate word clouds or frequency plots for the 'cleaned_website_text'
variable. What insights can be gained from these visualizations?
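As a rough illustration of the frequency plots in Task 4, the sketch below counts words and draws a plain-text bar chart. It deliberately avoids plotting libraries so that it runs anywhere; in the notebook a graphical bar chart or word cloud would normally be built from the same counts. Both helper names are hypothetical.

```python
from collections import Counter


def top_words(texts, n=10):
    """Return the n most frequent whitespace-separated words across all texts."""
    counts = Counter(word for text in texts for word in text.split())
    return counts.most_common(n)


def text_bar_chart(pairs, width=40):
    """Render (label, count) pairs as a text bar chart scaled to the largest count."""
    if not pairs:
        return ""
    largest = max(count for _, count in pairs)
    pad = max(len(label) for label, _ in pairs)
    lines = []
    for label, count in pairs:
        bar = "#" * max(1, round(count / largest * width))
        lines.append(f"{label:<{pad}} {bar} {count}")
    return "\n".join(lines)


# Made-up cleaned texts standing in for the 'cleaned_website_text' column.
demo_texts = ["news world news", "news shop", "shop"]
print(text_bar_chart(top_words(demo_texts, n=3), width=20))
```

The same `text_bar_chart` function works for category counts, which makes class imbalance visible at a glance.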
5. Model Development
A. Data Mining Analysis:
• Split the dataset into training and testing sets for model evaluation.
• Implement various machine learning algorithms for classification, such as logistic
regression, decision trees, or random forests.
B. Training and Evaluation:
• Evaluate the performance of each model using metrics like accuracy, precision, recall,
and F1-score.
• Discuss the challenges and considerations specific to evaluating a model for website
classification.
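To make the split and the evaluation metrics in Task 5 concrete, the sketch below implements a stratified train/test split and computes accuracy, precision, recall, and F1-score by hand. Stratifying by category and reporting a macro-averaged F1 are assumptions that suit imbalanced classes; the brief does not mandate them, and machine learning libraries offer equivalent functions. The function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), sampling each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Very small classes may contribute no test rows at all.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def evaluate(y_true, y_pred):
    """Accuracy, per-class precision/recall/F1, and macro-averaged F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(per_class)
    return {"accuracy": accuracy, "per_class": per_class, "macro_f1": macro_f1}


# Made-up labels and predictions for three categories.
truth = ["news", "news", "shop", "shop", "shop", "travel"]
preds = ["news", "shop", "shop", "shop", "news", "travel"]
report = evaluate(truth, preds)
print(f"accuracy={report['accuracy']:.2f} macro_f1={report['macro_f1']:.2f}")
```

Accuracy alone can look good on an imbalanced dataset while rare categories are misclassified, which is why the per-class figures are kept alongside it.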
6. Advanced Techniques:
i. Feature Engineering:
• Propose additional features that could enhance the model's performance.
How might these features capture more nuanced information about websites?
ii. Dynamic Nature of Websites:
• Given the dynamic nature of the online environment, how could the model
adapt to newly emerging website types? Discuss strategies for model
adaptation.
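For the feature engineering item above, one option is to derive simple structural features from each URL. The sketch below is an illustration only: the particular features chosen are assumptions about what might help a classifier, and `url_features` is a hypothetical helper.

```python
from urllib.parse import urlparse


def url_features(url):
    """Derive simple structural features from a URL string."""
    # urlparse only recognises the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    path_parts = [part for part in parsed.path.split("/") if part]
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_dots": host.count("."),
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        "path_depth": len(path_parts),
        "digit_count": sum(ch.isdigit() for ch in url),
        "hyphen_count": url.count("-"),
        "has_query": bool(parsed.query),
    }


features = url_features("https://shop.example.co.uk/products/item-42?id=7")
for name, value in features.items():
    print(f"{name}: {value}")
```

Features like these are independent of the comment text, so they remain available for newly emerging sites that have few or no user comments.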
7. Create Dashboard, Report and Conclusions:
• Summarize the findings, including insights gained from exploratory data analysis and
the performance of the classification model.
• How interpretable is the chosen model? Can you explain the decision-making process
of the model in the context of website classification?
• Provide recommendations for further improvements or considerations in the dynamic
landscape of web classification.
• Reflect on the challenges encountered during the analysis. What potential
improvements or future work would you recommend to enhance the model's
performance?
This assignment allows students to apply knowledge of data exploration, preprocessing, data
modelling, and model building to solve a real-world problem in the business domain. It also
encourages them to explore additional concepts for improving model performance.
Deliverables
• The complete Python program (source code (ipynb)) and report must be submitted to
Blackboard.
• Python Script (Program Code):
o Name the file under your name and SUKD number.
o Start the first two lines in your program by typing your name and SUKD
number. For example:
# Nor Anis Sulaiman
#SUKD20231234
o For each question, give an ID and explain what you want to discover. For example:
a. Explore the distribution of website categories in the dataset. Are there any specific
categories that are more prevalent than others?
b. Visualize the distribution of URL lengths and user comments lengths. Are there patterns
or outliers that could be informative for the classification model?
c. What steps would you take to clean and preprocess the URLs and user comments for
effective analysis?
d. How might you handle any missing values in the dataset, and what impact could they
have on the classification model?
e. Provide descriptive statistics for key variables such as URL lengths and user comments
lengths. What insights can be derived from these statistics?
f. Explore potential additional features that could enhance the model's ability to classify
websites accurately.
g. How might the inclusion of features derived from URLs or user comments contribute
to the overall model performance?
h. Choose a classification algorithm suitable for website classification. Explain your
choice.
i. Implement the chosen algorithm using Python and relevant libraries. What
considerations should be taken into account during the model implementation phase?
j. Split the dataset into training and testing sets. How would you assess the performance
of the model using metrics like accuracy, precision, recall, and F1-score?
k. Discuss potential challenges in evaluating the model's effectiveness and generalization
to new websites.
l. Create visualizations to interpret the model's predictions and showcase its classification
performance.
Documents: Coursework Report
As part of the assessment, you must submit the project report in printed and softcopy form,
which should have the following format:
A) Cover Page:
All reports must be prepared with a front cover. A protective transparent plastic sheet can be
placed in front of the report to protect the front cover. The front cover should be presented with
the following details:
o Module
o Coursework Title
o Intake
o Student name and ID
o Date Assigned (the date the report was handed out).
o Date Completed (the date the report is due to be handed in).
B) Contents:
• Introduction and assumptions (if any)
• Data import / Cleaning / pre-processing / transformation
• Each question must start on a separate page and contain:
o Analysis Techniques - data exploration / manipulation / visualization
o Screenshot of source code with the explanation.
o Screenshot of output/plot with the explanation.
o Outline the findings based on the results obtained.
• The extra feature explanation must be on a separate page and contain:
o Screenshot of source code with the explanation.
o Screenshot of output/plot with the explanation.
o Explain how adding this extra feature can improve the results.
C) Conclusion
• Depth and breadth of analysis
• Quality and depth of feedback on the analysis process
• Reflection on learning and areas for improvement
D) References
• The font size used in the report must be 12pt, and the font is Times New Roman. Full
source code is not allowed to be included in the report. The report must be typed and
clearly printed.
• You may source algorithms and information from the Internet or books. Proper
referencing of the resources should be evident in the document.
• All references must be made using the APA (American Psychological Association)
referencing style as shown below:
o The theory was first propounded in 1970 (Larsen, A.E. 1971), but since then has
been refuted; M.K. Larsen (1983) is among those most energetic in their
opposition...
o /**Following source code obtained from (Danang, S.N. 2002)*/
int noshape=2;
noshape=GetShape();
• A list of references at the end of your document or source code must be specified in the
following format:
Larsen, A.E. 1971, A Guide to the Aquatic Science Literature, McGraw-Hill, London.
Larsen, M.K. 1983, British Medical Journal [Online], Available from
http://libinfor.ume.maine.edu/acquatic.htm (Accessed 19 November 1995)
Danang, S.N. 2002, Finding Similar Images [Online], The Code Project, Available
from http://www.codeproject.com/bitmap/cbir.asp (Accessed 14 September 2006)
Further information on other types of citation is available in Petrie, A., 2003, UWE
Library Services Study Skills: How to reference [online], England, University of