首页 > 其他分享 >3DA3 C02 Predictive Data Analytics

3DA3 C02 Predictive Data Analytics

时间:2024-10-20 10:48:07浏览次数:6  
标签:customer Predictive orders id products C02 csv Data order

Assignment 1, Commerce 3DA3 C02 -

Predictive Data AnalyticsTo complete this assignment, please create a Jupyter notebook. The code in your jupyternotebook should provide answers to questions asked in the assignment. Please submitthe assignment by uploading the file(s) into the "Assignment 1" folder onAvenuetoLearn. You can find this folder under "Assessments>Assignments" on the course page.The deadline for submission is 11:59PM on Monday Oct. 21.

BackgroundIn the past decade, we witnessed the rise of online grocery shopping. With the

convenience of ordering groceries from the comfort of home, more people are turning todigital platforms for their everyday needs. This shift has been further fueled by factorssuch as busy lifestyles, the increasing use of mobile devices, and the covid-19pandemic, which underscored the importance of contactless shopping.For online grocery platforms, conducting data analysis on sales records is critical forunderstanding customer behavior, enhancing the overall shopping experience, and makedata-driven decisions that lead to higher customer satisfaction and profitability.

ata: We will make use of two datasets from the transaction records of an onlinegrocery delivery platform, stored in the files orders.csv (click to download) and

order_products.csv (click to download).The dataset in orders.csv includes the following columns:

order_id: This is the unique identifier of every customer ordercustomer_id: This is the unique identifier of every customer who placed the order

order_dow: This indicates the day of the week, on which the order took place. 0

stands for Sunday, 1-5 indiates Monday-Friday, and 6 indicates a Saturday.

order_hour_of_day: This indicates during which hour the order took place; for

example, 14 indicates that the order was placed between 14:00 and 14:59.

days_since_prior_order: This indicates how many days have passed since the

customer's last order

coupon_use: This shows if the customer used a coupon to (partially) pay for theorderThe dataset in order_products.csv records which products are purchased in an order. ItImagine that you are a data analyst at the grocery delivery platform. Based on thedatasets, please answer the following questions/tasks.

Questions 0.In the first cell of your Jupyter notebook, please create the following as markdown. Addyour first and last name, and your Student ID.includes the following columns:order_id: This is the order idenfitier (same as in order.csv).product_id: This is the identifier of a product that is purchased in the corresponding

orderquantity: This is the quantity of the product purchased in the corresponding order.unit_price: This is the unit price (in dollars) of the product purchased in thecorresponding ordercustomer_id: This is the identifier of the customer who purchased the product.Please note that order_id in order_products.csv does not need to be unique. If tworows in order_products.csv share the same order_id, it means that in the same order,the products in those two rows are both purchased.For example, suppose that the following row exists in order.csv:order_id customer_id order_dow order_hour_of_day days_since_prior_order coupon_useand the following two rows exist in order_products.csv:order_id product_id quantity unit price customer_idthen we know that in the same order (order_id O1234), 1 unit of product P0217 and 2units of product P0219 are purchased. And this order O1234 is the same order as theorder O1234 in order.csv.Important: For the remaining questions, please make sure to create a markdown cellbefore you answer each question and in it indicate the question number, e.g.,

Question 1, Question 2, etc.

For each question, you should use one or more code cells to present your codes. Pleasemake sure that you run each cell and display all the requested results. Please alsoensure that you will use markdown cells to provide necessary explanations of yourcodes and results.The Jupyter notebook should be a easy-to-read report that presents your analysis andresults. The grading will be based on both the correctness of your coding and thereadability of your notebook.

Question 1.

Import the two .csv files and assign them to a dataframe called df_orders anddf_order_products , respectively. Then,use a line of codes to review the first few rows of the dataframes. The result shouldbe clearly displayed 代 写3DA3 C02 Predictive Data Analytics  in the notebook after you run the code cells.get the structures of the dataframes (number of rows, column types, etc.) using theinfo() function. Review the first few rows of the dataframe.In a markdown cell,explain the results returned by this function as comprehensive as youcan..

Question 2.For the DataFrame df_orders loaded from orders.csv, perform the following steps in

the given order.

  1. Find how many missing value each column contains.
  2. For any missing value in the column order_id , replace it with 'unknown_order'
  3. For any missing value in the column customer_id , replace it with

'unknown_customer'4. For any missing value in the column days_since_prior_order , replace it withthe mean value of the column

  1. After completing the above steps, repeat the codes in Step 1 to check again thenumber of missing values in each column
  1. For any remaining missing values, drop all rows containing a missing value

Question 3.

The grocery delivery platform is interested in assessing if offering coupons will increasecustomers' purchase frequency. To that end, let us again make use of the DataFramedf_orders (loaded from orders.csv) to perform the following tasks.

  1. Select all rows in df_orders where use of a coupon is yes , and assign thoserows as a new DataFrame named df_orders_coupon .
  1. Calculate the mean value of 'days_since_prior_order' in df_orders_coupon .
  2. Select all rows in df_orders where use of a coupon is no , and assign those rowsas a new DataFrame named df_orders_no_coupon .
  1. Calculate the mean value of 'days_since_prior_order' in df_orders_no_coupon .Based on your findings of the above steps, answer the following question in amarkdown cell:Is the use of coupon associated with higher/lower order frequency?Please briefly explain your answer in the markdown cell.

Questions 4.The platform is also interested in measuring the total number of orders received on each

day of the week. To do this, they would like you to complete the following tasks.Divide the order id's in the 'order_id' column of the DataFrame df_orders (loadedfromorders.csv) into groups, based on the day of the week ('order_dow') when theorder is placed. The result should be a Groupby object.Construct and display the content of a pandas Series, which should show the totalnumber of orders for each day of the week.

Question 5.As observed, each row of the data in order_products.csv is the sales information of a

product in a certain order. The information includes the per-unit price and number ofunits ordered, but it does not directly provide the revenue.Let us now create anewcolumn named 'revenue' in the DataFramedf_order_products constructed from order_products.csv. For each row, thecolumn should contain the corresponding revenue, calcuated as 'quantity' 'unit price'.

See the following two-row example for a demonstration.order_id product_id quantity unit_price customer_id revenue

C621739.98After you have added the new column, further complete the following tasks:Display the first few rows of the updated df_order_products DataFrame.Calculate the total revenue by summing up revenues in each row.

Question 6

From time to time, there will be customers who would like to review their purchaserecord. To do that, they will need to supply their customer id.Suppose a customer with the id '0421MWMT' just contacted Customer Service andwould like to see all their purchases. Perform the following tasks for the customer.Select all rows related to this customer's purchases in the DataFramedf_order_products (loaded from order_products.csv), and assign them to anew DataFrame named 'df_cust_inquiry'. Display the content of this DataFrame.Calculate the customer's total purchase in dollar amount.×

标签:customer,Predictive,orders,id,products,C02,csv,Data,order
From: https://www.cnblogs.com/goodlunn/p/18475760

相关文章

  • 论文翻译:arxiv-2024.Dillon Bowen.Scaling Laws for Data Poisoning in LLMs
    ScalingLawsforDataPoisoninginLLMshttps://arxiv.org/pdf/2408.02946论文主要研究了大型语言模型在数据中毒威胁下的脆弱性,发现模型规模越大,对有害行为的学习速度越快,强调了在更大模型中建立健全数据保护措施的必要性。在大型语言模型(LLMs)中数据投毒的规模法则......
  • Task03:数据类型和操作 Data Types and Opeartors
    Python数据类型与表达式:数据转换视角下的高效编程策略一、引言1.1研究背景在当今的编程领域,Python以其简洁性、易读性和强大的功能而备受青睐。Python数据类型与表达式在编程中具有至关重要的地位。Python的数据类型丰富多样,包括整型、浮点型、布尔型、None类型以及......
  • jar包运行报错1.无主属性清单 2.外部Jar包未导入 3.Data Source url报错解决 4.端口占
    相信大家mvnpackage打包成jar包后放到服务器上面运行后遇到一些很头疼的问题,怎么按照百度、gpt、csdn上面的博客修改就是成功不了但是!今天!博主带着自己尝试多次的血泪经验为大家解答以上三大问题!接下来以“代码+解析”的方式解析大家的问题一、无主属性问题报错如下图这......
  • “System.Data”中不存在类型或命名空间名“TypedTableBase<>”
    报错信息表明您正在尝试使用 System.Data 命名空间中不存在的类型 TypedTableBase<>。这通常意味着您的项目缺少必要的引用或者是因为某些原因无法识别这个类型。解决方法: 确认项目中是否已经引用了包含 TypedTableBase<> 类型的程序集。通常,这个类型位于 System.Dat......
  • 《Pytorch深度学习实践》P8 Dataset and DataLoader 笔记+代码+作业:DataLoader的使用
     b站的up主刘二大人的《PyTorch深度学习实践》P8笔记+代码,视频链接。所需糖尿病数据可以在视频评论区下方的网盘资源里下载(转载评论区):链接:https://pan.baidu.com/s/1cUI518pgLWY1oCn2DkjuEQ?pwd=kif1 提取码:kif1或者是点击链接下载:【免费】b站的up主刘二大人的《PyTorc......
  • 启动service报错ORA-44317: database open read-only
    ADG(RAC)备库环境,srvctl添加service服务成功,启动service时报错ORA-44317:databaseopenread-only。这是预期行为,使用“srvctladdservice-d<db_name>-s<service_name>”创建服务时,将在OCR中创建和注册服务,但在使用“srvctlstartservice-d<db_name>-s<service_n......
  • 出现WrongArgumentException: Malformed database URL, failed to parse the connecti
    目录1.问题所示2.原理分析3.解决方法1.问题所示编辑数据源的时候,后端出现如下BugThelastpacketsentsuccessfullytotheserverwas0millisecondsago.Thedriverhasnotreceivedanypacketsfromtheserver.com.mysql.cj.jdbc.exceptions.Com......
  • C++ WM\_COPYDATA 实现进程通信
    基于MFC用于存储数据的自定义结构体:structMSG_STRUCT{ wchar_tmsg[256];};发送端,发送按钮响应事件的代码voidCSendWM_COPYDATADlg::OnBnClickedSendmessage(){ MSG_STRUCTsmsg; GetDlgItemText(IDC_MESSAGE,smsg.msg,256); HWNDhTargetWnd=NULL; hTargetWnd=::Fi......
  • 深度学习_多层感知机基于Heart Disease UCI 数据集中的processed.cleveland.data训练
    多层感知机(Muti-Layerperceptron)#1.数据导入importpandasaspdnames=["age","sex","cp","trestbps","chol","fbs","restecg",......
  • 数据预处理-DataFrame切片
    此Blog仅作为日常学习工作中记录使用,Blog中有不足之处欢迎指出以kaggle中房屋预测的训练集为例,说明DataFrame切片常用操作一、读入数据importnumpyasnpimportpandasaspdfile_path='***\kaggle_house_pred_train.csv'data=pd.read_csv(file_path)data.columns......