Data Analysis in Practice: Online Sales Data Analysis

import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
warnings.filterwarnings('ignore', category=UserWarning, module='seaborn')

Import the dataset

df = pd.read_csv("/kaggle/input/online-sales-dataset-popular-marketplace-data/Online Sales Data.csv")
df.head(3)

Best-selling products in each category

colors = sns.color_palette('pastel')

top_selling_products = df.groupby('Product Name')['Total Revenue'].sum().sort_values(ascending=False).head(10)
top_selling_products.plot(kind='bar', color=colors)
plt.title('Top 10 Selling Products')
plt.show()
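
The chart above ranks products across the whole dataset. To answer the question per category, a minimal sketch is given below. It assumes the 'Product Category', 'Product Name' and 'Total Revenue' columns used elsewhere in this notebook, and the helper name top_product_per_category is made up for illustration.

```python
import pandas as pd


def top_product_per_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return the highest-revenue product within each product category."""
    # Sum revenue for every (category, product) pair.
    revenue = (
        df.groupby(['Product Category', 'Product Name'], as_index=False)['Total Revenue']
        .sum()
    )
    # Find the row label of the largest revenue inside each category.
    best_rows = revenue.groupby('Product Category')['Total Revenue'].idxmax()
    return revenue.loc[best_rows].reset_index(drop=True)
```

Calling top_product_per_category(df) on the loaded DataFrame should return one row per category.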

Which product category has the highest total revenue?
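
No code survives for this question in the text, so the following is only a sketch of one way to answer it. It assumes the 'Product Category' and 'Total Revenue' columns used elsewhere in the notebook; the helper name category_revenue and the 'Share (%)' column are illustrative.

```python
import pandas as pd


def category_revenue(df: pd.DataFrame) -> pd.DataFrame:
    """Total revenue per product category, largest first, with its share of the total."""
    totals = (
        df.groupby('Product Category')['Total Revenue']
        .sum()
        .sort_values(ascending=False)
    )
    # Express each category as a percentage of overall revenue.
    share = (totals / totals.sum() * 100).round(1)
    return pd.DataFrame({'Total Revenue': totals, 'Share (%)': share})
```

The first row of category_revenue(df) is then the category with the highest total revenue.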

Which product name generates the highest total revenue?
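
A short sketch for reading off the single top product by name, assuming the 'Product Name' and 'Total Revenue' columns; the helper name top_revenue_product is hypothetical.

```python
import pandas as pd


def top_revenue_product(df: pd.DataFrame) -> tuple[str, float]:
    """Return the product name with the largest summed revenue, and that sum."""
    totals = df.groupby('Product Name')['Total Revenue'].sum()
    # idxmax gives the index label (the product name) of the largest total.
    return totals.idxmax(), float(totals.max())
```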

What is the monthly trend in total revenue over the given period?
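
One possible way to build the monthly series is sketched below. It assumes a 'Date' column that pd.to_datetime can parse and a 'Total Revenue' column; the helper name monthly_revenue is illustrative.

```python
import pandas as pd


def monthly_revenue(df: pd.DataFrame) -> pd.Series:
    """Sum total revenue per calendar month, in chronological order."""
    # Parse dates without modifying the caller's DataFrame.
    dates = pd.to_datetime(df['Date'])
    # Group by year-month periods so months sort chronologically.
    monthly = df.groupby(dates.dt.to_period('M'))['Total Revenue'].sum()
    # Use plain 'YYYY-MM' strings as labels for printing or plotting.
    monthly.index = monthly.index.astype(str)
    return monthly
```

The result can be plotted directly, for example with monthly_revenue(df).plot(marker='o').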

What are the relationships between the numeric variables?

Which region has the highest total revenue?

region_revenue = df.groupby('Region')['Total Revenue'].sum().reset_index().sort_values(by='Total Revenue', ascending=False)
print(region_revenue)

plt.figure(figsize=(6, 3))
sns.barplot(x='Region', y='Total Revenue', data=region_revenue)
plt.title('Total Revenue by Region')
plt.xlabel('Region')
plt.ylabel('Total Revenue')
plt.show()

How does the average unit price differ across regions?

region_unit_price = df.groupby('Region')['Unit Price'].mean().reset_index().sort_values(by='Unit Price', ascending=False)

print(region_unit_price)

          Region  Unit Price
2  North America   353.87225
1         Europe   190.90425
0           Asia   164.41025

plt.figure(figsize=(10, 6))
sns.barplot(x='Region', y='Unit Price', data=region_unit_price)
plt.title('Average Unit Price by Region')
plt.xlabel('Region')
plt.ylabel('Average Unit Price')
plt.show()

What is the distribution of units sold per transaction?

plt.figure(figsize=(10, 6))
sns.histplot(df['Units Sold'], bins=10, kde=True)
plt.title('Distribution of Units Sold per Transaction')
plt.xlabel('Units Sold')
plt.ylabel('Frequency')
plt.show()

Which product category has the highest average units sold per transaction?

category_units_sold = df.groupby('Product Category')['Units Sold'].mean().reset_index().sort_values(by='Units Sold', ascending=False)

print(category_units_sold)

  Product Category  Units Sold
2         Clothing       3.625
1            Books       2.850
5           Sports       2.200
3      Electronics       1.650
4  Home Appliances       1.475
0  Beauty Products       1.150

plt.figure(figsize=(10, 6))
sns.barplot(x='Product Category', y='Units Sold', data=category_units_sold)
plt.title('Average Units Sold per Transaction by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Average Units Sold')
plt.show()

Which payment method do customers use most often?

# Counting the number of transactions per Payment Method
payment_methods = df['Payment Method'].value_counts().reset_index(name='Count').rename(columns={'index': 'Payment Method'})
print(payment_methods)

  Payment Method  Count
0    Credit Card    120
1         PayPal     80
2     Debit Card     40
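
Raw counts are easier to compare as percentages. A small sketch, assuming the 'Payment Method' column shown above (the helper name payment_method_share is made up):

```python
import pandas as pd


def payment_method_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of transactions per payment method, most common first."""
    # normalize=True turns counts into fractions that sum to 1.
    return (df['Payment Method'].value_counts(normalize=True) * 100).round(1)
```

With the counts printed above (120, 80 and 40 out of 240), this would give roughly 50.0, 33.3 and 16.7 percent.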

How do unit prices differ across product categories?

plt.figure(figsize=(14, 7))
sns.boxplot(x='Product Category', y='Unit Price', data=df)
plt.title('Unit Prices by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Unit Price')
plt.xticks(rotation=45)
plt.show()

How does total revenue differ across regions?

plt.figure(figsize=(14, 7))
sns.boxplot(x='Region', y='Total Revenue', data=df)
plt.title('Total Revenue by Region')
plt.xlabel('Region')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()

Is there a correlation between units sold and total revenue?

plt.figure(figsize=(14, 7))
sns.scatterplot(x='Units Sold', y='Total Revenue', data=df)
plt.title('Units Sold vs. Total Revenue')
plt.xlabel('Units Sold')
plt.ylabel('Total Revenue')
plt.show()
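
The scatter plot shows the shape of the relationship; a coefficient puts a number on it. The sketch below computes Pearson's r and a rank-based (Spearman) coefficient for any two numeric columns. The helper name correlation_summary is hypothetical, and Spearman is computed here as Pearson's r on the ranks.

```python
import pandas as pd


def correlation_summary(df: pd.DataFrame, x: str, y: str) -> dict[str, float]:
    """Pearson and Spearman correlation between two numeric columns."""
    # Pearson measures linear association.
    pearson = df[x].corr(df[y])
    # Spearman is Pearson applied to ranks, so it captures any monotonic trend.
    spearman = df[x].rank().corr(df[y].rank())
    return {'pearson': float(pearson), 'spearman': float(spearman)}
```

For this section the call would be correlation_summary(df, 'Units Sold', 'Total Revenue').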

How is unit price related to total revenue?

plt.figure(figsize=(14, 7))
sns.scatterplot(x='Unit Price', y='Total Revenue', data=df)
plt.title('Unit Price vs. Total Revenue')
plt.xlabel('Unit Price')
plt.ylabel('Total Revenue')
plt.show()

How does the distribution of units sold differ across product categories?

plt.figure(figsize=(12, 8))
sns.boxplot(x='Units Sold', y='Product Category', data=df, palette='pastel')

# Set title and labels
plt.title('Units Sold by Product Category', fontsize=15, fontweight='bold')
plt.xlabel('Units Sold')
plt.ylabel('Product Category')

# Customize grid and background
plt.grid(True, linestyle='--', linewidth=0.5)
plt.gca().set_facecolor('white')
for spine in plt.gca().spines.values():
    spine.set_edgecolor('black')

# Show the plot
plt.show()

How are payment methods distributed across regions?

# Cross-tabulate transaction counts by region and payment method (with totals)
region_payment_counts = pd.crosstab(df['Region'], df['Payment Method'], margins=True)

print(region_payment_counts)

Payment Method  Credit Card  Debit Card  PayPal  All
Region                                              
Asia                     40          40       0   80
Europe                    0           0      80   80
North America            80           0       0   80
All                     120          40      80  240
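
The same table can be read as within-region percentages, which makes regions of different size comparable. A minimal sketch, assuming the 'Region' and 'Payment Method' columns used above (the helper name payment_share_by_region is illustrative):

```python
import pandas as pd


def payment_share_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of each region's transactions made with each payment method."""
    # normalize='index' divides every row by its row total.
    shares = pd.crosstab(df['Region'], df['Payment Method'], normalize='index')
    return shares.mul(100).round(1)
```

Each row of the result sums to 100.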


# Correlation heatmap of the numeric columns
numeric_df = df.select_dtypes(include='number')
corr_matrix = numeric_df.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()

Are there any outliers in total revenue within each region?

plt.figure(figsize=(14, 7))
sns.boxplot(x='Region', y='Total Revenue', data=df)
plt.title('Outliers in Total Revenue by Region')
plt.xlabel('Region')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()
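
A box plot marks points beyond the whiskers, which by default sit 1.5 times the interquartile range (IQR) from the box. To list those rows instead of only seeing them, one option is to apply the same rule per region, as sketched below. The helper name iqr_outliers and its parameters are made up for illustration, and quantile interpolation may differ slightly from the plotting library's.

```python
import pandas as pd


def iqr_outliers(
    df: pd.DataFrame,
    group_col: str = 'Region',
    value_col: str = 'Total Revenue',
    k: float = 1.5,
) -> pd.DataFrame:
    """Return rows whose value lies more than k * IQR outside its group's quartiles."""
    grouped = df.groupby(group_col)[value_col]
    # Broadcast each group's quartiles back onto the group's rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    is_outlier = (df[value_col] < q1 - k * iqr) | (df[value_col] > q3 + k * iqr)
    return df[is_outlier]
```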

How does total revenue trend over time?

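# Parse the date column first; the .dt accessor only works on datetime values
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month_name()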
plt.figure(figsize=(18, 6))
sns.lineplot(data=df, x='Month', y='Total Revenue', linewidth=2.5)
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.title('Revenue Over Time')
plt.xticks(rotation=45)
plt.show()

How does total revenue for each payment method change over time?

plt.figure(figsize=(18, 6))
sns.lineplot(data=df, x='Month', y='Total Revenue', hue='Payment Method', linewidth=2.5)
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.title('Revenue Over Time by Payment Method')
plt.xticks(rotation=45)
plt.show()
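
The line plot above draws a per-month average with a confidence band. If monthly totals per payment method are wanted as a table, a pivot is one option. The sketch assumes the 'Date', 'Payment Method' and 'Total Revenue' columns; the helper name monthly_revenue_by_payment is hypothetical.

```python
import pandas as pd


def monthly_revenue_by_payment(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly revenue totals, one column per payment method."""
    # Add a sortable 'YYYY-MM' label on a copy so the input stays untouched.
    with_month = df.assign(
        Month=pd.to_datetime(df['Date']).dt.to_period('M').astype(str)
    )
    return with_month.pivot_table(
        index='Month',
        columns='Payment Method',
        values='Total Revenue',
        aggfunc='sum',
        fill_value=0,  # months with no sales for a method show 0 rather than NaN
    )
```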

Source: https://blog.csdn.net/m_aolifande/article/details/139550652
