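About the dataset
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction and includes details such as the order ID, date, category, product name, units sold, unit price, total revenue, region, and payment method.
Columns:
- Order ID: Unique identifier for each sales order.
- Date: Date of the sales transaction.
- Product Category: Broad category of the product sold (e.g., electronics, home appliances, clothing, books, beauty products, sports).
- Product Name: Specific name or model of the product sold.
- Units Sold: Number of units of the product sold in the transaction.
- Unit Price: Price of one unit of the product.
- Total Revenue: Total revenue generated by the transaction (Units Sold * Unit Price).
- Region: Geographic region where the transaction took place (e.g., North America, Europe, Asia).
- Payment Method: Method used for payment (e.g., credit card, PayPal, debit card).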
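Since the total is described as units multiplied by unit price, that relationship can be checked before any analysis relies on it. The following is a minimal sketch on three invented rows (the values are illustrative, not taken from the dataset); the column names follow those used in the code later in this post.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; these values are made up, not taken from the dataset.
sample = pd.DataFrame({
    "Units Sold": [2, 1, 3],
    "Unit Price": [999.99, 499.99, 69.99],
    "Total Revenue": [1999.98, 499.99, 209.97],
})

# Recompute revenue and compare with a tolerance to absorb float rounding.
expected = sample["Units Sold"] * sample["Unit Price"]
is_consistent = np.isclose(sample["Total Revenue"], expected)

# Rows where the stored total disagrees with units * price (none here).
print(sample[~is_consistent])
```

Running the same comparison on the full DataFrame would surface any rows where the stored total and the recomputed total disagree.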
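Preface
1. Analyze sales trends over time to identify seasonal patterns or growth opportunities.
2. Explore the popularity of different product categories across regions.
3. Investigate the impact of payment methods on sales volume or revenue.
4. Identify the best-selling products in each category to optimize inventory and marketing strategies.
5. Evaluate the performance of specific products or categories in different regions to tailor marketing campaigns accordingly.
Dataset
Import libraries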
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
warnings.filterwarnings('ignore', category=UserWarning, module='seaborn')
Load the dataset
df = pd.read_csv("/kaggle/input/online-sales-dataset-popular-marketplace-data/Online Sales Data.csv")
df.head(3)
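By default `read_csv` leaves date columns as plain strings, so any later use of the `.dt` accessor needs the column converted first. A minimal sketch of parsing the dates at load time is shown here. It reads from an in-memory string rather than the Kaggle file, the two rows are invented, and the ISO date format is an assumption about how the file stores dates.

```python
import io
import pandas as pd

# Stand-in for the CSV file; the two rows below are invented for illustration.
csv_text = """Date,Product Category,Units Sold,Unit Price,Total Revenue
2024-01-01,Electronics,2,999.99,1999.98
2024-02-15,Books,3,15.99,47.97
"""

# parse_dates converts the 'Date' column to datetime64 while reading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample.dtypes["Date"])
print(sample["Date"].dt.month_name().tolist())
```

The same `parse_dates=["Date"]` argument could be passed when reading the real file, assuming its dates are in a format pandas recognizes.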
Best-selling products in each category
colors = sns.color_palette('pastel')
top_selling_products = df.groupby('Product Name')['Total Revenue'].sum().sort_values(ascending=False).head(10)
top_selling_products.plot(kind='bar', color=colors)
plt.title('Top 10 Selling Products')
plt.show()
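The chart above ranks products across the whole dataset. To pick the top product within each category, one possible approach is to sum revenue per (category, product) pair and then keep the row with the largest total in each category. The sketch below uses invented rows, and "best-selling" is taken to mean highest total revenue, which is an assumption.

```python
import pandas as pd

# Invented rows for illustration; product names are hypothetical.
sample = pd.DataFrame({
    "Product Category": ["Books", "Books", "Sports", "Sports", "Sports"],
    "Product Name": ["Novel A", "Novel B", "Racket", "Ball", "Racket"],
    "Total Revenue": [30.0, 45.0, 120.0, 20.0, 80.0],
})

# Sum revenue per (category, product) pair.
per_product = (
    sample.groupby(["Product Category", "Product Name"], as_index=False)["Total Revenue"]
    .sum()
)

# Keep the highest-revenue product within each category.
top_rows = per_product.groupby("Product Category")["Total Revenue"].idxmax()
top_per_category = per_product.loc[top_rows].reset_index(drop=True)

print(top_per_category)
```

Ranking by `Units Sold` instead of `Total Revenue` would answer the same question in terms of volume rather than revenue.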
Which product category has the highest total revenue?
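One way to answer this is a grouped sum sorted in descending order, mirroring the pattern used for regions later in this post. This is a sketch on invented rows, not the author's original cell.

```python
import pandas as pd

# Invented rows for illustration only.
sample = pd.DataFrame({
    "Product Category": ["Electronics", "Books", "Electronics", "Clothing"],
    "Total Revenue": [1200.0, 40.0, 800.0, 150.0],
})

# Total revenue per category, largest first.
category_revenue = (
    sample.groupby("Product Category")["Total Revenue"]
    .sum()
    .sort_values(ascending=False)
)

print(category_revenue)
print("Top category:", category_revenue.idxmax())
```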
Which product generates the highest total revenue?
What is the monthly trend in total revenue over the given period?
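A monthly trend can be computed by grouping on a year-month period, which keeps the months in chronological order and avoids merging the same month from different years. The sketch below uses invented rows and assumes `Date` has already been converted to a datetime type.

```python
import pandas as pd

# Invented rows for illustration; 'Date' is already a datetime column here.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-03-05", "2024-01-10", "2024-01-20", "2024-02-01"]),
    "Total Revenue": [300.0, 100.0, 50.0, 200.0],
})

# Group by calendar month (as a Period) and sum the revenue in each month.
monthly_revenue = (
    sample.groupby(sample["Date"].dt.to_period("M"))["Total Revenue"].sum()
)

print(monthly_revenue)
```

The resulting series can be passed to `monthly_revenue.plot()` for a line chart of monthly totals.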
What is the relationship between the numeric variables?
Which region has the highest total revenue?
region_revenue = df.groupby('Region')['Total Revenue'].sum().reset_index().sort_values(by='Total Revenue', ascending=False)
print(region_revenue)
plt.figure(figsize=(6, 3))
sns.barplot(x='Region', y='Total Revenue', data=region_revenue)
plt.title('Total Revenue by Region')
plt.xlabel('Region')
plt.ylabel('Total Revenue')
plt.show()
How does the average unit price differ across regions?
region_unit_price = df.groupby('Region')['Unit Price'].mean().reset_index().sort_values(by='Unit Price', ascending=False)
print(region_unit_price)
          Region  Unit Price
2  North America   353.87225
1         Europe   190.90425
0           Asia   164.41025
plt.figure(figsize=(10, 6))
sns.barplot(x='Region', y='Unit Price', data=region_unit_price)
plt.title('Average Unit Price by Region')
plt.xlabel('Region')
plt.ylabel('Average Unit Price')
plt.show()
What is the distribution of units sold per transaction?
plt.figure(figsize=(10, 6))
sns.histplot(df['Units Sold'], bins=10, kde=True)
plt.title('Distribution of Units Sold per Transaction')
plt.xlabel('Units Sold')
plt.ylabel('Frequency')
plt.show()
Which product category has the highest average units sold per transaction?
category_units_sold = df.groupby('Product Category')['Units Sold'].mean().reset_index().sort_values(by='Units Sold', ascending=False)
print(category_units_sold)
  Product Category  Units Sold
2         Clothing       3.625
1            Books       2.850
5           Sports       2.200
3      Electronics       1.650
4  Home Appliances       1.475
0  Beauty Products       1.150
plt.figure(figsize=(10, 6))
sns.barplot(x='Product Category', y='Units Sold', data=category_units_sold)
plt.title('Average Units Sold per Transaction by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Average Units Sold')
plt.show()
Which payment method do customers use most often?
# Counting the number of transactions per Payment Method
payment_methods = df['Payment Method'].value_counts().reset_index(name='Count').rename(columns={'index': 'Payment Method'})
print(payment_methods)
  Payment Method  Count
0    Credit Card    120
1         PayPal     80
2     Debit Card     40
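Raw counts are easier to compare as shares of all transactions. The sketch below starts from the three counts printed above and converts them to percentages.

```python
import pandas as pd

# Counts copied from the printed output above.
payment_counts = pd.Series(
    {"Credit Card": 120, "PayPal": 80, "Debit Card": 40}, name="Count"
)

# Share of all transactions, in percent, rounded to one decimal place.
payment_share = (payment_counts / payment_counts.sum() * 100).round(1)

print(payment_share)
```

On the full DataFrame, `df['Payment Method'].value_counts(normalize=True)` returns the same proportions as fractions rather than percentages.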
How do unit prices differ across product categories?
plt.figure(figsize=(14, 7))
sns.boxplot(x='Product Category', y='Unit Price', data=df)
plt.title('Unit Prices by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Unit Price')
plt.xticks(rotation=45)
plt.show()
How does total revenue differ across regions?
plt.figure(figsize=(14, 7))
sns.boxplot(x='Region', y='Total Revenue', data=df)
plt.title('Total Revenue by Region')
plt.xlabel('Region')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()
Is there a correlation between units sold and total revenue?
plt.figure(figsize=(14, 7))
sns.scatterplot(x='Units Sold', y='Total Revenue', data=df)
plt.title('Units Sold vs. Total Revenue')
plt.xlabel('Units Sold')
plt.ylabel('Total Revenue')
plt.show()
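A scatter plot shows the shape of the relationship; a correlation coefficient puts a number on it. The following sketch computes the Pearson coefficient on invented rows, so the value it prints says nothing about the real dataset.

```python
import pandas as pd

# Invented rows for illustration only.
sample = pd.DataFrame({
    "Units Sold": [1, 2, 3, 4],
    "Total Revenue": [100.0, 180.0, 330.0, 390.0],
})

# Series.corr defaults to the Pearson correlation coefficient.
r = sample["Units Sold"].corr(sample["Total Revenue"])

print(f"Pearson r = {r:.3f}")
```

Pearson's coefficient measures linear association only, so a value near zero does not rule out a non-linear relationship.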
How is unit price related to total revenue?
plt.figure(figsize=(14, 7))
sns.scatterplot(x='Unit Price', y='Total Revenue', data=df)
plt.title('Unit Price vs. Total Revenue')
plt.xlabel('Unit Price')
plt.ylabel('Total Revenue')
plt.show()
How does the distribution of units sold differ across product categories?
plt.figure(figsize=(12, 8))
sns.boxplot(x='Units Sold', y='Product Category', data=df, palette='pastel')
# Set title and labels
plt.title('Units Sold by Product Category', fontsize=15, fontweight='bold')
plt.xlabel('Units Sold')
plt.ylabel('Product Category')
# Customize grid and background
plt.grid(True, linestyle='--', linewidth=0.5)
plt.gca().set_facecolor('white')
for spine in plt.gca().spines.values():
    spine.set_edgecolor('black')
# Show the plot
plt.show()
How are payment methods distributed across regions?
conf_matrix = pd.crosstab(df['Region'], df['Payment Method'], margins=True)
print(conf_matrix)
Payment Method  Credit Card  Debit Card  PayPal  All
Region
Asia                     40          40       0   80
Europe                    0           0      80   80
North America            80           0       0   80
All                     120          40      80  240
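When regions have different transaction counts, row-wise proportions are easier to compare than raw counts. A minimal sketch using the `normalize` argument of `pd.crosstab` on invented rows:

```python
import pandas as pd

# Invented rows for illustration only.
sample = pd.DataFrame({
    "Region": ["Asia", "Asia", "Europe", "North America"],
    "Payment Method": ["Credit Card", "Debit Card", "PayPal", "Credit Card"],
})

# normalize="index" divides each row by its total, so every region sums to 1.
share = pd.crosstab(sample["Region"], sample["Payment Method"], normalize="index")

print(share)
```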
# Correlation heatmap of the numeric columns
numeric_df = df.select_dtypes(include='number')
corr_matrix = numeric_df.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
Are there any outliers in total revenue within each region?
plt.figure(figsize=(14, 7))
sns.boxplot(x='Region', y='Total Revenue', data=df)
plt.title('Outliers in Total Revenue by Region')
plt.xlabel('Region')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()
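The box plot marks outliers visually; to list them as rows, the same 1.5 × IQR rule that box-plot whiskers use by default can be applied per region. The sketch below uses invented rows with one deliberately extreme value.

```python
import pandas as pd

# Invented rows for illustration; 500.0 is a deliberately extreme value.
sample = pd.DataFrame({
    "Region": ["Asia"] * 6 + ["Europe"] * 6,
    "Total Revenue": [20.0, 25.0, 30.0, 35.0, 40.0, 500.0,
                      60.0, 65.0, 70.0, 75.0, 80.0, 85.0],
})

# Per-region quartiles, broadcast back to every row of that region.
grouped = sample.groupby("Region")["Total Revenue"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# A row is flagged when it falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
revenue = sample["Total Revenue"]
sample["Is Outlier"] = (revenue < q1 - 1.5 * iqr) | (revenue > q3 + 1.5 * iqr)

print(sample[sample["Is Outlier"]])
```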
How does total revenue trend over time?
# read_csv loads 'Date' as strings, so convert to datetime before using .dt
df['Month'] = pd.to_datetime(df['Date']).dt.month_name()
plt.figure(figsize=(18, 6))
sns.lineplot(data=df, x='Month', y='Total Revenue', linewidth=2.5)
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.title('Revenue Over Time')
plt.xticks(rotation=45)
plt.show()
How does total revenue by payment method change over time?
plt.figure(figsize=(18, 6))
sns.lineplot(data=df, x='Month', y='Total Revenue', hue='Payment Method', linewidth=2.5)
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.title('Revenue Over Time by Payment Method')
plt.xticks(rotation=45)
plt.show()
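By default `sns.lineplot` draws the mean of `Total Revenue` for each month, not the sum. If monthly totals per payment method are wanted, one option is to aggregate with a pivot table first. The sketch below uses invented rows and a year-month string as the time key.

```python
import pandas as pd

# Invented rows for illustration only.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"]),
    "Payment Method": ["Credit Card", "PayPal", "Credit Card", "Credit Card"],
    "Total Revenue": [100.0, 40.0, 60.0, 30.0],
})

# 'YYYY-MM' strings sort chronologically and keep different years apart.
sample["Month"] = sample["Date"].dt.strftime("%Y-%m")

# Rows are months, columns are payment methods, cells are summed revenue.
monthly_by_method = sample.pivot_table(
    index="Month",
    columns="Payment Method",
    values="Total Revenue",
    aggfunc="sum",
    fill_value=0,
)

print(monthly_by_method)
```

Calling `monthly_by_method.plot()` would then draw one line per payment method.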
Tags: data analysis, hands-on, plt, online, Revenue, df, Region, Total, data
From: https://blog.csdn.net/m_aolifande/article/details/139550652