Python Data Analysis: Mobile Device Usage and User Behavior Analysis


I. Research Background

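With the rapid spread of information technology, mobile devices have become essential tools for daily life and work. Smartphone penetration keeps growing and usage keeps diversifying: from entertainment and social networking to office work and study, phones are now used across all age groups and social groups. This variety of usage provides a rich data basis for studying user behavior patterns and phone usage preferences.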

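With the adoption of big data technology, more and more studies focus on mining user behavior data and its potential impact on product design and marketing. Analyzing users' mobile device usage data, such as app usage time, screen-on time, battery drain, and data usage, gives a better understanding of user needs and habits. This helps app developers and carriers optimize products and services, and it opens up opportunities in user experience, traffic management, and device innovation.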

II. Research Significance

The main significance of this study is as follows:

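Precise identification of user needs: Data analysis lets us identify the needs of different user groups. For example, teenage users may prefer social apps, while middle-aged users may lean toward productivity apps. With this information, app developers can tune features to the needs of specific groups.
Marketing optimization: With personalized marketing on the rise, behavior-based user categories (such as light, moderate, and extreme use) can be used for targeted advertising. Brands can tailor what they push to each user's behavior and so raise ad conversion rates.
Data traffic and battery management: Mobile carriers can manage traffic and allocate resources according to users' data usage patterns. For example, identifying extreme-use groups can help carriers offer them better-suited data plans or battery optimization options, improving user experience and satisfaction.
Device design and feature improvement: For smartphone manufacturers, knowing the usage preferences of users of different genders and ages provides data support for device design. For example, users with high battery drain may care more about battery life, while frequent app users may care more about processor performance. Manufacturers can use this data to design products that better fit user needs.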

III. Empirical Analysis

Code and Dataset

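This dataset supports an analysis of mobile device usage patterns and user behavior classification. It contains 700 user samples with metrics such as app usage time, screen-on time, battery drain, and data usage. Each record is assigned to one of five user behavior classes, ranging from light to extreme usage, which allows for in-depth analysis and modeling.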

Key features:

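User ID: unique identifier for each user.
Device Model: model of the user's smartphone.
Operating System: operating system of the device (iOS or Android).
App Usage Time: time spent on mobile apps per day, in minutes.
Screen On Time: average number of hours per day the screen is active.
Battery Drain: daily battery consumption, in mAh.
Number of Apps Installed: total number of apps available on the device.
Data Usage: daily mobile data consumption, in MB.
Age: age of the user.
Gender: gender of the user (male or female).
User Behavior Class: classification of user behavior based on usage patterns (1 to 5).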

Import the basic data analysis packages

import pandas as pd
import matplotlib.pyplot as plt
import csv
import seaborn as sns
import warnings
import numpy as np
warnings.filterwarnings('ignore')

Read the dataset and view the first five rows

df=pd.read_csv("user_behavior_dataset.csv")
df.head()

Check for missing values

df.isnull().sum()

No missing values were found.

Compute descriptive statistics for the numeric columns
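The code for this step does not appear in the listing. A minimal sketch, assuming the usual pandas approach (`DataFrame.describe`); the small `demo` frame is a made-up stand-in for `df`, and its values are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration.
demo = pd.DataFrame({
    'App Usage Time (min/day)': [100, 200, 300, 400],
    'Screen On Time (hours/day)': [2.0, 4.0, 6.0, 8.0],
    'Gender': ['Male', 'Female', 'Female', 'Male'],
})

# describe() summarizes only the numeric columns by default:
# count, mean, std, min, quartiles, and max.
summary = demo.describe()
print(summary)
```

On the real data, the equivalent call would presumably be `df.describe()`.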

Inspect the structure and data types of the data
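The code for this step is also absent from the listing. One plausible way to do it, sketched on a made-up `demo` frame standing in for `df`, is `DataFrame.info()` together with the `dtypes` attribute:

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration.
demo = pd.DataFrame({
    'User ID': [1, 2, 3],
    'Operating System': ['Android', 'iOS', 'Android'],
    'Screen On Time (hours/day)': [2.5, 6.1, 4.0],
})

demo.info()            # prints the row count, non-null counts, and dtypes
dtypes = demo.dtypes   # the same dtype information as a Series
print(dtypes)
```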

Next, preprocess the data by converting gender, operating system, and user behavior class to the categorical dtype

df['Gender'] = df['Gender'].astype('category')
df['Operating System'] = df['Operating System'].astype('category')
df['User Behavior Class'] = df['User Behavior Class'].astype('category')

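# Bin edges and labels for the age ranges
age_bins = [18, 25, 35, 45, 55, 65]
age_labels = [
    '18-24_Gen_Z',
    '25-34_Mille(Early)',
    '35-44_Mille(Late)',
    '45-54_Gen_X(Early)',
    '55-64_Gen_X(Late)'
]
# Assumed step (missing from the original listing): create the 'Age Range' column
# that later code uses; right=False makes each bin [lo, hi) so it matches its label
df['Age Range'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, right=False)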

Then check the column names
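The listing omits the code here as well. A minimal sketch on a made-up `demo` frame; on the real data the same attribute would be read from `df`:

```python
import pandas as pd

# Hypothetical stand-in for df with only a few of its columns.
demo = pd.DataFrame({'User ID': [1], 'Device Model': ['Xiaomi 11'], 'Age': [30]})

# columns is an Index; tolist() turns it into a plain Python list.
column_names = demo.columns.tolist()
print(column_names)
```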

Next, perform EDA and visualize the dataset

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.rcParams['font.sans-serif'] = ['KaiTi']  # CJK-capable font, needed only for Chinese labels
plt.rcParams['axes.unicode_minus'] = False   # render minus signs correctly with that font


# Create a 2x2 grid of subplots
custom_colors = ['#2ca02c', '#d62728']
fig, axes = plt.subplots(2, 2, figsize=(16, 10), dpi = 200)
axes = axes.flatten()

# === Subplot 1: number of users per device model ===
axes[0].set_title('Device Model Distribution', fontsize = 14)
sns.countplot(x='Device Model', data = df, ax = axes[0])
axes[0].set_xlabel('Device Model', fontsize = 12)
axes[0].set_ylabel('Count', fontsize = 12)
axes[0].tick_params(axis = 'x', rotation = 0)
axes[0].grid(True, linestyle = '--', alpha = 0.3)

# === Subplot 2: share of each device model ===
df_Device_Model_count = df['Device Model'].value_counts().reset_index()
# Rename the columns
df_Device_Model_count.columns = ['Device Model', 'count']
axes[1].set_title('Device Model Share', fontsize = 14)
# No fixed palette here: the two custom colors would repeat across more than two models
axes[1].pie(df_Device_Model_count['count'], labels = df_Device_Model_count['Device Model'],
            autopct = '%1.1f%%', startangle = 140, wedgeprops = dict(width = 0.6),
            pctdistance = 0.85)

# === Subplot 3: share of each operating system ===
df_Operating_System_count = df['Operating System'].value_counts().reset_index()
# Rename the columns
df_Operating_System_count.columns = ['Operating System', 'count']
axes[2].set_title('Operating System Distribution', fontsize = 14)
axes[2].pie(df_Operating_System_count['count'], labels = df_Operating_System_count['Operating System'],
            autopct = '%1.1f%%', startangle = 140, wedgeprops = dict(width = 0.6), colors = custom_colors,
            pctdistance = 0.85)

# === Subplot 4: share of each gender ===
df_gender_count = df['Gender'].value_counts().reset_index()
# Rename the columns
df_gender_count.columns = ['Gender', 'count']
axes[3].set_title('Gender Distribution', fontsize = 14)
axes[3].pie(df_gender_count['count'], labels = df_gender_count['Gender'],
            autopct = '%1.1f%%', startangle = 140, wedgeprops = dict(width = 0.6), colors = custom_colors,
            pctdistance = 0.85)

plt.tight_layout(pad = 3)
plt.show()

fig, axes = plt.subplots(2, 2, figsize=(24, 18), dpi=200)
axes = axes.flatten()

# df_grouped is not defined in the original listing; it is assumed here to hold
# the number of users per operating system (the 'No_of_apps' column name is kept as-is)
df_grouped = df.groupby('Operating System').size().reset_index(name='No_of_apps')

# Subplot 1: number of users per operating system
axes[0].set_title('Operating System Distribution', fontsize=20)
sns.barplot(data=df_grouped, x='Operating System', y='No_of_apps', ax=axes[0])
axes[0].set_xlabel('Operating System', fontsize=16)
axes[0].set_ylabel('Number of Users', fontsize=16)
axes[0].tick_params(axis='x', rotation=45)
axes[0].grid(True, linestyle='--', alpha=0.3)

# Subplot 2: age distribution with a kernel density estimate
axes[1].set_title('User Age Distribution', fontsize=20)
sns.histplot(df['Age'], bins=20, kde=True, ax=axes[1])
axes[1].set_xlabel('Age', fontsize=16)
axes[1].set_ylabel('Frequency', fontsize=16)
axes[1].tick_params(axis='x', rotation=45)
axes[1].grid(True, linestyle='--', alpha=0.3)

# Subplot 3: correlation heatmap of the numeric columns
axes[2].set_title('Correlation Heatmap', fontsize=20)
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap="coolwarm", ax=axes[2])

# Subplot 4: app usage time against screen-on time, colored by age range
axes[3].set_title('Daily App Usage vs. Daily Screen-On Time', fontsize=20)
sns.scatterplot(data=df, x='App Usage Time (min/day)', y='Screen On Time (hours/day)', hue='Age Range', ax=axes[3])
axes[3].set_xlabel('App Usage Time (min/day)', fontsize=16)
axes[3].set_ylabel('Screen On Time (hours/day)', fontsize=16)
axes[3].tick_params(axis='x', rotation=45)
axes[3].grid(True, linestyle='--', alpha=0.3)

plt.tight_layout(pad=3)
plt.show()

The fourth panel above shows daily app usage against daily screen-on time.

Group the data again by age group and device model

# Count users per gender and device model, then pivot to one column per model
df_Gender_devices = df.groupby(['Gender', 'Device Model']).agg(Count=('Gender', 'count')).reset_index()
df_gender_ = df_Gender_devices.pivot(index='Gender', columns='Device Model', values='Count').reset_index()
# Count users per age range and device model, separately for each gender
df_female_devices_age = df[df['Gender'] == 'Female'].groupby(['Age Range', 'Device Model']).agg(Count=('Age Range', 'count')).reset_index()
df_male_devices_age = df[df['Gender'] == 'Male'].groupby(['Age Range', 'Device Model']).agg(Count=('Age Range', 'count')).reset_index()

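Mobile device preferences by gender and age.
Below is an analysis of phone model preferences by gender and age group, which turns out to be quite interesting. The following observations stand out:

Brand loyalty: The wide variety of devices across age groups and genders suggests strong brand loyalty and differing preferences among consumers.
Samsung Galaxy S21: Shows consistent popularity across age groups, with a notable exception among Gen Z users aged 18-24, which suggests a distinct preference trend in this younger group.
Xiaomi 11: Also popular, and stable across genders within the same age group.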
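Observations like these are easier to compare as shares within each group than as raw counts. A minimal sketch, assuming a row-normalized `pd.crosstab` is acceptable for this purpose. The `demo` frame, its simplified age labels, and the device names are placeholders, not the real data:

```python
import pandas as pd

# Hypothetical stand-in for df; rows and labels are invented for illustration.
demo = pd.DataFrame({
    'Age Range': ['18-24', '18-24', '18-24', '25-34', '25-34', '25-34', '25-34'],
    'Device Model': ['iPhone', 'iPhone', 'Xiaomi 11',
                     'Samsung Galaxy S21', 'Xiaomi 11', 'Samsung Galaxy S21', 'iPhone'],
})

# normalize='index' divides each row by its total, so every row sums to 1
# and each cell is a model's share within that age range.
share = pd.crosstab(demo['Age Range'], demo['Device Model'], normalize='index')
print(share.round(2))
```

The same call with `df['Gender']` in place of the age column would give the per-gender shares.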

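# The three pivot tables below are not defined in the original listing; they are
# assumed to be built the same way as df_gender_ above
df_Age_range_phoneModel = (df.groupby(['Age Range', 'Device Model']).agg(Count=('Age Range', 'count')).reset_index()
                           .pivot(index='Age Range', columns='Device Model', values='Count').reset_index())
df_female_devices = df_female_devices_age.pivot(index='Age Range', columns='Device Model', values='Count').reset_index()
df_male_devices = df_male_devices_age.pivot(index='Age Range', columns='Device Model', values='Count').reset_index()

fig, axes = plt.subplots(2, 2, figsize=(24, 12), dpi=200)
axes = axes.flatten()

# === Subplot 1: device model popularity by age range (male and female) ===
axes[0].set_title('Device Model Popularity by Age Range (Male and Female)', fontsize=20)
df_Age_range_phoneModel.plot(kind='bar', x='Age Range', ax=axes[0])
axes[0].set_xlabel('Age Group', fontsize=16)
axes[0].set_ylabel('Count', fontsize=16)
axes[0].tick_params(axis='x', rotation=0)
axes[0].grid(True, linestyle='--', alpha=0.3)

# === Subplot 2: device model popularity by gender ===
axes[1].set_title('Device Model Popularity by Gender', fontsize=20)
df_gender_.plot(kind='bar', x='Gender', ax=axes[1])
axes[1].set_xlabel('Gender', fontsize=16)
axes[1].set_ylabel('Count', fontsize=16)
axes[1].tick_params(axis='x', rotation=0)
axes[1].grid(True, linestyle='--', alpha=0.3)

# === Subplot 3: device model popularity by age range (female) ===
axes[2].set_title('Device Model Popularity by Age Range (Female)', fontsize=20)
df_female_devices.plot(kind='bar', x='Age Range', ax=axes[2])
axes[2].set_xlabel('Age Group', fontsize=16)
axes[2].set_ylabel('Count', fontsize=16)
axes[2].tick_params(axis='x', rotation=0)
axes[2].grid(True, linestyle='--', alpha=0.3)

# === Subplot 4: device model popularity by age range (male) ===
axes[3].set_title('Device Model Popularity by Age Range (Male)', fontsize=20)
df_male_devices.plot(kind='bar', x='Age Range', ax=axes[3])
axes[3].set_xlabel('Age Group', fontsize=16)
axes[3].set_ylabel('Count', fontsize=16)
axes[3].tick_params(axis='x', rotation=0)
axes[3].grid(True, linestyle='--', alpha=0.3)

plt.tight_layout(pad=3)
plt.show()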

# Create a 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(24, 18))
axes = axes.flatten()  # flatten for easy indexing: axes[0] to axes[3]

# Subplot 1: number of users per age range
axes[0].set_title('Number of Users by Age Range', fontsize=20)
sns.countplot(data=df, x='Age Range', ax=axes[0])
axes[0].set_xlabel('Age Group', fontsize=16)
axes[0].set_ylabel('Number of Users', fontsize=16)
axes[0].tick_params(axis='x', rotation=45)
axes[0].grid(True, linestyle='--', alpha=0.3)

# Subplot 2: number of users per behavior class, split by operating system
axes[1].set_title('User Behavior Class by Operating System', fontsize=20)
sns.countplot(x='User Behavior Class', hue='Operating System', data=df, palette='deep', ax=axes[1])
axes[1].set_xlabel('User Behavior Class', fontsize=16)
axes[1].set_ylabel('Number of Users', fontsize=16)
axes[1].tick_params(axis='x', rotation=45)
axes[1].grid(True, linestyle='--', alpha=0.3)

# Subplot 3: pivot table of mean app usage time by gender and operating system
pivot_app_usage = df.pivot_table(
    index='Gender',
    columns='Operating System',
    values='App Usage Time (min/day)',
    aggfunc='mean',
    fill_value=0
)
axes[2].set_title('Average App Usage Time by Gender and Operating System', fontsize=20)
sns.heatmap(pivot_app_usage, annot=True, fmt=".1f", cmap='YlGnBu', ax=axes[2])
axes[2].set_xlabel('Operating System', fontsize=16)
axes[2].set_ylabel('Gender', fontsize=16)

# Subplot 4: pivot table of mean app usage time by device model and operating system
pivot_app_usage = df.pivot_table(
    index='Device Model',
    columns='Operating System',
    values='App Usage Time (min/day)',
    aggfunc='mean',
    fill_value=0
)
axes[3].set_title('Average App Usage Time by Device Model and Operating System', fontsize=20)
sns.heatmap(pivot_app_usage, annot=True, fmt=".1f", cmap='YlGnBu', ax=axes[3])
axes[3].set_xlabel('Operating System', fontsize=16)
axes[3].set_ylabel('Device Model', fontsize=16)
axes[3].tick_params(axis='x', rotation=45)

plt.tight_layout(pad=3)
plt.show()

IV. Conclusions

Based on the analysis of mobile device usage data, this study draws the following conclusions:

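Device model and operating system preferences across genders and age groups:
- Male users tend to choose Android devices, while female users prefer iOS.
- Younger users (18-24) lean toward devices from well-recognized brands, such as the iPhone or the Samsung Galaxy series, while somewhat older users pay more attention to value for money and functionality.
Relationship between user behavior class and device usage:
- Among higher-frequency users, screen-on time and battery drain are higher, indicating that this group places higher demands on device performance and battery life.
- Users in the extreme-use class tend to have markedly longer app usage time than the other classes; they may need more storage capacity and processing power.
Differences in usage patterns by age group and gender:
- Younger users (18-34) show markedly higher data usage, screen-on time, and app usage time than older users. Young male users in particular are the most active on gaming and social apps, while female users use lifestyle-service and social apps more frequently.
- Older users (45 and above) spend less time on apps, but show a strong interest in news and health apps.
Correlation between data usage and battery drain:
- The data show a significant positive correlation between data usage and battery drain. The relationship is even stronger among heavy users, which suggests that battery life and data plan design matter especially for high-frequency users.
Brand loyalty:
- Users in different age groups show different degrees of brand loyalty. Gen Z users (18-24) show higher loyalty to Apple devices, while millennials show some loyalty to Samsung and Xiaomi devices.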
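Statements about behavior classes and about the link between data usage and battery drain can be checked with a per-class mean and a Pearson correlation. This is a sketch under stated assumptions: the column names `Battery Drain (mAh/day)` and `Data Usage (MB/day)` are assumed (they do not appear in the code above), and the `demo` values are invented rather than taken from the dataset:

```python
import pandas as pd

# Hypothetical stand-in for df; column names for battery and data usage are
# assumptions, and all values are invented for illustration.
demo = pd.DataFrame({
    'User Behavior Class': [1, 1, 3, 3, 5, 5],
    'App Usage Time (min/day)': [40, 60, 250, 290, 520, 580],
    'Battery Drain (mAh/day)': [400, 500, 1500, 1700, 2700, 2900],
    'Data Usage (MB/day)': [150, 250, 800, 900, 1900, 2300],
})

# Mean of every metric within each behavior class.
class_means = demo.groupby('User Behavior Class').mean()

# Pearson correlation between data usage and battery drain.
corr = demo['Data Usage (MB/day)'].corr(demo['Battery Drain (mAh/day)'])

print(class_means)
print(round(corr, 3))
```

Running the correlation separately on the rows of the higher classes would show whether the relationship really is stronger among heavy users.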

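In summary, this study uses data analysis to show the usage preferences and behavior patterns of mobile device users. It reveals differences between groups in how they use mobile devices and provides a reference for app development, operations management, and device design.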

From: https://blog.csdn.net/m0_62638421/article/details/143315628
