
Descriptive Analysis of Data

Date: 2022-10-12 22:00:56

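Commonly used statistical functions:

  • Counting
    value_counts: one-dimensional frequency table
    crosstab: two-dimensional contingency table
    pivot_table: multi-dimensional pivot table
  • Measuring
    mean: arithmetic mean
    median: median
    quantile: quantiles
    std: standard deviation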
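
The three counting functions listed above can be tried on a small table. The following is a minimal sketch, assuming a hypothetical DataFrame `df` whose columns `sex`, `region` and `income` are invented for illustration and are not part of BSdata:

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F", "M"],
    "region": ["East", "East", "West", "West", "East", "West"],
    "income": [10, 20, 30, 40, 50, 60],
})

# One-dimensional frequency table
freq = df["sex"].value_counts()

# Two-dimensional contingency table: counts of sex by region
ctab = pd.crosstab(df["sex"], df["region"])

# Pivot table: mean income for each sex/region combination
pivot = df.pivot_table(values="income", index="sex", columns="region", aggfunc="mean")

print(freq)
print(ctab)
print(pivot)
```

`crosstab` counts rows by default, while `pivot_table` aggregates a value column, so the two differ mainly in what fills the cells.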
import pandas as pd
BSdata = pd.read_excel('data/BSdata.xlsx', sheet_name='Sheet1')  # read the data
BSdata
    Region/Country/Area  Unnamed: 1                     Year  Series                                                 Value  Footnotes                                       Source
 0                    1  Total, all countries or areas  2010  Population mid-year estimates (millions)             6956.82  NaN                                             United Nations Population Division, New York, ...
 1                    1  Total, all countries or areas  2010  Population mid-year estimates for males (milli...    3507.70  NaN                                             United Nations Population Division, New York, ...
 2                    1  Total, all countries or areas  2010  Population mid-year estimates for females (mil...    3449.12  NaN                                             United Nations Population Division, New York, ...
 3                    1  Total, all countries or areas  2010  Sex ratio (males per 100 females)                     101.70  NaN                                             United Nations Population Division, New York, ...
 4                    1  Total, all countries or areas  2010  Population aged 0 to 14 years old (percentage)         27.00  NaN                                             United Nations Population Division, New York, ...
 5                    1  Total, all countries or areas  2010  Population aged 60+ years old (percentage)             11.00  NaN                                             United Nations Population Division, New York, ...
 6                    1  Total, all countries or areas  2010  Population density                                     53.50  NaN                                             United Nations Population Division, New York, ...
 7                    1  Total, all countries or areas  2015  Population mid-year estimates (millions)             7379.80  NaN                                             United Nations Population Division, New York, ...
 8                    1  Total, all countries or areas  2015  Population mid-year estimates for males (milli...    3720.70  NaN                                             United Nations Population Division, New York, ...
 9                    1  Total, all countries or areas  2015  Population mid-year estimates for females (mil...    3659.10  NaN                                             United Nations Population Division, New York, ...
10                    1  Total, all countries or areas  2015  Sex ratio (males per 100 females)                     101.70  NaN                                             United Nations Population Division, New York, ...
11                    1  Total, all countries or areas  2015  Population aged 0 to 14 years old (percentage)         26.20  NaN                                             United Nations Population Division, New York, ...
12                    1  Total, all countries or areas  2015  Population aged 60+ years old (percentage)             12.20  NaN                                             United Nations Population Division, New York, ...
13                    1  Total, all countries or areas  2015  Population density                                     56.70  NaN                                             United Nations Population Division, New York, ...
14                    1  Total, all countries or areas  2015  Surface area (thousand km2)                        136162.00  NaN                                             United Nations Statistics Division, New York, ...
15                    1  Total, all countries or areas  2019  Population mid-year estimates (millions)             7713.47  NaN                                             United Nations Population Division, New York, ...
16                    1  Total, all countries or areas  2019  Population mid-year estimates for males (milli...    3889.03  NaN                                             United Nations Population Division, New York, ...
17                    1  Total, all countries or areas  2019  Population mid-year estimates for females (mil...    3824.43  NaN                                             United Nations Population Division, New York, ...
18                    1  Total, all countries or areas  2019  Sex ratio (males per 100 females)                     101.70  NaN                                             United Nations Population Division, New York, ...
19                    1  Total, all countries or areas  2019  Population aged 0 to 14 years old (percentage)         25.60  NaN                                             United Nations Population Division, New York, ...
20                    1  Total, all countries or areas  2019  Population aged 60+ years old (percentage)             13.20  NaN                                             United Nations Population Division, New York, ...
21                    1  Total, all countries or areas  2019  Population density                                     59.30  NaN                                             United Nations Population Division, New York, ...
22                    1  Total, all countries or areas  2019  Surface area (thousand km2)                        130094.00  NaN                                             United Nations Statistics Division, New York, ...
23                    1  Total, all countries or areas  2021  Population mid-year estimates (millions)             7874.97  Projected estimate (medium fertility variant).  United Nations Population Division, New York, ...
24                    1  Total, all countries or areas  2021  Population mid-year estimates for males (milli...    3970.24  Projected estimate (medium fertility variant).  United Nations Population Division, New York, ...

1 Summarizing count (categorical) data

# [1] Frequency: absolute counts
T1 = BSdata['Year'].value_counts()
T1
2015    8
2019    8
2010    7
2021    2
Name: Year, dtype: int64
# [2] Relative frequency: share of each value, in percent
T1 / T1.sum() * 100
2015    32.0
2019    32.0
2010    28.0
2021     8.0
Name: Year, dtype: float64
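
The relative frequencies can also be obtained in one step with the `normalize` parameter of `value_counts`. A small sketch, assuming a stand-in Series `years` that reproduces the Year counts shown above (7, 8, 8 and 2 rows) rather than reading the Excel file:

```python
import pandas as pd

# Stand-in for BSdata['Year'], rebuilt from the counts shown above
years = pd.Series([2010] * 7 + [2015] * 8 + [2019] * 8 + [2021] * 2, name="Year")

# normalize=True returns proportions instead of counts
pct = years.value_counts(normalize=True) * 100
print(pct.round(1))
```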

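2 Summarizing measurement (numerical) data

  • Central tendency: mean, median, mode
  • Dispersion: variance, standard deviation, coefficient of variation
# Measures of central tendency
# Mean (arithmetic average)
X = BSdata['Value']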
X.mean()
12911.647199999998
# Median
X.median()
3449.12

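If the mean and the median are close, the distribution is roughly symmetric. That is consistent with a normal distribution but does not by itself establish normality. Here the mean (12911.65) is far larger than the median (3449.12), which points to a strongly right-skewed distribution.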
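
To see how a single large value pulls the mean away from the median, here is a minimal sketch with two invented Series (`symmetric` and `skewed` are hypothetical, not taken from BSdata):

```python
import pandas as pd

symmetric = pd.Series([1, 2, 3, 4, 5])
skewed = pd.Series([1, 2, 3, 4, 100])  # one large value in the right tail

for name, s in [("symmetric", symmetric), ("skewed", skewed)]:
    # The median ignores how far the extreme value lies; the mean does not
    print(name, "mean =", s.mean(), "median =", s.median())
```

Both Series share the same median, but only the symmetric one has a mean equal to it.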

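# Measures of dispersion
# Range
X.max() - X.min()  # simple, but determined entirely by the two extreme values
136151.0
# Variance: sum of squared deviations from the mean, divided by n-1
X.var()  # unbiased estimate (pandas defaults to ddof=1, i.e. divides by n-1)
1317422274.184596
# Standard deviation: square root of the variance
X.std()
36296.31212925903
# Interquartile range (IQR)
X.quantile(0.75) - X.quantile(0.25)
3916.74
# Shape of the distribution
# Skewness: based on the third central moment; pandas returns a bias-corrected estimate
X.skew()
3.267375071429257
# Kurtosis: based on the fourth central moment; pandas returns bias-corrected excess kurtosis (0 for a normal distribution)
X.kurt()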
9.528076655103652
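
The mode and the coefficient of variation are listed among the measures above but not computed. A minimal sketch, assuming a small invented Series `x`; it also checks by hand that `var()` divides by n-1:

```python
import pandas as pd

x = pd.Series([2, 4, 4, 6, 9])  # hypothetical values, for illustration only

# Sample variance by hand: squared deviations summed and divided by n-1
manual_var = ((x - x.mean()) ** 2).sum() / (len(x) - 1)

mode = x.mode()          # a Series, since several values can tie for the mode
cv = x.std() / x.mean()  # coefficient of variation: sample std relative to the mean

print(manual_var, x.var())
print(mode.tolist(), round(cv, 4))
```

The coefficient of variation is unit-free, which makes it easier to compare the spread of variables measured on different scales.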

3 Summary statistics

By default, describe() computes the basic statistics of the numeric columns only.

BSdata.describe()
       Region/Country/Area         Year          Value
count                 25.0    25.000000      25.000000
mean                   1.0  2015.360000   12911.647200
std                    0.0     3.935734   36296.312129
min                    1.0  2010.000000      11.000000
25%                    1.0  2010.000000      53.500000
50%                    1.0  2015.000000    3449.120000
75%                    1.0  2019.000000    3970.240000
max                    1.0  2021.000000  136162.000000
BSdata[['Unnamed: 1', 'Series', 'Footnotes', 'Source']].describe()  # summary of the count (text) columns
        Unnamed: 1                     Series                                    Footnotes                                       Source
count   25                             25                                        2                                               25
unique  1                              8                                         1                                               3
top     Total, all countries or areas  Population mid-year estimates (millions)  Projected estimate (medium fertility variant).  United Nations Population Division, New York, ...
freq    25                             4                                         2                                               14
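
Both kinds of summary can be requested in a single call with `include="all"`. A minimal sketch on a hypothetical three-row DataFrame `df` (the values are illustrative, not the full BSdata):

```python
import pandas as pd

# Hypothetical miniature table with one numeric and one text column
df = pd.DataFrame({
    "Year": [2010, 2010, 2015],
    "Series": ["Population density", "Sex ratio", "Population density"],
})

num_summary = df.describe()               # numeric columns only (the default)
all_summary = df.describe(include="all")  # numeric and text columns together

print(num_summary)
print(all_summary)
```

In the combined table, statistics that do not apply to a column (for example `mean` for text or `unique` for numbers) show up as NaN.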

A custom function for computing the basic statistics

def stats(x):
    """Return basic descriptive statistics of a numeric Series."""
    stat = [x.count(), x.min(), x.quantile(.25), x.mean(), x.median(),
            x.quantile(.75), x.max(), x.max() - x.min(),
            x.var(), x.std(), x.skew(), x.kurt()]
    index = ['Count', 'Min', 'Q1(25%)', 'Mean', 'Median', 'Q3(75%)',
             'Max', 'Range', 'Var', 'Std', 'Skew', 'Kurt']
    return pd.Series(stat, index=index)
stats(BSdata.Year)
Count        25.000000
Min        2010.000000
Q1(25%)    2010.000000
Mean       2015.360000
Median     2015.000000
Q3(75%)    2019.000000
Max        2021.000000
Range        11.000000
Var          15.490000
Std           3.935734
Skew         -0.247878
Kurt         -1.361406
dtype: float64
stats(BSdata.Value)
Count      2.500000e+01
Min        1.100000e+01
Q1(25%)    5.350000e+01
Mean       1.291165e+04
Median     3.449120e+03
Q3(75%)    3.970240e+03
Max        1.361620e+05
Range      1.361510e+05
Var        1.317422e+09
Std        3.629631e+04
Skew       3.267375e+00
Kurt       9.528077e+00
dtype: float64
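
The Value column mixes series measured in different units (millions of people, percentages, thousand km2), so it may be more informative to summarize each series separately. A possible sketch using `groupby`, with a handful of rows copied from the listing above into a stand-in DataFrame `df`:

```python
import pandas as pd

# A few rows copied from the BSdata listing above
df = pd.DataFrame({
    "Year": [2010, 2015, 2019, 2010, 2015, 2019],
    "Series": ["Population density"] * 3
              + ["Sex ratio (males per 100 females)"] * 3,
    "Value": [53.5, 56.7, 59.3, 101.7, 101.7, 101.7],
})

# One row of statistics per series instead of one for the whole column
by_series = df.groupby("Series")["Value"].agg(["count", "min", "mean", "max", "std"])
print(by_series)
```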

From: https://www.cnblogs.com/luna2333/p/16786258.html
