SciTech-Mathematics-Probability+Statistics-Descriptive stats + percentiles in numpy and scipy.stats

Time: 2024-07-19 21:08:09 | Views: 15
Tags: arr Statistics stats Descriptive df scipy percentile numpy

Descriptive Stats + percentiles in numpy and scipy.stats

https://dev.to/sayemmh/descriptive-stats-percentiles-in-numpy-and-scipystats-59a7

DEV Community
Sayem Hoque, Posted on Oct 13, 2022 • Updated on Nov 16, 2022

To get the measures of central tendency in a pandas DataFrame, we can use the built-in methods to calculate the mean, median, and mode:

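import pandas as pd
import numpy as np

# Load the data
df = pd.read_csv("data.csv")

# Central tendency of each numeric column
df.mean(numeric_only=True)
df.median(numeric_only=True)
df.mode()  # can return several rows when a column has more than one mode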
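
For a version that does not depend on data.csv, here is a minimal sketch on a small made-up DataFrame. The column names height and weight and their values are invented for illustration and are not from the original post.

```python
import pandas as pd

# Made-up data: column names and values are assumptions for illustration.
df = pd.DataFrame({
    "height": [150, 160, 160, 170, 185],
    "weight": [50, 60, 60, 70, 90],
})

means = df.mean()      # arithmetic mean of each column
medians = df.median()  # middle value of each sorted column
modes = df.mode()      # most frequent value(s); one row per mode

print(means["height"], medians["height"], modes["height"].iloc[0])
# 165.0 160.0 160
```

Note that df.mode() returns a DataFrame rather than a Series, because a column can have more than one mode.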

To measure dispersion and shape, we can use built-in functions to calculate the standard deviation, variance, interquartile range (IQR), and skewness.

A low standard deviation means the data are bunched closely around the mean; a high one means they are spread out. The IQR is the difference between the 75th and 25th percentiles, and scipy.stats.iqr computes it. Skew measures how asymmetric a distribution is about its mean. A perfectly symmetric, unimodal distribution has equal mean, median, and mode, and a skewness of zero.
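
As a rough illustration of these measures, the sketch below uses a small made-up Series (the values are invented for this example). One detail worth knowing is that pandas computes the sample standard deviation (ddof=1) by default, so ddof=0 is passed explicitly to get the population value.

```python
import pandas as pd
from scipy.stats import iqr

# Made-up values, chosen so the population standard deviation is exactly 2.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()            # pandas default: ddof=1
population_std = values.std(ddof=0)  # divides by n instead of n - 1
variance = values.var()              # sample variance (ddof=1)
iqr_value = iqr(values)              # 75th percentile minus 25th percentile
skewness = values.skew()             # positive: the longer tail is on the right

print(sample_std, population_std, variance, iqr_value, skewness)
```

Here the 25th percentile is 4.0 and the 75th is 5.5, so the IQR is 1.5. The large values 7 and 9 pull the right tail out, which is why the skewness is positive.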

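from scipy.stats import iqr

df.std(numeric_only=True)   # sample standard deviation (ddof=1)
df.var(numeric_only=True)   # sample variance
iqr(df['column1'])          # returns nan if the column has NaN; pass nan_policy='omit' to skip them
df.skew(numeric_only=True)

scipy.stats can also report the percentile rank of a given value within a list of values:

from scipy import stats

stats.percentileofscore([1, 2, 3, 4], 3)
>>> 75.0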

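The percentileofscore function returns the percentage of values in a distribution that are at or below the target. Here three values, [1, 2, 3], are <= 3, so 3/4 = 75% of the values are at or below it. With the default kind='rank', the percentage rankings of all values equal to the target are averaged, so the result can differ from this simple count when the target appears more than once.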
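
To see how ties change the answer, the sketch below compares the four values accepted by the kind parameter of percentileofscore on a list where the target appears twice. The list itself is made up for illustration.

```python
from scipy import stats

scores = [1, 2, 3, 3, 4]  # the target value 3 appears twice

results = {}
for kind in ("rank", "weak", "strict", "mean"):
    # weak counts values <= target, strict counts values < target,
    # mean averages weak and strict, rank averages the rankings of the ties.
    results[kind] = stats.percentileofscore(scores, 3, kind=kind)
    print(kind, results[kind])
# rank 70.0
# weak 80.0
# strict 40.0
# mean 60.0
```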

numpy.percentile is not quite the inverse of stats.percentileofscore. numpy.percentile takes a parameter q and returns the q-th percentile of an array. Conceptually, it sorts the array and locates the fractional position (n - 1) * q / 100 between the first element (index 0) and the last (index n - 1). When that position falls between two elements, the result is interpolated from those two neighbors. The method parameter (named interpolation before NumPy 1.22) selects how the interpolation is done. The default, 'linear', weights the two neighbors by the fractional part of the position, so it equals their plain average only when the position lies exactly halfway between them.
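
A quick way to see that the two functions are not inverses is to round-trip a value. This is a small sketch that reuses the list [1, 2, 3, 4] from the earlier example; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

arr = [1, 2, 3, 4]

rank = stats.percentileofscore(arr, 3)  # 75.0: the percentile rank of 3
value = np.percentile(arr, rank)        # the value at the 75th percentile

print(rank, value)
# The round trip does not return 3: position (4 - 1) * 0.75 = 2.25 falls
# between the elements 3 and 4, so linear interpolation gives 3.25.
```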

Example:

arr = [0,1,2,3,4,5,6,7,8,9,10]
print("50th percentile of arr : ",
       np.percentile(arr, 50))
print("25th percentile of arr : ",
       np.percentile(arr, 25))
print("75th percentile of arr : ",
       np.percentile(arr, 75))

>>> 50th percentile of arr :  5.0
>>> 25th percentile of arr :  2.5
>>> 75th percentile of arr :  7.5
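
The default linear method can be reproduced by hand: sort the data, find the fractional position (n - 1) * q / 100, and interpolate between the two neighboring elements. The sketch below assumes that description of the default; linear_percentile is a hypothetical helper written for this example and not part of NumPy, and the data are made up. The method keyword requires NumPy 1.22 or later.

```python
import math
import numpy as np

def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo                     # how far pos lies from lo toward hi
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

data = [10, 20, 40, 80]  # made-up, unevenly spaced values

for q in (25, 40, 50):
    print(q, linear_percentile(data, q), np.percentile(data, q))

# Other methods pick one of the two neighbors instead of interpolating.
lower = np.percentile(data, 40, method="lower")    # neighbor below the position
higher = np.percentile(data, 40, method="higher")  # neighbor above the position
print(lower, higher)
```

For q = 25 the position is 0.75, giving 10 + 0.75 * (20 - 10) = 17.5, which is not the plain average of 10 and 20. Only at q = 50, where the position is 1.5, does the result (30.0) equal the average of the two neighbors.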

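Now, using scipy.stats, we can compute the percentile at which a particular value sits within a distribution of values. In this example, we compute the percentile score of cur, the last value in the column ep_30 of a DataFrame named features, relative to the non-null values in that column.

# Keep only the non-null values of the column
non_nan = features['ep_30'].dropna()
# Use positional access: [-1] is a label lookup and fails on a default integer index
cur = features['ep_30'].iloc[-1]

# kind='mean' averages the 'weak' (<=) and 'strict' (<) percentages
print(f'''Cur is at the {round(stats.percentileofscore(non_nan, cur, kind='mean'), 2)}th percentile of the distribution.''')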

>>> Cur is at the 7.27th percentile of the distribution.
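
Since the features DataFrame is not shown in the post, here is a self-contained sketch of the same idea. The ep_30 values below are made up for illustration and will not reproduce the 7.27 figure.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up stand-in for the author's DataFrame; the values are assumptions.
features = pd.DataFrame({"ep_30": [0.5, np.nan, 1.5, 2.0, np.nan, 3.0, 1.0]})

non_nan = features["ep_30"].dropna()  # drop missing values before ranking
cur = features["ep_30"].iloc[-1]      # last value by position: 1.0

# One of the 5 non-null values is below 1.0 and two are at or below it,
# so kind="mean" averages 20% and 40%.
pct = stats.percentileofscore(non_nan, cur, kind="mean")
print(f"cur = {cur} is at the {round(pct, 2)}th percentile")
```

With these values the result is the 30th percentile. If the last row of the column were itself NaN, cur would need to be taken from non_nan instead.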

From: https://www.cnblogs.com/abaelhe/p/18312362
