3.2 - Basic DataFrame Operations

Posted: 2023-10-18 15:37:55

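Data overview

  • Head and tail rows
  • Index and column names
  • Viewing the values
  • Viewing summary statistics

Querying data

  • Column data
  • Row data
  • Row and column slicing
  • Filtering by value
  • Filtering by condition (boolean mask)

Other

  • Transpose
  • Sorting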
  In [ ]:
import pandas as pd
import numpy as np
  In [ ]:
# Create a DataFrame: timestamped price data (random values here)
dates = pd.date_range("20210101",periods=30,freq="M")
dates
  Out[ ]:
DatetimeIndex(['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
               '2021-05-31', '2021-06-30', '2021-07-31', '2021-08-31',
               '2021-09-30', '2021-10-31', '2021-11-30', '2021-12-31',
               '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
               '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
               '2022-09-30', '2022-10-31', '2022-11-30', '2022-12-31',
               '2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
               '2023-05-31', '2023-06-30'],
              dtype='datetime64[ns]', freq='M')
  In [ ]:
data = pd.DataFrame(np.random.randn(30,3),columns=list('ABC'),index=dates)
data
  Out[ ]:  
 A B C
2021-01-31 -1.005821 -0.747159 -0.590444
2021-02-28 0.106087 -0.611014 -2.492806
2021-03-31 0.923487 -1.901083 -1.139865
2021-04-30 0.045023 -0.501125 0.834619
2021-05-31 -0.015439 -0.328349 0.905197
2021-06-30 0.366951 -0.421883 1.579878
2021-07-31 1.337484 1.290041 -0.466970
2021-08-31 -0.373738 -0.220213 -0.529416
2021-09-30 0.740679 -0.795566 -0.392513
2021-10-31 -0.759147 0.166461 2.225352
2021-11-30 0.120085 -0.969381 0.050001
2021-12-31 -1.328895 0.311472 0.237954
2022-01-31 0.211936 0.477653 -0.097692
2022-02-28 0.135520 0.445589 1.909404
2022-03-31 0.876071 1.117198 0.629551
2022-04-30 0.863037 -1.707017 0.470066
2022-05-31 -0.979964 0.257285 0.898436
2022-06-30 -1.423223 0.259646 -0.650481
2022-07-31 1.580251 -0.314205 0.639193
2022-08-31 1.954733 -1.515528 0.143653
2022-09-30 -0.722134 0.845884 -0.299418
2022-10-31 -0.448377 -1.045969 0.244326
2022-11-30 -0.092980 -1.089742 0.561777
2022-12-31 2.820850 -0.080729 0.770422
2023-01-31 -1.482163 0.365914 1.351397
2023-02-28 -0.364066 -0.182885 -0.922139
2023-03-31 -0.589401 0.592518 -0.119778
2023-04-30 0.705069 0.808626 2.058423
2023-05-31 0.659801 1.853893 1.030405
2023-06-30 0.363107 -0.512096 0.169748
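Because `np.random.randn` is unseeded, rerunning the cell above yields different numbers each time. The following is a minimal sketch of a reproducible variant. The helper name `make_data` and the seed value 42 are illustrative assumptions, not part of the original notebook. It also passes a `pd.offsets.MonthEnd()` object instead of the `"M"` string alias, since recent pandas releases deprecate that alias in favor of `"ME"`.

```python
import numpy as np
import pandas as pd


def make_data(seed: int) -> pd.DataFrame:
    """Build a 30x3 frame of standard-normal values on a month-end index."""
    # A seeded Generator makes the random values repeatable.
    rng = np.random.default_rng(seed)
    # MonthEnd() produces the same month-end dates as freq="M".
    dates = pd.date_range("2021-01-31", periods=30, freq=pd.offsets.MonthEnd())
    return pd.DataFrame(
        rng.standard_normal((30, 3)), columns=list("ABC"), index=dates
    )


data = make_data(42)          # 42 is an arbitrary seed
same_again = make_data(42)    # identical values, because the seed matches
print(data.shape)
print(data.equals(same_again))
```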
  In [ ]:
# First rows (5 by default)
data.head()
  Out[ ]:  
 A B C
2021-01-31 -1.005821 -0.747159 -0.590444
2021-02-28 0.106087 -0.611014 -2.492806
2021-03-31 0.923487 -1.901083 -1.139865
2021-04-30 0.045023 -0.501125 0.834619
2021-05-31 -0.015439 -0.328349 0.905197
  In [ ]:
# First 3 rows
data.head(3)
  Out[ ]:  
 A B C
2021-01-31 -1.005821 -0.747159 -0.590444
2021-02-28 0.106087 -0.611014 -2.492806
2021-03-31 0.923487 -1.901083 -1.139865
  In [ ]:
# Last rows (5 by default)
data.tail()
  Out[ ]:  
 A B C
2023-02-28 -0.364066 -0.182885 -0.922139
2023-03-31 -0.589401 0.592518 -0.119778
2023-04-30 0.705069 0.808626 2.058423
2023-05-31 0.659801 1.853893 1.030405
2023-06-30 0.363107 -0.512096 0.169748
  In [ ]:
# Last 3 rows
data.tail(3)
  Out[ ]:  
 A B C
2023-04-30 0.705069 0.808626 2.058423
2023-05-31 0.659801 1.853893 1.030405
2023-06-30 0.363107 -0.512096 0.169748
  In [ ]:
# Index
data.index
  Out[ ]:
DatetimeIndex(['2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30',
               '2021-05-31', '2021-06-30', '2021-07-31', '2021-08-31',
               '2021-09-30', '2021-10-31', '2021-11-30', '2021-12-31',
               '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
               '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
               '2022-09-30', '2022-10-31', '2022-11-30', '2022-12-31',
               '2023-01-31', '2023-02-28', '2023-03-31', '2023-04-30',
               '2023-05-31', '2023-06-30'],
              dtype='datetime64[ns]', freq='M')
  In [ ]:
# Column names
data.columns
  Out[ ]:
Index(['A', 'B', 'C'], dtype='object')
  In [ ]:
# View the values (as a NumPy array)
data.values
  Out[ ]:
array([[-1.00582066, -0.74715913, -0.59044421],
       [ 0.10608719, -0.61101411, -2.49280605],
       [ 0.92348709, -1.901083  , -1.13986527],
       [ 0.04502284, -0.50112509,  0.83461856],
       [-0.01543869, -0.32834889,  0.90519706],
       [ 0.36695114, -0.42188302,  1.57987849],
       [ 1.33748358,  1.29004108, -0.46697028],
       [-0.37373787, -0.22021339, -0.52941598],
       [ 0.74067882, -0.79556616, -0.39251251],
       [-0.75914684,  0.16646147,  2.22535186],
       [ 0.12008512, -0.96938058,  0.05000075],
       [-1.32889545,  0.31147169,  0.23795395],
       [ 0.2119362 ,  0.47765278, -0.0976922 ],
       [ 0.13551963,  0.44558949,  1.90940387],
       [ 0.87607052,  1.11719757,  0.62955101],
       [ 0.8630371 , -1.70701661,  0.47006564],
       [-0.97996414,  0.25728477,  0.89843618],
       [-1.42322307,  0.25964647, -0.65048082],
       [ 1.580251  , -0.3142048 ,  0.63919311],
       [ 1.95473317, -1.51552846,  0.1436534 ],
       [-0.7221338 ,  0.84588397, -0.29941785],
       [-0.44837652, -1.04596934,  0.24432642],
       [-0.0929797 , -1.08974158,  0.56177711],
       [ 2.82084974, -0.08072931,  0.77042241],
       [-1.48216309,  0.36591366,  1.35139741],
       [-0.3640665 , -0.18288453, -0.92213874],
       [-0.58940142,  0.59251817, -0.11977769],
       [ 0.70506868,  0.80862606,  2.05842293],
       [ 0.65980106,  1.85389319,  1.03040533],
       [ 0.36310659, -0.51209611,  0.16974761]])
  In [ ]:
data.to_numpy()
  Out[ ]:
array([[-1.00582066, -0.74715913, -0.59044421],
       [ 0.10608719, -0.61101411, -2.49280605],
       [ 0.92348709, -1.901083  , -1.13986527],
       [ 0.04502284, -0.50112509,  0.83461856],
       [-0.01543869, -0.32834889,  0.90519706],
       [ 0.36695114, -0.42188302,  1.57987849],
       [ 1.33748358,  1.29004108, -0.46697028],
       [-0.37373787, -0.22021339, -0.52941598],
       [ 0.74067882, -0.79556616, -0.39251251],
       [-0.75914684,  0.16646147,  2.22535186],
       [ 0.12008512, -0.96938058,  0.05000075],
       [-1.32889545,  0.31147169,  0.23795395],
       [ 0.2119362 ,  0.47765278, -0.0976922 ],
       [ 0.13551963,  0.44558949,  1.90940387],
       [ 0.87607052,  1.11719757,  0.62955101],
       [ 0.8630371 , -1.70701661,  0.47006564],
       [-0.97996414,  0.25728477,  0.89843618],
       [-1.42322307,  0.25964647, -0.65048082],
       [ 1.580251  , -0.3142048 ,  0.63919311],
       [ 1.95473317, -1.51552846,  0.1436534 ],
       [-0.7221338 ,  0.84588397, -0.29941785],
       [-0.44837652, -1.04596934,  0.24432642],
       [-0.0929797 , -1.08974158,  0.56177711],
       [ 2.82084974, -0.08072931,  0.77042241],
       [-1.48216309,  0.36591366,  1.35139741],
       [-0.3640665 , -0.18288453, -0.92213874],
       [-0.58940142,  0.59251817, -0.11977769],
       [ 0.70506868,  0.80862606,  2.05842293],
       [ 0.65980106,  1.85389319,  1.03040533],
       [ 0.36310659, -0.51209611,  0.16974761]])
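The two cells above return the same array for an all-float frame. As a small sketch on made-up data (the frames `df` and `mixed` are illustrative, not from the notebook), the example below checks that equivalence and shows what happens when the columns do not share a numeric type: the result falls back to `object` dtype.

```python
import numpy as np
import pandas as pd

# All-float frame: .values and .to_numpy() give the same float64 array.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
arr = df.to_numpy()
print(np.array_equal(arr, df.values))
print(arr.dtype)

# Mixed integer and string columns: the common dtype is object.
mixed = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
mixed_arr = mixed.to_numpy()
print(mixed_arr.dtype)
```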
  In [ ]:
# View summary statistics
data.describe()
  Out[ ]:  
 A B C
count 30.000000 30.000000 30.000000
mean 0.140827 -0.138392 0.300276
std 1.009620 0.888840 1.014414
min -1.482163 -1.901083 -2.492806
25% -0.554145 -0.713123 -0.369239
50% 0.113086 -0.201549 0.241140
75% 0.731776 0.425671 0.882482
max 2.820850 1.853893 2.225352
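To make the rows of the summary concrete, here is a hedged sketch that recomputes a few of them by hand on a tiny made-up column (the frame `df` is an illustrative assumption). It relies on two defaults: `std` is the sample standard deviation (`ddof=1`), and the percentiles use linear interpolation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})
summary = df.describe()

# Recompute individual rows of the summary from the column itself.
mean_a = df["A"].mean()
std_a = df["A"].std(ddof=1)      # sample standard deviation
q25_a = df["A"].quantile(0.25)   # linear interpolation between 1.0 and 2.0

print(summary)
print(mean_a, std_a, q25_a)
```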
  In [ ]:
# Column data
data[['A','B']]
  Out[ ]:  
 A B
2021-01-31 -1.005821 -0.747159
2021-02-28 0.106087 -0.611014
2021-03-31 0.923487 -1.901083
2021-04-30 0.045023 -0.501125
2021-05-31 -0.015439 -0.328349
2021-06-30 0.366951 -0.421883
2021-07-31 1.337484 1.290041
2021-08-31 -0.373738 -0.220213
2021-09-30 0.740679 -0.795566
2021-10-31 -0.759147 0.166461
2021-11-30 0.120085 -0.969381
2021-12-31 -1.328895 0.311472
2022-01-31 0.211936 0.477653
2022-02-28 0.135520 0.445589
2022-03-31 0.876071 1.117198
2022-04-30 0.863037 -1.707017
2022-05-31 -0.979964 0.257285
2022-06-30 -1.423223 0.259646
2022-07-31 1.580251 -0.314205
2022-08-31 1.954733 -1.515528
2022-09-30 -0.722134 0.845884
2022-10-31 -0.448377 -1.045969
2022-11-30 -0.092980 -1.089742
2022-12-31 2.820850 -0.080729
2023-01-31 -1.482163 0.365914
2023-02-28 -0.364066 -0.182885
2023-03-31 -0.589401 0.592518
2023-04-30 0.705069 0.808626
2023-05-31 0.659801 1.853893
2023-06-30 0.363107 -0.512096
  In [ ]:
# Row data (first 10 rows, by position)
data.iloc[0:10]
  Out[ ]:  
 A B C
2021-01-31 -1.005821 -0.747159 -0.590444
2021-02-28 0.106087 -0.611014 -2.492806
2021-03-31 0.923487 -1.901083 -1.139865
2021-04-30 0.045023 -0.501125 0.834619
2021-05-31 -0.015439 -0.328349 0.905197
2021-06-30 0.366951 -0.421883 1.579878
2021-07-31 1.337484 1.290041 -0.466970
2021-08-31 -0.373738 -0.220213 -0.529416
2021-09-30 0.740679 -0.795566 -0.392513
2021-10-31 -0.759147 0.166461 2.225352
  In [ ]:
# Row and column slicing
data.loc['20210101':'20220101','A':'B']   # rows dated 2021-01-01 through 2022-01-01, columns A through B
  Out[ ]:  
 A B
2021-01-31 -1.005821 -0.747159
2021-02-28 0.106087 -0.611014
2021-03-31 0.923487 -1.901083
2021-04-30 0.045023 -0.501125
2021-05-31 -0.015439 -0.328349
2021-06-30 0.366951 -0.421883
2021-07-31 1.337484 1.290041
2021-08-31 -0.373738 -0.220213
2021-09-30 0.740679 -0.795566
2021-10-31 -0.759147 0.166461
2021-11-30 0.120085 -0.969381
2021-12-31 -1.328895 0.311472
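The slice above stops at 2021-12-31 because `'20220101'` is only a bound, not a label in the index. The sketch below, on a small made-up frame, illustrates two points: label slices with `.loc` include both endpoints, whereas position slices with `.iloc` exclude the stop, and date bounds on a sorted `DatetimeIndex` need not be present in the index.

```python
import pandas as pd

dates = pd.date_range("2021-01-31", periods=6, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(
    {"A": list(range(6)), "B": list(range(10, 16))}, index=dates
)

# Label-based: both ends are included, for rows and for columns.
by_label = df.loc["2021-01-31":"2021-03-31", "A":"B"]

# Position-based: the stop position is excluded, as in plain Python slices.
by_pos = df.iloc[0:3, 0:2]

# Bounds that are not index labels still work on a sorted DatetimeIndex.
between = df.loc["2021-02-01":"2021-04-15"]

print(by_label)
print(between)
```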
  In [ ]:
# Filtering by value
## Round to 2 decimal places
data = data.round(2)
## The row(s) where column A equals 0.74
data[data['A']==0.74]
## Column A of the row(s) where column A equals 0.74
data[data['A']==0.74]['A']
  Out[ ]:
2021-09-30    0.74
Freq: M, Name: A, dtype: float64
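Exact `==` works above only because the data was rounded first. For unrounded floats, a tolerance-based comparison is usually safer. The following sketch uses made-up values and `np.isclose` with its default tolerances, and selects the column with a single `.loc[mask, column]` call rather than two chained brackets.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.1 + 0.2, 0.74, 1.5], "B": [1.0, 2.0, 3.0]})

# 0.1 + 0.2 is stored as 0.30000000000000004, so exact equality misses it.
exact = df[df["A"] == 0.3]

# np.isclose compares within a small tolerance and returns a boolean array.
mask = np.isclose(df["A"], 0.3)
close = df.loc[mask, "A"]   # rows and column selected in one step

print(len(exact))
print(close)
```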
  In [ ]:
# Filtering by condition (boolean mask)
data[data['A']>0.5]
  Out[ ]:  
 A B C
2021-03-31 0.92 -1.90 -1.14
2021-07-31 1.34 1.29 -0.47
2021-09-30 0.74 -0.80 -0.39
2022-03-31 0.88 1.12 0.63
2022-04-30 0.86 -1.71 0.47
2022-07-31 1.58 -0.31 0.64
2022-08-31 1.95 -1.52 0.14
2022-12-31 2.82 -0.08 0.77
2023-04-30 0.71 0.81 2.06
2023-05-31 0.66 1.85 1.03
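Several conditions can be combined in one mask. This is a small sketch on made-up data: each condition needs its own parentheses, and the element-wise operators `&`, `|` and `~` are used instead of Python's `and`, `or` and `not`.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0.9, -0.2, 1.3, 0.6], "B": [-1.0, 0.4, 1.2, -0.3]}
)

# Rows where both conditions hold.
both = df[(df["A"] > 0.5) & (df["B"] > 0)]

# Rows where at least one condition holds.
either = df[(df["A"] > 0.5) | (df["B"] > 0)]

# Rows where the condition does not hold.
not_a = df[~(df["A"] > 0.5)]

print(both)
print(either)
print(not_a)
```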
  In [ ]:
data[data>0.5]   # every value that fails the condition becomes NaN
  Out[ ]:  
 A B C
2021-01-31 NaN NaN NaN
2021-02-28 NaN NaN NaN
2021-03-31 0.92 NaN NaN
2021-04-30 NaN NaN 0.83
2021-05-31 NaN NaN 0.91
2021-06-30 NaN NaN 1.58
2021-07-31 1.34 1.29 NaN
2021-08-31 NaN NaN NaN
2021-09-30 0.74 NaN NaN
2021-10-31 NaN NaN 2.23
2021-11-30 NaN NaN NaN
2021-12-31 NaN NaN NaN
2022-01-31 NaN NaN NaN
2022-02-28 NaN NaN 1.91
2022-03-31 0.88 1.12 0.63
2022-04-30 0.86 NaN NaN
2022-05-31 NaN NaN 0.90
2022-06-30 NaN NaN NaN
2022-07-31 1.58 NaN 0.64
2022-08-31 1.95 NaN NaN
2022-09-30 NaN 0.85 NaN
2022-10-31 NaN NaN NaN
2022-11-30 NaN NaN 0.56
2022-12-31 2.82 NaN 0.77
2023-01-31 NaN NaN 1.35
2023-02-28 NaN NaN NaN
2023-03-31 NaN 0.59 NaN
2023-04-30 0.71 0.81 2.06
2023-05-31 0.66 1.85 1.03
2023-06-30 NaN NaN NaN
  In [ ]:
# Drop every row of the result above that contains a NaN
data[data>0.5].dropna()
  Out[ ]:  
 A B C
2022-03-31 0.88 1.12 0.63
2023-04-30 0.71 0.81 2.06
2023-05-31 0.66 1.85 1.03
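By default `dropna()` removes a row if any of its values is NaN, which is why only three rows survive above. The sketch below, on made-up data, contrasts that default with the `how="all"` and `subset` options.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan], "B": [2.0, 3.0, np.nan]}
)

any_nan = df.dropna()               # drop rows with at least one NaN
all_nan = df.dropna(how="all")      # drop rows only if every value is NaN
only_b = df.dropna(subset=["B"])    # look at column B alone

print(any_nan)
print(all_nan)
print(only_b)
```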
  In [ ]:
# Drop duplicate rows
data.drop_duplicates()
  Out[ ]:  
 A B C
2021-01-31 -1.01 -0.75 -0.59
2021-02-28 0.11 -0.61 -2.49
2021-03-31 0.92 -1.90 -1.14
2021-04-30 0.05 -0.50 0.83
2021-05-31 -0.02 -0.33 0.91
2021-06-30 0.37 -0.42 1.58
2021-07-31 1.34 1.29 -0.47
2021-08-31 -0.37 -0.22 -0.53
2021-09-30 0.74 -0.80 -0.39
2021-10-31 -0.76 0.17 2.23
2021-11-30 0.12 -0.97 0.05
2021-12-31 -1.33 0.31 0.24
2022-01-31 0.21 0.48 -0.10
2022-02-28 0.14 0.45 1.91
2022-03-31 0.88 1.12 0.63
2022-04-30 0.86 -1.71 0.47
2022-05-31 -0.98 0.26 0.90
2022-06-30 -1.42 0.26 -0.65
2022-07-31 1.58 -0.31 0.64
2022-08-31 1.95 -1.52 0.14
2022-09-30 -0.72 0.85 -0.30
2022-10-31 -0.45 -1.05 0.24
2022-11-30 -0.09 -1.09 0.56
2022-12-31 2.82 -0.08 0.77
2023-01-31 -1.48 0.37 1.35
2023-02-28 -0.36 -0.18 -0.92
2023-03-31 -0.59 0.59 -0.12
2023-04-30 0.71 0.81 2.06
2023-05-31 0.66 1.85 1.03
2023-06-30 0.36 -0.51 0.17
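The random data above contains no repeated rows, so the output is unchanged. Here is a small sketch on made-up data that does contain duplicates, also showing the `subset` and `keep` parameters.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [5, 5, 6, 7]})

# Whole-row duplicates: the second (1, 5) row is dropped.
unique_rows = df.drop_duplicates()

# Duplicates judged on column A only, keeping the last occurrence.
last_per_a = df.drop_duplicates(subset="A", keep="last")

print(unique_rows)
print(last_per_a)
```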
  In [ ]:
# Transpose
data.T
  Out[ ]:  
 2021-01-31 2021-02-28 2021-03-31 2021-04-30 2021-05-31 2021-06-30 2021-07-31 2021-08-31 2021-09-30 2021-10-31 ... 2022-09-30 2022-10-31 2022-11-30 2022-12-31 2023-01-31 2023-02-28 2023-03-31 2023-04-30 2023-05-31 2023-06-30
A -1.01 0.11 0.92 0.05 -0.02 0.37 1.34 -0.37 0.74 -0.76 ... -0.72 -0.45 -0.09 2.82 -1.48 -0.36 -0.59 0.71 0.66 0.36
B -0.75 -0.61 -1.90 -0.50 -0.33 -0.42 1.29 -0.22 -0.80 0.17 ... 0.85 -1.05 -1.09 -0.08 0.37 -0.18 0.59 0.81 1.85 -0.51
C -0.59 -2.49 -1.14 0.83 0.91 1.58 -0.47 -0.53 -0.39 2.23 ... -0.30 0.24 0.56 0.77 1.35 -0.92 -0.12 2.06 1.03 0.17

3 rows × 30 columns

  In [ ]:
# Sorting
data.sort_values(by='A',ascending=False)  # sort by column A, descending
  Out[ ]:  
 A B C
2022-12-31 2.82 -0.08 0.77
2022-08-31 1.95 -1.52 0.14
2022-07-31 1.58 -0.31 0.64
2021-07-31 1.34 1.29 -0.47
2021-03-31 0.92 -1.90 -1.14
2022-03-31 0.88 1.12 0.63
2022-04-30 0.86 -1.71 0.47
2021-09-30 0.74 -0.80 -0.39
2023-04-30 0.71 0.81 2.06
2023-05-31 0.66 1.85 1.03
2021-06-30 0.37 -0.42 1.58
2023-06-30 0.36 -0.51 0.17
2022-01-31 0.21 0.48 -0.10
2022-02-28 0.14 0.45 1.91
2021-11-30 0.12 -0.97 0.05
2021-02-28 0.11 -0.61 -2.49
2021-04-30 0.05 -0.50 0.83
2021-05-31 -0.02 -0.33 0.91
2022-11-30 -0.09 -1.09 0.56
2023-02-28 -0.36 -0.18 -0.92
2021-08-31 -0.37 -0.22 -0.53
2022-10-31 -0.45 -1.05 0.24
2023-03-31 -0.59 0.59 -0.12
2022-09-30 -0.72 0.85 -0.30
2021-10-31 -0.76 0.17 2.23
2022-05-31 -0.98 0.26 0.90
2021-01-31 -1.01 -0.75 -0.59
2021-12-31 -1.33 0.31 0.24
2022-06-30 -1.42 0.26 -0.65
2023-01-31 -1.48 0.37 1.35
  In [ ]:
data.sort_index(ascending=False)
  Out[ ]:  
 A B C
2023-06-30 0.36 -0.51 0.17
2023-05-31 0.66 1.85 1.03
2023-04-30 0.71 0.81 2.06
2023-03-31 -0.59 0.59 -0.12
2023-02-28 -0.36 -0.18 -0.92
2023-01-31 -1.48 0.37 1.35
2022-12-31 2.82 -0.08 0.77
2022-11-30 -0.09 -1.09 0.56
2022-10-31 -0.45 -1.05 0.24
2022-09-30 -0.72 0.85 -0.30
2022-08-31 1.95 -1.52 0.14
2022-07-31 1.58 -0.31 0.64
2022-06-30 -1.42 0.26 -0.65
2022-05-31 -0.98 0.26 0.90
2022-04-30 0.86 -1.71 0.47
2022-03-31 0.88 1.12 0.63
2022-02-28 0.14 0.45 1.91
2022-01-31 0.21 0.48 -0.10
2021-12-31 -1.33 0.31 0.24
2021-11-30 0.12 -0.97 0.05
2021-10-31 -0.76 0.17 2.23
2021-09-30 0.74 -0.80 -0.39
2021-08-31 -0.37 -0.22 -0.53
2021-07-31 1.34 1.29 -0.47
2021-06-30 0.37 -0.42 1.58
2021-05-31 -0.02 -0.33 0.91
2021-04-30 0.05 -0.50 0.83
2021-03-31 0.92 -1.90 -1.14
2021-02-28 0.11 -0.61 -2.49
2021-01-31 -1.01 -0.75 -0.59
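Sorting can also use more than one key. As a hedged sketch on made-up data, the example below sorts by column A ascending and breaks ties by column B descending, then restores the original order with `sort_index`.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [0.3, 0.1, 0.2, 0.4]})

# One ascending flag per sort key: A ascending, then B descending.
by_two = df.sort_values(by=["A", "B"], ascending=[True, False])

# sort_index() puts the rows back in index order.
restored = by_two.sort_index()

print(by_two)
print(restored.equals(df))
```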
From: https://www.cnblogs.com/mlzxdzl/p/17772465.html