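3.1 Pandas Data Structures
3.1.1 Getting to Know the Pandas Library
- Pandas is a tool built on NumPy and created for data-analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to work efficiently with large datasets.
- Most data processing you can do in Excel or a BI tool can also be done in Pandas, and usually faster.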
In [ ]:
pip install pandas
Requirement already satisfied: pandas in c:\users\dengzl\.conda\envs\data_analysis\lib\site-packages (2.1.1)
Requirement already satisfied: numpy>=1.22.4 in c:\users\dengzl\.conda\envs\data_analysis\lib\site-packages (from pandas) (1.26.0)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\dengzl\.conda\envs\data_analysis\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\dengzl\.conda\envs\data_analysis\lib\site-packages (from pandas) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in c:\users\dengzl\.conda\envs\data_analysis\lib\site-packages (from pandas) (2023.3)
Requirement already satisfied: six>=1.5 in c:\users\dengzl\.conda\envs\data_analysis\lib\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
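3.1.2 Data Structures: Series and DataFrame
Differences
- A Series is a one-dimensional data structure made up of an index and values.
- A DataFrame is a two-dimensional structure; in addition to an index and values, it also has columns.
Connection
- A DataFrame is made up of multiple Series: any single row or column taken out of it is a Series.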
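The relationship described above can be checked directly. The following is a minimal sketch; the name `fruit` is a hypothetical DataFrame introduced only for illustration, mirroring the apples/oranges data used in this section.

```python
import pandas as pd

# Hypothetical example frame, mirroring the data used in this section
fruit = pd.DataFrame({'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]})

column = fruit['apples']   # select one column by name
row = fruit.loc[0]         # select one row by index label

print(type(column).__name__)  # Series
print(type(row).__name__)     # Series

# A column Series is indexed by the frame's row labels;
# a row Series is indexed by the frame's column names.
print(list(column.index))  # [0, 1, 2, 3]
print(list(row.index))     # ['apples', 'oranges']
```

Note that the row Series uses the column names as its index, which is why a single row and a single column look alike in structure even though they cut the table in different directions.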
In [ ]:
# Create a two-dimensional table, then check its data structure/attributes
data = {'apples':[3,2,0,1],'oranges':[0,3,7,2]}
data
Out[ ]:
{'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]}
In [ ]:
type(data)
Out[ ]:
dict
In [ ]:
import pandas as pd
data = pd.DataFrame(data)
data
Out[ ]:
|   | apples | oranges |
|---|--------|---------|
| 0 | 3 | 0 |
| 1 | 2 | 3 |
| 2 | 0 | 7 |
| 3 | 1 | 2 |
In [ ]:
type(data)
Out[ ]:
pandas.core.frame.DataFrame
In [ ]:
apple = data['apples']
apple
Out[ ]:
0 3
1 2
2 0
3 1
Name: apples, dtype: int64
In [ ]:
type(apple)
Out[ ]:
pandas.core.series.Series
In [ ]:
apple = pd.Series(apple, index=[3,2,1,0])
apple
Out[ ]:
3 1
2 0
1 2
0 3
Name: apples, dtype: int64
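The output above shows that passing an existing Series together with `index=` does not relabel the values in place; it looks each label up in the original Series, so every value stays with its original label. The sketch below illustrates this behavior under that reading; the names `s`, `reordered`, `missing`, and `relabeled` are hypothetical and used only for this example.

```python
import pandas as pd

s = pd.Series([3, 2, 0, 1], name='apples')   # default index 0..3

# Passing a Series plus an index reorders by label:
# label 3 still maps to 1, label 0 still maps to 3.
reordered = pd.Series(s, index=[3, 2, 1, 0])
print(reordered.tolist())        # [1, 0, 2, 3]

# A label that does not exist in the original Series yields NaN
missing = pd.Series(s, index=[0, 1, 9])
print(missing.isna().tolist())   # [False, False, True]

# To attach new labels to the values in their current order,
# assign to .index on a copy instead.
relabeled = s.copy()
relabeled.index = [3, 2, 1, 0]
print(relabeled.tolist())        # [3, 2, 0, 1]
```

If the goal is only to reorder by label, `s.reindex([3, 2, 1, 0])` expresses the same intent more explicitly.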
In [ ]:
apple.reset_index() # Reset the index; the old index is kept as a column named "index"
Out[ ]:
|   | index | apples |
|---|-------|--------|
| 0 | 3 | 1 |
| 1 | 2 | 0 |
| 2 | 1 | 2 |
| 3 | 0 | 3 |
In [ ]:
apple.reset_index(drop=True) # Reset the index and discard the old one
Out[ ]:
0 1
1 0
2 2
3 3
Name: apples, dtype: int64
3.1.3 NumPy vs. Pandas
In [ ]:
# Converting between the two
import numpy as np
data2 = np.arange(15).reshape(3,5)
data2
Out[ ]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [ ]:
pd.DataFrame(data2)
Out[ ]:
|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 |
| 1 | 5 | 6 | 7 | 8 | 9 |
| 2 | 10 | 11 | 12 | 13 | 14 |
In [ ]:
data3 = pd.DataFrame(data2, columns=['a','b','c','d','e']) # Convert NumPy to pandas
data3
Out[ ]:
|   | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 |
| 1 | 5 | 6 | 7 | 8 | 9 |
| 2 | 10 | 11 | 12 | 13 | 14 |
In [ ]:
data3.values # Convert pandas to NumPy (data3.to_numpy() is the equivalent recommended in the pandas docs)
Out[ ]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
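The round trip between the two libraries can be verified with a short check. This is a minimal sketch, assuming a pandas version that provides `to_numpy()` (0.24 or later); the names `arr`, `frame`, and `back` are hypothetical and stand in for `data2` and `data3` above.

```python
import numpy as np
import pandas as pd

arr = np.arange(15).reshape(3, 5)
frame = pd.DataFrame(arr, columns=['a', 'b', 'c', 'd', 'e'])

# Convert back to a NumPy array; gives the same values as frame.values here
back = frame.to_numpy()
print(np.array_equal(arr, back))   # True

# Column labels live only on the DataFrame; the array keeps just the values
print(back.shape)                  # (3, 5)

# A single column (a Series) converts to a one-dimensional array
print(frame['c'].to_numpy().tolist())   # [2, 7, 12]
```

The conversion drops the index and column labels, so anything that depends on them should be done before converting to NumPy.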
From: https://www.cnblogs.com/mlzxdzl/p/17772462.html