首页 > 编程语言 >Python进阶—Pandas

Python进阶—Pandas

时间:2022-12-29 10:06:39浏览次数:47  
标签:index 01 进阶 Python df1 df2 K0 pd Pandas


Pandas再来一次


文章目录

  • ​​一、Series和DataFrame​​
  • ​​二、选择数据​​
  • ​​三、赋值及操作(增、删、改)​​
  • ​​四、处理丢失数据​​
  • ​​五、读取并写入文件​​
  • ​​六、concat拼接​​
  • ​​七、merge合并​​
  • ​​八、pandas的plot​​



配合 ​​机器学习​​食用更佳。


一、Series和DataFrame

  • series的创建(列表、数组、字典)、values/index、索引(默认、指定) 取一个值 取多个值、遍历
  • dataframe的创建、values/index/columns、df.sort_index(axis=0) 行排序和列排序、df.sort_values(by=’’) # 根据值排序,可以指定行还是列
  • df.describe()/df.T
import pandas as pd
import numpy as np
s1 = pd.Series([1,2,3,-5]) # 创建一个series,索引为默认值
print(s1)
0    1
1 2
2 3
3 -5
dtype: int64
s1.values # series的值
array([ 1,  2,  3, -5])
s1.index # series的索引
RangeIndex(start=0, stop=4, step=1)
for i in s1.index:
print(i)
0
1
2
3
s1[0] # 根据索引取值
1
s2 = pd.Series([1,2,3,4],index=['a','b','c','d']) # 指定索引
print(s2)
a    1
b 2
c 3
d 4
dtype: int64
s2[0]
1
s2['a']
1
s2[['a','b','c','d']]
a    1
b 2
c 3
d 4
dtype: int64
s2[[0,1,2]]
a    1
b 2
c 3
dtype: int64
for i in s1:
print(i)
1
2
3
-5
for i in s2:
print(i)
1
2
3
4
'a' in s2
True
# Series可以看成是一个定长的有序字典
dic1 = {'apple':5,'pen':3}
s3 = pd.Series(dic1)
print(s3)
apple    5
pen 3
dtype: int64
# DataFrame
data = {'year':[2014,2015,2016,2017],
'income':[1000,2000,3000,4000],
'pay':[5000,20000,30000,40000]}
df1 = pd.DataFrame(data)
df1



year

income

pay

0

2014

1000

5000

1

2015

2000

20000

2

2016

3000

30000

3

2017

4000

40000

df2 = pd.DataFrame(np.arange(12).reshape(3,-1))
print(df2)
0  1   2   3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
df3 = pd.DataFrame(np.arange(12).reshape(3,4),index=['a','b','c'],columns=['A','B','C','D'])
print(df3)
A  B   C   D
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
df1.index
RangeIndex(start=0, stop=4, step=1)
df1.columns
Index(['year', 'income', 'pay'], dtype='object')
df1.values
array([[ 2014,  1000,  5000],
[ 2015, 2000, 20000],
[ 2016, 3000, 30000],
[ 2017, 4000, 40000]])
df1.describe()



year

income

pay

count

4.000000

4.000000

4.000000

mean

2015.500000

2500.000000

23750.000000

std

1.290994

1290.994449

14930.394056

min

2014.000000

1000.000000

5000.000000

25%

2014.750000

1750.000000

16250.000000

50%

2015.500000

2500.000000

25000.000000

75%

2016.250000

3250.000000

32500.000000

max

2017.000000

4000.000000

40000.000000

print(df1)
year  income    pay
0 2014 1000 5000
1 2015 2000 20000
2 2016 3000 30000
3 2017 4000 40000
print(df1.T)
0      1      2      3
year 2014 2015 2016 2017
income 1000 2000 3000 4000
pay 5000 20000 30000 40000
df1.sort_index(axis=1) # columns索引排序



income

pay

year

0

1000

5000

2014

1

2000

20000

2015

2

3000

30000

2016

3

4000

40000

2017

df1.sort_index(axis=0)



year

income

pay

0

2014

1000

5000

1

2015

2000

20000

2

2016

3000

30000

3

2017

4000

40000

df3.sort_values(by='a',axis=1) # 根据某一列排序



A

B

C

D

a

0

1

2

3

b

4

5

6

7

c

8

9

10

11

df3



A

B

C

D

a

0

1

2

3

b

4

5

6

7

c

8

9

10

11

二、选择数据

  • 取行、取列、取多行、取多列、取元素、取True元素、切片
  • df[] # 只能取列 可以取多列
  • df.loc[] # 按照标签取
  • df.iloc[] # 按照位置取
  • df[df>6] # 取True元素
dates = pd.date_range('20170101',periods=6)
df1 = pd.DataFrame(np.arange(24).reshape(6,4),index=dates,columns=['A','B','C','D'])
print(df1)
A   B   C   D
2017-01-01 0 1 2 3
2017-01-02 4 5 6 7
2017-01-03 8 9 10 11
2017-01-04 12 13 14 15
2017-01-05 16 17 18 19
2017-01-06 20 21 22 23
df1['A'] # 获取一列 == df1.A  df1[['A','B']]  取多列
2017-01-01     0
2017-01-02 4
2017-01-03 8
2017-01-04 12
2017-01-05 16
2017-01-06 20
Freq: D, Name: A, dtype: int64
df1[0:2] # 切片 前两行



A

B

C

D

2017-01-01

0

1

2

3

2017-01-02

4

5

6

7

df1['20170102':'20170104']



A

B

C

D

2017-01-02

4

5

6

7

2017-01-03

8

9

10

11

2017-01-04

12

13

14

15

# 通过标签选择数据
df1.loc['20170102'] # 取行
A    4
B 5
C 6
D 7
Name: 2017-01-02 00:00:00, dtype: int64
df1.loc['20170101','A'] # 取元素
0
df1.loc[:,'A'] # 取列
2017-01-01     0
2017-01-02 4
2017-01-03 8
2017-01-04 12
2017-01-05 16
2017-01-06 20
Freq: D, Name: A, dtype: int64
df1.loc[:,['A','B']]



A

B

2017-01-01

0

1

2017-01-02

4

5

2017-01-03

8

9

2017-01-04

12

13

2017-01-05

16

17

2017-01-06

20

21

df1.loc[::2,['A','B']]



A

B

2017-01-01

0

1

2017-01-03

8

9

2017-01-05

16

17

# 通过位置选择数据
df1.iloc[2] # 第二行
A     8
B 9
C 10
D 11
Name: 2017-01-03 00:00:00, dtype: int64
df1.iloc[[1,2]]



A

B

C

D

2017-01-02

4

5

6

7

2017-01-03

8

9

10

11

df1.iloc[1:3,2:4]



C

D

2017-01-02

6

7

2017-01-03

10

11

df1 > 6



A

B

C

D

2017-01-01

False

False

False

False

2017-01-02

False

False

False

True

2017-01-03

True

True

True

True

2017-01-04

True

True

True

True

2017-01-05

True

True

True

True

2017-01-06

True

True

True

True

df1.A > 6
2017-01-01    False
2017-01-02 False
2017-01-03 True
2017-01-04 True
2017-01-05 True
2017-01-06 True
Freq: D, Name: A, dtype: bool
df1[df1>6]



A

B

C

D

2017-01-01

NaN

NaN

NaN

NaN

2017-01-02

NaN

NaN

NaN

7.0

2017-01-03

8.0

9.0

10.0

11.0

2017-01-04

12.0

13.0

14.0

15.0

2017-01-05

16.0

17.0

18.0

19.0

2017-01-06

20.0

21.0

22.0

23.0

三、赋值及操作(增、删、改)

  • 增:在尾部增加一行、一列、在指定位置插入一列(insert) 添加一行只能先拆开后合并
  • 删:del pop
  • 改:找到相应位置改即可
df1



A

B

C

D

2017-01-01

0

1

2

3

2017-01-02

4

5

6

7

2017-01-03

8

9

10

11

2017-01-04

12

13

14

15

2017-01-05

16

17

18

19

2017-01-06

20

21

22

23

df1.loc['20170101','A'] = 10000
df1



A

B

C

D

2017-01-01

10000

1

2

3

2017-01-02

4

5

6

7

2017-01-03

8

9

10

11

2017-01-04

12

13

14

15

2017-01-05

16

17

18

19

2017-01-06

20

21

22

23

df1[df1>10] = -1
df1



A

B

C

D

2017-01-01

-1

1

2

3

2017-01-02

4

5

6

7

2017-01-03

8

9

10

-1

2017-01-04

-1

-1

-1

-1

2017-01-05

-1

-1

-1

-1

2017-01-06

-1

-1

-1

-1

df1['E'] = 10 # 添加一列
df1



A

B

C

D

E

2017-01-01

-1

1

2

3

10

2017-01-02

4

5

6

7

10

2017-01-03

8

9

10

-1

10

2017-01-04

-1

-1

-1

-1

10

2017-01-05

-1

-1

-1

-1

10

2017-01-06

-1

-1

-1

-1

10

df1['F'] = pd.Series([1,2,3,4,5,6],index=dates)#添加一列
df1



A

B

C

D

E

F

2017-01-01

-1

1

2

3

10

1

2017-01-02

4

5

6

7

10

2

2017-01-03

8

9

10

-1

10

3

2017-01-04

-1

-1

-1

-1

10

4

2017-01-05

-1

-1

-1

-1

10

5

2017-01-06

-1

-1

-1

-1

10

6

s1 = pd.Series([1,2,3,4],index=['A','B','C','D'],)
df2 = df1.append(s1,ignore_index=True) # 增加一行
df2



A

B

C

D

E

F

0

-1.0

1.0

2.0

3.0

10.0

1.0

1

4.0

5.0

6.0

7.0

10.0

2.0

2

8.0

9.0

10.0

-1.0

10.0

3.0

3

-1.0

-1.0

-1.0

-1.0

10.0

4.0

4

-1.0

-1.0

-1.0

-1.0

10.0

5.0

5

-1.0

-1.0

-1.0

-1.0

10.0

6.0

6

1.0

2.0

3.0

4.0

NaN

NaN

df2.loc[7,['A','B','C']] = [1,2,3] # 增加一行
df2



A

B

C

D

E

F

0

-1.0

1.0

2.0

3.0

10.0

1.0

1

4.0

5.0

6.0

7.0

10.0

2.0

2

8.0

9.0

10.0

-1.0

10.0

3.0

3

-1.0

-1.0

-1.0

-1.0

10.0

4.0

4

-1.0

-1.0

-1.0

-1.0

10.0

5.0

5

-1.0

-1.0

-1.0

-1.0

10.0

6.0

6

1.0

2.0

3.0

4.0

NaN

NaN

7

1.0

2.0

3.0

NaN

NaN

NaN

s2 = pd.Series([1,2,4,5,6],index=['A','B','C','D','E']) # 增加一行,必须指定name的请客
s2.name = 'wang'
df2.append(s2)



A

B

C

D

E

F

0

-1.0

1.0

2.0

3.0

10.0

1.0

1

4.0

5.0

6.0

7.0

10.0

2.0

2

8.0

9.0

10.0

-1.0

10.0

3.0

3

-1.0

-1.0

-1.0

-1.0

10.0

4.0

4

-1.0

-1.0

-1.0

-1.0

10.0

5.0

5

-1.0

-1.0

-1.0

-1.0

10.0

6.0

6

1.0

2.0

3.0

4.0

NaN

NaN

7

1.0

2.0

3.0

NaN

NaN

NaN

wang

1.0

2.0

4.0

5.0

6.0

NaN

df1



A

B

C

D

E

F

2017-01-01

-1

1

2

3

10

1

2017-01-02

4

5

6

7

10

2

2017-01-03

8

9

10

-1

10

3

2017-01-04

-1

-1

-1

-1

10

4

2017-01-05

-1

-1

-1

-1

10

5

2017-01-06

-1

-1

-1

-1

10

6

df1.insert(1,'Z',df2['E']) # 在第一列插入索引为G的df2中的E列
df1



A

Z

G

B

C

D

E

F

2017-01-01

-1

NaN

NaN

1

2

3

10

1

2017-01-02

4

NaN

NaN

5

6

7

10

2

2017-01-03

8

NaN

NaN

9

10

-1

10

3

2017-01-04

-1

NaN

NaN

-1

-1

-1

10

4

2017-01-05

-1

NaN

NaN

-1

-1

-1

10

5

2017-01-06

-1

NaN

NaN

-1

-1

-1

10

6

df2['E']
0    10.0
1 10.0
2 10.0
3 10.0
4 10.0
5 10.0
6 NaN
7 NaN
Name: E, dtype: float64
g = df1.pop('G') # 弹出G列
g
2017-01-01   NaN
2017-01-02 NaN
2017-01-03 NaN
2017-01-04 NaN
2017-01-05 NaN
2017-01-06 NaN
Freq: D, Name: G, dtype: float64
del df1['Z'] # 删除Z列
df1



A

B

C

D

E

F

2017-01-01

-1

1

2

3

10

1

2017-01-02

4

5

6

7

10

2

2017-01-03

8

9

10

-1

10

3

2017-01-04

-1

-1

-1

-1

10

4

2017-01-05

-1

-1

-1

-1

10

5

2017-01-06

-1

-1

-1

-1

10

6

df2 = df1.drop(['A','B'],axis=1)#删除AB列
df2



C

D

E

F

2017-01-01

2

3

10

1

2017-01-02

6

7

10

2

2017-01-03

10

-1

10

3

2017-01-04

-1

-1

10

4

2017-01-05

-1

-1

10

5

2017-01-06

-1

-1

10

6

df1



A

B

C

D

E

F

2017-01-01

-1

1

2

3

10

1

2017-01-02

4

5

6

7

10

2

2017-01-03

8

9

10

-1

10

3

2017-01-04

-1

-1

-1

-1

10

4

2017-01-05

-1

-1

-1

-1

10

5

2017-01-06

-1

-1

-1

-1

10

6

四、处理丢失数据

  • dropna
  • fillna
  • isnull
  • np.any
  • np.all
dates = np.arange(20170101,20170105)
df1 = pd.DataFrame(np.arange(12).reshape(4,3),index=dates,columns=['A','B','C'])
df1



A

B

C

20170101

0

1

2

20170102

3

4

5

20170103

6

7

8

20170104

9

10

11

df2 = pd.DataFrame(df1,index=dates,columns=['A','B','C','D','E'])
df2



A

B

C

D

E

20170101

0

1

2

NaN

NaN

20170102

3

4

5

NaN

NaN

20170103

6

7

8

NaN

NaN

20170104

9

10

11

NaN

NaN

s1 = pd.Series([3,4,6],index=dates[:3])
s2 = pd.Series([32,5,2],index=dates[1:])
df2['D'] = s1
df2['E'] = s2
df2



A

B

C

D

E

20170101

0

1

2

3.0

NaN

20170102

3

4

5

4.0

32.0

20170103

6

7

8

6.0

5.0

20170104

9

10

11

NaN

2.0

df2.dropna(axis=0,how='any') #axis=[0,1] 0代表行,1代表列。how=['any','all'] any任意一个或多个 all全部



A

B

C

D

E

20170102

3

4

5

4.0

32.0

20170103

6

7

8

6.0

5.0

df2.dropna(axis=0,how='all')



A

B

C

D

E

20170101

0

1

2

3.0

NaN

20170102

3

4

5

4.0

32.0

20170103

6

7

8

6.0

5.0

20170104

9

10

11

NaN

2.0

df2.dropna(axis=1,how='any')



A

B

C

20170101

0

1

2

20170102

3

4

5

20170103

6

7

8

20170104

9

10

11

df2.dropna(axis=1,how='all')  # 当一行所有的值为空的时候就删除这列



A

B

C

D

E

20170101

0

1

2

3.0

NaN

20170102

3

4

5

4.0

32.0

20170103

6

7

8

6.0

5.0

20170104

9

10

11

NaN

2.0

df2.fillna(value=0) # 空值赋为0



A

B

C

D

E

20170101

0

1

2

3.0

0.0

20170102

3

4

5

4.0

32.0

20170103

6

7

8

6.0

5.0

20170104

9

10

11

0.0

2.0

df2.isnull() # 查看空值



A

B

C

D

E

20170101

False

False

False

False

True

20170102

False

False

False

False

False

20170103

False

False

False

False

False

20170104

False

False

False

True

False

np.any(df2.isnull()) # 只要有一个或多个空值就会返回True
True
np.all(df2.isnull()) # 所有为空值才返回NTrue
False

五、读取并写入文件

  • read
  • to
file = pd.read_csv('../people.csv',encoding='gbk')
file



地点

名字

职位

工资

在职情况

0

北京

小红

渠道合作经理

15000

在职

1

北京

小明

行政专员

8000

离职

2

北京

小白

行政专员

9000

在职

3

上海

小绿

商户经理

12000

在职

4

上海

小黄

商户经理

10000

离职

5

上海

小黑

团队长

12000

在职

6

广州

小紫

渠道合作主管

20000

在职

7

广州

小粉

渠道合作主管

20000

在职

8

广州

小青

经理

10000

离职

9

广州

小蓝

团队长

13000

在职

file.loc[1,'地点'] = '志哥'
file



地点

名字

职位

工资

在职情况

0

北京

小红

渠道合作经理

15000

在职

1

志哥

小明

行政专员

8000

离职

2

北京

小白

行政专员

9000

在职

3

上海

小绿

商户经理

12000

在职

4

上海

小黄

商户经理

10000

离职

5

上海

小黑

团队长

12000

在职

6

广州

小紫

渠道合作主管

20000

在职

7

广州

小粉

渠道合作主管

20000

在职

8

广州

小青

经理

10000

离职

9

广州

小蓝

团队长

13000

在职

file.to_csv('peo2.csv')

六、concat拼接

  • 0行1列 outer:默认,缺少部分填空 inner:缺少部分去掉 ignore_index=True不考虑原来的index
df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['a','b','c','d'])
df2 = pd.DataFrame(np.arange(12,24).reshape((3,4)),columns=['a','b','c','d'])
df3 = pd.DataFrame(np.arange(24,36).reshape((3,4)),columns=['a','b','c','d'])
print(df1)
print(df2)
print(df3)
a  b   c   d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
a b c d
0 12 13 14 15
1 16 17 18 19
2 20 21 22 23
a b c d
0 24 25 26 27
1 28 29 30 31
2 32 33 34 35
df4 = pd.concat([df1,df2,df3],axis=0)#纵向合并
df4



a

b

c

d

0

0

1

2

3

1

4

5

6

7

2

8

9

10

11

0

12

13

14

15

1

16

17

18

19

2

20

21

22

23

0

24

25

26

27

1

28

29

30

31

2

32

33

34

35

df4 = pd.concat([df1,df2,df3],axis=0,ignore_index=True)#纵向合并,不考虑原来的index
df4



a

b

c

d

0

0

1

2

3

1

4

5

6

7

2

8

9

10

11

3

12

13

14

15

4

16

17

18

19

5

20

21

22

23

6

24

25

26

27

7

28

29

30

31

8

32

33

34

35

df5 = pd.concat([df1,df2,df3],axis=1)#横向合并
df5



a

b

c

d

a

b

c

d

a

b

c

d

0

0

1

2

3

12

13

14

15

24

25

26

27

1

4

5

6

7

16

17

18

19

28

29

30

31

2

8

9

10

11

20

21

22

23

32

33

34

35

df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['a','b','c','f'])
df2 = pd.DataFrame(np.arange(12,24).reshape((3,4)),columns=['a','c','d','e'])
print(df1)
print(df2)
a  b   c   f
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
a c d e
0 12 13 14 15
1 16 17 18 19
2 20 21 22 23
df6 = pd.concat([df1,df2],join='outer',ignore_index=True)#合并两个表,缺少的部分填充NaN
df6



a

b

c

f

d

e

0

0

1.0

2

3.0

NaN

NaN

1

4

5.0

6

7.0

NaN

NaN

2

8

9.0

10

11.0

NaN

NaN

3

12

NaN

13

NaN

14.0

15.0

4

16

NaN

17

NaN

18.0

19.0

5

20

NaN

21

NaN

22.0

23.0

df7 = pd.concat([df1,df2],join='inner',ignore_index=True)#合并两个表,缺少的部分去掉
df7



a

c

0

0

2

1

4

6

2

8

10

3

12

13

4

16

17

5

20

21

df1 = pd.DataFrame(np.arange(12).reshape((3,4)),columns=['a','b','c','f'])
df2 = pd.DataFrame(np.arange(12,24).reshape((4,3)),columns=['a','c','d'])
print(df1)
print(df2)
a  b   c   f
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
a c d
0 12 13 14
1 15 16 17
2 18 19 20
3 21 22 23
df8 = pd.concat([df1,df2],axis=0)#纵向向合并,index使用df1的index
df8



a

b

c

f

d

0

0

1.0

2

3.0

NaN

1

4

5.0

6

7.0

NaN

2

8

9.0

10

11.0

NaN

0

12

NaN

13

NaN

14.0

1

15

NaN

16

NaN

17.0

2

18

NaN

19

NaN

20.0

3

21

NaN

22

NaN

23.0

df8 = pd.concat([df1,df2],axis=1)#横向合并
df8



a

b

c

f

a

c

d

0

0.0

1.0

2.0

3.0

12

13

14

1

4.0

5.0

6.0

7.0

15

16

17

2

8.0

9.0

10.0

11.0

18

19

20

3

NaN

NaN

NaN

NaN

21

22

23

七、merge合并

  • outter、inner(默认)、left、right
  • suffixes 为columns名增加后缀
  • left_index right_index 左index索引和右index索引拼接
  • indicator 显示merge信息
left = pd.DataFrame({'key':['K0','K1','K2','K3'],
'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3']})

right = pd.DataFrame({'key':['K0','K1','K2','K3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3']})

print(left)
print(right)
key   A   B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 B3
key C D
0 K0 C0 D0
1 K1 C1 D1
2 K2 C2 D2
3 K3 C3 D3
res = pd.merge(left,right,on='key')
res



key

A

B

C

D

0

K0

A0

B0

C0

D0

1

K1

A1

B1

C1

D1

2

K2

A2

B2

C2

D2

3

K3

A3

B3

C3

D3

pd.concat([left,right],axis=1)



key

A

B

key

C

D

0

K0

A0

B0

K0

C0

D0

1

K1

A1

B1

K1

C1

D1

2

K2

A2

B2

K2

C2

D2

3

K3

A3

B3

K3

C3

D3

left = pd.DataFrame({'key1':['K0','K0','K1','K2'],
'key2':['K0','K1','K0','K1'],
'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3']})

right = pd.DataFrame({'key1':['K0','K1','K1','K3'],
'key2':['K0','K0','K0','K0'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3']})

print(left)
print(right)
key1 key2   A   B
0 K0 K0 A0 B0
1 K0 K1 A1 B1
2 K1 K0 A2 B2
3 K2 K1 A3 B3
key1 key2 C D
0 K0 K0 C0 D0
1 K1 K0 C1 D1
2 K1 K0 C2 D2
3 K3 K0 C3 D3
#how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='outer')#how默认inner
res



key1

key2

A

B

C

D

0

K0

K0

A0

B0

C0

D0

1

K0

K1

A1

B1

NaN

NaN

2

K1

K0

A2

B2

C1

D1

3

K1

K0

A2

B2

C2

D2

4

K2

K1

A3

B3

NaN

NaN

5

K3

K0

NaN

NaN

C3

D3

#how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='inner')#how默认inner
res



key1

key2

A

B

C

D

0

K0

K0

A0

B0

C0

D0

1

K1

K0

A2

B2

C1

D1

2

K1

K0

A2

B2

C2

D2

#how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='left')#how默认inner
res



key1

key2

A

B

C

D

0

K0

K0

A0

B0

C0

D0

1

K0

K1

A1

B1

NaN

NaN

2

K1

K0

A2

B2

C1

D1

3

K1

K0

A2

B2

C2

D2

4

K2

K1

A3

B3

NaN

NaN

#how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='outer',indicator=True)#显示merge信息
res



key1

key2

A

B

C

D

_merge

0

K0

K0

A0

B0

C0

D0

both

1

K0

K1

A1

B1

NaN

NaN

left_only

2

K1

K0

A2

B2

C1

D1

both

3

K1

K0

A2

B2

C2

D2

both

4

K2

K1

A3

B3

NaN

NaN

left_only

5

K3

K0

NaN

NaN

C3

D3

right_only

#how = ['left','right','inner','outer']
res = pd.merge(left,right,on=['key1','key2'],how='outer',indicator='indicator_column')#显示merge信息
res



key1

key2

A

B

C

D

indicator_column

0

K0

K0

A0

B0

C0

D0

both

1

K0

K1

A1

B1

NaN

NaN

left_only

2

K1

K0

A2

B2

C1

D1

both

3

K1

K0

A2

B2

C2

D2

both

4

K2

K1

A3

B3

NaN

NaN

left_only

5

K3

K0

NaN

NaN

C3

D3

right_only

left = pd.DataFrame({'A':['A0','A1','A2'],
'B':['B0','B1','B2']},
index = ['K0','K1','K2'])
right = pd.DataFrame({'C':['C0','C2','C3'],
'D':['D0','D2','D3']},
index=['K0','K2','K3'])
print(left)
print(right)
A   B
K0 A0 B0
K1 A1 B1
K2 A2 B2
C D
K0 C0 D0
K2 C2 D2
K3 C3 D3
res = pd.merge(left,right,left_index=True,right_index=True,how='outer')
res



A

B

C

D

K0

A0

B0

C0

D0

K1

A1

B1

NaN

NaN

K2

A2

B2

C2

D2

K3

NaN

NaN

C3

D3

boys = pd.DataFrame({'k':['K0','K1','K2'],'age':[1,2,3]})

girls = pd.DataFrame({'k':['K0','K0','K3'],'age':[4,5,6]})

print(boys)
print(girls)
k  age
0 K0 1
1 K1 2
2 K2 3
k age
0 K0 4
1 K0 5
2 K3 6
# suffixes 第一张表格签名加_boy 后面加_girl
res = pd.merge(boys,girls,on='k',suffixes=['_boy','_girl'],how='outer')
res



k

age_boy

age_girl

0

K0

1.0

4.0

1

K0

1.0

5.0

2

K1

2.0

NaN

3

K2

3.0

NaN

4

K3

NaN

6.0

八、pandas的plot

  • 画二维数据 index和value
import matplotlib.pyplot as plt
data = pd.Series(np.random.randn(1000),index=np.arange(1000)) # np.random.randn(1000):返回长度为1000 范围[0,1)的数组
data = data.cumsum() # 累计求和
data.plot()
plt.show()
print(data.head())

Python进阶—Pandas_多列

0    1.034952
1 1.375026
2 1.193759
3 1.519312
4 1.499621
dtype: float64
data = pd.DataFrame(np.random.randn(1000,4),index=np.arange(1000),columns=['A','B','C','D'])
data = data.cumsum()
print(data.head()) # 显示前五条
A         B         C         D
0 0.026638 -0.482280 -1.087122 0.081180
1 -0.855294 0.472860 -1.712153 -0.122494
2 -2.542649 0.781326 -1.618599 -1.055795
3 -2.540139 1.202146 0.936892 -2.537980
4 -1.896418 0.601826 1.585485 -2.012536
data.plot()
plt.show()

Python进阶—Pandas_开发语言_02

ax = data.plot.scatter(x='A',y='B',color='Blue',label='class 1')
data.plot.scatter(x='A',y='C',color='Green',label='class 2',ax=ax) # ax=ax 放一个图里
plt.show()

Python进阶—Pandas_多列_03


标签:index,01,进阶,Python,df1,df2,K0,pd,Pandas
From: https://blog.51cto.com/u_14608932/5976372

相关文章

  • Python进阶—Numpy
    Numpy再来一遍文章目录​​一、Numpy的属性​​​​二、创建array​​​​三、Numpy的运算​​​​四、随机数生成及矩阵的统计​​​​五、Numpy索引​​​​六、合并​......
  • 前端万字面经——进阶篇
    此文为前端进阶篇前边已经出了基础篇​​基础篇链接点击跳转​​服务端编程/AjaxJSONP的缺点JSON只支持get,因为script标签只能使用get请求;JSONP需要后端配合返回指定格......
  • Python学习笔记--高阶技巧(二)
    Socket服务端开发基本步骤如下:socket客户端开发基本步骤如下:1、创建socket对象2、连接到服务器3、发送消息4、接收返回消息5、关闭连接正则表达式基础方法......
  • PYTHON用时变马尔可夫区制转换(MARKOV REGIME SWITCHING)自回归模型分析经济时间序列|附
    全文下载链接:http://tecdat.cn/?p=22617最近我们被客户要求撰写关于MRS的研究报告,包括一些图形和统计输出。本文提供了一个在统计模型中使用马可夫转换模型模型的例子,来......
  • python读取文本中的字典
    首先得明确文本的每行是存的json或者用python的write(str(一个字典))写入的,那么不用借助json模块就能读取为字典,使用eval函数就行,json只能处理带双引号的字符串,但很多时候......
  • opencv-python学习之旅
    opencv-python操作*注:在此笔记中只记录下各种函数的使用,规则详细讲解见https://opencv.apachecn.org/#/docs/4.0.0/2.1-tutorial_py_image_display创建,读取,显示,保存图......
  • 我的Python笔记02
    声明:本文整理借鉴金角大王的Python之路,Day2-Python基础2,仅供本人学习使用!!!本节内容列表、元组操作字符串操作字典操作集合操作文件操作字符编码与转码 1.列表、元组操作......
  • 距离产生美?k近邻算法python实现
    微信公众号:AI有道(ID:redstonewill)1.什么是k近邻算法?k最近邻(k-NearestNeighbor,kNN)分类算法是一个比较成熟也是最简单的机器学习(MachineLearning)算法之一。该方法的思......
  • Python__16--集合
    1集合一句话,没有value的字典,无序可变序列。1.1集合的创建1.1.1使用{}s={2,3,4,5,5,6,7,7}#输出为{2,3,4,5,6,7}集合中元素不允许重复1.1.2使用set()测试代码:......
  • @05.Python基本运算符
    文章目录​​一.基本运算符的介绍​​​​1.运算符概述​​​​2.运算符的分类​​​​二.基本运算符的使用​​​​1.算数运算符​​​​1》算数运算符的介绍​​​​2》P......