Python Crash Course (《Python从入门到实践》) Project: Data Visualization

Tags: plt, beginner, Python, repo, visualization, import, print, ax, dict

Generating Data

Installing Matplotlib

python -m pip install matplotlib

Plotting a Simple Line Graph

import matplotlib.pyplot as plt


squares = [1, 4, 9, 16, 25]
fig, ax = plt.subplots()
ax.plot(squares)

plt.show()

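First, we import the pyplot module under the alias plt so we don't have to type pyplot repeatedly. We then call subplots(), a function that can generate one or more plots in the same figure. The variable fig represents the entire figure, that is, the collection of plots generated. The variable ax represents a single plot in the figure; this is the variable you will use most of the time to define and customize a plot.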

Next we call the plot() method, which tries to plot the data it is given in a meaningful way. The plt.show() function opens Matplotlib's viewer and displays the plot.

Changing the Label Type and Line Thickness

import matplotlib.pyplot as plt


squares = [1, 4, 9, 16, 25]
fig, ax = plt.subplots()
ax.plot(squares, linewidth=3)

# Set the chart title and label the axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

plt.show()

You will usually need to experiment with different values to find the settings that produce the plot you want.

Correcting the Plot

The point at the end of the line graph claims that the square of 4 is 25. Let's fix that.

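When you give plot() a single sequence of numbers, it assumes the first data point corresponds to an x-value of 0, but here the first point should correspond to an x-value of 1. We can override the default by passing plot() both the input values and the output values: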

import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
fig, ax = plt.subplots()
ax.plot(input_values, squares, linewidth=3)

# Set the chart title and label the axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

plt.show()

Now plot() no longer has to make any assumptions about how the output values were generated, so the resulting plot is correct.
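To see why the first version of the plot was off, it can help to spell out the x-values that get filled in when only one sequence is supplied. The sketch below imitates that default with enumerate(); it does not call Matplotlib, and the names implicit_points and explicit_points are made up for this illustration.

```python
squares = [1, 4, 9, 16, 25]

# With a single sequence, the x-values default to the indexes 0, 1, 2, ...
implicit_points = list(enumerate(squares))

# Supplying the input values pairs each square with the intended x-value.
input_values = [1, 2, 3, 4, 5]
explicit_points = list(zip(input_values, squares))

print(implicit_points[-1])  # (4, 25): the misleading final point
print(explicit_points[-1])  # (5, 25): the corrected final point
```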

Using Built-in Styles

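Matplotlib comes with a number of predefined styles. To see all the styles available on your system, run the following lines in a terminal (Python interpreter) session: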

import matplotlib.pyplot as plt
print(plt.style.available)

To use one of these styles, add the plt.style.use() line before the call to subplots():

import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]

plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
ax.plot(input_values, squares, linewidth=3)

# Set the chart title and label the axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

plt.show()

Plotting and Styling Individual Points with scatter()

To plot a single point, pass the point's x- and y-coordinates to the scatter() method:

import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
# s sets the size of the point.
ax.scatter(2, 4, s=200)

# Set the chart title and label the axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

plt.show()

Plotting a Series of Points with scatter()

To plot a series of points, pass scatter() two lists, one holding the x-values and one holding the y-values.

import matplotlib.pyplot as plt

x_value = [1, 2, 3, 4, 5]
y_value = [1, 4, 9, 16, 25]

plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
# s sets the size of each point.
ax.scatter(x_value, y_value, s=200)

# Set the chart title and label the axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

plt.show()

Calculating Data Automatically

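Writing lists by hand is inefficient, especially when there are many points to plot. Instead of listing the values ourselves, we can compute them with a loop (here, a list comprehension):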

import matplotlib.pyplot as plt

x_value = range(1,1001)
y_value = [x**2 for x in x_value]

plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
# s sets the size of each point.
ax.scatter(x_value, y_value, s=1)

# Set the chart title and label the axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

# Set the range for each axis.
ax.axis([0, 1100, 0, 1_100_000])
plt.show()

Customizing Tick Labels

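When the numbers on an axis get large enough, Matplotlib defaults to scientific notation for the tick labels. This is usually a good thing, because large numbers written in plain notation take up a lot of room on the plot.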

Nearly every element of a chart is customizable, so you can tell Matplotlib to keep using plain notation if you prefer:

import matplotlib.pyplot as plt

x_value = range(1,1001)
y_value = [x**2 for x in x_value]

plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
# s sets the size of each point.
ax.scatter(x_value, y_value, s=1)

# Set the chart title and label the axes.
ax.set_title("Square Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Square of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

# Set the range for each axis.
ax.axis([0, 1100, 0, 1_100_000])
ax.ticklabel_format(style='plain')
plt.show()

Defining Custom Colors

To change the color of the points, pass the color argument to scatter() with the name of a color, as shown here:

ax.scatter(x_value, y_value, color='green', s=1)

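You can also define custom colors using the RGB color model. To do so, pass the color argument a tuple of three floats between 0 and 1, giving the red, green, and blue components in that order: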

ax.scatter(x_value, y_value, color=(0.9, 0.4, 0.2), s=1)

Values closer to 0 produce darker colors, and values closer to 1 produce lighter colors.
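To get a feel for what an RGB tuple means, the following sketch converts the 0 to 1 components into the more familiar 0 to 255 range and a hex string. The helper rgb_to_hex is a hypothetical name written for this illustration; it is not part of Matplotlib.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in the range 0-1 to a hex color string."""
    channels = [int(round(component * 255)) for component in rgb]
    return "#{:02x}{:02x}{:02x}".format(*channels)


print(rgb_to_hex((0.9, 0.4, 0.2)))  # #e66633
print(rgb_to_hex((0, 0, 0)))        # #000000 (black, the darkest color)
print(rgb_to_hex((1, 1, 1)))        # #ffffff (white, the lightest color)
```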

Using a Colormap

ax.scatter(x_value, y_value, c=y_value, cmap=plt.cm.Reds, s=10)

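A colormap is a sequence of colors in a gradient that moves from a starting color to an ending color. In visualizations, colormaps are used to emphasize patterns in data; for example, you might show small values in a light color and large values in a darker one. Using a colormap ensures that every point is colored according to a carefully designed, consistent scale.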

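The c argument is similar to color, but it associates a sequence of values with a colormap. Here we set c to the list of y-values and use the cmap argument to tell pyplot which colormap to use.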

Examples of colormaps include Reds, Blues, and jet.

Saving Your Plots Automatically

To save the plot to a file instead of showing it in the Matplotlib viewer, replace plt.show() with plt.savefig():

plt.savefig('square_plot.png', bbox_inches='tight')

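The first argument is the filename for the plot image, which will be saved in the same directory as scatter_squares.py. The second argument trims extra whitespace from the plot; if you want to keep the whitespace around the plot, omit this argument. You can also call savefig() with a Path object, or with a full path string as below, to write the output file anywhere on your system.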

plt.savefig('C:/Users/xjj/Pictures/Camera Roll/square_plot.png', bbox_inches='tight')

Exercises

import matplotlib.pyplot as plt

x_value = range(1, 5001)
y_value = [x**3 for x in x_value]

plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()

# s sets the size of each point.
ax.scatter(x_value, y_value, c=y_value, cmap=plt.cm.jet, s=10)

# Set the chart title and label the axes.
ax.set_title("Cubic Numbers", fontsize=24)
ax.set_xlabel("Value", fontsize=14)
ax.set_ylabel("Cube of Value", fontsize=14)
# Set the size of the tick labels.
ax.tick_params(labelsize=14)

# Save the plot to a file, then display it.
plt.savefig('C:/Users/xjj/Pictures/Camera Roll/square_plot.png', bbox_inches='tight')
plt.show()

Random Walks

A random walk is a path determined by a series of simple random decisions.

Creating the RandomWalk Class

To simulate a random walk, we create a class named RandomWalk that makes random decisions about which direction to move.

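The class needs three attributes: one variable to track the number of points in the walk, and two lists to store the x- and y-coordinates of each point in the walk.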

from random import choice


class RandomWalk:
    """A class to generate random walks."""

    def __init__(self, num_points=5000):
        """Initialize the attributes of a walk."""

        self.num_points = num_points
        # All walks start at (0, 0).
        self.x_values = [0]
        self.y_values = [0]

Choosing Directions

We use the fill_walk() method to generate the points in the walk:

Add this method to the RandomWalk class, indented one level inside the class body:

def fill_walk(self):
    """Calculate all the points in the walk."""

    # Keep taking steps until the walk reaches the desired length.
    while len(self.x_values) < self.num_points:

        # Decide which direction to go and how far to go in that direction.
        x_direction = choice([-1, 1])
        x_distance = choice([0, 1, 2, 3, 4])
        x_step = x_direction * x_distance

        y_direction = choice([-1, 1])
        y_distance = choice([0, 1, 2, 3, 4])
        y_step = y_direction * y_distance

        # Reject moves that go nowhere.
        if x_step == 0 and y_step == 0:
            continue

        # Calculate the x- and y-coordinates of the next point.
        x = self.x_values[-1] + x_step
        y = self.y_values[-1] + y_step

        self.x_values.append(x)
        self.y_values.append(y)
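The same stepping logic can be tried without the class or any plotting. The sketch below is a standalone variant, not the book's code: the function name random_walk and its seed parameter are assumptions added here so that a walk can be repeated exactly while experimenting.

```python
from random import Random


def random_walk(num_points, seed=None):
    """Return the x- and y-coordinates of a random walk that starts at (0, 0)."""
    rng = Random(seed)  # A seeded generator makes the walk repeatable.
    x_values, y_values = [0], [0]

    while len(x_values) < num_points:
        # Pick a direction and a distance for each axis.
        x_step = rng.choice([-1, 1]) * rng.choice([0, 1, 2, 3, 4])
        y_step = rng.choice([-1, 1]) * rng.choice([0, 1, 2, 3, 4])

        # Reject moves that go nowhere.
        if x_step == 0 and y_step == 0:
            continue

        x_values.append(x_values[-1] + x_step)
        y_values.append(y_values[-1] + y_step)

    return x_values, y_values


xs, ys = random_walk(10, seed=42)
print(len(xs), len(ys))  # 10 10
print(xs[0], ys[0])      # 0 0
```

Passing the same seed twice returns the same walk, which is convenient when comparing different plot styles on identical data.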

Plotting the Random Walk

Generating Multiple Random Walks

One way to use the preceding code to make multiple walks without running the program several times is to wrap it in a while loop:

import matplotlib.pyplot as plt
from random_walk import RandomWalk

# Keep making new walks as long as the program is active.
while True:
    # Make a random walk.
    rw = RandomWalk()
    rw.fill_walk()

    # Plot the points in the walk.
    plt.style.use('classic')
    fig, ax = plt.subplots()
    ax.scatter(rw.x_values, rw.y_values, s=15)
    ax.set_aspect('equal')

    plt.show()

    keep_running = input("Make another walk?(y/n):\n")
    if keep_running == 'n':
        break

Styling the Walk

Coloring the Points

import matplotlib.pyplot as plt
from random_walk import RandomWalk

while True:
    # Make a random walk.
    rw = RandomWalk()
    rw.fill_walk()

    # Plot the points in the walk.
    plt.style.use('classic')
    fig, ax = plt.subplots()
    point_numbers = range(rw.num_points)
    ax.scatter(rw.x_values, rw.y_values, c=point_numbers,
               cmap=plt.cm.Blues, edgecolor='none', s=15)
    ax.set_aspect('equal')

    plt.show()

    keep_running = input("Make another walk?(y/n):\n")
    if keep_running == 'n':
        break

We set the c argument to point_numbers, choose the Blues colormap, and pass edgecolor='none' to remove the outline around each point.

Plotting the Starting and Ending Points

import matplotlib.pyplot as plt
from random_walk import RandomWalk

while True:
    # Make a random walk.
    rw = RandomWalk()
    rw.fill_walk()

    # Plot the points in the walk.
    plt.style.use('classic')
    fig, ax = plt.subplots()
    point_numbers = range(rw.num_points)
    ax.scatter(rw.x_values, rw.y_values, c=point_numbers,
               cmap=plt.cm.Blues, edgecolor='none', s=15)
    ax.set_aspect('equal')

    # Emphasize the first and last points.
    ax.scatter(0, 0, color='green', edgecolor='none', s=100)
    ax.scatter(rw.x_values[-1], rw.y_values[-1], color='red',
               edgecolor='none', s=100)


    plt.show()

    keep_running = input("Make another walk?(y/n):\n")
    if keep_running == 'n':
        break

Hiding the Axes

import matplotlib.pyplot as plt
from random_walk import RandomWalk

while True:
    # Make a random walk.
    rw = RandomWalk()
    rw.fill_walk()

    # Plot the points in the walk.
    plt.style.use('classic')
    fig, ax = plt.subplots()
    point_numbers = range(rw.num_points)
    ax.scatter(rw.x_values, rw.y_values, c=point_numbers,
               cmap=plt.cm.Blues, edgecolor='none', s=15)
    ax.set_aspect('equal')

    # Emphasize the first and last points.
    ax.scatter(0, 0, color='green', edgecolor='none', s=100)
    ax.scatter(rw.x_values[-1], rw.y_values[-1], color='red',
               edgecolor='none', s=100)

    # Hide the axes.
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    plt.show()

    keep_running = input("Make another walk?(y/n):\n")
    if keep_running == 'n':
        break

Adding Plot Points

rw = RandomWalk(50000)

Pass a different number of points when creating the object to override the default.

Altering the Size to Fill the Screen

fig, ax = plt.subplots(figsize=(15, 9))

The figsize parameter takes a tuple that tells Matplotlib the dimensions of the plotting window in inches.

If you know your system's resolution, you can pass it to plt.subplots() with the dpi parameter:

fig, ax = plt.subplots(figsize=(15, 9), dpi=128)

Exercises

15.3

import matplotlib.pyplot as plt
from random_walk import RandomWalk

while True:
    # Make a random walk.
    rw = RandomWalk()
    rw.fill_walk()

    # Plot the walk as a line.
    plt.style.use('classic')
    fig, ax = plt.subplots(figsize=(15, 9), dpi=128)
    point_numbers = range(rw.num_points)
    ax.set_aspect('equal')
    ax.plot(rw.x_values, rw.y_values, color='blue', linewidth=1)

    plt.show()

    keep_running = input("Make another walk?(y/n):\n")
    if keep_running == 'n':
        break

15.5

from random import choice


class RandomWalk:
    """A class to generate random walks."""

    def __init__(self, num_points=5000):
        """Initialize the attributes of a walk."""

        self.num_points = num_points
        # All walks start at (0, 0).
        self.x_values = [0]
        self.y_values = [0]

    def get_step(self):
        """Return a signed step: a random direction times a random distance."""
        direction = choice([-1, 1])
        distance = choice([0, 1, 2, 3, 4, 5])
        return direction * distance

    def fill_walk(self):
        """Calculate all the points in the walk."""

        # Keep taking steps until the walk reaches the desired length.
        while len(self.x_values) < self.num_points:

            # Decide which direction to go and how far to go in that direction.
            x_step = self.get_step()
            y_step = self.get_step()

            # Reject moves that go nowhere.
            if x_step == 0 and y_step == 0:
                continue

            # Calculate the x- and y-coordinates of the next point.
            x = self.x_values[-1] + x_step
            y = self.y_values[-1] + y_step

            self.x_values.append(x)
            self.y_values.append(y)

Rolling Dice with Plotly

Installing Plotly

Install Plotly and pandas with pip, for example by running python -m pip install plotly pandas.

Plotly Express depends on pandas, so the two need to be installed together.

Creating the Die Class

from random import randint


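class Die:
    """A class representing a single die."""

    def __init__(self, num_sides=6):
        """Assume a six-sided die unless told otherwise."""
        self.num_sides = num_sides

    def roll(self):
        """Return a random value between 1 and the number of sides."""
        return randint(1, self.num_sides)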

Rolling the Die

from die import Die


die = Die()
results = []
for roll_num in range(100):
    result = die.roll()
    results.append(result)

print(results)

Analyzing the Results

from die import Die


die = Die()
results = []
for roll_num in range(100):
    result = die.roll()
    results.append(result)

print(results)

# Analyze the results.
frequencies = []
poss_results = range(1, die.num_sides+1)
for value in poss_results:
    frequency = results.count(value)
    frequencies.append(frequency)

print(frequencies)
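The counting loop above calls results.count() once per face, which scans the whole list each time. As an alternative sketch (not the book's approach), collections.Counter from the standard library tallies every value in one pass. The sample results list below is made up for illustration.

```python
from collections import Counter

# Hypothetical results from ten rolls of a six-sided die.
results = [3, 6, 1, 3, 5, 6, 6, 2, 3, 1]
num_sides = 6

# Counter tallies every value in a single pass over the list.
counts = Counter(results)

# Look up each possible face; faces that never came up count as 0.
frequencies = [counts[value] for value in range(1, num_sides + 1)]
print(frequencies)  # [2, 1, 3, 0, 1, 3]
```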

Making a Histogram

With the data we need in hand, we can create a chart using Plotly Express.

from die import Die
import plotly.express as px


die = Die()
results = []
for roll_num in range(1000):
    result = die.roll()
    results.append(result)

print(results)

# Analyze the results.
frequencies = []
poss_results = range(1, die.num_sides+1)
for value in poss_results:
    frequency = results.count(value)
    frequencies.append(frequency)

print(frequencies)

# Visualize the results.
fig = px.bar(x=poss_results, y=frequencies)
fig.show()

Calling fig.show() tells Plotly to render the resulting chart as an HTML file and open that file in a new browser tab.

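This chart is very simple, and it is certainly not complete. But that is exactly how Plotly Express is meant to be used: you write a couple of lines of code, look at the plot, and check that it represents the data the way you want. If you are roughly happy with the result, you can go on to customize elements such as labels and styles. If you want to try another chart type, you can switch right away without spending extra time customizing the current one, for example by replacing px.bar() (bar chart) with px.scatter() (scatter plot) or px.line() (line chart).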

Customizing the Plot

title = 'Result of Rolling One D6 1000 Times'
labels = {'x': 'Result', 'y': 'Frequency of Result'}
fig = px.bar(x=poss_results, y=frequencies, title=title, labels=labels)
fig.show()

Rolling Two Dice

from die import Die
import plotly.express as px


die_1 = Die()
die_2 = Die()

results_1 = []
results_2 = []
results = []
for roll_num in range(1000):
    result_1 = die_1.roll()
    result_2 = die_2.roll()
    result = result_1 + result_2
    results_1.append(result_1)
    results_2.append(result_2)
    results.append(result)

# Analyze the results.
frequencies = []
poss_results = range(2, 2*die_1.num_sides+1)
for value in poss_results:
    frequency = results.count(value)
    frequencies.append(frequency)

print(frequencies)

# Visualize the results.
title = 'Results of Rolling Two D6 Dice 1000 Times'
labels = {'x': 'Result', 'y': 'Frequency of Result'}
fig = px.bar(x=poss_results, y=frequencies, title=title, labels=labels)
fig.show()

Further Customizations

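Plotly provides the update_layout() method for making a variety of changes to a figure after it has been created. Here is how to tell Plotly to give each bar its own label: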

fig.update_layout(xaxis_dtick=1)

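This calls update_layout() on the fig object, which represents the overall chart. The xaxis_dtick argument specifies the distance between tick marks on the x-axis; setting it to 1 gives every bar a label.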

Rolling Dice of Different Sizes

from die import Die
import plotly.express as px


die_1 = Die()
die_2 = Die(10)

results_1 = []
results_2 = []
results = []
for roll_num in range(50000):
    result_1 = die_1.roll()
    result_2 = die_2.roll()
    result = result_1 + result_2
    results_1.append(result_1)
    results_2.append(result_2)
    results.append(result)

# Analyze the results.
frequencies = []
poss_results = range(2, die_1.num_sides+die_2.num_sides+1)
for value in poss_results:
    frequency = results.count(value)
    frequencies.append(frequency)

print(frequencies)

# Visualize the results.
title = 'Results of Rolling a D6 and a D10 50,000 Times'
labels = {'x': 'Result', 'y': 'Frequency of Result'}
fig = px.bar(x=poss_results, y=frequencies, title=title, labels=labels)
fig.update_layout(xaxis_dtick=1)
fig.show()
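To judge whether the simulated frequencies look reasonable, it helps to know how many of the equally likely face combinations produce each sum. The following sketch enumerates them with itertools.product; it is a side calculation added here, and the variable names are illustrative.

```python
from itertools import product

d6_faces = range(1, 7)
d10_faces = range(1, 11)

# Count how many of the 60 equally likely (d6, d10) pairs produce each sum.
ways = {}
for face_1, face_2 in product(d6_faces, d10_faces):
    total = face_1 + face_2
    ways[total] = ways.get(total, 0) + 1

# Print each sum, the number of ways to roll it, and its probability.
for total in sorted(ways):
    print(total, ways[total], f"{ways[total] / 60:.3f}")
```

The sums 7 through 11 are tied as the most likely outcomes (6 ways each), so the simulated chart should show a flat plateau across those five bars rather than a single peak.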

Downloading Data

The CSV File Format

A file whose data is organized as a series of values separated by commas is called a CSV file.

CSV files can be tedious for people to read, but programs can extract and process their information quickly and accurately.

Parsing the CSV File Headers

from pathlib import Path
import csv


path = Path('D:/python/Lib/site-packages/python_work/36736099999.csv')
lines = path.read_text().splitlines()

reader = csv.reader(lines)
header_row = next(reader)
print(header_row)

This creates a reader object that parses each line of the file. To create it, we call csv.reader() and pass it the list of lines from the CSV file.

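When given a reader object, the next() function returns the next line in the file, starting from the beginning. Here we call next() only once, and it is the first call, so we get the first line of the file, which contains the file headers.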

Printing the Headers and Their Positions

from pathlib import Path
import csv


path = Path('D:/python/Lib/site-packages/python_work/36736099999.csv')
lines = path.read_text().splitlines()

reader = csv.reader(lines)
header_row = next(reader)

for index, column_header in enumerate(header_row):
    print(index, column_header)

Calling enumerate() on the list gives the index of each item along with its value.
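Once the header positions are known, one optional refinement is to store them in a dictionary so that later code can refer to columns by name instead of by a bare index. This is a sketch under assumptions: the two CSV lines and the column names STATION, DATE, and TAVG are invented for the example and may not match the real file.

```python
import csv

# Hypothetical CSV content; the real file's column names may differ.
lines = [
    "STATION,DATE,TAVG",
    "36736099999,1952-01-01,41",
]

reader = csv.reader(lines)
header_row = next(reader)

# Map each column name to its position so rows can be indexed by name.
column_index = {name: index for index, name in enumerate(header_row)}

first_row = next(reader)
print(first_row[column_index["DATE"]])  # 1952-01-01
print(first_row[column_index["TAVG"]])  # 41
```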

Extracting and Reading Data

Let's try extracting all the data in a single column:

from pathlib import Path
import csv


path = Path('D:/python/Lib/site-packages/python_work/36736099999.csv')
lines = path.read_text().splitlines()

reader = csv.reader(lines)
header_row = next(reader)

# Extract the average temperatures.
temps = []
for row in reader:
    temp = float(row[6])
    temps.append(temp)

print(temps)

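We first create an empty list called temps, then loop through the remaining rows of the file. The reader object continues from where it left off in the CSV file and automatically returns the next line after its current position on each pass. Because the header row has already been read, the loop starts at the second line, where the actual data begins. On each pass we append the value at index 6 to temps. The file stores this value as a string, so we convert it to a number with float() before appending it.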

Plotting Data in a Temperature Chart

To visualize this temperature data, we first use Matplotlib to make a simple plot of the daily average temperatures:

from pathlib import Path
import csv
import matplotlib.pyplot as plt


path = Path('D:/python/Lib/site-packages/python_work/36736099999.csv')
lines = path.read_text().splitlines()

reader = csv.reader(lines)
header_row = next(reader)
print(header_row)

# Extract the average temperatures.
temps = []
for row in reader:
    temp = float(row[6])
    temps.append(temp)

print(temps)

# Plot the temperatures.
plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
ax.plot(temps, color='red')

# Format the plot.
ax.set_title('Average Temperatures, 1952', fontsize=24)
ax.set_xlabel('', fontsize=16)
ax.set_ylabel('Temperatures(F)', fontsize=16)
# tick_params() sets the style of the tick labels.
ax.tick_params(labelsize=16)

plt.show()

The datetime Module

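Dates are read from the file as strings, so we need a way to convert a string such as "2021-07-01" into an object representing that date. To build an object representing July 1, 2021, we can use the strptime() method from the datetime module: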

from datetime import datetime

first_date = datetime.strptime('2021-07-01', '%Y-%m-%d')
print(first_date)

2021-07-01 00:00:00

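We first import the datetime class from the datetime module, then call strptime() with the string containing the date as the first argument. The second argument tells Python how the date is formatted.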

The second argument to strptime() can take a variety of codes that begin with %, and it uses them to decide how to interpret the date.
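A few of the % codes are shown in the sketch below, which parses three differently formatted strings. The sample strings are made up for illustration; the codes used (%Y, %y, %m, %d, %H, %M) are standard datetime format codes.

```python
from datetime import datetime

# Each pair is (date string, format that describes it).
samples = [
    ("2021-07-01", "%Y-%m-%d"),              # four-digit year, month, day
    ("07/01/21", "%m/%d/%y"),                # two-digit year
    ("2021-07-01 14:30", "%Y-%m-%d %H:%M"),  # date plus 24-hour time
]

parsed = [datetime.strptime(text, fmt) for text, fmt in samples]
for moment in parsed:
    print(moment)

# strftime() goes the other way, turning a datetime back into text.
print(parsed[0].strftime("%d/%m/%Y"))  # 01/07/2021
```

If the format does not match the string, strptime() raises a ValueError, which is worth keeping in mind when a data file mixes date formats.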

Plotting Dates

We can now improve the temperature plot by extracting the dates along with the average temperatures and using the dates as x-values:

from pathlib import Path
import csv
import matplotlib.pyplot as plt
from datetime import datetime


path = Path('D:/python/Lib/site-packages/python_work/36736099999.csv')
lines = path.read_text().splitlines()

reader = csv.reader(lines)
header_row = next(reader)
print(header_row)

# Extract the dates and average temperatures.
temps, dates = [], []
for row in reader:
    current_date = datetime.strptime(row[1], '%Y-%m-%d')
    temp = float(row[6])
    temps.append(temp)
    dates.append(current_date)

print(temps)
print(dates)

# Plot the temperatures.
plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
ax.plot(dates, temps, color='red')

# Format the plot.
ax.set_title('Average Temperatures, 1952', fontsize=24)
ax.set_xlabel('', fontsize=16)
ax.set_ylabel('Temperatures(F)', fontsize=16)
# Draw the date labels diagonally to keep them from overlapping.
fig.autofmt_xdate()
# tick_params() sets the style of the tick labels.
ax.tick_params(labelsize=16)

plt.show()

Because the x-values are now datetime objects, Matplotlib spaces the date labels sensibly along the axis.

Plotting a Longer Timeframe

The approach is the same as above, applied to a data file that covers a longer period.

Plotting a Second Data Series

A single chart can contain two data series:

ax.plot(dates, highs, color='red')

ax.plot(dates, lows, color='blue')

Shading an Area in the Chart

The fill_between() method takes a series of x-values and two series of y-values, and fills the space between the two series of y-values.

ax.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)

The alpha argument controls a color's transparency: 0 is completely transparent, and 1 is completely opaque.

Error Checking

When data is missing, a try-except-else block is a reasonable way to handle it:

for row in reader:
    current_date = datetime.strptime(row[1], '%Y-%m-%d')
    try:
        high = int(row[3])
        low = int(row[4])
    except ValueError:
        print(f"Missing data for {current_date}")
    else:
        dates.append(current_date)
        highs.append(high)
        lows.append(low)
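The effect of this pattern is easiest to see on a tiny dataset. In the sketch below, the rows list stands in for the CSV reader and is entirely hypothetical (station name, column order, and values are invented); the second row is missing its high temperature.

```python
from datetime import datetime

# Hypothetical rows: station, date, rainfall, high, low. The second row has no high.
rows = [
    ["STATION_A", "2021-07-01", "0.0", "120", "92"],
    ["STATION_A", "2021-07-02", "0.0", "", "90"],
    ["STATION_A", "2021-07-03", "0.0", "118", "89"],
]

dates, highs, lows = [], [], []
for row in rows:
    current_date = datetime.strptime(row[1], "%Y-%m-%d")
    try:
        high = int(row[3])
        low = int(row[4])
    except ValueError:
        # int("") raises ValueError, so this row is reported and skipped.
        print(f"Missing data for {current_date}")
    else:
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs)  # [120, 118]
print(lows)   # [92, 89]
```

Because the appends sit in the else block, a row is added only when both conversions succeed, so the three lists always stay the same length.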

Just for fun, the same approach applied to a different data file:

from pathlib import Path
import csv
import matplotlib.pyplot as plt
from datetime import datetime


path = Path('D:/python/Lib/site-packages/python_work/3639474.csv')
lines = path.read_text().splitlines()

reader = csv.reader(lines)
header_row = next(reader)
print(header_row)

# Extract the dates and the high and low temperatures.
tmaxs, tmins, dates = [], [], []
for row in reader:
    current_date = datetime.strptime(row[2], '%Y-%m-%d')
    try:
        tmax = int(row[4])
        tmin = int(row[5])
    except ValueError:
        print(f"Missing data for {current_date}")
    else:
        dates.append(current_date)
        tmaxs.append(tmax)
        tmins.append(tmin)


print(dates)
print(tmaxs)
print(tmins)

# Plot the high and low temperatures.
plt.style.use('seaborn-v0_8')
fig, ax = plt.subplots()
ax.plot(dates, tmaxs, color='red')
ax.plot(dates, tmins, color='blue')

# Format the plot.
ax.set_title('Temperatures in 2024', fontsize=24)
ax.set_xlabel('', fontsize=16)
ax.set_ylabel('Temperatures(F)', fontsize=16)
# Draw the date labels diagonally to keep them from overlapping.
fig.autofmt_xdate()
# tick_params() sets the style of the tick labels.
ax.tick_params(labelsize=16)
ax.fill_between(dates, tmaxs, tmins, facecolor='blue', alpha=0.5)

plt.show()

Making a Global Earthquake Scatter Plot: The GeoJSON Format

GeoJSON data is processed with the json module.

Earthquake Data

If you open a GeoJSON file directly, you will find the contents densely packed and hard to read; the data is formatted for machines.

Examining GeoJSON Data

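The json module provides a variety of tools for exploring and working with JSON data. Some of them help reformat the file so we can look at the raw data more easily before deciding how to work with it programmatically.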

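We start by loading the data and displaying it in a format that is easier to read. The data file is long, so rather than printing it, we write the data to another file, which we can then open and scroll through easily.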

from pathlib import Path
import json


# Read the data as a string and convert it to a Python object.
path = Path('D:/python/Lib/site-packages/python_work/赣州市.geoJson')
contents = path.read_text('utf-8-sig')
all_eq_data = json.loads(contents)

# Create a more readable version of the data file.
path = Path('D:/python/Lib/site-packages/python_work/readable_赣州市.geoJson')
readable_contents = json.dumps(all_eq_data, indent=4)
path.write_text(readable_contents)

GeoJSON lists coordinates in (longitude, latitude) order.
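The loading, pretty-printing, and coordinate order can all be tried on a small string before touching a large file. The document below is a hypothetical single-feature example in the GeoJSON layout; its property names (mag, title) and values are assumptions modeled on earthquake feeds, not taken from the author's data file.

```python
import json

# A hypothetical single-feature document in the GeoJSON layout.
contents = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"mag": 1.6, "title": "M 1.6 - example location"},
   "geometry": {"type": "Point", "coordinates": [-150.7585, 61.7591, 56.3]}}
]}
"""

data = json.loads(contents)

# indent=4 pretty-prints the nested structure for reading.
print(json.dumps(data, indent=4))

# Coordinates are ordered longitude first, then latitude.
lon, lat = data["features"][0]["geometry"]["coordinates"][:2]
print(lon, lat)  # -150.7585 61.7591
```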

Making a List of All Earthquakes

from pathlib import Path
import json


# Read the data as a string and convert it to a Python object.
path = Path('D:/python/Lib/site-packages/python_work/赣州市.geoJson')
contents = path.read_text('utf-8-sig')
all_eq_data = json.loads(contents)

# Examine all earthquakes in the dataset.
all_eq_dicts = all_eq_data['features']
print(len(all_eq_dicts))

Extracting Magnitudes

from pathlib import Path
import json


# Read the data as a string and convert it to a Python object.
path = Path('D:/python/Lib/site-packages/python_work/赣州市.geoJson')
contents = path.read_text('utf-8-sig')
all_eq_data = json.loads(contents)

# Examine all earthquakes in the dataset.
all_eq_dicts = all_eq_data['features']
print(len(all_eq_dicts))

mags = []
for eq_dict in all_eq_dicts:
    mag = eq_dict['properties']['mag']   # the value of a dictionary key can itself be a dictionary
    mags.append(mag)
    
print(mags[:10])

Extracting Location Data

from pathlib import Path
import json


# Read the data as a string and convert it to a Python object.
path = Path('D:/python/Lib/site-packages/python_work/赣州市.geoJson')
contents = path.read_text('utf-8-sig')
all_eq_data = json.loads(contents)

# Examine all earthquakes in the dataset.
all_eq_dicts = all_eq_data['features']
print(len(all_eq_dicts))

mags, titles, lons, lats = [], [], [], []
for eq_dict in all_eq_dicts:
    mag = eq_dict['properties']['mag']
    title = eq_dict['properties']['title']
    lon = eq_dict['geometry']['coordinates'][0]  # nested lookup: dictionary, then dictionary, then list
    lat = eq_dict['geometry']['coordinates'][1]
    mags.append(mag)
    titles.append(title)
    lons.append(lon)
    lats.append(lat)

With this data in hand, we can draw the earthquake scatter plot.

Plotting the Earthquake Scatter Plot

The code for the initial scatter plot looks like this:

from pathlib import Path
import json
import  plotly.express as px


# Read the data as a string and convert it to a Python object.
path = Path('D:/python/Lib/site-packages/python_work/赣州市.geoJson')
contents = path.read_text('utf-8-sig')
all_eq_data = json.loads(contents)

# Examine all earthquakes in the dataset.
all_eq_dicts = all_eq_data['features']
print(len(all_eq_dicts))

mags, titles, lons, lats = [], [], [], []
for eq_dict in all_eq_dicts:
    mag = eq_dict['properties']['mag']
    title = eq_dict['properties']['title']
    lon = eq_dict['geometry']['coordinates'][0]  # nested lookup: dictionary, then dictionary, then list
    lat = eq_dict['geometry']['coordinates'][1]
    mags.append(mag)
    titles.append(title)
    lons.append(lon)
    lats.append(lat)

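# The axis labels '经度' and '纬度' mean longitude and latitude;
# the title means "Global Earthquake Scatter Plot".
fig = px.scatter(
    x=lons,
    y=lats,
    labels={'x': '经度', 'y': '纬度'},
    range_x=[-200, 200],
    range_y=[-90, 90],
    width=800,
    height=800,
    title='全球地震散点图',
    )
fig.write_html('global_earthquakes.html')
fig.show()

The fig.write_html() method saves the figure as an .html file. Find global_earthquakes.html in the working folder and open it in a browser. If you are using Jupyter Notebook, you can call fig.show() to display the scatter plot directly in a notebook cell.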

Another Way of Specifying Chart Data

At the moment, the longitude and latitude data are configured by hand:

x=lons,
y=lats,
labels={'x':'经度', 'y':'纬度'}

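This is one of the simplest ways to give Plotly Express data for a chart, but it is not the best fit for data processing work. An equivalent approach uses the pandas data analysis library. We first create a DataFrame that bundles the data we need (the column names mean longitude, latitude, location, and magnitude):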

import pandas as pd

data = pd.DataFrame(
    data=zip(lons, lats, titles, mags), columns=['经度', '纬度', '位置', '震级']
    )
data.head()

Then the arguments change to:

data,
x='经度',
y='纬度',

About zip():

Reference: the CSDN blog post 详细分析Python遇到的各种数据结构Map、Dict、Set、DataFrame、Series、Zip.
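In short, zip() walks several sequences in step and yields one tuple per position, which is why it can turn separate column lists into rows. The sketch below uses made-up sample values and plain Python only.

```python
lons = [-150.7585, -153.4716]
lats = [61.7591, 59.3152]
titles = ["example quake A", "example quake B"]
mags = [1.6, 3.2]

# zip() pairs up the items at the same position in each list, one tuple per row.
rows = list(zip(lons, lats, titles, mags))
print(rows[0])  # (-150.7585, 61.7591, 'example quake A', 1.6)

# Applying zip() to the unpacked rows separates the columns again.
lons_again, lats_again, titles_again, mags_again = zip(*rows)
print(list(mags_again))  # [1.6, 3.2]
```

Note that zip() stops at the shortest input, so lists of unequal length silently lose their extra items.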

Customizing Marker Size

fig = px.scatter(
    data,
    x='经度',
    y='纬度',
    range_x=[-200,200],
    range_y=[-90, 90],
    width=800,
    height=800,
    title='全球地震散点图',
    size='震级',
    size_max=10,
    )
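Here the size argument specifies the size of each marker in the scatter plot; we only need to pass it the '震级' (magnitude) column from data. The default maximum marker size is 20 pixels, and size_max=10 reduces the largest marker to 10 pixels.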

Customizing Marker Colors

# Fall back to UTF-8 if the platform's default encoding cannot decode the file.
try:
    contents = path.read_text()
except UnicodeDecodeError:
    contents = path.read_text(encoding='utf-8')

fig = px.scatter(
    data,
    x='经度',
    y='纬度',
    range_x=[-200,200],
    range_y=[-90, 90],
    width=800,
    height=800,
    title='全球地震散点图',
    size='震级',
    size_max=10,
    color='震级',
    )

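By default, the color scale in the legend runs from blue through red to yellow: the lower the value, the bluer the marker, and the higher the value, the yellower the marker.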

Other Color Scales

import plotly.express as px

px.colors.named_colorscales()

These two lines of code list the color scales that are available.

Adding Hover Text

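To finish this scatter plot, we add some informative text that appears when you hover over the marker representing an earthquake. In addition to the longitude and latitude shown by default, the hover text will show the magnitude and the approximate location of the earthquake.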

fig = px.scatter(
    data,
    x='经度',
    y='纬度',
    range_x=[-200,200],
    range_y=[-90, 90],
    width=800,
    height=800,
    title='全球地震散点图',
    size='震级',
    size_max=10,
    color='震级',
    hover_name='位置'
    )

Working with APIs

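An API is a part of a website designed to interact with programs. Those programs use very specific URLs to request particular information; a request of this kind is called an API call. The requested data is returned in a format that programs can process easily, such as JSON or CSV. Most applications that rely on external data sources depend on API calls.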

Git and GitHub

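Projects on GitHub are stored in repositories. Next we will write a program that automatically downloads information about the most-starred Python projects on GitHub and then visualizes that information.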

Requesting Data Using an API Call

Enter the following address in your browser's address bar and press Enter:

https://api.github.com/search/repositories?q=language:python+sort:stars

This API call returns the number of Python projects currently hosted on GitHub, along with information about the most popular Python repositories.

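https://api.github.com/ is the address of GitHub's API. The next part, search/repositories, tells the API to search all repositories on GitHub. The question mark after repositories signals that we are about to pass an argument. The parameter q stands for query, and the equals sign lets us begin specifying the query (q=). With language:python we ask only for repositories whose primary language is Python. The final part, +sort:stars, sorts the projects by the number of stars.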
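The pieces of this URL can also be pulled apart programmatically, which makes the structure described above concrete. The sketch uses urllib.parse from the standard library and makes no network request.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.github.com/search/repositories?q=language:python+sort:stars"
parts = urlsplit(url)

print(parts.netloc)  # api.github.com
print(parts.path)    # /search/repositories

# parse_qs() decodes the query string; a "+" in a query stands for a space.
query = parse_qs(parts.query)
print(query["q"][0].split())  # ['language:python', 'sort:stars']
```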

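An incomplete_results value of true means GitHub did not finish processing the query. To make sure the API can respond to all users promptly, GitHub limits how long each query can run.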

Installing Requests

python -m pip install --user requests

Processing an API Response

Now we write a program that makes the API call automatically and processes the results:

import requests

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

# Process the results.
print(response_dict.keys())

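By specifying headers, we explicitly ask for this version of the API and for results in JSON format. The Accept header tells GitHub which API version and response format the client expects.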

We use requests to call the API: requests.get() sends an HTTP GET request to the given URL and returns a response object.

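The response object has an attribute called status_code; a status code of 200 indicates success. We print status_code to verify that the call succeeded. Because we asked the API to return JSON, we use the json() method to convert the information to a Python dictionary, assign it to response_dict, and finally print the keys of response_dict.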

Working with the Response Dictionary

Once the information from the API call is stored in a dictionary, we can work with the data it contains.

import requests

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

# Process the results.
print(response_dict.keys())
print(f"Total repositories: {response_dict['total_count']}")
print(f"Complete results: {not response_dict['incomplete_results']}")

# Explore information about the repositories.
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

# Examine the first repository.
repo_dict = repo_dicts[0]
print(f"\nKeys:{len(repo_dict)}")
for key in sorted(repo_dict.keys()):
    print(key)

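As you can see, GitHub had enough time to finish processing this API call. In this response, GitHub returned information about the first 30 repositories that match the query. To get information about more repositories, you can request additional pages of data.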

Next, let's pull out the values for some of the keys in repo_dict:

import requests

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

# Process the results.
print(response_dict.keys())
print(f"Total repositories: {response_dict['total_count']}")
print(f"Complete results: {not response_dict['incomplete_results']}")

# Explore information about the repositories.
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

# Examine the first repository.
repo_dict = repo_dicts[0]
print(f"\nKeys:{len(repo_dict)}")
for key in sorted(repo_dict.keys()):
    print(key)

print("\nSelected information about first repository:")
print(f"Name:{repo_dict['name']}")
print(f"Owner:{repo_dict['owner']['login']}")    # nested dictionary lookup
print(f"Stars:{repo_dict['stargazers_count']}")
print(f"Repository:{repo_dict['html_url']}")
print(f"Created:{repo_dict['created_at']}")
print(f"Updated:{repo_dict['updated_at']}")
print(f"Description:{repo_dict['description']}")

Summarizing the Top Repositories

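When we visualize this data, we want to include more than one repository. So let's write a loop that prints selected information about each repository the API call returns, so that we can include that information in the chart: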

import requests

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

# Process the results.
print(response_dict.keys())
print(f"Total repositories: {response_dict['total_count']}")
print(f"Complete results: {not response_dict['incomplete_results']}")

# Explore information about the repositories.
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

print("\nSelected information about each repository:")

for repo_dict in repo_dicts:
    print(f"\nName:{repo_dict['name']}")
    print(f"Owner:{repo_dict['owner']['login']}")
    print(f"Stars:{repo_dict['stargazers_count']}")
    print(f"Repository:{repo_dict['html_url']}")
    print(f"Description:{repo_dict['description']}")

Monitoring API Rate Limits

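Most APIs are rate limited, meaning there is a limit on how many requests you can make in a given period. To see whether you are approaching GitHub's limits, enter the following address in a browser: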

https://api.github.com/rate_limit

The response looks like this:


resources
    core                    {…}
    graphql                 {…}
    integration_manifest    {…}
    search
        limit       10
        remaining   10
        reset       1711698161
        used        0
        resource    "search"
rate    {…}

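We can see that the limit is 10 requests per minute and that, as the remaining value shows, all 10 are still available in the current minute. The value for the reset key is the time at which the quota will reset, given in Unix or epoch time (the number of seconds since midnight on January 1, 1970). Once you use up your quota, you will receive a short response telling you that you have reached the API limit, and you must wait until the quota resets.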
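The reset value is easier to interpret once it is converted to a calendar time. The sketch below copies the numbers from the search entry shown above into a plain dictionary (the name search_limit is illustrative) and converts the timestamp with the standard library; it makes no network request.

```python
from datetime import datetime, timezone

# Values copied from the "search" entry of the response shown above.
search_limit = {"limit": 10, "remaining": 10, "reset": 1711698161, "used": 0}

# Interpret the reset value as seconds since 1970-01-01 00:00:00 UTC.
reset_time = datetime.fromtimestamp(search_limit["reset"], tz=timezone.utc)
print(reset_time.isoformat())  # 2024-03-29T07:42:41+00:00

requests_left = search_limit["remaining"]
print(f"{requests_left} of {search_limit['limit']} search requests left")
```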

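Note: many APIs require you to register and obtain an API key (access token) before you can make API calls. At the time the book was written, GitHub had no such requirement, but you get a much higher quota once you have an access token.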

Visualizing Repositories Using Plotly

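Now let's use the data we have collected to make a chart showing the relative popularity of Python projects on GitHub. We will make an interactive bar chart in which the height of each bar represents the number of stars the project has earned, and clicking a bar's label takes you to that project's home page on GitHub.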

import requests
import plotly.express as px

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

print(f"Complete results: {not response_dict['incomplete_results']}")

# Process the repository information.
repo_dicts = response_dict['items']
repo_names, stars = [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization.
fig = px.bar(x=repo_names, y=stars)
fig.show()

Styling the Chart

import requests
import plotly.express as px

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

print(f"Complete results: {not response_dict['incomplete_results']}")

# Process the repository information.
repo_dicts = response_dict['items']
repo_names, stars = [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization.
title = "Most-Starred Python Projects on GitHub"
labels = {'x': 'Repository', 'y': 'Stars'}
fig = px.bar(x=repo_names, y=stars, title=title, labels=labels)
fig.update_layout(title_font_size=28, xaxis_title_font_size=20,
                  yaxis_title_font_size=20)

fig.show()

Adding Custom Tooltips

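In Plotly, hovering over a bar shows the information that the bar represents; this is commonly called a tooltip. At the moment it shows the number of stars a project has. Let's add a custom tooltip that shows each project's description and owner: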

import requests
import plotly.express as px

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

print(f"Complete results: {not response_dict['incomplete_results']}")

# Process the repository information.
repo_dicts = response_dict['items']
repo_names, stars, hover_texts = [], [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

    # Build the hover text.
    owner = repo_dict['owner']['login']
    description = repo_dict['description']
    hover_text = f"{owner}<br />{description}"
    hover_texts.append(hover_text)

# Make the visualization.
title = "Most-Starred Python Projects on GitHub"
labels = {'x': 'Repository', 'y': 'Stars'}
fig = px.bar(x=repo_names, y=stars, title=title, labels=labels, hover_name=hover_texts)
fig.update_layout(title_font_size=28, xaxis_title_font_size=20,
                  yaxis_title_font_size=20)

fig.show()

<br /> is the HTML line-break tag; it puts the owner and the description on separate lines.

Adding Clickable Links

import requests
import plotly.express as px

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Convert the response object to a dictionary.
response_dict = r.json()

print(f"Complete results: {not response_dict['incomplete_results']}")

# Process the repository information.
repo_dicts = response_dict['items']
repo_links, stars, hover_texts = [], [], []
for repo_dict in repo_dicts:
    # Turn the repository name into a link.
    repo_name = repo_dict['name']
    repo_url = repo_dict['html_url']
    repo_link = f"<a href='{repo_url}'>{repo_name}</a>"
    repo_links.append(repo_link)
    stars.append(repo_dict['stargazers_count'])

    # Build the hover text.
    owner = repo_dict['owner']['login']
    description = repo_dict['description']
    hover_text = f"{owner}<br />{description}"
    hover_texts.append(hover_text)

# Make the visualization.
title = "Most-Starred Python Projects on GitHub"
labels = {'x': 'Repository', 'y': 'Stars'}
fig = px.bar(x=repo_links, y=stars, title=title, labels=labels, hover_name=hover_texts)
fig.update_layout(title_font_size=28, xaxis_title_font_size=20,
                  yaxis_title_font_size=20)

fig.show()

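Here repo_names has been renamed repo_links to describe more accurately what the list holds. We then pull the project's URL from repo_dict and assign it to the temporary variable repo_url. Next we build a link to the project using the HTML anchor tag, which has the form <a href='URL'>link text</a>, and append the link to the list repo_links.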
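The string building can be isolated into small helpers and checked without calling the API. The sketch below is one possible arrangement, not the book's code: the function names make_repo_link and make_hover_text, the fallback text, and the example repository are all assumptions. It also escapes the inserted values and tolerates a missing description, on the assumption that the description field may be None for some repositories.

```python
from html import escape


def make_repo_link(repo_name, repo_url):
    """Build an HTML anchor tag whose visible text is the repository name."""
    # escape() guards against characters such as & or < in the name or URL.
    return f"<a href='{escape(repo_url, quote=True)}'>{escape(repo_name)}</a>"


def make_hover_text(owner, description):
    """Join owner and description with an HTML line break."""
    # Fall back to placeholder text when no description is available.
    return f"{owner}<br />{description or 'No description provided'}"


link = make_repo_link("example-repo", "https://github.com/example-owner/example-repo")
print(link)
print(make_hover_text("example-owner", None))
```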

The resulting chart is interactive: clicking a project name at the bottom of the chart opens that project's home page on GitHub.

Customizing Marker Colors

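After a figure has been created, you can customize nearly every aspect of it with methods whose names begin with update_. We have already used update_layout(); update_traces() customizes the data that is represented on the chart.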

fig.update_traces(marker_color='SteelBlue', marker_opacity=0.6)

This changes the bars to a darker, semitransparent blue.

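In Plotly, a trace is a collection of data on a chart. The update_traces() method accepts a large number of arguments; any argument that starts with marker_ affects the markers on the chart. Here we set each marker's color to 'SteelBlue'; any named CSS color works for marker_color. We also set each marker's opacity to 0.6; an opacity of 1.0 is fully opaque and 0 is fully transparent.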

More About Plotly and the GitHub API

To learn more about Plotly, read the article Plotly Express in Python.

To learn more about customizing Plotly charts, read the article Styling Plotly Express Figures in Python.

To learn more about the GitHub API, refer to its documentation.

Source: https://blog.csdn.net/a_bear_in_Spring/article/details/136937683
