Python 3.13 with the GIL disabled is very slow

Tags: python performance gil pep python-3.12

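I ran a simple performance test comparing Python 3.12.0 with Python 3.13.0b3 built with the --disable-gil configure flag. The program computes Fibonacci numbers using either a ThreadPoolExecutor or a ProcessPoolExecutor. The PEP that introduces the GIL-free build says there is some overhead, mainly due to biased reference counting and per-object locking (https://peps.python.org/pep-0703/#performance), but it puts the overhead on the pyperformance benchmark suite at about 5-8%. My simple benchmark shows a much larger difference. Python 3.13 without the GIL does use all CPUs with a ThreadPoolExecutor, but it is much slower than Python 3.12 with the GIL. Judging by CPU utilization and elapsed time, Python 3.13 needs several times as many clock cycles as 3.12 for the same work. Program code (it uses ThreadPoolExecutor by default and ProcessPoolExecutor when run with the argument 1):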

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import datetime
from functools import partial
import sys
import logging
import multiprocessing

logging.basicConfig(
    format='%(levelname)s: %(message)s',
)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
cpus = multiprocessing.cpu_count()
# Pass "1" as the first command-line argument to use processes; otherwise threads are used.
pool_executor = ProcessPoolExecutor if len(sys.argv) > 1 and sys.argv[1] == '1' else ThreadPoolExecutor
python_version_str = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}'
logger.info(f'Executor={pool_executor.__name__}, python={python_version_str}, cpus={cpus}')


def fibonacci(n: int) -> int:
    if n < 0:
        raise ValueError("Incorrect input")
    elif n == 0:
        return 0
    elif n == 1 or n == 2:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

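if __name__ == '__main__':
    # The __main__ guard keeps ProcessPoolExecutor workers from re-running the
    # benchmark when they are started with the "spawn" method (e.g. Windows, macOS).
    start = datetime.datetime.now()

    with pool_executor(8) as executor:
        futures = [executor.submit(partial(fibonacci, 30)) for _ in range(30)]
    # Leaving the with-block already calls executor.shutdown(wait=True).

    # Surface any exception raised inside a worker instead of silently dropping it.
    for future in futures:
        future.result()

    end = datetime.datetime.now()
    elapsed = end - start
    logger.info(f'Elapsed: {elapsed.total_seconds():.2f} seconds')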

Test results:

# TEST Linux 5.15.0-58-generic, Ubuntu 20.04.6 LTS

INFO: Executor=ThreadPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 10.54 seconds

INFO: Executor=ProcessPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 4.33 seconds

INFO: Executor=ThreadPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.48 seconds

INFO: Executor=ProcessPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.03 seconds

Can anyone explain why I see such a large difference compared with the overhead reported for the pyperformance benchmark suite?

Here is why you see such a large performance difference between Python 3.13 without the GIL and Python 3.12, along with some notes on your benchmark.

The performance difference you observe comes mainly from two factors:

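No-GIL overhead: Although PEP 703 quotes an overhead of about 5-8% on pyperformance, that figure varies considerably with the workload. The overhead comes mainly from two sources:
- Biased reference counting: Only the thread that owns an object (the thread that created it) can update its reference count with cheap non-atomic operations, and even that path needs an extra ownership check; every other thread must use slower atomic operations on a separate shared counter. This adds cost, especially in workloads that create and destroy many objects or that share objects between threads.
- Per-object locking: To protect shared objects, the free-threaded build uses fine-grained locks, for example on built-in containers such as list and dict. When several threads access the same object, these locks cause contention and reduce performance, particularly in highly contended workloads.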
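Benchmark choice: The benchmark you chose (recursive Fibonacci on a 2-CPU machine) magnifies the per-thread overhead and leaves little room for parallelism to compensate:
- CPU-bound: Computing Fibonacci numbers is purely CPU-bound, and the 30 tasks share no data, so lock contention plays almost no role here; what matters is how fast each thread executes Python code. Your own numbers show this: on 3.13 the ProcessPoolExecutor run (22.03 s), in which every worker is a separate process, is as slow as the ThreadPoolExecutor run (22.48 s), so the slowdown lies in per-thread execution speed rather than in synchronization between threads. With only 2 CPUs, parallel threads can give at most a 2x speedup, which cannot offset a per-thread slowdown of that size.
- Recursion overhead: Your Fibonacci implementation is recursive and makes a very large number of function calls (a single fibonacci(30) call makes 1,664,079 calls, and the script runs it 30 times). Every call creates a frame and updates the reference counts of its arguments and return value, so any per-call overhead of the free-threaded build is multiplied by tens of millions of calls.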
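The size of the recursion can be checked with a short sketch. It copies the question's `fibonacci` into `fibonacci_counted` (a name introduced here for illustration) with a call counter added, and compares the count with the closed form C(n) = 2*F(n) - 1, which holds for this implementation when n >= 1: each call with n >= 3 makes two further calls, and the base cases n = 1 and n = 2 make none.

```python
def fibonacci_counted(n: int, counter: list[int]) -> int:
    """Same recursion as the question's fibonacci(), but counts every call."""
    counter[0] += 1
    if n < 0:
        raise ValueError("Incorrect input")
    elif n == 0:
        return 0
    elif n == 1 or n == 2:
        return 1
    return fibonacci_counted(n - 1, counter) + fibonacci_counted(n - 2, counter)


def fib_iterative(n: int) -> int:
    """Loop-based Fibonacci, used only to check the call-count formula."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def count_calls(n: int) -> int:
    """Number of calls fibonacci(n) makes in total, including the first one."""
    counter = [0]
    fibonacci_counted(n, counter)
    return counter[0]


if __name__ == "__main__":
    calls = count_calls(30)
    print(f"calls for fibonacci(30): {calls:,}")          # 1,664,079
    print(f"2*F(30) - 1 = {2 * fib_iterative(30) - 1:,}")  # 1,664,079
    print(f"calls for 30 tasks: {30 * calls:,}")           # 49,922,370
```

With roughly 50 million Python-level calls per run, even a small fixed cost per call in one interpreter shows up as seconds of wall time.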

Why does pyperformance show different results?

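The pyperformance benchmarks mostly run single-threaded, so the 5-8% quoted in PEP 703 describes how much slower each thread executes, averaged over a wide range of workloads; your benchmark focuses on one workload that happens to magnify that overhead. In addition, the Python 3.13 documentation ("Python support for free threading") puts the single-threaded overhead of the 3.13 free-threaded build at about 40% on pyperformance, mainly because the specializing adaptive interpreter (PEP 659) is disabled in that build, so the 5-8% figure should not be expected from 3.13.0b3.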
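The timings reported in the question can be turned into a few ratios that make the gap concrete. This is a rough reading of four single measurements, not a rigorous benchmark; `timings` and `ratio` are names introduced here.

```python
# Wall-clock times (seconds) reported in the question, 2-CPU Linux machine.
timings = {
    ("3.12.0", "thread"): 10.54,
    ("3.12.0", "process"): 4.33,
    ("3.13.0b3", "thread"): 22.48,
    ("3.13.0b3", "process"): 22.03,
}


def ratio(a: tuple[str, str], b: tuple[str, str]) -> float:
    """How many times longer run `a` took than run `b`."""
    return timings[a] / timings[b]


# ProcessPoolExecutor workers are independent processes, so this compares the
# raw speed of the two interpreters on the same work.
per_process = ratio(("3.13.0b3", "process"), ("3.12.0", "process"))
# With the GIL, 3.12 threads take turns, so threads vs processes is roughly
# the number of cores (plus some GIL hand-off overhead).
gil_serialization = ratio(("3.12.0", "thread"), ("3.12.0", "process"))
# Without the GIL, 3.13 threads run in parallel, so threads vs processes is about 1.
no_gil_threads_vs_processes = ratio(("3.13.0b3", "thread"), ("3.13.0b3", "process"))

print(f"3.13 no-GIL vs 3.12, processes: {per_process:.2f}x")            # 5.09x
print(f"3.12 threads vs processes: {gil_serialization:.2f}x")           # 2.43x
print(f"3.13 threads vs processes: {no_gil_threads_vs_processes:.2f}x")  # 1.02x
```

A 5-8% overhead would correspond to a ratio of about 1.05x to 1.08x in the first line, so the measured per-process slowdown is far beyond what the PEP's figure suggests.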

How can the benchmark be improved?

To evaluate the performance of Python 3.13 without the GIL more accurately, consider the following:

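Use a more representative workload: Choose a workload that reflects your real application. Keep in mind that I/O-bound code and C extensions that release the GIL already run in parallel with the GIL; the free-threaded build pays off mainly for CPU-bound pure-Python code split across threads, on a machine with more cores than the 2 CPUs in your test.
Compare single-threaded performance: Before comparing multithreaded runs, measure single-threaded performance on both interpreters. This separates the per-thread overhead of the free-threaded build from the gain of running threads in parallel, and shows whether removing the GIL actually improves throughput. Also make sure both interpreters were built with comparable options (for example, both with --enable-optimizations), since an unoptimized build is noticeably slower on its own.
Use a profiler: Profile the code to find the bottlenecks. This shows where the free-threading overhead comes from and how you might optimize your code.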
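A minimal sketch of the single-threaded comparison, assuming you run the same script under both interpreters. It reports whether the build is free-threaded (`sysconfig.get_config_var("Py_GIL_DISABLED")`, plus `sys._is_gil_enabled()` on 3.13+), prints the configure arguments so you can check that both builds were configured alike, and times the same batch of tasks on one thread and on `os.cpu_count()` threads. The helper names and the defaults n=27 and tasks=8 are arbitrary choices made here to keep the run short.

```python
import os
import sys
import sysconfig
import time
from concurrent.futures import ThreadPoolExecutor


def fibonacci(n: int) -> int:
    # Same recursive definition as in the question.
    if n < 0:
        raise ValueError("Incorrect input")
    elif n == 0:
        return 0
    elif n == 1 or n == 2:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


def gil_status() -> str:
    # sys._is_gil_enabled() exists only on 3.13+; older versions always have the GIL.
    is_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    return f"free-threaded build={built_free_threaded}, GIL enabled at runtime={is_enabled()}"


def run_tasks(workers: int, tasks: int, n: int) -> float:
    """Run `tasks` copies of fibonacci(n) on `workers` threads; return wall time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(fibonacci, n) for _ in range(tasks)]
    for future in futures:
        future.result()  # re-raise any exception from a worker
    return time.perf_counter() - start


def main(n: int = 27, tasks: int = 8) -> None:
    workers = os.cpu_count() or 1
    print(sys.version)
    print(gil_status())
    print("configure args:", sysconfig.get_config_var("CONFIG_ARGS"))
    single = run_tasks(1, tasks, n)
    multi = run_tasks(workers, tasks, n)
    print(f"1 thread: {single:.2f} s")
    print(f"{workers} threads: {multi:.2f} s (speedup {single / multi:.2f}x)")


if __name__ == "__main__":
    main()
```

Running it once under each interpreter (for example `python3.12` and the free-threaded `python3.13t`; executable names depend on your installation) puts the single-threaded time and the thread speedup side by side. On the 2-CPU machine from the question, the speedup cannot exceed about 2x.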

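Keep in mind that free threading is still experimental in Python 3.13 (and 3.13.0b3 is a beta release), and its performance is expected to improve over time.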

From: 78828192