
Multiprocessing causes Python to crash with the error: may have been in progress in another thread when fork() was called

Time: 2024-07-24 10:39:59  Views: 18
Tags: python python-3.x multithreading macos

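I am relatively new to Python and am trying to use the multiprocessing module for my for loop.

I have an array of image URLs stored in img_urls, which I need to download and run some Google Vision on.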

if __name__ == '__main__':

    img_urls = [ALL_MY_Image_URLS]
    start_time = time.time()
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))

This is my runAll() method:

def runAll(img_urls):
    num_cores = multiprocessing.cpu_count()

    print("Image URLS  {}".format(len(img_urls)))
    if len(img_urls) > 2:
        numberOfImages = 0
    else:
        numberOfImages = 1

    start_timeProcess = time.time()

    pool = multiprocessing.Pool()
    pool.map(annotate, img_urls)  # run annotate() on every URL in the worker processes
    end_timeProcess = time.time()
    print('\n Time to complete ', end_timeProcess-start_timeProcess)


def annotate(img_path):
    """Returns web annotations given the path to an image."""
    file =  requests.get(img_path).content
    print("file is",file)
    print('Process Working under ',os.getpid())
    image = types.Image(content=file)
    web_detection = vision_client.web_detection(image=image).web_detection

When I run it, Python crashes and I get this warning:

objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

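This error message comes from the Objective-C runtime on macOS rather than from Python itself. It says that, in a child process created with fork(), an Objective-C class initializer (here +[__NSPlaceholderDate initialize]) may have been running in another thread of the parent at the moment fork() was called, so the runtime deliberately crashes the child. With multiprocessing this can happen when a library used by the workers, such as the requests library, ends up calling into macOS system frameworks. Let's break down the cause and how to fix it.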


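  • fork(): On macOS, the multiprocessing library creates new processes with fork() by default in Python 3.7 and earlier; since Python 3.8 the default on macOS is spawn. fork() creates a child process that starts out as a copy of the parent's memory.
  • Threads and library incompatibility: Some libraries (possibly including requests, which you are using for networking) and the macOS system or GUI frameworks keep threads and internal state that do not survive a fork. A forked child contains only the thread that called fork(). The other threads are not copied, but any locks they held and any half-finished initialization stay frozen in the child's memory, which leads to crashes or unexpected behavior.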
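
To see which side of this default your own interpreter is on, a small check like the following may help. It is only a sketch: the helper name `start_method_report` is made up for illustration, and it relies on the documented behaviour that `multiprocessing.get_all_start_methods()` lists the platform default first.

```python
import multiprocessing
import sys


def start_method_report():
    """Summarise which start methods this interpreter offers."""
    # The first entry returned by get_all_start_methods() is the
    # default start method for the current platform and Python version.
    available = multiprocessing.get_all_start_methods()
    return {
        "platform": sys.platform,
        "python": "%d.%d" % sys.version_info[:2],
        "default": available[0],
        "available": available,
    }


if __name__ == "__main__":
    for key, value in start_method_report().items():
        print(key, "=", value)
```

If `default` is reported as `fork` on a Mac, the crash described in the question is plausible and the start-method change discussed in this answer is worth trying first.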


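  1. Use the "spawn" or "forkserver" start method:

     • spawn: This method starts a fresh Python interpreter for each process, so the child does not inherit incompatible thread state. This is the recommended method on macOS and the default there since Python 3.8.

     • forkserver: This method starts a server process, and new Python processes are then forked from that server. It is faster than "spawn" but more involved to set up.

You can change the start method in your code with set_start_method: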

```python
import multiprocessing

if __name__ == '__main__':
    multiprocessing.set_start_method("spawn")  # or "forkserver"
    # ... the rest of your code ...
```
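
If changing the global start method is not desirable (for example, because other code in the same program also uses multiprocessing), `multiprocessing.get_context()` offers a more local variant. The sketch below is illustrative only: `map_with_start_method` is a hypothetical helper, the example URLs are placeholders, and the built-in `len` stands in for the real per-URL work.

```python
import multiprocessing


def map_with_start_method(func, items, method="spawn", processes=2):
    """Run func over items in a pool that uses the given start method."""
    # get_context() returns a context bound to one start method.
    # Unlike set_start_method(), it leaves the global default untouched.
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    urls = ["https://example.com/a.jpg", "https://example.com/bb.jpg"]
    # len() is only a placeholder worker. A real worker such as annotate()
    # must be a module-level function so spawned children can import it.
    print(map_with_start_method(len, urls))
```

With "spawn", the `if __name__ == "__main__":` guard is required, because every child re-imports the main module when it starts.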

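  2. Move the problematic code into the child processes:

     If possible, restructure the code so that calls to requests, or to any other library that may cause the problem, run inside the child processes rather than before the fork happens. This ensures the problematic initialization takes place in the child process's context.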


import multiprocessing
import os
import time

import requests
from google.cloud import vision
from google.cloud.vision import types

# Initialize the Google Vision client
vision_client = vision.ImageAnnotatorClient()

def annotate(img_path):
    print('Process working under', os.getpid())
    file = requests.get(img_path).content  # the request is made in the child process
    image = types.Image(content=file)
    web_detection = vision_client.web_detection(image=image).web_detection
    # Process web_detection...

def runAll(img_urls):
    ...  # (rest of the code)

if __name__ == '__main__':
    img_urls = [ALL_MY_Image_URLS]
    start_time = time.time()
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))
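
One way to push initialization fully into the children is the pool's `initializer` argument, which runs once inside every worker process. The following is a minimal sketch under stated assumptions, not the asker's actual code: `StubClient` is a hypothetical stand-in for `vision.ImageAnnotatorClient`, and `init_worker`, `run_all`, and the example URLs are invented for illustration.

```python
import multiprocessing
import os

# Per-process client; it stays None until init_worker() runs in a worker.
_client = None


class StubClient:
    """Hypothetical stand-in for a real API client such as the Vision client."""

    def __init__(self):
        # Remember which process created this client.
        self.owner_pid = os.getpid()

    def describe(self, url):
        return {"url": url, "client_pid": self.owner_pid}


def init_worker():
    """Pool initializer: creates one client inside each worker process."""
    global _client
    _client = StubClient()


def annotate(url):
    """Worker function: uses the client owned by the current process."""
    return _client.describe(url)


def run_all(urls, processes=2):
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=processes, initializer=init_worker) as pool:
        return pool.map(annotate, urls)


if __name__ == "__main__":
    for row in run_all(["https://example.com/1.jpg", "https://example.com/2.jpg"]):
        print(row)
```

Each printed `client_pid` should belong to a worker rather than to the parent, which shows that the client was built after the process was created and not inherited from the parent.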


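  • Library documentation: If you use libraries other than requests, check their documentation for any specific notes or restrictions regarding forked processes.
  • Debugging: If the problem persists, try using a debugger to pinpoint the exact line of code that triggers the crash. This can help identify any unexpected interactions.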
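
As a lightweight complement to a debugger, the standard-library `faulthandler` module can print Python tracebacks when a process dies from a fatal signal. This is only a sketch: `enable_crash_tracebacks` is a hypothetical helper name, and it helps only if the crash surfaces as one of the signals faulthandler handles (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL), which is an assumption for this particular macOS error.

```python
import faulthandler


def enable_crash_tracebacks():
    """Ask Python to dump tracebacks if the process dies from a fatal signal."""
    if not faulthandler.is_enabled():
        # all_threads=True prints the stack of every thread, which helps
        # spot background threads that were active around the crash.
        faulthandler.enable(all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    print("faulthandler enabled:", enable_crash_tracebacks())
```

Calling the helper at the top of the main script, or passing it as the pool's `initializer`, are two possible places to hook it in.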


From: 50168647

