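I'm fairly new to Python and am trying to use the multiprocessing module to parallelize my for loop.
I have an array of image URLs stored in img_urls, and I need to download each image and run some Google Vision processing on it.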
if __name__ == '__main__':
    img_urls = [ALL_MY_Image_URLS]
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))
This is my runAll() method:
def runAll(img_urls):
    num_cores = multiprocessing.cpu_count()
    print("Image URLS {}".format(len(img_urls)))
    if len(img_urls) > 2:
        numberOfImages = 0
    else:
        numberOfImages = 1
    start_timeProcess = time.time()
    pool = multiprocessing.Pool()
    pool.map(annotate, img_urls)
    end_timeProcess = time.time()
    print('\n Time to complete ', end_timeProcess - start_timeProcess)
    print(full_matching_pages)
def annotate(img_path):
    """Returns web annotations given the path to an image."""
    file = requests.get(img_path).content
    print("file is", file)
    print('Process Working under ', os.getpid())
    image = types.Image(content=file)
    web_detection = vision_client.web_detection(image=image).web_detection
    report(web_detection)
When I run it, Python crashes and I get this warning:
objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
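This error message indicates a problem with using the multiprocessing library on macOS. The objc[...] prefix shows that it is emitted by the Objective-C runtime, and the problem stems from the way fork() interacts with certain libraries (such as requests). Let's break down the cause and how to fix it.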
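Root cause
- fork(): On macOS, the multiprocessing library has historically created new processes with fork() by default (since Python 3.8 the default on macOS is "spawn"). fork() creates a child process that starts out as a copy of the parent's state.
- Thread and library incompatibility: Some libraries (possibly including requests, which is used here for networking) and macOS system or GUI frameworks do not tolerate being used in a forked process. The child created by fork() contains only the thread that called fork(). Locks and half-finished initialization owned by the parent's other threads are copied as they were, with no thread left to complete them, which can lead to crashes or unexpected behavior.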
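To check which start method a given interpreter uses by default, a minimal sketch along the following lines may help. It relies on multiprocessing.get_all_start_methods(), which the documentation describes as listing the default method first; start_method_report is a hypothetical helper name, not part of any library.

```python
import multiprocessing
import sys


def start_method_report():
    """Summarize the start methods this interpreter supports."""
    # The documentation states that the first entry is the default.
    # Unlike get_start_method(), this call does not fix the start
    # method as a side effect, so set_start_method() still works later.
    methods = multiprocessing.get_all_start_methods()
    return {
        "platform": sys.platform,
        "default": methods[0],
        "available": methods,
    }


if __name__ == "__main__":
    print(start_method_report())
```

If the reported default is "fork" on a Mac, the interpreter predates the change to "spawn" and the crash described above becomes more plausible.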
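Solutions
- Use the "spawn" or "forkserver" start method:
  - spawn: This method starts a completely new interpreter for each process, which avoids inheriting incompatible thread state. It is the recommended method on macOS.
  - forkserver: This method starts a server process, and new Python processes are then forked from that server. It is faster than "spawn" but more complex to set up.
You can change the start method in your code with set_start_method: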
```python
import multiprocessing

if __name__ == '__main__':
    multiprocessing.set_start_method("spawn")  # or "forkserver"
    # ... the rest of your code ...
```
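- Move the problematic code so that it runs after the fork: If possible, restructure the code so that calls to requests, or to other libraries that may cause the problem, are made inside the child processes rather than before the fork happens. This ensures the problematic initialization takes place in the child process's context.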
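Both ideas can be combined in one place. The sketch below is only an illustration under stated assumptions: make_client is a hypothetical stand-in for an expensive client such as vision.ImageAnnotatorClient(), and annotate is a toy worker. It uses multiprocessing.get_context("spawn"), which selects the start method for a single pool without changing the global default, and a Pool initializer so that the client is created inside each worker rather than in the parent.

```python
import multiprocessing
import os

_client = None  # set separately inside every worker process


def make_client():
    """Hypothetical stand-in for an expensive client object.

    It only records which process created it.
    """
    return {"created_in_pid": os.getpid()}


def init_worker():
    """Pool initializer: runs once in each worker after it has started."""
    global _client
    _client = make_client()


def annotate(url):
    """Toy worker: pair the URL with the PID that created its client."""
    return url, _client["created_in_pid"]


def run_all(urls, processes=2):
    # get_context() returns an object exposing the multiprocessing API
    # bound to one start method; the global default is left untouched.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=processes, initializer=init_worker) as pool:
        return pool.map(annotate, urls)
```

Call run_all only from under the `if __name__ == '__main__':` guard of a script file, because "spawn" re-imports the main module in every worker.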
Example (using "spawn"):
import multiprocessing
import os
import time

import requests
from google.cloud import vision
from google.cloud.vision import types

# Initialize the Google Vision client.
# With "spawn", each worker re-imports this module and builds its own client.
vision_client = vision.ImageAnnotatorClient()

def annotate(img_path):
    """Return web annotations for the image at the given URL."""
    print('Process working under', os.getpid())
    file = requests.get(img_path).content  # the request runs in the child process
    image = types.Image(content=file)
    web_detection = vision_client.web_detection(image=image).web_detection
    # process web_detection...

def runAll(img_urls):
    # ... (the rest of your code)
    pool = multiprocessing.Pool()
    pool.map(annotate, img_urls)

if __name__ == '__main__':
    start_time = time.time()
    multiprocessing.set_start_method("spawn")
    img_urls = [ALL_MY_Image_URLS]
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))
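One related point about the question's code: runAll prints full_matching_pages after pool.map returns. Assuming report() appends to that module-level list (the question does not show it), those appends would happen in the worker processes and would not be visible in the parent. A hedged sketch of the alternative is to have each worker return its findings and merge them in the parent. The annotate body and the "#match" values below are placeholders, and merge_matching_pages is a hypothetical helper.

```python
import multiprocessing


def annotate(url):
    """Placeholder worker: return findings instead of mutating a global."""
    # A real worker would download the image and call web detection here.
    return {"url": url, "matching_pages": [url + "#match"]}


def merge_matching_pages(results):
    """Flatten the per-URL results into one list, in the parent process."""
    pages = []
    for result in results:
        pages.extend(result["matching_pages"])
    return pages


def run_all(urls):
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool() as pool:
        # pool.map sends each worker's return value back to the parent.
        results = pool.map(annotate, urls)
    return merge_matching_pages(results)
```

As with any "spawn" pool, run_all should be called from under the `if __name__ == '__main__':` guard.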
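Other tips:
- Library documentation: If you use libraries other than requests, check their documentation for any specific notes or restrictions regarding forked processes.
- Debugging: If the problem persists, try using a debugger to pinpoint the exact line of code that causes the crash. This can help reveal any unexpected interactions.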
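As a lighter-weight complement to a debugger, the standard-library faulthandler module can dump the Python traceback when a process dies from a fatal signal such as SIGSEGV or SIGABRT. The sketch below writes one log file per process. Whether it captures this particular crash is an assumption that would need checking, and enable_crash_log and the crash-<pid>.log naming are hypothetical choices.

```python
import faulthandler
import os
import tempfile

_crash_log = None  # kept open for as long as faulthandler is enabled


def enable_crash_log(directory=None):
    """Write tracebacks for fatal signals to a per-process log file."""
    global _crash_log
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, "crash-%d.log" % os.getpid())
    # faulthandler needs a real file descriptor that stays open.
    _crash_log = open(path, "w")
    faulthandler.enable(file=_crash_log, all_threads=True)
    return _crash_log
```

Passing enable_crash_log as the initializer argument of multiprocessing.Pool would turn it on in every worker, so each worker that crashes leaves its own log behind.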
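Switching to the "spawn" start method, or restructuring the code so that no problematic initialization happens before the fork, should resolve this multiprocessing problem.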
Tags: python, python-3.x, multithreading, macos From: 50168647