scrapyd_client deploy packages the spider project you are developing and pushes it to the scrapyd server. Because Python projects often pull in third-party libraries, the spider we develop may have dependencies, so depending on the actual command-line options scrapyd_client builds the application either as a plain egg or as an egg that bundles its dependencies.
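For reference, the two packaging paths roughly correspond to the following invocations (the flag spelling is inferred from opts.include_dependencies in the code below, so treat it as an assumption and check your scrapyd-client version):

scrapyd-deploy <target> -p <project>                          # plain egg
scrapyd-deploy <target> -p <project> --include-dependencies   # egg bundling requirements.txt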
Egg packaging
When building the egg, scrapyd_client deploy first checks whether the spider project contains a setup.py; if it does not, one is generated automatically.
- setup.py template
The settings value is taken from the spider project's own configuration (see get_config() in _build_egg below, which reads scrapy.cfg); it is exposed through a setuptools entry point in the scrapy group, which is how scrapyd later locates the project settings.
_SETUP_PY_TEMPLATE = """
# Automatically created by: scrapyd-deploy
from setuptools import setup, find_packages

setup(
    name = 'project',
    version = '1.0',
    packages = find_packages(),
    entry_points = {'scrapy': ['settings = %(settings)s']},
)
""".lstrip()
- Building the egg
Whether dependencies are bundled determines the build path: when they are included, the packaging relies on uberegg to produce an uber egg; otherwise the default setuptools egg is built. The build command is effectively python setup.py bdist_egg, or python setup.py bdist_uberegg when dependencies are included, as _build_egg below shows.
def _build_egg(opts):
    # Work from the directory that contains the closest scrapy.cfg
    closest = closest_scrapy_cfg()
    os.chdir(os.path.dirname(closest))
    # Generate a default setup.py from the template if the project has none
    if not os.path.exists("setup.py"):
        settings = get_config().get("settings", "default")
        _create_default_setup_py(settings=settings)
    d = tempfile.mkdtemp(prefix="scrapydeploy-")
    o = open(os.path.join(d, "stdout"), "wb")
    e = open(os.path.join(d, "stderr"), "wb")
    # Pick the build command: uber egg (with requirements.txt) or plain egg
    if opts.include_dependencies:
        _log("Including dependencies from requirements.txt")
        if not os.path.isfile("requirements.txt"):
            _fail("Error: Missing requirements.txt")
        command = "bdist_uberegg"
    else:
        command = "bdist_egg"
    # Run "python setup.py clean -a <command> -d <tmpdir>"
    retry_on_eintr(
        check_call,
        [sys.executable, "setup.py", "clean", "-a", command, "-d", d],
        stdout=o,
        stderr=e,
    )
    o.close()
    e.close()
    # The freshly built egg is the only *.egg file in the temp dir
    egg = glob.glob(os.path.join(d, "*.egg"))[0]
    return egg, d
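To make the build step concrete, here is a minimal standalone sketch (assumed paths, not scrapyd-client code) of the command that retry_on_eintr(check_call, ...) ends up running from the project root:

import subprocess
import sys
import tempfile

dist_dir = tempfile.mkdtemp(prefix="scrapydeploy-")
# Equivalent to the plain-egg branch above: clean the build, then write the egg into dist_dir
subprocess.check_call(
    [sys.executable, "setup.py", "clean", "-a", "bdist_egg", "-d", dist_dir]
)
# The result is a single <name>-<version>-pyX.Y.egg file inside dist_dir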
- Pushing to the scrapyd service
I will only cover the HTTP handling here; the core is POSTing the multipart data to scrapyd's addversion.json endpoint.
def _upload_egg(target, eggpath, project, version):
    with open(eggpath, "rb") as f:
        eggdata = f.read()
    # Multipart form fields expected by scrapyd's addversion.json
    data = {
        "project": project,
        "version": version,
        "egg": ("project.egg", eggdata),
    }
    body, content_type = encode_multipart_formdata(data)
    url = _url(target, "addversion.json")
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
    }
    req = Request(url, body, headers)
    _add_auth_header(req, target)  # add HTTP basic auth if the target defines credentials
    _log('Deploying to project "%s" in %s' % (project, url))
    return _http_post(req)
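For comparison, here is a minimal sketch of the same addversion.json call made with the requests library instead of the urllib-based Request above; the URL, project name, version and egg path are assumptions, and authentication is only needed when the target requires it:

import requests

# Hypothetical values; adjust to your scrapyd target and project
url = "http://localhost:6800/addversion.json"
with open("dist/myproject-1.0-py3.10.egg", "rb") as f:
    resp = requests.post(
        url,
        data={"project": "myproject", "version": "1.0"},
        files={"egg": ("project.egg", f)},
    )
print(resp.json())  # e.g. {"status": "ok", "spiders": 2}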
Notes
The above is a brief walkthrough of what scrapyd_client deploy does; a follow-up will cover the internals together with scrapyd's spider scheduling.
References
https://github.com/scrapy/scrapyd-client
https://packaging.python.org/en/latest/specifications/entry-points/