[ERROR: Failed to start a transaction to create a new pipeline and a new pipeline version: dial tcp: lookup mysql on 10.96.0.10:53: no such host]
>>> kfp.Client().upload_pipeline("/home/maye/pipeline_wafer_distribute.yaml", "pipeline_wafer_ps_worker_mount_pv", "wafer pipeline with distributed training,parameter server srategy.")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp/_client.py", line 1232, in upload_pipeline
response = self._upload_api.upload_pipeline(
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 69, in upload_pipeline
return self.upload_pipeline_with_http_info(uploadfile, **kwargs) # noqa: E501
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 163, in upload_pipeline_with_http_info
return self.api_client.call_api(
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 364, in call_api
return self.__call_api(resource_path, method,
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 188, in __call_api
raise e
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 181, in __call_api
response_data = self.request(
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 407, in request
return self.rest_client.POST(url,
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/rest.py", line 265, in POST
return self.request("POST", url,
File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/rest.py", line 224, in request
raise ApiException(http_resp=r)
kfp_server_api.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id': '46804f38-da18-4357-927a-b78b4e8b7574', 'Cache-Control': 'no-cache, private', 'Content-Length': '561', 'Content-Type': 'text/plain; charset=utf-8', 'Date': 'Wed, 14 Feb 2024 09:09:22 GMT'})
HTTP response body: {"error_message":"Failed to create a pipeline and a pipeline version: Failed to create a pipeline and a pipeline version: InternalServerError: Failed to start a transaction to create a new pipeline and a new pipeline version: dial tcp: lookup mysql on 10.96.0.10:53: no such host","error_details":"Failed to create a pipeline and a pipeline version: Failed to create a pipeline and a pipeline version: InternalServerError: Failed to start a transaction to create a new pipeline and a new pipeline version: dial tcp: lookup mysql on 10.96.0.10:53: no such host"}
>>>
[ANALYSIS]
The key part of the error is 'dial tcp: lookup mysql on 10.96.0.10:53: no such host'. It says that the host name 'mysql' could not be resolved by the DNS (Domain Name System) server at 10.96.0.10:53, which is the CoreDNS service of the Kubernetes cluster. The message arrives in the body of an HTTP 500 response, so the failed lookup happened on the Kubeflow Pipelines server side while it was opening a connection to its MySQL database, not inside the kfp client itself.
In this example the mysql service was running normally, and it runs on the same machine as the one calling kfp.Client().upload_pipeline(). A connection timeout is therefore unlikely, since connections within one machine should be fast. The most likely cause is that the name resolution itself failed temporarily, for example because CoreDNS briefly did not have enough resources to answer the request; a name lookup that times out can surface as this same error.
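One way to check the name-resolution hypothesis is to time a lookup of the same name. The sketch below is only an illustration: the helper name time_dns_lookup and the default port 3306 (the usual MySQL port) are assumptions, not part of the original session. The short name 'mysql' normally resolves only from inside a pod in the same namespace as the Kubeflow Pipelines services, so the script would have to run there rather than on the host.

```python
import socket
import time


def time_dns_lookup(hostname, port=3306, resolver=socket.getaddrinfo):
    """Resolve hostname once and report whether it worked and how long it took.

    `resolver` is injectable so the function can be exercised without a cluster.
    Port 3306 is assumed here because it is the default MySQL port.
    """
    start = time.monotonic()
    try:
        infos = resolver(hostname, port)
    except socket.gaierror as exc:
        # Resolution failed: the name is unknown or the DNS server did not answer.
        return {"ok": False, "seconds": time.monotonic() - start, "detail": str(exc)}
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return {"ok": True, "seconds": time.monotonic() - start, "detail": addresses}


if __name__ == "__main__":
    # Repeat the lookup a few times; intermittent failures or slow answers
    # would support the "temporary DNS problem" explanation.
    for _ in range(5):
        print(time_dns_lookup("mysql"))
        time.sleep(1)
```

If every lookup succeeds quickly, the earlier failure was probably transient; repeated failures would point to a real problem with the mysql Service or with CoreDNS.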
[SOLUTION]
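Retry the upload. In this case the second attempt (about 22 minutes after the failure, according to the timestamps) succeeded: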
>>> kfp.Client().upload_pipeline("/home/maye/pipeline_wafer_distribute.yaml", "pipeline_wafer_ps_worker_mount_pv", "wafer pipeline with distributed training,parameter server srategy.")
{'created_at': datetime.datetime(2024, 2, 14, 9, 31, 19, tzinfo=tzutc()),
'default_version': {'code_source_url': None,
'created_at': datetime.datetime(2024, 2, 14, 9, 31, 19, tzinfo=tzutc()),
'description': 'wafer pipeline with distributed '
'training,parameter server srategy.',
'id': '82508c98-3349-4d42-bdfd-3d131615e7ea',
'name': 'pipeline_wafer_ps_worker_mount_pv',
'package_url': None,
'parameters': [{'name': 'pipeline-root',
'value': '/tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema'}],
'resource_references': [{'key': {'id': '7cbbd9e1-6657-4fe4-ac6c-e11beeaa0d4f',
'type': 'PIPELINE'},
'name': None,
'relationship': 'OWNER'}]},
'description': 'wafer pipeline with distributed training,parameter server '
'srategy.',
'error': None,
'id': '7cbbd9e1-6657-4fe4-ac6c-e11beeaa0d4f',
'name': 'pipeline_wafer_ps_worker_mount_pv',
'parameters': [{'name': 'pipeline-root',
'value': '/tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema'}],
'resource_references': None,
'url': None}
>>>
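Since the fix here was simply to retry, the retry can be automated. The following is a minimal sketch, not part of the kfp SDK: call_with_retry and is_dns_lookup_failure are hypothetical helper names, and the attempt count and delay are arbitrary. It retries only when the error text contains 'no such host', so other failures are raised immediately.

```python
import time


def is_dns_lookup_failure(exc):
    """Return True if the exception text contains the 'no such host' message.

    ApiException carries the server response in a `body` attribute; getattr is
    used so that plain exceptions without that attribute also work.
    """
    text = f"{getattr(exc, 'body', '')} {exc}"
    return "no such host" in text


def call_with_retry(func, attempts=3, delay_seconds=5.0,
                    is_retryable=is_dns_lookup_failure, sleep=time.sleep):
    """Call func() up to `attempts` times, waiting between retryable failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not DNS-related.
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(delay_seconds)


# Possible usage with the upload from this post (requires kfp and a reachable cluster):
#
# import kfp
# result = call_with_retry(
#     lambda: kfp.Client().upload_pipeline(
#         "/home/maye/pipeline_wafer_distribute.yaml",
#         "pipeline_wafer_ps_worker_mount_pv",
#     )
# )
```

A fixed delay keeps the sketch short. If the failure keeps recurring, it is probably better to investigate CoreDNS itself than to retry more aggressively.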
From: https://www.cnblogs.com/zhenxia-jiuyou/p/18015335