I get the following error when I try to call an API with the requests library.
Traceback (most recent call last):
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connection.py", line 196, in _new_conn
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/util/connection.py", line 85, in create_connection
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/util/connection.py", line 73, in create_connection
TimeoutError: [Errno 110] Connection timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connectionpool.py", line 789, in urlopen
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connectionpool.py", line 490, in _make_request
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connectionpool.py", line 466, in _make_request
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connectionpool.py", line 1095, in _validate_conn
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connection.py", line 615, in connect
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connection.py", line 211, in _new_conn
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0xffff812d8a60>: Failed to establish a new connection: [Errno 110] Connection timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/requests/adapters.py", line 667, in send
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/connectionpool.py", line 843, in urlopen
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='vpic.nhtsa.dot.gov', port=443): Max retries exceeded with url: /api/vehicles/getallmakes?format=json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff812d8a60>: Failed to establish a new connection: [Errno 110] Connection timed out'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/test.py", line 8, in <module>
response = requests.request("GET", url, headers=headers, data=payload)
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/requests/api.py", line 59, in request
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/requests/sessions.py", line 589, in request
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/requests/sessions.py", line 703, in send
File "/tmp/spark-39775710-130a-4403-9182-c557003f351b/lib.zip/requests/adapters.py", line 700, in send
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='vpic.nhtsa.dot.gov', port=443): Max retries exceeded with url: /api/vehicles/getallmakes?format=json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff812d8a60>: Failed to establish a new connection: [Errno 110] Connection timed out'))
The code I have:
import requests

def call_api():
    url = "https://vpic.nhtsa.dot.gov/api/vehicles/getallmakes?format=json"
    payload = {}
    headers = {}
    response = requests.request("GET", url, headers=headers, data=payload)
    print(response.text)
The same code works in AWS Lambda and even on my local machine, so I don't know what is going wrong in EMR Serverless.
Note: I am not using any VPC or subnet configuration.
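A "Connection timed out" error in EMR Serverless, while the same code works in Lambda and locally, indicates that the EMR Serverless environment cannot reach the public internet. By default, an EMR Serverless application has no route to the public internet, so outbound access to a public API requires network configuration.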
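Before changing any network settings, it can help to confirm that the failure really is at the TCP connection level. The following is a minimal sketch using only the standard library; the helper name `can_connect` and its default timeout are illustrative choices, not part of the original code.

```python
import socket

def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers timeouts, refused connections, and DNS lookup failures.
        return False

# Example call from inside the job (host taken from the traceback above):
# print(can_connect("vpic.nhtsa.dot.gov"))
```

If this returns False inside the EMR Serverless job but True in Lambda or locally, the problem is network reachability rather than the requests call itself.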
Here are the ways to solve it:
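Option 1: Use VPC endpoints (recommended)
- Create VPC endpoints: Create VPC endpoints for the AWS services the job uses under *.amazonaws.com (S3 and other AWS services). This lets your EMR Serverless environment reach those services over the VPC network without being exposed to the public internet. Note that a VPC endpoint cannot be created for an arbitrary public host such as vpic.nhtsa.dot.gov unless its owner offers a PrivateLink endpoint service, so calls to that API still need an egress route such as the NAT gateway in Option 2.
- Add the endpoints to the security group: Associate the VPC endpoints with the security group used by your EMR Serverless environment. This applies to interface endpoints; gateway endpoints are attached to route tables instead.
- Update the EMR Serverless application: Update your EMR Serverless application's network configuration so that it uses the new network settings.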
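Option 2: Use a NAT gateway (less secure)
- Create a NAT gateway: Create a NAT gateway in your VPC and place it in a public subnet that has internet access.
- Update the route table: Update the route table of the subnets used by your EMR Serverless environment so that internet-bound traffic goes through the NAT gateway.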
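The two NAT gateway steps above can be sketched with boto3 roughly as follows. This is an illustration under assumptions, not a definitive setup: the function name and its parameters are hypothetical, `ec2` is expected to be a client created with `boto3.client("ec2")`, and the subnet and route table IDs must come from your own VPC.

```python
def route_private_subnet_through_nat(ec2, public_subnet_id, private_route_table_id):
    """Create a NAT gateway in a public subnet and route 0.0.0.0/0 to it.

    `ec2` is a boto3 EC2 client; the two IDs are placeholders you supply.
    """
    # A public NAT gateway needs an Elastic IP address.
    allocation_id = ec2.allocate_address(Domain="vpc")["AllocationId"]

    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=allocation_id,
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]

    # Block until the NAT gateway is ready to carry traffic.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # Send internet-bound traffic from the private subnets to the NAT gateway.
    ec2.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return nat_id
```

`create_route` fails if the route table already has a default route; in that case `replace_route` with the same arguments is the likely alternative.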
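Option 3: Use an internet gateway (least secure)
Warning: This option exposes your EMR Serverless environment directly to the internet, so it is not recommended for production workloads.
- Create an internet gateway: Create an internet gateway in your VPC.
- Update the route table: Update the route table of the subnets used by your EMR Serverless environment so that internet-bound traffic goes through the internet gateway. An internet gateway route only helps resources that have a public IP address, so confirm that this holds for your EMR Serverless workers before relying on it.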
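Whichever option is chosen, the application has to be attached to the subnets and security groups that carry the new routes. A minimal sketch of that update is shown below; the helper `build_network_update` and the placeholder IDs are hypothetical, and the resulting dictionary is meant to be passed to the boto3 `emr-serverless` client's `update_application` call.

```python
def build_network_update(application_id, subnet_ids, security_group_ids):
    """Build keyword arguments for the emr-serverless update_application call."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    return {
        "applicationId": application_id,
        "networkConfiguration": {
            "subnetIds": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        },
    }

# Usage (IDs are placeholders; the application usually has to be stopped first):
# import boto3
# client = boto3.client("emr-serverless")
# client.update_application(
#     **build_network_update("<application-id>", ["<subnet-id>"], ["<security-group-id>"])
# )
```

Keeping the request in a plain dictionary makes it easy to review the subnet and security group IDs before the application is changed.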
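Once the network configuration is done, make sure your EMR Serverless environment has the following permissions:
- S3 access: EMR Serverless needs access to S3 to store application code, logs, and other data.
- VPC endpoint access: If you choose to use VPC endpoints, EMR Serverless must be allowed to access them.
VPC endpoints are recommended for AWS service traffic because they provide a more secure and more reliable way to reach services outside the VPC.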
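Code example (using a VPC endpoint):
import boto3
import requests

def create_s3_gateway_endpoint(route_table_ids, region="us-east-1"):
    # One-time setup: run this outside the Spark job, with EC2 permissions.
    ec2 = boto3.client("ec2", region_name=region)
    # Use the first VPC returned; replace this with your own VPC ID if you have several.
    vpcs = ec2.describe_vpcs()["Vpcs"]
    if not vpcs:
        raise RuntimeError("No VPC found in this region")
    vpc_id = vpcs[0]["VpcId"]
    # A gateway endpoint adds S3 routes to the given route tables; it has no DNS name of its own.
    endpoint = ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.s3",
        RouteTableIds=route_table_ids,  # add your route table IDs here
    )
    return endpoint["VpcEndpoint"]["VpcEndpointId"]

def call_api():
    # The public API is still reached over the internet, so the subnet also needs an egress route.
    url = "https://vpic.nhtsa.dot.gov/api/vehicles/getallmakes?format=json"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    print(response.text)

# Call the API function
call_api()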
Note that this is only an example and needs to be adapted to your specific environment and requirements.
Tags: python, amazon-web-services, request, amazon-emr From: 78802071