I want to stream-download a .txt.gz file over HTTP and stream-upload it to an S3 bucket. I got this far, but it doesn't work — what am I missing?
from smart_open import open as sopen

chunk_size = 16 * 1024 * 1024
http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(chunk_size)
There is a small mistake in the code: it writes `chunk_size` to `fout` instead of `buf`. The buffer that was just read (`buf`) is what should be written to the output file.

Here is the corrected code:
from smart_open import open as sopen

chunk_size = 16 * 1024 * 1024
http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(buf)  # <-- write buf, not chunk_size
With the change to `fout.write(buf)`, each chunk read from `http_url` is written to `s3://bucket/filename.txt.gz`.
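The read/write loop itself can be verified without any network access; here is a minimal sketch where `io.BytesIO` objects stand in for the HTTP and S3 file objects that `smart_open` would return:

```python
import io

def copy_in_chunks(fin, fout, chunk_size=16 * 1024 * 1024):
    """Copy fin to fout in fixed-size chunks until EOF (read returns b'')."""
    while True:
        buf = fin.read(chunk_size)
        if not buf:
            break
        fout.write(buf)

# Exercise the loop with in-memory streams; a small chunk_size forces
# several iterations, just like a large download would.
payload = b"x" * (3 * 1024 + 7)
src = io.BytesIO(payload)
dst = io.BytesIO()
copy_in_chunks(src, dst, chunk_size=1024)
assert dst.getvalue() == payload
```

As a side note, the standard library's `shutil.copyfileobj(fin, fout, chunk_size)` implements this same loop, so it can replace the hand-written `while True` block entirely.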