I want to stream-download a .txt.gz file over HTTP and stream-upload it to an S3 bucket. I got this far, but it doesn't work — what am I missing?
from smart_open import open as sopen

chunk_size = 16 * 1024 * 1024
http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(chunk_size)
There is a small mistake in the code: it writes `chunk_size` to `fout` instead of `buf`. The buffer that was just read (`buf`) is what should be written to the output file.

Here is the corrected code:
from smart_open import open as sopen

chunk_size = 16 * 1024 * 1024
http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(buf)  # <-- write buf, not chunk_size
With the change to `fout.write(buf)`, each chunk read from `http_url` is written to `s3://bucket/filename.txt.gz`.
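The read/write loop itself can be verified without any network access; here is a minimal sketch where `io.BytesIO` objects stand in for the HTTP and S3 file objects that `smart_open` would return:

```python
import io

def copy_in_chunks(fin, fout, chunk_size=16 * 1024 * 1024):
    """Copy fin to fout in fixed-size chunks until EOF (read returns b'')."""
    while True:
        buf = fin.read(chunk_size)
        if not buf:
            break
        fout.write(buf)

# Exercise the loop with in-memory streams; a small chunk_size forces
# several iterations, just like a large download would.
payload = b"x" * (3 * 1024 + 7)
src = io.BytesIO(payload)
dst = io.BytesIO()
copy_in_chunks(src, dst, chunk_size=1024)
assert dst.getvalue() == payload
```

As a side note, the standard library's `shutil.copyfileobj(fin, fout, chunk_size)` implements this same loop, so it can replace the hand-written `while True` block entirely.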