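Accessing MinIO with DuckDB
Databases for an object-storage-centric world
Databases have long been a primary workload for SAN-based block storage and NAS-based file storage. Over the next few years, the OLAP database space is expected to shift toward an object-storage-first position.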
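Temporary secrets are held in memory for the lifetime of the DuckDB instance. Persistent secrets are stored in an unencrypted binary format in the ~/.duckdb/stored_secrets directory.
When DuckDB starts, persistent secrets are read from that directory and loaded automatically.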
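To make the temporary versus persistent distinction concrete, here is a minimal sketch that only builds the corresponding CREATE SECRET statements as strings. The helper names `sql_quote` and `create_s3_secret_sql`, the secret name `minio_secret`, and the credential values are placeholders invented for illustration. The statement shape follows DuckDB's secrets syntax as documented for recent versions, so it should be checked against the version actually in use.

```python
def sql_quote(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_s3_secret_sql(name, key_id, secret, endpoint, persistent=False,
                         region="us-east-1", url_style="path", use_ssl=True):
    """Build a CREATE SECRET statement for an S3-compatible store (hypothetical helper)."""
    # With PERSISTENT the secret is written, unencrypted, to ~/.duckdb/stored_secrets;
    # without it the secret lives only in memory for the current instance.
    kind = "PERSISTENT SECRET" if persistent else "SECRET"
    options = [
        "TYPE S3",
        f"KEY_ID {sql_quote(key_id)}",
        f"SECRET {sql_quote(secret)}",
        f"REGION {sql_quote(region)}",
        f"ENDPOINT {sql_quote(endpoint)}",
        f"URL_STYLE {sql_quote(url_style)}",
        f"USE_SSL {'true' if use_ssl else 'false'}",
    ]
    return f"CREATE {kind} {name} (\n    " + ",\n    ".join(options) + "\n);"


# Placeholder credentials; substitute real values before running the SQL.
temporary = create_s3_secret_sql("minio_secret", "key", "secret", "play.min.io:9000")
persistent = create_s3_secret_sql("minio_secret", "key", "secret", "play.min.io:9000",
                                  persistent=True)
print(temporary)
print(persistent)
```

Because persistent secrets are stored unencrypted, the temporary form is the safer default on shared machines.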
SELECT extension_name, installed, description FROM duckdb_extensions();
The httpfs filesystem is tested with AWS S3, Minio, Google Cloud
1. Installing DuckDB
2. Installing the httpfs extension
https://extensions.duckdb.org/v1.0.0/linux_amd64_gcc4/httpfs.duckdb_extension.gz
gzip -dk httpfs.duckdb_extension.gz
INSTALL 'path/to/httpfs.duckdb_extension';
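The manual download and decompress steps above can be sketched in Python as follows. This assumes the URL layout seen in the link above (version, then platform, then file name) holds for other versions and platforms, which is an assumption rather than something this note confirms. `extension_url` and `gunzip_keep` are hypothetical helper names.

```python
import gzip
import shutil
from pathlib import Path


def extension_url(version, platform, name="httpfs"):
    """Build a download URL following the pattern of the link above (assumed layout)."""
    return (f"https://extensions.duckdb.org/v{version}/{platform}/"
            f"{name}.duckdb_extension.gz")


def gunzip_keep(archive):
    """Decompress next to the archive and keep the original, like `gzip -dk`."""
    archive = Path(archive)
    target = archive.with_suffix("")  # drops only the trailing .gz
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


print(extension_url("1.0.0", "linux_amd64_gcc4"))
```

The path returned by `gunzip_keep` is what would go into the INSTALL statement in place of 'path/to/httpfs.duckdb_extension'.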
3. Installing MinIO
Set s3_use_ssl to true to connect over HTTPS, or to false to connect over HTTP.
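As a rough illustration of what s3_use_ssl and s3_url_style mean for the request that ends up on the wire, the sketch below maps an s3:// URI to the HTTP(S) URL it would correspond to. This is not DuckDB's internal code. `s3_http_url` is a hypothetical helper, and the mapping reflects the usual S3 path-style and virtual-hosted-style conventions.

```python
def s3_http_url(s3_uri, endpoint, use_ssl=True, url_style="path"):
    """Map an s3:// URI to the HTTP(S) URL implied by the settings (illustrative only)."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    bucket, _, key = s3_uri[len("s3://"):].partition("/")
    scheme = "https" if use_ssl else "http"  # the effect of s3_use_ssl
    if url_style == "path":
        # Path style: the bucket is the first path segment.
        return f"{scheme}://{endpoint}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes part of the host name.
    return f"{scheme}://{bucket}.{endpoint}/{key}"


print(s3_http_url("s3://bookings/hotel_bookings.csv", "play.min.io:9000"))
print(s3_http_url("s3://bookings/hotel_bookings.csv", "localhost:9000", use_ssl=False))
```

A local MinIO server started without TLS would therefore typically need s3_use_ssl set to false, and MinIO deployments commonly use the path style.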
4. Running the commands
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-east-1';
SET s3_url_style='path';
SET s3_endpoint='play.min.io:9000';
SET s3_access_key_id='***' ;
SET s3_secret_access_key='***';
CREATE TABLE bookings AS SELECT * FROM read_csv_auto('s3://bookings/hotel_bookings.csv', all_varchar=1);
SELECT COUNT(*) AS TotalRows from bookings;
Running from Python
import pandas as pd
import duckdb
query = """
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-west-2';
SET s3_access_key_id='key';
SET s3_secret_access_key='secret';
-- SET s3_use_ssl = false;
SELECT * FROM read_parquet('s3://bucket/folder/file.parquet')
"""
cursor = duckdb.connect()
cursor.execute(query).df()
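The script above hard-codes the credentials in the query string. One possible variation, sketched below, builds the SET statements from environment variables instead. The variable names (S3_ACCESS_KEY_ID and so on), the defaults, and the helper `s3_set_statements` are assumptions made for this example, not something DuckDB or MinIO defines.

```python
import os


def quote(value):
    """Quote a value as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def s3_set_statements(env=os.environ):
    """Build SET statements from environment variables (hypothetical variable names)."""
    settings = {
        "s3_region": env.get("S3_REGION", "us-east-1"),
        "s3_url_style": env.get("S3_URL_STYLE", "path"),
        "s3_endpoint": env.get("S3_ENDPOINT", "play.min.io:9000"),
        # No defaults for credentials: a missing variable raises KeyError.
        "s3_access_key_id": env["S3_ACCESS_KEY_ID"],
        "s3_secret_access_key": env["S3_SECRET_ACCESS_KEY"],
    }
    return [f"SET {name}={quote(value)};" for name, value in settings.items()]


# Demo with a plain dict standing in for the real environment.
statements = s3_set_statements({"S3_ACCESS_KEY_ID": "key",
                                "S3_SECRET_ACCESS_KEY": "secret"})
for statement in statements:
    print(statement)
```

Each statement could then be passed to cursor.execute(...) after LOAD httpfs; and before the read_parquet query.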
References
https://blog.minio.org.cn/databases-for-object-storage/
https://duckdb.org/docs/extensions/httpfs/s3api.html
https://blog.min.io/duckdb-and-minio-for-a-modern-data-stack/
https://github.com/duckdb/duckdb/tree/main/extension
https://duckdb.org/docs/extensions/working_with_extensions.html
https://www.modb.pro/db/1759396805863362560
https://www.cnblogs.com/ytwang/p/18233176
Database / DuckDB: accessing remote data locally (AWS S3) https://www.cnblogs.com/ytwang/p/17359906.html
DuckDB and MinIO for a Modern Data Stack
From: https://www.cnblogs.com/ytwang/p/18303358