
postgresql/lightdb bulk data import/export series: copy/ltuldr/ltldr/lt_bulkload, with best-practice recommendations for mysql-to-lightdb migration

Published: 2023-06-11 10:01:03

With COPY, the file must reside on the server, which limits its scope of use. For export, lightdb therefore provides a high-performance export tool, ltuldr. For import, lightdb provided lt_bulkload before 23.1 (see below); starting with 23.1, it supports ltldr, the counterpart of Oracle SQL*Loader.

  COPY can be used to import and export data quickly. Its main uses are as follows:

  • The COPY command moves data between PostgreSQL tables and standard file system files.
  • COPY TO copies the contents of a table to a file.
  • COPY TO can also copy the results of a SELECT query. That is, if a column list is specified, COPY TO copies only the data in the specified columns to the file.
  • The COPY command instructs the PostgreSQL server to read from or write to the file directly. Therefore, the file must be accessible to the PostgreSQL user.
  • COPY FROM copies the data from a file to a table.
  • When using the COPY FROM command, each field in the file is inserted sequentially into the specified column. Table columns not specified in the COPY FROM column list get their default values.
  • It is necessary to grant SELECT privilege on the table read by COPY TO, and INSERT privilege on the table into which values are inserted with COPY FROM.
  • COPY TO can only be used with tables, not views. However, if we want to copy the contents of a view, we must feed the COPY command an SQL query.


COPY (SELECT * FROM country) TO 'list_countries.copy';


  • Files named in the COPY command are read or written directly by the server, not by the client application. Therefore, they must be located on, or be accessible to, the database server machine, not the client machine.
  • We shouldn’t confuse COPY with \copy in psql. \copy calls COPY FROM STDIN or COPY TO STDOUT and then retrieves and stores the data from a file accessible by the psql client. Therefore, file accessibility and access rights depend on the client rather than the server when using \copy.
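The server-side vs client-side distinction can be sketched in a session like the ones below (the paths and the country table are illustrative):

```sql
-- Server-side COPY: the path is resolved on the database server,
-- and the server process must be able to write the file.
COPY country TO '/var/lib/lightdb/country.csv' WITH (FORMAT csv, HEADER true);

-- Client-side \copy: a psql/ltsql meta-command; the path is resolved on
-- the client machine, and the data flows over the connection using
-- COPY ... TO STDOUT behind the scenes.
\copy country to '/home/zjh/country.csv' with (format csv, header true)
```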

Typical COPY usage:

-- From the console; this is what lt_restore does
zjh@postgres=# copy big_table from stdin(delimiter ' ');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 abc
>> 2 bcd
>> \.
COPY 2

-- From a program; typically used to compress with gzip directly, without an intermediate file
zjh@postgres=# copy big_table from program 'echo -e "1 abc\\n2 bcsfws"' (delimiter ' ');
COPY 2
zjh@postgres=# select * from big_table ;
 id |   v    
----+--------
  1 | abc
  2 | bcd
  1 | abc
  1 | abc
  2 | bcsfws
(5 rows)

 

  Besides importing data directly, COPY FROM can also include a WHERE clause to filter rows, as follows:

zjh@postgres=# create table big_table(id int, v varchar(100));
CREATE TABLE

zjh@postgres=# insert into big_table select i, i || '-' || i from generate_series(1,100000) i;
INSERT 0 100000
zjh@postgres=# exit
[zjh@hs-10-20-30-193 ~]$ ltsql -p14000 postgres
ltsql (13.3-22.2)
Type "help" for help.

zjh@postgres=# \copy ( select * from big_table) to '/home/zjh/big_table.csv' delimiter '|' csv header;
COPY 100000
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
zjh@postgres=# \timing on
Timing is on.
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 44.725 ms
zjh@postgres=#
zjh@postgres=# \timing on
Timing is on.
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 45.757 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
COPY 100000
Time: 41.660 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true);
COPY 100000
Time: 41.146 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.736 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.753 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where id % 100 != 99;
COPY 99000
Time: 42.475 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v like '99%';
COPY 1111
Time: 28.995 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
COPY 98889
Time: 43.442 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where v not like '99%';
COPY 98889
Time: 40.352 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%';
COPY 98889
Time: 60.274 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where lower(v) not like '99%';   -- the choice of function matters a lot for performance
COPY 98889
Time: 61.503 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%';  -- the choice of function matters a lot for performance
COPY 98889
Time: 53.131 ms
zjh@postgres=# \copy big_table from '/home/zjh/big_table.csv' with (FORMAT csv, DELIMITER '|', HEADER true) where substr(v,1,10) not like '99%';
COPY 98889
Time: 49.655 ms
zjh@postgres=#

pg_bulkload does not have this feature.


Its FILTER function returns a NULL record when a row fails the check, instead of discarding the row, so it does not match the WHERE-filter semantics.
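For comparison, a pg_bulkload run is driven by a control file whose FILTER option names a SQL function; a sketch along the lines of the pg_bulkload documentation (paths, table, and function names are illustrative):

```
OUTPUT = public.big_table
INPUT = /home/zjh/big_table.csv
TYPE = CSV
DELIMITER = "|"
SKIP = 1
FILTER = public.big_table_filter
```

The FILTER function receives the parsed input columns and returns a record of the target table's row type; returning NULL still emits a NULL record rather than skipping the row, which is why it cannot emulate COPY ... WHERE.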

Compared with Oracle SQL*Loader, both COPY and pg_bulkload still fall well short. Commonly missing features include:

  • field delimiters based on absolute positions (fixed-position parsing)
  • pg_bulkload cannot filter out records
  • neither pg_bulkload nor COPY can apply functions to fields
  • no REPLACE mode
  • COPY lacks parallel/direct-path features
  • no way to specify default values for columns
  • buffer size tuning, etc.

lightdb 23.1 supports all of the above features in full, via ltldr.

In the current version, part of this can be achieved with multi-row INSERT ... VALUES, as follows:

insert into big_table select * from (VALUES(1, 200000, 1.2), (2, 400000, 1.4)) AS v (depno, target, increase) where depno = 1;

Note that PG does not support more than 1000 bind-parameter values per statement, which greatly limits the value of multi-row INSERT ... VALUES; it can even be inferior to JDBC batch optimization.

In that case, consider implementing it with file_fdw plus NFS; see:
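A minimal file_fdw sketch reusing big_table from above (the foreign-table name and file path are illustrative); because the load goes through ordinary SQL, arbitrary WHERE clauses and functions can be applied:

```sql
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_srv FOREIGN DATA WRAPPER file_fdw;

-- Map the CSV file (e.g. on an NFS mount visible to the server)
CREATE FOREIGN TABLE big_table_ext (id int, v varchar(100))
  SERVER csv_srv
  OPTIONS (filename '/nfs/big_table.csv', format 'csv',
           delimiter '|', header 'true');

-- Filter and transform while loading
INSERT INTO big_table
SELECT id, lower(v) FROM big_table_ext WHERE id % 100 != 99;
```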

http://www.light-pg.com/docs/lightdb/13.3-22.2/file-fdw.html

http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-values.html

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

http://www.light-pg.com/docs/lightdb/13.3-22.2/sql-copy.html

https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-sql-loader-field-list-contents.html#GUID-83FF6EDC-C7F8-4F29-8994-59153BE31924 # applying SQL functions to fields

https://www.depesz.com/2019/02/03/waiting-for-postgresql-12-allow-copy-from-to-filter-data-using-where-conditions/

https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/moving-data-with-postgresql-copy-and-copy-commands/ba-p/1561266

https://wiki.postgresql.org/wiki/COPY

http://www.light-pg.com/docs/lightdb/current/sql-copy.html


https://pgloader.readthedocs.io/en/latest/index.html (supports importing into postgresql from mysql/pg/txt/dbf; it is also implemented on top of COPY under the hood)

mysql-to-lightdb migration

Summary


| Source | Target | Recommendation |
| --- | --- | --- |
| csv/txt | lightdb-x | lt_bulkload/ltldr |
| lt_dump (many small files to restore) | lightdb-x | lt_restore, restoring in parallel by exclude_schema |
| oracle | lightdb-x | many small tables: ora2pg; a few large tables: export csv with sqluldr2, import with ltldr/lt_bulkload |
| mysql/mariadb | lightdb-x | many small tables: ltloader; a few large tables: export csv with select into outfile, import with ltldr/lt_bulkload |
| postgresql/lightdb-x | lightdb-x | lt_dump/lt_restore |
| postgresql/lightdb-x | lightdb-a | many small tables: postgresql_fdw/ltloader; a few large tables: export csv with ltuldr, import with ltldr/lt_bulkload; adjust the distribution key to ensure good performance |
| oracle | lightdb-a | many small tables: ora2pg; a few large tables: export csv with sqluldr2, import with ltldr/lt_bulkload/gpfdist; adjust the distribution key to ensure good performance |
| oracle | lightdb-x distributed | many small tables: ora2pg; a few large tables: export csv with sqluldr2, import with ltldr/lt_bulkload/gpfdist; call create_distributed_table to set the distribution key for good performance |
| mysql/mariadb | lightdb-x distributed | many small tables: ltloader; a few large tables: export csv with select into outfile, import with ltldr/lt_bulkload |



From: https://blog.51cto.com/zhjh256/6457075
