
DataX Installation and Basic Usage



Table of Contents

  • 1. DataX Overview
    • 1. Overview
    • 2. DataX Plugin System
    • 3. DataX Core Architecture
  • 2. Installation
    • 2.1 Download and Extract
    • 2.2 Run the Self-Check Script
  • 3. Basic Usage
    • 3.1 Read from a Stream and Print to the Console
      • 1. View the official JSON configuration template
      • 2. Write a JSON file based on the template
      • 3. Run the job
    • 3.2 Import Data from MySQL into HDFS
      • 1. View the official JSON configuration template
      • 2. Write a JSON file based on the template
      • 3. Run the job
    • 3.3 Export HDFS Data to MySQL
      • 1. Rename the file imported in 3.2 and create the table in the database
      • 2. View the official JSON configuration template
      • 3. Write a JSON file based on the template
      • 4. Run the job
    • 3.4 MySQL to MySQL Synchronization
    • 3.5 MySQL to HBase Synchronization
    • 3.6 HBase to HBase Synchronization
    • 3.7 HBase to MySQL Synchronization
  • 4. References

1. DataX Overview

1. Overview

DataX is an offline data synchronization tool/framework open-sourced by Alibaba. It is used to synchronize data between heterogeneous data sources such as MySQL, Oracle, SQL Server, HDFS, Hive, and HBase.

2. DataX Plugin System

DataX itself is only the synchronization framework; each supported data source is plugged in as a Reader (source side) or a Writer (target side), so supporting a new source or target only requires adding the corresponding plugin.

3. DataX Core Architecture

A DataX Job is split into multiple Tasks according to the configured parallelism. Tasks are scheduled in TaskGroups, and each Task runs a Reader -> Channel -> Writer pipeline that moves records from the source to the target.

2. Installation

2.1 Download and Extract

Source code: https://github.com/alibaba/DataX. I downloaded the latest release, DataX 3.0. Download URL:
http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

# Extract the archive after downloading
[xiaokang@hadoop ~]$ tar -zxvf datax.tar.gz -C /opt/software/
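
After extraction, it is worth checking the layout of the DataX home directory. The listing below is what the official tarball typically contains (exact contents may differ slightly between releases):

[xiaokang@hadoop ~]$ ls /opt/software/datax
bin  conf  job  lib  plugin  script  tmp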

2.2 Run the Self-Check Script

[xiaokang@hadoop ~]$ cd /opt/software/datax/
[xiaokang@hadoop datax]$ bin/datax.py job/job.json

If the job ends with a statistics summary and zero failed records, DataX is installed successfully.
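
The exact figures vary from machine to machine; the tail of a successful self-check log looks roughly like this (DataX prints this summary at the end of every job, and the numbers below are illustrative only):

任务启动时刻                    : 2022-11-29 16:00:00
任务结束时刻                    : 2022-11-29 16:00:10
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          25000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0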

3. Basic Usage

3.1 Read from a Stream and Print to the Console

1. View the official JSON configuration template

[xiaokang@hadoop ~]$ python /opt/software/datax/bin/datax.py -r streamreader -w streamwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the streamreader document:
https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md

Please refer to the streamwriter document:
https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md

Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [],
                        "sliceRecordCount": ""
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

2. Write a JSON file based on the template

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [
                            {
                                "type": "string",
                                "value": "xiaokang-微信公众号:小康新鲜事儿"
                            },
                            {
                                "type": "string",
                                "value": "你好,世界-DataX"
                            }
                        ],
                        "sliceRecordCount": "10"
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "utf-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "2"
            }
        }
    }
}

3. Run the job

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./stream2stream.json

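With two channels and sliceRecordCount set to 10, each channel prints the configured record 10 times, so the two columns appear 20 times in total on the console, roughly like this:

xiaokang-微信公众号:小康新鲜事儿	你好,世界-DataX
xiaokang-微信公众号:小康新鲜事儿	你好,世界-DataX
...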

3.2 Import Data from MySQL into HDFS

Example: export the help_keyword table of the MySQL system database to the /datax directory on HDFS (this directory must be created in advance, as shown below).
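Creating the target directory with the HDFS client:

[xiaokang@hadoop ~]$ hdfs dfs -mkdir -p /datax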

1. View the official JSON configuration template

[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r mysqlreader -w hdfswriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md

Please refer to the hdfswriter document:
https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md

Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [],
                        "compress": "",
                        "defaultFS": "",
                        "fieldDelimiter": "",
                        "fileName": "",
                        "fileType": "",
                        "path": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

2. Write a JSON file based on the template

The help_keyword table has two columns, help_keyword_id (an integer key) and name (a string), which map to the reader and writer column lists below.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "help_keyword_id",
                            "name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.1.106:3306/mysql"
                                ],
                                "table": [
                                    "help_keyword"
                                ]
                            }
                        ],
                        "password": "xiaokang",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "help_keyword_id",
                                "type": "int"
                            },
                            {
                                "name": "name",
                                "type": "string"
                            }
                        ],
                        "defaultFS": "hdfs://hadoop:9000",
                        "fieldDelimiter": "|",
                        "fileName": "keyword.txt",
                        "fileType": "text",
                        "path": "/datax",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}

3. Run the job

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./mysql2hdfs.json
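
You can then verify the result on HDFS. Note that hdfswriter appends a random suffix to fileName, which is why section 3.3 renames the file first:

[xiaokang@hadoop json]$ hdfs dfs -ls /datax
[xiaokang@hadoop json]$ hdfs dfs -cat /datax/keyword.txt__* | head -5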

3.3 Export HDFS Data to MySQL

1. Rename the file imported in 3.2 and create the table in the database

[xiaokang@hadoop ~]$ hdfs dfs -mv /datax/keyword.txt__4c0e0d04_e503_437a_a1e3_49db49cbaaed /datax/keyword.txt

The table must be created in advance; CREATE TABLE ... LIKE copies the structure of help_keyword without its data:

CREATE TABLE help_keyword_from_hdfs_datax LIKE help_keyword;

2. View the official JSON configuration template

[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r hdfsreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the hdfsreader document:
https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md

Please refer to the mysqlwriter document:
https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "column": [],
                        "defaultFS": "",
                        "encoding": "UTF-8",
                        "fieldDelimiter": ",",
                        "fileType": "orc",
                        "path": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

3. Write a JSON file based on the template

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "column": [
                            "*"
                        ],
                        "defaultFS": "hdfs://hadoop:9000",
                        "encoding": "UTF-8",
                        "fieldDelimiter": "|",
                        "fileType": "text",
                        "path": "/datax/keyword.txt"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "help_keyword_id",
                            "name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.106:3306/mysql",
                                "table": ["help_keyword_from_hdfs_datax"]
                            }
                        ],
                        "password": "xiaokang",
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}

4. Run the job

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./hdfs2mysql.json
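
To verify, compare row counts in the mysql client; the two counts should match:

mysql> SELECT COUNT(*) FROM help_keyword;
mysql> SELECT COUNT(*) FROM help_keyword_from_hdfs_datax;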

3.4 MySQL to MySQL Synchronization

{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "password": "gee123456",
                    "username": "geespace",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"],
                        "querySql": ["SELECT id, name FROM test_test"]
                    }]
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["id", "name"],
                    "password": "gee123456",
                    "username": "geespace",
                    "writeMode": "insert",
                    "connection": [{
                        "table": ["test_test_1"],
                        "jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
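
The sections from here on only show the job JSON. Assuming the file above is saved as mysql2mysql.json (the name is arbitrary), it runs exactly like the earlier examples, and the same applies to the jobs in 3.5 through 3.7:

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./mysql2mysql.json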

3.5 MySQL to HBase Synchronization

{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "password": "gee123456",
                    "username": "geespace",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"],
                        "querySql": ["SELECT id, name FROM test_test"]
                    }]
                }
            },
            "writer": {
                "name": "hbase11xwriter",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test_1",
                    "column": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    },
                    "rowkeyColumn": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
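
hbase11xwriter does not create the target table for you. Before running the job, create test_test_1 with the column family f referenced in the column mappings, for example in the HBase shell:

hbase(main):001:0> create 'test_test_1', 'f'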

3.6 HBase to HBase Synchronization

{
    "job": {
        "content": [{
            "reader": {
                "name": "hbase11xreader",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test",
                    "column": [{
                        "name": "f:id",
                        "type": "string"
                    }, {
                        "name": "f:name",
                        "type": "string"
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    }
                }
            },
            "writer": {
                "name": "hbase11xwriter",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test_1",
                    "column": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    },
                    "rowkeyColumn": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
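
After the job finishes, a quick scan in the HBase shell spot-checks the copied rows:

hbase(main):002:0> scan 'test_test_1', {LIMIT => 5}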

3.7 HBase to MySQL Synchronization

{
    "job": {
        "content": [{
            "reader": {
                "name": "hbase11xreader",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test_1",
                    "column": [{
                        "name": "f:id",
                        "type": "string"
                    }, {
                        "name": "f:name",
                        "type": "string"
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    }
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["id", "name"],
                    "password": "gee123456",
                    "username": "geespace",
                    "writeMode": "insert",
                    "connection": [{
                        "table": ["test_test"],
                        "jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
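
A final sanity check on the MySQL side (the target table test_test must already exist with id and name columns):

mysql> SELECT COUNT(*) FROM geespace_bd_platform_dev.test_test;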



