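Integrating DataX and DataX-Web into DataSophon (Single Node)
DataX Deployment
Environment preparation
- JDK (1.8 or later; 1.8 recommended)
- Python (2 or 3 both work; Linux usually ships with Python 2, and running the scripts under Python 3 raises errors unless the scripts are modified)
- Apache Maven 3.x (needed only to compile DataX: not required if you download the official archive [datax.tar.gz], required if you clone the project from Git and package it yourself)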
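The prerequisites listed above can be checked before starting. The following is a minimal sketch, assuming the tools are expected on PATH under the names java, python and mvn; the helper name check_tool is hypothetical and not part of DataX or DataSophon.

```shell
# Hypothetical helper: succeed when a command is on PATH, fail otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
        return 0
    fi
    echo "$1: missing"
    return 1
}

# Report on each prerequisite without aborting on the first missing one.
for tool in java python mvn; do
    check_tool "$tool" || true
done
```

A "missing" line for mvn is harmless when the official datax.tar.gz archive is used, since Maven is only needed for building from source.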
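Building the installation package
Method 1: download the DataX toolkit directly (DataX download link)
After downloading, extract it to a local directory and enter the bin directory to run a synchronization job: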
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
Self-check script:
python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
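Method 2: download the DataX source code and compile it yourself (DataX source)
(1) Download the DataX source code:
If the server's SSH key has not been added to GitHub, cloning the source over SSH fails; in that case, consider downloading the zip archive and uploading it to the server.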
# git clone https://github.com/alibaba/DataX.git
git clone git@github.com:alibaba/DataX.git
(2) Package with Maven:
$ cd {DataX_source_code_home}
$ mvn -U clean package assembly:assembly -Dmaven.test.skip=true
This step takes roughly 20 to 30 minutes, depending on network speed.
When packaging succeeds, the log ends as follows:
[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2015-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------
After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/ and has the following structure:
$ cd {DataX_source_code_home}
$ ls ./target/datax/datax/
bin conf job lib log log_perf plugin script tmp
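Before packaging the build output, it can help to confirm that the layout matches the listing above. The following is a minimal sketch; the helper name check_datax_layout is hypothetical, and checking only a subset of the listed directories (leaving out log, log_perf and tmp) is an assumption made here, not something the DataX build requires.

```shell
# Hypothetical helper: verify that a DataX home contains a subset of the
# directories listed above. Prints each missing directory and returns
# non-zero if any is absent.
check_datax_layout() {
    datax_home=$1
    missing=0
    for d in bin conf job lib plugin script; do
        if [ ! -d "$datax_home/$d" ]; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (not executed here):
# check_datax_layout ./target/datax/datax && echo "layout OK"
```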
Upload the DataX installation package
cd /opt/datasophon/DDP/packages/
md5sum datax.tar.gz | awk '{print $1}' > datax.tar.gz.md5
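The .md5 file written above contains only the hash, without the file name. The following is a minimal sketch for re-checking that the stored hash still matches the package, for example after re-uploading it; the helper name verify_md5 is hypothetical.

```shell
# Hypothetical helper: compare a package's current MD5 with the hash stored
# in its companion .md5 file (hash only, as produced by the command above).
verify_md5() {
    pkg=$1
    expected=$(cat "$pkg.md5")
    actual=$(md5sum "$pkg" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Example call (not executed here):
# verify_md5 /opt/datasophon/DDP/packages/datax.tar.gz && echo "md5 OK"
```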
Prepare the configuration file service_ddl.json
Work inside the datasophon-manager-1.2.1 installation:
mkdir -p /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAX
vi /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAX/service_ddl.json
{
"name": "Datax",
"label": "Datax",
"description": "Offline data synchronization tool",
"version": "1.0.0",
"sortNum": 6,
"dependencies":[],
"packageName": "datax.tar.gz",
"decompressPackageName": "datax",
"runAs":"root",
"roles": [
{
"name": "DataxClient",
"label": "DataxClient",
"roleType": "client",
"cardinality": "1+",
"logFile": ""
}
],
"configWriter": {
"generators": []
},
"parameters": []
}
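The packageName in the file above has to match the archive placed in the packages directory earlier. The following is a minimal sketch of that cross-check; the helper names ddl_field and check_package are hypothetical, and the sed extraction is a line-based shortcut rather than a real JSON parser (it assumes one "key": "value" pair per line, as in the file above).

```shell
# Hypothetical helper: print the first string value found for a key in a
# JSON file. Line-based sed shortcut, not a JSON parser.
ddl_field() {
    sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical check: the package named in service_ddl.json and its .md5
# file should both exist in the packages directory.
check_package() {
    ddl=$1
    packages_dir=$2
    pkg=$(ddl_field "$ddl" packageName)
    [ -n "$pkg" ] && [ -f "$packages_dir/$pkg" ] && [ -f "$packages_dir/$pkg.md5" ]
}

# Example call (not executed here):
# check_package /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAX/service_ddl.json /opt/datasophon/DDP/packages
```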
Restart the datasophon-manager API
sh /opt/datasophon/datasophon-manager-1.2.1/bin/datasophon-api.sh restart api
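Install DataX
Add the DataX service.
Click Next.
Select the worker node(s) on which to install DataX.
Click Next; no configuration is required.
DataX is now integrated into DataSophon.
Note: integrating DataX into DataSophon essentially just extracts the package onto the selected cluster nodes, so that DataX-Web can call it when it is integrated later.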
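DataX-Web Deployment
Environment preparation
- 1. MySQL (5.5+), required. The client is optional; if the MySQL client is installed on the Linux server, the deployment script can use it to initialize the database quickly.
- 2. JDK (1.8.0_xxx), required (this deployment uses Java 1.8.0_131).
- 3. Maven (3.6.1+), required (this deployment uses Maven 3.6.3).
- 4. DataX, required (this deployment uses DataX 3.0).
- 5. Python (2.x), required (this deployment uses Python 2.7.5). It is mainly used to run the underlying DataX launch script. By default DataX is executed as a Java child process; users can customize this to run it through Python instead. To support Python 3, replace the three Python files under datax/bin with the replacements in doc/datax-web/datax-python3.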
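Compile and package
- Clone the repository:
git clone https://github.com/WeiYe-Jing/datax-web.git
- With the source code checked out from Git, run the following command in the project root:
mvn clean install
- On success, the installation package is generated in the project's build directory:
build/datax-web-{VERSION}.tar.gz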
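Obtain the installation package
- Extract the installation package
In the chosen installation directory, extract the package:
# tar -zxvf datax-web-{VERSION}.tar.gz
tar -zxvf datax-web-2.1.2.tar.gz
- Run the one-step installation script
Enter the extracted directory and locate install.sh under bin. For an interactive installation, run it directly:
./bin/install.sh
In interactive mode, the script asks for confirmation before extracting each module's package and before calling each configure script. Follow the prompts to see whether installation succeeded; if it did not, you can retry. To skip the confirmations and install non-interactively, run:
./bin/install.sh --force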
Prepare the status script status-all.sh
vi datax-web-2.1.2/bin/status-all.sh
#!/bin/bash
set -e # Exit immediately if a command fails
# Service names to check
DATAX_SERVICES=("datax-admin" "datax-executor")
# Print the PID(s) of a service
get_pid() {
    local service_name=$1
    # Look up the PID(s) of the running service
    pid=$(ps -ef | grep -v grep | grep "$service_name" | awk '{print $2}')
    echo "$pid"
}
# Check the status of several services
check_status() {
    local service_names=("$@")
    local all_running=true # Assume every service is running
    for service_name in "${service_names[@]}"; do
        pid=$(get_pid "$service_name")
        if [ -z "$pid" ]; then
            echo "$service_name is NOT running."
            all_running=false # Mark false if any service is not running
        else
            echo "$service_name is running with PID $pid."
        fi
    done
    # Exit 0 if every service is running, otherwise exit 1
    if [ "$all_running" = true ]; then
        exit 0
    else
        exit 1
    fi
}
# Check datax-admin and datax-executor together
check_status "${DATAX_SERVICES[@]}"
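The script above reports its result through the exit code: 0 when both services are found, 1 otherwise. The following is a minimal sketch of a caller that reads that code; the wrapper name report_status is hypothetical, and how DataSophon itself consumes the exit code is not shown in the source.

```shell
# Hypothetical wrapper: run a status command and translate its exit code
# into a one-word state (0 -> RUNNING, anything else -> STOPPED).
report_status() {
    if "$@" >/dev/null 2>&1; then
        echo "RUNNING"
    else
        echo "STOPPED"
    fi
}

# Example call, assuming the current directory is the DataX-Web install dir
# (not executed here):
# report_status sh bin/status-all.sh
```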
Repackage the installation package
cp -r datax-web-2.1.2 /opt/datasophon/DDP/packages/
cd /opt/datasophon/DDP/packages/
# Copy the initialization SQL out for later use
cp -r datax-web-2.1.2/db/datax_web.sql /opt/datasophon/DDP/packages/
tar -czf datax-web-2.1.2.tar.gz datax-web-2.1.2
md5sum datax-web-2.1.2.tar.gz | awk '{print $1}' >datax-web-2.1.2.tar.gz.md5
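The service_ddl.json for DATAXWEB sets decompressPackageName to datax-web-2.1.2, so the archive's top-level directory should carry that name. The following is a minimal sketch for checking this after repackaging; the helper name tar_top_dir is hypothetical.

```shell
# Hypothetical helper: print the top-level directory name of the first
# entry inside a .tar.gz archive.
tar_top_dir() {
    tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Example call (not executed here):
# [ "$(tar_top_dir datax-web-2.1.2.tar.gz)" = "datax-web-2.1.2" ] && echo "top-level dir OK"
```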
Prepare the configuration file service_ddl.json
Work inside the datasophon-manager-1.2.1 installation:
mkdir -p /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAXWEB
vi /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAXWEB/service_ddl.json
{
"name": "DATAXWEB",
"label": "DataxWeb",
"description": "Visual tool for DataX offline data synchronization",
"version": "2.1.2",
"sortNum": 22,
"dependencies": [],
"packageName": "datax-web-2.1.2.tar.gz",
"decompressPackageName": "datax-web-2.1.2",
"roles": [{
"name": "DataxWeb",
"label": "DataxWeb",
"roleType": "worker",
"runAs": {
"user": "root",
"group": "root"
},
"cardinality": "1+",
"sortNum": 3,
"jmxPort": 2192,
"startRunner": {
"timeout": "60",
"program": "bin/start-all.sh",
"args": []
},
"stopRunner": {
"timeout": "600",
"program": "bin/stop-all.sh",
"args": []
},
"statusRunner": {
"timeout": "60",
"program": "bin/status-all.sh",
"args": []
},
"externalLink": {
"name": "DataxWebUi",
"label": "DataxWebUi",
"url": "http://${host}:9527/index.html"
}
}],
"configWriter": {
"generators": [{
"filename": "bootstrap.properties",
"configFormat": "custom",
"outputDirectory": "modules/datax-admin/conf",
"templateName": "bootstrap-properties.ftl",
"includeParams": ["DB_HOST", "DB_PORT", "DB_USERNAME", "DB_PASSWORD", "DB_DATABASE"]
}, {
"filename": "datax-admin/bin/env.properties",
"configFormat": "custom",
"outputDirectory": "modules",
"templateName": "datax-admin-env-properties.ftl",
"includeParams": ["MAIL_USERNAME", "MAIL_PASSWORD"]
},{
"filename": "datax-executor/bin/env.properties",
"configFormat": "custom",
"outputDirectory": "modules",
"templateName": "datax-executor-env-properties.ftl",
"includeParams": ["PYTHON_PATH"]
}]
},
"parameters": [{
"name": "DB_HOST",
"label": "Host name or IP address of the DATAXWEB database",
"description": "Host name or IP address of the DATAXWEB database",
"configType": "map",
"required": true,
"type": "input",
"value": "192.168.10.21",
"configurableInWizard": true,
"hidden": false,
"defaultValue": "localhost"
}, {
"name": "DB_PORT",
"label": "Port the DATAXWEB database listens on",
"description": "Port the DATAXWEB database listens on",
"configType": "map",
"required": true,
"type": "input",
"value": "3306",
"configurableInWizard": true,
"hidden": false,
"defaultValue": "3306"
}, {
"name": "DB_USERNAME",
"label": "User name for connecting to the DATAXWEB database",
"description": "User name for connecting to the DATAXWEB database",
"configType": "map",
"required": true,
"type": "input",
"value": "root",
"configurableInWizard": true,
"hidden": false,
"defaultValue": "root"
}, {
"name": "DB_PASSWORD",
"label": "Password for connecting to the DATAXWEB database",
"description": "Password for connecting to the DATAXWEB database",
"configType": "map",
"required": true,
"type": "input",
"value": "yixiao666",
"configurableInWizard": true,
"hidden": false,
"defaultValue": "123456"
}, {
"name": "DB_DATABASE",
"label": "Name of the DATAXWEB database",
"description": "Name of the DATAXWEB database",
"configType": "map",
"required": true,
"type": "input",
"value": "dataxweb",
"configurableInWizard": true,
"hidden": false,
"defaultValue": "dataxweb"
}, {
"name": "MAIL_USERNAME",
"label": "User name of the alert mail account",
"description": "User name of the alert mail account",
"configType": "map",
"required": true,
"type": "input",
"value": "",
"configurableInWizard": true,
"hidden": false,
"defaultValue": ""
}, {
"name": "MAIL_PASSWORD",
"label": "Password of the alert mail account",
"description": "Password of the alert mail account",
"configType": "map",
"required": true,
"type": "input",
"value": "",
"configurableInWizard": true,
"hidden": false,
"defaultValue": ""
}, {
"name": "PYTHON_PATH",
"label": "Path to the DataX Python launch script (datax.py)",
"description": "Path to the DataX Python launch script (datax.py)",
"configType": "map",
"required": true,
"type": "input",
"value": "/opt/datasophon/datax/bin/datax.py",
"configurableInWizard": true,
"hidden": false,
"defaultValue": "/opt/datasophon/datax/bin/datax.py"
}]
}
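Each generator in the file above names a template that has to exist in the worker's templates directory (/opt/datasophon/datasophon-worker/conf/templates in this guide). The following is a minimal sketch that lists referenced templates that are missing; the helper names list_templates and missing_templates are hypothetical, and the sed extraction assumes one "templateName": "..." pair per line, as in the file above.

```shell
# Hypothetical helper: list every templateName value in a service_ddl.json.
# Line-based sed extraction, not a JSON parser.
list_templates() {
    sed -n 's/.*"templateName"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical check: print each referenced template that is absent from
# the given templates directory (prints nothing when all are present).
missing_templates() {
    list_templates "$1" | while IFS= read -r tpl; do
        [ -f "$2/$tpl" ] || echo "$tpl"
    done
}

# Example call (not executed here):
# missing_templates /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAXWEB/service_ddl.json /opt/datasophon/datasophon-worker/conf/templates
```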
Add the ftl template files on every node
Note: this must be done on all nodes
bootstrap-properties.ftl
vi /opt/datasophon/datasophon-worker/conf/templates/bootstrap-properties.ftl
#Database
DB_HOST=${DB_HOST}
DB_PORT=${DB_PORT}
DB_USERNAME=${DB_USERNAME}
DB_PASSWORD=${DB_PASSWORD}
DB_DATABASE=${DB_DATABASE}
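To see what the generated bootstrap.properties should look like, the placeholders in the template above can be filled in by hand. The following is only an illustrative stand-in: the .ftl extension suggests the worker renders these files with FreeMarker, and this sed-based function is not that mechanism. The function name render_db_template is hypothetical, and it assumes the values contain no '|', '&' or backslash characters.

```shell
# Hypothetical stand-in for template rendering: replace each ${KEY}
# placeholder on stdin with the value of the shell variable of the same name.
render_db_template() {
    sed -e "s|[\$]{DB_HOST}|$DB_HOST|g" \
        -e "s|[\$]{DB_PORT}|$DB_PORT|g" \
        -e "s|[\$]{DB_USERNAME}|$DB_USERNAME|g" \
        -e "s|[\$]{DB_PASSWORD}|$DB_PASSWORD|g" \
        -e "s|[\$]{DB_DATABASE}|$DB_DATABASE|g"
}

# Example call with placeholder values (not executed here):
# DB_HOST=localhost DB_PORT=3306 DB_USERNAME=root DB_PASSWORD=secret DB_DATABASE=dataxweb
# render_db_template < bootstrap-properties.ftl
```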
datax-admin-env-properties.ftl
vi /opt/datasophon/datasophon-worker/conf/templates/datax-admin-env-properties.ftl
# environment variables
JAVA_HOME="/usr/local/jdk1.8.0_333"
WEB_LOG_PATH=/opt/datasophon/dataxweb/modules/datax-admin/logs
WEB_CONF_PATH=/opt/datasophon/dataxweb/modules/datax-admin/conf
DATA_PATH=/opt/datasophon/dataxweb/modules/datax-admin/data
SERVER_PORT=9527
# mail account
MAIL_USERNAME="${MAIL_USERNAME}"
MAIL_PASSWORD="${MAIL_PASSWORD}"
#debug
#REMOTE_DEBUG_SWITCH=true
#REMOTE_DEBUG_PORT=7003
datax-executor-env-properties.ftl
vi /opt/datasophon/datasophon-worker/conf/templates/datax-executor-env-properties.ftl
# environment variables
JAVA_HOME="/usr/local/jdk1.8.0_333"
SERVICE_LOG_PATH=/opt/datasophon/dataxweb/modules/datax-executor/logs
SERVICE_CONF_PATH=/opt/datasophon/dataxweb/modules/datax-executor/conf
DATA_PATH=/opt/datasophon/dataxweb/modules/datax-executor/data
## Location where DataX JSON files are stored
JSON_PATH=/opt/datasophon/dataxweb/modules/datax-executor/json
## executor_port
EXECUTOR_PORT=9999
## Keep consistent with the datax-admin port
DATAX_ADMIN_PORT=
## Location of the Python script to execute
PYTHON_PATH=${PYTHON_PATH}
## dataxweb service port
SERVER_PORT=9504
#debug remote debugging port
#REMOTE_DEBUG_SWITCH=true
#REMOTE_DEBUG_PORT=7004
Restart
Restart the worker on every node
sh /opt/datasophon/datasophon-worker/bin/datasophon-worker.sh restart worker
Restart the API on the master node
sh /opt/datasophon/datasophon-manager-1.2.1/bin/datasophon-api.sh restart api
Manually create the database and run the SQL
Run datax_web.sql from the /opt/datasophon/DDP/packages directory to create the DataX-Web tables in the dataxweb database:
CREATE DATABASE dataxweb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
use dataxweb;
source /opt/datasophon/DDP/packages/datax_web.sql;
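The three statements above can also be generated and piped into the mysql client instead of being typed interactively. The following is a minimal sketch, assuming the mysql command-line client is installed; the helper name init_sql is hypothetical, and the connection options in the example are placeholders to be replaced with real values.

```shell
# Hypothetical helper: print the statements used above for a given database
# name and SQL file, so they can be piped into the mysql client.
init_sql() {
    printf 'CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n' "$1"
    printf 'USE %s;\n' "$1"
    printf 'SOURCE %s;\n' "$2"
}

# Example call (not executed here); <host> and <user> are placeholders:
# init_sql dataxweb /opt/datasophon/DDP/packages/datax_web.sql | mysql -h <host> -u <user> -p
```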
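Install DATAXWEB
Add the DATAXWEB service.
Click Next.
Assign the Admin and Executor roles, choosing the node(s) to install on as needed.
Adjust the configuration to match your environment.
The DATAXWEB service has been added successfully.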
From: https://www.cnblogs.com/yixiaocn/p/18598188