Preface
001 || mysql2hdfs
(1) Check the data in the MySQL table to be migrated
(2) Based on the requirements, pick mysqlreader as the reader and hdfswriter as the writer
To print the reader/writer config templates (-r for the reader template, -w for the writer template):
python bin/datax.py -r mysqlreader -w hdfswriter
(3) Write the JSON job script for the sync
(4) Make sure the target path exists on HDFS
(5) Run the sync by passing the JSON job to datax.py
(6) Verify the data: confirm that HDFS now holds all rows from the corresponding MySQL table
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["id","name"],
"connection": [
{
"jdbcUrl": ["jdbc:mysql://xxxxx:3306/dbName"],
"table": ["test"]
}
],
"password": "twgdhbtzhy",
"username": "root",
"splitPk": ""
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name": "id", "type": "bigint"},
{"name": "name", "type": "string"}
],
"compress": "gzip",
"defaultFS": "hdfs://xxxxx:8020",
"fieldDelimiter": "\t",
"fileName": "test",
"fileType": "text",
"path": "/test",
"writeMode": "append"
}
}
}
],
"setting": {
"speed": {
"channel": "1"
}
}
}
}
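Before submitting, a quick structural check of the job file can catch misspelled keys — the plugin config key is `parameter`, and a typo like `paramter` leaves the plugin with no configuration at all. A minimal sketch (the required-key lists are assumptions drawn from the example above, not an official DataX schema):

```python
import json

# Keys this particular job relies on (an assumption based on the example
# above, not a complete DataX schema).
READER_KEYS = {"column", "connection", "username", "password"}
WRITER_KEYS = {"column", "defaultFS", "fieldDelimiter",
               "fileName", "fileType", "path", "writeMode"}

def check_job(job_text):
    """Return a list of problems found in a DataX job JSON string."""
    problems = []
    job = json.loads(job_text)
    for item in job["job"]["content"]:
        for side, required in (("reader", READER_KEYS),
                               ("writer", WRITER_KEYS)):
            plugin = item[side]
            if "parameter" not in plugin:
                # A misspelled key (e.g. "paramter") means the plugin
                # sees no configuration and the job fails later.
                problems.append(f"{side} has no 'parameter' key")
                continue
            missing = required - plugin["parameter"].keys()
            if missing:
                problems.append(f"{side} missing: {sorted(missing)}")
    return problems
```

For example, `check_job(open("job/mysql2hdfs.json").read())` should return an empty list for the job file above.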
(7) Run the job
hdfs dfs -mkdir /test
python bin/datax.py job/mysql2hdfs.json
(8)
002 || Title
003 || Title
From: https://www.cnblogs.com/houhuilinblogs/p/18613143