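I. What Is Monstache
Monstache is a real-time sync daemon written in Go. It tails MongoDB's oplog to subscribe to data changes and synchronize them, and it supports syncing data from MongoDB to Elasticsearch. MongoDB must be deployed as a replica set.
II. Installation
1. Install Go and configure the environment variables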
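(1) Download and extract Go:
    wget https://dl.google.com/go/go1.14.4.linux-amd64.tar.gz
    tar -C /usr/local -xzf go1.14.4.linux-amd64.tar.gz
(2) Create the Go workspace, i.e. the directory GOPATH points to (/home/go below), with three subdirectories:
    mkdir -p /home/go/bin /home/go/src /home/go/pkg
(3) Configure the environment variables: open the profile with vim /etc/profile and append the lines below. GOPROXY points Go module downloads at Alibaba Cloud's Go module proxy.
    export GOROOT=/usr/local/go
    export GOPATH=/home/go/
    export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
    export GOPROXY=https://mirrors.aliyun.com/goproxy/
(4) Apply the configuration:
    source /etc/profile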
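After `source /etc/profile` it is worth confirming that the settings actually took effect. The two helpers below are a minimal sketch with made-up names (they are not part of Go): one checks whether a directory is on `PATH`, the other whether the GOPATH workspace has its three subdirectories. Because `GOPATH=/home/go/` ends with a slash, `$GOPATH/bin` expands to `/home/go//bin`, so the PATH check collapses repeated slashes before comparing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Go) for checking the setup above.

# path_contains DIR: succeed if DIR is an entry of $PATH.
# Repeated slashes are collapsed first, since GOPATH=/home/go/ makes
# $GOPATH/bin expand to /home/go//bin.
path_contains() {
  local want entry
  local -a entries
  want=$(printf '%s' "$1" | tr -s '/')
  want="${want%/}"
  IFS=: read -ra entries <<< "$PATH"   # split PATH on colons, no globbing
  for entry in "${entries[@]}"; do
    entry=$(printf '%s' "$entry" | tr -s '/')
    [ "${entry%/}" = "$want" ] && return 0
  done
  return 1
}

# gopath_ok DIR: succeed if the workspace DIR has bin, src and pkg.
gopath_ok() {
  local sub
  for sub in bin src pkg; do
    [ -d "$1/$sub" ] || return 1
  done
}
```

For example, `path_contains "$GOPATH/bin" && gopath_ok "$GOPATH" && go version` succeeds only when the workspace exists and the binaries that `go install` will produce are reachable from the shell.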
2. About Monstache
(1) Monstache / MongoDB / Elasticsearch version compatibility:

| Monstache version | Git branch (used to build plugin) | Docker tag | Description | Elasticsearch | MongoDB | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 6 | rel6 | rel6, latest | MongoDB, Inc. go driver | Version 7+ | Version 2.6+ | Supported |
| 5 | rel5 | rel5 | MongoDB, Inc. go driver | Version 6 | Version 2.6+ | Supported |
| 4 | master | rel4 | mgo community go driver | Version 6 | Version 3 | Deprecated |
| 3 | rel3 | rel3 | mgo community go driver | Versions 2 and 5 | Version 3 | Deprecated |
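The table can be read as a lookup from the Elasticsearch major version to the Monstache branch (or Docker tag) to use. The function below is a minimal sketch of that lookup; `branch_for_es` is a hypothetical name, and for Elasticsearch 6 it picks the supported rel5 rather than the deprecated rel4.

```bash
#!/usr/bin/env bash
# branch_for_es MAJOR: print the Monstache branch/Docker tag for an
# Elasticsearch major version, following the compatibility table above.
branch_for_es() {
  case "$1" in
    2|5) echo "rel3" ;;   # Monstache 3 (deprecated, mgo driver)
    6)   echo "rel5" ;;   # Monstache 5; rel4 also targets ES 6 but is deprecated
    *)
      if [[ "$1" =~ ^[0-9]+$ ]] && (( 10#$1 >= 7 )); then
        echo "rel6"       # Monstache 6: Elasticsearch 7+
      else
        echo "no Monstache release listed for Elasticsearch $1" >&2
        return 1
      fi
      ;;
  esac
}
```

With the versions used in this article (Elasticsearch 7.9.2), `branch_for_es 7` prints `rel6`, which matches Monstache 6.7.11; MongoDB 4.4.15 also satisfies the "2.6+" requirement of that row.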
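(2) Monstache configuration parameters

| Parameter | Description |
| --- | --- |
| mongo-url | Address of the MongoDB instance's primary node. For a cloud-hosted instance it is shown on the instance's basic information page; before connecting, add the private IP address of the ECS instance running Monstache to the MongoDB instance's whitelist. |
| elasticsearch-urls | Address(es) of the Elasticsearch instance, in the form http://<Elasticsearch private address>:9200. |
| direct-read-namespaces | Collections to sync; the existing documents in them are copied to Elasticsearch in full. |
| change-stream-namespaces | Required if you want to use MongoDB change streams. When it is set, legacy oplog tailing is disabled. |
| namespace-regex | Regular expression selecting the collections to listen to; changes in collections whose namespace matches are synced. |
| elasticsearch-user | User name for Elasticsearch; the default user is elastic. Note: using the elastic user in production is not recommended because it lowers security. Create a dedicated user and grant it only the roles and privileges it needs. |
| elasticsearch-password | Password of that user. The elastic user's password is set when the instance is created and can be reset if forgotten. |
| elasticsearch-max-conns | Number of concurrent connections to Elasticsearch. The default is 4, i.e. four goroutines push data to Elasticsearch in parallel. |
| dropped-collections | Default true: when a MongoDB collection is dropped, the corresponding Elasticsearch index is deleted as well. |
| dropped-databases | Default true: when a MongoDB database is dropped, the corresponding Elasticsearch indexes are deleted as well. |
| resume | Default false. When true, Monstache writes the timestamp of MongoDB operations successfully synced to Elasticsearch into the monstache.monstache collection, so that after an unexpected stop it can resume from that timestamp without losing data. Enabled automatically when cluster-name is set. |
| resume-strategy | Resume strategy; only takes effect when resume is true. |
| verbose | Default false, i.e. debug logging is disabled. |
| cluster-name | Cluster name. When set, Monstache runs in high-availability mode and processes with the same cluster name coordinate with each other. |
| mapping | Elasticsearch index mapping. By default the index is named after the MongoDB namespace, databaseName.collectionName; use this parameter to choose a different index name. |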
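The resume-strategy row does not list the allowed values. According to Monstache's documentation (worth double-checking for your version), 0 means timestamps (the default) and 1 means tokens; the comments in the sample configuration add that, with change streams, tokens work on MongoDB 3.6+ while timestamps need MongoDB 4.0+. The sketch below encodes that rule, assuming change streams are in use; `pick_resume_strategy` is a made-up name.

```bash
#!/usr/bin/env bash
# pick_resume_strategy VERSION: suggest a resume-strategy value for a
# MongoDB version such as "4.4.15", assuming change streams are used:
#   0 = timestamps (default), needs MongoDB 4.0+
#   1 = tokens, needs MongoDB 3.6+
pick_resume_strategy() {
  local major minor rest
  IFS=. read -r major minor rest <<< "$1"
  minor=${minor:-0}
  if ! [[ "$major" =~ ^[0-9]+$ && "$minor" =~ ^[0-9]+$ ]]; then
    echo "cannot parse MongoDB version: $1" >&2
    return 1
  fi
  if (( 10#$major >= 4 )); then
    echo 0
  elif (( 10#$major == 3 && 10#$minor >= 6 )); then
    echo 1
  else
    echo "change streams need MongoDB 3.6+ (got $1)" >&2
    return 1
  fi
}
```

For MongoDB 4.4.15, as used here, it suggests 0, which is the value set in the configuration file of this article.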
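3. Install Monstache
(1) Go to the installation directory
    cd /usr/local
(2) Download the source with git, or upload the package manually
1) Clone from GitHub:
    git clone https://github.com/rwynn/monstache.git
If git is not installed, install it first with yum install -y git. The clone is placed in /usr/local/monstache, whereas the steps below use /usr/local/monstache-6.7.11 (the directory the 6.7.11 source package extracts to); adjust the paths accordingly.
2) Upload manually
Download from: https://github.com/rwynn/monstache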
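Enter the source directory, then build and install:
    cd /usr/local/monstache-6.7.11
    go install
Check the version (go install puts the binary in $GOPATH/bin, i.e. /home/go/bin here):
    monstache -v
The installation is complete.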
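If `monstache` is not found after `go install`, the usual cause is that the install directory is not on `PATH`. The helper below is a minimal sketch (the name `find_monstache` is hypothetical) that looks where `go install` writes binaries: `$GOBIN` if set, otherwise `$GOPATH/bin`, with `~/go` as Go's default GOPATH. It assumes a single GOPATH entry.

```bash
#!/usr/bin/env bash
# find_monstache: print the path of the installed monstache binary.
# go install writes binaries to $GOBIN if it is set, otherwise to
# $GOPATH/bin (GOPATH defaults to ~/go). Assumes one GOPATH entry.
find_monstache() {
  local bin="${GOBIN:-${GOPATH:-$HOME/go}/bin}/monstache"
  if [ -x "$bin" ]; then
    echo "$bin"
  elif command -v monstache >/dev/null 2>&1; then
    command -v monstache        # fall back to whatever is on PATH
  else
    echo "monstache not found; re-run go install in the source directory" >&2
    return 1
  fi
}
```

For example, `"$(find_monstache)" -v` runs the version check above by absolute path, which also works before `/etc/profile` has been re-sourced. The printed path is also the value to use for `MONSTACHE_OPTS` in the init script later on.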
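4. Configure the real-time sync task
(1) In the Monstache directory, create and edit the configuration file
    # create directories for monstache's configuration and logs
    cd /usr/local/monstache-6.7.11
    mkdir config
    mkdir logs
Go into the config directory and create the configuration file:
    cd config
    vim config.toml
The file contents are as follows: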
    # connection settings

    # connect to MongoDB using the following URL
    mongo-url = "mongodb://192.168.8.79:27017/pns"
    # connect to the Elasticsearch REST API at the following node URLs
    elasticsearch-urls = ["http://192.168.8.153:9200"]

    # frequently required settings

    # if you need to seed an index from a collection and not just listen and sync changes events
    # you can copy entire collections or views from MongoDB to Elasticsearch
    direct-read-namespaces = ["mydb.mycollection"]

    # if you want to use MongoDB change streams instead of legacy oplog tailing use change-stream-namespaces
    # change streams require at least MongoDB API 3.6+
    # if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
    # in this case you usually don't need regexes in your config to filter collections unless you target the deployment.
    # to listen to an entire db use only the database name. For a deployment use an empty string.
    #change-stream-namespaces = ["mydb.col"]

    # additional settings

    # if you don't want to listen for changes to all collections in MongoDB but only a few
    # e.g. only listen for inserts, updates, deletes, and drops from mydb.mycollection
    # this setting does not initiate a copy, it is only a filter on the change event listener
    namespace-regex = '^mydb\.mycollection$'
    # compress requests to Elasticsearch
    gzip = true
    # generate indexing statistics
    #stats = true
    # index statistics into Elasticsearch
    #index-stats = true
    # use the following PEM file for connections to MongoDB
    #mongo-pem-file = "/path/to/mongoCert.pem"
    # disable PEM validation
    #mongo-validate-pem-file = false
    # use the following user name for Elasticsearch basic auth
    elasticsearch-user = "elastic"
    # use the following password for Elasticsearch basic auth
    elasticsearch-password = "<your_es_password>"
    # use 4 go routines concurrently pushing documents to Elasticsearch
    elasticsearch-max-conns = 4
    # use the following PEM file to connections to Elasticsearch
    #elasticsearch-pem-file = "/path/to/elasticCert.pem"
    # validate connections to Elasticsearch
    #elasticsearch-validate-pem-file = true
    # propagate dropped collections in MongoDB as index deletes in Elasticsearch
    dropped-collections = true
    # propagate dropped databases in MongoDB as index deletes in Elasticsearch
    dropped-databases = true
    # do not start processing at the beginning of the MongoDB oplog
    # if you set the replay to true you may see version conflict messages
    # in the log if you had synced previously. This just means that you are replaying old docs which are already
    # in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
    #replay = false
    # resume processing from a timestamp saved in a previous run
    resume = true
    # do not validate that progress timestamps have been saved
    #resume-write-unsafe = false
    # override the name under which resume state is saved
    #resume-name = "default"
    # use a custom resume strategy (tokens) instead of the default strategy (timestamps)
    # tokens work with MongoDB API 3.6+ while timestamps work only with MongoDB API 4.0+
    # 0 = timestamps (the default, used here), 1 = tokens
    resume-strategy = 0
    # exclude documents whose namespace matches the following pattern
    #namespace-exclude-regex = '^mydb\.ignorecollection$'
    # turn on indexing of GridFS file content
    #index-files = true
    # turn on search result highlighting of GridFS content
    #file-highlighting = true
    # index GridFS files inserted into the following collections
    #file-namespaces = ["users.fs.files"]
    # print detailed information including request traces
    verbose = true
    # enable clustering mode
    #cluster-name = 'es-cn-mp91kzb8m00******'
    # do not exit after full-sync, rather continue tailing the oplog
    #exit-after-direct-reads = false

    [[mapping]]
    namespace = "pns.user_device_table"
    index = "pns.user_device_table"
    #type = "device"
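Before starting Monstache you can check which namespaces a `namespace-regex` value selects. The helper below is a minimal sketch with a made-up name, using `pns.user_device_table` from the mapping above as the example namespace. Bash evaluates POSIX extended regular expressions while Monstache uses Go's `regexp` package; the two agree on simple anchored patterns like these, but complex patterns should be checked against Go's syntax.

```bash
#!/usr/bin/env bash
# ns_matches REGEX NAMESPACE: succeed if NAMESPACE ("db.collection")
# matches REGEX, using bash's POSIX extended regex matching.
ns_matches() {
  local re="$1"
  [[ "$2" =~ $re ]]   # unquoted $re is treated as a regex, not a literal
}
```

Note the escaped dot: `'^pns\.user_device_table$'` matches only `pns.user_device_table`, whereas `'^pns.user_device_table$'` would also accept `pnsXuser_device_table`. To listen to several collections, alternation works, e.g. `'^pns\.(user_device_table|user_table)$'`.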
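Notes:
1) If no mapping is configured, the Elasticsearch index name defaults to the MongoDB namespace, databaseName.collectionName.
2) Monstache syncs updates by reading MongoDB's oplog, so MongoDB must run as a replica set.
3) When MongoDB requires authentication, mongo-url must take the form
    mongodb://user:password@IP:PORT/databaseName?authSource=authDatabase&authMechanism=SCRAM-SHA-1
where authSource is the database in which the user was created; the address must also be that of the MongoDB primary.
4) direct-read-namespaces makes Monstache perform an initial full sync of the listed collections to Elasticsearch, whereas change-stream-namespaces only syncs subsequent changes.
5) If namespace-regex is not set, changes to all MongoDB data are synced by default; set it to sync inserts, updates and deletes for only one or a few collections.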
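Note 3) has a practical catch: characters such as `@`, `:`, `/` or `%` in the user name or password must be percent-encoded in a MongoDB connection string, otherwise the URL is parsed incorrectly. The sketch below builds the `mongo-url` value; `urlencode` and `build_mongo_url` are hypothetical helpers, and `SCRAM-SHA-1` is simply copied from the note.

```bash
#!/usr/bin/env bash
# urlencode STRING: percent-encode every byte except the RFC 3986
# unreserved characters A-Z a-z 0-9 - . _ ~ (works byte by byte, so
# UTF-8 input is encoded correctly as well).
urlencode() {
  local hex byte out=""
  for hex in $(printf '%s' "$1" | od -An -tx1 -v); do
    byte=$((16#$hex))
    if (( byte >= 48 && byte <= 57 )) || (( byte >= 65 && byte <= 90 )) ||
       (( byte >= 97 && byte <= 122 )) ||
       (( byte == 45 || byte == 46 || byte == 95 || byte == 126 )); then
      out+=$(printf "\\x$hex")   # unreserved: keep the character as-is
    else
      out+="%${hex^^}"           # reserved or non-ASCII: e.g. "@" -> %40
    fi
  done
  printf '%s' "$out"
}

# build_mongo_url USER PASSWORD HOST:PORT DB [AUTH_DB]
# AUTH_DB (authSource) defaults to DB.
build_mongo_url() {
  local authdb="${5:-$4}"
  printf 'mongodb://%s:%s@%s/%s?authSource=%s&authMechanism=SCRAM-SHA-1\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$authdb"
}
```

For example, `build_mongo_url admin 'p@ss:w/rd' 192.168.8.79:27017 pns` yields `mongodb://admin:p%40ss%3Aw%2Frd@192.168.8.79:27017/pns?authSource=pns&authMechanism=SCRAM-SHA-1`; pass a fifth argument such as `admin` when the user was created in a different database.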
(2) Configure start on boot
    # switch to the init script directory
    cd /etc/init.d
    # create and edit the script
    vim monstache
Script contents:
    #!/bin/bash
    #
    # chkconfig: 2345 10 90
    #
    # description: MONSTACHE RUN

    # program name
    MONSTACHE_NAME="monstache-rel6"
    # binary and configuration file locations
    MONSTACHE_OPTS=/home/go/bin/monstache
    MONSTACHE_CONF=/usr/local/monstache-6.7.11/config/config.toml
    # log file location
    MONSTACHE_LOGS=/usr/local/monstache-6.7.11/logs/monstache.log

    # start: run monstache in the background, writing all output to the log
    start() {
        nohup "$MONSTACHE_OPTS" -f "$MONSTACHE_CONF" > "$MONSTACHE_LOGS" 2>&1 &
        echo "$MONSTACHE_NAME started successfully."
    }

    # stop: kill every process whose command line contains the binary path
    stop() {
        echo "stopping $MONSTACHE_NAME ..."
        pids=$(ps -ef | grep "$MONSTACHE_OPTS" | grep -v grep | grep -v stop | awk '{print $2}')
        if [ -n "$pids" ]; then
            kill -9 $pids
        fi
    }

    case "$1" in
        start)
            start
            ;;
        stop)
            stop
            ;;
        restart)
            stop
            start
            ;;
        *)
            echo "Usage: $0 {start|stop|restart}"
            exit 1
            ;;
    esac
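    # make the script executable
    chmod +x monstache
    # register it as a system service and enable it at boot
    chkconfig --add monstache
    chkconfig monstache on
    chkconfig --list
Note: MONSTACHE_OPTS points to the monstache binary that go install placed under $GOPATH/bin (/home/go/bin here).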
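The stop function in the script above finds the process with `ps | grep` and sends `kill -9` (SIGKILL), which gives Monstache no chance to shut down cleanly and can also hit unrelated processes whose command line happens to contain the path. A common alternative is to record the PID at start and send SIGTERM first, escalating to SIGKILL only on timeout. The sketch below shows that pattern; the function names and any pid-file path you choose are assumptions, not part of Monstache.

```bash
#!/usr/bin/env bash
# Hypothetical pid-file helpers for an init script (not part of Monstache).

# start_daemon PIDFILE LOGFILE CMD [ARGS...]: run CMD in the background,
# send its output to LOGFILE and record its PID in PIDFILE.
start_daemon() {
  local pidfile="$1" logfile="$2"
  shift 2
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "already running (pid $(cat "$pidfile"))" >&2
    return 1
  fi
  nohup "$@" >"$logfile" 2>&1 &
  echo "$!" >"$pidfile"
}

# stop_daemon PIDFILE [TIMEOUT_SECONDS]: send SIGTERM, wait up to the
# timeout (default 10 s) for the process to exit, then fall back to SIGKILL.
stop_daemon() {
  local pidfile="$1" timeout="${2:-10}" pid i
  if [ ! -f "$pidfile" ]; then
    echo "not running (no pid file)" >&2
    return 1
  fi
  pid=$(cat "$pidfile")
  kill -TERM "$pid" 2>/dev/null || true
  for (( i = 0; i < timeout * 10; i++ )); do
    kill -0 "$pid" 2>/dev/null || break   # process has exited
    sleep 0.1
  done
  if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid" 2>/dev/null || true
  fi
  rm -f "$pidfile"
}
```

In the init script, `start()` could call `start_daemon /var/run/monstache.pid "$MONSTACHE_LOGS" "$MONSTACHE_OPTS" -f "$MONSTACHE_CONF"` and `stop()` could call `stop_daemon /var/run/monstache.pid`. For a quick trial without Monstache, `start_daemon /tmp/demo.pid /tmp/demo.log sleep 300` followed by `stop_daemon /tmp/demo.pid` exercises both functions.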
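(3) Starting and stopping
    # start, stop or restart the service
    systemctl start monstache
    systemctl stop monstache
    systemctl restart monstache
    # check the process
    ps -ef|grep monstache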
Notes:
1. Versions used
mongo 4.4.15
monstache 6.7.11
elasticsearch 7.9.2
2. Creating a replica set on a single MongoDB server
Reference: http://www.kaotop.com/it/14307.html
3. Creating a replica set in a cluster
Reference: https://blog.csdn.net/qq_42428264/article/details/124227050
Original article referenced: https://blog.csdn.net/weixin_44187730/article/details/117198268
Source: https://www.cnblogs.com/nastynail/p/16848500.html