Background
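Our vector database ran on a single cluster, so any problem with that cluster would affect the entire business. The data in the vector database therefore needs to be backed up on a schedule.
We now have two Milvus clusters, and the approach is as follows.
Looking into how Milvus works showed that its data is persisted in the cluster's MinIO component, so the whole backup and restore process is built around MinIO:
1. Back up the data of the cluster to be protected with milvus-backup (the backup is written to that cluster's MinIO).
2. Transfer the backup to the MinIO of the backup cluster with rclone.
3. Clean up the existing data in the backup cluster with a Python script.
4. Restore the latest backup into the backup cluster.
5. Load the restored collections into memory, verify and compare the data, and send an alert with the result.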
Backing up with milvus-backup
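Tool: milvus-backup (download link)
The config file has to be set up first.
Two configs are used here: one for backing up the production data and one for restoring the data into the backup cluster.
Only one of them is shown as an example.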
# Configures the system log output.
log:
  level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  console: true # whether print log to console
  file:
    rootPath: "logs/backup.log"

http:
  simpleResponse: true

# milvus proxy address, compatible to milvus.yaml
milvus:
  address:
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"

# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "milvus-bucket" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "file" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance

  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string

  backupBucketName: "backup" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "file" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
The main fields to change are the addresses and ports (milvus.address/port and minio.address/port).
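Because two nearly identical config files are maintained, they can drift apart. Below is a minimal sketch of a consistency check, assuming both YAML files have already been parsed into dicts (for example with PyYAML's yaml.safe_load, left out so the example has no dependencies). The function name check_configs and its rules are our own assumptions, not part of milvus-backup: the two files should point at different Milvus and MinIO addresses, and they should share backupBucketName/backupRootPath, because the rclone step copies the backup to the same bucket path on the backup MinIO.

```python
from typing import Any, Dict, List


def check_configs(backup_cfg: Dict[str, Any], restore_cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the pair looks consistent.

    Hypothetical helper: both arguments are parsed YAML configs as plain dicts.
    """
    problems: List[str] = []

    # The backup config must target production and the restore config the
    # backup cluster; identical or empty addresses indicate a copy-paste slip.
    for section in ("milvus", "minio"):
        b_addr = backup_cfg.get(section, {}).get("address")
        r_addr = restore_cfg.get(section, {}).get("address")
        if not b_addr or not r_addr:
            problems.append(f"{section}.address is empty")
        elif b_addr == r_addr:
            problems.append(f"{section}.address is identical in both files ({b_addr})")

    # rclone copies <bucket>/<root>/<name> to the same path on the other side,
    # so the backup location has to match in both files.
    for key in ("backupBucketName", "backupRootPath"):
        b_val = backup_cfg.get("minio", {}).get(key)
        r_val = restore_cfg.get("minio", {}).get(key)
        if b_val != r_val:
            problems.append(f"minio.{key} differs: {b_val!r} vs {r_val!r}")

    return problems
```

An empty "address:" line in the YAML parses to None, so it is reported as empty rather than silently accepted.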
Create the backup
./milvus-backup create -n backup_${datetime} --config backup-prod.yaml
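This writes a backup into the prod MinIO cluster under backupBucketName/backupRootPath/<backup name>, which with the config above is backup/file/backup_${datetime}.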
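Every later step (copy, restore) relies on the same backup name and object path, so it helps to derive them in one place. A small sketch, assuming the naming used in this post (backup_ plus the date in %Y%m%d format, as produced by date "+%Y%m%d") and the bucket/root from the config above; the helper names are hypothetical.

```python
import datetime as dt
from typing import List


def backup_name(day: dt.date) -> str:
    """Same format as the shell's backup_$(date "+%Y%m%d")."""
    return f"backup_{day.strftime('%Y%m%d')}"


def backup_location(bucket: str, root_path: str, name: str) -> str:
    """Object prefix of a backup: backupBucketName/backupRootPath/<name>."""
    return f"{bucket}/{root_path}/{name}"


def create_cmd(name: str, config: str = "backup-prod.yaml") -> List[str]:
    """argv for the milvus-backup create command shown above."""
    return ["./milvus-backup", "create", "-n", name, "--config", config]
```

For example, backup_name(dt.date(2024, 1, 5)) gives "backup_20240105", whose data lands under backup/file/backup_20240105.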
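Transferring the backup data
The production backup sits in the MinIO of the production Milvus cluster, so an object-storage transfer tool is needed to move it; rclone is used here.
rclone configuration file: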
[prod-minio]
type = s3
provider = Minio
env_auth = false
access_key_id = minioadmin
secret_access_key = minioadmin
endpoint =
[backup-minio]
type = s3
provider = Minio
env_auth = false
access_key_id = minioadmin
secret_access_key = minioadmin
endpoint =
The main thing to change here is endpoint, which must be set to the address of the corresponding MinIO.
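rclone.conf is an INI-style file, so if the same two remotes have to be set up on several machines they can also be generated with Python's standard-library configparser. A sketch under these assumptions: the endpoints passed in are placeholders you fill with your own MinIO addresses, the credentials default to the minioadmin values above, and where to save the text is up to you (rclone config file prints the path rclone actually reads).

```python
import configparser
import io
from typing import Dict


def build_rclone_conf(
    remotes: Dict[str, str],
    access_key: str = "minioadmin",
    secret_key: str = "minioadmin",
) -> str:
    """Render MinIO remotes (remote name -> endpoint URL) as rclone.conf text."""
    cfg = configparser.ConfigParser()
    for name, endpoint in remotes.items():
        # Same keys as the hand-written config above.
        cfg[name] = {
            "type": "s3",
            "provider": "Minio",
            "env_auth": "false",
            "access_key_id": access_key,
            "secret_access_key": secret_key,
            "endpoint": endpoint,
        }
    buf = io.StringIO()
    cfg.write(buf)  # writes "key = value" lines, one [section] per remote
    return buf.getvalue()
```

configparser keeps the insertion order of sections, so the output lists prod-minio before backup-minio if they are passed in that order.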
Run the transfer:
rclone copy -P prod-minio:backup/file/backup_${datetime} backup-minio:backup/file/backup_${datetime}
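When the copy is driven from a script, it should fail loudly, because the next step empties the backup cluster. A sketch that builds the same rclone copy command and runs it with subprocess.run(check=True). The follow-up rclone check --one-way (which compares the copied files against the source) is an extra step we are assuming here; it is not part of the original flow.

```python
import subprocess
from typing import List


def _path(name: str, bucket: str = "backup", root: str = "file") -> str:
    return f"{bucket}/{root}/{name}"


def rclone_copy_cmd(name: str) -> List[str]:
    """argv equivalent to the rclone copy command above."""
    p = _path(name)
    return ["rclone", "copy", "-P", f"prod-minio:{p}", f"backup-minio:{p}"]


def rclone_check_cmd(name: str) -> List[str]:
    """Optional verification: every source file must exist and match on the destination."""
    p = _path(name)
    return ["rclone", "check", "--one-way", f"prod-minio:{p}", f"backup-minio:{p}"]


def run_copy(name: str, verify: bool = True) -> None:
    # check=True raises CalledProcessError on a non-zero exit code, so a failed
    # transfer stops the caller instead of continuing to the cleanup step.
    subprocess.run(rclone_copy_cmd(name), check=True)
    if verify:
        subprocess.run(rclone_check_cmd(name), check=True)
```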
Cleaning up existing data in the backup cluster
#!/usr/bin/env python3.8
from pymilvus import (
    connections,
    utility,
)

# Proxy address of the backup cluster (backup-server)
hosts = "XXXXXX"
# Connect to the Milvus server
connections.connect("default", host=hosts, port="19530")
# Get the list of collections
coll_lst = utility.list_collections()
# Drop every collection so the restore starts from an empty cluster
for name in coll_lst:
    utility.drop_collection(name)
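Since this script drops everything it can see, pointing it at the wrong host would wipe production. A minimal sketch of a guard, assuming you keep a list of production hosts yourself (prod_hosts and the function name are hypothetical). The admin argument is duck-typed: anything with list_collections() and drop_collection(name) works, including the pymilvus utility module used in the cleanup script.

```python
from typing import List, Sequence


def drop_all_collections(admin, host: str, prod_hosts: Sequence[str]) -> List[str]:
    """Drop every collection on `host`; refuse to run against a known production host.

    `admin` must provide list_collections() and drop_collection(name),
    e.g. pymilvus.utility after connections.connect(...).
    Returns the names that were dropped.
    """
    if host in prod_hosts:
        raise RuntimeError(f"refusing to drop collections on production host {host}")
    dropped: List[str] = []
    for name in admin.list_collections():
        admin.drop_collection(name)
        dropped.append(name)
    return dropped
```

In the script this would replace the final loop with drop_all_collections(utility, hosts, PROD_HOSTS), where PROD_HOSTS is your own hypothetical list. The guard only compares strings, so it catches the copy-paste mistake, not every misconfiguration.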
Restoring the backup into the backup cluster and loading it into memory
The data is restored with the backup tool:
./milvus-backup restore --restore_index -n backup_${datetime} --config restore-prod.yaml
The collections are loaded with a Python script:
#!/usr/bin/env python3.8
from pymilvus import (
    connections,
    utility,
    Collection,
)

# Proxy address of the backup cluster
hosts = "XXXXXX"
# Connect to the Milvus server
connections.connect("default", host=hosts, port="19530")
# Get the list of collections
coll_lst = utility.list_collections()
# Load every collection into memory
for name in coll_lst:
    collection = Collection(name)
    collection.load()
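In the loop above, the first collection that fails to load aborts the script and the remaining ones are never loaded. A sketch of an alternative that keeps going and returns the failures so they can be reported; the function name is hypothetical, and open_collection would be pymilvus.Collection in practice.

```python
from typing import Callable, Dict, Iterable


def load_all(names: Iterable[str], open_collection: Callable) -> Dict[str, str]:
    """Load each collection; return {name: error message} for those that failed.

    `open_collection(name)` must return an object with a load() method,
    e.g. pymilvus.Collection.
    """
    failures: Dict[str, str] = {}
    for name in names:
        try:
            open_collection(name).load()
        except Exception as exc:  # keep loading the rest; report this one later
            failures[name] = str(exc)
    return failures
```

Usage would be failed = load_all(utility.list_collections(), Collection), followed by a non-zero exit when failed is non-empty so the calling shell script notices.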
Verifying the data and alerting
Verification is again done with a Python script.
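The verification script itself is not shown. As a hedged sketch of one way to do the comparison: collect per-collection row counts from both clusters and diff them. With pymilvus the counts could be gathered per cluster, e.g. after connections.connect("prod", host=..., port="19530"), as {n: Collection(n, using="prod").num_entities for n in utility.list_collections(using="prod")}; note that num_entities only counts flushed rows. The function names and message format are hypothetical, and sending the message to the alert channel is left out.

```python
from typing import Dict, List


def compare_counts(prod: Dict[str, int], backup: Dict[str, int]) -> List[str]:
    """One line per discrepancy between production and backup row counts."""
    issues: List[str] = []
    for name in sorted(set(prod) | set(backup)):
        if name not in backup:
            issues.append(f"{name}: missing in backup")
        elif name not in prod:
            issues.append(f"{name}: only in backup")
        elif prod[name] != backup[name]:
            issues.append(f"{name}: prod={prod[name]} backup={backup[name]}")
    return issues


def build_report(day: str, issues: List[str]) -> str:
    """Plain-text summary the alert step could send."""
    if not issues:
        return f"[milvus backup {day}] OK: all collections match"
    return f"[milvus backup {day}] {len(issues)} mismatch(es):\n" + "\n".join(issues)
```

For example, compare_counts({"a": 10, "b": 5, "c": 1}, {"a": 10, "b": 4, "d": 2}) returns ["b: prod=5 backup=4", "c: missing in backup", "d: only in backup"].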
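Finally, to make the whole process automatic, create a scheduled task on the server (a cron entry with the schedule "0 2 * * *", i.e. 02:00 every day) that runs the following shell script, so the backup starts at 2:00 every night: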
#!/bin/bash
# Stop at the first command that exits non-zero, so the backup cluster is not wiped after a failed backup or copy
set -e
# Create a full backup
echo -e "\033[32m----------start backup prod milvus data to minio----------\033[0m"
datetime=$(date "+%Y%m%d")
cd /opt/milvus-backups
./milvus-backup create -n backup_${datetime} --config backup-prod.yaml
echo -e "\033[32m----------backup prod milvus data to minio finish----------\033[0m"
# Copy the backup files to the backup MinIO
echo -e "\033[32m----------copy backup files to backup minio server----------\033[0m"
rclone copy -P prod-minio:backup/file/backup_${datetime} backup-minio:backup/file/backup_${datetime}
echo -e "\033[32m----------copy backup files finish----------\033[0m"
# Clean up existing data in the backup cluster
echo -e "\033[32m----------clean backup-server data----------\033[0m"
python3 /opt/milvus-backups/delete.py
echo -e "\033[32m----------clean data finish------------\033[0m"
# Restore the backup into the backup cluster
echo -e "\033[32m----------start restore backup files----------\033[0m"
./milvus-backup restore --restore_index -n backup_${datetime} --config restore-prod.yaml
echo -e "\033[32m----------restore backup files finish----------\033[0m"
# Load the collections
echo -e "\033[32m----------load collection------------\033[0m"
python3 /opt/milvus-backups/load.py
echo -e "\033[32m----------load finish------------\033[0m"
# Compare the data and send the result to qw
python3 /opt/milvus-backups/alert.py
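The same nightly flow can also be expressed as an ordered list of steps that stops at the first failure. This matters because the cleanup step empties the backup cluster: if the backup or the copy failed, the previous good copy should be left alone. A minimal Python sketch, assuming the same working directory, config names and script names as the shell script; it is an alternative formulation, not the author's script, and the runner parameter exists only so the logic can be tested without executing anything.

```python
import subprocess
from typing import Callable, List, Sequence

WORKDIR = "/opt/milvus-backups"


def pipeline_steps(day: str) -> List[List[str]]:
    """Ordered argv lists mirroring the shell script above."""
    name = f"backup_{day}"
    path = f"backup/file/{name}"
    return [
        ["./milvus-backup", "create", "-n", name, "--config", "backup-prod.yaml"],
        ["rclone", "copy", "-P", f"prod-minio:{path}", f"backup-minio:{path}"],
        ["python3", f"{WORKDIR}/delete.py"],
        ["./milvus-backup", "restore", "--restore_index", "-n", name, "--config", "restore-prod.yaml"],
        ["python3", f"{WORKDIR}/load.py"],
        ["python3", f"{WORKDIR}/alert.py"],
    ]


def run_pipeline(steps: Sequence[List[str]], runner: Callable = subprocess.run) -> int:
    """Run steps in order and return how many succeeded, stopping at the first failure."""
    for done, argv in enumerate(steps):
        result = runner(argv, cwd=WORKDIR)
        if result.returncode != 0:
            print(f"step failed ({result.returncode}): {' '.join(argv)}")
            return done
    return len(steps)
```

A real setup would probably still send an alert when an earlier step fails, since with this ordering alert.py only runs after everything else has succeeded.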