
RabbitMQ: ha-sync-mode and message synchronisation/loss in new pods

Date: 2023-06-16 18:11:19
Tags: will messages sync queue MQ mode mirror ha leader

Source: Classic Queue Mirroring, RabbitMQ (rabbitmq.com)

 

Why?

Message synchronisation => the queue is unavailable for a period of time (reduced availability).

 

Configuring Synchronisation

Let's start with the most important aspect of queue synchronisation: while a queue is being synchronised, all other queue operations will be blocked. Depending on multiple factors, a queue might be blocked by synchronisation for many minutes or hours, and in extreme cases even days.

Queue synchronisation can be configured as follows:

  • ha-sync-mode: manual: this is the default mode. A new queue mirror will not receive existing messages; it will only receive new messages. The new queue mirror will become an exact replica of the leader over time, once consumers have drained the messages that only exist on the leader. If the leader queue fails before all unsynchronised messages are drained, those messages will be lost. You can fully synchronise a queue manually; refer to the Unsynchronised Mirrors section for details.
  • ha-sync-mode: automatic: a queue will automatically synchronise when a new mirror joins. It is worth reiterating that queue synchronisation is a blocking operation. If queues are small, or you have a fast network between RabbitMQ nodes and ha-sync-batch-size has been optimised, this is a good choice.
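
As a minimal sketch, the two modes map onto a policy definition that can be passed to rabbitmqctl set_policy. The helper names, the policy name ha-sync-demo, the pattern ^ha\. and the choice of ha-mode: all are assumptions made for illustration; none of them come from the text above. The code only builds the command line and does not run it.

```python
import json

VALID_SYNC_MODES = {"manual", "automatic"}


def build_ha_policy(sync_mode="manual", batch_size=None):
    """Return a classic mirrored queue policy definition as a dict.

    "ha-mode": "all" is an assumption for this sketch; the text does not
    say how many mirrors a queue should have.
    """
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"unknown ha-sync-mode: {sync_mode!r}")
    definition = {"ha-mode": "all", "ha-sync-mode": sync_mode}
    if batch_size is not None:
        definition["ha-sync-batch-size"] = int(batch_size)
    return definition


def set_policy_argv(name, pattern, definition):
    """Build (but do not execute) a rabbitmqctl set_policy command line."""
    return ["rabbitmqctl", "set_policy", "--apply-to", "queues",
            name, pattern, json.dumps(definition)]


# Hypothetical policy name and queue name pattern.
argv = set_policy_argv("ha-sync-demo", r"^ha\.", build_ha_policy("automatic", 500))
print(" ".join(argv))
```

Leaving sync_mode at its default mirrors the broker default (manual), so a new mirror only catches up as consumers drain the queue.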

 

 

Unsynchronised Mirrors

A node may join a cluster at any time. Depending on the configuration of a queue, when a node joins a cluster, queues may add a mirror on the new node. At this point, the new mirror will be empty: it will not contain any existing contents of the queue. Such a mirror will receive new messages published to the queue, and thus over time will accurately represent the tail of the mirrored queue. As messages are drained from the mirrored queue, the size of the head of the queue for which the new mirror is missing messages, will shrink until eventually the mirror's contents precisely match the leader's contents. At this point, the mirror can be considered fully synchronised, but it is important to note that this has occurred because of actions of clients in terms of draining the pre-existing head of the queue.

A newly added mirror provides no additional form of redundancy or availability of the queue's contents that existed before the mirror was added, unless the queue has been explicitly synchronised. Since the queue becomes unresponsive while explicit synchronisation is occurring, it is preferable to allow active queues from which messages are being drained to synchronise naturally, and only explicitly synchronise inactive queues.
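
The "natural" synchronisation described above can be illustrated with a toy model. This is not how RabbitMQ is implemented; it is a sketch that treats the queue as a simple FIFO, and the function name and message ids are hypothetical.

```python
from collections import deque


def simulate_natural_sync(existing, published, consumed):
    """Toy model of a new mirror joining a queue without explicit sync.

    Returns how many messages exist only on the leader after `published`
    new messages arrive and `consumed` messages are drained from the head.
    """
    leader = deque(range(existing))   # messages present before the mirror joined
    mirror = deque()                  # a new mirror starts empty
    for msg_id in range(existing, existing + published):
        leader.append(msg_id)         # new messages reach the leader...
        mirror.append(msg_id)         # ...and are replicated to the mirror
    for _ in range(min(consumed, len(leader))):
        head = leader.popleft()       # consumers drain from the head
        if mirror and mirror[0] == head:
            mirror.popleft()          # the mirror drops it too, if it held it
    return len(leader) - len(mirror)


# 100 old messages, 50 new ones, 40 consumed: 60 are still leader-only.
print(simulate_natural_sync(100, 50, 40))
# Once the 100 pre-existing messages are drained, the mirror matches the leader.
print(simulate_natural_sync(100, 50, 100))
```

Until the returned count reaches zero, losing the leader loses that many messages, which is the window the text warns about.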

When enabling automatic queue mirroring, consider the expected on disk data set of the queues involved. Queues with a sizeable data set (say, tens of gigabytes or more) will have to replicate it to the newly added mirror(s), which can put a significant load on cluster resources such as network bandwidth and disk I/O. This is a common scenario with lazy queues, for example.

To see mirror status (whether they are synchronised), use:

# mirror_pids is a new field alias introduced in RabbitMQ 3.11.4
rabbitmqctl list_queues name mirror_pids synchronised_mirror_pids

It is possible to manually synchronise a queue:

rabbitmqctl sync_queue {name}

Or cancel an in-progress synchronisation:

rabbitmqctl cancel_sync_queue {name}

These features are also available through the management plugin.
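
A minimal sketch of the same check against data from the management plugin: it assumes each queue object returned by the management HTTP API carries slave_nodes and synchronised_slave_nodes lists. Those field names are an assumption here and may differ between RabbitMQ versions; the sample data is made up.

```python
def unsynchronised_mirrors(queue):
    """Return the mirror nodes of one queue that are not yet synchronised.

    `queue` is a dict shaped like one element of the management API's
    queue listing; the field names are assumptions (see the note above).
    """
    mirrors = set(queue.get("slave_nodes", []))
    synced = set(queue.get("synchronised_slave_nodes", []))
    return sorted(mirrors - synced)


# Hypothetical sample: node3 has joined but has not caught up yet.
sample = {
    "name": "orders",
    "slave_nodes": ["rabbit@node2", "rabbit@node3"],
    "synchronised_slave_nodes": ["rabbit@node2"],
}
print(unsynchronised_mirrors(sample))
```

A non-empty result for an idle queue is a candidate for rabbitmqctl sync_queue; for an active queue it may be better to let consumers drain it.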

 

 

Promotion of Unsynchronised Mirrors on Failure

By default if a queue's leader node fails, loses connection to its peers or is removed from the cluster, the oldest mirror will be promoted to be the new leader. In some circumstances this mirror can be unsynchronised, which will cause data loss.

Starting with RabbitMQ 3.7.5, the ha-promote-on-failure policy key controls whether unsynchronised mirror promotion is allowed. When set to when-synced, it will make sure that unsynchronised mirrors are not promoted.

Default value is always. The when-synced value should be used with care. It trades off safety from unsynchronised mirror promotion for increased reliance on queue leader's availability. Sometimes queue availability can be more important than consistency.

The when-synced promotion strategy avoids data loss due to promotion of an unsynchronised mirror but makes queue availability dependent on its leader's availability. In the event of queue leader node failure the queue will become unavailable until queue leader recovers. In case of a permanent loss of queue leader the queue won't be available unless it is deleted and redeclared. Deleting a queue deletes all of its contents, which means permanent loss of a leader with this promotion strategy equates to losing all queue contents.

Systems that use the when-synced promotion strategy must use publisher confirms in order to detect queue unavailability and broker's inability to enqueue messages.

 

 

Stopping Nodes Hosting Queue Leader with Only Unsynchronised Mirrors

It's possible that, when you shut down a leader node, all available mirrors are unsynchronised. A common situation in which this can occur is rolling cluster upgrades.

By default, RabbitMQ will refuse to promote an unsynchronised mirror on controlled leader shutdown (i.e. explicit stop of the RabbitMQ service or shutdown of the OS) in order to avoid message loss; instead the entire queue will shut down as if the unsynchronised mirrors were not there.

An uncontrolled leader shutdown (i.e. server or node crash, or network outage) will still trigger a promotion of an unsynchronised mirror.

If you would prefer to have queue leader move to an unsynchronised mirror in all circumstances (i.e. you would choose availability of the queue over avoiding message loss due to unsynchronised mirror promotion) then set the ha-promote-on-shutdown policy key to always rather than its default value of when-synced.

If the ha-promote-on-failure policy key is set to when-synced, unsynchronised mirrors will not be promoted even if the ha-promote-on-shutdown key is set to always. This means that in the event of queue leader node failure the queue will become unavailable until leader recovers. In case of a permanent loss of queue leader the queue won't be available unless it is deleted (that will also delete all of its contents) and redeclared.

Note that ha-promote-on-shutdown and ha-promote-on-failure have different default behaviours. ha-promote-on-shutdown is set to when-synced by default, while ha-promote-on-failure is set to always by default.
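
The interaction between the two keys can be summarised as a small decision function. This is a sketch of the rules as stated in this section, not broker code; the function name and its boolean argument are hypothetical.

```python
VALID_VALUES = {"always", "when-synced"}


def promotes_unsynchronised_mirror(controlled_shutdown,
                                   promote_on_shutdown="when-synced",
                                   promote_on_failure="always"):
    """Return True if an unsynchronised mirror would be promoted.

    controlled_shutdown: True for an explicit stop of the service or OS,
    False for a crash or network outage. Defaults match the documented
    defaults of the two policy keys.
    """
    for value in (promote_on_shutdown, promote_on_failure):
        if value not in VALID_VALUES:
            raise ValueError(f"unknown value: {value!r}")
    if promote_on_failure == "when-synced":
        return False            # overrides ha-promote-on-shutdown: always
    if controlled_shutdown:
        return promote_on_shutdown == "always"
    return True                 # uncontrolled failure with the default policy


for controlled in (True, False):
    print(controlled, promotes_unsynchronised_mirror(controlled))
```

With the defaults, a controlled shutdown keeps the messages but takes the queue down, while a crash keeps the queue up at the risk of losing messages.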

 

Loss of a Leader While All Mirrors are Stopped

It is possible to lose the leader for a queue while all mirrors for the queue are shut down. In normal operation the last node for a queue to shut down will become the leader, and we want that node to still be the leader when it starts again (since it may have received messages that no other mirror saw).

However, when you invoke rabbitmqctl forget_cluster_node, RabbitMQ will attempt to find a currently stopped mirror for each queue which has its leader on the node we are forgetting, and "promote" that mirror to be the new leader when it starts up again. If there is more than one candidate, the most recently stopped mirror will be chosen.

It's important to understand that RabbitMQ can only promote stopped mirrors during forget_cluster_node, since any mirrors that are started again will clear out their contents, as described in the "Stopping Nodes and Synchronisation" section of the RabbitMQ mirroring guide. Therefore, when removing a lost leader in a stopped cluster, you must invoke rabbitmqctl forget_cluster_node before starting mirrors again.
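
The selection rule described above (only stopped mirrors qualify, and the most recently stopped one wins) can be sketched as follows. The tuple layout and node names are hypothetical and exist only to illustrate the rule.

```python
def pick_promotion_candidate(mirrors):
    """Pick the mirror that forget_cluster_node would promote, per the text.

    mirrors: list of (node, running, stopped_at) tuples, where stopped_at
    is a comparable timestamp. Running mirrors are never candidates.
    Returns the node name, or None if no stopped mirror exists.
    """
    stopped = [(stopped_at, node)
               for node, running, stopped_at in mirrors
               if not running]
    if not stopped:
        return None
    return max(stopped)[1]      # most recently stopped mirror


# Hypothetical cluster state: node4 was restarted, so it no longer qualifies.
state = [
    ("rabbit@node2", False, 1000),
    ("rabbit@node3", False, 1200),
    ("rabbit@node4", True, 1500),
]
print(pick_promotion_candidate(state))
```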

 

 

Batch Synchronization

Classic queue leaders perform synchronisation in batches. The batch size can be configured via the ha-sync-batch-size queue argument. If no value is set, mirroring_sync_batch_size is used as the default value. Earlier versions (prior to 3.6.0) synchronise 1 message at a time by default. By synchronising messages in batches, the synchronisation process can be sped up considerably.

To choose the right value for ha-sync-batch-size you need to consider:

  • average message size
  • network throughput between RabbitMQ nodes
  • net_ticktime value

For example, if you set ha-sync-batch-size to 50000 messages, and each message in the queue is 1KB, then each synchronisation message between nodes will be ~49MB. You need to make sure that your network between queue mirrors can accommodate this kind of traffic. If the network takes longer than net_ticktime to send one batch of messages, then nodes in the cluster could think they are in the presence of a network partition.
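
The arithmetic in this example can be checked with a short sketch. It ignores protocol overhead, the function names are hypothetical, and the 60 second net_ticktime default is an assumption that should be checked against the actual node configuration.

```python
def batch_size_mib(batch_messages, avg_message_bytes):
    """Approximate payload of one synchronisation batch, in MiB."""
    return batch_messages * avg_message_bytes / 2**20


def batch_fits_ticktime(batch_messages, avg_message_bytes,
                        network_bytes_per_sec, net_ticktime_sec=60):
    """Return True if one batch can be sent in less than net_ticktime."""
    transfer_sec = batch_messages * avg_message_bytes / network_bytes_per_sec
    return transfer_sec < net_ticktime_sec


# 50000 messages of 1 KiB each: about 48.8 MiB, i.e. the ~49MB from the text.
print(round(batch_size_mib(50000, 1024), 1))
# At 10 MiB/s the batch takes roughly 5 seconds; at 0.5 MiB/s it would not fit.
print(batch_fits_ticktime(50000, 1024, 10 * 2**20))
print(batch_fits_ticktime(50000, 1024, 2**19))
```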

The amount of data sent over the network can also be controlled by setting the parameter mirroring_sync_max_throughput. The parameter specifies the number of bytes per second that is being transferred. The default is 0, which disables this feature.
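
A rough estimate of how long a queue stays blocked under a throughput cap, as a sketch: it assumes the transfer rate is the only bottleneck, and the function name and the sample figures are hypothetical.

```python
def estimated_sync_seconds(queue_bytes, max_throughput_bytes_per_sec,
                           network_bytes_per_sec):
    """Estimate synchronisation time for a queue of `queue_bytes`.

    A cap of 0 means the limit is disabled, as stated in the text, so
    only the network rate applies.
    """
    if max_throughput_bytes_per_sec == 0:
        rate = network_bytes_per_sec
    else:
        rate = min(max_throughput_bytes_per_sec, network_bytes_per_sec)
    return queue_bytes / rate


ten_gib = 10 * 2**30
# No cap on a 100 MiB/s link versus a 10 MiB/s cap on the same link.
print(estimated_sync_seconds(ten_gib, 0, 100 * 2**20))
print(estimated_sync_seconds(ten_gib, 10 * 2**20, 100 * 2**20))
```

A lower cap reduces network and disk pressure but lengthens the period during which the queue is blocked by synchronisation.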

From: https://www.cnblogs.com/panpanwelcome/p/17486233.html
