
Test kvm guest watchdog device

Date: 2023-01-06 19:31:58
Tags: guest kvm system dev hardware timer Test watchdog

What is a watchdog

A Watchdog Timer (WDT) is a hardware circuit that can reset the computer system in case of a software fault. You probably knew that already. Usually a userspace daemon will notify the kernel watchdog driver via the /dev/watchdog special device file that userspace is still alive, at regular intervals. When such a notification occurs, the driver will usually tell the hardware watchdog that everything is in order, and that the watchdog should wait for yet another little while to reset the system. If userspace fails (RAM error, kernel bug, whatever), the notifications cease to occur, and the hardware watchdog will reset the system (causing a reboot) after the timeout occurs.

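A watchdog is a means of monitoring the health of a running system through a combination of hardware and software. Software that is running normally feeds the watchdog after completing specific instructions. If the watchdog receives no feed signal from the software within a set period, it treats the system as faulty and either enters an interrupt handler or forces a system reset.

In the Linux kernel, the watchdog works as follows: once the watchdog is started (that is, once the /dev/watchdog device is opened), if nothing is written to /dev/watchdog within a configured interval, the hardware watchdog circuit or the software timer restarts the system. /dev/watchdog is a character device node with major number 10 and minor number 130. The Linux kernel provides drivers for many types of watchdog hardware circuits, and also a purely software, timer-based watchdog driver (softdog).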
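The open-then-write contract described above can be sketched as a small shell function. This is a minimal sketch, not a replacement for a real watchdog daemon: the name feed_watchdog and its arguments are made up for illustration, and the closing 'V' relies on the kernel's "magic close" convention, which only disarms the timer when the driver supports it and nowayout is not set.

```shell
# feed_watchdog DEV COUNT INTERVAL   (hypothetical helper)
# Opens DEV once, writes one byte every INTERVAL seconds, COUNT times,
# then writes 'V' before closing so that a driver with magic-close
# support stops the timer instead of letting it expire.
feed_watchdog() {
    dev=$1 count=$2 interval=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3          # any write counts as a keepalive
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3              # magic close, ignored if nowayout is set
    } 3> "$dev"                     # opening the device arms the watchdog
}
```

On a test guest, `feed_watchdog /dev/watchdog 10 5` should keep the device fed for roughly 50 seconds and then release it. Pointing the function at a scratch file first is a harmless way to see exactly what it writes.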

For a watchdog to take effect, it needs both hardware and software support.

The feature relies on two components:

  • A hardware component that sets a 30-second timer (your value might be different). When the timer expires, the component triggers a system restart. On bare-metal machines, a chipset provides the feature.
  • A software component, run by the operating system, that regularly resets the hardware timer to prevent it from expiring.

When the operating system hangs, the software component also hangs and cannot refresh the timer. The timer expires and the system restarts. Watchdogs only monitor operating systems and do not detect application failures. For example, the watchdog does not trigger when an application hangs but the operating system is still responding.

The VM's watchdog device (hardware):

An x86_64 VM currently supports two watchdog models, i6300esb and ib700:

<devices>
    <!-- choose exactly one value for each attribute -->
    <watchdog model='i6300esb|ib700' action='reset|shutdown|poweroff|pause|none|dump|inject-nmi'/>
  </devices>

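action specifies what happens to the guest after the watchdog fires: the guest can be reset, shut down, powered off, or paused, a dump can be taken, an NMI can be injected, or nothing is done at all.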
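To confirm which model and action a guest actually got, the domain XML can be inspected on the host. The following is a rough sketch under some assumptions: watchdog_config and is_valid_action are hypothetical helper names, and the sed pattern assumes the single-quoted attribute style and the model-then-action order that virsh dumpxml prints.

```shell
# watchdog_config FILE: print "MODEL ACTION" for the first <watchdog>
# element found in a libvirt domain XML file (hypothetical helper).
watchdog_config() {
    sed -n "s/.*<watchdog model='\([^']*\)' action='\([^']*\)'.*/\1 \2/p" "$1" | head -n 1
}

# is_valid_action ACTION: succeed only for the action values listed above.
is_valid_action() {
    case $1 in
        reset|shutdown|poweroff|pause|none|dump|inject-nmi) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical use would be to save the output of `virsh dumpxml` for the guest into a file and pass that file to watchdog_config. An empty result suggests that the guest has no watchdog device configured.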

Linux software support for the VM's watchdog (software):

1. watchdog device node

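Once Linux has booted there is normally a watchdog character device, called the watchdog device node. The node alone does not prove that watchdog hardware exists: it only becomes usable when a watchdog driver is loaded. If the node is missing, it can be created with mknod: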

# ll /dev/watchdog
crw-------. 1 root root 10, 130 Jan  6 17:36 /dev/watchdog
## You can recreate it if it does not exist.
# mknod /dev/watchdog c 10 130
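The major and minor numbers shown by ls can also be checked from a script. This is only a sketch and assumes GNU coreutils stat, whose %t and %T formats print the numbers in hexadecimal. The helper names node_numbers and is_watchdog_node are invented for this example.

```shell
# node_numbers PATH: print "MAJOR MINOR" in decimal for a device node.
# Assumes GNU stat, where %t and %T are the hexadecimal major and minor.
node_numbers() {
    hex=$(stat -c '%t %T' "$1") || return 1
    set -- $hex
    echo "$((0x$1)) $((0x$2))"
}

# is_watchdog_node PATH: succeed if PATH is a character device numbered 10,130.
is_watchdog_node() {
    [ -c "$1" ] && [ "$(node_numbers "$1")" = "10 130" ]
}
```

Running `is_watchdog_node /dev/watchdog && echo ok` in the guest should print ok when the node matches the numbers given above.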

2. Driver for the watchdog hardware device

# lspci | grep -i 6300
08:01.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer
# lspci -vvv -s 08:01.0 | grep "Kernel modules"
	Kernel modules: i6300esb
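Besides lspci, the watchdog core can expose each registered device under /sys/class/watchdog. The sketch below assumes a kernel built with watchdog sysfs support, so that attributes such as identity, timeout, state and nowayout exist. watchdog_info is a hypothetical helper name and it simply skips attributes that are absent.

```shell
# watchdog_info [DIR]: print the readable watchdog sysfs attributes
# as name=value lines. DIR defaults to the first registered watchdog.
watchdog_info() {
    dir=${1:-/sys/class/watchdog/watchdog0}
    for attr in identity timeout state nowayout; do
        if [ -r "$dir/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
        fi
    done
}
```

If the directory does not exist or none of the attributes are readable, the function prints nothing, which usually means that no watchdog driver has registered a device or that sysfs support is not enabled.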

How to trigger the guest's watchdog action

Taking the device below as an example, there are two ways to trigger the watchdog action and verify that the hardware works.

<devices>
    <watchdog model='i6300esb' action='reset'/>
  </devices>

Trigger method 1:

Use another process, such as cat or echo, to open and write to the /dev/watchdog device.

If an application opens this device file, it becomes responsible for the watchdog and can reset the timer by writing to the file. Normally the watchdog daemon keeps writing to /dev/watchdog periodically, which is called "kicking" or "feeding" the watchdog. If nothing kicks or feeds the watchdog, the hardware watchdog hard-resets the system after a while.

Run in the guest:

# echo 1 > /dev/watchdog
[   21.608056] watchdog: watchdog0: watchdog did not stop!

# cat >> /dev/watchdog

After 30 s the guest restarts automatically. No software needs to be installed and no service needs to be started.
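To see how long the guest really survives after the last write, the device can be armed once and a counter printed while waiting. This is a minimal sketch with an invented helper name, arm_and_count. It writes a single byte, never feeds again, and closes without the magic 'V' character, which matches the "watchdog did not stop!" message shown above.

```shell
# arm_and_count DEV LIMIT [STEP]   (hypothetical helper)
# Arms DEV with one write, then prints a tick every STEP seconds
# (default 1) up to LIMIT ticks without ever feeding the device again.
arm_and_count() {
    dev=$1 limit=$2 step=${3:-1}
    {
        printf '1' >&3              # single write, no further feeding
        n=0
        while [ "$n" -lt "$limit" ]; do
            sleep "$step"
            n=$((n + 1))
            echo "tick $n"
        done
    } 3> "$dev"                     # closed without 'V', so the timer keeps running
}
```

With the default step of one second, the last tick visible on the guest console before the reset gives a rough measurement of the effective timeout.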

Trigger method 2:

If nothing is written to /dev/watchdog within the configured interval, the watchdog is triggered.

If no application opens the /dev/watchdog file, the kernel takes care of resetting the watchdog (when the watchdog daemon is active and running). The watchdog module is a timer: it does not appear as a dedicated kernel thread but is handled by the soft IRQ thread.

  1. Install the watchdog package in the guest:
# yum install watchdog
  2. Configure /etc/watchdog.conf:
# cat /etc/watchdog.conf
# Interval between tests. Should be a couple of seconds shorter than
# the hardware time-out value.
# For this test the value is deliberately set larger than 60.
interval = 61

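interval is the time between two feeds by the watchdog daemon. It should be smaller than the hardware time-out value; otherwise the feed arrives too late and the watchdog action is triggered. Because the default hardware time-out is 60 s, setting the interval to 61 s triggers the watchdog.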

If the device is opened but not written to within a minute, the machine will reboot.

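  3. Start the watchdog process in the guest and wait 61 s. As the output below shows, the service refuses to start with this interval, so the daemon is run directly with -f to force it: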
# systemctl start watchdog
Job for watchdog.service failed because the control process exited with error code.
See "systemctl status watchdog.service" and "journalctl -xeu watchdog.service" for details.
# watchdog
watchdog: This interval length (61) might reboot the system while the process sleeps! Try 59 or less
watchdog: To force parameter(s) use the --force command line option.
# watchdog -f

After 61 s the guest restarts automatically.
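The rule behind method 2, that the daemon's interval must stay below the hardware time-out, can be checked before starting the daemon. The sketch below uses hypothetical helper names (conf_interval, check_interval) and assumes the plain "interval = N" syntax shown in the configuration above. The time-out value is passed in by hand and is not read from the device.

```shell
# conf_interval FILE: print the last "interval = N" value found in FILE.
conf_interval() {
    sed -n 's/^[[:space:]]*interval[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# check_interval FILE TIMEOUT: report whether feeding at the configured
# interval is expected to beat a watchdog time-out of TIMEOUT seconds.
check_interval() {
    interval=$(conf_interval "$1")
    if [ -z "$interval" ]; then
        echo "no interval setting found"
        return 1
    fi
    if [ "$interval" -lt "$2" ]; then
        echo "interval=$interval < timeout=$2: watchdog stays fed"
    else
        echo "interval=$interval >= timeout=$2: watchdog is expected to fire"
    fi
}
```

For the configuration used in this test, `check_interval /etc/watchdog.conf 60` should report that the watchdog is expected to fire, which is the intended outcome here.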

Refer to:
https://libvirt.org/formatdomain.html#watchdog-device
http://junyelee.blogspot.com/2021/07/linux-watchdog.html#:~:text=The%20watchdog%20is%20automatically%20started,its%20configuration%20file%20in%20turn.
https://mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt

From: https://blog.51cto.com/u_15288977/5994523
