One of the GPUs in our GPU server crashed:
May 16 05:38:58 dell kernel: [14244871.006970] NVRM: Xid (PCI:0000:b1:00): 13, pid=1375637, Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): Illegal Instruction Encoding
May 16 05:38:58 dell kernel: [14244871.010256] NVRM: Xid (PCI:0000:b1:00): 13, pid=1375637, Graphics Exception: ESR 0x504730=0x30009 0x504734=0x0 0x504728=0x4c1eb72 0x50472c=0x174
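To see how often this happens and which process triggers it, the Xid lines can be pulled out of the kernel log. The following is only a rough sketch: the function name `xid_summary` is made up for illustration, and it assumes the log lines have the same `NVRM: Xid (PCI:...): <code>, pid=<number>` layout as the two lines above (lines where the driver reports a non-numeric pid are skipped).

```shell
# xid_summary: hypothetical helper, not part of any NVIDIA tool.
# Reads kernel log lines on stdin and prints "<PCI address> xid=<code> pid=<pid>"
# for every line that matches the NVRM Xid layout shown above.
xid_summary() {
  sed -nE 's/.*NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): ([0-9]+), pid=([0-9]+).*/\1 xid=\2 pid=\3/p'
}

# Possible usage (reading the kernel log may require root):
#   dmesg | xid_summary | sort | uniq -c
```

Piping the result through `sort | uniq -c` counts repeated events per GPU and process, which helps tell a one-off fault from a recurring one.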
Xid 13 is a Graphics Engine Exception. My guess is that it was caused by the GPU overheating. One workaround I found:
sudo nvidia-smi -pl 150 # lower the power limit from the default 250 W to 150 W
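Since overheating is only a guess, it may be worth checking temperature and power draw before and after changing the limit. Below is a minimal sketch: `flag_hot_gpus` is a made-up helper name and the 85 C threshold is an arbitrary assumption, not a value from NVIDIA. It expects the CSV output that `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` produces.

```shell
# flag_hot_gpus: hypothetical helper; the default threshold of 85 C is an assumption.
# Input (stdin): CSV lines "index, temperature.gpu, power.draw, power.limit".
# Output: one line per GPU whose temperature is above the threshold.
flag_hot_gpus() {
  awk -F', *' -v max="${1:-85}" \
    '$2 + 0 > max + 0 { printf "GPU %s: %s C (power %s/%s W)\n", $1, $2, $3, $4 }'
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,temperature.gpu,power.draw,power.limit \
             --format=csv,noheader,nounits | flag_hot_gpus 85
fi
```

If no GPU is ever flagged under load, overheating becomes a less likely explanation and the offending process (the pid in the Xid line) deserves a closer look.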
Reference:
[Notes] GPU lost in nvidia-smi and GPU Fan showing ERR!
https://www.cnblogs.com/jins-note/p/11948927.html
=========================================
From: https://www.cnblogs.com/devilmaycry812839668/p/17409542.html