Ruijie Internet Behavior Management Disk Failure, Ruijie Hyperconverged Server Disk Failure
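Here is how I approached it. The fault first showed up as a red indicator light on disk 01 of one of the server's nodes. This hyperconverged server has four nodes in total, and each node can be logged in to and managed independently.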
Figure: server fault indicator light
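After logging in to the node, I found that osd.8 was down while every other OSD was up. Because the disks are built into the hyperconverged chassis and cannot be seen from outside, I could not tell which physical drive in the node had failed.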
[root@node-1 ~]# ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 21.76318 root default
-2  5.44080     host node-4
 2  1.81360         osd.2        up  1.00000          1.00000
 7  1.81360         osd.7        up  1.00000          1.00000
10  1.81360         osd.10       up  1.00000          1.00000
-3  5.44080     host node-2
 1  1.81360         osd.1        up  1.00000          1.00000
 5  1.81360         osd.5        up  1.00000          1.00000
11  1.81360         osd.11       up  1.00000          1.00000
-4  5.44080     host node-1
 3  1.81360         osd.3        up  1.00000          1.00000
 4  1.81360         osd.4        up  1.00000          1.00000
 8  1.81360         osd.8      down        0          1.00000
-5  5.44080     host node-3
 0  1.81360         osd.0        up  1.00000          1.00000
 6  1.81360         osd.6        up  1.00000          1.00000
 9  1.81360         osd.9        up  1.00000          1.00000
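On a cluster with more OSDs, picking the down entries out of the tree by eye gets tedious. The following is a minimal sketch of a filter for that, not a definitive tool. The function name `down_osds` is hypothetical, and it assumes the column layout shown in the output above (ID, WEIGHT, TYPE NAME, UP/DOWN, ...); newer Ceph releases add a CLASS column, which would shift the fields.

```shell
# Print the names of OSDs that `ceph osd tree` reports as down.
# Assumption: column 3 is the OSD name and column 4 is the up/down status,
# as in the output shown in this post. Adjust the field numbers if your
# Ceph release prints extra columns.
down_osds() {
    awk '$3 ~ /^osd\./ && $4 == "down" { print $3 }'
}

# Usage on a live node:
#   ceph osd tree | down_osds
```

Reading from standard input keeps the filter usable on saved output as well as on a live node.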
So the next step was to find the serial number of the failed disk behind osd.8.
I then ran lsblk, which showed:
sdc      8:32 0 1.8T 0 disk
└─sdc1   8:33 0 1.8T 0 part /var/lib/ceph/osd/ceph-8
This confirms that osd.8 is on sdc.
Query the sdc disk:
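The same lookup can be scripted. This is a rough sketch under two assumptions: the OSD data directory is mounted at /var/lib/ceph/osd/ceph-<id>, as in the lsblk output above, and the mount is still present even though the OSD is down. The function name `osd_partition` is hypothetical.

```shell
# Given an OSD id, print the partition mounted as that OSD's data directory.
# Expects `lsblk -rno NAME,MOUNTPOINT` output on standard input
# (raw format, no header, one "name mountpoint" pair per line).
# Assumption: OSD data is mounted at /var/lib/ceph/osd/ceph-<id>.
osd_partition() {
    osd_id="$1"
    awk -v mp="/var/lib/ceph/osd/ceph-${osd_id}" '$2 == mp { print "/dev/" $1 }'
}

# Usage on a live node:
#   lsblk -rno NAME,MOUNTPOINT | osd_partition 8
```

The mount point is compared for exact equality, so asking for OSD 8 does not accidentally match ceph-80.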
[root@node-1 ~]# smartctl -i /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.18.2.el7.x86_64] (local build)
Copyright (C) 2002-13 Bruce Allen Christian Franke www.SMARTmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST2000NM0045
Revision: N003
User Capacity: 2 000 398 934 016 bytes [2.00 TB]
Logical block size: 512 bytes
Logical block provisioning type unreported LBPME=0 LBPRZ=0
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c5009490e877
Serial number: ZC21FCNX0000R80458BB
Device type: disk
Transport protocol: SAS
Local Time is: Tue Feb 22 11:11:37 2022 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Confirm the disk model and serial number (the Product and Serial number fields in the output above).
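To record just those two fields, for example to match them against the label on the physical drive, a small filter is enough. This is a sketch only: the function name `disk_identity` is hypothetical, and it matches the field names printed for the SAS disk above ("Product" and "Serial number"). SATA disks typically report "Device Model" and "Serial Number" instead, so the pattern would need adjusting.

```shell
# Extract the model and serial number from `smartctl -i` output on stdin.
# Assumption: SCSI/SAS-style field names, as in the output shown in this post.
disk_identity() {
    awk -F': *' '/^(Product|Serial number):/ { print $1 "=" $2 }'
}

# Usage on a live node:
#   smartctl -i /dev/sdc | disk_identity
```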
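At this point the bad disk has been identified, but I don't dare touch anything without knowing exactly how to replace it. Can any expert offer some guidance? Follow and leave a comment to see how this gets resolved.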