[Logwatch-Devel] RAID errors

Paweł Gołaszewski blues at ds.pg.gda.pl
Fri Aug 6 00:38:37 MST 2004


On Thu, 5 Aug 2004, Kirk Bauer wrote:
> > I've got this report in my logs:
> >
> > Aug  4 00:23:49 raid1: Disk failure on sda5, disabling device.
> > Aug  4 00:23:49 md: (skipping faulty sda5 )
> > Aug  4 00:23:49 md3: no spare disk to reconstruct array! -- continuing in degraded mode
> >
> > The problem is that it fell outside every filter. It isn't matched anywhere
> > else, yet it is extremely important. What can we do about that?
> It's strange that it isn't reported as a 'kernel' message.

I know it's strange...

> The 'raid' filter seems to assume that.  Unfortunately, with messages
> related to 'raid*' and 'md*', that is a lot of "services" to match.

I think raid and md messages should be split out into a brand new filter...
This is a big, stand-alone facility, and an extremely important one...
Services such as mdadm should have their own filter too...
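A minimal sketch of how such a dedicated filter might pick out the kernel RAID lines by their "service" field (raid*, md*), assuming the lines arrive in the shape quoted above (date, time, service, message, with the hostname already removed). The function name `classify` and the list of critical phrases are hypothetical and were built from the three sample lines only. Logwatch's own filters are Perl scripts; Python is used here purely for illustration.

```python
import re

# Assumed line shape (as in the report above, hostname already removed):
#   "Aug  4 00:23:49 md3: no spare disk to reconstruct array! ..."
RAID_LINE = re.compile(
    r"^(?P<date>\w{3}\s+\d+\s+[\d:]{8})\s+"
    r"(?P<service>raid\d*|md\d*)\s*:\s*"
    r"(?P<msg>.*)$"
)

# Hypothetical phrases that should always be reported; taken from the
# three sample lines only, so this list is certainly not complete.
CRITICAL_PHRASES = ("disk failure", "skipping faulty", "no spare disk", "degraded mode")


def classify(line):
    """Return (service, message, is_critical) for raid*/md* lines, else None.

    Lines from "mdadm:" deliberately do not match, because the service
    must be "md" plus optional digits followed directly by a colon.
    """
    m = RAID_LINE.match(line.strip())
    if m is None:
        return None
    msg = m.group("msg")
    critical = any(phrase in msg.lower() for phrase in CRITICAL_PHRASES)
    return m.group("service"), msg, critical
```

Leaving mdadm out of this pattern fits the idea of giving mdadm its own filter.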

> I actually had thought about writing a filter to parse /proc/mdstat and
> report any errors.  But I have limited data to work with... here is what
> mine says:
> 
> # cat /proc/mdstat
> Personalities : [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 sdd1[2] sde1[1] sdc1[0]
>       97691008 blocks level 5, 32k chunk, algorithm 2 [3/3] [UUU]
> 
> unused devices: <none>
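Each redundant array in /proc/mdstat carries a status line ending in something like "[3/3] [UUU]": the number of devices the array should have, the number currently working, and one flag per slot ("U" up, "_" missing). A minimal sketch of reading that field, assuming the layout seen in the samples in this thread; `member_health` is a hypothetical name.

```python
import re

# "[total/working] [U_...]" at the end of an md status line.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def member_health(status_line):
    """Return (total, working, down_slots) for one md status line, or None.

    down_slots lists the 0-based positions printed as '_' in the [U_] map.
    Lines without a member map (e.g. raid0 "... blocks 32k chunks") give None.
    """
    m = STATUS.search(status_line)
    if m is None:
        return None
    total, working, bitmap = int(m.group(1)), int(m.group(2)), m.group(3)
    down = [i for i, flag in enumerate(bitmap) if flag == "_"]
    return total, working, down
```

For the line above, `member_health(...)` returns `(3, 3, [])`; for a degraded mirror such as "4265152 blocks [2/1] [U_]" it returns `(2, 1, [1])`. An array is degraded whenever working < total.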

If you want them, here are a few entries from my servers...

Personalities : [raid0] [raid1] 
md5 : active raid1 sda6[0] sdb6[1]
      47062272 blocks [2/2] [UU]
      
md4 : active raid0 sda5[0] sdb5[1]
      81915136 blocks 32k chunks
      
md3 : active raid1 hda3[0] hdb3[1]
      96213184 blocks [2/2] [UU]
      
md2 : active raid1 sda3[0] sdb3[1]
      20482752 blocks [2/2] [UU]
      
md1 : active raid1 hda1[0] hdb1[1]
      20482752 blocks [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1]
      8193024 blocks [2/2] [UU]
      
unused devices: <none>


Personalities : [linear] [raid0] [raid1] [raid5] 
md1 : active raid1 sdb2[1] sda2[0]
      3068288 blocks [2/2] [UU]
      
md2 : active raid1 sdb3[1] sda3[0]
      15358016 blocks [2/2] [UU]
      
md3 : active raid1 sdb5[1] sda5[0]
      13823808 blocks [2/2] [UU]
      
md4 : active raid1 sdb6[1] sda6[0]
      521984 blocks [2/2] [UU]
      
md0 : active raid1 sdb1[1] sda1[0]
      3068288 blocks [2/2] [UU]
      
unused devices: <none>


Personalities : [raid0] [raid1] 
read_ahead 1024 sectors
md0 : active raid1 sda1[1] sdb1[0]
      2562240 blocks [2/2] [UU]
      
md1 : active raid1 sdb2[0] sda2[1]
      1534080 blocks [2/2] [UU]
      
md2 : active raid1 sda3[1] sdb3[0]
      522048 blocks [2/2] [UU]
      
md3 : active raid1 sda5[1] sdb5[0]
      4265152 blocks [2/2] [UU]
      
unused devices: <none>



This one has one faulty drive:
Personalities : [raid0] [raid1] 
read_ahead 1024 sectors
md0 : active raid1 sda1[1] sdb1[0]
      2562240 blocks [2/2] [UU]
      
md1 : active raid1 sdb2[0] sda2[1]
      1534080 blocks [2/2] [UU]
      
md2 : active raid1 sda3[1] sdb3[0]
      522048 blocks [2/2] [UU]
      
md3 : active raid1 sdb5[0]
      4265152 blocks [2/1] [U_]
      
unused devices: <none>

...and the same machine before hot-removing the faulty drive:
Personalities : [raid0] [raid1] 
read_ahead 1024 sectors
md0 : active raid1 sda1[1] sdb1[0]
      2562240 blocks [2/2] [UU]
      
md1 : active raid1 sdb2[0] sda2[1]
      1534080 blocks [2/2] [UU]
      
md2 : active raid1 sda3[1] sdb3[0]
      522048 blocks [2/2] [UU]
      
md3 : active raid1 sda5[1](F) sdb5[0]
      4265152 blocks [2/1] [U_]
      
unused devices: <none>



And this is a hardware RAID with one faulty drive. I think it should be
reported too:
# ls -l /proc/rd/
total 0
dr-xr-xr-x    2 root     proc            0 08-06 09:34 c0
-r--r--r--    1 root     proc            0 08-06 09:34 status
# cat /proc/rd/status 
ALERT

c0 tells me that only one controller is present. ALERT shows that there is a
problem with something... Details?
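The top-level /proc/rd/status file gives a one-word summary, so a filter could check it first and only then look at the per-controller directories. A minimal sketch, assuming the driver prints "OK" when nothing needs attention (only "ALERT" appears in this post); the function names are hypothetical.

```python
from pathlib import Path


def status_is_alert(status_text):
    """Interpret the contents of /proc/rd/status.

    Assumption: the driver prints "OK" when nothing needs attention; any
    other value (this post shows "ALERT") is treated as a problem.
    """
    return status_text.strip() != "OK"


def check_dac960(root="/proc/rd"):
    """Return (alert, controller_dirs) for a DAC960 /proc tree.

    Returns (False, []) when <root>/status does not exist, i.e. no DAC960
    driver is loaded on this machine.
    """
    base = Path(root)
    status_file = base / "status"
    if not status_file.is_file():
        return False, []
    controllers = sorted(p.name for p in base.iterdir() if p.is_dir())  # e.g. ["c0"]
    return status_is_alert(status_file.read_text()), controllers
```

On the machine above, `check_dac960()` would be expected to return `(True, ["c0"])`.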
# ls -l /proc/rd/c0/
-r--r--r--    1 root     proc            0 08-06 09:35 current_status
-r--r--r--    1 root     proc            0 08-06 09:35 initial_status
-rw-------    1 root     proc            0 08-06 09:35 user_command
# cat /proc/rd/c0/current_status 
***** DAC960 RAID Driver Version 2.5.47 of 14 November 2002 *****
Copyright 1998-2001 by Leonard N. Zubkoff <lnz at dandelion.com>
Configuring Mylex AcceleRAID 352 PCI RAID Controller
  Firmware Version: 6.00-01, Channels: 2, Memory Size: 64MB
  PCI Bus: 2, Device: 10, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD360000 mapped at 0xF8809000, IRQ Channel: 20
  Controller Queue Depth: 512, Maximum Blocks per Command: 2048
  Driver Queue Depth: 511, Scatter/Gather Limit: 128 of 257 Segments
  Physical Devices:
    0:0  Vendor: IBM       Model: IC35L073UCDY10-0  Revision: S21E
         Wide Synchronous at 160 MB/sec
         Serial Number:         E6V12VYB
         Disk Status: Online, 143339520 blocks
    0:1  Vendor: IBM       Model: IC35L073UCDY10-0  Revision: S21E
         Wide Synchronous at 160 MB/sec
         Serial Number:         E6V12RDB
         Disk Status: Online, 143339520 blocks
    0:6  Vendor: ESG-SHV   Model: SCA HSBP M14      Revision: 0.03
         Asynchronous
    0:7  Vendor: MYLEX     Model: AcceleRAID 352    Revision: 0600
         Wide Synchronous at 160 MB/sec
         Serial Number:   
    1:0  Vendor:           Model:                   Revision:     
         Wide Synchronous at 160 MB/sec
         Disk Status: Dead, 71356416 blocks
    1:1  Vendor: IBM       Model: DDYS-T36950M      Revision: S96H
         Wide Synchronous at 160 MB/sec
         Serial Number:         DZLF6660
         Disk Status: Online, 71651328 blocks
    1:2  Vendor: IBM       Model: DDYS-T36950M      Revision: S96H
         Wide Synchronous at 160 MB/sec
         Serial Number:         DZLHG902
         Disk Status: Online, 71651328 blocks
    1:6  Vendor: ESG-SHV   Model: SCA HSBP M14      Revision: 0.03
         Asynchronous
    1:7  Vendor: MYLEX     Model: AcceleRAID 352    Revision: 0600
         Wide Synchronous at 160 MB/sec
         Serial Number:   
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 142712832 blocks
                  Logical Device Initialized, BIOS Geometry: 255/63
                  Stripe Size: 64KB, Segment Size: 8KB
                  Read Cache Disabled, Write Cache Disabled
    /dev/rd/c0d1: RAID-0, Online, 286679040 blocks
                  Logical Device Uninitialized, BIOS Geometry: 255/63
                  Stripe Size: 64KB, Segment Size: 8KB
                  Read Cache Disabled, Write Cache Disabled
  No Rebuild or Consistency Check in Progress
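The details are in two places in this dump: the "Disk Status:" line under each physical device (1:0 is "Dead") and the state field of each logical drive (/dev/rd/c0d0 is "Critical"). A minimal sketch that reports anything not "Online", assuming the layout shown above; the function name and message format are hypothetical, and treating every non-Online state as a problem is a simplification (a hot spare in some standby state would also be reported).

```python
import re

DEVICE = re.compile(r"^\s*(\d+:\d+)\s+Vendor:")                      # "    1:0  Vendor: ..."
DISK = re.compile(r"^\s*Disk Status:\s*(\w+),")                       # "Disk Status: Dead, ..."
LOGICAL = re.compile(r"^\s*(/dev/rd/c\d+d\d+):\s*([^,]+),\s*(\w+),")  # "/dev/rd/c0d0: RAID-5, Critical, ..."


def dac960_problems(text):
    """List physical devices and logical drives whose state is not Online."""
    problems = []
    device = None  # channel:target of the physical device being read
    for line in text.splitlines():
        m = DEVICE.match(line)
        if m:
            device = m.group(1)
            continue
        m = DISK.match(line)
        if m and m.group(1) != "Online":
            problems.append(f"physical device {device}: {m.group(1)}")
            continue
        m = LOGICAL.match(line)
        if m and m.group(3) != "Online":
            problems.append(f"logical drive {m.group(1)} ({m.group(2)}): {m.group(3)}")
    return problems
```

Fed the current_status above, this would report the dead disk on 1:0 and the critical RAID-5 drive /dev/rd/c0d0.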
# cat /proc/rd/c0/initial_status 
***** DAC960 RAID Driver Version 2.5.47 of 14 November 2002 *****
Copyright 1998-2001 by Leonard N. Zubkoff <lnz at dandelion.com>
Configuring Mylex AcceleRAID 352 PCI RAID Controller
  Firmware Version: 6.00-01, Channels: 2, Memory Size: 64MB
  PCI Bus: 2, Device: 10, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD360000 mapped at 0xF8809000, IRQ Channel: 20
  Controller Queue Depth: 512, Maximum Blocks per Command: 2048
  Driver Queue Depth: 511, Scatter/Gather Limit: 128 of 257 Segments
  Physical Devices:
    0:0  Vendor: IBM       Model: IC35L073UCDY10-0  Revision: S21E
         Wide Synchronous at 160 MB/sec
         Serial Number:         E6V12VYB
         Disk Status: Online, 143339520 blocks
    0:1  Vendor: IBM       Model: IC35L073UCDY10-0  Revision: S21E
         Wide Synchronous at 160 MB/sec
         Serial Number:         E6V12RDB
         Disk Status: Online, 143339520 blocks
    0:6  Vendor: ESG-SHV   Model: SCA HSBP M14      Revision: 0.03
         Asynchronous
    0:7  Vendor: MYLEX     Model: AcceleRAID 352    Revision: 0600
         Wide Synchronous at 160 MB/sec
         Serial Number:   
    1:0  Vendor:           Model:                   Revision:     
         Wide Synchronous at 160 MB/sec
         Disk Status: Dead, 71356416 blocks
    1:1  Vendor: IBM       Model: DDYS-T36950M      Revision: S96H
         Wide Synchronous at 160 MB/sec
         Serial Number:         DZLF6660
         Disk Status: Online, 71651328 blocks
    1:2  Vendor: IBM       Model: DDYS-T36950M      Revision: S96H
         Wide Synchronous at 160 MB/sec
         Serial Number:         DZLHG902
         Disk Status: Online, 71651328 blocks
    1:6  Vendor: ESG-SHV   Model: SCA HSBP M14      Revision: 0.03
         Asynchronous
    1:7  Vendor: MYLEX     Model: AcceleRAID 352    Revision: 0600
         Wide Synchronous at 160 MB/sec
         Serial Number:   
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 142712832 blocks
                  Logical Device Initialized, BIOS Geometry: 255/63
                  Stripe Size: 64KB, Segment Size: 8KB
                  Read Cache Disabled, Write Cache Disabled
    /dev/rd/c0d1: RAID-0, Online, 286679040 blocks
                  Logical Device Uninitialized, BIOS Geometry: 255/63
                  Stripe Size: 64KB, Segment Size: 8KB
                  Read Cache Disabled, Write Cache Disabled
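Apart from the trailing "No Rebuild or Consistency Check in Progress" line, initial_status is identical to current_status here, and both already show 1:0 as Dead, which suggests the disk had failed before the driver started. A filter could compare the two files to tell "broken since boot" from "broke since boot". A minimal sketch of that comparison, assuming the layout above; the function names are hypothetical.

```python
import re

DEVICE = re.compile(r"^\s*(\d+:\d+)\s+Vendor:")
DISK = re.compile(r"^\s*Disk Status:\s*(\w+),")


def disk_states(text):
    """Map channel:target -> disk state from a DAC960 status dump."""
    states, device = {}, None
    for line in text.splitlines():
        m = DEVICE.match(line)
        if m:
            device = m.group(1)
            continue
        m = DISK.match(line)
        if m and device is not None:
            states[device] = m.group(1)
    return states


def state_changes(initial_text, current_text):
    """List (device, initial_state, current_state) for disks whose state differs."""
    before, after = disk_states(initial_text), disk_states(current_text)
    return [
        (dev, before.get(dev), after.get(dev))
        for dev in sorted(set(before) | set(after))
        if before.get(dev) != after.get(dev)
    ]
```

For the two dumps in this post the list of changes would be empty, so the dead disk would have to be reported from current_status alone.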

user_command is for management.

-- 
Regards, Paweł Gołaszewski
---------------------------------
We are the Borg. Resistance is futile, you will be assimilated...

