Notes on a dead HDD (/dev/hda), IDE
Detection by Logwatch
--------------------- Kernel Begin ------------------------

WARNING:  Kernel Errors Present
    EXT3-fs error (device ide0(3,7 ...:  384 Time(s)
    end_request: I/O error, dev 03:07 (hda) ...:  280 Time(s)
    hda: dma_intr: error=0x40 { Uncorrect ...:  280 Time(s)
    hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } ...:  280 Time(s)

---------------------- Kernel End -------------------------
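The register values in the kernel messages above are bit fields, and the words in braces are simply the names of the bits that are set. The following is a minimal sketch of that decoding, not the kernel's own code. The status names are the ones the old Linux IDE driver prints as far as I recall; the error-register names are the legacy ATA abbreviations and are an assumption on my part. `decode_bits` is a hypothetical helper name.

```python
# Sketch: decode the ATA status and error register bytes seen in the log.
# Bit names are assumptions (legacy ATA / old Linux IDE driver naming).

STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

ERROR_BITS = {
    0x80: "BBK/ICRC",  # bad block, or interface CRC error on newer drives
    0x40: "UNC",       # uncorrectable data error
    0x20: "MC",        # media changed
    0x10: "IDNF",      # sector ID not found
    0x08: "MCR",       # media change requested
    0x04: "ABRT",      # command aborted
    0x02: "TK0NF",     # track 0 not found
    0x01: "AMNF",      # address mark not found
}


def decode_bits(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


if __name__ == "__main__":
    print(decode_bits(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
    print(decode_bits(0x40, ERROR_BITS))   # ['UNC']
```

Read this way, status=0x51 with error=0x40 means the drive was ready but reported an uncorrectable read error, which agrees with the `UNC` entries in the SMART error log.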
Mail sent to root by smartmontools
Subject: SMART error (OfflineUncorrectableSector) detected on host: example.jp

This email was generated by the smartd daemon running on:

   host name: example.jp
  DNS domain: my.domain
  NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/hda, 1 Offline uncorrectable sectors

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
smartmontools Health check error
Subject: SMART error (Health) detected on host: example.jp

This email was generated by the smartd daemon running on:

   host name: example.jp
  DNS domain: my.domain
  NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/hda, FAILED SMART self-check. BACK UP DATA NOW!

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
Last SMART values before removing the drive (after fsck)
# /usr/sbin/smartctl -a /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST3120026AS
Serial Number:    3JT46EGC
Firmware Version: 3.18
User Capacity:    120,034,123,776 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Wed Jul 23 06:15:37 2008 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   054   050   006    Pre-fail  Always       -       153397814
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       449964338
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18942
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   044   053   000    Old_age   Always       -       44
195 Hardware_ECC_Recovered  0x001a   054   050   000    Old_age   Always       -       153397814
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       127
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       127
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 2006 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2006 occurred at disk power-on lifetime: 18941 hours (789 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 c2 32 ff e0  Error: UNC at LBA = 0x00ff32c2 = 16724674

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 02 c2 32 ff e0 00      21:20:57.513  READ SECTOR(S) EXT
  24 00 18 88 33 ff e0 00      21:20:57.510  READ SECTOR(S) EXT
  24 00 80 08 33 ff e0 00      21:20:57.506  READ SECTOR(S) EXT
  24 00 02 c2 32 ff e0 00      21:20:57.502  READ SECTOR(S) EXT
  24 00 44 c4 32 ff e0 00      21:20:57.499  READ SECTOR(S) EXT

Error 2005 occurred at disk power-on lifetime: 18941 hours (789 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 c2 32 ff e0  Error: UNC at LBA = 0x00ff32c2 = 16724674

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 02 c2 32 ff e0 00      21:20:57.513  READ SECTOR(S) EXT
  24 00 44 c4 32 ff e0 00      21:20:57.510  READ SECTOR(S) EXT
  24 00 50 b8 32 ff e0 00      21:20:57.506  READ SECTOR(S) EXT
  24 00 18 a0 58 fb e0 00      21:20:57.502  READ SECTOR(S) EXT
  24 00 18 b0 57 fb e0 00      21:20:57.499  READ SECTOR(S) EXT

Error 2004 occurred at disk power-on lifetime: 18941 hours (789 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0a c2 32 ff e0  Error: UNC at LBA = 0x00ff32c2 = 16724674

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 50 b8 32 ff e0 00      21:20:57.513  READ SECTOR(S) EXT
  24 00 18 a0 58 fb e0 00      21:20:57.510  READ SECTOR(S) EXT
  24 00 18 b0 57 fb e0 00      21:20:57.506  READ SECTOR(S) EXT
  24 00 18 c0 56 fb e0 00      21:20:57.502  READ SECTOR(S) EXT
  24 00 18 d8 55 fb e0 00      21:20:57.499  READ SECTOR(S) EXT

Error 2003 occurred at disk power-on lifetime: 18941 hours (789 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 c6 32 f7 e0  Error: UNC at LBA = 0x00f732c6 = 16200390

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 02 c6 32 f7 e0 00      21:20:12.376  READ SECTOR(S) EXT
  24 00 02 c2 32 f7 e0 00      21:20:52.262  READ SECTOR(S) EXT
  24 00 04 c0 32 f7 e0 00      21:20:48.435  READ SECTOR(S) EXT
  24 00 18 88 33 f7 e0 00      21:20:41.941  READ SECTOR(S) EXT
  24 00 80 08 33 f7 e0 00      21:20:41.937  READ SECTOR(S) EXT

Error 2002 occurred at disk power-on lifetime: 18941 hours (789 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 c2 32 f7 e0  Error: UNC at LBA = 0x00f732c2 = 16200386

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  24 00 02 c2 32 f7 e0 00      21:20:12.376  READ SECTOR(S) EXT
  24 00 04 c0 32 f7 e0 00      21:20:12.361  READ SECTOR(S) EXT
  24 00 18 88 33 f7 e0 00      21:20:48.435  READ SECTOR(S) EXT
  24 00 80 08 33 f7 e0 00      21:20:41.941  READ SECTOR(S) EXT
  34 00 08 c0 32 bb e0 00      21:20:41.937  WRITE SECTORS(S) EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     18938         188166888
# 2  Short offline       Completed: read failure       90%     18914         188166888
# 3  Short offline       Completed: read failure       90%     18891         188166888
# 4  Short offline       Completed: read failure       90%     18867         188166888
# 5  Extended offline    Completed: read failure       90%     18866         261165
# 6  Short offline       Completed without error       00%     18843         -
# 7  Short offline       Completed without error       00%     18820         -
# 8  Short offline       Completed without error       00%     18796         -
# 9  Short offline       Completed without error       00%     18773         -
#10  Short offline       Completed without error       00%     18749         -
#11  Short offline       Completed without error       00%     18726         -
#12  Short offline       Completed without error       00%     18702         -
#13  Extended offline    Completed without error       00%     18702         -
#14  Short offline       Completed without error       00%     18678         -
#15  Short offline       Completed without error       00%     18655         -
#16  Short offline       Completed without error       00%     18632         -
#17  Short offline       Completed without error       00%     18608         -
#18  Short offline       Completed without error       00%     18584         -
#19  Short offline       Completed without error       00%     18561         -
#20  Short offline       Completed without error       00%     18537         -
#21  Extended offline    Completed without error       00%     18537         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
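Several numbers in the smartctl output above can be cross-checked by hand. The sketch below reproduces three of them: the LBA printed for each error from the SN/CL/CH/DH register bytes, the "789 days + 5 hours" breakdown of the power-on lifetime, and the split of the self-test execution status byte (121) into a status code and a remaining percentage. It is only an illustration of the arithmetic, not smartctl's own code, and the function names are hypothetical. The commands in the log are 48-bit (`READ SECTOR(S) EXT`), but only these registers are shown, so the sketch covers the low 28 bits that the log itself prints. The nibble layout of the status byte is my understanding of the ATA SMART data structure.

```python
# Sketch: re-derive a few values from the smartctl report.
# Function names are hypothetical; this is not smartctl's implementation.


def lba28_from_registers(sn, cl, ch, dh):
    """Combine SN, CL, CH and the low nibble of DH into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def split_hours(hours):
    """Split power-on hours into (days, remaining hours)."""
    return divmod(hours, 24)


def split_self_test_status(status_byte):
    """Assumed layout: high nibble = status code, low nibble = tenths remaining."""
    return status_byte >> 4, (status_byte & 0x0F) * 10


if __name__ == "__main__":
    lba = lba28_from_registers(0xC2, 0x32, 0xFF, 0xE0)
    print(hex(lba), lba)                # 0xff32c2 16724674
    print(split_hours(18941))           # (789, 5)
    print(split_self_test_status(121))  # (7, 90)
```

The last result lines up with the self-test log: status code 7 corresponds to the "read element of the test failed" message, and 90 matches the 90% remaining shown for the failed tests.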
Backups really do matter.