02-15-2018 02:10 AM
Dear,
We have installed dozens of DC S4500 SSDs in our servers. Several of them trigger SMART attribute alerts after a warm-up stress test.
Here is the output from the Intel Data Center Tool on Linux:
$> sudo isdct show -smart --intelssd 2
**********Some lines are skipped*************
- C5 -
Action : Pass
Description : Pending Sector Count
ID : C5
Normalized : 100
Raw : 12
Status : 18
Threshold : 0
Worst : 100
**********Some lines are skipped*************
Does the raw value of SMART attribute C5 indicate a health problem with the SSD? If so, will the file systems residing on those SSDs be corrupted when these sectors are hit?
Thanks a lot!
Below is the firmware version:
$> sudo isdct show -intelssd 2
- Intel SSD DC S4500 Series PHYS738000YP480BGN -
Bootloader : Property not found
DevicePath : /dev/sg2
DeviceStatus : Healthy
Firmware : SCV10100
FirmwareUpdateAvailable : SCV10111
Index : 2
ModelNumber : INTEL SSDSC2KB480G7
ProductFamily : Intel SSD DC S4500 Series
SerialNumber : PHYS738000YP480BGN
02-15-2018 11:00 AM
Hello vanbashan,
Thank you for your interest in the Intel® SSD DC S4500 Series. According to the Intel® Solid State Drive Data Center for SATA SMART Attributes – Application Note (https://www.intel.com/content/dam/support/us/en/documents/solid-state-drives/Intel_SSD_Smart_Attrib_...), attribute C5 (Pending Sector Count) shows the number of current unrecoverable read errors that will be reallocated on the next write.
Regarding the properties that you shared, please note that unless the Raw property of the device tends toward the Worst property (in this case 100) at an abnormal pace, this can be considered expected degradation of the SSD. To fully define what "abnormal pace" means, details such as workload and system configuration must be taken into account. Your DeviceStatus appears as Healthy, and it is your best reference regarding the current state of the drive.
I'm not sure what you mean by "is file system resides on those SSDs are corrupted when these sectors are hit?" Could you please rephrase your question?
If you have further questions, please update your firmware to version SCV10111, run the test again, and share the output from the Intel® SSD Data Center Tool version 3.0.10 (https://downloadcenter.intel.com/download/27497?v=t).
Also, could you please provide more details regarding the tasks involved in the warm-up stress test? What percentage of SSDs have triggered the SMART attribute alert?
I'll be waiting for your response.
Regards,
Andres V.
02-15-2018 06:41 PM
Hi, Andres,
Thanks for your quick reply.
The warm-up test simply fills the whole drive once, then reads the data back and verifies its checksum with FIO.
We observed pending sector counts above 0 on 25% of the drives (5 out of 20). In addition, the raw values of the reallocated sector count (0x5) are above 0.
"I'm not sure what you mean by "is file system resides on those SSDs are corrupted when these sectors are hit?" Could you please rephrase your question?"
If some sectors are unrecoverable on read, there should be data loss. Is my understanding correct?
However, I have not encountered any application errors so far, and these drives passed the smartctl extended self-test without any errors. Is it a firmware bug?
I'd like to try upgrading the firmware to see whether that solves the issue. Is there anywhere I can check the release notes for firmware SCV10111 before upgrading?
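For anyone checking a larger batch of drives the same way, here is a rough Python sketch of how the attribute blocks in the `isdct show -smart` output could be scanned for non-zero raw values of 05 and C5. It assumes the text layout matches the excerpt posted earlier in this thread (a `- C5 -` header line followed by `Key : Value` lines); the function names and the `SAMPLE` text are made up for illustration, and the real tool output may differ between versions.

```python
import re


def parse_smart_attributes(text):
    """Parse attribute blocks of the form '- C5 -' followed by 'Key : Value' lines.

    The layout is assumed from the excerpt in this thread, not from any
    official format description.
    """
    attrs = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        # A block header is one or two hex digits between dashes, e.g. "- C5 -".
        header = re.fullmatch(r"-\s*([0-9A-Fa-f]{1,2})\s*-", line)
        if header:
            current = header.group(1).upper().zfill(2)
            attrs[current] = {}
            continue
        if current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            attrs[current][key.strip()] = value.strip()
    return attrs


def nonzero_raw(attrs, watched=("05", "C5")):
    """Return {attribute_id: raw_value} for watched attributes with Raw > 0."""
    flagged = {}
    for attr_id in watched:
        raw = attrs.get(attr_id, {}).get("Raw")
        if raw is not None and raw.isdigit() and int(raw) > 0:
            flagged[attr_id] = int(raw)
    return flagged


# Sample text copied from the C5 block shown earlier in the thread.
SAMPLE = """\
- C5 -
Action : Pass
Description : Pending Sector Count
ID : C5
Normalized : 100
Raw : 12
Status : 18
Threshold : 0
Worst : 100
"""

if __name__ == "__main__":
    print(nonzero_raw(parse_smart_attributes(SAMPLE)))
```

In practice the text would come from capturing the tool's output per drive index rather than from a hard-coded string. A non-zero raw value here only flags a drive for a closer look; it does not by itself say whether the drive is failing.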
02-16-2018 08:51 AM
Hello vanbashan,
Thank you for the information provided. Loss of data due to unrecoverable sectors is always a possibility; for this reason, frequent data backups and RAID configurations are always recommended.
Regarding your inquiry about the firmware release notes, the available information on firmware version SCV10111 can be found in the latest Intel® Solid State Drive Data Center Tool – Release Notes (https://downloadmirror.intel.com/27497/eng/Intel_SSD_Data_Center_Tool_3_0_10_Release_Notes_330715-02...). The following changes are included in this firmware update:
Regards,
Andres V.
02-16-2018 06:40 PM
Some disks with a pending sector count (C5) above zero have failed without any response. I have to return them to the resellers... 😞