Solidigm

GSchö · ‎02-10-2015

Hi Intel Team,

I am currently dealing with an Intel DC S3500 240GB, to which ~213TB have been written.

According to the specification TBW is 140TB, so the SSD has tolerated much more data than specified.

I have taken a look at the SMART attributes to check whether smartd should have notified me:

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   100   100   000    -    3379
170 Available_Reservd_Space PO--CK   100   100   010    -    0
233 Media_Wearout_Indicator -O--CK   001   001   000    -    0
241 Host_Writes_32MiB       -O--CK   100   100   000    -    7010161
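As a cross-check, the ~213TB figure can be reproduced from the raw Host_Writes_32MiB value, assuming (as the attribute name suggests) that each raw count stands for 32 MiB written by the host. A minimal Python sketch; the function name is only illustrative:

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

def host_writes_bytes(raw_value, unit_mib=32):
    """Convert the raw value of attribute 241 into bytes.

    Assumption: one raw count equals 32 MiB of host writes.
    """
    return raw_value * unit_mib * MIB

written = host_writes_bytes(7010161)
print(f"{written / TIB:.1f} TiB")   # binary units
print(f"{written / 1e12:.1f} TB")   # decimal units
```

This works out to about 213.9 TiB (roughly 235 TB in decimal units), well above the 140TB rating.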

Unfortunately the media wearout indicator, which is at 1%, has no threshold and no pre-fail flag set (maybe setting a threshold would be an improvement to the firmware). What puzzles me is that available reserved space is still at 100%; I thought this would also go down as the SSD wears out?

Just to note, the SSD is over-provisioned to 100GB.

My questions:

* Is over-provisioning the reason that available reserved space did not decrease? My understanding was that this SMART attribute goes down along with media wearout as more and more data is written to the SSD.

* Until now I have used the SMART attributes to monitor the SSDs. Is there any other way to monitor SSDs and get notifications when an SSD's lifetime is running out? (I know that there's the Intel toolbox, but I would need it mainly for Linux and, under some circumstances, together with RAID controllers.)

Thanks a lot for your help,

all the best Georg

jbenavides · ‎02-10-2015

Hello Gschoenberger,

The Endurance Rating (140 TBW) is calculated using global testing standards; however, Intel® SSDs are expected to exceed those values in almost all cases.

The Media Wearout Indicator is a more realistic indicator of the wear of the chips. It declines linearly from 100 to 1 depending on the number of cycles the NAND media has undergone. If the normalized value reaches 1, it means that the average erase cycle count has reached the maximum rated cycles. However, it is likely that significant additional wear can be put on the drive, as in this case.
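The linear decline described here could be modelled roughly as follows. This is only a sketch: the exact formula used by the firmware is not given in this thread, and the rounding, the function name and the 3000-cycle rating in the example are assumptions made for illustration.

```python
def wearout_indicator(avg_erase_cycles, rated_cycles):
    """Map average erase cycles onto a normalized value from 100 (new) down to 1.

    Assumed model, not the documented firmware behaviour.
    """
    if rated_cycles <= 0:
        raise ValueError("rated_cycles must be positive")
    used_fraction = avg_erase_cycles / rated_cycles
    value = 100 - int(used_fraction * 99)   # 100 at 0 cycles, 1 at the rated cycles
    return max(1, value)                     # stays at 1 once the rating is exceeded

# Hypothetical drive rated for 3000 cycles:
for cycles in (0, 1500, 3000, 4500):
    print(cycles, wearout_indicator(cycles, 3000))
```

Under this model the value stays at 1 once the rated cycle count is passed, which would be consistent with a drive that keeps working with the indicator at 1.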

The SMART counters for the previous attributes are based on writes and expected endurance; however, if the drive is healthy and free of errors, it should work well beyond those thresholds.

The Available Reserved Space reports the number of reserved blocks remaining; this is related to over provisioning. This value will decrease if the reserved space is used. In this case, the SSD still shows 100 percent availability of the reserved space.

There are some SMART attributes that are direct indicators of possible issues in the SSD components or functionality. These vary for each SSD series. For the Intel® SSD DC S3500, we advise contacting the Support Center (http://www.intel.com/p/en_US/support/contactsupport) if you notice a continuous increase in the following attributes: Re-allocated Sector Count, Program Fail Count, Erase Fail Count, Power Loss Protection Count, End-to-End Error Detection Count, Uncorrectable Error Count, CRC Error Count.
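Watching for an increase in these counters can be scripted by comparing two snapshots of their raw values. A minimal Python sketch, assuming the snapshots are already available as dictionaries; the dictionary keys simply reuse the names from the list above and need not match what a given tool prints, and the sample values are made up:

```python
WATCHED = (
    "Re-allocated Sector Count",
    "Program Fail Count",
    "Erase Fail Count",
    "Power Loss Protection Count",
    "End-to-End Error Detection Count",
    "Uncorrectable Error Count",
    "CRC Error Count",
)

def increased_attributes(previous, current, watched=WATCHED):
    """Return {name: (old, new)} for watched counters whose raw value went up."""
    changes = {}
    for name in watched:
        if name in previous and name in current and current[name] > previous[name]:
            changes[name] = (previous[name], current[name])
    return changes

yesterday = {"Re-allocated Sector Count": 0, "CRC Error Count": 2}  # made-up values
today = {"Re-allocated Sector Count": 3, "CRC Error Count": 2}      # made-up values
print(increased_attributes(yesterday, today))
```

A single comparison only shows one step; to spot a continuous increase, the same check would have to be repeated over several snapshots.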

GSchö · ‎02-11-2015

THX for the quick reply and the explanation of the SMART attributes!

One question is still left:

* Is there a way/program to monitor the SMART attributes and get notifications on errors/wear out?

To give you an example, I have written a Nagios/Icinga plugin to monitor SMART attributes:

* https://git.thomas-krenn.com/dev/?p=check_smart_attributes.git;a=summary

* https://www.thomas-krenn.com/en/wiki/SMART_Attributes_Monitoring_Plugin

The plugin uses its own JSON database to interpret the SMART attributes and returns WARNING/CRITICAL if the values of some attributes reach a certain threshold. I have used your SSD specifications to interpret the SMART attributes correctly.

Obviously this is a custom solution for Nagios/Icinga. Is there a standalone daemon/program to monitor Intel SSDs?

All the best, Georg

jbenavides · ‎02-11-2015

We have two software applications that can be used to monitor and manage the Intel® SSD DC S3500. These are available from the Intel® Download Center for the Intel® SSD DC S3500 (https://downloadcenter.intel.com/SearchResult.aspx?lang=eng&ProdId=3770).

- Intel® Solid-State Drive Data Center Tool: a CLI application available for Windows* and Linux*. It is a drive management tool for the Intel® Solid-State Drive Data Center Family of products and works with both Intel SATA and Intel PCIe* drives.

- Intel® Solid-State Drive Toolbox: a Windows* application mainly used for mainstream consumer SSDs, which also works with the Intel® SSD DC S3500 and S3700 Series.

These tools are not designed to run or notify automatically when a threshold is reached or exceeded. You would have to monitor the SMART attributes of the Solid-State Drives manually, or use third-party software to add them to your system routines.
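One way to do this with third-party tooling is to parse the attribute table that smartmontools prints (for example from `smartctl -A -f brief /dev/sdX`, the format shown in the first post) and apply your own limits. A minimal Python sketch; the function names and the warning level of 10 for the wearout indicator are assumptions, not Intel recommendations:

```python
def parse_brief_table(text):
    """Parse 'smartctl -A -f brief' style rows into {id: attribute dict}."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 8 columns and start with a numeric ID;
        # the header row and any other output are skipped.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        attributes[int(fields[0])] = {
            "name": fields[1],
            "flags": fields[2],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[7],  # first token of the raw value only
        }
    return attributes

def wear_warnings(attributes, wearout_min=10):
    """Return messages for worn media or attributes at or below their threshold."""
    messages = []
    for attr_id, attr in attributes.items():
        if attr["thresh"] > 0 and attr["value"] <= attr["thresh"]:
            messages.append(f"{attr['name']} at or below threshold")
        # 233 is Media_Wearout_Indicator; it has no threshold set by the drive,
        # so a self-chosen limit (wearout_min, an assumption) is applied here.
        if attr_id == 233 and attr["value"] <= wearout_min:
            messages.append(f"{attr['name']} down to {attr['value']}")
    return messages

SAMPLE = """\
  9 Power_On_Hours          -O--CK 100 100 000 - 3379
170 Available_Reservd_Space PO--CK 100 100 010 - 0
233 Media_Wearout_Indicator -O--CK 001 001 000 - 0
241 Host_Writes_32MiB       -O--CK 100 100 000 - 7010161
"""
print(wear_warnings(parse_brief_table(SAMPLE)))
```

With the values from the first post, only the wearout check fires, since Media_Wearout_Indicator has no threshold of its own. The resulting messages could then be passed to whatever notification mechanism is already in use.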

The report will indicate whether any of the attributes is abnormal and requires attention. When this happens, you should contact Intel Customer Support (http://www.intel.com/p/en_US/support) to determine whether further action is required.

Intel DC S3500 SMART Attributes if TBW exceeded