07-31-2015 04:07 AM
Hi,
I have a new Linux machine with two DC S3610 1.6TB SSDs. It's Debian jessie, so kernel 3.16. About a month after installation, these errors started appearing:
Jul 30 16:30:59 snaps kernel: [186914.249429] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
Jul 30 16:30:59 snaps kernel: [186914.250465] ata1.00: failed command: WRITE FPDMA QUEUED
Jul 30 16:30:59 snaps kernel: [186914.251505] ata1.00: cmd 61/08:00:39:db:8e/00:00:09:00:00/40 tag 0 ncq 4096 out
Jul 30 16:30:59 snaps kernel: [186914.251505] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 30 16:30:59 snaps kernel: [186914.253613] ata1.00: status: { DRDY }
Jul 30 16:30:59 snaps kernel: [186914.254781] ata1.00: failed command: WRITE FPDMA QUEUED
Jul 30 16:30:59 snaps kernel: [186914.255810] ata1.00: cmd 61/08:08:71:fc:4e/00:00:66:00:00/40 tag 1 ncq 4096 out
Jul 30 16:30:59 snaps kernel: [186914.255810] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 30 16:30:59 snaps kernel: [186914.257940] ata1.00: status: { DRDY }
Jul 30 16:30:59 snaps kernel: [186914.259086] ata1: hard resetting link
Jul 30 16:31:00 snaps kernel: [186914.577366] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 30 16:31:00 snaps kernel: [186914.578307] ata1.00: configured for UDMA/133
Jul 30 16:31:00 snaps kernel: [186914.578310] ata1.00: device reported invalid CHS sector 0
Jul 30 16:31:00 snaps kernel: [186914.578311] ata1.00: device reported invalid CHS sector 0
Jul 30 16:31:00 snaps kernel: [186914.578316] ata1: EH complete
The error is always the same, and the only thing on ata1.00 is one of the SSDs. I switched the two SSDs around and the problem followed the same SSD.
I can't force the error to happen on demand; it just seems to happen every other day or so, though not at the same time of day. All IO is held up briefly while the link is reset. The drive passes a SMART long self-test.
So is this drive faulty? If not, what can I try to fix this? If so, is there an easy way to prove it for RMA purposes?
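One way to document how often this happens (for an RMA request, say) is to count the relevant kernel log lines per ATA port. The following is only a minimal sketch in Python: the function name `summarize_ata_errors` is made up for illustration, and the patterns are based solely on the log format pasted above, so they may need adjusting for other kernels or syslog layouts.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   "... kernel: [186914.250465] ata1.00: failed command: WRITE FPDMA QUEUED"
FAILED_CMD = re.compile(r"\b(ata\d+(?:\.\d+)?): failed command: (.+?)\s*$")
# Matches kernel log lines such as:
#   "... kernel: [186914.259086] ata1: hard resetting link"
LINK_RESET = re.compile(r"\b(ata\d+): hard resetting link\s*$")


def summarize_ata_errors(lines):
    """Count failed commands per (device, command) and link resets per port.

    `lines` is any iterable of log lines, e.g. an open file object.
    Returns two Counters: (failed_commands, link_resets).
    """
    failed = Counter()
    resets = Counter()
    for line in lines:
        m = FAILED_CMD.search(line)
        if m:
            failed[(m.group(1), m.group(2))] += 1
            continue
        m = LINK_RESET.search(line)
        if m:
            resets[m.group(1)] += 1
    return failed, resets
```

It could be fed an open log file, for example `summarize_ata_errors(open("/var/log/kern.log"))`; that path is an assumption and depends on how syslog is configured on the machine.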
Jul 27 05:59:30 snaps kernel: [ 33.054376] ata1.00: ATA-9: INTEL SSDSC2BX016T4, G2010110, max UDMA/133
Jul 27 05:59:30 snaps kernel: [ 33.054474] ata1.00: 3125627568 sectors, multi 1: LBA48 NCQ (depth 31/32)
Jul 27 05:59:30 snaps kernel: [ 33.054567] ata2.00: ATA-9: INTEL SSDSC2BX016T4, G2010110, max UDMA/133
Jul 27 05:59:30 snaps kernel: [ 33.054657] ata2.00: 3125627568 sectors, multi 1: LBA48 NCQ (depth 31/32)
$ sudo smartctl -i /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BX016T4
Serial Number: BTHC511604V41P6PGN
LU WWN Device Id: 5 5cd2e4 04b7b1bfa
Firmware Version: G2010110
User Capacity: 1,600,321,314,816 bytes [1.60 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 31 11:04:09 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
$ sudo smartctl -i /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BX016T4
Serial Number: BTHC511604SD1P6PGN
LU WWN Device Id: 5 5cd2e4 04b7b1ba2
Firmware Version: G2010110
User Capacity: 1,600,321,314,816 bytes [1.60 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 31 11:04:35 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Message was edited by: Andy Smith. Now seeing the same problems with the other SSD, so this is not restricted to a single drive.
09-17-2015 12:37 AM
Our 5 machines with S3610 SSDs have now been running for over two weeks with the new firmware (rev G2010140), and no re-occurrence of this issue during that time.
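For anyone checking several machines, the firmware revision can be read out of `smartctl -i` output like the one posted earlier in this thread. Below is a rough Python sketch under a few assumptions: the helper names (`parse_smartctl_info`, `firmware_status`) are invented for illustration, and the "affected" and "fixed" revisions are only what posters in this thread have reported, not an official Intel list.

```python
import re

# Firmware revisions as reported by posters in this thread.
# This is NOT an official Intel compatibility list.
REPORTED_AFFECTED = {"G2010110"}
REPORTED_FIXED = {"G2010140"}

# Picks out a few "Key: value" fields from `smartctl -i` text output.
FIELD = re.compile(r"^(Device Model|Serial Number|Firmware Version):\s*(.+?)\s*$")


def parse_smartctl_info(text):
    """Return a dict with the model, serial and firmware fields that were found."""
    info = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            info[m.group(1)] = m.group(2)
    return info


def firmware_status(info):
    """Classify a drive as 'affected', 'fixed' or 'unknown' by firmware string."""
    fw = info.get("Firmware Version")
    if fw in REPORTED_AFFECTED:
        return "affected"
    if fw in REPORTED_FIXED:
        return "fixed"
    return "unknown"
```

The text passed to `parse_smartctl_info` would be whatever `sudo smartctl -i /dev/sdX` prints for each drive; anything other than the two revisions named in this thread comes back as "unknown" rather than being guessed at.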
Regards,
Daniel
09-18-2015 03:40 PM
We are glad to know that the firmware update helped to resolve this condition. We would like to thank you for your patience and feedback on this matter.
11-12-2015 07:40 AM
Hi Jonathan,
I am currently also dealing with problems concerning the task aborts and DC S3710 SSDs. I have updated to G2010140 today and will inspect the system tomorrow.
Just one question - why is there no official information from Intel (e.g. release notes) about these problems?
THX a lot, cheers Georg
11-12-2015 11:51 AM
Hello,
The details about this fix were not added to the release notes for firmware revision G2010140. We appreciate your feedback and would like to apologize for this situation.
The original issue mentioned in this thread was quickly solved with a fix added to the new firmware just before its release, so it was not documented in the Intel® SSD Firmware Release Notes (https://downloadmirror.intel.com/18363/eng/Intel_SSD_Firmware_Update_Tool_2.1.0_Release_Notes_328292...).
11-12-2015 11:22 PM
OK good to know, even if I think release notes are the best place to mention such issues.
Now do you have any other Intel sources (blog posting, wiki article, white paper) where you go deeper into the firmware update and the task abort issues?
Just FYI, we need some kind of authentic Intel source for our customers to prove that a firmware update is necessary.
Cheers, Georg