01-12-2017 04:25 PM
About 6 months ago I bought the Intel SSD 750 400GB, and have been using it for various database-related benchmarking tasks and such. It was working fine until this week, when the kernel suddenly started reporting strange issues about aborted commands:
Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 0 QID 12 timeout, aborting
Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 1 QID 12 timeout, aborting
Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 2 QID 12 timeout, aborting
Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 3 QID 12 timeout, aborting
Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:27 bench2 kernel: nvme nvme0: Abort status: 0x0
Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
...
Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 196 QID 12 timeout, aborting
Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 212 QID 12 timeout, aborting
Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 273 QID 12 timeout, aborting
Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 275 QID 12 timeout, aborting
Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
...
Jan 12 13:16:59 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:16:59 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:16:59 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:17:00 bench2 kernel: nvme nvme0: completing aborted command with status: fffffffc
Jan 12 13:17:00 bench2 kernel: blk_update_request: I/O error, dev nvme0n1, sector 422162944
Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770079, lost async page write
Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770080, lost async page write
Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770081, lost async page write
Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770082, lost async page write
Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770083, lost async page write
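In the excerpt above, every timeout line names QID 12. For anyone who wants to check whether that holds across a full log, here is a minimal sketch that tallies timeouts per device and queue. It assumes the message format shown above; the function name `count_timeouts` and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: nvme nvme0: I/O 196 QID 12 timeout, aborting"
# The pattern is an assumption based on the log excerpt in this thread.
TIMEOUT_RE = re.compile(r"nvme (\S+): I/O (\d+) QID (\d+) timeout, aborting")


def count_timeouts(lines):
    """Return a Counter keyed by (device, queue id) of timed-out commands."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            device, _tag, qid = match.groups()
            counts[(device, int(qid))] += 1
    return counts


# Three lines copied from the log above; only two of them are timeouts.
sample = [
    "Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 0 QID 12 timeout, aborting",
    "Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000",
    "Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 196 QID 12 timeout, aborting",
]

for (device, qid), n in sorted(count_timeouts(sample).items()):
    print(f"{device} QID {qid}: {n} timeouts")
```

Feeding it the whole kernel log (for example, the lines of a saved log file) would show whether the timeouts are spread across queues or concentrated on one.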
I regularly test new kernels and distributions, so at first I thought it was a bug in one of those, but after a lot of experiments I doubt that: I can reproduce the same issue even with older kernels that I previously used without any problems.
Interestingly enough, this only affects writes. Reads seem to work just fine (easily >2GB/s in a sequential workload), but writes reach only 2MB/s. It's not a filesystem issue either: this happens even with a simple dd writing to /dev/nvme0n1 directly.
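For comparison with the dd result, a sequential write rate can also be measured with a short script. This is only a rough sketch, not the method used in the post: the function name, the block size and the total size are arbitrary choices, and unlike a raw dd it goes through the page cache, so it calls fsync before stopping the clock.

```python
import os
import time


def _write_all(fd, data):
    """Write all of data to fd, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def sequential_write_mb_s(path, total_mb=64, block_kb=1024):
    """Write total_mb of zeros to path in block_kb chunks and return MB/s.

    WARNING: this overwrites whatever is at `path`. Point it at a scratch
    file, never at a device or file that holds data you care about.
    """
    block = bytes(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        for _ in range(blocks):
            _write_all(fd, block)
        os.fsync(fd)  # include the time needed to flush to the device
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    megabytes = blocks * block_kb / 1024
    return megabytes / max(elapsed, 1e-9)


# Example (hypothetical scratch path on the drive under test):
#   print(sequential_write_mb_s("/mnt/nvme/scratch.bin", total_mb=1024))
```

A result in the low single digits of MB/s on a drive that reads at over 2GB/s would match the symptom described above.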
I've tried to install the newest firmware using the isdct tool (v 3.0.0), and `isdct show` now reports this:
[root@bench2 ~]# isdct show -a -intelssd 0
- Intel SSD 750 Series CVCQ55020067400AGN -
AggregationThreshold : 0
AggregationTime : 0
ArbitrationBurst : 0
Bootloader : 8B1B0131
CoalescingDisable : 1
DevicePath : /dev/nvme0n1
Device...
01-13-2017 07:34 AM
Hello Tomas_V,
Thanks for posting in our forum. We would like to review the information you've sent us and try to replicate the situation.
In the meantime, we recommend installing the latest firmware update, which you can find here: https://downloadcenter.intel.com/download/26491/Intel-SSD-Firmware-Update-Tool
The other program says you have the latest version, but that is because it does not include the latest firmware: FW 8EV101F0 with Bootloader 8B1B0133.
Let us know whether the firmware update fixes the issue; if not, we will check the information you provided.
Regards,
NC
01-22-2017 03:53 AM
Hi, thanks. I've been running some tests on the SSD the whole week, and it seems to be working fine. So the firmware upgrade likely resolved the issues.
Thanks!
01-23-2017 05:48 AM
Hi Tomas_V,
That is great news; we are glad to hear it has been running fine.
Regards,
NC
01-13-2017 07:45 AM
I noticed that AvgNandEraseCycles has reached 3148 and Percentage Used was 105%, which means the NAND Flash was significantly worn out.
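As a rough sanity check on those two numbers, one can back out the erase-cycle budget they imply. This assumes Percentage Used is simply average erase cycles divided by the rated cycle count, which is a guess about how the drive computes it and not something taken from the 750 documentation; the function name is made up for illustration.

```python
def implied_rated_cycles(avg_erase_cycles, percentage_used):
    """Rated erase cycles implied by a wear percentage.

    Assumption: percentage_used = 100 * avg_erase_cycles / rated_cycles.
    """
    return avg_erase_cycles / (percentage_used / 100.0)


# Values reported above: AvgNandEraseCycles 3148, Percentage Used 105%.
print(round(implied_rated_cycles(3148, 105)))  # 2998
```

Under that assumption, the drive would be treating roughly 3000 erase cycles as 100% of its rated life.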
01-13-2017 12:11 PM
While I understand the basic theory behind erase cycles, I don't see any details in the 750 specs about what the limits are, so I can't judge whether 3148 is high or not 😞
That being said, it'd be surprising (and sad) if the 750 SSD wore out so quickly. I've only had it for ~6 months, and while I occasionally do write-heavy benchmarking (I'm a database engineer), I have a bunch of S3700 drives that I've used for the same thing, and those seem to be perfectly fine after a few years.