02-06-2018 08:04 AM
Hi,
We are experiencing persistent I/O request timeouts on Linux with P3520/P4600 SSDs. We have tried multiple kernels (3.10, 4.4, 4.9) and see the timeouts on all of them. The P4600 seems more prone to them than the P3520, though we see them on the latter as well. Both drives run the latest firmware and are housed in the same machine (Supermicro 5018R-WR with X10SRW-F motherboard and E5-1650 V4 CPU). We can reproduce the timeouts simply by running mkfs -t xfs on the drive.
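For reference, the reproduction is just the following (the device path matches our P4600 listed below; note that mkfs destroys any data on the target drive):

```shell
# Triggers the I/O timeouts on the P4600 in our setup
# (WARNING: this formats the drive and destroys its contents)
mkfs -t xfs /dev/nvme1n1
```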
Here is the output from isdct (version isdct-3.0.9.400-17.x86_64):
- Intel SSD DC P3520 Series CVPF717100L01P2JGN -
Bootloader : MB1B0105
DevicePath : /dev/nvme0n1
DeviceStatus : Healthy
Firmware : MDV10271
FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.
Index : 0
ModelNumber : INTEL SSDPEDMX012T7
ProductFamily : Intel SSD DC P3520 Series
SerialNumber : CVPF717100L01P2JGN
- Intel SSD DC P4600 Series BTLE736007F54P0KGN -
Bootloader : 0110
DevicePath : /dev/nvme1n1
DeviceStatus : Healthy
Firmware : QDV10150
FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.
Index : 1
ModelNumber : INTEL SSDPEDKE040T7
ProductFamily : Intel SSD DC P4600 Series
SerialNumber : BTLE736007F54P0KGN
Here are the messages the 4.9 kernel prints when using the P4600:
[ 151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting
[ 151.303130] nvme nvme1: I/O 569 QID 1 timeout, aborting
[ 151.308347] nvme nvme1: I/O 570 QID 1 timeout, aborting
[ 151.313562] nvme nvme1: I/O 571 QID 1 timeout, aborting
[ 151.355465] nvme nvme1: completing aborted command with status: 0000
[ 151.411273] nvme nvme1: completing aborted command with status: 0000
[ 151.466903] nvme nvme1: completing aborted command with status: 0000
[ 151.522609] nvme nvme1: completing aborted command with status: 0000
[ 151.578226] nvme nvme1: completing aborted command with status: 0000
...
[ 165.395295] nvme nvme1: Abort status: 0x0
[ 165.399296] nvme nvme1: Abort status: 0x0
[ 165.403299] nvme nvme1: Abort status: 0x0
[ 165.407304] nvme nvme1: Abort status: 0x0
We would appreciate your help in resolving this issue.
Regards,
Shantanu Goel
02-22-2018 04:54 PM
Hello Shantanu,
I would like to inform you that we have performed several tests to try to reproduce the issue you are experiencing, and we are still researching what could be causing the timeout error messages. I'll contact you as soon as we find something relevant.
Regards,
Andres V.
03-01-2018 04:31 PM
Hello Shantanu,
I just want to let you know that we have been trying to reproduce the issue you are experiencing, but so far we have not seen the same output that you do.
Would it be possible for you to run the command on CentOS 7 with kernel 4.15 and share the output?
In case you have any update, don't hesitate to contact us. I'll be waiting for your response.
Regards,
Andres V.
03-02-2018 09:40 AM
Hello Shantanu,
I was wondering if you would be interested in trying the workaround kindly suggested by community member berthierp?
In case you do, please share your results with us.
Regards,
Andres V.
03-05-2018 09:02 AM
Hi,
Thank you both for your suggestions. I can confirm that either increasing the timeout to 30 seconds, as the upstream kernel.org tree has done, or passing the -K flag to mkfs fixes the issue.
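For completeness, here is a sketch of both workarounds (the device path is our P4600; `nvme_core.io_timeout` is the kernel module parameter behind the timeout, though on some older kernels it may live under the `nvme` module instead):

```shell
# Workaround 1: pass -K to mkfs.xfs to skip the discard (TRIM) pass,
# which is what triggers the timeouts during mkfs
mkfs -t xfs -K /dev/nvme1n1

# Workaround 2: raise the NVMe I/O timeout to 30 seconds.
# At runtime (if the parameter is writable on your kernel):
echo 30 > /sys/module/nvme_core/parameters/io_timeout

# Or persistently via a modprobe option (rebuild the initramfs afterwards):
echo "options nvme_core io_timeout=30" > /etc/modprobe.d/nvme-timeout.conf
```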
Regards,
Shantanu
03-06-2018 10:18 AM
Hello Shantanu,
I'm glad to hear that you found a solution to the issue. Thank you for sharing the workaround; the community really appreciates it.
In case you have another question, don't hesitate to contact us.
Regards,
Andres V.