12-16-2016 08:40 AM
Hi all,
On my server (HPE DL380 G9) I have a problem with my P3608 4TB card. After about 3 months in operation, I lost the first controller after a reboot and the second one at the next reboot. Now I can't access the card with the isdct tool. It seems that the controller is offline.
Short description:
I have created a RAID 0 with two cards (one file system with 7.4 TB capacity).
I use a Debian 8 system with the newest SSD firmware (8DV101F0).
Does anyone have a good idea?
Thx
Roger
12-16-2016 02:40 PM
Hello Roger_MCH,
First of all, we would like to know if you followed these instructions when it was working: http://www.intel.com/content/dam/support/us/en/documents/ssdc/data-center-ssds/Intel_Linux_NVMe_Guid...
What kind of workload do you put on the SSD?
We will be waiting for your response. In case you need further assistance, let us know here or contact our support department: http://www.intel.de/content/www/de/de/support/contact-support.html# @18
Regards,
NC
12-20-2016 08:18 AM
Hello NC,
Many thanks for your feedback. We use "Proxmox" as a virtualization solution. This solution uses the Ubuntu 16.04 LTS 4.4 kernel. The driver is activated by default.
In the current system, I have combined two P3608 4TB cards into a RAID 0 volume:
# zpool create -f -o ashift=12 ssd_pool /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
A weather application runs on the server; it imports and writes a lot of data. The data can be weather models (large files) or small files such as point data.
This is already the second card that has dropped out or deactivated itself. I opened a support case with Intel; the result was a 1:1 replacement without any additional information.
Regards,
Roger
12-21-2016 08:30 AM
Hello Roger_MCH,
Without performing more troubleshooting and getting more data, it may be hard to positively diagnose the cause of your drive failures.
However, it might be worth noting that each P3608 counts as two drives. So if you RAID 0 two pairs of these SSDs, it would be the same as creating a RAID 0 array with four SSDs, which does increase the expected failure rate by about 80%. Perhaps a RAID 10 would be a better option?
The endurance rating of the Intel® SSD DC P3608 allows for 3 drive writes per day, or 21.90 petabytes written. Exceeding this would be another possible cause for the drive to fail earlier than expected.
In many cases, if an SSD fails in a RAID, we recommend removing the drive from the array and performing a secure erase/low-level format. More often than not, this allows the drive to recover successfully, although this may have been out of the question if your drives were no longer detected at BIOS level.
Best regards,
Carlos A.
12-22-2016 02:40 AM
Hi Carlos
I also think that a RAID 10 would be better, but the cost is also much higher. We write about 500 GB to 700 GB of data per day to the disk. This should not be a problem.
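As a rough sanity check of the endurance numbers quoted in this thread, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: the 5-year rating period and decimal units are assumptions (they happen to reproduce the 21.90 PBW figure), write amplification is ignored, and the helper names are made up for illustration.

```python
# Rough endurance check for one Intel SSD DC P3608 4TB card.
# Assumptions (not stated in the thread): decimal units, a 5-year rating
# period, and no allowance for write amplification.

CAPACITY_TB = 4.0        # card capacity from the thread
RATED_DWPD = 3.0         # drive writes per day quoted in the thread
RATING_YEARS = 5         # assumed rating period
DAILY_WRITES_TB = 0.7    # upper end of the reported 500-700 GB per day


def rated_petabytes_written(capacity_tb, dwpd, years):
    """Total rated writes in PB: capacity x DWPD x days in the rating period."""
    return capacity_tb * dwpd * 365 * years / 1000.0


def actual_dwpd(daily_writes_tb, capacity_tb):
    """Drive writes per day implied by a daily write volume."""
    return daily_writes_tb / capacity_tb


def years_to_rated_limit(rated_pbw, daily_writes_tb):
    """Years of constant daily writes needed to reach the rated PBW."""
    return rated_pbw * 1000.0 / daily_writes_tb / 365


pbw = rated_petabytes_written(CAPACITY_TB, RATED_DWPD, RATING_YEARS)
# Worst case: pretend all daily writes land on a single card,
# although the pool actually stripes them across both cards.
dwpd = actual_dwpd(DAILY_WRITES_TB, CAPACITY_TB)
years = years_to_rated_limit(pbw, DAILY_WRITES_TB)

print(f"rated endurance: {pbw:.2f} PBW")
print(f"observed load:   {dwpd:.3f} DWPD (rated {RATED_DWPD:.0f})")
print(f"years to limit:  {years:.1f}")
```

Under these assumptions the reported write volume stays far below the rated 3 drive writes per day, so worn-out flash looks like an unlikely explanation on its own.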
Question: how can I do a secure erase/low-level format of the SSD card, or rather of the affected controller?
Error messages from /var/log/messages:
Dec 14 17:58:51 zuenjv05 kernel: [ 9.822799] nvme 0000:87:00.0: Failed status: 0x3, reset controller.
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823291] nvme 0000:87:00.0: Cancelling I/O 0 QID 4
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823294] nvme 0000:87:00.0: Cancelling I/O 1 QID 4
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823296] nvme 0000:87:00.0: Cancelling I/O 2 QID 4
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823298] nvme 0000:87:00.0: Cancelling I/O 0 QID 1
Dec 14 17:58:51 zuenjv05 kernel: [ 11.383135] nvme 0000:87:00.0: IO queues not created
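When there are many such lines, it can help to group them per PCI address to see which controller is affected. The following is a minimal sketch that works on the six lines pasted above; the message categories and the `summarize` helper are illustrative choices, not anything defined by the nvme driver.

```python
import re
from collections import defaultdict

# Sample lines copied from the /var/log/messages excerpt above.
LOG = """\
Dec 14 17:58:51 zuenjv05 kernel: [ 9.822799] nvme 0000:87:00.0: Failed status: 0x3, reset controller.
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823291] nvme 0000:87:00.0: Cancelling I/O 0 QID 4
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823294] nvme 0000:87:00.0: Cancelling I/O 1 QID 4
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823296] nvme 0000:87:00.0: Cancelling I/O 2 QID 4
Dec 14 17:58:51 zuenjv05 kernel: [ 9.823298] nvme 0000:87:00.0: Cancelling I/O 0 QID 1
Dec 14 17:58:51 zuenjv05 kernel: [ 11.383135] nvme 0000:87:00.0: IO queues not created
"""

# Matches "nvme <domain:bus:device.function>: <message>" in a kernel log line.
NVME_LINE = re.compile(
    r"nvme (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)$"
)


def summarize(log_text):
    """Count nvme driver messages per PCI address and message kind."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in log_text.splitlines():
        match = NVME_LINE.search(line)
        if not match:
            continue  # not an nvme driver message
        msg = match.group("msg")
        if msg.startswith("Failed status"):
            kind = "failed_status"
        elif msg.startswith("Cancelling I/O"):
            kind = "cancelled_io"
        elif msg.startswith("IO queues not created"):
            kind = "no_io_queues"
        else:
            kind = "other"
        summary[match.group("addr")][kind] += 1
    return {addr: dict(kinds) for addr, kinds in summary.items()}


result = summarize(LOG)
for addr, kinds in result.items():
    print(addr, kinds)
```

For this excerpt, every message belongs to the controller at 0000:87:00.0.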
Some additional details:
root@zuenjv05:/sys# lspci -nn | grep -i ssd
0d:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)
0e:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)
86:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)
87:00.0 Non-Volatile memory controller [0108]: Intel Corporation PCIe Data Center SSD [8086:0953] (rev 02)
root@zuenjv05:/sys# find . -name "*nvme*"
./bus/pci/drivers/nvme
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0/nvme0n1
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1/nvme1n1
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme/nvme2
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme/nvme3
./block/nvme0n1
./block/nvme1n1
./class/nvme
./class/nvme/nvme0
./class/nvme/nvme1
./class/nvme/nvme2
./class/nvme/nvme3
./class/block/nvme0n1
./class/block/nvme1n1
./module/nvme
./module/nvme/drivers/pci:nvme
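In the listing above, nvme0 and nvme1 each expose a namespace (nvme0n1, nvme1n1), while nvme2 and nvme3 do not. A small sketch like the following can spot that automatically. It assumes the sysfs layout shown in this thread, works on the pasted paths rather than a live /sys, and the function name is made up for illustration.

```python
import re

# Device paths copied from the `find . -name "*nvme*"` output above.
FIND_OUTPUT = """\
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:01.0/0000:0d:00.0/nvme/nvme0/nvme0n1
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1
./devices/pci0000:00/0000:00:03.2/0000:0b:00.0/0000:0c:02.0/0000:0e:00.0/nvme/nvme1/nvme1n1
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:01.0/0000:86:00.0/nvme/nvme2
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme
./devices/pci0000:80/0000:80:02.0/0000:84:00.0/0000:85:02.0/0000:87:00.0/nvme/nvme3
"""

# A controller directory looks like .../<pci address>/nvme/nvmeN
CONTROLLER = re.compile(
    r"/([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])/nvme/(nvme\d+)$"
)
# A namespace directory looks like .../nvme/nvmeN/nvmeNnM
NAMESPACE = re.compile(r"/nvme/(nvme\d+)/nvme\d+n\d+$")


def controllers_without_namespaces(paths):
    """Return {controller: pci_address} for controllers exposing no namespace."""
    controllers = {}          # controller name -> PCI address
    with_namespace = set()    # controllers that have at least one namespace
    for path in paths:
        ctrl = CONTROLLER.search(path)
        if ctrl:
            controllers[ctrl.group(2)] = ctrl.group(1)
        ns = NAMESPACE.search(path)
        if ns:
            with_namespace.add(ns.group(1))
    return {
        name: addr
        for name, addr in sorted(controllers.items())
        if name not in with_namespace
    }


missing = controllers_without_namespaces(FIND_OUTPUT.splitlines())
for name, addr in missing.items():
    print(f"{name} at {addr} has no namespace")
```

The two controllers it reports sit at 86:00.0 and 87:00.0, and 87:00.0 is the address that appears in the kernel error messages.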
Handle 0x00EC, DMI type 203, 34 bytes
OEM-specific Type
	Header and Data:
		CB 22 EC 00 FE FF FE FF 86 80 53 09 86 80 09 37
		01 08 EB 00 00 00 10 0A 02 01 FF FF 01 02 03 04
		00 00
	Strings:
		PciRoot(0x0)/Pci(0x3,0x2)/Pci(0x0,0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)
		NVMe.Slot.2.1
		NVM Express Controller
		Slot 2

Handle 0x00ED, DMI type 203, 34 bytes
OEM-specific Type
	Header and Data:
		CB 22 ED 00 FE FF FE FF 86 80 53 09 86 80 09 37
		01 08 EB 00 00 00 10 0A 02 02 FF FF 01 02 03 04
		00 00
	Strings:
		PciRoot(0x0)/Pci(0x3,0x2)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)
		NVMe.Slot.2.2
		NVM Express Controller
		Slot 2

Handle 0x00F0, DMI type 203, 34 bytes
OEM-specific Type
	Header and Data:
		CB 22 F0 00 FE FF FE FF 86 80 53 09 86 80 09 37
		01 08 EF 00 00 00 09 0A 05 02 FF FF 01 02 03 04
		00 00
	Strings:
		PciRoot(0x1)/Pci(0x2,0x0)/Pci(0x0,0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)
		PCI.Slot.5.2
		NVM Express Controller
		Slot 5

Handle 0x00F1, DMI type 203, 34 bytes
OEM-specific Type
	Header and Data:
		CB 22 F1 00 FE FF FE FF 86 80 53 09 86 80 09 37
		01 08 EF 00 00 00 09 0A 05 03 FF FF 01 02 03 04
		00 00
	Strings:
		PciRoot(0x1)/Pci(0x2,0x0)/Pci(0x0,0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)
		PCI.Slot.5.3
		NVM Express Controller
		Slot 5

root@zuenjv05:/dev# dmidecode --type 9
# dmidecode 2.12
SMBIOS 2.8 present.....
....
Handle 0x00BC, DMI type 9, 17 bytes
System Slot
X...