P44 Pro nvme controller is down will reset

andy
New Contributor II

My NVMe drive (P44 Pro, model SSDPFKKW020X7) sometimes just disconnects and stops working. SMART (attached) reports everything as normal, and temperatures are within reason (around 45-50 °C; a graph of the few minutes before the incident is attached). The drive disconnects after about a week or two of uptime, although it was fine for the first month and a half of use. The OS is Arch Linux running kernel version 6.4 (it has also happened on a few older kernels, but the attached logs are from the most recent occurrence).

Firmware is on the latest version (checked with the update tool).

I removed the serial number from my SMART output (just to be safe), but can send it if needed.

I saw someone with what looks like a related issue (on Windows; I assume it's blue-screening due to a spontaneous disconnect): https://community.solidigm.com/t5/solid-state-drives-nand/p44pro-too-hot-to-lose-disk/td-p/24074


[Attachments: temperature of NVMe drive; kernel logs of drive disconnecting; SMART output of drive after rebooting]
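
For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch that scans kernel log lines for the "controller is down, will reset" message named in the thread title. The regular expression, the `ResetEvent` type, and the `find_controller_resets` name are illustrative assumptions rather than anything posted in the thread, and the exact message wording may differ between kernel versions.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed message shape, modelled on the thread title. The optional ";" and
# the optional trailing ": <detail>" part are guesses meant to tolerate
# small wording differences between kernel versions.
CONTROLLER_DOWN = re.compile(
    r"nvme\s+(?P<dev>nvme\d+):\s+controller is down;?\s+will reset"
    r"(?::\s*(?P<detail>.*))?",
    re.IGNORECASE,
)


class ResetEvent(NamedTuple):
    device: str  # controller name as it appears in the log, e.g. "nvme0"
    detail: str  # whatever follows "will reset:", or "" if nothing does
    line: str    # the full log line, kept for context


def find_controller_resets(lines: Iterable[str]) -> List[ResetEvent]:
    """Return one ResetEvent for every line that matches the pattern."""
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = CONTROLLER_DOWN.search(line)
        if match:
            detail = (match.group("detail") or "").strip()
            events.append(ResetEvent(match.group("dev"), detail, line))
    return events
```

One way to use it would be to save the output of `journalctl -k` or `dmesg` to a file and pass the opened file to `find_controller_resets`. A match only confirms the symptom; it says nothing about the cause.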

1 ACCEPTED SOLUTION

oscarfowler
New Contributor II

Was there ever a resolution to this? I'm seeing the same behavior on Windows 10.  I've had a crash around once a month for the last 7 months or so ever since I replaced my system drive with a Solidigm P44 Pro 2TB.

The system stops responding and fails to write a crash report to the drive. Upon reboot, the drive no longer shows up in the BIOS. (I have two of these NVMe drives installed, and only the one with the boot/system partition on it is missing when this happens. The other drive still shows as normal.) Powering down the system and restarting restores normal operation until the next time it crashes.

I tried contacting support, but they wouldn't do anything about it, since the drive's SMART data shows no problems.

I finally got sick of the issue after another crash yesterday and replaced the drive with a Samsung 990 Pro.

 


22 REPLIES

iskxcr
New Contributor

Around Nov. 1, I noticed that my P44 Pro 2TB (purchased in the summer of 2023), which I use as my system drive, would sometimes fail after a couple of days of heavy use. To troubleshoot, I tried swapping out the motherboard and even ordered an entirely new P44 Pro.

I am using a fresh installation of Windows 23H2 on these two P44 Pro drives, so driver-related problems should be ruled out.

It took me about two weeks to find this post. Many thanks to you all: I can now confidently switch to a 990 Pro without also swapping out my CPU to check whether its PCIe controller is damaged or something else is wrong. To rule out the motherboard, I had already switched from an X670E Hero to an X870 Hero.

My situation is similar to that of the OP, @andy: no SMART errors, and no critical/error events in the Event Viewer. The OS just suddenly freezes and after a while goes to a black screen (or sometimes a BSOD). It then automatically restarts into the BIOS, and my system drive no longer appears in the boot list. (However, the other P44 Pro, which I use as a storage drive, is completely fine.) I have to do a hard reset on my PSU so that the drive can be recognized again by my hardware.

This is definitely a hardware issue, and I am not expecting it to be solved by any software updates. After all, it's been around a year since this post and people are still experiencing the same issue. From what I see in @Lagoochu360's reply, I'd say this is unrelated to my VM/Docker configuration. What's funnier is that I wasn't even stressing the drive when it happened: the last time it failed, about 20 minutes before this post, I was only writing some C code (not even compiling) in VSCode.

Dear iskxcr,

Thank you for sharing your experience with the P44 Pro 2TB. 

Since your troubleshooting steps, including swapping to different hardware, have not resolved the issue, and given the nature of the symptoms you've described, this does indeed point towards a potential fault with the SSD itself rather than other components or software configurations.

For further assistance, it would be helpful to gather more specific details. We'd be happy to take a closer look and assist you. You can contact us using this link: Create Case · Customer Self-Service (solidigm.com)

If you have any other questions, please let us know.

Kind regards,
Gleb
Solidigm Customer Support


oscarfowler
New Contributor II

One more observation to add:

I've gone six months without a single crash since switching my system drive to a Samsung. However, I kept a second P44 Pro as a data drive, and that one hasn't had any issues at all.

A few possibilities come to mind: 1.) not all drives are affected; 2.) it has something to do with usage patterns, and those patterns don't present themselves on a non-system drive; 3.) it has something to do with the NVMe controller being used.

On my Gigabyte motherboard, the NVMe socket I'm using for the system drive is connected directly to the CPU (AMD Ryzen 5900X). The socket for the data drive is connected to the chipset (AMD X570).

Of course, it shouldn't matter, and clearly the Samsung drive doesn't have this problem, but if you want to try something before giving up on the drive, you could try seeing if switching to a different socket helps.