
P44 Pro nvme controller is down will reset

andy
New Contributor II

My NVMe drive (P44 Pro, model SSDPFKKW020X7) sometimes just disconnects and stops working. SMART (attached) reports everything as normal, and temperatures are also within reason (around 45-50 °C; a graph of the few minutes before the incident is attached). The drive disconnects after about a week or two of uptime (although it was fine for the first month and a half of use). The OS is Arch Linux running kernel version 6.4 (it has also happened on a few older kernels, but 6.4 is the one from the most recent occurrence).

Firmware is on the latest version (checked with the update tool).

I removed the serial number from my SMART output (just to be safe), but can send it if needed.

I saw someone with what looks like a related issue (on Windows; I assume it's blue-screening due to a spontaneous disconnect): https://community.solidigm.com/t5/solid-state-drives-nand/p44pro-too-hot-to-lose-disk/td-p/24074


Attachments: temperature of NVMe drive; kernel logs of drive disconnecting; SMART output of drive after rebooting
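For anyone trying to confirm they are hitting the same thing on Linux, a minimal sketch of scanning kernel log text for the nvme driver's "controller is down; will reset" warning might look like the following. The sample log line is illustrative only (it is not copied from the attachment above), the function name is made up for this example, and reading an all-ones CSTS value as "device no longer reachable" is a common interpretation rather than something confirmed in this thread.

```python
import re

# Matches the warning printed by the Linux nvme driver when it decides to
# reset a controller. The capture groups pull out the controller name and
# the two register values the driver logs.
RESET_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)

# Illustrative sample only; NOT taken from the poster's attached logs.
SAMPLE_LOG = """\
kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
kernel: some unrelated line
"""


def find_controller_resets(log_text):
    """Return one dict per 'controller is down' warning found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match is None:
            continue
        csts = int(match.group("csts"), 16)
        events.append({
            "controller": match.group("ctrl"),
            "csts": csts,
            "pci_status": int(match.group("pci"), 16),
            # Assumption: an all-ones register read usually means the
            # device stopped responding on the bus.
            "device_unreachable": csts == 0xFFFFFFFF,
        })
    return events


for event in find_controller_resets(SAMPLE_LOG):
    print(event)
```

To use it on a real system, the text could come from something like `journalctl -k -b -1` (kernel messages from the previous boot), assuming the journal is persistent and the warning was flushed to disk before the hang.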

1 ACCEPTED SOLUTION

oscarfowler
New Contributor II

Was there ever a resolution to this? I'm seeing the same behavior on Windows 10.  I've had a crash around once a month for the last 7 months or so ever since I replaced my system drive with a Solidigm P44 Pro 2TB.

The system stops responding and fails to write a crash report to the drive. Upon reboot, the drive no longer shows up in the BIOS. (I have two of these NVMe drives installed, and only the one with the boot/system partition on it is missing when this happens. The other drive still shows as normal.) Powering down the system and restarting restores normal operation until the next time it crashes.

I tried contacting support, but they wouldn't do anything about it, since the drive's SMART data shows no problems.

I finally got sick of the issue after another crash yesterday and replaced the drive with a Samsung 990 Pro.

 


15 REPLIES

oscarfowler
New Contributor II

I'll report back if anything changes, but I haven't had an issue since switching to the 990. If Solidigm acknowledges there's an issue with these drives, I might try them again in the future, but the lack of any interest in investigating this means they're off my list for now.

Drayvn
New Contributor II

Thankfully I found this thread. I've been trying to figure out why my system keeps crashing, just like everybody else in here: everything would hang, then BSOD, and after a normal restart the SSD would be missing from Disk Management.

It would require a hard reset for the SSD to show up again. This only seems to occur in Diablo 4, which sometimes seems to need to access a lot of small data files at the same time; the SSD goes to 100% utilisation with pretty much no MB/s throughput.

I was able to improve the situation a little bit by installing all my games on a 2nd P44 Pro SSD. Instead of BSODing, the system would just drop the 2nd SSD, so the OS could safely continue, though there would still be issues.

So it looks like I just need to replace the P44 Pro. It's such a shame, as the price of these is great and they are just about as good as the 980/990 Pro.

Drayvn
New Contributor II

Just in case anyone reads this: I've limited my M.2 slot to Gen 3 speeds in the BIOS and so far I haven't had a crash since. I'll keep testing and abusing it to see if this problem rears its ugly head again. But so far, limiting your M.2 slot to Gen 3 speeds in the BIOS seems to maybe fix it?!
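For Linux users (like the original poster) who want to verify that a Gen 3 limit set in the BIOS actually took effect, a rough sketch of reading the negotiated PCIe link speed from sysfs could look like this. The controller name `nvme0` and the helper names are assumptions for illustration; the sysfs attribute typically contains a string such as "8.0 GT/s PCIe", where 8 GT/s corresponds to Gen 3 and 16 GT/s to Gen 4.

```python
from pathlib import Path

# Per-lane transfer rate (GT/s) for each PCIe generation.
GT_PER_S_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_generation(link_speed):
    """Map a sysfs string such as '8.0 GT/s PCIe' to a generation number.

    Returns None if the string is empty or not recognised.
    """
    parts = link_speed.split()
    if not parts:
        return None
    try:
        rate = float(parts[0])
    except ValueError:
        return None
    return GT_PER_S_TO_GEN.get(rate)


def nvme_link_generation(controller="nvme0"):
    """Read the negotiated link speed for an NVMe controller, if present.

    'nvme0' is an assumed controller name; adjust it for your system.
    """
    path = Path("/sys/class/nvme") / controller / "device" / "current_link_speed"
    if not path.exists():
        return None
    return pcie_generation(path.read_text().strip())


# Prints the generation number, or None if the controller is not found.
print(nvme_link_generation())
```

If this prints 3 on a Gen 4 capable slot and drive, the BIOS limit is in effect; it says nothing about whether the limit prevents the disconnects.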

oscarfowler
New Contributor II

I suspect it's more likely to delay the problem than eliminate it. But even if it does keep it from happening again, I'd call this a workaround, not a fix.

Drayvn
New Contributor II

You're right, it is a workaround, and it didn't last. I've since replaced the drive with a 990 Pro and haven't had the problem since. I'll still use the P44 Pro SSD purely as storage, but will never use it for heavy or even sustained workloads again.