
P44 Pro nvme controller is down will reset

andy
New Contributor II

The NVMe drive (P44 Pro, model SSDPFKKW020X7) sometimes just disconnects and stops working. SMART (attached) reports everything as normal, and temperatures are within reason (around 45-50 °C; a graph of the few minutes before the incident is attached). The drive disconnects after about a week or two of uptime, although it was fine for the first month and a half of use. The OS is Arch Linux running kernel version 6.4 (it also happened on a few older kernels, but 6.4 is the version for the most recent occurrence).

Firmware is on the latest version (checked with the update tool).

I removed the serial number from my SMART output (just to be safe), but can send it if needed.

I saw someone with what looks like a related issue (on Windows; I assume it's blue-screening due to a spontaneous disconnect): https://community.solidigm.com/t5/solid-state-drives-nand/p44pro-too-hot-to-lose-disk/td-p/24074


Attachments: temperature of NVMe drive; kernel logs of drive disconnecting; SMART output of drive after rebooting
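For anyone wanting to check their own kernel logs for the same symptom, here is a minimal sketch of a filter for the reset message. The pattern is an assumption based on this thread's title ("controller is down ... will reset"); the exact wording can differ between kernel versions, and `find_nvme_resets` is a made-up helper name, not part of any tool.

```python
import re

# Assumed message shape, taken from the thread title; the exact wording
# (punctuation, trailing status fields) may vary across kernel versions.
RESET_PATTERN = re.compile(
    r"nvme\s+(nvme\d+): controller is down[;,]? will reset", re.IGNORECASE
)


def find_nvme_resets(log_text):
    """Return (controller, line) pairs for lines that look like NVMe reset events."""
    hits = []
    for line in log_text.splitlines():
        match = RESET_PATTERN.search(line)
        if match:
            hits.append((match.group(1), line.strip()))
    return hits
```

The text passed in could be the saved output of `journalctl -k` or `dmesg`. Keep in mind that if the drive that drops out is the one holding the journal, the relevant lines may never reach disk.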

1 ACCEPTED SOLUTION

oscarfowler
New Contributor II

Was there ever a resolution to this? I'm seeing the same behavior on Windows 10.  I've had a crash around once a month for the last 7 months or so ever since I replaced my system drive with a Solidigm P44 Pro 2TB.

The system stops responding and fails to write a crash report to the drive. Upon reboot, the drive no longer shows up in the BIOS. (I have two of these NVMe drives installed, and only the one with the boot/system partition on it goes missing when this happens. The other drive still shows up as normal.) Powering down the system and restarting restores normal operation until the next crash.

I tried contacting support, but they wouldn't do anything about it, since the drive's SMART data shows no problems.

I finally got sick of the issue after another crash yesterday and replaced the drive with a Samsung 990 Pro.

 


8 REPLIES

andy
New Contributor II

> Was there ever a resolution to this?
TL;DR: Same thing that happened to you. I switched to Samsung and haven't looked back (no issues in the last 8 months).

Full story:
A ticket was opened and I talked to support. They had me send a SMART log using one of their tools, but no errors were found there. They had me try a few things, like re-installing the OS, switching the port the NVMe was plugged into, and trying it in a different computer - none of which really worked (and all of which were hard to test, because it took weeks for the issue to manifest).

Eventually they sent me an email stating that I have an unsupported configuration, along with a link to the warranty. Reading through it, I noticed that they do not support "any computer ... *that supports* workloads ... of more than one concurrent user or one or more remote client device concurrently". To me that sounds like they don't support Windows 10/11 Pro: you can RDP onto a machine (a remote user), and you can share files on the drive over Samba (a remote client device), so the OS "supports" multiple users or concurrent client devices accessing the drive. The "supports" bit means that you don't even have to use those features for them to tell you that you're outside the warranty.

I did end up giving the drive away to a friend (having told him this story) in hopes that the Windows drivers would function better... but I guess not.

-----

Having written this again, I would be interested in trying to set the NVMe drive to use a PCIe 4.0 x1 connection (instead of the usual x4 lanes). This should get rid of the concurrent accesses (which seem to be a known bug, given the wording of the warranty), but it would negate the performance benefits of using NVMe.
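If anyone does try this on Linux, a rough sketch for checking which link the drive actually negotiated is below. It only reads the standard PCI sysfs attributes and does not change the link width; the `nvme0` path in the usage note is an assumption about the device name, and `read_pcie_link` is a made-up helper. Whether a narrower link would avoid the disconnects at all is untested.

```python
from pathlib import Path

# Standard PCI device attributes exposed by Linux sysfs.
LINK_ATTRS = (
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def read_pcie_link(device_dir):
    """Read PCIe link attributes from a PCI device's sysfs directory.

    Attributes that are missing are returned as None.
    """
    info = {}
    for name in LINK_ATTRS:
        attr = Path(device_dir) / name
        info[name] = attr.read_text().strip() if attr.is_file() else None
    return info
```

For example, `read_pcie_link("/sys/class/nvme/nvme0/device")` would report the link for the first NVMe controller, assuming it is named `nvme0` on your system.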

oscarfowler
New Contributor II

That exception in the warranty is crazy, since Linux, Windows and Mac all support concurrent users in one way or another (e.g. opening up SSH).

Thanks for posting your reply - I wasn't willing to waste the time re-installing the operating system, testing with a different drive, or putting the drive in another machine. All of those tests are an onerous burden on the customer, especially considering you'd have to wait weeks or longer after the change to confirm whether any change had occurred. If I was going to spend the time swapping out the drive and/or reinstalling the OS, I was going to do it with a different brand drive and greatly improve my chances of not having to do it again.

I saw your post before I called their support. Though it's possible that it's a software bug that affects both Linux and Windows, I think it's more likely a hardware bug. I guess we'll never know since they seem disinterested in diagnosing the issue.

 

komugi
New Contributor

I hoped this topic would solve my problem, but unfortunately it seems like I just need to get a Samsung 990 Pro.

I'm running Windows with WSL2 for dev work, and I'm definitely getting random restarts. I got lucky on one of the crashes and had Task Manager up: the P44 Pro jumped to 100% utilization and the system stuttered, then the drive disappeared from Task Manager completely, which then crashed.

It also sometimes happens during the daily system backups run by the Veeam agent. I got a BSOD of 'Critical Process Died', and on restart the NVMe would disappear from EFI detection until I did a full power off and restart.

Documenting this just as an FYI. I have a 990 on order to replace this drive, since it seems like it's a hardware issue.

oscarfowler
New Contributor II

I'll report back if anything changes, but I haven't had an issue since switching to the 990. If Solidigm acknowledges there's an issue with these drives, I might try them again in the future, but the lack of any interest in investigating this means they're off my list permanently.