07-30-2018 07:33 PM
I have run into a problem and would like some help.
Linux version: 4.9.37
Hardware: a SoC with an ARM CPU inside.
When the system boots up, it reports an error like this:
irq 45: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.37 #9
Hardware name: xxxDEMO Board (DT)
Call trace:
[] dump_backtrace+0x0/0x198
[] show_stack+0x14/0x20
[] dump_stack+0x94/0xb8
[] __report_bad_irq+0x38/0xe8
[] note_interrupt+0x20c/0x2e0
[] handle_irq_event_percpu+0x44/0x58
[] handle_irq_event+0x44/0x78
[] handle_fasteoi_irq+0xb4/0x1c0
[] generic_handle_irq+0x24/0x38
[] __handle_domain_irq+0x5c/0xb0
[] gic_handle_irq+0x64/0xc0
Exception stack(0xffffffc023b7ee10 to 0xffffffc023b7ef40)
ee00: ffffffc023b7ee40 0000007fffffffff
ee20: ffffffc023b7ef70 ffffff800809fe3c 0000000040000005 ffffff8008880000
ee40: 0000000000000000 0000000000000000 00000000fffb6edd ffffff800851d318
ee60: 000000000ccccccd 0000000000000020 0000000003687eb1 0000000000000066
ee80: 0000000000000008 0000000200000000 0000000000000002 7fffffffffffffff
eea0: 0000000000000000 0000000029a9e4c0 000000000000c350 0000000000000033
eec0: 0000000000000019 0000000000000001 0000000000000007 ffffff8008858000
eee0: ffffff8008856b08 0000000000000000 ffffff80088dc600 ffffffc022408000
ef00: ffffff8008880000 00000000fffb6edb ffffffc023b7f090 ffffff800889a136
ef20: 0000000000000082 ffffffc023b7ef70 ffffff80080a025c ffffffc023b7ef70
[] el1_irq+0xac/0x140
[] irq_exit+0x94/0xb8
[] __handle_domain_irq+0x60/0xb0
[] gic_handle_irq+0x64/0xc0
Exception stack(0xffffff8008883df0 to 0xffffff8008883f20)
3de0: 0000000000000000 0000000000000000
3e00: ffffffc023b7fbcc 000000401b329000 0000000000000080 0100000000000000
3e20: 0000000000000155 00000000fffb6ed5 ffffff800888d300 ffffff8008880000
3e40: 0000000000000820 ffffff800885a000 ffffffc021c90080 0000000000000002
3e60: 0000000000000001 dead000000000100 0000000000000019 0000000000000001
3e80: ffffff80088e4578 ffffff8008880000 ffffff8008887240 ffffff80088871a8
3ea0: 0000000000000001 ffffff8008880000 ffffff8008880000 0000000000000001
3ec0: ffffff8008887000 ffffff800889a136 0000000044820018 ffffff8008883f20
3ee0: ffffff8008084eac ffffff8008883f20 ffffff8008084eb0 0000000060000005
3f00: ffffff8008883f20 ffffff8008653cbc ffffffffffffffff ffffff80080d3cf4
[] el1_irq+0xac/0x140
[] arch_cpu_idle+0x10/0x18
[] cpu_startup_entry+0xd0/0x140
[] rest_init+0x6c/0x78
[] start_kernel+0x2dc/0x2f0
[] __primary_switched+0x5c/0x64
handlers:
[] nvme_irq
Disabling IRQ #45
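For context, "nobody cared" comes from the kernel's spurious-interrupt detector: every time the line fires and every registered handler returns IRQ_NONE, a miss counter is bumped, and once almost all recent interrupts were unhandled the line is disabled. A minimal sketch of that behavior (an assumption modeled on the general logic of kernel/irq/spurious.c, not the exact kernel source):

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's per-IRQ bookkeeping. */
struct irq_desc_sketch {
	unsigned int irq_count;      /* interrupts seen in this window */
	unsigned int irqs_unhandled; /* of those, how many nobody handled */
	bool disabled;
};

/* Called after each interrupt; 'handled' is false when all handlers
 * returned IRQ_NONE. Once per 100,000 interrupts, if almost all were
 * unhandled, the line is given up on ("Disabling IRQ #N"). */
static void note_interrupt_sketch(struct irq_desc_sketch *desc, bool handled)
{
	desc->irq_count++;
	if (!handled)
		desc->irqs_unhandled++;
	if (desc->irq_count >= 100000) {
		if (desc->irqs_unhandled > 99900) {
			desc->disabled = true;
			printf("irq: nobody cared, disabling line\n");
		}
		desc->irq_count = 0;
		desc->irqs_unhandled = 0;
	}
}
```

In this trace the nvme_irq handler is registered but never claims the interrupt, so the counter fills up and IRQ 45 is shut off.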
However, an SSD of another brand, 'KingBand', works well on this platform.
Any reply will be appreciated.
Regards,
JiaGang
07-31-2018 10:11 AM
Hello Jijiagang.
Thank you for contacting Intel Technical Support. As we understand it, you are requesting support for your Intel® SSD 760p Series. To begin diagnosis and the troubleshooting that could lead to a resolution, please reply to this post with the following important basic information:
07-31-2018 07:54 PM
Hi Josh,
Thanks for your reply.
Yes, it's the 128GB model of the SSD 760p series.
It's an embedded system; the SoC is designed by ourselves and has a PCIe controller. We run Linux 4.9.37 on this platform with these options selected:
<*> NVM Express block device
[*] SCSI emulation for NVMe device nodes
<*> NVMe Target support
<*> NVMe loopback device support
By adding debug code, we found that the driver does not read the expected value from one status field, which then causes the interrupt exception.
Below are the output logs for the Intel SSD and the KingBand SSD.
ERROR (Intel SSD):
# nvmeq->cqes[0].status = 0, phase = 1
# nvmeq->cqes[0].status = 0, phase = 1
# (le16_to_cpu(nvmeq->cqes[0].status) & 1) = 0
# (le16_to_cpu(nvmeq->cqes[0].status) & 1) = 0
------------------__nvme_process_cq,731--------------------
TRUE (KingBand SSD):
# nvmeq->cqes[0].status = 0, phase = 1
# nvmeq->cqes[0].status = 1, phase = 1
# (le16_to_cpu(nvmeq->cqes[0].status) & 1) = 1
# (le16_to_cpu(nvmeq->cqes[0].status) & 1) = 1
The position of the debug code (added to nvme_cqe_valid() in drivers/nvme/host/pci.c; the status field is printed twice, with nops in between, to check whether the value changes between reads):
/* We read the CQE phase first to check if the rest of the entry is valid */
static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
		u16 phase)
{
	printk("# nvmeq->cqes[%u].status = %u, phase = %u\n",
	       head, nvmeq->cqes[head].status, phase);
	asm("nop");
	printk("# nvmeq->cqes[%u].status = %u, phase = %u\n",
	       head, nvmeq->cqes[head].status, phase);
	asm("nop");
	asm("nop");
	printk("# (le16_to_cpu(nvmeq->cqes[%u].status) & 1) = %u\n",
	       head, le16_to_cpu(nvmeq->cqes[head].status) & 1);
	asm("nop");
	printk("# (le16_to_cpu(nvmeq->cqes[%u].status) & 1) = %u\n",
	       head, le16_to_cpu(nvmeq->cqes[head].status) & 1);
	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
}
The cqes field is declared as:
volatile struct nvme_completion *cqes;
From the log, reading nvmeq->cqes[head].status never returns 1 on our platform; this status field is the completion status written back by the SSD.
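For context, the check being probed above relies on the NVMe phase tag: the completion queue lives in host memory, and the controller DMA-writes each entry with bit 0 of the status field set to the current phase, which flips each time the queue wraps. A minimal self-contained sketch of the validity check (simplified from the driver; the names here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* One completion-queue entry, reduced to the field relevant here.
 * Bit 0 of 'status' is the phase tag toggled by the controller. */
struct cqe_sketch {
	uint16_t status;
};

/* An entry is new only when its phase tag matches the phase the host
 * currently expects. In the failing log above, status stays 0 while the
 * expected phase is 1, so no entry ever looks valid, the handler returns
 * without claiming the interrupt, and the IRQ is eventually disabled. */
static bool cqe_valid_sketch(const struct cqe_sketch *cqe, uint16_t phase)
{
	return (cqe->status & 1) == phase;
}
```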
On our platform, this error is reported every time the system boots up, but the KingBand SSD works well. So I think maybe Intel drives need some quirk handling, like this code in drivers/nvme/host/pci.c:
static const struct pci_device_id nvme_id_table[] = {
	{ PCI_VDEVICE(INTEL, 0x0953),
	  .driver_data = NVME_QUIRK_STRIPE_SIZE |
			 NVME_QUIRK_DISCARD_ZEROES, },
	...
};
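For illustration only, an entry for the 760p could be added to the same table. The device ID below is a placeholder (an assumption; the real ID should be taken from lspci -nn on the target), and the quirk flag shown is just an example of the existing NVME_QUIRK_* flags, not a confirmed fix:

```c
/* Hypothetical sketch: 0xffff is a placeholder, not the real 760p device
 * ID, and the quirk flag is illustrative rather than a known fix. */
{ PCI_VDEVICE(INTEL, 0xffff /* placeholder: real 760p device ID */),
	.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
```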
Please give us a hand, thanks.
Best regards,
jijiagang
08-01-2018 02:50 PM
Hello Jijiagang.
Thank you for your reply. As we understand, you are trying to use your Intel® SSD 760p Series 128GB, M.2 80mm, PCIe NVMe 3.1 x4, 3D2, TLC on an embedded system whose SoC was custom-designed by you and your team, running Linux 4.9.37. Please take the following information into consideration:
08-02-2018 04:20 AM
Hi Josh,
Thanks for your reply.
However, I still don't know how to resolve my problem.
We got the Linux kernel from the open-source community and ported it to our platform. The KingBand SSD works, but the 760p does not.
Could you please give us any advice, such as code to add or registers to check?
The website you showed me doesn't describe how to add a Linux driver to support this SSD.
I would like to get further help from you.
Best regards.
Jijiagang