Forum - NEC VE 10C stuck in state unavailable with oddly small PCIe Region 0 allocation and how to start NEC VE on fewer than 16 PCIe lanes

[#115]

Hello, I am a student and HPC hobbyist from Germany, and I have a Vector Engine Type 10C under my desk. I am trying to get the card working in a Fujitsu PRIMERGY TX140 S2, which has a Xeon E3-1230 v3 (Haswell) and two x8 PCIe 3.0 slots. This CPU is outside the range of processors supported by the Vector Engine 2.0 family. However, I found no such information for the 1.0 generation.

When I start up the machine, the card's fan runs a…e/pdfs/Investigation_guide_for_VE_System_Trouble_E.pdf|Vector Engine 2.0 Troubleshooting Guide]] leads me to a server error. This is not helpful in identifying or fixing the server error.

One interesting fact is that Region 0 of the card is nowhere near 64G; it is only 128M. The relevant lspci -vvv excerpt is at https://pastebin.com/5agyjmtv and I hope someone can give me a pointer on how to find out why Region 0 is not 64G as it should be. I suspect this is related to the root cause of the issue.
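For anyone who wants to compare region sizes without reading through the full lspci -vvv output: on Linux, the sizes can also be computed from the device's sysfs resource file, where each line holds the start address, end address, and flags of one region in hexadecimal. The following is a minimal Python sketch under that assumption. The function names and the device address in the usage note are hypothetical and not taken from the original post.

```python
from pathlib import Path


def bar_sizes(resource_text):
    """Return {region_index: size_in_bytes} for every populated region.

    Each line of the sysfs 'resource' file has three hex fields:
    start address, end address, flags. Unused regions are all zeros.
    """
    sizes = {}
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # region not assigned
        sizes[index] = end - start + 1  # end address is inclusive
    return sizes


def read_bar_sizes(bdf):
    """Read the region sizes of the PCI device at the given address.

    'bdf' is the domain:bus:device.function string, as shown by lspci -D.
    """
    path = Path("/sys/bus/pci/devices") / bdf / "resource"
    return bar_sizes(path.read_text())
```

Calling read_bar_sizes("0000:01:00.0") (a placeholder address, substitute the one lspci reports for the card) returns the size of each region in bytes, so a 128M Region 0 would show up as 134217728 under key 0.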

Posted by Johann-Tobias Schäg on 26 December 2022 at 17:10.
Edited by Johann-Tobias Schäg on 6 April 2023 at 23:27.

I have not been able to resolve this. However, the core problem was that the NEC Vector Engine refused to start outside of an x16 slot. I was able to trace this back to a check inside a binary that I did not have the source code for. Running /opt/nec/ve/bin/vecmd mconfig set linkdown_err 0x0101 as root fixed the issue, and the card came up fine. It is possible that a different change is needed for PCIe 2.0 slots.
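To check whether a card is actually running on fewer than 16 lanes before trying the vecmd setting above, the negotiated link width can be read from the Linux sysfs attributes current_link_width and max_link_width. This is a minimal Python sketch, not part of the NEC tooling. The helper names and the device address in the usage note are hypothetical, and the threshold of 16 lanes is taken from the description in this post.

```python
from pathlib import Path

# Lane count the card appears to expect by default, per the post above.
REQUIRED_WIDTH = 16


def parse_width(text):
    """Convert the content of a sysfs link width file (e.g. '8\\n') to int."""
    return int(text.strip())


def runs_below_x16(current_width):
    """True if the link is up but negotiated fewer than 16 lanes."""
    return 0 < current_width < REQUIRED_WIDTH


def read_link_widths(bdf):
    """Return (current, maximum) link width for the PCI device at 'bdf'.

    'bdf' is the domain:bus:device.function string, as shown by lspci -D.
    """
    dev = Path("/sys/bus/pci/devices") / bdf
    current = parse_width((dev / "current_link_width").read_text())
    maximum = parse_width((dev / "max_link_width").read_text())
    return current, maximum
```

For example, current, maximum = read_link_widths("0000:01:00.0") (placeholder address) followed by runs_below_x16(current) tells whether the slot is narrower than x16, which is the situation where the vecmd command from this post was needed.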

Posted by Johann-Tobias Schäg on 6 April 2023 at 23:26.