Hello,
I've been using EC2 FPGAs for about 15 months now. I've always used f1.2xlarge instances with Ubuntu installed, and they worked as expected. Now, because of the amount of CPU-intensive work I need to do, I've decided to try the larger f1.16xlarge. However, I ran into problems there. I went through all the usual steps: loaded the AGFI, checked via lspci that the device is available, and then ran some simple XDMA read/write tests to make sure the connection works. Sadly, I get no communication with the PCIe FPGA board.
Below is the dmesg output, which reports a "magic" error in the descriptor. Again, I'm using the same driver, the same AGFI, and the same Python wrappers around the C code that calls into the kernel driver.
[ 1683.589120] xdma:engine_service_final_transfer: engine 0-H2C0-MM, status error 0x80010.
[ 1683.589123] xdma:engine_status_dump: SG engine 0-H2C0-MM status: 0x00080010: MAGIC_STOPPED,DESC_ERR:UNSUPP_REQ
[ 1683.589126] 0-H2C0-MM, s 0x80010, aborted xfer 0x00000000e19d64e9, cmpl 0/1
[ 1683.589136] xdma:xdma_xfer_submit: xfer 0x00000000e19d64e9,1024, failed, ep 0x0.
EDIT: I've figured out the problem. After looking more closely at the dmesg output, I found that the AGFI was loaded on a different FPGA slot. I loaded the AGFI as I always do: sudo fpga-load-local-image -S 0 -I $MY_AGFI_ID -H. I don't see how it could end up on slot 8. When I adjust my test to run on slot 8, everything works as expected!
To be honest, dmesg shows this pretty clearly:
[ 183.021774] xdma:remove_one: pdev 0x00000000999de0ae, xdev 0x0000000076ff6236, 0x00000000ca7f10f1.
[ 183.021777] xdma:xpdev_free: xpdev 0x0000000076ff6236, destroy_interfaces, xdev 0x00000000ca7f10f1.
[ 183.024133] xdma:xpdev_free: xpdev 0x0000000076ff6236, xdev 0x00000000ca7f10f1 xdma_device_close.
[ 186.066065] pci 0000:00:0f.0: [1d0f:f000] type 00 class 0x058000
[ 186.066817] pci 0000:00:0f.0: reg 0x10: [mem 0x86000000-0x87ffffff]
[ 186.067206] pci 0000:00:0f.0: reg 0x14: [mem 0x85200000-0x853fffff]
[ 186.067855] pci 0000:00:0f.0: reg 0x18: [mem 0x5e000410000-0x5e00041ffff 64bit pref]
[ 186.068493] pci 0000:00:0f.0: reg 0x20: [mem 0x5c000000000-0x5dfffffffff 64bit pref]
[ 186.083784] pci 0000:00:0f.0: BAR 4: assigned [mem 0x5c000000000-0x5dfffffffff 64bit pref]
[ 186.084214] pci 0000:00:0f.0: BAR 0: assigned [mem 0x86000000-0x87ffffff]
[ 186.084317] pci 0000:00:0f.0: BAR 1: assigned [mem 0x85200000-0x853fffff]
[ 186.084421] pci 0000:00:0f.0: BAR 2: assigned [mem 0x5e000410000-0x5e00041ffff 64bit pref]
[ 186.084996] xdma:xdma_device_open: xdma device 0000:00:0f.0, 0x000000006c4610d7.
[ 186.086074] xdma:map_single_bar: BAR0 at 0x86000000 mapped at 0x00000000f2cc5fa3, length=33554432(/33554432)
[ 186.086088] xdma:map_single_bar: BAR1 at 0x85200000 mapped at 0x00000000caa22a31, length=2097152(/2097152)
[ 186.086106] xdma:map_single_bar: BAR2 at 0x5e000410000 mapped at 0x000000003049f8eb, length=65536(/65536)
[ 186.086109] xdma:map_bars: config bar 2, pos 2.
[ 186.086110] xdma:map_single_bar: Limit BAR 4 mapping from 137438953472 to 2147483647 bytes
[ 186.086115] xdma:map_single_bar: BAR4 at 0x5c000000000 mapped at 0x00000000bfd17001, length=2147483647(/137438953472)
[ 186.086116] xdma:identify_bars: 4 BARs: config 2, user 0, bypass 4.
[ 186.095983] xdma:pci_keep_intx_enabled: 0000:00:0f.0: clear INTX_DISABLE, 0x406 -> 0x6.
[ 186.096158] xdma:irq_msix_channel_setup: engine 8-H2C0-MM, irq#572.
[ 186.096193] xdma:irq_msix_channel_setup: engine 8-H2C1-MM, irq#573.
[ 186.096225] xdma:irq_msix_channel_setup: engine 8-H2C2-MM, irq#574.
[ 186.096270] xdma:irq_msix_channel_setup: engine 8-H2C3-MM, irq#575.
[ 186.096301] xdma:irq_msix_channel_setup: engine 8-C2H0-MM, irq#576.
[ 186.096334] xdma:irq_msix_channel_setup: engine 8-C2H1-MM, irq#577.
[ 186.096366] xdma:irq_msix_channel_setup: engine 8-C2H2-MM, irq#578.
[ 186.096397] xdma:irq_msix_channel_setup: engine 8-C2H3-MM, irq#579.
[ 186.096431] xdma:irq_msix_user_setup: 8-USR-0, IRQ#580 with 0x000000000932b671
[ 186.096463] xdma:irq_msix_user_setup: 8-USR-1, IRQ#581 with 0x000000005edcc121
[ 186.096511] xdma:irq_msix_user_setup: 8-USR-2, IRQ#582 with 0x00000000249674d9
[ 186.096560] xdma:irq_msix_user_setup: 8-USR-3, IRQ#583 with 0x00000000d26d07c5
[ 186.096594] xdma:irq_msix_user_setup: 8-USR-4, IRQ#584 with 0x00000000c940ac79
[ 186.096627] xdma:irq_msix_user_setup: 8-USR-5, IRQ#585 with 0x000000001fccab2f
[ 186.096666] xdma:irq_msix_user_setup: 8-USR-6, IRQ#586 with 0x0000000009c457eb
[ 186.096699] xdma:irq_msix_user_setup: 8-USR-7, IRQ#587 with 0x000000002bedefd1
[ 186.096732] xdma:irq_msix_user_setup: 8-USR-8, IRQ#588 with 0x000000004ca712de
[ 186.096765] xdma:irq_msix_user_setup: 8-USR-9, IRQ#589 with 0x00000000e191ad7b
[ 186.096799] xdma:irq_msix_user_setup: 8-USR-10, IRQ#590 with 0x00000000026a9f8b
[ 186.096833] xdma:irq_msix_user_setup: 8-USR-11, IRQ#591 with 0x00000000a7138ee8
[ 186.096868] xdma:irq_msix_user_setup: 8-USR-12, IRQ#592 with 0x00000000b0c4b138
[ 186.096902] xdma:irq_msix_user_setup: 8-USR-13, IRQ#593 with 0x000000007f7aa664
[ 186.096934] xdma:irq_msix_user_setup: 8-USR-14, IRQ#594 with 0x0000000070f6c0f6
[ 186.096970] xdma:irq_msix_user_setup: 8-USR-15, IRQ#595 with 0x000000009aed6be9
[ 186.096978] xdma:probe_one: 0000:00:0f.0 xdma8, pdev 0x000000006c4610d7, xdev 0x0000000044888d47, 0x0000000058d876e9, usr 16, ch 4,4.
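The probe_one line at the end of this log is what pairs the PCI address (0000:00:0f.0) with the xdma device index (xdma8). As a guard, a test could look that index up from dmesg instead of assuming 0. This is only a minimal sketch: the function name and the regular expression are my own, and it assumes the driver keeps printing probe_one lines in the format shown above.

```python
import re

# Matches driver lines such as:
#   xdma:probe_one: 0000:00:0f.0 xdma8, pdev 0x..., xdev 0x..., ...
# The pattern is an assumption based on the log format pasted in this post.
PROBE_RE = re.compile(
    r"xdma:probe_one:\s+"
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"xdma(?P<idx>\d+),"
)


def xdma_index_by_pci_address(dmesg_text):
    """Return {PCI address: xdma index} from probe_one lines in dmesg text.

    If a device was probed more than once, the latest line wins.
    """
    mapping = {}
    for line in dmesg_text.splitlines():
        match = PROBE_RE.search(line)
        if match:
            mapping[match.group("bdf")] = int(match.group("idx"))
    return mapping


# Example use (reading dmesg may require root on some systems):
#   import subprocess
#   text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   print(xdma_index_by_pci_address(text))
```

With something like this, the test can fail early with a clear message when the expected PCI address is missing or maps to an index other than the one it is about to open.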
Is this a bug of some kind? How can I protect myself against it?
Thank you in advance.