Vitis Error code -44 src/host.cpp:86 Error calling cl::Program program...


I am trying to follow the Vitis Hello World example, and it fails at step 3, "Run the FPGA-accelerated application on Amazon FPGA instances", from https://github.com/aws/aws-fpga/blob/master/Vitis/README.md#3-run-the-fpga-accelerated-application-on-amazon-fpga-instances

[centos@ip-172-31-28-216 ~]$   ./host vadd.awsxclbin 

I tried the recommendations from https://github.com/cdr/code-server/issues/347 and got past that error by installing Anaconda and copying over the needed library!

I then reran the source command and then executed the following:

[centos@ip-172-31-28-216 ~]$ ./host vadd.awsxclbin 
Found Platform
Platform Name: Xilinx
INFO: Reading vadd.awsxclbin
Loading: 'vadd.awsxclbin'
Trying to program device[0]: xilinx_aws-vu9p-f1_dynamic_5_0
XRT build version: 2.3.0
Build hash: 9e13d57c4563e2c19bf5f518993f6e5a8dadc18a
Build date: 2020-02-06 15:08:44
Git branch: 2019.2
PID: 24766
UID: 1000
[Sat Jul 11 18:03:18 2020]
HOST: ip-172-31-28-216.us-west-2.compute.internal
EXE: /home/centos/host
[XRT] ERROR: See dmesg log for details. err=-5
[XRT] ERROR: Failed to load xclbin.
src/host.cpp:86 Error calling cl::Program program(context, {device}, bins, NULL, &err), error code is: -44

And here is the dmesg:

[ 1873.568692] xocl 0000:00:1d.0: xocl_axlf_section_header: could not find section header 20
[ 1873.576292] [drm] Finding MEM_TOPOLOGY section header
[ 1873.580670] [drm] Section MEM_TOPOLOGY details:
[ 1873.585269] [drm]   offset = 0x2f8
[ 1873.587052] [drm]   size = 0x120
[ 1873.590570] icap.u icap.u.15728640: get_axlf_section_hdr: could not find section header 20
...
icap.u icap.u.15728640: icap_download_bitstream_axlf: incoming xclbin: ba194825-4cbf-497b-92d6-9eea49579b1b
on device xclbin: 00000000-0000-0000-0000-000000000000
[ 3191.934552] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 8 via SW
[ 3191.939632] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 3192.701050] icap.u icap.u.15728640: __icap_peer_xclbin_download: peer xclbin download err: -5
[ 3192.706628] icap.u icap.u.15728640: get_axlf_section_hdr: section 8 offset: 1048, size: 88
[ 3192.712188] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 8, err: 0
[ 3192.719475] icap.u icap.u.15728640: get_axlf_section_hdr: section 6 offset: 760, size: 288
[ 3192.724644] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 6, err: 0
[ 3192.731618] icap.u icap.u.15728640: get_axlf_section_hdr: section 7 offset: 1136, size: 40
[ 3192.737304] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 7, err: 0
[ 3192.744339] icap.u icap.u.15728640: get_axlf_section_hdr: could not find section header 9
[ 3192.749656] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 9, err: -22
[ 3192.756843] icap.u icap.u.15728640: get_axlf_section_hdr: section 11 offset: 1176, size: 682
[ 3192.762185] icap.u icap.u.15728640: get_axlf_section_hdr: section 8 offset: 1048, size: 88
[ 3192.767350] icap.u icap.u.15728640: get_axlf_section_hdr: section 6 offset: 760, size: 288
[ 3192.772860] icap.u icap.u.15728640: icap_download_bitstream_axlf: icap_download_bitstream_axlf err: -5
[ 3192.778906] xocl 0000:00:1d.0: exec_reset: exec_reset(1) cfg(0)
[ 3192.782742] xocl 0000:00:1d.0: exec_reset: exec_reset resets
[ 3192.786417] xocl 0000:00:1d.0: exec_reset: exec->xclbin(00000000-0000-0000-0000-000000000000),xclbin(00000000-0000-0000-0000-000000000000)
[ 3192.794364] xocl 0000:00:1d.0: xocl_read_axlf_helper: Failed to download xclbin, err: -5
[ 3192.807666] [drm] client exits pid(24766)
[ 3192.810495] xocl 0000:00:1d.0: xocl_drvinst_close: CLOSE 2
[ 3192.813938] xocl 0000:00:1d.0: xocl_drvinst_close: NOTIFY ffff92c467415010


asked 4 years ago · 372 views
15 Answers
Accepted Answer

You're getting error -5 because the AFI specified in the .awsxclbin file can't be loaded. That could be because the AFI was created in a different region than us-west-2, or because it was created in a different account. In your case we verified that the AFI is available in us-west-2.

Can you try running this command to see if you can load the default AFI:

sudo fpga-load-local-image -S0 -I agfi-069ddd533a748059b

Another thing to try is loading the AFI from your helloworld application manually:

sudo fpga-load-local-image -S0 -I agfi-0387575535db6c0eb

Verify that this is the same AGFI as the one in your awsxclbin file:

strings vadd.awsxclbin | grep agfi

# Output should be agfi-0387575535db6c0eb
# if it is different, then you need to make sure the AFI is loadable by your account in us-west-2
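The check above can be scripted. This is only a sketch: it assumes the AGFI appears as a plain printable string inside the .awsxclbin container (which is what strings | grep relies on); the file name and expected ID are the ones from this thread, and the helper name is made up:

```shell
#!/bin/sh
# Hypothetical helper: extract the AGFI string embedded in an .awsxclbin
# and compare it with the AFI you expect to be loadable in your region.
extract_agfi() {
    # The AGFI is stored as a printable string inside the binary container,
    # so a binary-safe grep is enough to pull it out.
    grep -ao 'agfi-[0-9a-f]*' "$1" | head -n 1
}

expected="agfi-0387575535db6c0eb"
if [ -f vadd.awsxclbin ]; then
    actual=$(extract_agfi vadd.awsxclbin)
    if [ "$actual" = "$expected" ]; then
        echo "AGFI matches: $actual"
    else
        echo "mismatch: file contains '$actual', expected '$expected'" >&2
    fi
fi
```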

Here is what I tried just now:

I created a helloworld_ocl AFI in us-east-1 and tried to run the application in us-west-2:

us-east-1 AFI:

{
    "FpgaImageId": "afi-05fff64a3b07c697b", 
    "FpgaImageGlobalId": "agfi-0c550c22ccf7daeb4"
}

The AGFI was recorded in the awsxclbin file:

[centos@ip-172-31-19-206 cl_helloworld]$ strings xclbin/vector_addition.awsxclbin | grep agfi
agfi-0c550c22ccf7daeb4

On running the same example in us-west-2, I got the same error as you:

Jul 16 16:04:35 ip-172-31-19-206.us-west-2.compute.internal mpd[1950]: Failed to load AFI, error: 5
Jul 16 16:04:35 ip-172-31-19-206.us-west-2.compute.internal mpd[1950]: [0:0:1d.0] mpd daemon: response 8 sent ret = -5
Jul 16 16:04:35 ip-172-31-19-206.us-west-2.compute.internal mpd[1950]: [0:0:1d.0] write 36 bytes out of 36 bytes to fd 5

I then copied my AFI to us-west-2:

aws ec2 copy-fpga-image --name copy-afi --source-fpga-image-id afi-05fff64a3b07c697b --source-region us-east-1 --region us-west-2
{
    "FpgaImageId": "afi-0fb9d9759904c0098"
}
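One pitfall after copy-fpga-image: the copied AFI is not usable until its state becomes "available". A hypothetical polling helper (a sketch, not a supported tool; the AFI ID in the comment is the copied one above, and running it for real requires AWS credentials):

```shell
#!/bin/sh
# Poll describe-fpga-images until the copied AFI reports "available".
wait_for_afi() {
    afi_id=$1
    region=$2
    attempts=0
    while [ $attempts -lt 30 ]; do
        state=$(aws ec2 describe-fpga-images \
            --fpga-image-ids "$afi_id" --region "$region" \
            --query 'FpgaImages[0].State.Code' --output text)
        if [ "$state" = "available" ]; then
            echo "AFI $afi_id is available in $region"
            return 0
        fi
        echo "state=$state; retrying in 30s"
        sleep 30
        attempts=$((attempts + 1))
    done
    echo "timed out waiting for $afi_id" >&2
    return 1
}

# Example: wait_for_afi afi-0fb9d9759904c0098 us-west-2
```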

Reran the same example:

./helloworld xclbin/vector_addition.awsxclbin
Found Platform
Platform Name: Xilinx
INFO: Reading xclbin/vector_addition.awsxclbin
Loading: 'xclbin/vector_addition.awsxclbin'
Trying to program device[0]: xilinx_aws-vu9p-f1_dynamic_5_0
Device[0]: program successful!
Result =
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42
... (62 more identical rows of 42s elided) ...
42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42
TEST PASSED

I hope this helps.

-Deep

Deep_P
answered 4 years ago

Hi,

In this case, was the AFI generated and available in the region? The error might indicate it was unable to load the AFI.

-Deep

Deep_P
answered 4 years ago

Hi Deep,

I ran the sample build in the AWS Vitis guide with aws configure set for my region. Is it possible the Vitis example uses a region different than mine, and if so, how would I check that?

Thanks!

answered 4 years ago

You could check the AFI by calling:

aws ec2 describe-fpga-images --fpga-image-ids <AFI ID> --region <your f1 instance region>

If the AFI does not show up in the response, that would explain the error. The AWS CLI needs to be set up for the region you are running the F1 instance in. To check your region:

aws configure get default.region
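The two checks can be combined into one script that compares the CLI's configured region with the region the instance is actually in. This is an illustration, not a supported tool; it assumes IMDSv1-style instance metadata is reachable at 169.254.169.254:

```shell
#!/bin/sh
# Compare the AWS CLI default region with the region reported by the
# instance metadata service. A mismatch would explain a missing AFI.
check_regions() {
    cli_region=$(aws configure get default.region)
    f1_region=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
    if [ "$cli_region" = "$f1_region" ]; then
        echo "regions match: $cli_region"
    else
        echo "mismatch: CLI=$cli_region, instance=$f1_region" >&2
        return 1
    fi
}

# check_regions   # run on the F1 instance itself
```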

Please let us know if this doesn't help.

-Deep

Deep_P
answered 4 years ago

It's showing as all good and in the correct region:

[centos@ip-172-31-28-216 Vitis]$ aws ec2 describe-fpga-images --fpga-image-ids afi-0c898069689a52133 --region us-west-2
{
    "FpgaImages": [
        {
            "UpdateTime": "2020-07-13T19:47:16.000Z", 
            "Name": "vadd", 
            "Tags": [], 
            "PciId": {
                "SubsystemVendorId": "0xfedd", 
                "VendorId": "0x1d0f", 
                "DeviceId": "0xf010", 
                "SubsystemId": "0x1d51"
            }, 
            "FpgaImageGlobalId": "agfi-0387575535db6c0eb", 
            "Public": false, 
            "State": {
                "Code": "available"
            }, 
            "ShellVersion": "0x04261818", 
            "OwnerId": "491827336117", 
            "FpgaImageId": "afi-0c898069689a52133", 
            "CreateTime": "2020-07-13T19:10:22.000Z", 
            "Description": "vadd"
        }
    ]
}
answered 4 years ago

In this case,

Could you share the output of:

sudo systemctl status mpd
sudo systemctl restart mpd
sudo systemctl status mpd

-Deep

Deep_P
answered 4 years ago

Sure!

[centos@ip-172-31-28-216 aws-fpga]$ sudo systemctl status mpd
● mpd.service - Xilinx Management Proxy Daemon (MPD)
   Loaded: loaded (/etc/systemd/system/mpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-07-14 21:19:01 UTC; 4s ago
 Main PID: 3958 (mpd)
    Tasks: 1
   CGroup: /system.slice/mpd.service
           └─3958 /opt/xilinx/xrt/bin/mpd

Jul 14 21:19:01 ip-172-31-28-216.us-west-2.compute.internal systemd[1]: Started Xilinx Management P....
Jul 14 21:19:01 ip-172-31-28-216.us-west-2.compute.internal mpd[3958]: started
Jul 14 21:19:01 ip-172-31-28-216.us-west-2.compute.internal mpd[3958]: found mpd plugin: /opt/xilin...o
Jul 14 21:19:01 ip-172-31-28-216.us-west-2.compute.internal mpd[3958]: aws: load default afi to 000...0
Hint: Some lines were ellipsized, use -l to show in full.

Concurrently, I've also been desperately trying to get a simple Vitis getting_started example working on a T2 with the FPGA AMI setup.

It's confusing that the examples (besides vadd, which we have been discussing so far with no luck) are in Xilinx's GitHub and don't seem to be aimed at AWS at all. I followed the steps here:

https://github.com/Xilinx/Vitis-Tutorials/blob/master/docs/mixing-c-rtl-kernels/README.md

But I can't get the "xilinx_u200_xdma_201830_2" platform running. I followed these steps on the Xilinx website:

https://www.xilinx.com/products/boards-and-kits/alveo/package-files-archive/u200-2018-3-2.html

And I am unable to run yum install successfully:

Error: Package: xilinx-u200-xdma-dev-201830.2-2580015.x86_64 (/xilinx-u200-xdma-dev-201830.2-2580015.x86_64)
           Requires: xilinx-u200-xdma >= 201830.2

Even though I followed the step before and installed the latest XRT successfully:

  Installing : xrt-2.3.1301-1.x86_64   
Complete!

Maybe these getting_started examples just aren't for AWS? I also considered changing the target platform to the one found in /opt/Xilinx/Vitis/2019.2/platforms, xcvc1902_fixed. Is this the standard platform for working with Amazon's F1 UltraScale+ xcvu9p FPGAs?

I don't know how to go about changing the target platform in the Vitis getting-started example, though (the mixing-RTL-kernels one mentioned above).

answered 4 years ago

Hi,

Our GitHub setup scripts link to Vitis examples in the Vitis/examples directory that are tested to run on the AWS F1 platform.

We will remove the Vitis tutorial link that points users to the Xilinx tutorials aimed at their Alveo platform. The Xilinx U200 example you listed will not work out of the box on F1. We will also take that feedback and see if we can come up with a guide to lift and shift examples from Alveo cards to F1.

Coming back to the original vadd example.

Is there any output from dmesg and sudo journalctl -u mpd that you can share? The error -44 relates to the default AFI not being loaded on mpd start. A systemctl restart mpd should have retried and loaded a default AFI for the xocl driver to bind to the PCI device.
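To narrow this down, here is a small sketch that scans the mpd journal for AFI-load failures after a restart. The helper name is made up for illustration; a non-zero count points at the AFI-load path rather than the host code:

```shell
#!/bin/sh
# Hypothetical helper: count default-AFI load failures in mpd log text.
count_afi_failures() {
    # reads journal text on stdin; || true keeps exit 0 when count is 0
    grep -c 'Failed to load AFI' || true
}

# Typical use on the F1 instance:
# sudo systemctl restart mpd
# sudo journalctl -u mpd --no-pager | count_afi_failures
```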

Please let me know if you can share the output here so we can debug further.

-Deep

Deep_P
answered 4 years ago

Another thing to verify: are you running these commands on an F1 instance? A T2 instance does not have FPGAs, so you won't be able to run the application on it.

-Deep

Deep_P
answered 4 years ago

You mean running the original vadd example? That one I am.
Or rather the Xilinx Vitis RTL kernel example? That's being done on a T2.

End of dmesg; let me know if more would be helpful!

[ 1198.611367] mailbox.u mailbox.u.13631488: mailbox_probe: successfully initialized
[ 1198.615023] xocl 0000:00:1d.0: __xocl_subdev_create: Created subdev mailbox inst 13631488
[ 1198.618959] xocl 0000:00:1d.0: __xocl_subdev_create: subdev mailbox.u inst 13631488 is active
[ 1198.622949] xocl 0000:00:1d.0: __xocl_subdev_create: creating subdev icap.u
[ 1198.626186] icap.u icap.u.15728640: icap_probe: successfully initialized FPGA IDCODE 0x0
[ 1198.629901] xocl 0000:00:1d.0: __xocl_subdev_create: Created subdev icap inst 15728640
[ 1198.633610] xocl 0000:00:1d.0: __xocl_subdev_create: subdev icap.u inst 15728640 is active
[ 1198.637313] xocl 0000:00:1d.0: xocl_p2p_mem_reserve: reserve p2p mem, bar 4, len 137438953472
[ 1200.232059] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 15 via SW
[ 1202.255455] mailbox.u mailbox.u.13631488: timeout_msg: found outstanding msg time'd out
[ 1202.263471] mailbox.u mailbox.u.13631488: timeout_msg: peer becomes dead
[ 1202.271689] xocl 0000:00:1d.0: xocl_mb_read_p2p_addr: dropped request (15), failed with err: -62
[ 1202.281008] [drm] Initialized xocl 2.3.0 20200206 for 0000:00:1d.0 on minor 1
[ 1202.290153] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 11 via SW
[ 1202.300274] xocl 0000:00:1d.0: xocl_mb_connect: ch_state 0x0, ret -107
[ 1202.309467] xocl 0000:00:1d.0: xocl_refresh_subdevs: get fdt from peer
[ 1202.316167] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 10 via SW
[ 1202.422765] mailbox.u mailbox.u.13631488: _xocl_drvinst_open: OPEN 1
[ 1202.428772] mailbox.u mailbox.u.13631488: dequeue_rx_msg: peer becomes active
[ 1202.435588] mailbox.u mailbox.u.13631488: process_request: received request from peer: 12, passed on
[ 1202.443579] xocl 0000:00:1d.0: xocl_mailbox_srv: received request (12) from peer
[ 1202.450381] xocl 0000:00:1d.0: xocl_mailbox_srv: mgmt driver online
[ 1202.456142] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 11 via SW
[ 1202.463076] mailbox.u mailbox.u.13631488: xocl_drvinst_close: CLOSE 2
[ 1202.468804] mailbox.u mailbox.u.13631488: xocl_drvinst_close: NOTIFY ffff8ce0389ee010
[ 1204.482437] mailbox.u mailbox.u.13631488: timeout_msg: found outstanding msg time'd out
[ 1204.489572] mailbox.u mailbox.u.13631488: timeout_msg: peer becomes dead
[ 1204.496197] xocl 0000:00:1d.0: xocl_mb_connect: ch_state 0x0, ret -62
[ 1204.502142] mailbox.u mailbox.u.13631488: process_request: received request from peer: 12, passed on
[ 1204.511058] xocl 0000:00:1d.0: xocl_mailbox_srv: received request (12) from peer
[ 1204.511067] xocl 0000:00:1d.0: xocl_refresh_subdevs: get fdt from peer
[ 1204.511512] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 10 via SW
[ 1204.531197] xocl 0000:00:1d.0: xocl_mailbox_srv: mgmt driver offline
[ 1205.423167] mailbox.u mailbox.u.13631488: _xocl_drvinst_open: OPEN 1
[ 1205.429378] mailbox.u mailbox.u.13631488: dequeue_rx_msg: peer becomes active
[ 1205.435456] mailbox.u mailbox.u.13631488: process_request: received request from peer: 12, passed on
[ 1205.443125] xocl 0000:00:1d.0: xocl_mailbox_srv: received request (12) from peer
[ 1205.449799] xocl 0000:00:1d.0: xocl_mailbox_srv: mgmt driver online
[ 1205.455206] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 11 via SW
[ 1205.462065] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 1205.469928] xocl 0000:00:1d.0: xocl_mb_connect: ch_state 0x1, ret 0
[ 1205.475515] xocl 0000:00:1d.0: xocl_refresh_subdevs: get fdt from peer
[ 1205.481311] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 10 via SW
[ 1205.488025] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 1247.720205] xocl 0000:00:1d.0: _xocl_drvinst_open: OPEN 1
[ 1247.724226] [drm] creating scheduler client for pid(3617), ret: 0
[ 1247.739764] icap.u icap.u.15728640: icap_read_from_peer: reading from peer
[ 1247.744720] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 10 via SW
[ 1247.750278] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 1247.786006] xocl 0000:00:1d.0: xocl_axlf_section_header: trying to find section header for axlf section 20
[ 1247.792887] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 0
[ 1247.798070] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 6
[ 1247.803359] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 8
[ 1247.808584] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 7
[ 1247.813702] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 11
[ 1247.818884] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 14
[ 1247.824395] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 2
[ 1247.830187] xocl 0000:00:1d.0: xocl_axlf_section_header: could not find section header 20
[ 1247.836561] [drm] Finding MEM_TOPOLOGY section header
[ 1247.839855] [drm] Section MEM_TOPOLOGY details:
[ 1247.842977] [drm]   offset = 0x2f8
[ 1247.845751] [drm]   size = 0x120
[ 1247.848317] icap.u icap.u.15728640: get_axlf_section_hdr: could not find section header 20
[ 1247.854924] icap.u icap.u.15728640: icap_download_bitstream_axlf: incoming xclbin: ba194825-4cbf-497b-92d6-9eea49579b1b
on device xclbin: 00000000-0000-0000-0000-000000000000
[ 1247.867243] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 8 via SW
[ 1247.874055] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 1248.660322] icap.u icap.u.15728640: __icap_peer_xclbin_download: peer xclbin download err: -5
[ 1248.666412] icap.u icap.u.15728640: get_axlf_section_hdr: section 8 offset: 1048, size: 88
[ 1248.672341] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 8, err: 0
[ 1248.679947] icap.u icap.u.15728640: get_axlf_section_hdr: section 6 offset: 760, size: 288
[ 1248.685757] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 6, err: 0
[ 1248.693088] icap.u icap.u.15728640: get_axlf_section_hdr: section 7 offset: 1136, size: 40
[ 1248.698965] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 7, err: 0
[ 1248.707267] icap.u icap.u.15728640: get_axlf_section_hdr: could not find section header 9
[ 1248.713384] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 9, err: -22
[ 1248.721285] icap.u icap.u.15728640: get_axlf_section_hdr: section 11 offset: 1176, size: 682
[ 1248.727595] icap.u icap.u.15728640: get_axlf_section_hdr: section 8 offset: 1048, size: 88
[ 1248.733766] icap.u icap.u.15728640: get_axlf_section_hdr: section 6 offset: 760, size: 288
[ 1248.740489] icap.u icap.u.15728640: icap_download_bitstream_axlf: icap_download_bitstream_axlf err: -5
[ 1248.747067] xocl 0000:00:1d.0: exec_reset: exec_reset(1) cfg(0)
[ 1248.751251] xocl 0000:00:1d.0: exec_reset: exec_reset resets
[ 1248.755260] xocl 0000:00:1d.0: exec_reset: exec->xclbin(00000000-0000-0000-0000-000000000000),xclbin(00000000-0000-0000-0000-000000000000)
[ 1248.764029] xocl 0000:00:1d.0: xocl_read_axlf_helper: Failed to download xclbin, err: -5
[ 1248.777597] [drm] client exits pid(3617)
[ 1248.780579] xocl 0000:00:1d.0: xocl_drvinst_close: CLOSE 2
[ 1248.784559] xocl 0000:00:1d.0: xocl_drvinst_close: NOTIFY ffff8ce86da1ec10
[ 1291.910334] xocl 0000:00:1d.0: _xocl_drvinst_open: OPEN 1
[ 1291.915017] [drm] creating scheduler client for pid(3660), ret: 0
[ 1291.931838] icap.u icap.u.15728640: icap_read_from_peer: reading from peer
[ 1291.937376] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 10 via SW
[ 1291.943854] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 1291.984157] xocl 0000:00:1d.0: xocl_axlf_section_header: trying to find section header for axlf section 20
[ 1291.991457] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 0
[ 1291.996906] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 6
[ 1292.002201] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 8
[ 1292.007399] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 7
[ 1292.012829] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 11
[ 1292.018248] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 14
[ 1292.023815] xocl 0000:00:1d.0: xocl_axlf_section_header: saw section header: 2
[ 1292.028894] xocl 0000:00:1d.0: xocl_axlf_section_header: could not find section header 20
[ 1292.034763] [drm] Finding MEM_TOPOLOGY section header
[ 1292.038237] [drm] Section MEM_TOPOLOGY details:
[ 1292.041520] [drm]   offset = 0x2f8
[ 1292.042946] [drm]   size = 0x120
[ 1292.045411] icap.u icap.u.15728640: get_axlf_section_hdr: could not find section header 20
[ 1292.051343] icap.u icap.u.15728640: icap_download_bitstream_axlf: incoming xclbin: ba194825-4cbf-497b-92d6-9eea49579b1b
on device xclbin: 00000000-0000-0000-0000-000000000000
[ 1292.063285] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 8 via SW
[ 1292.068879] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 1293.109488] icap.u icap.u.15728640: __icap_peer_xclbin_download: peer xclbin download err: -5
[ 1293.115704] icap.u icap.u.15728640: get_axlf_section_hdr: section 8 offset: 1048, size: 88
[ 1293.121865] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 8, err: 0
[ 1293.129320] icap.u icap.u.15728640: get_axlf_section_hdr: section 6 offset: 760, size: 288
[ 1293.135307] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 6, err: 0
[ 1293.142882] icap.u icap.u.15728640: get_axlf_section_hdr: section 7 offset: 1136, size: 40
[ 1293.148960] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 7, err: 0
[ 1293.156819] icap.u icap.u.15728640: get_axlf_section_hdr: could not find section header 9
[ 1293.162599] icap.u icap.u.15728640: icap_parse_bitstream_axlf_section: icap_parse_bitstream_axlf_section kind 9, err: -22
[ 1293.170543] icap.u icap.u.15728640: get_axlf_section_hdr: section 11 offset: 1176, size: 682
[ 1293.176570] icap.u icap.u.15728640: get_axlf_section_hdr: section 8 offset: 1048, size: 88
[ 1293.183253] icap.u icap.u.15728640: get_axlf_section_hdr: section 6 offset: 760, size: 288
[ 1293.190034] icap.u icap.u.15728640: icap_download_bitstream_axlf: icap_download_bitstream_axlf err: -5
[ 1293.197932] xocl 0000:00:1d.0: exec_reset: exec_reset(1) cfg(0)
[ 1293.203029] xocl 0000:00:1d.0: exec_reset: exec_reset resets
[ 1293.207974] xocl 0000:00:1d.0: exec_reset: exec->xclbin(00000000-0000-0000-0000-000000000000),xclbin(00000000-0000-0000-0000-000000000000)
[ 1293.217819] icap.u icap.u.15728640: icap_read_from_peer: reading from peer
[ 1293.222881] mailbox.u mailbox.u.13631488: mailbox_request: sending request: 10 via SW
[ 1293.228879] mailbox.u mailbox.u.13631488: mailbox_read: Software TX msg is too big
[ 1293.251554] xocl 0000:00:1d.0: xocl_read_axlf_helper: Failed to download xclbin, err: -5
[ 1293.266506] [drm] client exits pid(3660)
[ 1293.269365] xocl 0000:00:1d.0: xocl_drvinst_close: CLOSE 2
[ 1293.273385] xocl 0000:00:1d.0: xocl_drvinst_close: NOTIFY ffff8ce86da1ec10

Here's the output of journalctl:

[centos@ip-172-31-28-216 ~]$ sudo journalctl -u mpd 
-- Logs begin at Wed 2020-07-15 21:38:55 UTC, end at Wed 2020-07-15 22:01:46 UTC. --
Jul 15 21:58:17 ip-172-31-28-216.us-west-2.compute.internal systemd[1]: Started Xilinx Management Proxy
Jul 15 21:58:17 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: started
Jul 15 21:58:17 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: found mpd plugin: /opt/xilinx/xr
Jul 15 21:58:17 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: aws: load default afi to 0000:00
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: aws mpd plugin init called: 0
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: create thread pair for 0000:00:1
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: 1 pairs of threads running...
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 56 bytes out of
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 56 bytes out of
Jul 15 21:58:25 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd_getMsg thread for
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd_handleMsg thread 
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev: remove /devices/pci0000:00
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev: add /devices/pci0000:00/00
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: create thread pair for 0000:00:1
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: 1 pairs of threads running...
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: udev msg arrived on fd 4
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 56 bytes out of
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] retrieved msg size fr
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] read 72 bytes out of 
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AwsDev: 0000:00:1d.0(index: 0)
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: request 1
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 2104 bytes out 
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] retrieved msg size fr
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] read 80 bytes out of 
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: request 1
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AwsDev: 0000:00:1d.0(index: 0)
Jul 15 21:58:28 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 524360 bytes ou
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] retrieved msg size fr
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] read 80 bytes out of 
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AwsDev: 0000:00:1d.0(index: 0)
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: request 1
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 128 bytes out o
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] retrieved msg size fr
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] read 13011 bytes out 
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AwsDev: 0000:00:1d.0(index: 0)
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: request 8
Jul 15 21:59:10 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AFI not yet loaded, proceed to d
Jul 15 21:59:11 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: Failed to load AFI, error: 5
Jul 15 21:59:11 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: response 
Jul 15 21:59:11 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 36 bytes out of
Jul 15 21:59:54 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] retrieved msg size fr
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] read 80 bytes out of 
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AwsDev: 0000:00:1d.0(index: 0)
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: request 1
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 128 bytes out o
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] retrieved msg size fr
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] read 13011 bytes out 
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AwsDev: 0000:00:1d.0(index: 0)
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: request 8
Jul 15 21:59:55 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AFI not yet loaded, proceed to d
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: Failed to load AFI, error: 5
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: response 
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 36 bytes out of
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] msg arrived on mailbo
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] retrieved msg size fr
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] read 80 bytes out of 
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: AwsDev: 0000:00:1d.0(index: 0)
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] mpd daemon: request 1
Jul 15 21:59:56 ip-172-31-28-216.us-west-2.compute.internal mpd[3585]: [0:0:1d.0] write 128 bytes out o
answered 4 years ago

Awesome, thank you so much--definitely helps and I think I'll be able to get it working now!

On another support thread I asked about the RTL Kernel example in the Vitis directory (which actually links to Xilinx getting-started examples for the Alveo U200, which are incompatible with F1).

Do you know of any Vitis examples that are officially (and properly) documented for getting Verilog code running on the FPGA? What I want is C/C++ code running on a host computer and my own Verilog running inside the FPGA. It looks like OpenCL makes it easier to pass data between the two; I think the HDK approach will be harder.

More specifically, I ultimately want TCP packets (no handshake needed) to come in from the network, get processed on the FPGA, and then get pushed back onto the network. I think Vitis is the best solution for this (SDAccel seemed promising but is outdated), but I don't see any Verilog/SystemVerilog details for it anywhere.

answered 4 years ago

Hi,

We link to a Xilinx example submodule within Vitis/examples/ that is tested to run on F1. The current 2019.2 release points to the examples here: https://github.com/Xilinx/Vitis_Accel_Examples/tree/bb80c8ec699c3131e8874735bd99475ac6fe2ec7

For RTL designs, you could develop an RTL kernel and use one of the examples here as a starting point: https://github.com/Xilinx/Vitis_Accel_Examples/tree/bb80c8ec699c3131e8874735bd99475ac6fe2ec7/rtl_kernels

Let us know if that isn't helpful; we'd be happy to assist further.

-Deep

Deep_P
answered 4 years ago

Follow-up on the solution listed: when you re-ran the hello world OCL example, I'm unsure what you mean by "re-ran". Are you following the "Build the Host Application and Xilinx FPGA Binary" step from the Vitis guide here? https://github.com/aws/aws-fpga/tree/master/Vitis
I retried the two steps for the above and got this error:

[04:51:01] Finished 5th of 6 tasks (FPGA routing). Elapsed time: 00h 25m 35s 

[04:51:01] Starting bitstream generation..
[05:02:12] Run vpl: Step impl: Failed
[05:02:13] Run vpl: FINISHED. Run Status: impl ERROR
WARNING: [VPL 60-732] Link warning: No monitor points found for BD automation.
ERROR: [VPL 60-704] Integration error, problem implementing dynamic region, route_design ERROR, please look at the run log file '/home/centos/aws-fpga/Vitis/examples/xilinx_2019.2/hello_world/build_dir.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/link/vivado/vpl/prj/prj.runs/impl_1/runme.log' for more information
ERROR: [VPL 60-1328] Vpl run 'vpl' failed
ERROR: [VPL 60-806] Failed to finish platform linker
INFO: [v++ 60-1442] [05:02:14] Run run_link: Step vpl: Failed

And the end of the relevant log file is:

INFO: [Timing 38-480] Writing timing data to binary archive.
Writing XDEF routing.
Writing XDEF routing logical nets.
Writing XDEF routing special nets.
WARNING: [Runs 36-520] DcpWriter() - An error happened while closing the dcp.xml writer
Netlist sorting complete. Time (s): cpu = 00:00:00.06 ; elapsed = 00:00:00.06 . Memory (MB): peak = 11183.848 ; gain = 0.000 ; free physical = 18120 ; free virtual = 41664
Writing placer database...
INFO: [Timing 38-480] Writing timing data to binary archive.
Writing XDEF routing.
Writing XDEF routing logical nets.
Writing XDEF routing special nets.
WARNING: [Runs 36-520] DcpWriter() - An error happened while closing the dcp.xml writer
ERROR: [Common 17-49] Internal Data Exception: HDDMProto::writeMessage failed

   while executing
"write_checkpoint -force top_sp_routed_error.dcp"
   invoked from within
"if {$rc} {
 write_checkpoint -force top_sp_routed_error.dcp
 step_failed route_design
 return -code error $RESULT
} else {
 end_step route_design
..."
   (file "top_sp.tcl" line 362)

Aren't these the Xilinx guides you said were not compatible with F1 because they target the Alveo U200?

Edited by: Frankie-guz on Jul 20, 2020 10:57 AM

answered 4 years ago

I created an AFI in region us-east-1, but when I tried to load it on an instance in us-west-2, it failed. After copying the AFI to the region my instance was in and re-running, it worked.
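For reference, a sketch of copying an AFI across regions with the AWS CLI. The regions and the AFI ID below are placeholders; substitute the FpgaImageId returned by your own create-fpga-image call:

```shell
# Placeholders -- substitute your own regions and AFI ID.
SRC_REGION=us-east-1
DST_REGION=us-west-2              # region where the F1 instance runs
AFI_ID=afi-0123456789abcdef0     # hypothetical source AFI ID

# copy-fpga-image creates a new AFI (with its own ID) in the
# destination region; run this where the aws CLI is configured.
if command -v aws >/dev/null 2>&1; then
    aws ec2 copy-fpga-image \
        --region "$DST_REGION" \
        --source-region "$SRC_REGION" \
        --source-fpga-image-id "$AFI_ID" \
        --name vadd-copy \
        || echo "copy-fpga-image failed; check credentials and IDs"
else
    echo "aws CLI not found; install and configure it first"
fi
```

Note the copy gets its own image ID in the destination region, so double-check which image your awsxclbin ends up pointing at before loading it.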

This looks like a write error; can you make sure there is enough space on disk:

df -h
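Beyond eyeballing `df -h`, here is a quick sketch that checks free space under the build directory programmatically. The 50 GiB threshold is only an assumed safety margin; Vivado checkpoint writes during routing can need a lot of scratch space:

```shell
# Fail early if the filesystem holding the build directory is low on space.
BUILD_DIR=.                    # assumed: current directory is the build area
MIN_KB=$((50 * 1024 * 1024))   # ~50 GiB, an assumed margin

# df -Pk gives a portable, one-line-per-filesystem report in 1K blocks;
# column 4 of the data row is the available space.
FREE_KB=$(df -Pk "$BUILD_DIR" | awk 'NR==2 {print $4}')
if [ "$FREE_KB" -lt "$MIN_KB" ]; then
    echo "WARNING: only ${FREE_KB} KB free under ${BUILD_DIR}"
else
    echo "disk space ok: ${FREE_KB} KB free"
fi
```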

The Xilinx guides you followed here are focused on Alveo, but the ones linked through our repository in the examples folder should be fine:

https://github.com/Xilinx/Vitis-Tutorials/blob/master/docs/mixing-c-rtl-kernels/README.md
https://www.xilinx.com/products/boards-and-kits/alveo/package-files-archive/u200-2018-3-2.html

-Deep

Deep_P
answered 4 years ago

thanks!

answered 4 years ago
