Vitis host program has to match the AFI category

Hello, I have hit an issue (or at least something that is not straightforward to understand). I am posting here in case someone can help explain it.
First of all, two facts:
Fact 1: a typical application developed with Xilinx Vitis can be executed on an AWS F1 machine as below:
./host_exec ./(kernel).awsxclbin
Fact 2: there are two kinds of kernel on AWS F1, depending on the tool that created it: an SDK-developed kernel (I name it sdk_krl) or an HDK-developed kernel (I name it hdk_krl).

OK, what I find is this: when my application kernel file is an sdk_krl kernel, I MUST have an sdk_krl kernel preloaded on the AWS F1 machine (by using sudo fpga-load-local-image -S 1 -I agfi-xxx; here agfi-xxx must be an sdk_krl kernel, but not necessarily the exact kernel of the current application).
If an hdk_krl kernel is preloaded instead, then running the application produces the XRT error below:
[XRT] ERROR: See dmesg log for details. err=-22
[XRT] ERROR: Failed to load xclbin.

This problem took me quite a while to debug.
So my question:

  1. Why is that? On a local FPGA machine there is no such difference.
  2. Is there a better way to handle this? If I have two applications, one with an HDK kernel and another with an SDK kernel, I cannot simply run the different host programs to load them (as on a local FPGA machine); I need to use the "sudo fpga-load-local-image -S 1 -I agfi-xxx" command to switch over, which seems quite odd.
asked 3 years ago (247 views)
9 Answers

Hi macleonsh,

When you start an F1 instance, the FPGA is in a cleared state. The cleared FPGA presents the Device ID 0x1042.

Two main things to remember here are:

  1. The XRT (XOCL) driver does not bind to Device ID 0x1042.
  2. XRT MPD opens a file descriptor on a mailbox subdevice when it binds to a Device ID listed here: https://github.com/Xilinx/XRT/blob/master/src/runtime_src/core/pcie/driver/linux/xocl/devices.h

If you are using Vitis to run your application, whether it uses a Vitis RTL kernel or an OpenCL kernel, you need the XOCL driver to bind to the device. If the slot is in a cleared state, the driver will not bind to it.

To get XOCL to bind to a slot, the slot needs to have a Device ID loaded that the driver can bind to.
When you call "sudo fpga-load-local-image -S 1 -I agfi-xxx" or restart MPD, the Device ID that gets loaded is one that lets the XOCL driver bind.
Note that restarting MPD will automatically load a default AFI (in GA regions only).
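
The binding rule described above can be sketched as a small lookup. This is only an illustrative model, not XRT code. The set ASSUMED_XOCL_DEVICE_IDS is an assumption that holds just the Vitis default Device ID reported later in this thread (0xF010); the authoritative list is the devices.h file linked above. The names CLEARED_DEVICE_ID, ASSUMED_XOCL_DEVICE_IDS and xocl_would_bind are hypothetical.

```python
# Illustrative model only; the authoritative ID list lives in XRT's devices.h.
CLEARED_DEVICE_ID = 0x1042  # ID presented by a cleared F1 slot (from the answer above)

# ASSUMPTION: only the Vitis default ID reported in this thread is listed here.
ASSUMED_XOCL_DEVICE_IDS = frozenset({0xF010})


def xocl_would_bind(device_id: int) -> bool:
    """Return True if this model expects the xocl driver to bind to the slot."""
    if device_id == CLEARED_DEVICE_ID:
        return False  # a cleared slot is never bound
    return device_id in ASSUMED_XOCL_DEVICE_IDS


if __name__ == "__main__":
    for dev in (0x1042, 0xF010):
        print(hex(dev), xocl_would_bind(dev))
```

In this model a freshly started instance (0x1042) is rejected until an AFI with a bindable Device ID has been loaded into the slot.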

In your case, are you switching between Vivado-generated AFIs and Vitis-generated AFIs? I want to understand your flow a bit more to see how we can make this easier for you.

-Deep

Deep_P
answered 3 years ago

Thanks for the feedback, Deep.
Yes, I understand that the xocl driver needs to bind to the device.

One thing that is special in the China region is that the default AFI cannot be accessed, so I manually loaded the hello-world AFI (agfi-0fcf87119b8e97bf3, which is a Vivado AFI; after loading, the Device ID is f000).

In that case, when I launch one of my applications, which uses a Vitis-created AFI, XRT reports the error "ERROR: See dmesg log for details. err=-22".

It took me quite a while to find the reason, as the same application runs well on my local FPGA machine. Later I realized that I must preload some AFI created by Vitis, whichever one (after loading, the Device ID is f010), and then my Vitis application runs well.

This limitation is not serious, but for someone who has two kernels (one from Vitis, another from Vivado) it becomes a potential issue, although a workaround exists (preload the relevant AFI). Maybe you could put a reminder in the help documentation to save others from spending time debugging it.

answered 3 years ago

Hi,
I believe this is due to the difference in Device ID.

When your program wants to load an AFI with a different Device ID, the following steps happen:

  • The udev monitor is triggered by a ‘remove’ event. This causes the MPD main thread to close the 2 child threads for this slot.
  • The AWS FPGA management library calls load the new AFI, make system calls to remove the device from the PCI bus, and rescan the PCI bus.
  • This causes the XOCL driver to unbind.
  • Now the slot is no longer programmable from the OpenCL application. When that changes, MPD needs to close the file descriptors while the aws-fpga library makes the call to rescan sysfs.

So to keep MPD running, it would be better to load an afi that has the device ID you want all your accelerators to run with. That removes the need for a rescan, and XOCL will not unbind.
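
The cost of mixing Device IDs can be sketched by counting how many loads in a sequence would change the slot's Device ID, since each change implies the remove/rescan steps listed above. This is a simplified model under the assumption that a load triggers a rescan only when the Device ID differs from the one currently in the slot; count_rescans is a hypothetical helper, not part of the AWS or XRT tooling.

```python
from typing import Iterable


def count_rescans(initial_device_id: int, afi_device_ids: Iterable[int]) -> int:
    """Count loads that change the slot's Device ID.

    ASSUMPTION: each such change implies a PCI remove/rescan, during which
    xocl unbinds and MPD has to close its file descriptors.
    """
    rescans = 0
    current = initial_device_id
    for device_id in afi_device_ids:
        if device_id != current:
            rescans += 1
            current = device_id
    return rescans


if __name__ == "__main__":
    # Alternating between a Vivado AFI (f000) and a Vitis AFI (f010).
    print(count_rescans(0xF010, [0xF000, 0xF010]))
    # Every AFI shares the Vitis Device ID.
    print(count_rescans(0xF010, [0xF010, 0xF010]))
```

When every AFI shares one Device ID the count stays at zero, which is the situation the advice above aims for.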

I hope this helps. Let us know if you have more questions around this and we'd be happy to help.

-Deep

Deep_P
answered 3 years ago

Hi Deep.
"It would be better to load an afi that has the device ID you want all your accelerators to run with." I do not fully understand this statement.

In other words, as soon as I have two different Device IDs (which will happen if I have two kernels, one from Vitis and another from Vivado) and I need to switch from one to the other, I must always preload the relevant AFI before I run the application. There is no alternative, is that right?

answered 3 years ago

Hi,

Due to the way XRT MPD opens file descriptors into the sub-device, if the Device ID changes (basically a PCIe remove and rescan), MPD will shut down when it detects the remove event, and you would have to start MPD again after the new AFI is loaded.

If you could use the same device ID for both the Vitis and Vivado based kernels, it would not require a rescan from the management libraries and could help. However, if you are using Vivado and Vitis generated kernels, what driver are you using with both? Are you using two different device IDs, as you are using two different drivers to bind to the different kernels?

-Deep

Deep_P
answered 3 years ago

"If you could use the same device ID for both the Vitis and Vivado based kernels": this is interesting to know. What should I do to get the same device ID? I thought the IDs were designed to be different.

"However, if you are using Vivado and Vitis generated kernels, what driver are you using with both? Are you using two different device IDs, as you are using two different drivers to bind to the different kernels?"
I think the device ID is set during AFI creation. Specifically:

  • For Vitis, after the FPGA binary build, use the script $VITIS_DIR/tools/create_vitis_afi.sh to convert the xclbin to an awsxclbin and generate the AFI/AGFI; a design tar package is uploaded to S3. Checking such an AFI shows Device ID f010.
  • For Vivado, after the FPGA binary build, call the "aws ec2 create-fpga-image" command to create the AFI/AGFI from the design tar uploaded to S3 (I checked, and it actually contains just the DCP file). Checking such an AFI shows Device ID f000.
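
One way to check which Device ID an AFI was created with, before loading it, is to read the PciId field returned by "aws ec2 describe-fpga-images". The following is a minimal sketch, assuming the JSON output has already been captured into a string and that each image carries a PciId object with a DeviceId hex string. The sample document is illustrative only (a made-up AFI ID and values), and afi_device_ids is a hypothetical helper name.

```python
import json

# ILLUSTRATIVE sample only: the AFI ID and values are made up for this sketch.
SAMPLE_RESPONSE = """
{
  "FpgaImages": [
    {
      "FpgaImageId": "afi-example",
      "PciId": {"DeviceId": "0xf010", "VendorId": "0x1d0f"}
    }
  ]
}
"""


def afi_device_ids(describe_output: str) -> dict:
    """Map each FpgaImageId in the response to its PCI Device ID as an int."""
    images = json.loads(describe_output).get("FpgaImages", [])
    return {
        image["FpgaImageId"]: int(image["PciId"]["DeviceId"], 16)
        for image in images
        if "PciId" in image  # skip entries that carry no PCI ID information
    }


if __name__ == "__main__":
    for afi, dev in afi_device_ids(SAMPLE_RESPONSE).items():
        print(afi, hex(dev))
```

Comparing this value for two AFIs shows in advance whether switching between them changes the Device ID.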

For both builds, I do not think I used different device drivers. I should be using the same AWS_FPGA release and the same xrt/xocl version, and the target platform should be the same as well.
Oh, I just realized the Vitis build was made on an Ubuntu machine and the Vivado build was made on a CentOS machine. Will that cause any difference in the device ID?

answered 3 years ago

You could update the Device ID in the create_vitis_afi.sh script:
https://github.com/aws/aws-fpga/blob/master/Vitis/tools/create_vitis_afi.sh
We suggest keeping the Vitis Device ID F010.

You can update the Device ID for the HDK flow in an ID defines file, for example: https://github.com/aws/aws-fpga/blob/master/hdk/cl/examples/cl_dram_dma/design/cl_id_defines.vh

That should be picked up automatically in the manifest created by this script: https://github.com/aws/aws-fpga/blob/master/hdk/common/shell_v04261818/build/scripts/aws_build_dcp_from_cl.sh

The XRT XOCL device driver should bind with either of these once you set up the same Device ID.

I hope this helps! Let me know if you have any further questions!

-Deep

Deep_P
answered 3 years ago

Thanks a lot, Deep.
Just to double-check, from this definition:
`define CL_SH_ID0 32'hF001_1D0F

this defines the Device ID as f001, correct? I wonder where the "f000" value of some of my AFIs came from. (I checked some older branches, and they all have the same value F001, not F000.)

If I want to have the same Device ID as Vitis, I can simply change this value to 32'hF010_1D0F, correct?

answered 3 years ago

To change the Device ID for your Vivado-created CLs, changing the define will work.
The Device IDs are picked up from that file by the aws_build_dcp_from_cl.sh script. If you created AFIs using cl_hello_world as the template, you might see them set to F000 in there: https://github.com/aws/aws-fpga/blob/master/hdk/cl/examples/cl_hello_world/design/cl_id_defines.vh

The value is picked up from the id_defines file and added to the ingestion manifest by the script. So if you modify the file, the script will use that modification to create the manifest file used for AFI creation.
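
How the define maps to a Device ID can be sketched as below. This is a minimal illustration, not the logic of aws_build_dcp_from_cl.sh; it assumes that the upper 16 bits of CL_SH_ID0 hold the Device ID and the lower 16 bits hold the Vendor ID, which is consistent with the values quoted in this thread. split_cl_sh_id0 is a hypothetical helper name.

```python
import re

# Matches a Verilog define such as: `define CL_SH_ID0 32'hF001_1D0F
DEFINE_PATTERN = re.compile(r"`?define\s+CL_SH_ID0\s+32'h([0-9A-Fa-f_]+)")


def split_cl_sh_id0(line: str) -> tuple:
    """Return (device_id, vendor_id) parsed from a CL_SH_ID0 define line.

    ASSUMPTION: bits 31..16 are the Device ID and bits 15..0 the Vendor ID.
    """
    match = DEFINE_PATTERN.search(line)
    if match is None:
        raise ValueError("no CL_SH_ID0 define found")
    value = int(match.group(1).replace("_", ""), 16)
    return value >> 16, value & 0xFFFF


if __name__ == "__main__":
    device_id, vendor_id = split_cl_sh_id0("`define CL_SH_ID0 32'hF001_1D0F")
    print(hex(device_id), hex(vendor_id))  # prints: 0xf001 0x1d0f
```

Under the same assumption, changing the value to 32'hF010_1D0F yields Device ID 0xf010, which matches the Vitis default.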

For the Vitis flow, it is set to F010 by default.

I hope that helps.

-Deep

Deep_P
answered 3 years ago
