Does a public AMI need to include the design's .awsxclbin file?


Hi
I have finished the dev part of my F1 application, and now want to publish an AMI including my application to some beta testers.
I have found the documentation on how to create an AMI (https://github.com/aws/aws-fpga/blob/master/SDAccel/docs/Create_Runtime_AMI.md); however, I have a question: does the public AMI need to include the design's "binary_container_1.awsxclbin" file?

During development, the OpenCL host code reads this file, following the standard Xilinx examples, e.g.:

```cpp
std::cout << "INFO: loading kernel from " << vmulBinaryFile << std::endl;
cl::Program::Binaries vmul_bins = xcl::import_binary_file(vmulBinaryFile);
devices.resize(1);
// Check program creation as well, so a bad xclbin is reported here rather than at cl::Kernel
OCL_CHECK(err, cl::Program program(context, devices, vmul_bins, nullptr, &err));
OCL_CHECK(err, krnl_combined = cl::Kernel(program, "my_kernels_name", &err));
```

I understand that for the published AMI I could load the FPGA image with fpga-load-local-image and the agfi-xxx of my image, which would avoid the OpenCL cl::Program statement. However, what OpenCL code would replace the last line above, since I would no longer have a cl::Program object?

So, in summary, I understand that the .awsxclbin file is encrypted, but I would rather not include it, if possible.
If I still need to include it in the public AMI, then what is the point of the whole AGFI/AFI mechanism?

Cheers

asked 3 years ago, 207 views
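In the snippet above, `xcl::import_binary_file` comes from the utility code bundled with the Xilinx examples, not from OpenCL itself; it essentially reads the whole .awsxclbin from disk into memory at run time. The standard-library sketch below shows that step in isolation. The name `read_binary_file` is hypothetical and is not the Xilinx helper.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read an entire binary file (e.g. an .awsxclbin) into memory.
// Throws if the file cannot be opened or is empty, so a wrong path is caught
// before any OpenCL call is made.
std::vector<unsigned char> read_binary_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    if (bytes.empty()) {
        throw std::runtime_error(path + " is empty");
    }
    return bytes;
}
```

Depending on the version of the OpenCL C++ bindings and of the example helper, `cl::Program::Binaries` holds either byte vectors like this one or (pointer, size) pairs, so check the helper that ships with your examples before swapping anything in.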
8 Answers

Hi,

The awsxclbin file contains information about which AFI to load, as well as the memory topology, connectivity, and other kernel metadata that the runtime needs. It is the same as an xclbin file, but with the bitstream replaced by a reference to the AGFI ID.

Is there a specific reason you'd prefer not to share the awsxclbin file in the AMI? If you have concerns about proprietary information in the awsxclbin file, please let me know via a private message or an AWS Support case and we can look into this use case.

By default, AFIs are loadable only by the AWS account that created them. To make them loadable by your beta testers, you have to explicitly share them with the testers' accounts; you can also make an AFI public so that every account can load it. Details: https://github.com/aws/aws-fpga/blob/master/hdk/docs/fpga_image_attributes.md

So, for example, even if you make your AMI public, you can still control who can load your accelerator by limiting the AFI's load permissions.

If you plan to publish your AMI via the Marketplace, the load permissions are tied to the subscribers of your Marketplace product.

Let us know if you still have follow up questions and we'll be happy to answer them for you.

-Deep

Deep_P
answered 3 years ago
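One quick way to confirm that an .awsxclbin is a runtime container (metadata plus an AGFI reference) rather than a raw bitstream is to look at its first bytes. This sketch assumes the file uses the xclbin2 (axlf) layout, which begins with the 8-byte magic `xclbin2` followed by a NUL; files produced by older tool flows may differ. The function name `looks_like_xclbin2` is hypothetical.

```cpp
#include <array>
#include <cstring>
#include <fstream>
#include <string>

// Sketch: check whether a file starts with the 8-byte "xclbin2\0" magic of the
// xclbin2 (axlf) container. This identifies the container type only; it does not
// check which AGFI or kernels the file refers to.
bool looks_like_xclbin2(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::array<char, 8> magic{};
    if (!in.read(magic.data(), magic.size())) {
        return false;  // missing file, or shorter than 8 bytes
    }
    // "xclbin2" is 7 characters plus its terminating NUL, so this compares all 8 bytes
    return std::memcmp(magic.data(), "xclbin2", 8) == 0;
}
```

A passing check does not mean the file matches the AFI you loaded; a stale .awsxclbin from an older build has the same header.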

Hi Deep
OK, thanks for clearing that up. As long as the bitstream is abstracted out of the xclbin, that should be secure enough.

Also, is there any more detailed information about creating a public AMI for F1 applications, other than (https://github.com/aws/aws-fpga/blob/master/SDAccel/docs/Create_Runtime_AMI.md)?
I am trying to create the AMI from a standard Ubuntu 16.04 image rather than the F1 AMI, and I'm having trouble with XRT versions because I can't run the sdaccel_xxx.sh scripts to completion without Vivado installed.
I understood that I would not be able to publish the AMI if I started from the Marketplace F1 AMI, as it includes the Xilinx tools. If that is not the case, would it be easier to start with the F1 AMI?

Cheers

Greg

answered 3 years ago

Hi Greg,

You are right about the Marketplace AMI and Xilinx tools. The EULA limits re-publishing the Developer AMI as it includes Xilinx tools.

However, it should be straightforward to build XRT, as you do not need the Xilinx tools installed. The process described in that document dates from before XRT existed, when we had to use XOCL and some artifacts from the tools themselves.

All you need is the aws-fpga repository cloned and the aws-fpga management library installed, so that it can be included in the .deb builds:

```bash
git clone https://github.com/aws/aws-fpga
cd aws-fpga
export AWS_FPGA_REPO_DIR=$PWD
source sdk_setup.sh        # sets up the SDK, including the FPGA management library and tools
cd SDAccel/Runtime
git clone https://github.com/Xilinx/XRT
cd XRT
git checkout <XRT release tag or branch you want to use>
sudo ./src/runtime_src/tools/scripts/xrtdeps.sh   # installs XRT build dependencies (needs root)
cd build
./build.sh
```

This should give you the .deb packages you need for your runtime. Keep in mind that if you are using Vitis 2019.2 or later, XRT requires mpd, which needs a default AFI to be loaded for the driver to bind: https://github.com/aws/aws-fpga/blob/master/Vitis/docs/XRT_installation_instructions.md

Hope this helps. I'll also update the doc to include these instructions.

-Deep

Deep_P
answered 3 years ago

Hi Deep

That was a great help, but still not quite there!!
So I followed the instructions to build and install XRT. I am using SDAccel 2018.2 on Ubuntu 16.04, by the way.
After a reboot, `xbutil scan` showed zero cards, so I stopped mpd, loaded my AGFI manually (which worked fine: `fpga-load-local-image -S 0 -I agfi-xxxx`), then started mpd again and checked its status:

```
systemctl status mpd
● mpd.service - Xilinx Management Proxy Daemon (MPD)
   Loaded: loaded (/etc/systemd/system/mpd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-10-07 05:51:33 UTC; 7s ago
 Main PID: 3136 (mpd)
    Tasks: 3
   Memory: 680.0K
      CPU: 4ms
   CGroup: /system.slice/mpd.service
           └─3136 /opt/xilinx/xrt/bin/mpd

Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: AwsDev: 0000:00:1d.0(index: 0)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: AwsDev: 0000:00:1d.0(index: 0)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] read 80 bytes out of 80 bytes from fd 5, valid: 1
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] mpd daemon: request 10 received(reqSize: 32)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] write 524360 bytes out of 524360 bytes to fd 5
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] msg arrived on mailbox fd 5
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] retrieved msg size from mailbox: 40 bytes
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] read 72 bytes out of 72 bytes from fd 5, valid: 1
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] mpd daemon: request 11 received(reqSize: 24)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] write 2104 bytes out of 2104 bytes to fd 5
```

Then I ran xbutil scan again:

```
xbutil scan
INFO: Found total 1 card(s), 1 are usable

System Configuration
OS name:      Linux
Release:      4.4.0-1114-aws
Version:      #127-Ubuntu SMP Fri Sep 4 08:41:12 UTC 2020
Machine:      x86_64
Model:        HVM domU
CPU cores:    8
Memory:       122894 MB
Glibc:        2.23
Distribution: Ubuntu 16.04.6 LTS
Now:          Wed Oct  7 05:51:47 2020 GMT

XRT Information
Version:      2.8.0
Git Hash:     6660c520f73874b221df915c8232230c19eae23d
Git Branch:   65ffad62f427c0bd1bc65b6ea555a810295468b7
Build Date:   2020-10-07 05:13:33
XOCL:         2.8.0,6660c520f73874b221df915c8232230c19eae23d
XCLMGMT:      unknown

 [0] 0000:00:1d.0 xilinx_aws-vu9p-f1_dynamic_5_0(ID=0xabcd) user(inst=128)
```
  
So everything appears fine, but when I run the application, OpenCL program creation fails:
  
```
SETUP the OPENCL structures
platform Name: Xilinx
Vendor Name : Xilinx
Found Platform
Found Device=xilinx_aws-vu9p-f1_dynamic_5_0
INFO: loading kernel from  ./binary_container_1.awsxclbin
INFO: Importing ./binary_container_1.awsxclbin
Loading: './binary_container_1.awsxclbin'
XRT build version: 2.8.0
Build hash: 6660c520f73874b221df915c8232230c19eae23d
Build date: 2020-10-07 05:13:33
Git branch: 65ffad62f427c0bd1bc65b6ea555a810295468b7
PID: 3183
UID: 0
[Wed Oct  7 05:56:27 2020 GMT]
HOST: ip-172-31-4-181
EXE: /home/ubuntu/src/long_read_msc.exe
[XRT] ERROR: See dmesg log for details. err=-5
[XRT] ERROR: Failed to load xclbin.
[XRT] ERROR: program is nullptr
../src/run_match_sril_combined.cpp:474 Error calling krnl_combined = cl::Kernel(program,"match_sril_combined", &err), error code is: -44
```
  
Looking at dmesg, the relevant part seems to be:
  
```
[  437.334830] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 __icap_download_bitstream_axlf: incoming xclbin: aefa6df5-4a6a-46c5-a4fa-2057668f2012
               on device xclbin: 00000000-0000-0000-0000-000000000000
[  437.334881] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 icap_cache_bitstream_axlf_section: found kind 6(MEM_TOPOLOGY)
[  437.335059] xocl 0000:00:1d.0: mailbox.u.9437184 ffff881e6194bc10 mailbox_request: sending request: 8 via SW
[  437.335377] xocl 0000:00:1d.0: mailbox.u.9437184 ffff881e6194bc10 mailbox_read: Software TX msg is too big
[  438.274680] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 __icap_peer_xclbin_download: peer xclbin download err: -5
[  438.284924] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 icap_download_bitstream_axlf: err: -5
[  438.284928] xocl 0000:00:1d.0:  ffff881e6cd43098 exec_reset: exec_reset(56) cfg(0)
[  438.284931] xocl 0000:00:1d.0:  ffff881e6cd43098 exec_reset: exec_reset resets
[  438.284933] xocl 0000:00:1d.0:  ffff881e6cd43098 exec_reset: exec->xclbin(00000000-0000-0000-0000-000000000000),xclbin(00000000-0000-0000-0000-000000000000)
[  438.284940] xocl 0000:00:1d.0:  ffff881e6cd43098 xocl_init_mem: Topology count = 4, data_length = 160
[  438.284944] xocl 0000:00:1d.0: p2p.u.10485760 ffff881e693b6410 p2p_mem_init: already initialized
[  438.284947] xocl 0000:00:1d.0:  ffff881e6cd43098 xocl_read_axlf_helper: Failed to download xclbin, err: -5
[  438.330007] [drm] client exits pid(3174)
[  438.330012] xocl 0000:00:1d.0:  ffff881e6cd43098 xocl_drvinst_close: CLOSE 2
[  438.330015] xocl 0000:00:1d.0:  ffff881e6cd43098 xocl_drvinst_close: NOTIFY ffff881e76c1a410
```
  
So it's an xclbin download error.
  
Possibly the problem is that I generated the design's AFI in us-west-2, and my AMI is now in ap-southeast-2. However, I used the copy-fpga-image CLI command to create a new AFI for ap-southeast-2, and it shows as available when I run aws ec2 describe-fpga-images with the new AFI.
  
Also, having to stop mpd, manually load the image, and start mpd again is a bit of a pain; is there a better way?
  
Cheers  
  
Greg
answered 3 years ago
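The log above mixes two different error-code spaces, which is easy to misread. The `-44` reported by the `cl::Kernel` call is an OpenCL status (CL_INVALID_PROGRAM, consistent with "program is nullptr"), whereas `err=-5` from XRT and in dmesg is a negated Linux errno (EIO). OpenCL also defines a `-5` (CL_OUT_OF_RESOURCES), but that is not what the driver means here. The small sketch below makes the distinction explicit; the helper names are hypothetical, and the OpenCL values are hard-coded from the OpenCL 1.2 headers only so that it builds without `CL/cl.h`.

```cpp
#include <cerrno>
#include <cstring>
#include <map>
#include <string>

// Sketch: names for a few OpenCL status codes relevant to program/kernel creation.
// In real host code, compare against the CL_* constants from CL/cl.h instead.
std::string opencl_status_name(int code) {
    static const std::map<int, std::string> names = {
        {0, "CL_SUCCESS"},
        {-5, "CL_OUT_OF_RESOURCES"},
        {-42, "CL_INVALID_BINARY"},
        {-44, "CL_INVALID_PROGRAM"},
        {-45, "CL_INVALID_PROGRAM_EXECUTABLE"},
        {-46, "CL_INVALID_KERNEL_NAME"},
    };
    auto it = names.find(code);
    if (it != names.end()) {
        return it->second;
    }
    return "unknown OpenCL status " + std::to_string(code);
}

// The "err=-5" from XRT/xocl is a negated Linux errno, so negate it and let the
// C library describe it (-5 corresponds to EIO on Linux).
std::string driver_error_text(int err) {
    return std::strerror(-err);
}
```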

Hi Deep
I fixed the AFI/AGFI problem: I had mistakenly used an old .awsxclbin file, so there was a mismatch.
I can now run my kernel on the AMI I created; however, I still need to stop mpd, run fpga-load-local-image with my AGFI, and then start mpd again.
Only after all that can I run my code.
Any ideas why this happens?
Cheers
Greg

answered 3 years ago

Hi Greg,

That is because XOCL won't bind to a device in the cleared AFI state, and mpd only opens file descriptors into a subdevice within XOCL once XOCL has bound to a device ID. Xilinx has added more details here: https://xilinx.github.io/XRT/2019.2/html/cloud_vendor_support.html

MPD was added for Vitis 2019.2. If you are using SDAccel 2018.2, you should be able to use the XRT version validated for SDAccel 2018.2: https://github.com/Xilinx/XRT/releases/tag/2018.2_XDF.RC5

I hope this helps and feel free to let us know if you have any other questions here!

Thanks,

Deep

Deep_P
answered 3 years ago

Hi Deep
I can compile the .deb files for XRT version 2018.2_XDF.RC5, but when I try to install xrt_201802.2.1.0_16.04-xrt.deb, it fails with the following output:

```
root@ip-172-31-4-181:~/aws-fpga/SDAccel/Runtime/XRT/build/Release# apt install ./xrt_201802.2.1.0_16.04-xrt.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'xrt' instead of './xrt_201802.2.1.0_16.04-xrt.deb'
The following NEW packages will be installed:
  xrt
0 upgraded, 1 newly installed, 0 to remove and 19 not upgraded.
Need to get 0 B/4,576 kB of archives.
After this operation, 21.9 MB of additional disk space will be used.
Get:1 /home/ubuntu/aws-fpga-1.4.15a/SDAccel/Runtime/XRT/build/Release/xrt_201802.2.1.0_16.04-xrt.deb xrt amd64 2.1.0 [4,576 kB]
Selecting previously unselected package xrt.
(Reading database ... 131543 files and directories currently installed.)
Preparing to unpack .../xrt_201802.2.1.0_16.04-xrt.deb ...
Unpacking xrt (2.1.0) ...
Setting up xrt (2.1.0) ...
Registering new XRT Linux kernel module sources 2.1.0 with dkms

Creating symlink /var/lib/dkms/xrt/2.1.0/source ->
/usr/src/xrt-2.1.0

DKMS: add completed.
Building XRT Linux kernel modules sources with dkms

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area....
cd driver/xclng/drm/xocl; make; cd ../../../.......
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/xrt.0.crash'
Error! Build of xocl.ko failed for: 4.4.0-1114-aws (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/xrt/2.1.0/build/ for more information.
Installing XRT Linux kernel modules sources with dkms

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area....
cd driver/xclng/drm/xocl; make; cd ../../../.......
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/xrt.0.crash'
Error! Build of xocl.ko failed for: 4.4.0-1114-aws (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/xrt/2.1.0/build/ for more information.
Loading new XRT Linux kernel modules
modprobe: FATAL: Module xclmgmt not found in directory /lib/modules/4.4.0-1114-aws
modprobe: FATAL: Module xocl not found in directory /lib/modules/4.4.0-1114-aws
```

The crash file mentioned is:
```
root@ip-172-31-4-181:~/src# more /var/crash/xrt.0.crash
ProblemType: Package
DKMSBuildLog:
DKMS make.log for xrt-2.1.0 for kernel 4.4.0-1114-aws (x86_64)
Thu Oct 8 01:19:28 UTC 2020
cd userpf; make all
make[1]: Entering directory '/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf'
echo /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf
/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf
make -C /lib/modules/4.4.0-1114-aws/build M=/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf modules
make[2]: Entering directory '/usr/src/linux-headers-4.4.0-1114-aws'
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../xocl_subdev.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../xocl_ctx.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../xocl_thread.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mm_xdma.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/feature_rom.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mm_qdma.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mb_scheduler.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mailbox.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/xvc.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/icap.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.o
/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.c:689:12: error: static declaration of ‘stream_open’ follows non-static declaration
 static int stream_open(struct inode *inode, struct file *file)
            ^
In file included from include/linux/dma-buf.h:32:0,
                 from /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.c:19:
include/linux/fs.h:2744:12: note: previous declaration of ‘stream_open’ was here
 extern int stream_open(struct inode * inode, struct file * filp);
            ^
scripts/Makefile.build:285: recipe for target '/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.o' failed
make[3]: *** [/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.o] Error 1
Makefile:1471: recipe for target 'module/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf' failed
make[2]: *** [module/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-4.4.0-1114-aws'
Makefile:53: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf'
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 2
DKMSKernelVersion: 4.4.0-1114-aws
Date: Thu Oct 8 01:19:35 2020
DuplicateSignature: dkms:xrt:2.1.0:/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.c:689:12: error: static declaration of ‘stream_open’ follows non-static declaration
Package: xrt 2.1.0
PackageVersion: 2.1.0
SourcePackage: xrt
Title: xrt 2.1.0: xrt kernel module failed to build
```

Any ideas?

Cheers
Greg

answered 3 years ago

Hi Deep
OK, I found a patch for that problem in the forums and finished the compilation. It all works fine now.
Thanks again for your help
Cheers
Greg

answered 3 years ago
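The patch Greg used is not reproduced in the thread. For readers hitting the same str_qdma.c error, the log shows a plain name clash: the 4.4.0-1114-aws kernel headers declare `extern int stream_open(...)` in include/linux/fs.h, while the XRT 2018.2 driver defines its own `static int stream_open(...)`, and the compiler rejects a static declaration that follows a non-static one in the same translation unit. One common way to resolve this kind of clash is to rename the driver's private function. The sketch below models that in plain C++ with stand-in types; it is an assumption that the forum patch did something similar, and the name `str_qdma_stream_open` is hypothetical.

```cpp
// Stand-in types so the sketch compiles outside the kernel tree.
struct inode {};
struct file {};

// Models the declaration that newer kernel headers place in include/linux/fs.h.
extern int stream_open(struct inode* inode, struct file* filp);

// Declaring "static int stream_open(...)" here would fail to compile, because it
// follows the non-static declaration above. Giving the driver's private function
// a different (hypothetical) name avoids the clash.
static int str_qdma_stream_open(struct inode* inode, struct file* filp) {
    (void)inode;
    (void)filp;
    return 0;  // placeholder for the driver's real open logic
}

// The driver's file-operations table would then point at the renamed function.
int (*const qdma_open_fn)(struct inode*, struct file*) = str_qdma_stream_open;
```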
