Hi,
The awsxclbin file contains information about the AFI to load, as well as memory, connectivity, and other kernel metadata that the runtime needs. It is the same as an xclbin file, but with the bitstream abstracted out as an AGFI ID.
Is there any specific reason you'd like to not share the awsxclbin file in the AMI? If you have concerns about any proprietary information in the awsxclbin file, please let me know via a private message or via an AWS Support Case and we can look into this use case.
By default, AFIs are loadable only by the AWS account that created them. For beta tester accounts to be able to load them, you have to explicitly share the AFIs with those accounts: https://github.com/aws/aws-fpga/blob/master/hdk/docs/fpga_image_attributes.md. You can also make the AFI public so that it is accessible to all.
So, for example, if you make your AMI public, you can still control who can load your accelerator by limiting AFI load permissions.
If you are planning to publish your AMI via the marketplace, the load permissions are tied to subscribers of your Marketplace product.
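As a rough sketch of the sharing step described above, the AWS CLI commands might look like the following. The AFI ID and account ID are placeholders rather than values from this thread, and the functions only print the commands so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: build the AWS CLI commands for sharing an AFI with another account.
# The AFI ID and account ID below are placeholders (assumptions), not real values.

# Print the command that grants one AWS account permission to load an AFI.
share_afi_cmd() {
    afi_id="$1"
    account_id="$2"
    printf 'aws ec2 modify-fpga-image-attribute --fpga-image-id %s --attribute loadPermission --operation-type add --user-ids %s\n' \
        "$afi_id" "$account_id"
}

# Print the command that lists the current load permissions of an AFI.
show_afi_permissions_cmd() {
    afi_id="$1"
    printf 'aws ec2 describe-fpga-image-attribute --fpga-image-id %s --attribute loadPermission\n' \
        "$afi_id"
}

share_afi_cmd "afi-0123456789abcdef0" "111122223333"
show_afi_permissions_cmd "afi-0123456789abcdef0"
```

Copy the printed lines into a shell with AWS credentials for the account that owns the AFI, substituting your own IDs.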
Let us know if you still have follow up questions and we'll be happy to answer them for you.
-Deep
Hi Deep
Ok, thanks for clearing that up. As long as the xclbin stuff is abstracted out, that should be secure enough.
Also, is there any more detailed info about creating a public AMI for F1 applications other than (https://github.com/aws/aws-fpga/blob/master/SDAccel/docs/Create_Runtime_AMI.md)?
I am trying to create the AMI from a standard Ubuntu 16.04 image, rather than the F1 AMI, and having trouble with xrt versions as I can't run the sdaccel_xxx.sh scripts completely, as Vivado isn't installed.
I understood that I would not be able to publish the AMI if I started with the Marketplace F1 AMI, as this includes the Xilinx tools. If this is not the case, would it be easier to start with the F1 AMI?
Cheers
Greg
Hi Greg,
You are right about the Marketplace AMI and Xilinx tools. The EULA limits re-publishing the Developer AMI as it includes Xilinx tools.
However it should be straightforward to build XRT as you do not need Xilinx tools installed. The process described in the document is for when XRT didn't exist and we had to use XOCL and some artifacts from the tool itself.
All you need is the aws-fpga repository cloned and the aws-fpga mgmt library installed to be able to include it in the deb builds.
git clone https://github.com/aws/aws-fpga
cd aws-fpga
export AWS_FPGA_REPO_DIR=$PWD
source sdk_setup.sh
cd SDAccel/Runtime
git clone https://github.com/Xilinx/XRT
cd XRT
git checkout -t origin/<# Replace with XRT Release tag you want to use>
./src/runtime_src/tools/scripts/xrtdeps.sh
cd build
./build.sh
This should get you the debs you need for your runtime. Keep in mind that if you are using Vitis 2019.2 or above, XRT requires mpd, which needs a default AFI to be loaded for the driver to bind: https://github.com/aws/aws-fpga/blob/master/Vitis/docs/XRT_installation_instructions.md
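Once build.sh finishes, the packages still have to be installed. A minimal sketch, assuming the debs land in XRT's build/Release directory (the location that appears later in this thread), is to list an install command for each package found there. The BUILD_DIR variable and the function name are illustrative only.

```shell
#!/bin/sh
# Sketch: print an install command for every .deb produced by build.sh.
# BUILD_DIR is an assumed location (XRT's build/Release directory).

list_deb_installs() {
    build_dir="$1"
    for deb in "$build_dir"/*.deb; do
        # Skip the unexpanded glob pattern when the directory has no packages.
        [ -e "$deb" ] || continue
        printf 'sudo apt install %s\n' "$deb"
    done
}

list_deb_installs "${BUILD_DIR:-./Release}"
```

Run it from the XRT build directory, check that the listed packages are the ones you expect, then execute the printed commands.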
Hope this helps. I'll also update the doc to include these instructions.
-Deep
Hi Deep
That was a great help, but still not quite there!!
So I followed the instructions to install XRT. I am using SDAccel 2018.2 on Ubuntu 16.04, by the way.
On reboot xbutil scan showed zero cards, so I stopped mpd, loaded my agfi manually fine (fpga-load-local-image -S 0 -I agfi-xxxx), then started mpd, and checked it:
systemctl status mpd
● mpd.service - Xilinx Management Proxy Daemon (MPD)
Loaded: loaded (/etc/systemd/system/mpd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-10-07 05:51:33 UTC; 7s ago
Main PID: 3136 (mpd)
Tasks: 3
Memory: 680.0K
CPU: 4ms
CGroup: /system.slice/mpd.service
└─3136 /opt/xilinx/xrt/bin/mpd
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: AwsDev: 0000:00:1d.0(index: 0)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: AwsDev: 0000:00:1d.0(index: 0)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] read 80 bytes out of 80 bytes from fd 5, valid: 1
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] mpd daemon: request 10 received(reqSize: 32)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] write 524360 bytes out of 524360 bytes to fd 5
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] msg arrived on mailbox fd 5
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] retrieved msg size from mailbox: 40 bytes
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] read 72 bytes out of 72 bytes from fd 5, valid: 1
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] mpd daemon: request 11 received(reqSize: 24)
Oct 07 05:51:33 ip-172-31-4-181 mpd[3136]: [0:0:1d.0] write 2104 bytes out of 2104 bytes to fd 5
Then, again with xbutil scan:
xbutil scan
INFO: Found total 1 card(s), 1 are usable
System Configuration
OS name: Linux
Release: 4.4.0-1114-aws
Version: #127-Ubuntu SMP Fri Sep 4 08:41:12 UTC 2020
Machine: x86_64
Model: HVM domU
CPU cores: 8
Memory: 122894 MB
Glibc: 2.23
Distribution: Ubuntu 16.04.6 LTS
Now: Wed Oct 7 05:51:47 2020 GMT
XRT Information
Version: 2.8.0
Git Hash: 6660c520f73874b221df915c8232230c19eae23d
Git Branch: 65ffad62f427c0bd1bc65b6ea555a810295468b7
Build Date: 2020-10-07 05:13:33
XOCL: 2.8.0,6660c520f73874b221df915c8232230c19eae23d
XCLMGMT: unknown
[0] 0000:00:1d.0 xilinx_aws-vu9p-f1_dynamic_5_0(ID=0xabcd) user(inst=128).
So it all appears good, but when I run the application, I get the following from the OpenCL program:
SETUP the OPENCL structures
platform Name: Xilinx
Vendor Name : Xilinx
Found Platform
Found Device=xilinx_aws-vu9p-f1_dynamic_5_0
INFO: loading kernel from ./binary_container_1.awsxclbin
INFO: Importing ./binary_container_1.awsxclbin
Loading: './binary_container_1.awsxclbin'
XRT build version: 2.8.0
Build hash: 6660c520f73874b221df915c8232230c19eae23d
Build date: 2020-10-07 05:13:33
Git branch: 65ffad62f427c0bd1bc65b6ea555a810295468b7
PID: 3183
UID: 0
[Wed Oct 7 05:56:27 2020 GMT]
HOST: ip-172-31-4-181
EXE: /home/ubuntu/src/long_read_msc.exe
[XRT] ERROR: See dmesg log for details. err=-5
[XRT] ERROR: Failed to load xclbin.
[XRT] ERROR: program is nullptr
../src/run_match_sril_combined.cpp:474 Error calling krnl_combined = cl::Kernel(program,"match_sril_combined", &err), error code is: -44
So looking at dmesg, the relevant bit seems to be:
[ 437.334830] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 __icap_download_bitstream_axlf: incoming xclbin: aefa6df5-4a6a-46c5-a4fa-2057668f2012
on device xclbin: 00000000-0000-0000-0000-000000000000
[ 437.334881] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 icap_cache_bitstream_axlf_section: found kind 6(MEM_TOPOLOGY)
[ 437.335059] xocl 0000:00:1d.0: mailbox.u.9437184 ffff881e6194bc10 mailbox_request: sending request: 8 via SW
[ 437.335377] xocl 0000:00:1d.0: mailbox.u.9437184 ffff881e6194bc10 mailbox_read: Software TX msg is too big
[ 438.274680] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 __icap_peer_xclbin_download: peer xclbin download err: -5
[ 438.284924] xocl 0000:00:1d.0: icap.u.22020096 ffff881e693b4810 icap_download_bitstream_axlf: err: -5
[ 438.284928] xocl 0000:00:1d.0: ffff881e6cd43098 exec_reset: exec_reset(56) cfg(0)
[ 438.284931] xocl 0000:00:1d.0: ffff881e6cd43098 exec_reset: exec_reset resets
[ 438.284933] xocl 0000:00:1d.0: ffff881e6cd43098 exec_reset: exec->xclbin(00000000-0000-0000-0000-000000000000),xclbin(00000000-0000-0000-0000-000000000000)
[ 438.284940] xocl 0000:00:1d.0: ffff881e6cd43098 xocl_init_mem: Topology count = 4, data_length = 160
[ 438.284944] xocl 0000:00:1d.0: p2p.u.10485760 ffff881e693b6410 p2p_mem_init: already initialized
[ 438.284947] xocl 0000:00:1d.0: ffff881e6cd43098 xocl_read_axlf_helper: Failed to download xclbin, err: -5
[ 438.330007] [drm] client exits pid(3174)
[ 438.330012] xocl 0000:00:1d.0: ffff881e6cd43098 xocl_drvinst_close: CLOSE 2
[ 438.330015] xocl 0000:00:1d.0: ffff881e6cd43098 xocl_drvinst_close: NOTIFY ffff881e76c1a410
So it is an xclbin download error.
Possibly the problem is that I generated the design's AFI in us-west-2, and my AMI is now in ap-southeast-2. However, I used the copy-fpga-image CLI command to generate a new AFI for ap-southeast-2, and it shows as available when I run aws ec2 describe-fpga-images with the new AFI.
Also, having to stop mpd, manually load the image, and start it again is a bit of a pain. Is there a better way?
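For reference, the cross-region copy and the follow-up availability check mentioned here could be sketched as below. The AFI IDs are placeholders, the regions are the ones named in this post, and the functions only print the AWS CLI commands instead of executing them.

```shell
#!/bin/sh
# Sketch: build the AWS CLI commands for copying an AFI to another region
# and checking the copy. The AFI IDs used below are placeholders (assumptions).

# Print the command that copies an AFI from src_region into dst_region.
copy_afi_cmd() {
    src_afi="$1"
    src_region="$2"
    dst_region="$3"
    printf 'aws ec2 copy-fpga-image --source-fpga-image-id %s --source-region %s --region %s\n' \
        "$src_afi" "$src_region" "$dst_region"
}

# Print the command that describes an AFI in a given region.
describe_afi_cmd() {
    afi_id="$1"
    region="$2"
    printf 'aws ec2 describe-fpga-images --fpga-image-ids %s --region %s\n' \
        "$afi_id" "$region"
}

copy_afi_cmd "afi-0123456789abcdef0" "us-west-2" "ap-southeast-2"
describe_afi_cmd "afi-0fedcba9876543210" "ap-southeast-2"
```

The copy command returns a new AFI ID for the destination region, which is the ID to pass to the describe command.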
Cheers
Greg
Hi Deep
I fixed the problem with the AFI/AGFI: I had mistakenly used an old .awsxclbin file, so there was a mismatch.
So I can run my kernel on the AMI I have created. However, I still need to stop mpd, load my AGFI with fpga-load-local-image, then start mpd.
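One way to catch this kind of mismatch early is to look at which AGFI ID an .awsxclbin file refers to. The sketch below is a heuristic only: it assumes the AGFI ID is stored as plain text inside the file, and the function name is made up for illustration.

```shell
#!/bin/sh
# Heuristic sketch: list AGFI-looking strings embedded in an awsxclbin file.
# Assumption: the AGFI ID is stored as plain text inside the file.

find_agfi_ids() {
    # -a treats the binary file as text; -o prints only the matching part.
    grep -a -o 'agfi-[0-9a-f][0-9a-f]*' "$1" | sort -u
}

# Usage: sh find_agfi.sh ./binary_container_1.awsxclbin
if [ "$#" -gt 0 ]; then
    find_agfi_ids "$1"
fi
```

The ID it prints can then be compared with the AGFI you pass to fpga-load-local-image, or with what fpga-describe-local-image reports for the slot.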
After all that I can run my code.
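The three manual steps can at least be wrapped in one small script. This is a sketch of the sequence described in this thread, not an official tool: the function names are illustrative, it assumes you run it as root, and by default it only prints the commands (set DRY_RUN=0 to execute them).

```shell
#!/bin/sh
# Sketch: the stop / load / start sequence from this thread as one function.
# Commands are only printed unless DRY_RUN=0 is set in the environment.

run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Stop mpd, load the given AGFI into the given slot, then start mpd again.
reload_afi() {
    slot="$1"
    agfi="$2"
    run systemctl stop mpd &&
    run fpga-load-local-image -S "$slot" -I "$agfi" &&
    run systemctl start mpd
}

# agfi-xxxx is the same placeholder used earlier in the thread.
reload_afi 0 agfi-xxxx
```

Chaining the steps with && means mpd is not restarted if the image load fails, which makes a bad AGFI easier to spot.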
Any ideas about why this happens?
Cheers
Greg
Hi Greg,
That is because XOCL won't bind to a cleared AFI state. mpd opens file descriptors into a subdevice within XOCL once XOCL binds to a device ID. Xilinx has added more details on it here: https://xilinx.github.io/XRT/2019.2/html/cloud_vendor_support.html
MPD was added for Vitis 2019.2. If you are using SDAccel 2018.2, you should be able to use the XRT version validated for SDAccel 2018.2: https://github.com/Xilinx/XRT/releases/tag/2018.2_XDF.RC5
I hope this helps and feel free to let us know if you have any other questions here!
Thanks,
Deep
Hi Deep
I can compile the .deb files for XRT version 2018.2_XDF.RC5, but when I try to install xrt_201802.2.1.0_16.04-xrt.deb, it breaks with the following output:
root@ip-172-31-4-181:~/aws-fpga/SDAccel/Runtime/XRT/build/Release# apt install ./xrt_201802.2.1.0_16.04-xrt.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'xrt' instead of './xrt_201802.2.1.0_16.04-xrt.deb'
The following NEW packages will be installed:
xrt
0 upgraded, 1 newly installed, 0 to remove and 19 not upgraded.
Need to get 0 B/4,576 kB of archives.
After this operation, 21.9 MB of additional disk space will be used.
Get:1 /home/ubuntu/aws-fpga-1.4.15a/SDAccel/Runtime/XRT/build/Release/xrt_201802.2.1.0_16.04-xrt.deb xrt amd64 2.1.0 [4,576 kB]
Selecting previously unselected package xrt.
(Reading database ... 131543 files and directories currently installed.)
Preparing to unpack .../xrt_201802.2.1.0_16.04-xrt.deb ...
Unpacking xrt (2.1.0) ...
Setting up xrt (2.1.0) ...
Registering new XRT Linux kernel module sources 2.1.0 with dkms
Creating symlink /var/lib/dkms/xrt/2.1.0/source ->
/usr/src/xrt-2.1.0
DKMS: add completed.
Building XRT Linux kernel modules sources with dkms
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area....
cd driver/xclng/drm/xocl; make; cd ../../../.......
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/xrt.0.crash'
Error! Build of xocl.ko failed for: 4.4.0-1114-aws (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/xrt/2.1.0/build/ for more information.
Installing XRT Linux kernel modules sources with dkms
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area....
cd driver/xclng/drm/xocl; make; cd ../../../.......
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/xrt.0.crash'
Error! Build of xocl.ko failed for: 4.4.0-1114-aws (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/xrt/2.1.0/build/ for more information.
Loading new XRT Linux kernel modules
modprobe: FATAL: Module xclmgmt not found in directory /lib/modules/4.4.0-1114-aws
modprobe: FATAL: Module xocl not found in directory /lib/modules/4.4.0-1114-aws
The crash file mentioned is:
root@ip-172-31-4-181:~/src# more /var/crash/xrt.0.crash
ProblemType: Package
DKMSBuildLog:
DKMS make.log for xrt-2.1.0 for kernel 4.4.0-1114-aws (x86_64)
Thu Oct 8 01:19:28 UTC 2020
cd userpf; make all
make[1]: Entering directory '/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf'
echo /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf
/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf
make -C /lib/modules/4.4.0-1114-aws/build M=/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf modules
make[2]: Entering directory '/usr/src/linux-headers-4.4.0-1114-aws'
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../xocl_subdev.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../xocl_ctx.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../xocl_thread.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mm_xdma.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/feature_rom.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mm_qdma.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mb_scheduler.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/mailbox.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/xvc.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/icap.o
CC [M] /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.o
/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.c:689:12: error: static declaration of ‘stream_open’ follows non-static declaration
static int stream_open(struct inode **inode, struct file **file)
^
In file included from include/linux/dma-buf.h:32:0,
from /var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.c:19:
include/linux/fs.h:2744:12: note: previous declaration of ‘stream_open’ was here
extern int stream_open(struct inode ** inode, struct file ** filp);
^
scripts/Makefile.build:285: recipe for target '/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.o' failed
make[3]: *** [/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.o] Error 1
Makefile:1471: recipe for target '_module_/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf' failed
make[2]: *** [_module_/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-4.4.0-1114-aws'
Makefile:53: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf'
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 2
DKMSKernelVersion: 4.4.0-1114-aws
Date: Thu Oct 8 01:19:35 2020
DuplicateSignature: dkms:xrt:2.1.0:/var/lib/dkms/xrt/2.1.0/build/driver/xclng/drm/xocl/userpf/../subdev/str_qdma.c:689:12: error: static declaration of ‘stream_open’ follows non-static declaration
Package: xrt 2.1.0
PackageVersion: 2.1.0
SourcePackage: xrt
Title: xrt 2.1.0: xrt kernel module failed to build
Any ideas?
Cheers
Greg
Hi Deep
OK, I found a patch for that problem in the forums and finished the compilation. It all works fine now.
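The thread does not say which patch was used. Going by the compiler error above, the driver's static stream_open in str_qdma.c clashes with the stream_open that the kernel headers declare in include/linux/fs.h, so one plausible fix is to rename the driver's function. The sketch below shows that rename with GNU sed (as shipped on Ubuntu); the new name xocl_stream_open is hypothetical, and the result is written to stdout so the original file is left untouched.

```shell
#!/bin/sh
# Sketch: rename the driver's stream_open to avoid the clash with the kernel's
# stream_open. Assumes GNU sed; the name xocl_stream_open is hypothetical.

rename_stream_open() {
    # \< and \> match word boundaries, so only the whole identifier
    # "stream_open" is replaced and the result is safe to re-run.
    sed 's/\<stream_open\>/xocl_stream_open/g' "$1"
}

# Usage: sh rename.sh path/to/str_qdma.c > str_qdma.c.patched
if [ "$#" -gt 0 ]; then
    rename_stream_open "$1"
fi
```

After reviewing the output and replacing the file under the DKMS source tree (the install log above points at /usr/src/xrt-2.1.0), the module would need to be rebuilt, for example with dkms build and dkms install for xrt 2.1.0.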
Thanks again for your help
Cheers
Greg