Question regarding f1.4xlarge instance


Hi Deep,
I have launched an f1.4xlarge instance; the f1.4xlarge has two FPGA cards, whereas the f1.2xlarge has just one.

  1. If I run just one application, which FPGA card will be used by default? Or should I assign one FPGA to it, and if so, how?

  2. If I have multiple applications, how do I control which application is assigned to which FPGA card?

  3. I have already installed Ubuntu and Vitis, then built and installed XRT (both xrt.deb and aws-xrt.deb). Running xbutil scan reports one device, which is not usable:

xbutil scan
INFO: Found total 1 card(s), 0 are usable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
System Configuration
OS name:	Linux
Release:	4.15.0-1063-aws
Version:	#67-Ubuntu SMP Mon Mar 2 07:24:29 UTC 2020
Machine:	x86_64
Model:		HVM domU
CPU cores:	16
Memory:		245842 MB
Glibc:		2.27
Distribution:	Ubuntu 18.04.4 LTS
Now:		Mon Mar 30 09:12:25 2020
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
XRT Information
Version:	2.6.0
Git Hash:	0be8f75ca7e8a676ae5d385f453636c11567d584
Git Branch:	master
Build Date:	2020-03-30 08:22:33
XOCL:		2.6.0,0be8f75ca7e8a676ae5d385f453636c11567d584
XCLMGMT:	unknown
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*[0] 0000:00:1d.0 xilinx_aws-vu9p-f1_dynamic_5_0(ID=0xabcd) user(inst=129)
WARNING: card(s) marked by '*' are not ready, is MPD runing? run 'systemctl status mpd' to check MPD details.

Here is the MPD journal log (starting from reboot; I then stopped and restarted mpd).

-- Reboot --
Mar 30 08:50:03 ip-172-31-19-142 systemd[1]: Started Xilinx Management Proxy Daemon (MPD).
Mar 30 08:50:04 ip-172-31-19-142 mpd[1573]: started
Mar 30 08:50:04 ip-172-31-19-142 mpd[1573]: aws: load default afi to 0000:00:1b.0
Mar 30 08:50:04 ip-172-31-19-142 mpd[1573]: found mpd plugin: /opt/xilinx/xrt/lib/libmpd_plugin.so
Mar 30 09:18:05 ip-172-31-19-142 mpd[1573]: aws: load default afi to 0000:00:1d.0
Mar 30 09:19:10 ip-172-31-19-142 mpd[1573]: mpd caught signal 15
Mar 30 09:19:10 ip-172-31-19-142 systemd[1]: Stopping Xilinx Management Proxy Daemon (MPD)...
Mar 30 09:20:40 ip-172-31-19-142 systemd[1]: mpd.service: State 'stop-sigterm' timed out. Killing.
Mar 30 09:20:40 ip-172-31-19-142 systemd[1]: mpd.service: Killing process 1573 (mpd) with signal SIGKILL.
Mar 30 09:20:40 ip-172-31-19-142 systemd[1]: mpd.service: Main process exited, code=killed, status=9/KILL
Mar 30 09:20:40 ip-172-31-19-142 systemd[1]: mpd.service: Failed with result 'timeout'.
Mar 30 09:20:40 ip-172-31-19-142 systemd[1]: Stopped Xilinx Management Proxy Daemon (MPD).
Mar 30 09:20:56 ip-172-31-19-142 systemd[1]: Started Xilinx Management Proxy Daemon (MPD).
Mar 30 09:20:56 ip-172-31-19-142 mpd[9324]: started
Mar 30 09:20:56 ip-172-31-19-142 mpd[9324]: found mpd plugin: /opt/xilinx/xrt/lib/libmpd_plugin.so
Mar 30 09:20:59 ip-172-31-19-142 mpd[9324]: aws: load default afi to 0000:00:1b.0

BTW, when I source vitis_runtime_setup.sh, it reports the errors below. Does that matter?
Note that I built the XRT package myself, then installed xrt.deb followed by aws-xrt.deb.

WARNING: 4.15.0-1063-aws does not match one of recommended kernel versions
3.10.0-862.11.6.el7.x86_64
3.10.0-693.21.1.el7.x86_64
3.10.0-957.1.3.el7.x86_64
3.10.0-957.5.1.el7.x86_64
3.10.0-957.27.2.el7.x86_64
3.10.0-1062.4.1.el7.x86_64
3.10.0-1062.9.1.el7.x86_64
WARNING: Xilinx Runtime not validated against your installed kernel version.
INFO: Xilinx Vivado version is 2019.2
INFO: XRT installed. proceeding to check version compatibility
INFO: Installed XRT version : 2019.2:0be8f75ca7e8a676ae5d385f453636c11567d584
ERROR: 2019.2:0be8f75ca7e8a676ae5d385f453636c11567d584 does not match recommended versions

Thanks a lot!

Edited by: macleonsh on Mar 30, 2020 1:59 AM

asked 4 years ago · 196 views
4 Answers

Hi macleonsh,

This seems like an issue with MPD getting a SIGTERM when trying to load the default AFI.
Could you specify the region you are launching this instance in?

In the meantime, I'll see if Xilinx can help out with the error message.

The vitis_runtime_setup warning should not matter; you can add your version to the supported list and move forward with your testing:

echo >> $AWS_FPGA_REPO_DIR/Vitis/vitis_xrt_version.txt    # make sure the file ends with a newline

echo "2019.2:0be8f75ca7e8a676ae5d385f453636c11567d584" >> $AWS_FPGA_REPO_DIR/Vitis/vitis_xrt_version.txt

source $AWS_FPGA_REPO_DIR/vitis_setup.sh    # should now complete without error

-Deep

Deep_P
answered 4 years ago

Hi macleonsh,

Are you running this instance in a China region by any chance?

-Deep

Deep_P
answered 4 years ago

Yes, I am using f1.4xlarge in a China region.
That reminds me: Brian Xu from Xilinx previously helped identify a similar issue. In older XRT code for AWS deployments, a default AFI needs to be loaded, but if that default AFI is not accessible, the load hangs.
This can be worked around by manually loading any accessible AFI, so I did the same thing:

  • sudo systemctl stop mpd
  • sudo fpga-load-local-image -S 0 -I agfi-0fcf87119b8e97bf3
  • sudo fpga-load-local-image -S 1 -I agfi-0fcf87119b8e97bf3
  • sudo systemctl start mpd

Now it looks OK: xbutil scan finds both FPGA cards.

 [0] 0000:00:1d.0 xilinx_aws-vu9p-f1_dynamic_5_0(ID=0xabcd) user(inst=129)
 [1] 0000:00:1b.0 xilinx_aws-vu9p-f1_dynamic_5_0(ID=0xabcd) user(inst=128)

Why do you require a default AFI to be loaded? This now looks like it is part of XRT; in that case, the default AFI must be accessible from all regions.

BTW, can you help with my initial two questions, i.e. how can I make an application use a specific FPGA slot?

answered 4 years ago

Hi macleonsh,

Deep asked me to help answer your questions.
A1: If you run just one application, it has access to both FPGAs. There is an OpenCL API that returns the IDs of both FPGAs; it is then up to the application itself to determine which FPGA a kernel will run on.
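As a rough sketch of what that host-side selection can look like (my illustration, not code from this thread; the device-index-to-PCIe-slot mapping is an assumption to verify against your xbutil scan output, and the xclbin-loading step is elided):

/* pick_fpga.c -- enumerate the Xilinx OpenCL platform and pick one FPGA.
 * Build with something like: gcc pick_fpga.c -lOpenCL
 * NOTE: which index maps to which PCIe slot is an assumption here. */
#include <CL/cl.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);

    /* Find the Xilinx platform among all installed OpenCL platforms. */
    cl_platform_id xilinx = NULL;
    for (cl_uint i = 0; i < nplat; ++i) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        if (strstr(name, "Xilinx")) { xilinx = platforms[i]; break; }
    }
    if (!xilinx) { fprintf(stderr, "no Xilinx platform found\n"); return 1; }

    /* On an f1.4xlarge this should report two accelerator devices. */
    cl_device_id devices[2];
    cl_uint ndev = 0;
    clGetDeviceIDs(xilinx, CL_DEVICE_TYPE_ACCELERATOR, 2, devices, &ndev);
    printf("found %u FPGA device(s)\n", ndev);

    /* Create the context on device 0 (or 1). Every queue, program
     * (xclbin) and kernel created from this context then targets
     * that FPGA only. */
    cl_int err = 0;
    cl_context ctx = clCreateContext(NULL, 1, &devices[0], NULL, NULL, &err);
    /* ... clCreateProgramWithBinary(ctx, ...) with your xclbin, run kernels ... */
    clReleaseContext(ctx);
    return 0;
}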
A2: If you have two applications, you can run each of them in a container (or a pod managed by Kubernetes, which amounts to the same thing) and assign one FPGA to each container, as sketched just below.
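A minimal sketch of that per-container assignment (my illustration, not an official recipe: the image names are placeholders, XRT is assumed to be installed inside the images, and the device paths assume the xocl instance numbers from your xbutil scan map to /dev/dri/renderD<inst> nodes):

docker run --device /dev/dri/renderD128 my-app-image-a    # sees only the FPGA at 0000:00:1b.0 (user inst=128)
docker run --device /dev/dri/renderD129 my-app-image-b    # sees only the FPGA at 0000:00:1d.0 (user inst=129)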
If you have more applications, still only two of them can have FPGA access, since so far there is no FPGA hardware-level virtualization support. You can make an app with FPGA access run as a server, have all other apps that want acceleration run as clients, and proxy the client requests to the server.
Here is an example of how to do that on AWS F1:
https://github.com/xuhz/k8s-xilinx-fpga-device-plugin/blob/master/aws-readme.md
I create a one-node Kubernetes cluster on an AWS F1 node and launch a server pod as a service with FPGA access, plus a client pod without FPGA access. Whenever the client sends a request to the server, the server pod runs a hello-world on the FPGA and sends back the output as the response.

You may not be able to try that example on your F1 instance, since I believe the AFI used in the server pod's docker image is not accessible in the China regions.

brianx
answered 4 years ago
