If you use the RTL Kernel Wizard to create a kernel with a single scalar input and 6 pointers all mapped to the same AXI MM interface, the generated XML file will look like this:
<?xml version="1.0" encoding="UTF-8"?>
<root versionMajor="1" versionMinor="6">
<kernel name="kernel6ptr" language="ip_c" vlnv="mycompany.com:kernel:kernel6ptr:1.0" attributes="" preferredWorkGroupSizeMultiple="0" workGroupSize="1" interrupt="true">
<ports>
<port name="s_axi_control" mode="slave" range="0x1000" dataWidth="32" portType="addressable" base="0x0"/>
<port name="m00_axi" mode="master" range="0xFFFFFFFFFFFFFFFF" dataWidth="512" portType="addressable" base="0x0"/>
</ports>
<args>
<arg name="scalar00" addressQualifier="0" id="0" port="s_axi_control" size="0x4" offset="0x010" type="uint" hostOffset="0x0" hostSize="0x4"/>
<arg name="axi00_ptr0" addressQualifier="1" id="1" port="m00_axi" size="0x8" offset="0x018" type="int*" hostOffset="0x0" hostSize="0x8"/>
<arg name="axi00_ptr1" addressQualifier="1" id="2" port="m00_axi" size="0x8" offset="0x020" type="int*" hostOffset="0x0" hostSize="0x8"/>
<arg name="axi00_ptr2" addressQualifier="1" id="3" port="m00_axi" size="0x8" offset="0x028" type="int*" hostOffset="0x0" hostSize="0x8"/>
<arg name="axi00_ptr3" addressQualifier="1" id="4" port="m00_axi" size="0x8" offset="0x030" type="int*" hostOffset="0x0" hostSize="0x8"/>
<arg name="axi00_ptr4" addressQualifier="1" id="5" port="m00_axi" size="0x8" offset="0x038" type="int*" hostOffset="0x0" hostSize="0x8"/>
<arg name="axi00_ptr5" addressQualifier="1" id="6" port="m00_axi" size="0x8" offset="0x040" type="int*" hostOffset="0x0" hostSize="0x8"/>
</args>
</kernel>
</root>
The example generated by the RTL Kernel Wizard passed HW emulation when targeting the AWS F1 platform.
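For reference, the register layout in that XML follows a simple pattern: the scalar sits at 0x010 and each 64-bit pointer takes the next 8-byte slot starting at 0x018. A minimal C++ sketch of that arithmetic is below. The constant and function names are made up for illustration and are not part of the wizard output or of any Xilinx API; the values are read off the XML above.

```cpp
#include <cstdint>

// Values taken from the kernel.xml above (hypothetical names).
constexpr std::uint32_t kFirstPtrOffset = 0x18;  // offset of axi00_ptr0
constexpr std::uint32_t kPtrSize        = 0x8;   // size of each pointer argument

// Register offset of pointer argument `index` (0-based) on s_axi_control.
constexpr std::uint32_t ptr_arg_offset(unsigned index) {
    return kFirstPtrOffset + kPtrSize * index;
}
```

With six pointers, `ptr_arg_offset(5)` evaluates to 0x40, which matches the offset of `axi00_ptr5` in the XML.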
Thanks for your reply, but when I try it, the output buffer comes back as all zeros.
Here is the log:
INFO: [ConfigUtil 60-895] Target platform: /home/centos/src/project_data/aws-fpga/SDAccel/aws_platform/xilinx_aws-vu9p-f1-04261818_dynamic_5_0/xilinx_aws-vu9p-f1-04261818_dynamic_5_0.xpfm
emulation configuration file `emconfig.json` is created in current working directory
XCL_EMULATION_MODE=hw_emu ./host
ERROR: xclProbe-scan failed at fpga_pci_get_all_slot_specs
xclProbe found 0 FPGA slots with xocl driver running
Found Platform
Platform Name: Xilinx
XCLBIN File Name: vadd
INFO: Importing xclbin/vadd.hw_emu.xilinx_aws-vu9p-f1-04261818_dynamic_5_0.xclbin
Loading: 'xclbin/vadd.hw_emu.xilinx_aws-vu9p-f1-04261818_dynamic_5_0.xclbin'
INFO: [SDx-EM 01] Hardware emulation runs simulation underneath. Using a large data set will result in long simulation times. It is recommended that a small dataset is used for faster execution. This flow does not use cycle accurate models and hence the performance data generated is approximate.
WARNING: unaligned host pointer '0x23cac20' detected, this leads to extra memcpy
WARNING: unaligned host pointer '0x23cac70' detected, this leads to extra memcpy
WARNING: unaligned host pointer '0x23cacc0' detected, this leads to extra memcpy
WARNING: unaligned host pointer '0x23cad10' detected, this leads to extra memcpy
WARNING: unaligned host pointer '0x23cad60' detected, this leads to extra memcpy
WARNING: unaligned host pointer '0x23cadb0' detected, this leads to extra memcpy
out[0]: 0
out[1]: 0
out[2]: 0
out[3]: 0
out[4]: 0
out[5]: 0
out[6]: 0
out[7]: 0
out[8]: 0
out[9]: 0
out[10]: 0
out[11]: 0
out[12]: 0
out[13]: 0
out[14]: 0
out[15]: 0
INFO: [SDx-EM 22] [Wall clock time: 22:55, Emulation time: 0.00332367 ms] Data transfer between kernel(s) and global memory(s)
BANK0 RD = 0.125 KB WR = 0.062 KB
BANK1 RD = 0.000 KB WR = 0.000 KB
BANK2 RD = 0.000 KB WR = 0.000 KB
BANK3 RD = 0.000 KB WR = 0.000 KB
BANKkrnl_vadd_rtl_1/m_axi_gmem RD = 0.000 KB WR = 0.000 KB
krnl_vadd_rtl_1:m_axi_gmem RD = 0.125 KB WR = 0.062 KB
Here is my host code:
#include "xcl2.hpp"
#include <cstdio>   // printf
#include <cstdlib>  // malloc, free
#include <string>
#include <vector>
int main(int argc, char** argv)
{
int size = 16;
size_t size_bytes = sizeof(int) * size;
int *ibuf_0 = static_cast<int *>(malloc(size_bytes));
int *ibuf_1 = static_cast<int *>(malloc(size_bytes));
int *ibuf_2 = static_cast<int *>(malloc(size_bytes));
int *ibuf_3 = static_cast<int *>(malloc(size_bytes));
int *ibuf_4 = static_cast<int *>(malloc(size_bytes));
int *obuf_0 = static_cast<int *>(malloc(size_bytes));
// Create the test data and Software Result
for(int i = 0 ; i < size ; i++){
ibuf_0[i] = 0xa;
ibuf_1[i] = 0x2;
ibuf_2[i] = 0x9;
ibuf_3[i] = 0x4;
ibuf_4[i] = 0x5;
obuf_0[i] = 0x6;
}
//OPENCL HOST CODE AREA START
//Create Program and Kernel
std::vector<cl::Device> devices = xcl::get_xil_devices();
cl::Device device = devices[0];
cl::Context context(device);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
std::string device_name = device.getInfo<CL_DEVICE_NAME>();
std::string binaryFile = xcl::find_binary_file(device_name,"vadd");
cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
devices.resize(1);
cl::Program program(context, devices, bins);
cl::Kernel krnl_vadd(program,"krnl_vadd_rtl");
//Allocate Buffer in Global Memory
std::vector<cl::Memory> ibuf_vec, obuf_vec;
cl::Buffer ocl_ibuf_0(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, size_bytes, ibuf_0);
cl::Buffer ocl_ibuf_1(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, size_bytes, ibuf_1);
cl::Buffer ocl_ibuf_2(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, size_bytes, ibuf_2);
cl::Buffer ocl_ibuf_3(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, size_bytes, ibuf_3);
cl::Buffer ocl_ibuf_4(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, size_bytes, ibuf_4);
cl::Buffer ocl_obuf_0(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY, size_bytes, obuf_0);
ibuf_vec.push_back(ocl_ibuf_0);
ibuf_vec.push_back(ocl_ibuf_1);
ibuf_vec.push_back(ocl_ibuf_2);
ibuf_vec.push_back(ocl_ibuf_3);
ibuf_vec.push_back(ocl_ibuf_4);
obuf_vec.push_back(ocl_obuf_0);
//Copy input data to device global memory
q.enqueueMigrateMemObjects(ibuf_vec, 0/* 0 means from host*/);
//Set the Kernel Arguments
int nargs = 0;
krnl_vadd.setArg(nargs++, size);
krnl_vadd.setArg(nargs++, ocl_ibuf_0);
krnl_vadd.setArg(nargs++, ocl_ibuf_1);
krnl_vadd.setArg(nargs++, ocl_ibuf_2);
krnl_vadd.setArg(nargs++, ocl_ibuf_3);
krnl_vadd.setArg(nargs++, ocl_ibuf_4);
krnl_vadd.setArg(nargs++, ocl_obuf_0);
//Launch the Kernel
q.enqueueTask(krnl_vadd);
//Copy Result from Device Global Memory to Host Local Memory
q.enqueueMigrateMemObjects(obuf_vec, CL_MIGRATE_MEM_OBJECT_HOST);
q.finish();
//OPENCL HOST CODE AREA END
for (int i = 0 ; i < size ; i++){
printf("out[%d]: %x\n", i, obuf_0[i]);
}
free(ibuf_0);
free(ibuf_1);
free(ibuf_2);
free(ibuf_3);
free(ibuf_4);
free(obuf_0);
return 0;
}
Is there a problem with my host code? It works with 1 scalar and 5 pointers, but it fails with 1 scalar and 6 pointers, that is, as soon as I have to use an offset of 0x40 in the kernel.xml file.
Thanks!
Edited by: xor on Jan 8, 2019 3:29 PM
There doesn't seem to be anything wrong with your host code (at least nothing obvious).
Could the issue be with your RTL code? It is odd that it would work with 5 pointers but not with 6.
As mentioned earlier, the simple example generated by the wizard works fine.
Are you able to run HW emulation in debug mode to look at the RTL waveforms?
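As a side note, the "unaligned host pointer" warnings in your log say only that the runtime makes an extra memcpy, so they are unlikely to explain the zeroed output. If you want to get rid of them, the usual approach is to allocate the host buffers on a page boundary instead of using plain malloc. Below is a minimal sketch, assuming a POSIX host and assuming 4096-byte alignment is what the runtime expects; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Assumed alignment: one 4 KiB page.
constexpr std::size_t kPageSize = 4096;

// Allocates `bytes` bytes on a page boundary.
// Returns nullptr on failure; release the memory with free().
inline void* alloc_page_aligned(std::size_t bytes) {
    assert(bytes > 0 && "allocation size must be non-zero");
    void* ptr = nullptr;
    if (posix_memalign(&ptr, kPageSize, bytes) != 0) {
        return nullptr;
    }
    return ptr;
}

// True if `ptr` starts on a page boundary.
inline bool is_page_aligned(const void* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % kPageSize == 0;
}
```

In your host code, each `static_cast<int *>(malloc(size_bytes))` would then become `static_cast<int *>(alloc_page_aligned(size_bytes))`; the existing `free` calls can stay as they are.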
Hi,
Yes, I ran these configurations successfully:
3-pointers and 1-scalar
4-pointers and 1-scalar
5-pointers and 1-scalar
But once I go for 6-pointers and 1-scalar, it breaks.
How do I do RTL waveforms with hardware emulation? Is there a tutorial or documentation for that?
Thanks!
Enabling RTL waveforms during emulation is covered in the SDAccel documentation:
https://www.xilinx.com/html_docs/xilinx2018_2/sdaccel_doc/device-hardware-transaction-view-nng1504034335037.html
In short:
- When using the SDx GUI, you can enable the waveforms from the Run Configurations settings.
- When working from the command line, you need to add the two lines below to the sdaccel.ini file:
[Emulation]
launch_waveform=gui
The documentation referenced above explains both approaches in greater detail.
Edited by: ThomasXilinx on Jan 9, 2019 11:24 AM
Thanks for the pointer. However, this waveform is more about "device-level transactions" than RTL waveform debugging; it is more like a kernel events timeline. In their own words: "The details include data transfers between the kernel and global memory, data flow via inter-kernel pipes as well as data flow via intra-kernel pipes."
I am checking that to see if I can find anything there. Otherwise, I believe I am going to use the RTL-wizard-->Vivado way to create a testbench for this particular case and see if I can find something.
The default waveform setup will indeed trace data transfers between the kernel and global memory, data flow via inter-kernel pipes as well as data flow via intra-kernel pipes. But in interactive mode, you can also access all the signals in the RTL kernel and add them to the waveform, so you are not limited to the default trace configuration.
That said, in your case the default setup would already let you check whether there are any data transfers related to the 6th pointer.
Please confirm the outcome of using the RTL Kernel Wizard. As mentioned earlier, I tried this yesterday and the generated example worked for me.
You were right; I did not know I could add RTL signals to the waveform in hardware emulation. I finally found my issue: it was the bit width of the address. This is the example I am using.
This example uses 6 bits for the AXI4-Lite control addresses, but the 6th pointer is at offset 0x40, which means that I need 7 bits.
Thanks for all your help!
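For anyone hitting the same problem, the address-width arithmetic can be checked with a few lines of C++. This is only a sketch: the function names are made up for illustration and are not part of the example design or of any Xilinx API, the 8-byte register size comes from the kernel.xml earlier in the thread, and the second function assumes the control interface simply drops address bits above its configured width.

```cpp
#include <cstdint>

// Smallest number of address bits that can reach every byte of a register
// starting at `offset` and `size_bytes` wide (size_bytes must be >= 1).
constexpr unsigned min_addr_bits(std::uint32_t offset, std::uint32_t size_bytes) {
    const std::uint32_t last_byte = offset + size_bytes - 1;
    unsigned bits = 0;
    while ((std::uint64_t{1} << bits) <= last_byte) {
        ++bits;
    }
    return bits;
}

// Address seen by a decoder that keeps only the low `addr_bits` bits
// (addr_bits must be < 32).
constexpr std::uint32_t truncated_addr(std::uint32_t offset, unsigned addr_bits) {
    return offset & ((std::uint32_t{1} << addr_bits) - 1u);
}
```

Here `min_addr_bits(0x38, 8)` is 6, so five pointers still fit in a 6-bit address, while `min_addr_bits(0x40, 8)` is 7. Under the truncation assumption, `truncated_addr(0x40, 6)` is 0x00, so a write meant for the sixth pointer never reaches its own register.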