5.5.3.2.3. cFp Harris Corner Detection

Note: This HTML section is rendered based on the Markdown file in cFp_Zoo.

This is the implementation of the common Harris corner detector (Harris) on the cloudFPGA platform. The Harris IP is provided by the open-source Xilinx® Vitis™ Vision library, a fundamental library that aims to provide a comprehensive FPGA acceleration library for computer vision algorithms.

Overview of Vitis Vision Harris Corner Detector

5.5.3.2.3.1. Repository and environment setup

git clone --recursive git@github.com:cloudFPGA/cFp_Zoo.git
cd cFp_Zoo
source ./env/setenv.sh

5.5.3.2.3.2. Integration of Vitis Vision Harris with cloudFPGA

The following figure shows how straightforward it is to integrate a function from the Vitis libraries into a cloudFPGA project.

Overview of Vitis Vision Harris dataflow

Since most of the Vitis libraries (at the L1 level) are offered with an AXI stream I/F in dataflow mode, the most natural approach to connect them to cF is to wrap this I/F with another I/F that takes care of carefully feeding the Harris IP from the network and sending the results back. For cFp_Zoo we use the Themisto Shell, which is already equipped with a network streaming I/F for the user application. A small FSM takes care of the data casting between the network and AXI streams, as sketched below.
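
A minimal sketch of such a wrapper FSM is shown below. This is not the actual cFp_Zoo code: the type and port names (NetworkWord, NetworkMeta, pRxPath, ...) are illustrative assumptions, not the real cFDK identifiers.

    // Illustrative sketch: a small two-state FSM that casts the Themisto
    // network stream into the plain AXI stream that feeds the Harris IP.
    #include <hls_stream.h>
    #include <ap_int.h>

    struct NetworkWord {             // simplified 64-bit network word
        ap_uint<64> tdata;
        ap_uint<8>  tkeep;
        ap_uint<1>  tlast;
    };
    struct NetworkMeta {             // simplified routing info (src/dst node, port)
        ap_uint<32> route;
    };

    enum RxFsmState { WAIT_FOR_META, STREAM_DATA };

    void pRxPath(hls::stream<NetworkMeta>  &siMeta,
                 hls::stream<NetworkWord>  &siData,
                 hls::stream<ap_uint<64> > &soHarris)
    {
    #pragma HLS PIPELINE II=1
        static RxFsmState fsmState = WAIT_FOR_META;

        switch (fsmState) {
        case WAIT_FOR_META:
            // consume the routing information of the next packet first
            if (!siMeta.empty()) {
                siMeta.read();
                fsmState = STREAM_DATA;
            }
            break;
        case STREAM_DATA:
            // cast every network word into the plain data word the Harris IP expects
            if (!siData.empty() && !soHarris.full()) {
                NetworkWord word = siData.read();
                soHarris.write(word.tdata);
                if (word.tlast)
                    fsmState = WAIT_FOR_META;
            }
            break;
        }
    }

The transmit path does the opposite casting, packing the Harris results back into network words before they leave the ROLE.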

5.5.3.2.3.3. Harris Simulation

The testbench of Harris is highlighted below:

Overview of Vitis Vision Harris Testbench

The testbench is offered in two flavors:

  • HLS TB: The testbench of the C++/RTL. This is a typical Vivado HLS testbench, but it also covers testing the Harris IP when it is wrapped in a cF Themisto Shell.

  • Host TB: This includes the testing of a host application (C++) that sends/receives images over Ethernet (TCP/UDP) with a cF FPGA. This testbench establishes a socket-based connection with an intermediate listener, which in turn calls the previous testbench. So, practically, the 2nd TB is a wrapper of the 1st TB, but it passes the I/O data over socket streams. The system command inside the Host TB that calls the HLS TB is shown in the Harris Host Testbench section below.

This folder contains the mandatory files to proceed with the 1st option, i.e. the HLS TB.

Basic files/module for the HLS TB:

  1. test_harris.cpp: The typical Vivado HLS testbench of Harris IP, when this is wrapped in a Themisto Shell.

  2. Themisto Shell: The SHELL-ROLE architecture of cF.

  3. cFp_Zoo: The project that bridges Vitis libraries with cF.

5.5.3.2.3.3.1. Harris image size

The maximum image size for which the Harris IP is configured is defined in https://github.com/cloudFPGA/cFp_Zoo/blob/master/HOST/vision/harris/languages/cplusplus/include/config.h through the FRAME_HEIGHT and FRAME_WIDTH definitions. These definitions have an impact on the FPGA resources. In the following simulations, if the provided image has different dimensions, the cv::resize function is used to scale the image to FRAME_HEIGHT x FRAME_WIDTH (see the sketch after the note below).

Note: Remember to run make clean every time you change those definitions.
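
For illustration, this is roughly how the host or testbench side adapts an arbitrary input to the configured frame size (a minimal sketch: FRAME_WIDTH, FRAME_HEIGHT and cv::resize come from the paragraph above, while the prepareFrame() helper itself is hypothetical):

    // Scale an arbitrary input image to the dimensions the FPGA was built for.
    // FRAME_WIDTH / FRAME_HEIGHT are the definitions from config.h.
    #include <opencv2/opencv.hpp>
    #include "config.h"

    cv::Mat prepareFrame(const cv::Mat &input)
    {
        cv::Mat frame = input;
        if (input.cols != FRAME_WIDTH || input.rows != FRAME_HEIGHT) {
            // keep the FPGA configuration fixed and adapt the image instead
            cv::resize(input, frame, cv::Size(FRAME_WIDTH, FRAME_HEIGHT));
        }
        return frame;
    }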

5.5.3.2.3.3.2. Run simulation

HLS TB

cd ./ROLE/vision/hls/harris_app
make fcsim -j 4  # to run simulation using your system's gcc (with 4 threads)
make csim   # to run simulation using Vivado's gcc
make cosim  # to run co-simulation using Vivado

Optional steps

cd ./ROLE/vision/hls/harris_app
make callgraph # to run fcsim and then execute the binary in Valgrind's callgraph tool
make kcachegrind # to run callgraph and then view the output in the KCachegrind tool
make memcheck # to run fcsim and then execute the binary in Valgrind's memcheck tool (to inspect memory leaks)

5.5.3.2.3.4. Harris Synthesis

Since the cFDK currently supports only Vivado (HLS) 2017.4, we follow a two-step synthesis procedure. First we synthesize the Themisto SHELL with Vivado (HLS) 2017.4, and then we synthesize the rest of the project (including P&R and bitgen) with Vivado (HLS) > 2019.1.

5.5.3.2.3.4.1. The Harris IP

This is only for the HLS of the Harris IP (e.g. to check synthesizability):

cd cFp_Zoo/ROLE/vision/hls
make harris # with Vivado HLS >= 2019.1

or

cd cFp_Zoo/ROLE/vision/hls/harris
make csynth # with Vivado HLS >= 2019.1

or

cd cFp_Zoo/ROLE/vision/hls/harris
vivado_hls -f run_hls.tcl # with Vivado HLS >= 2019.1

5.5.3.2.3.4.2. The Themisto SHELL

cd cFp_Zoo/cFDK/SRA/LIB/SHELL/Themisto
make all # with Vivado HLS == 2019.1

5.5.3.2.3.4.3. The complete cFp_Zoo

cd cFp_Zoo
make monolithic # with Vivado HLS >= 2019.1

More info for the Harris IP: https://xilinx.github.io/Vitis_Libraries/vision/api-reference.html#harris-corner-detection

5.5.3.2.3.4.4. Troubleshooting

  • Error: Vivado libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /lib64/libtbb.so.2)

    Fix: cp /usr/lib64/libstdc++.so.6 /tools/Xilinx/Vivado/2020.1/lib/lnx64.o/Default/libstdc++.so.6

  • Error: /lib64/libtbb.so.2: undefined reference to `__cxa_init_primary_exception@CXXABI_1.3.11'
    /lib64/libtbb.so.2: undefined reference to `std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11'

    Fix: csim_design -ldflags "-L/usr/lib/gcc/x86_64-redhat-linux/8/ ${OPENCV_LIB_FLAGS} ${OPENCV_LIB_REF}" -clean -argv "${SimFile}"

5.5.3.2.3.5. Harris Host Testbench

Note: This HTML section is rendered based on the Markdown file in cFp_Zoo.

The testbench of Harris is highlighted below:

Overview of Vitis Vision Harris Testbench

The testbench is offered in two flavors:

  • HLS TB: The testbench of the C++/RTL. This is a typical Vivado HLS testbench, but it also covers testing the Harris IP when it is wrapped in a cF Themisto Shell.

  • Host TB: This includes the testing of a host application (C++) that sends/receives images over Ethernet (TCP/UDP) with a cF FPGA. This testbench establishes a socket-based connection with an intermediate listener, which in turn calls the previous testbench. So, practically, the 2nd TB is a wrapper of the 1st TB, but it passes the I/O data over socket streams. For example, this is the system command inside the Host TB that calls the HLS TB:

    // Calling the actual TB over its typical makefile procedure, but passing the save file
    string str_command = "cd ../../../../ROLE/vision/hls/harris/ && " + clean_cmd + "\
    INPUT_IMAGE=./test/input_from_udp_to_fpga.png " + exec_cmd + " && \
    cd ../../../../HOST/vision/harris/build/ ";
    const char *command = str_command.c_str();
    cout << "Calling TB with command:" << command << endl;
    system(command);
    

This folder contains the mandatory files to proceed with the 2nd option, i.e. the Host TB.

Basic files/modules for the Host TB:

  1. harris_host.cpp: The end-user application. This is the application that a user can execute on an x86 host to send an image to the FPGA for processing with the Harris Corner Detector algorithm. This file is part of both the HLS TB and the Host TB.

  2. harris_host_fwd_tb.cpp: The intermediate listener for socket connections from an end-user application. This file is part of the Host TB only.

  3. test_harris.cpp: The typical Vivado HLS testbench of Harris IP, when this is wrapped in a Themisto Shell.

  4. Themisto Shell: The Themisto SHELL-ROLE architecture of cF.

  5. cFp_Zoo: The project that bridges Vitis libraries with cF.

# Compile sources
cd ./HOST/vision/harris
mkdir build && cd build
cmake ../
make -j 2

# Start the intermediate listener
# Usage: ./harris_host_fwd_tb <Server Port> <optional simulation mode>
./harris_host_fwd_tb 1234 0

# Start the actual user application on host
# Open another terminal and prepare env
cd cFp_Zoo
source ./env/setenv.sh
cd ./HOST/vision/harris/build
# Usage: ./harris_host <Server> <Server Port> <optional input image>
./harris_host localhost 1234 ../../../../ROLE/vision/hls/harris/test/8x8.png

# What happens: the user application (harris_host) sends an input image file to the
# intermediate listener (harris_host_fwd_tb) through a socket. The latter receives the payload
# and reconstructs the image. It then calls the HLS TB, first compiling the HLS TB files. The
# opposite data flow brings the results back and reconstructs the FPGA output image.
# You should expect the output in the file <optional input image>_fpga_out_frame_#.png
eog ../../../../ROLE/vision/hls/harris/test/8x8.png_fpga_points_out_frame_1.png

5.5.3.2.3.6. Harris cF End-to-End Demo

TODO: Flash a cF FPGA node with the generated bitstream and note down the IP address of this FPGA node, e.g. 10.12.200.153, and the port, e.g. 2718.

cd ./ROLE/vision/host/harris
mkdir build && cd build
cmake ../
make -j 2
cd cFp_Zoo/ROLE/vision/host/harris/build
# Usage: ./harris_host <Server> <Server Port> <optional input image>
./harris_host 10.12.200.153 2718 ../../../../ROLE/vision/hls/harris/test/8x8.png
# You should expect the output in the file <optional input image>_fpga_out_frame_#.png
eog ../../../../ROLE/vision/hls/harris/test/8x8.png_fpga_points_out_frame_1.png

NOTE: The cFp_Zoo ROLE (FPGA part) is equipped with both the UDP and TCP offload engines. On the host, to select one over the other, you simply need to change the define #define NET_TYPE udp (choose either udp or tcp) in the config.h file, as shown below.
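
For reference, the relevant excerpt of config.h looks roughly like this (only the NET_TYPE define is the documented switch; the comment is illustrative):

    // HOST/vision/harris/languages/cplusplus/include/config.h (excerpt)
    #define NET_TYPE udp   // change to tcp to talk to the TCP offload engine instead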

5.5.3.2.3.7. Harris useful commands

  • Editing videos for input to the Harris example:

    ffmpeg -i The_Mast_Walk_by_Alex_Thomson.mp4 -ss 00:00:39 -t 00:00:17 -async 1 -strict -2 -c copy cut.mp4
    ffmpeg -i cut.mp4 -filter:v "crop=720:720:200:20" -strict -2 cut_720x720.mp4

5.5.3.2.3.8. Working with ZYC2

All communication goes over the UDP/TCP port 2718. Hence, the CPU should run:

$ ./harris_host <Server> <Server Port> <optional input image>

The packets are sent from the Host (CPU) Terminal 1 to the FPGA and received back in the same terminal by a single host application using the sendTo() and receiveFrom() socket methods (see the sketch below).
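
A minimal sketch of this single-socket send-then-receive pattern with plain POSIX calls is shown below; the actual harris_host wraps this logic in its own sendTo()/receiveFrom() helpers, so the function and variable names here are illustrative:

    // Send one chunk of the image to the FPGA and wait for the processed chunk
    // to come back on the same UDP socket (illustrative sketch only).
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdint>
    #include <vector>

    void sendAndReceive(const char *fpgaIp, uint16_t port,
                        const std::vector<uint8_t> &txChunk,
                        std::vector<uint8_t> &rxChunk)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);

        sockaddr_in fpga{};
        fpga.sin_family = AF_INET;
        fpga.sin_port   = htons(port);               // e.g. 2718
        inet_pton(AF_INET, fpgaIp, &fpga.sin_addr);  // e.g. the FPGA node IP

        // send the payload towards the FPGA ...
        sendto(sock, txChunk.data(), txChunk.size(), 0,
               reinterpret_cast<sockaddr *>(&fpga), sizeof(fpga));

        // ... and block until the result comes back on the same socket
        rxChunk.resize(txChunk.size());
        socklen_t len = sizeof(fpga);
        recvfrom(sock, rxChunk.data(), rxChunk.size(), 0,
                 reinterpret_cast<sockaddr *>(&fpga), &len);

        close(sock);
    }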

For more details, tcpdump -i <interface> -nn -s0 -vv -X port 2718 could be helpful.

The Role can be replicated to many FPGA nodes in order to create a processing pipeline. The destination of the packets is determined by the node_id/node_rank and cluster_size (VHDL ports piFMC_ROLE_rank and piFMC_ROLE_size).

The Role can be configured to always forward the packet to (node_rank + 1) % cluster_size (for both UDP and TCP packets), so this example actually also works with more or fewer than two FPGAs, as sketched below. Currently, the default example supports one CPU node and one FPGA node.
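
A trivial sketch of that forwarding rule (the nextHop() function name is illustrative; node_rank and cluster_size correspond to the piFMC_ROLE_rank and piFMC_ROLE_size ports mentioned above):

    // Next-hop selection in a pipeline of cluster_size nodes: every node
    // forwards its result to the next rank, wrapping around at the end.
    unsigned nextHop(unsigned node_rank, unsigned cluster_size)
    {
        return (node_rank + 1) % cluster_size;
    }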