diff --git a/openbr/plugins/cuda/README.md b/openbr/plugins/cuda/README.md index 4bd35db..45db537 100644 --- a/openbr/plugins/cuda/README.md +++ b/openbr/plugins/cuda/README.md @@ -1,23 +1,122 @@ # CUDA Plugins -This folder contains CUDA-accelerated OpenBR plugins. They are structured in the following format. +CUDA plugins are very similar to normal plugins. A single plugin is split into +two files: the `.cpp` file with the BR standard plugin definition and the `.cu` +file with your kernel and wrapper functions. -## File Structure -We will use a plugin called `CUDAPlugin` as an example. +## The `.cpp` file +Every main plugin file must have the names of the kernel wrapper functions +defined at the top of the program. Once the definitions are there, just call +the CUDA functions as you need them -Each plugin has 3 files associated with it: a CUDA file, CPP file, and HPP header file. +## The `.cu` file +All functions within the CUDA file must be declared inside their own namespace +under `br::cuda`. For example the plugin `passthrough` must have all functions +inside it declared under the namespace `br::cuda::passthrough`. + +## CPU Template object format +Like any other BR Transform, the plugin must return an object for the next +plugin to consume. For performance reasons, we don't copy data to and from +the graphics card for every transform. Instead, we use this space to transfer +data about how to access the image data and its type. The Mat is an array of data type `void*`. + +Index | Item Name | Type | Description +--------|-------------|-----------|------------ +0 | GpuData | void* | Pointer to the graphics card data +1 | rows | int | Number of rows in the Mat +2 | cols | int | Number of colums in the Mat +3 | type | int | OpenCV mat data type code (i.e. `mat.type()`) + +It is expected that the wrapper function does the proper GPU memory handling +to make sure that the GpuData pointer in the output mat is pointing to the +data that the plugin is outputting. + +## Example: Passthrough +This example plugin takes in input data and passes it straight to the output. +The BR transform calls the wrapper function which exists in the CUDA file which +in turn calls the kernel routine to copy the data in the GPU. + +**Note**: This program assumes that a previous Transform, namely `CUDACopyTo` has +copied the data to the GPU. + +### **passthrough.cpp** +```c++ +#include +#include + +// wrapper function within the CUDA file +namespace br { namespace cuda { namespace passthrough { + void wrapper(void* srcGpuData, void** dstGpuData); +}}}; + +#include +namespace br +{ + class CUDAPassthroughTransform : public UntrainableTransform + { + Q_OBJECT + + void project(const Template &src, Template &dst) { + // extract the parameters out of the Mat passed from the previous plugin + void* const* srcDataPtr = src.m().ptr(); + int rows = *((int*)srcDataPtr[1]); + int cols = *((int*)srcDataPtr[2]); + int type = *((int*)srcDataPtr[3]); + + // generate a new Mat to be passed to the next plugin + Mat dstMat = Mat(src.m().rows, src.m().cols, src.m().type()); + void** dstDataPtr = dstMat.ptr(); + dstDataPtr[1] = srcDataPtr[1]; + dstDataPtr[2] = srcDataPtr[2]; + dstDataPtr[3] = srcDataPtr[3]; + + // call the wrapper and set the dst output to the newly created Mat + br::cuda::passthrough::wrapper(srcDataPtr[0], &dstDataPtr[0], rows, cols); + dst = dstMat; + } + }; + + BR_REGISTER(Transform, CUDAPassthroughTransform); +} + +#include "cuda/passthrough.moc" ``` -cudaplugin.cu -cudaplugin.cpp -cudaplugin.hpp + +### **passthrough.cu** +```c++ +#include + +namespace br { namespace cuda { namespace passthrough { + __global__ void kernel(char* srcPtr, char* dstPtr, int rows, int cols) { + // get the current index + int rowInd = blockIdx.y*blockDim.y+threadIdx.y; + int colInd = blockIdx.x*blockDim.x+threadIdx.x; + + // don't do anything if we are outside the allowable positions + if (rowInd >= rows || colInd >= cols) + return; + + // write the input to the output + rowDstPtr[rowInd*cols + colInd] = srcVal; + } + + void wrapper(char* srcPtr, char** dstPtr, int rows, int cols, int type) { + // verify the proper image type + if (type != CV_8UC1) { + cout << "Error: image type not supported" + return; + } + + *dstPtr = cudaMalloc(rows*cols*sizeof(char)); + + dim3 threadsPerBlock(8, 8); + dim3 numBlocks(imageWidth / threadsPerBlock.x + 1, + imageHeight / threadsPerBlock.y + 1); + + // run the kernel function + kernel<<>>(srcPtr, dstPtr, rows, cols); + + // free the memory as it isn't used anymore + cudaFree(srcPtr); + } +}}} ``` -The `.cu` file contains CUDA kernel functions and the corresponding wrapper functions -that directly call the kernel functions. The `.cpp` files contain the OpenBR -standard plugin declaration. Functions in this file call the wrappers. The `.hpp` -contains header declarations for the CUDA wrapper functions so the `.cpp` file -knows how to call them. - -# CUDA Files -All functions for a particular CUDA plugin are defined in a namespace of that -plugin's name which is defined within `br::cuda` namespace. For example, if -we have a plugin called CUDAPlugin, both wrapper and kernel functions should -be globally defined within `br::cuda::cudaplugin`.