Name	Last Update	Last Commit 9c26431f – removed mongoose History
..
README.md	Loading commit data...
copyfrom.cpp	Loading commit data...
copyfrom.cu	Loading commit data...
copyto.cpp	Loading commit data...
copyto.cu	Loading commit data...
cudaaffine.cpp	Loading commit data...
cudaaffine.cu	Loading commit data...
cudacvtfloat.cpp	Loading commit data...
cudacvtfloat.cu	Loading commit data...
cudadefines.hpp	Loading commit data...
cudal2.cpp	Loading commit data...
cudal2.cu	Loading commit data...
cudalbp.cpp	Loading commit data...
cudalbp.cu	Loading commit data...
cudapca.cpp	Loading commit data...
cudapca.cu	Loading commit data...
cudargb2grayscale.cpp	Loading commit data...
cudargb2grayscale.cu	Loading commit data...
module.cmake	Loading commit data...

README.md

CUDA Plugins

CUDA plugins are very similar to normal plugins. A single plugin is split into two files: the .cpp file with the BR standard plugin definition and the .cu file with your kernel and wrapper functions.

The `.cpp` file

Every main plugin file must have the names of the kernel wrapper functions defined at the top of the program. Once the definitions are there, just call the CUDA functions as you need them

The `.cu` file

All functions within the CUDA file must be declared inside their own namespace under br::cuda. For example the plugin passthrough must have all functions inside it declared under the namespace br::cuda::passthrough.

CPU Template object format

Like any other BR Transform, the plugin must return an object for the next plugin to consume. For performance reasons, we don't copy data to and from the graphics card for every transform. Instead, we use this space to transfer data about how to access the image data and its type. The Mat is an array of data type void*.

Index	Item Name	Type	Description
0	GpuData	void*	Pointer to the graphics card data
1	rows	int	Number of rows in the Mat
2	cols	int	Number of colums in the Mat
3	type	int	OpenCV mat data type code (i.e. `mat.type()`)

It is expected that the wrapper function does the proper GPU memory handling to make sure that the GpuData pointer in the output mat is pointing to the data that the plugin is outputting.

Example: Passthrough

This example plugin takes in input data and passes it straight to the output. The BR transform calls the wrapper function which exists in the CUDA file which in turn calls the kernel routine to copy the data in the GPU.

Note: This program assumes that a previous Transform, namely CUDACopyTo has copied the data to the GPU.

passthrough.cpp

#include <openbr/plugins/openbr_internal.h>
#include <opencv2/opencv.hpp>

// wrapper function within the CUDA file
namespace br { namespace cuda { namespace passthrough {
  void wrapper(void* srcGpuData, void** dstGpuData);
}}};

#include <iostream>
namespace br
{
  class CUDAPassthroughTransform : public UntrainableTransform
  {
    Q_OBJECT

    void project(const Template &src, Template &dst) {
      // extract the parameters out of the Mat passed from the previous plugin
      void* const* srcDataPtr = src.m().ptr<void*>();
      int rows = *((int*)srcDataPtr[1]);
      int cols = *((int*)srcDataPtr[2]);
      int type = *((int*)srcDataPtr[3]);

      // generate a new Mat to be passed to the next plugin
      Mat dstMat = Mat(src.m().rows, src.m().cols, src.m().type());
      void** dstDataPtr = dstMat.ptr<void*>();
      dstDataPtr[1] = srcDataPtr[1];
      dstDataPtr[2] = srcDataPtr[2];
      dstDataPtr[3] = srcDataPtr[3];

      // call the wrapper and set the dst output to the newly created Mat
      br::cuda::passthrough::wrapper(srcDataPtr[0], &dstDataPtr[0], rows, cols);
      dst = dstMat;
    }
  };

  BR_REGISTER(Transform, CUDAPassthroughTransform);
}

#include "cuda/passthrough.moc"

passthrough.cu

#include <opencv2/opencv.hpp>

namespace br { namespace cuda { namespace passthrough {
  __global__ void kernel(char* srcPtr, char* dstPtr, int rows, int cols) {
    // get the current index
    int rowInd = blockIdx.y*blockDim.y+threadIdx.y;
    int colInd = blockIdx.x*blockDim.x+threadIdx.x;

    // don't do anything if we are outside the allowable positions
    if (rowInd >= rows || colInd >= cols)
      return;

    // write the input to the output
    rowDstPtr[rowInd*cols + colInd] = srcVal;
  }

  void wrapper(char* srcPtr, char** dstPtr, int rows, int cols, int type) {
    // verify the proper image type
    if (type != CV_8UC1) {
      cout << "Error: image type not supported"
      return;
    }

    *dstPtr = cudaMalloc(rows*cols*sizeof(char));

    dim3 threadsPerBlock(8, 8);
    dim3 numBlocks(imageWidth / threadsPerBlock.x + 1,
                   imageHeight / threadsPerBlock.y + 1);

    // run the kernel function
    kernel<<<numBlocks, threadPerBlock>>>(srcPtr, dstPtr, rows, cols);

    // free the memory as it isn't used anymore
    cudaFree(srcPtr);
  }
}}}