RbCUDA: Installation and Architecture

RbCUDA is a Ruby library that provides an interface to NVIDIA CUDA, so that computations can be offloaded to GPUs (Graphics Processing Units). This can significantly speed up tasks such as matrix operations, simulations, and data processing by harnessing the parallel processing power of modern GPUs.

The main objectives of RbCUDA are:

  • Map all of CUDA into Ruby.
  • Ready-made on-GPU linear algebra, reduction, and scan using the cuBLAS, cuMath, and cuSolver libraries.
  • Random number generation using cuRand.
  • Near-zero wrapping overhead.
  • A CUDA profiler for Ruby.

This post explains the architecture of RbCUDA and how you can install it on your machine.

Installation

Install CUDA on your machine.

Building RbCUDA from source.

git clone https://github.com/prasunanand/rbcuda
cd rbcuda
bundle install
rake compile

Installing the gem

gem build rbcuda.gemspec
gem install rbcuda-0.0.0.gem

To check whether the installation was successful, start a pry session and query the GPU:

$ rake pry
pry -r './lib/rbcuda.rb'
[1] pry(main)> RbCUDA::CUDA.cuInit(0);
[2] pry(main)> device = RbCUDA::CUDA.cuDeviceGet(0);
[3] pry(main)> puts device
#<RbCUDA::RbCuDevice:0x00000001a9a2d0>
=> nil
[4] pry(main)> puts RbCUDA::CUDA.cuDeviceGetName(100, device);
GeForce GTX 750 Ti

If you can retrieve the name of your GPU card, you are all set.
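The same check can also be scripted instead of typed into pry. The following is a minimal sketch that uses only the three calls shown in the session above. The helper name `gpu_name`, the idea of passing the module in as an argument, and loading the gem with `require 'rbcuda'` are assumptions made for illustration, not part of RbCUDA's documented API.

```ruby
# Hypothetical helper (not part of RbCUDA): returns the name of a GPU
# using the same three driver-API calls as the pry session above.
# `cuda` is expected to respond like RbCUDA::CUDA.
def gpu_name(cuda, ordinal = 0, max_len = 100)
  cuda.cuInit(0)                          # initialise the driver API
  device = cuda.cuDeviceGet(ordinal)      # handle for the chosen device
  cuda.cuDeviceGetName(max_len, device)   # name, limited to max_len characters
end

begin
  require 'rbcuda'                        # assumed require name for the installed gem
  puts gpu_name(RbCUDA::CUDA)
rescue LoadError
  warn 'rbcuda could not be loaded; build and install the gem first'
end
```

Taking the module as a parameter keeps the helper easy to exercise on a machine without a GPU, since any object that responds to the same three methods can stand in for RbCUDA::CUDA.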

Code organisation

extconf.rb, which helps build the shared object file, can be found here.

rbcuda.h defines the C structs that wrap CUDA types so that they can be exposed to Ruby. In the following code, the CUfunction type is wrapped in function_ptr and the CUdevice type in device_ptr.

typedef struct FUNCTION_PTR
{
  CUfunction function;
}function_ptr;

typedef struct DEVICE_PTR
{
  CUdevice device;
}device_ptr;

The struct function_ptr is then wrapped by a Ruby class called RbCuFunction in the file ruby_rbcuda.c.

RbCUDA = rb_define_module("RbCUDA");

VALUE RbCuDevice = Qnil;
VALUE RbCuFunction = Qnil;

RbCuDevice    = rb_define_class_under(RbCUDA, "RbCuDevice",    rb_cObject);
RbCuFunction  = rb_define_class_under(RbCUDA, "RbCuFunction",  rb_cObject);
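For readers more at home in Ruby than in its C API, the two rb_define_class_under calls have roughly the same effect as the plain-Ruby definitions below. This is only an illustration of the resulting class hierarchy. The module is named RbCUDASketch (a hypothetical name) so that it cannot clash with the real extension, and the real classes additionally carry the wrapped C structs, which plain Ruby cannot express.

```ruby
# Plain-Ruby picture of what the C calls above set up.
# RbCUDASketch is a hypothetical stand-in for the real RbCUDA module.
module RbCUDASketch            # rb_define_module("RbCUDA")
  class RbCuDevice; end        # rb_define_class_under(RbCUDA, "RbCuDevice", rb_cObject)
  class RbCuFunction; end      # rb_define_class_under(RbCUDA, "RbCuFunction", rb_cObject)
end

# Both classes live inside the module and inherit directly from Object.
puts RbCUDASketch::RbCuDevice.superclass     # prints "Object"
puts RbCUDASketch.constants.sort.inspect     # prints "[:RbCuDevice, :RbCuFunction]"
```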

Dev Array

Arrays in RbCUDA are handled by the Dev_Array class, which is implemented as follows:

typedef struct DEV_PTR
{
  double* carray;
}dev_ptr;

Dev_Array = rb_define_class_under(RbCUDA, "Dev_Array", rb_cObject);

A Dev_Array stores a pointer to array data that resides on the GPU. Its usage will be explained in the next post.
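Because dev_ptr holds a `double*`, the data behind a Dev_Array is a flat, contiguous run of 8-byte doubles. The sketch below shows that layout on the host side only, using Ruby's built-in `Array#pack`. It does not call RbCUDA, and it makes no claim about how RbCUDA itself copies data to the GPU. The helper names are hypothetical.

```ruby
# Host-side illustration only: no GPU or RbCUDA call is involved.
# Hypothetical helpers showing the flat layout a C `double*` points at.

# Pack Ruby numbers into a binary string of native-endian C doubles.
def to_double_buffer(values)
  values.map(&:to_f).pack('d*')
end

# Read the doubles back out of such a buffer.
def from_double_buffer(buffer)
  buffer.unpack('d*')
end

buffer = to_double_buffer([1, 2.5, -3])
puts buffer.bytesize                      # 3 doubles * 8 bytes each = 24
puts from_double_buffer(buffer).inspect   # prints "[1.0, 2.5, -3.0]"
```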

Functionalities

RbCUDA has the following modules:

  1. CUDA : It consists of low-level APIs called the CUDA driver APIs.
  2. Runtime : It consists of higher-level APIs called the CUDA runtime APIs that are implemented on top of the CUDA driver APIs.
  3. CuBLAS : It consists of BLAS APIs provided by the cuBLAS library.
  4. CuSolver : It consists of APIs provided by the cuSolver library.
  5. CuRand : It consists of APIs provided by the cuRand library.
  6. Profiler : It consists of APIs for profiling CUDA code.
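To see which of these modules a given build actually exposes, one could probe the RbCUDA namespace for the names listed above. This is a sketch under assumptions: the helper `available_modules` is hypothetical, the gem is assumed to load with `require 'rbcuda'`, and the modules are assumed to be defined as constants directly under RbCUDA, as RbCUDA::CUDA is in the pry session.

```ruby
# Module names taken from the list above.
EXPECTED_MODULES = %i[CUDA Runtime CuBLAS CuSolver CuRand Profiler].freeze

# Hypothetical helper: returns the expected module names that are
# defined as constants directly under `namespace`.
def available_modules(namespace)
  EXPECTED_MODULES.select { |name| namespace.const_defined?(name, false) }
end

begin
  require 'rbcuda'                        # assumed require name for the installed gem
  puts available_modules(RbCUDA).inspect
rescue LoadError
  warn 'rbcuda could not be loaded; build and install the gem first'
end
```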

Conclusion

I have explained what the underlying architecture of RbCUDA looks like.

We now have RbCUDA successfully installed on our system. In the next post, I will talk about implementing the Runtime APIs.
