ArrayFire-rb has been interfaced with NMatrix so that the two libraries can work together. Copying data from GPU memory to CPU memory and vice versa can sometimes take longer than the computation itself, so many developers prefer to reserve GPU computing for the most time-consuming tasks. The interface between ArrayFire-rb and NMatrix gives developers the freedom to use whichever library suits each step.
This post explains how the NMatrix and ArrayFire-rb interface has been created. It starts with adding nmatrix as a dependency of arrayfire in the gemspec:
gem.add_development_dependency 'nmatrix', '~> 0.2.1'
Next, we make the nmatrix.h header file available to the build so that arrayfire can call nmatrix's C/C++ methods. In extconf.rb, we locate the installed nmatrix gem and add its header directory to the include path:
# Locate the installed nmatrix gem so its headers can be included.
nmatrix_path = Gem::Specification.find_all_by_name('nmatrix').compact
abort "Cannot locate NMatrix installation" if nmatrix_path.empty?
nmatrix_header_dir = File.join(nmatrix_path[0].require_path)

HEADER_DIRS = [
  '/opt/local/include',
  '/usr/local/include',
  INCLUDEDIR,
  '/usr/include',
  nmatrix_header_dir
]

LIB_DIRS = [
  '/opt/local/lib',
  '/usr/local/lib',
  LIBDIR,
  '/usr/lib',
  nmatrix_header_dir
]

dir_config(extension_name, HEADER_DIRS, LIB_DIRS)

# Link against the ArrayFire C library.
have_library('af')
At the C level, the NMatrix Ruby class is denoted by cNMatrix. Since cNMatrix is originally declared in nmatrix.h, we declare it as a VALUE inside an extern "C++" block, as we are using the C++ interface. Next, we create the Ruby bindings: Af_Array gets a #to_nmatrix method, and rb_define_class("NMatrix", rb_cObject) gives us a handle on the NMatrix class so that #to_af_array can be attached to it.
extern "C++" {
VALUE cNMatrix;
}
void Init_arrayfire() {
ArrayFire = rb_define_module("ArrayFire");
Af_Array = rb_define_class_under(ArrayFire, "Af_Array", rb_cObject);
rb_define_method(Af_Array, "to_nmatrix", (METHOD)arf_af_array_to_nmatrix, 0);
cNMatrix = rb_define_class("NMatrix", rb_cObject);
rb_define_method(cNMatrix, "to_af_array", (METHOD)arf_nmatrix_to_af_array_method, 0);
}
The C functions have been implemented as shown in the following code snippets. The nm namespace holds the NMatrix types and structs.
Af_Array#to_nmatrix is called to convert an Af_Array object into an NMatrix object. The #to_nmatrix method calls arf_af_array_to_nmatrix, which is responsible for calling the NMatrix C++ APIs. rb_nmatrix_dense_create() is the NMatrix API for creating a dense NMatrix object, and nm::FLOAT64 is the dtype of the resulting NMatrix.
// ext/mri/interface/nmatrix.c
static VALUE arf_af_array_to_nmatrix(VALUE self) {
  afstruct* input;
  Data_Get_Struct(self, afstruct, input);

  // Query the number of dimensions and the size of each dimension.
  uint ndims;
  af_get_numdims(&ndims, input->carray);
  dim_t dims[4];
  af_get_dims(&dims[0], &dims[1], &dims[2], &dims[3], input->carray);

  // NMatrix expects the shape as an array of size_t.
  size_t* shape = (size_t*)malloc(ndims * sizeof(size_t));
  for (uint index = 0; index < ndims; index++) {
    shape[index] = (size_t)dims[index];
  }

  // Copy the elements from the device to a host buffer.
  dim_t count;
  af_get_elements(&count, input->carray);
  double* elements = (double*)malloc(count * sizeof(double));
  af_get_data_ptr(elements, input->carray);

  return rb_nmatrix_dense_create(nm::FLOAT64, shape, ndims, elements, (int)count);
}
NMatrix#to_af_array is called to convert an NMatrix object into an Af_Array object. The #to_af_array method calls arf_nmatrix_to_af_array_method, which is responsible for calling the NMatrix C++ APIs. NM_DTYPE() is used to check the dtype of the NMatrix object; currently, only nm::FLOAT64 is supported.
arf_nmatrix_to_af_array is called by arf_nmatrix_to_af_array_method and returns an afstruct, which is then wrapped into an Af_Array object.
// ext/mri/interface/nmatrix.c
afstruct* arf_nmatrix_to_af_array(VALUE nm); // defined below

extern VALUE arf_nmatrix_to_af_array_method(VALUE nmatrix) {
  if (NM_DIM(nmatrix) > 4) {
    rb_raise(rb_eStandardError,
      "NMatrix must not have greater than 4 dimensions.");
  }

  if (NM_DTYPE(nmatrix) == nm::FLOAT64) {
    // Wrap the afstruct returned by arf_nmatrix_to_af_array into an Af_Array object.
    return Data_Wrap_Struct(Af_Array, NULL, arf_free, arf_nmatrix_to_af_array(nmatrix));
  }
  else {
    rb_raise(rb_eStandardError,
      "NMatrix must be of :float64 dtype.");
  }
  return Qnil;
}

afstruct* arf_nmatrix_to_af_array(VALUE nm) {
  DENSE_STORAGE* nmat = NM_STORAGE_DENSE(nm);
  afstruct* output = ALLOC(afstruct);

  if (nmat->dtype != nm::FLOAT64) {
    rb_raise(rb_eStandardError, "requires dtype of :float64 to convert to an Af_Array");
  }

  // ArrayFire expects the shape as an array of dim_t.
  dim_t* shape = (dim_t*)malloc(nmat->dim * sizeof(dim_t));
  for (size_t index = 0; index < nmat->dim; index++) {
    shape[index] = (dim_t)(nmat->shape[index]);
  }

  // af_create_array copies the host elements into a new device array.
  af_create_array(&output->carray, nmat->elements, nmat->dim, shape, f64);
  free(shape);
  return output;
}
Let’s pry it out.
$ rake pry
pry -r './lib/arrayfire.rb'
[1] pry(main)> require 'nmatrix'
=> true
[2] pry(main)> a = ArrayFire::Af_Array.new 2, [4,4], [1, 2, 2, 0, -2, 2 , 1, 3, 1, 4, 3 , 1, 0, -3, 2, 9]
No Name Array
[4 4 1 1]
1.0000 -2.0000 1.0000 0.0000
2.0000 2.0000 4.0000 -3.0000
2.0000 1.0000 3.0000 2.0000
0.0000 3.0000 1.0000 9.0000
=> #<ArrayFire::Af_Array:0x000000019bd6f0>
[3] pry(main)> b = a.to_nmatrix
=>
[
[ 1.0, 2.0, 2.0, 0.0]
[-2.0, 2.0, 1.0, 3.0]
[ 1.0, 4.0, 3.0, 1.0]
[ 0.0, -3.0, 2.0, 9.0]
]
[4] pry(main)> c = b.to_af_array
=> #<ArrayFire::Af_Array:0x00000001a4d340>
[5] pry(main)> c.elements
=> [1.0, 2.0, 2.0, 0.0, -2.0, 2.0, 1.0, 3.0, 1.0, 4.0, 3.0, 1.0, 0.0, -3.0, 2.0, 9.0]
[6] pry(main)> ArrayFire::Util.print_array(c)
No Name Array
[4 4 1 1]
1.0000 -2.0000 1.0000 0.0000
2.0000 2.0000 4.0000 -3.0000
2.0000 1.0000 3.0000 2.0000
0.0000 3.0000 1.0000 9.0000
=> true
It works! NMatrix and ArrayFire can thus be easily interfaced with each other.
Now, let's see how using ArrayFire can be beneficial compared to using NMatrix alone. I benchmarked matrix multiplication for pairs of random matrices of increasing size with the code below.
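The loop relies on shapeArray, iters, and a result hash whose definitions are not shown in the snippet; a minimal setup, with illustrative values I am assuming, could look like this:
require 'benchmark'
require 'nmatrix'
require './lib/arrayfire'

# Illustrative values only; the published numbers were produced with the
# author's own shapes and iteration count.
shapeArray = [[16, 16], [128, 128], [1024, 1024], [4096, 4096]]
iters      = 10
result     = { mat_mult_cpu: [], mat_mult_gpu: [] }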
shapeArray.each do |shape|
  elements1 = Array.new(shape[0] * shape[1]) { rand(1...999999) }
  elements2 = Array.new(shape[0] * shape[1]) { rand(1...999999) }

  cpu_matrix1 = NMatrix.new(shape, elements1, dtype: :float64)
  cpu_matrix2 = NMatrix.new(shape, elements2, dtype: :float64)

  gpu_matrix1 = cpu_matrix1.to_af_array
  gpu_matrix2 = cpu_matrix2.to_af_array

  # Warm up the GPU so that one-time initialization costs are not measured.
  iters.times do
    ArrayFire::BLAS.matmul(gpu_matrix1, gpu_matrix2, :AF_MAT_NONE, :AF_MAT_NONE)
  end

  # Record the wall-clock (real) time of one multiplication on each backend.
  result[:mat_mult_cpu] << [shape[0] * shape[1],
                            Benchmark.measure { cpu_matrix1.dot(cpu_matrix2) }.real]
  result[:mat_mult_gpu] << [shape[0] * shape[1],
                            Benchmark.measure { ArrayFire::BLAS.matmul(gpu_matrix1, gpu_matrix2, :AF_MAT_NONE, :AF_MAT_NONE) }.real]
end
(Note: the benchmarks above were run on an AMD FX 8350 octa-core processor and an Nvidia GTX 750 Ti GPU, using the CUDA backend of ArrayFire with double-precision floats.)
As the matrix size increases, the difference becomes huge: using ArrayFire can speed up the calculation by as much as 7e+5 times.
There are overheads in copying data from CPU to GPU and vice versa, but overall, for large matrices (Big Data), we can gain massive speedups with the right optimizations. I will discuss such optimizations in another blog post.
Since ArrayFire is significantly faster than NMatrix for these workloads, Rubyists can speed up the critical/slower steps of their programs by adding a small chunk of code.
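As a minimal sketch of that pattern, given two existing NMatrix objects a and b (the variable names are mine, and it assumes ArrayFire::BLAS.matmul returns an Af_Array, as the bindings above suggest):
# Hypothetical hot path: push only the expensive multiplication to the GPU.
gpu_result = ArrayFire::BLAS.matmul(a.to_af_array, b.to_af_array,
                                    :AF_MAT_NONE, :AF_MAT_NONE)

# Copy the result back and continue working with NMatrix on the CPU.
cpu_result = gpu_result.to_nmatrix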
Power to Ruby!