12/08/2018, 13:44

Ruby C Extension

Objective & Goal

In this article I will walk you through the basic steps of building a Ruby extension in the C programming language. First we will look at how to configure and set up the basic tools needed for development, and then we will move on to the basic set of C interfaces that Ruby provides for building extensions. Even though this guide only shows how to write a C extension for Ruby, once you get a hold of how things work you can do the reverse, that is use a Ruby library from C, just as easily.

Note that this is not a complete reference to the Ruby C interface. If you want to dig deeper, you will need to consult the official documentation or browse through the C source code of Ruby itself.

Configuration & Tool

Most C programmers use a build tool for dependency and project management in their C projects, unless they are complete masochists who love to punish themselves, and the same is true for building a Ruby extension. There are a lot of build tools available, but the most commonly used one, available out of the box on UNIX/Linux systems, is make. Writing a Makefile for a simple project is easy, but as the project grows and gains dependencies, keeping the Makefile clean can be tedious. Fortunately there is an easy way to generate a Makefile when building a Ruby C extension. Ruby provides a utility library for that job called mkmf, which defines the MakeMakefile module. Now let's take a look at how we can use this library.

require 'mkmf'

create_makefile('foo')

create_makefile takes two arguments. The first one is the required target name, which corresponds to a global function in our C source code. This global function is our main entry point, the init function, which I will talk about later when we reach the Ruby C interface part. If the target contains "/" then only the last part is used as the global function name and the rest is treated as the top-level directory name. The second, optional argument is srcprefix, which can be used when the C source files are not in the same directory as the configuration script. Using this argument not only lets you organize your C source code, it also sets target_prefix in the generated Makefile, which in turn installs the generated library into a directory under your RbConfig::CONFIG['sitearchdir'] when you run make install. For example, given the following file tree.

ext/
  extconf.rb
  foo/
    bar.c

And with the following config code

create_makefile('foo/bar', 'foo')

This sets target_prefix to foo and creates the following file tree when you run make install.

/path/to/ruby/dir/foo/bar.so

This library provides a lot of other functions and constants that help you tweak the generated Makefile, so please consult the Ruby documentation for mkmf. The ones that you'll most likely need are dir_config, enable_config, have_header, find_header, have_library, find_library and some constants like LIBARG and LIBPATHFLAG. Now all that is left is to run this script and it will generate the Makefile for you. One last thing: if you intend to build a gem then you might want to take a look at the hoe and rake-compiler gems.
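As a rough sketch of how a few of these helpers fit together, here is a hypothetical extconf.rb. It assumes the extension links against libgit2; the header name git2.h, the library name git2 and the probed symbol git_libgit2_init are assumptions about that library, not something this article's build requires.

```ruby
# extconf.rb: a hypothetical configuration script, not taken from a real project.
require 'mkmf'

# Adds --with-git2-dir, --with-git2-include and --with-git2-lib options
# so users can point the build at a non-standard install location.
dir_config('git2')

# Stop early with a readable message when a dependency is missing.
abort 'git2.h not found' unless have_header('git2.h')
abort 'libgit2 not found' unless have_library('git2', 'git_libgit2_init')

# Same target and srcprefix as the file tree example above.
create_makefile('foo/bar', 'foo')
```

Running ruby extconf.rb in the ext directory would then either write a Makefile or stop with one of the messages.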

Ruby C Interface

Remember the global function I mentioned in the last section, the one used as our main entry point? In a normal C program this would usually be the main function, but that is not the case here. This function needs to follow a convention: it is named after the target argument that we passed to create_makefile, prefixed with Init_. For the example above it should be named Init_bar. When Ruby loads an extension from a compiled library, usually *.so on UNIX, *.dll on Windows or *.bundle on macOS, this global function gets called, so anything you want to set up, like defining a class, method or module, goes in this function. Take a look at the sample code below, which defines a module named Foo containing a method named bar that returns an integer value. Don't sweat the details for now; I'll explain them later in this section.

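#include <ruby.h>

/* Method body: returns the Ruby Integer 30. */
VALUE bar(VALUE self)
{
    return INT2NUM(30);
}

/* Entry point: named Init_ plus the last part of the create_makefile target. */
void Init_bar(void)
{
    VALUE Foo = rb_define_module("Foo");
    rb_define_method(Foo, "bar", bar, 0);
}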

VALUE

This is the most general form of Ruby object representation in C. The API never lets you access the underlying Ruby object directly. Instead you store and pass around references to Ruby objects in this form.

Type Checking

There are times when you need to know the type of a VALUE to determine whether you can safely work on it. For this the API defines a set of constants of the form T_Type, each corresponding to a Ruby class you want to test for, e.g. T_STRING, T_ARRAY, T_FIXNUM, together with macros that check against them such as Check_Type(obj, T_STRING), which raises TypeError on a mismatch. For certain classes there are specialized macros that are a little more efficient.

RTEST(obj); /* Check if obj is "truthy" (not nil or false) */
NIL_P(obj); /* nil object */
FIXNUM_P(obj); /* Fixnum */
SYMBOL_P(obj); /* Symbol */
RB_FLOAT_TYPE_P(obj); /* Float */

The API also defines the TYPE() macro, which returns one of the T_Type constants, so you can use it with a switch statement like this.

switch (TYPE(obj))
{
    case T_STRING:
        /* logic */
        break;
    case T_FIXNUM:
        /* logic */
        break;
}

Conversion

To transfer data between C and Ruby you need to be able to convert values from one to the other. A few Ruby classes are analogous to C types.

Fixnum corresponds to long. The FIX2LONG() macro converts a Fixnum value to long. There are also FIX2UINT(), FIX2INT() and FIX2SHORT(), which give you unsigned int, int and short respectively. Keep in mind that these macros will raise RangeError if the number is too big to fit.

Bignum corresponds to long long. rb_big2ll() converts a Bignum to long long and rb_big2ull() converts it to unsigned long long. These functions also raise RangeError.

Float corresponds to double, and RFLOAT_VALUE() converts from Float to double.

There are also macros that convert a C number into a VALUE of the appropriate Numeric subclass.

INT2NUM() /* for int */
UINT2NUM() /* for unsigned int */
LONG2NUM() /* for long */
ULONG2NUM() /* for unsigned long */
LL2NUM() /* for long long */
ULL2NUM() /* for unsigned long long */
DBL2NUM() /* for double */

For the opposite direction, from a VALUE to a C number:

NUM2CHAR() /* for char */
NUM2SHORT() /* for short */
NUM2USHORT() /* for unsigned short */
NUM2LONG() /* for long */
NUM2ULONG() /* for unsigned long */
NUM2LL() /* for long long */
NUM2ULL() /* for unsigned long long */
NUM2DBL() /* for double */

String corresponds to char*. StringValueCStr() converts a String to a null-terminated char*, and there is also a StringValuePtr() version. The difference is that a Ruby String might contain a null byte, in which case StringValueCStr() raises ArgumentError while StringValuePtr() performs no such check. To get the length of a String's value use the RSTRING_LEN() macro.
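A Ruby String can carry a null byte in the middle, which a C string cannot represent. The following is a small Ruby-side sketch of that situation. File.open is used here only as a convenient built-in that needs a C string for its path, and the variable names are made up for illustration.

```ruby
# A String whose third byte is NUL; Ruby still counts all five bytes.
with_nul = "ab\0cd"
byte_length = with_nul.bytesize

# Handing such a String to something that needs a C string is rejected.
raised = nil
begin
  File.open(with_nul)
rescue ArgumentError => e
  raised = e
end
```

This is the same kind of failure you should expect from StringValueCStr() in your own extension when a caller passes in such a String.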

To convert from char* to String use rb_str_new_cstr(), or rb_str_new() with the string's length as a long for the second argument. These functions give you a string with ASCII-8BIT encoding, so if you want a different encoding pass the VALUE to rb_str_export_locale() to get a new VALUE for your locale.

There is no built-in C type for Symbol, so the API defines the ID type for this. Use ID2SYM() to convert an ID to a Symbol and SYM2ID() to convert a Symbol to an ID. Sometimes you might want to start from a char* instead of an ID, in which case you can use rb_intern() to get the ID for a name and rb_id2name() for the opposite.

Calling Ruby Method

To call a Ruby method from C use rb_funcall(), which takes a VALUE as the receiver, an ID as the method name, an int as the number of arguments, and then the arguments themselves as varargs of VALUE. It returns a VALUE as the result.

VALUE obj;
VALUE result;
result = rb_funcall(obj, rb_intern("+"), 1, INT2NUM(10));

This translates to the following Ruby code. As a tip, it helps to think of such calls in terms of metaprogramming.

result = obj.send(:+, 10)
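To make the comparison with send a bit more concrete, here is a runnable Ruby sketch. The Counter class and its methods are hypothetical and exist only for this illustration. Like send, rb_funcall() does not check method visibility, so private methods are reachable too.

```ruby
class Counter
  def initialize(start)
    @value = start
  end

  # Public method, called below with the same shape as the rb_funcall() example.
  def +(other)
    @value + other
  end

  private

  def secret
    :hidden
  end
end

obj = Counter.new(5)
result = obj.send(:+, 10)   # receiver, method name, then the arguments
hidden = obj.send(:secret)  # send ignores visibility

# public_send is the variant that respects visibility.
refused = begin
  obj.public_send(:secret)
  false
rescue NoMethodError
  true
end
```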

Most Ruby classes have API functions defined in the form rb_(class)_(method). So for Array#pop, for example, instead of writing

rb_funcall(obj, rb_intern("pop"), 0);

you could write

rb_ary_pop(obj);

Block

If you have a proc and want to pass it as a block to a method call you can do so with rb_funcall_with_block(). The function signature is just like rb_funcall() except that the fourth argument is a VALUE* and the fifth argument is the proc object you want to pass. If you don't have a proc object you can also use a C function as the block. A block function needs to follow the signature that the API defines, which looks like this.

VALUE foo_block(VALUE arg, VALUE data, int argc, const VALUE *argv, VALUE blockarg)
{
    /* logic */
    return arg;
}

rb_block_call(obj, rb_intern("map"), 0, NULL, foo_block, Qnil);

arg is the first yielded value, data is the last argument passed to rb_block_call(), argc and argv are useful when the block yields multiple values, and blockarg is the block handed to the block itself, if there is one.
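In Ruby terms the two approaches correspond roughly to passing an existing proc with & and writing a literal block. The sketch below uses made-up variable names and is only meant to show the shape of the calls.

```ruby
# An existing proc object, the case rb_funcall_with_block() covers.
doubler = proc { |n| n * 2 }
with_proc = [1, 2, 3].map(&doubler)

# A block written in place, the case rb_block_call() covers with a C function.
with_block = [1, 2, 3].map { |n| n * 2 }

# A block that receives more than one value per iteration.
indexed = %w[a b].each_with_index.map { |item, index| "#{index}:#{item}" }
```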

Constants

The API defines global VALUEs for class and module constants that we can use. Classes are prefixed with rb_c, e.g. rb_cArray; modules with rb_m, e.g. rb_mKernel; Exception and its subclasses with rb_e, e.g. rb_eRangeError; standard IO objects with rb_, e.g. rb_stderr; and false, true and nil with Q, e.g. Qfalse, Qnil.

Exception

Use rb_errinfo() to get the last raised exception. Unlike in Ruby, you need to clear its value after you're done with it, which you can do with rb_set_errinfo(Qnil).

To rescue an exception, first wrap the code that may raise in a function that takes and returns a VALUE. Then call rb_rescue2(), passing a second function that holds the rescue logic. The rescue function takes a VALUE argument plus the raised exception and returns a VALUE.

VALUE ex_func(VALUE obj)
{
    /* code that could raise exception */
    return obj;
}

/* receives rescue_args and the exception that was raised */
VALUE rescue_func(VALUE obj, VALUE exception)
{
    /* some rescue logic */
    return obj;
}

void some_func(void)
{
    VALUE ex_args = Qnil, rescue_args = Qnil;
    /* ... */
    rb_rescue2(ex_func, ex_args, rescue_func, rescue_args, rb_eRangeError, (VALUE)0);
}

The first two arguments are the function to protect and its argument. The next two are the rescue function and its argument. After those come varargs of the exception classes to rescue, and the last argument must always be 0, cast to VALUE, to mark the end of the list. To rescue StandardError, use rb_rescue() with the same first four arguments as rb_rescue2().
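For comparison, this is approximately the Ruby code that such an rb_rescue2() call stands in for. The method names risky and guarded are hypothetical and play the roles of the protected function and its caller.

```ruby
# Plays the role of the protected function.
def risky(value)
  raise RangeError, "#{value} is out of range" if value > 100
  value
end

# Rescues only RangeError, like listing rb_eRangeError before the final 0.
def guarded(value)
  risky(value)
rescue RangeError => e
  "rescued: #{e.message}"
end

ok = guarded(5)
failed = guarded(500)
```

Any exception class that is not listed still propagates to the caller, in Ruby and in C alike.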

Define Class and Module

Use rb_define_module() to create a module and rb_define_module_under() to create a nested module. For classes use rb_define_class() and rb_define_class_under() respectively.

VALUE mFoo, mBar;
/* top level module Foo */
mFoo = rb_define_module("Foo");
/* nested module Foo::Bar */
mBar = rb_define_module_under(mFoo, "Bar");

VALUE cA, cB;
/* class A < Object, the same as a plain class A */
cA = rb_define_class("A", rb_cObject);
/* class A::B < Object, the same as a plain class A::B */
cB = rb_define_class_under(cA, "B", rb_cObject);
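Written in Ruby, the definitions above would look roughly like this. It is only a side-by-side aid, and the constants are the same placeholder names used in the C snippets.

```ruby
# Equivalent of rb_define_module and rb_define_module_under.
module Foo
  module Bar
  end
end

# Equivalent of rb_define_class and rb_define_class_under with rb_cObject.
class A
  class B
  end
end

nested_module = Foo::Bar
nested_class = A::B
```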

Define Method

To define a method you first need to get a hold of the class or module VALUE and then use one of rb_define_method(), rb_define_private_method(), rb_define_protected_method() or rb_define_singleton_method(). The arguments to these functions are the class object on which the method will be defined, the method name, the method function and the number of arguments the method takes.

VALUE klass;
rb_define_method(klass, "foo", method_func, 0);

A method function can take one of three signatures.

/* fixed arguments: can have up to 15 args, not including self */
VALUE method1(VALUE self, VALUE arg1, VALUE arg2, ...)

/* pass -2 as the argument count: all args are bundled into a Ruby Array stored in args */
VALUE method2(VALUE self, VALUE args)

/* pass -1 as the argument count: use argc and argv to access the method arguments */
VALUE method3(int argc, VALUE *argv, VALUE self)

To check whether a block is given you can use rb_need_block(), which raises LocalJumpError when there is none, or rb_block_given_p(), which returns non-zero if a block was given.
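The Ruby-level counterparts of these ideas may help as a reference point. In this sketch the Sample class and its method names are invented for illustration: fixed mirrors a fixed argument count, variadic mirrors the variable-argument forms, and needs_block behaves roughly like calling rb_need_block() before yielding.

```ruby
class Sample
  # Fixed number of arguments, like passing 2 as the argument count.
  def fixed(a, b)
    [a, b]
  end

  # Any number of arguments, collected into an Array.
  def variadic(*args)
    args
  end

  # Refuses to run without a block, then yields to it.
  def needs_block
    raise LocalJumpError, 'no block given' unless block_given?
    yield
  end
end

fixed_arity = Sample.instance_method(:fixed).arity
variadic_arity = Sample.instance_method(:variadic).arity
collected = Sample.new.variadic(1, 2, 3)
yielded = Sample.new.needs_block { 7 }
```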

Define Instance Variable

Use rb_iv_get() and rb_iv_set() to get and set an instance variable. Keep in mind that the instance variable name needs to be prefixed with @.

VALUE obj, iv;
iv = rb_iv_get(obj, "@a");
rb_iv_set(obj, "@a", iv);
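The closest Ruby-level equivalents are instance_variable_get and instance_variable_set. This short sketch, with made-up variable names, shows the same read and write sequence on a fresh object.

```ruby
obj = Object.new

# Reading a variable that was never set gives nil rather than an error.
before = obj.instance_variable_get(:@a)

obj.instance_variable_set(:@a, 42)
after = obj.instance_variable_get(:@a)
```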

Allocation

This topic is a little more complex than the previous ones. From what you have seen so far you should be able to define a Ruby class using the API, but only a class that encapsulates data that translates easily into a VALUE. What if you need to encapsulate data that can't be translated into a VALUE, e.g. a structure defined by some C library?

The API lets you encapsulate C data by creating a VALUE of the desired class and then storing a void* pointing to the C data inside the Ruby object. You can unpack the data and cast it back into the appropriate type whenever you need to access the C data. To encapsulate this type of data correctly you need to define your own allocation function, but where should that function be called? Let's first take a look at how Ruby normally allocates an object when we call new.

class Class
  def new(*args, &blk)
    obj = allocate
    # initialize is private, so it has to be called through send
    obj.send(:initialize, *args, &blk)
    obj
  end
end
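The two steps can also be observed separately from plain Ruby. The following sketch uses a hypothetical Point class; it calls allocate by hand to show that the object exists before initialize has run.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Step one: an empty object of the right class, initialize not yet called.
raw = Point.allocate
empty_x = raw.x

# Step two: run initialize on the already allocated object.
raw.send(:initialize, 1, 2)
coords = [raw.x, raw.y]
```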

First Ruby calls allocate to create an empty object, then it calls initialize. allocate is the place where you need to wrap your C data. The following example defines a class Foo which wraps a struct point that we can set up by passing two int values to initialize.

#include <ruby.h>
#include <stdlib.h>

struct point {
    int x;
    int y;
};

/* Called by the garbage collector when the wrapping object is destroyed. */
void foo_free(void *ptr)
{
    free(ptr);
}

/* The allocation function receives the class, not an instance. */
VALUE foo_alloc(VALUE klass)
{
    struct point *point = malloc(sizeof(struct point));
    return Data_Wrap_Struct(klass, NULL, foo_free, point);
}

VALUE foo_m_initialize(VALUE self, VALUE x, VALUE y)
{
    struct point *point;
    Data_Get_Struct(self, struct point, point);

    point->x = NUM2INT(x);
    point->y = NUM2INT(y);

    return self;
}

void some_func(void)
{
    VALUE cFoo = rb_define_class("Foo", rb_cObject);
    rb_define_alloc_func(cFoo, foo_alloc);
    rb_define_method(cFoo, "initialize", foo_m_initialize, 2);
}

The first argument of Data_Wrap_Struct() is the class to give the VALUE. The second is a mark function for the garbage collector, needed when your C data holds references to Ruby objects; since the example doesn't use any Ruby objects we just pass NULL. The third is a pointer to the function to call when the object gets destroyed, and the last is the pointer to the data to wrap into the VALUE. Newer Ruby versions recommend TypedData_Wrap_Struct() for new code, but the idea is the same.

Data_Get_Struct() is used to unwrap the data from a VALUE. The first argument is the current object, the second is the C type of the data and the third is the pointer to unwrap the data into.

Mark Function

I mentioned a mark function a moment ago. This function tells the Ruby garbage collector which Ruby objects your C data refers to, so that they are marked as still in use and are not destroyed while the object that holds them is alive. For example, suppose we change the point structure above to hold references to Ruby objects instead of int. We then need to modify our code to mark those objects, and to do that we define a mark function.

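struct point {
    VALUE x;
    VALUE y;
};

/* Tell the garbage collector that x and y are still referenced. */
void foo_mark(void *ptr)
{
    struct point *point = ptr;
    rb_gc_mark(point->x);
    rb_gc_mark(point->y);
}

VALUE foo_alloc(VALUE klass)
{
    struct point *point = malloc(sizeof(struct point));
    /* Start from valid Ruby values so the mark function never sees garbage. */
    point->x = Qnil;
    point->y = Qnil;
    return Data_Wrap_Struct(klass, foo_mark, foo_free, point);
}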

Putting it all together

I've written a simple Ruby binding to libgit2 which can be found in this repository. The binding has one class named Repository, which lives in the Re module and has an instance method clone. To compile it you first need to install the hoe and rake-compiler gems as well as the libgit2 library. Once you have cloned the repository, change into its directory and run rake compile. If the compilation succeeds you will see a library file named re.so under the lib/re/ directory. You can test it by opening irb in the root directory and running the following code.

require_relative 'lib/re/re'

repo = Re::Repository.new('path/to/clone/into')
repo.clone('https://url/to/git/repository')