A Rubyist's Walk Along the C-side (Part 5): Variables

This is an article in a multi-part series called “A Rubyist’s Walk Along the C-side”.

In the previous article, we saw some of Ruby’s primitive data types. In this article, we’ll explore how to read and write global variables, class variables, instance variables, and constants. We’ll also look at storing Ruby objects in global variables in C.

Instance variables

Instance variables in Ruby are always prefixed with @. However, using the C API, we can define “hidden” instance variables without the leading @ (e.g. foo rather than @foo). These variables are only accessible through the C API and cannot be reached from Ruby in any way, not even through Object#instance_variable_get (which raises a NameError when given a name without the leading @).

Setting instance variables

To set an instance variable, we can use rb_ivar_set. It accepts three arguments and returns the value val:

  1. obj: The object to set the instance variable on.
  2. id: The ID symbol name of the instance variable.
  3. val: The value to set the instance variable to.
// Function prototype for rb_ivar_set
VALUE rb_ivar_set(VALUE obj, ID id, VALUE val);
// Setting an instance variable of my_obj named @foo
// to variable my_var
ID iv_foo = rb_intern("@foo"); // Symbol :@foo
rb_ivar_set(my_obj, iv_foo, my_var);

// We now set a "hidden" instance variable named foo
ID id_foo = rb_intern("foo"); // Symbol :foo
rb_ivar_set(my_obj, id_foo, my_var);

Getting instance variables

To get an instance variable, we can use rb_ivar_get. It accepts two arguments and returns the value of the instance variable if it exists, Qnil otherwise:

  1. obj: The object to get the instance variable from.
  2. id: The ID symbol name of the instance variable.
// Function prototype for rb_ivar_get
VALUE rb_ivar_get(VALUE obj, ID id);
// Getting an instance variable of my_obj named @foo
ID iv_foo = rb_intern("@foo"); // Symbol :@foo
VALUE my_ivar = rb_ivar_get(my_obj, iv_foo);

// We now get a "hidden" instance variable named foo
ID id_foo = rb_intern("foo"); // Symbol :foo
VALUE my_hidden_ivar = rb_ivar_get(my_obj, id_foo);

Class variables

Class variables work very similarly to instance variables. Just like instance variables, class variable names set through the C API don’t have to be prefixed with @@. However, only class variables whose names start with @@ are accessible from Ruby code.

Setting class variables

To set a class variable, we can use rb_cvar_set. It accepts three arguments and does not return anything:

  1. klass: The class or module to set the class variable on.
  2. id: The ID symbol name of the class variable.
  3. val: The value to set the class variable to.
// Function prototype for rb_cvar_set
void rb_cvar_set(VALUE klass, ID id, VALUE val);
// Setting a class variable of my_klass named @@foo
// to variable my_var
ID cv_foo = rb_intern("@@foo"); // Symbol :@@foo
rb_cvar_set(my_klass, cv_foo, my_var);

Getting class variables

Just like with instance variables, to get a class variable we can call rb_cvar_get. However, unlike rb_ivar_get, this function raises a NameError (uninitialized class variable) if the class variable doesn’t exist. The function accepts two arguments and returns the value of the class variable:

  1. klass: The class or module to get the class variable from.
  2. id: The ID symbol name of the class variable.
// Function prototype for rb_cvar_get
VALUE rb_cvar_get(VALUE klass, ID id);
// Getting a class variable of my_klass named @@foo
// Will raise NameError if @@foo is not defined
ID cv_foo = rb_intern("@@foo"); // Symbol :@@foo
VALUE my_var = rb_cvar_get(my_klass, cv_foo);

Check if a class variable is defined

Since rb_cvar_get will raise when the class variable isn’t defined, we might want to check that the class variable exists before accessing it. To do that, we can use the rb_cvar_defined function. This function accepts two arguments and returns Qtrue if the class variable exists, Qfalse otherwise:

  1. klass: The class or module to check for the class variable.
  2. id: The ID symbol name of the class variable.
// Function prototype for rb_cvar_defined
VALUE rb_cvar_defined(VALUE klass, ID id);
// Checking for a class variable of my_klass named @@foo
ID cv_foo = rb_intern("@@foo"); // Symbol :@@foo
if (rb_cvar_defined(my_klass, cv_foo) == Qtrue) {
  // @@foo exists in my_klass
} else {
  // @@foo does not exist in my_klass
}

There is also a corresponding rb_ivar_defined that works very similarly to rb_cvar_defined. Exploring that is left as an exercise for the reader.

Constants

We’ll now look at how to set and get constants. If you didn’t know already, named classes and modules in Ruby are stored in constants, so we can use these functions to get classes and modules that are already defined. In the next part, we’ll look at how to define classes and modules through the C API. Also, top-level constants in Ruby are implicitly defined under Object, so a class ::Foo can also be accessed as Object::Foo. This is important to know because the C API functions for constants always take the namespace of the constant as an argument; for top-level constants, that namespace is rb_cObject.

Setting constants

To set a constant, we can use the rb_const_set function. This function accepts three arguments and does not return anything:

  1. klass: The scope (class or module) to define the constant in.
  2. id: The ID symbol name of the constant.
  3. val: The value to set the constant to.
// Function prototype for rb_const_set
void rb_const_set(VALUE klass, ID id, VALUE val);
// Setting top-level constant ::MY_CONST to my_val
rb_const_set(rb_cObject, rb_intern("MY_CONST"), my_val);

Getting constants

To get a constant, we can use the rb_const_get function. This function accepts two arguments and returns the value of the constant if it exists; otherwise, it raises a NameError (uninitialized constant):

  1. klass: The scope (class or module) in which to search for the constant.
  2. id: The ID symbol name of the constant.
// Function prototype for rb_const_get
VALUE rb_const_get(VALUE klass, ID id);
// Getting top level constant ::Foo
VALUE foo = rb_const_get(rb_cObject, rb_intern("Foo"));
// Getting constant Foo::Bar
VALUE bar = rb_const_get(foo, rb_intern("Bar"));

Global variables

Accessing global variables works a little differently. Instead of converting a C string to an ID, we pass the C string directly to refer to the global variable. Also, unlike instance and class variables, global variables cannot be “hidden”. If the name does not start with $, Ruby builds a new copy of the string with a leading $ prepended. This wastes memory and work on every call, so always include the leading $ when accessing global variables through the C API!

Setting global variables

To set a global variable, we can use rb_gv_set. It accepts two arguments and returns the value val:

  1. name: The name of the global variable as a C string (i.e. null-terminated).
  2. val: The value to set the global variable to.
// Function prototype for rb_gv_set
VALUE rb_gv_set(const char *name, VALUE val);
// Setting global variable $foo to my_var
rb_gv_set("$foo", my_var);
// This will also set $foo to my_var
// However, this is less efficient than the one above
rb_gv_set("foo", my_var);

Getting global variables

To get a global variable, we can use rb_gv_get. It accepts one argument and returns the value of the global variable if it exists, Qnil otherwise:

  1. name: The name of the global variable as a C string.
// Function prototype for rb_gv_get
VALUE rb_gv_get(const char *name);
// Getting global variable $foo
VALUE my_var = rb_gv_get("$foo");
// This will also get global variable $foo
// However, this is less efficient than the one above
my_var = rb_gv_get("foo");

Global variables in C

If we need to store a Ruby object globally and it’s only used from C code, we can use a global variable in C. One of the most common use cases is storing references to Ruby classes and modules when the extension is loaded. Using a global C variable has several advantages: it’s faster because we don’t need to call into Ruby to look the object up, and it doesn’t clutter Ruby with global variables. However, there is one gotcha: Ruby’s garbage collector does not know about references stored in C global variables. If nothing else references the object (e.g. Ruby code that points to it), the garbage collector will reclaim the object the next time it runs, even though our C variable still points to it. Our global variable is then left pointing to invalid memory, and using it can lead to anything from reading garbage, to a segmentation fault, to silently accessing a totally different Ruby object!

To tell Ruby that we are using the object, we must register the C variable with rb_global_variable. It accepts one argument and does not return anything:

  1. var: A pointer to the VALUE variable that holds the Ruby object.
// Function prototype for rb_global_variable
void rb_global_variable(VALUE *var);
// Assume my_global_var is a global variable
VALUE my_global_var = ...;
rb_global_variable(&my_global_var);

Note that immediates (e.g. values created with LONG2FIX) don’t need to be registered, since they are not heap objects and are never garbage collected. For a refresher on how immediates (true, false, nil, fixnums) work, read the previous part on primitive data types.

What happens if a global variable is not registered?

You can find the accompanying source code in the GitHub repository peterzhu2118/ruby-c-ext-code.

So what happens if we don’t register a global C variable? Let’s see a demo. Create a C source file at ext/gv_bug.c:

#include <ruby.h>

// Store a string globally
static VALUE my_string;

static VALUE get_my_string(VALUE _)
{
    return my_string;
}

void Init_gv_bug(void)
{
    // Create a new string "Hello world!"
    my_string = rb_str_new_cstr("Hello world!");

    // Define a getter method for my_string
    rb_define_method(rb_cObject, "my_string", get_my_string, 0);
}

This stores a new string “Hello world!” in the global C variable my_string without registering it. We also define a Ruby method Object#my_string that returns the value stored in the global C variable.

Let’s now write the Ruby script test.rb that uses this C extension:

require_relative "ext/gv_bug"

puts "my_string is `#{my_string}`"

# Here we manually trigger the GC to collect objects that have no references
GC.start

puts "my_string is `#{my_string}`"

Here we manually trigger the garbage collector using GC.start. Alternatively, we could allocate a large number of objects (i.e. use up memory) until the garbage collector runs on its own. Triggering this bug without calling GC.start is left as an exercise for the reader.

Now let’s run this script. It dumps a large, scary-looking crash report like this (the output is quite long, so parts have been omitted with ellipses):

my_string is `Hello world!`
test.rb:8: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]

-- Crash Report log information --------------------------------------------
   See Crash Report log file under the one of following:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Control frame information -----------------------------------------------
c:0002 p:0054 s:0010 e:000005 EVAL   test.rb:8 [FINISH]
c:0001 p:0000 s:0003 E:000560 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
test.rb:8:in `<main>'

-- Machine register context ------------------------------------------------
...

-- C level backtrace information -------------------------------------------
/opt/rubies/ruby-2.7.1/bin/ruby(rb_vm_bugreport+0x96) [0x103db83e6]
/opt/rubies/ruby-2.7.1/bin/ruby(rb_bug_for_fatal_signal+0x1da) [0x103be884a]
/opt/rubies/ruby-2.7.1/bin/ruby(sigsegv+0x5b) [0x103d1772b]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff2035ad7d]
/opt/rubies/ruby-2.7.1/bin/ruby(rb_id_table_lookup+0x16) [0x103d52bc6]
/opt/rubies/ruby-2.7.1/bin/ruby(method_entry_get+0xb3) [0x103d96313]
/opt/rubies/ruby-2.7.1/bin/ruby(rb_callable_method_entry+0x29) [0x103d8ae59]
/opt/rubies/ruby-2.7.1/bin/ruby(vm_search_method+0x1b4) [0x103d9c814]
/opt/rubies/ruby-2.7.1/bin/ruby(vm_exec_core+0x3aef) [0x103d8ef8f]
/opt/rubies/ruby-2.7.1/bin/ruby(rb_vm_exec+0xb1a) [0x103da40fa]
/opt/rubies/ruby-2.7.1/bin/ruby(rb_ec_exec_node+0xc6) [0x103bf3e06]
/opt/rubies/ruby-2.7.1/bin/ruby(ruby_run_node+0x55) [0x103bf3ce5]
/opt/rubies/ruby-2.7.1/bin/ruby(main+0x5d) [0x103b4c99d]

-- Other runtime information -----------------------------------------------

* Loaded script: test.rb

* Loaded features:

    0 enumerator.so
    1 thread.rb
    2 rational.so
    3 complex.so
    4 ruby2_keywords.rb
    5 /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/x86_64-darwin19/enc/encdb.bundle
    6 /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/x86_64-darwin19/enc/trans/transdb.bundle
    7 /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/x86_64-darwin19/rbconfig.rb
    8 /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/rubygems/compatibility.rb
...
   46 /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/did_you_mean.rb
   47 /Users/peter/src/github.com/peterzhu2118/ruby-c-ext-code/part5/gv_bug/ext/gv_bug.bundle

...

This is the first time we’ve seen Ruby’s crash handler! Let’s look at what it’s telling us. First, it reports that the crash occurred at test.rb:8, which is the second puts in test.rb and the first use of my_string after GC.start. The problem is a segmentation fault at address 0x0000000000000000, a null pointer. Further down, we can see the Ruby backtrace and the C backtrace. Here, the C backtrace shows the crash happening during method lookup (vm_search_method), which is consistent with Ruby trying to call a method on the string that the garbage collector has already reclaimed. The C backtrace is especially useful for debugging when it contains calls into our C extension (this one does not). We can also see Loaded features, which lists all the Ruby files and C extensions that Ruby has loaded; our compiled gv_bug.bundle is the last one.

Fixing this bug is left as an exercise for the reader.

Conclusion

In this article, we looked at how to read and write instance variables, class variables, constants, and global variables through the Ruby C API. We also covered how to store Ruby objects in global C variables, how to register those variables correctly, and the crash that occurs when they are not registered. In the next article, we’ll look at how to define and use classes and modules.