Peter's Adventures in Ruby: Creating Ruby strings in C

This is an article in a multi-part series called “Peter’s Adventures in Ruby”

Introduction

Creating a string in Ruby is probably one of the easiest things you can do in the language. You can create one just like this:

my_string = "Hello world!"

But when you’re developing MRI itself or writing a C extension, you are given many ways to create a string. So which one do you choose? Just use the one called rb_str_new? Pick at random? What’s the worst that can happen, right? It turns out that the one you choose affects the performance and, most importantly, the correctness of your program. At the end, I’ll also share a real story about the problems that arise when the wrong way to create a string is used.

In fact, there are a total of 24 ways (in Ruby 2.7) to create a string using the C API (and there are many, many more ways inside MRI). I will talk about the three most common ways to create strings through the Ruby C API. Many of the others are variations of these three and are self-explanatory (e.g. creating a string with a specific encoding).

Ways to create strings in Ruby’s C API

rb_str_new

VALUE rb_str_new(const char *ptr, long len);

This one is pretty straightforward. It takes a pointer ptr to an array of characters and the length len of the string and returns the VALUE pointer to the created Ruby string object. Note that the created object points to a copy of the character array, so you can change the contents of ptr afterward without affecting the Ruby string.

Example:

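char *c_str = malloc(13);                 /* 12 characters plus the terminating NUL */
strcpy(c_str, "Hello world!");
VALUE my_string = rb_str_new(c_str, 12);  /* copies the 12 bytes into the Ruby string */
free(c_str);                              /* safe: my_string holds its own copy */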

rb_str_buf_new

VALUE rb_str_buf_new(long capa);

This one is also pretty straightforward. It creates an empty string (length 0) whose buffer has a capacity of capa bytes. If you know ahead of time the size or approximate size of the string you’re going to build, it is efficient to set capa to that size, because the buffer does not have to grow as you append. Of course, if you set capa to be larger than what you need, you’ll be wasting memory.

Example:

VALUE my_string = rb_str_buf_new(12);
rb_str_cat_cstr(my_string, "Hello world!");
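To see why a sensible capacity helps, here is a toy model in plain C that counts how often a growable buffer has to be enlarged while appending. It is only a sketch: toy_buf, toy_buf_new, toy_buf_cat and toy_buf_grows are hypothetical names invented for this illustration, and the doubling growth policy is an assumption, not a description of what MRI actually does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable buffer; NOT MRI's real string layout or growth policy. */
typedef struct {
    char *data;
    size_t len;      /* bytes currently in use */
    size_t capa;     /* bytes available before the buffer must grow */
    int grow_count;  /* how many times the buffer had to be enlarged */
} toy_buf;

/* Rough analogue of rb_str_buf_new: empty contents, room for capa bytes. */
toy_buf toy_buf_new(size_t capa) {
    toy_buf b = { malloc(capa + 1), 0, capa, 0 };
    if (b.data == NULL) abort();
    b.data[0] = '\0';
    return b;
}

/* Rough analogue of rb_str_cat_cstr: append a NUL-terminated C string. */
void toy_buf_cat(toy_buf *b, const char *src) {
    assert(b != NULL && src != NULL);
    size_t n = strlen(src);
    if (b->len + n > b->capa) {
        size_t new_capa = (b->len + n) * 2;  /* assumed policy: double */
        char *p = realloc(b->data, new_capa + 1);
        if (p == NULL) abort();
        b->data = p;
        b->capa = new_capa;
        b->grow_count++;
    }
    memcpy(b->data + b->len, src, n + 1);  /* n + 1 copies the NUL too */
    b->len += n;
}

/* Appends "Hello world!" (12 bytes) `times` times and returns the
   number of times the buffer had to grow along the way. */
int toy_buf_grows(size_t initial_capa, int times) {
    toy_buf b = toy_buf_new(initial_capa);
    for (int i = 0; i < times; i++) {
        toy_buf_cat(&b, "Hello world!");
    }
    int grows = b.grow_count;
    free(b.data);
    return grows;
}
```

In this model, toy_buf_grows(120, 10) never enlarges the buffer because all 120 bytes fit up front, while toy_buf_grows(0, 10) has to reallocate several times to reach the same result.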

rb_str_new_static

VALUE rb_str_new_static(const char *ptr, long len);

This looks awfully similar to rb_str_new, doesn’t it? It actually works quite differently! This function requires you to pass a C string literal or a malloc’d region that is NEVER free’d (or at least not free’d until this string has been garbage collected). It creates the string without allocating extra memory for the contents, meaning the created string object points directly at the character array you passed in.

Example:

VALUE my_string = rb_str_new_static("Hello world!", 12);
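To make the difference between copying and borrowing concrete, here is a toy model in plain C. It is not MRI's implementation: toy_str, toy_str_new, toy_str_new_static, toy_str_free and toy_demo are hypothetical names invented for this sketch, and the struct only mimics the ownership behaviour described for rb_str_new and rb_str_new_static.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a Ruby string object; NOT MRI's real layout. */
typedef struct {
    const char *ptr;  /* the bytes of the string */
    long len;         /* length in bytes */
    int owns_buffer;  /* 1 if ptr was allocated here and must be freed */
} toy_str;

/* Like rb_str_new: copy the bytes, so the caller keeps full control of src. */
toy_str toy_str_new(const char *src, long len) {
    assert(src != NULL && len >= 0);
    char *copy = malloc((size_t)len + 1);
    if (copy == NULL) abort();
    memcpy(copy, src, (size_t)len);
    copy[len] = '\0';
    toy_str s = { copy, len, 1 };
    return s;
}

/* Like rb_str_new_static: keep the caller's pointer, no copy is made. */
toy_str toy_str_new_static(const char *src, long len) {
    assert(src != NULL && len >= 0);
    toy_str s = { src, len, 0 };
    return s;
}

/* Release the buffer only if this string owns it. */
void toy_str_free(toy_str *s) {
    if (s->owns_buffer) free((void *)s->ptr);
    s->ptr = NULL;
    s->len = 0;
}

/* Returns 1 if the copied string kept its contents after the source
   buffer changed, while the borrowed string saw the change. */
int toy_demo(void) {
    char buffer[] = "Hello world!";
    toy_str copied = toy_str_new(buffer, 12);
    toy_str borrowed = toy_str_new_static(buffer, 12);

    buffer[0] = 'J';  /* the caller reuses its buffer */

    int ok = copied.ptr[0] == 'H' && borrowed.ptr[0] == 'J';
    toy_str_free(&copied);
    toy_str_free(&borrowed);
    return ok;
}
```

The sketch only overwrites the source buffer, which is well defined. If the buffer were free’d instead, the borrowed string would be left with a dangling pointer, which is why the no-copy variant is only safe for memory that outlives the string.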

So, what happens if you use the wrong one?

See my article on The Ruby inplace bug.