This is an article in a multi-part series called “Peter’s Adventures in Ruby”.
Creating a string in Ruby is probably one of the easiest things you can do in the language; you can create one just like this:
my_string = "Hello world!"
But when you’re developing MRI itself or writing a C extension, you are given many ways to create a string. So which one do you choose? Just use the one called rb_str_new? Pick one at random? What’s the worst that can happen, right? It turns out that the one you choose affects both the performance and, most importantly, the correctness of your program. At the end, I’ll also share a real story about the problems that arise when the wrong way to create a string is used.
In fact, there are a total of 24 ways (in Ruby 2.7) to create a string using the C API, and many, many more inside MRI itself. I will cover the three most common ways to create strings through the Ruby C API. Many of the others are variations of these three and are self-explanatory (e.g. creating a string with a specific encoding).
VALUE rb_str_new(const char *ptr, long len);
This one is pretty straightforward. It takes a pointer ptr to an array of characters and the length len of the string, and returns a VALUE referencing the newly created Ruby string object. Note that the created object points to a copy of the character array, so you can change (or free) the contents of ptr afterward without affecting the Ruby string.
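char *c_str = malloc(13);                 /* 12 characters + NUL terminator */
strcpy(c_str, "Hello world!");
VALUE my_string = rb_str_new(c_str, 12);  /* copies the 12 bytes */
free(c_str);                              /* fine: my_string owns its own copy */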
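To make the copy semantics concrete without needing the Ruby headers, here is a plain C toy model. The struct toy_str and the functions toy_str_new and toy_str_free are hypothetical names made up for illustration; they only mimic the ownership behavior described above and are not how MRI actually lays out its string objects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string (NOT MRI's RString): it only models ownership. */
struct toy_str {
    char *ptr; /* buffer owned by the toy string */
    long len;
};

/* Mimics the copying behavior described for rb_str_new: the toy string
 * gets its own buffer, so the caller may reuse or free `ptr` afterward. */
struct toy_str *toy_str_new(const char *ptr, long len)
{
    assert(ptr != NULL && len >= 0); /* toy requirement, not MRI's rule */

    struct toy_str *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;

    s->ptr = malloc((size_t)len + 1);
    if (s->ptr == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->ptr, ptr, (size_t)len); /* the copy: no link back to `ptr` */
    s->ptr[len] = '\0';
    s->len = len;
    return s;
}

/* Releases the toy string and the buffer it owns. */
void toy_str_free(struct toy_str *s)
{
    if (s == NULL)
        return;
    free(s->ptr);
    free(s);
}
```

Because toy_str_new copies the bytes, the caller can overwrite or free its own buffer right afterward and the toy string still reads "Hello world!".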
VALUE rb_str_buf_new(long capa);
This one is also pretty straightforward. It creates an empty string with a buffer of capa bytes. If you know ahead of time the exact or approximate size of the string you’re going to build, setting capa to that size is efficient because the string won’t need to grow its buffer as you append to it. Of course, if you set capa larger than what you need, you’ll be wasting memory.
VALUE my_string = rb_str_buf_new(12);        /* empty string with room for 12 bytes */
rb_str_cat_cstr(my_string, "Hello world!");  /* append into the reserved buffer */
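A toy model can show why presizing helps. The struct toy_buf and the functions toy_buf_init and toy_buf_cat below are hypothetical, and the doubling growth policy is an assumption chosen for simplicity, not MRI’s actual policy; the point is only to count how many times the buffer has to be reallocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy buffer that counts how often it has to grow. */
struct toy_buf {
    char *ptr;
    long len;
    long capa;    /* bytes available, excluding the terminating NUL */
    int reallocs; /* number of times the buffer was grown */
};

/* In the spirit of rb_str_buf_new: an empty string with `capa` bytes
 * reserved up front. Returns 0 on success, -1 on allocation failure. */
int toy_buf_init(struct toy_buf *b, long capa)
{
    assert(capa >= 0);
    b->ptr = malloc((size_t)capa + 1);
    if (b->ptr == NULL)
        return -1;
    b->ptr[0] = '\0';
    b->len = 0;
    b->capa = capa;
    b->reallocs = 0;
    return 0;
}

/* Appends a C string; when it does not fit, the capacity is doubled
 * (starting from at least 1) until it does, costing one realloc. */
int toy_buf_cat(struct toy_buf *b, const char *s)
{
    long n = (long)strlen(s);
    if (b->len + n > b->capa) {
        long new_capa = b->capa > 0 ? b->capa : 1;
        while (b->len + n > new_capa)
            new_capa *= 2;
        char *p = realloc(b->ptr, (size_t)new_capa + 1);
        if (p == NULL)
            return -1;
        b->ptr = p;
        b->capa = new_capa;
        b->reallocs++;
    }
    memcpy(b->ptr + b->len, s, (size_t)n + 1); /* copies the NUL too */
    b->len += n;
    assert(b->len <= b->capa);
    return 0;
}
```

With capa set to 12, appending "Hello world!" triggers no reallocation; starting from 0 and appending "Hello" and then " world!" needs two under this toy policy.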
VALUE rb_str_new_static(const char *ptr, long len);
This looks awfully similar to rb_str_new, doesn’t it? It actually works quite differently! This function requires you to pass either a C string literal or a malloc’d region that is NEVER free’d (or at least not free’d until this string has been garbage collected). It creates the string without allocating memory for a copy of the characters, meaning the created string object points directly at the character array you passed in.
VALUE my_string = rb_str_new_static("Hello world!", 12);
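To see why the buffer must outlive the string, consider this plain C toy model. The struct toy_static_str and the function toy_str_new_static are hypothetical names for illustration only; like the real function as described above, the toy version stores the caller’s pointer instead of copying, so anything that happens to that memory is visible through the string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy string that borrows its bytes instead of copying them,
 * modelling the behavior described for rb_str_new_static. */
struct toy_static_str {
    const char *ptr; /* NOT owned: points straight at the caller's memory */
    long len;
};

/* No allocation and no copy: the toy string just records the pointer,
 * so the caller's buffer must stay valid for as long as the string lives. */
struct toy_static_str toy_str_new_static(const char *ptr, long len)
{
    assert(ptr != NULL && len >= 0);
    struct toy_static_str s = { ptr, len };
    return s;
}
```

If buf were a heap buffer that got free’d while the toy string was still in use, s.ptr would dangle, which is exactly the situation the NEVER free’d rule exists to prevent.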
For the real story of what goes wrong when the wrong function is picked, see my article on The Ruby inplace bug.