rb_gc_force_recycle is Deprecated in Ruby 3.1
If you are interested in Ruby C extensions, read my series “A Rubyist’s Walk Along the C-side”.
It’s almost Christmas, and you know what that means! That’s right, Ruby 3.1 is around the corner. In Ruby 3.1, rb_gc_force_recycle, a public function in the Ruby C API, is deprecated (see ticket). It has also been changed to a no-op, so any code that depends on its behavior needs to be fixed.
If your native extension uses it, you will see a compiler warning like this one:
```
test.c:7:5: warning: ‘rb_gc_force_recycle’ is deprecated: this is now a no-op function [-Wdeprecated-declarations]
    7 |     rb_gc_force_recycle(my_obj);
      |     ^~~~~~~~~~~~~~~~~~~
```
What is rb_gc_force_recycle?
If you’re not familiar with how Ruby’s garbage collector works, read this first.
rb_gc_force_recycle tells the garbage collector to immediately reclaim an object. It was designed as an optimization: if you know an object is dead, you can tell the garbage collector so that it can reclaim the object’s memory right away (similar to how the C library function free works). This can reduce the number of garbage collection cycles, which can make Ruby run faster, since garbage collection is relatively expensive.
It’s important to note that rb_gc_force_recycle only reclaims the object’s RVALUE (its slot in the Ruby heap), regardless of the object’s type or contents. Unlike an object freed during a regular garbage collection cycle, an object reclaimed by rb_gc_force_recycle does not have any of its resources (e.g. allocated memory, open file descriptors) freed. If the caller of rb_gc_force_recycle forgets to free those resources first, they will leak.
Why was rb_gc_force_recycle deprecated?
rb_gc_force_recycle was deprecated for several reasons, including causing resource leaks and violating assumptions made by the garbage collector.
Resource leaks
As mentioned above, because rb_gc_force_recycle does not free any resources held by the object, the caller must remember to correctly free all of them first. Forgetting to free resources before calling rb_gc_force_recycle is an easy mistake to make.
The infamous memory leak in Hash#transform_keys! in Ruby 3.0.2 was caused by a call to rb_gc_force_recycle that didn’t correctly free resources. This was the culprit behind the extreme memory growth seen in Rails apps running Ruby 3.0.2. You can see the impact of this memory leak on RubyGems.
rb_gc_force_recycle was also the cause of memory leaks in liquid-c, a native extension that makes the Liquid templating language faster.
To find memory leaks in your native gem, you can use the ruby_memcheck gem, which provides an easy way to detect them by running your test suite under Valgrind. It has found memory leaks in popular Ruby gems such as nokogiri, liquid-c, protobuf, and gRPC. You can read about how ruby_memcheck works in my blog post.
Violation of GC assumptions
For those of us on the Ruby core team who work on the garbage collector, rb_gc_force_recycle is a frequent pain point that makes innovation difficult, because it violates assumptions made by the garbage collector. The garbage collector assumes that objects are only freed during the sweeping phase of garbage collection, but rb_gc_force_recycle can free an object regardless of what state the garbage collector is in. As a result, rb_gc_force_recycle has to do extra work, such as locking the VM (which hurts parallelism) and removing the object from the mark stack if it is on it (since a dead object cannot be marked).
How does this deprecation affect me?
If you are a native extension gem maintainer, check whether your gem uses rb_gc_force_recycle. If it doesn’t, there’s nothing you need to do!
However, if your gem does use rb_gc_force_recycle, you can try removing the calls to it and running your test suite to see if there are any bugs or crashes. The most common type of crash is a “double free”, caused by freeing the same resource twice. Sometimes calls to rb_gc_force_recycle cannot simply be removed, and you may have to replace them with calls to RB_GC_GUARD. I’ll talk about both of these in more depth below.
Double free
Since rb_gc_force_recycle is now a no-op, the object is no longer reclaimed at the point where rb_gc_force_recycle is called. Instead, it is reclaimed during a regular garbage collection cycle, which runs the object’s normal free logic. So if you release the resources held by the object before calling rb_gc_force_recycle, those resources may be released again during garbage collection, which can cause crashes or other undefined behavior.
If you free an allocated region of memory more than once, you may get a crash with the output “double free or corruption”.
If you close a file descriptor more than once, the second close call may fail (returning -1 and setting errno to EBADF). File descriptors may also be reused by the system, so the second time you close a file descriptor, you may close one that was recently opened by some other part of your program!
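To make the close case concrete, here is a minimal, self-contained POSIX sketch. It is not Ruby-specific, and close_twice_errno is a hypothetical helper name used only for illustration. It opens /dev/null, closes the descriptor, and then closes the same descriptor number again.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens /dev/null, closes the descriptor, then closes it a second time.
 * Returns the errno value left by the second close(), 0 if the second
 * close() unexpectedly succeeded, or -1 if /dev/null could not be opened. */
int close_twice_errno(void) {
    int fd = open("/dev/null", O_RDONLY);
    if (fd == -1) {
        return -1; /* nothing to demonstrate */
    }

    close(fd); /* first close: the descriptor is released here */

    /* Second close of the same number. Nothing was opened in between, so
     * the number has not been reused and close() fails with EBADF. In a
     * larger program, another thread or library could have received this
     * number in the meantime, and this call would close its file instead. */
    errno = 0;
    if (close(fd) == -1) {
        return errno;
    }
    return 0;
}
```

On Linux and other POSIX systems, calling close_twice_errno() should return EBADF. The more dangerous case is the one noted in the comment: if the number has been reused, the second close succeeds and silently closes an unrelated file.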
How should I fix this?
The best solution is not to release any resources early, and to let them be released only when the object is freed during garbage collection. However, you may be using rb_gc_force_recycle because you need to release the resources immediately rather than waiting for garbage collection. In that case, make sure to leave the object in a state where garbage collection will not release the same resources again.
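As a minimal sketch of that advice, assume your Ruby object wraps a C struct that owns a heap buffer and a file descriptor. The struct and function names (my_handle, my_handle_release, my_handle_dfree) are hypothetical, and the Ruby-specific wiring is omitted so the sketch compiles as plain C. In a real extension, my_handle_dfree would be registered as the dfree callback in the object’s rb_data_type_t, and my_handle_release might back a #close method. The idea is that releasing a resource also clears the field that refers to it, so the free function the garbage collector eventually calls finds nothing left to release.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical native state wrapped by a Ruby object. */
struct my_handle {
    char *buffer; /* heap memory owned by the object, or NULL once released */
    int fd;       /* open file descriptor, or -1 once released */
};

/* Releases resources eagerly (e.g. from a hypothetical #close method).
 * Each field is cleared after its resource is released, so calling this
 * again, or from the GC free function later, is harmless. */
void my_handle_release(struct my_handle *h) {
    if (h->buffer != NULL) {
        free(h->buffer);
        h->buffer = NULL; /* prevents a second free() */
    }
    if (h->fd != -1) {
        close(h->fd);
        h->fd = -1; /* prevents a second close() */
    }
}

/* Free function the garbage collector would call when it sweeps the object
 * (the dfree callback in a real extension). It reuses the idempotent
 * release logic and then frees the struct itself. */
void my_handle_dfree(void *ptr) {
    struct my_handle *h = ptr;
    my_handle_release(h);
    free(h);
}
```

Because my_handle_dfree goes through the same release function, it is safe whether or not the resources were already released early. That is exactly the situation that arises once rb_gc_force_recycle no longer reclaims the object and the regular garbage collection cycle frees it instead.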
Guarding the stack
When removing calls to rb_gc_force_recycle, you should consider whether the Ruby object (i.e. the VALUE that points to the object) could be optimized away by the compiler while there are still pointers on the stack that refer to resources held by that object. Consider the following C snippet:
```c
// Create an array with capacity for 100 elements
VALUE my_array = rb_ary_new_capa(100);
// Get the pointer to the contents of the array
VALUE *array_ptr = RARRAY_PTR(my_array);
// Do some things with array_ptr
array_ptr[0] = rb_str_new_cstr("Hello world!");
// Free memory held on by this array by resizing to 0
rb_ary_resize(my_array, 0);
// Force recycle array object
rb_gc_force_recycle(my_array);
```
If we removed the calls to rb_ary_resize and rb_gc_force_recycle at the end, the compiler might optimize the stack so that the slot holding my_array is reused for array_ptr. If that happens and the call to rb_str_new_cstr triggers a garbage collection, my_array may be reclaimed, since no other object references it and it no longer exists on the stack. The memory region pointed to by array_ptr may then be freed, and writing to array_ptr[0] would be a use-after-free.
Additionally, this code assumes that resizing the array to 0 elements frees all of the array’s resources. While this may be true in the current implementation, it may not be true in the future, which could cause memory leaks.
Running your test suite with GC.stress = true may make crashes of this kind easier to reproduce, although it will not always cause the bug to surface. You can also use Valgrind to find use-after-free bugs; the ruby_memcheck gem provides an easy way to run Valgrind on native gems.
How should I fix this?
If a particular variable must exist on the stack up to a certain point, you can use RB_GC_GUARD to ensure that it is not optimized away before that point. In most cases, excessive calls to RB_GC_GUARD are harmless, but they may affect performance (runtime and/or memory usage) because they prevent certain compiler optimizations.
The example above can be updated as follows:
```c
// Create an array with capacity for 100 elements
VALUE my_array = rb_ary_new_capa(100);
// Get the pointer to the contents of the array
VALUE *array_ptr = RARRAY_PTR(my_array);
// Do some things with array_ptr
array_ptr[0] = rb_str_new_cstr("Hello world!");
// Guard that my_array exists on the stack until here
RB_GC_GUARD(my_array);
```
Conclusion
You’ve seen what rb_gc_force_recycle does, why it was deprecated, the common bugs that can show up when removing it, and how to fix them. If you’re a native gem maintainer and have questions or feedback, feel free to reach out to me through Twitter or email!