Garbage Collection in Ruby
Overview
Ruby’s garbage collector code lives in a single file called gc.c
. “Garbage collector” is probably not the best term to call it because in addition to garbage collection, the code inside gc.c
is responsible for memory allocation and management. In other words, the whole lifecycle of an object is managed by the garbage collector.
Primitive object types
While every type of Ruby object may appear to work the same in the Ruby source code, there are actually many different types of Ruby objects internally. These types can be classified into two groups: immediates (which are not managed by the garbage collector) and heap allocated (which are managed by the garbage collector).
Immediates
Not all Ruby objects are managed by the garbage collector. There are several special types of objects called immediates. Inside of your Ruby program, immediates behave just like any other Ruby objects but are represented very differently. The following types are represented as immediates:
nil
.true
andfalse
.Fixnum
: small integers that are between-(2**62)
and2**62 - 1
(on 64 bit systems).Float
: floating point numbers represented as immediates. However, there are some floating point numbers that cannot be represented as immediates and are heap allocated instead.- Static
Symbol
: constant symbols in your source code (e.g.:foo
). Dynamic symbols that are generated at runtime (e.g."foo".to_sym
or:"foo#{my_variable}"
) are not immediates and are heap allocated instead.
Immediates take advantage of a certain property of non-immediate Ruby objects (which is discussed in further detail in the “slots” section). All pointers to Ruby objects are aligned at 40-byte bounaries (i.e., the pointer is a multiple of 40). This alignment guarantees that the lowest three bits are always 0 (since multiples of 40 means that it is also a multiple of 8). Ruby takes advantage of this to do pointer tagging by using the bottom three bits of the pointer to store the type of the immediate and using the remaining bits to store the data. So if the bottom three bits are zero, then the object is managed by the garbage collector. If the bottom three bits are non-zero then it’s an immediate. You can see the various values Ruby uses to tag pointers.
Heap allocated
There are actually 14 different heap allocated object types in Ruby. These are:
Array
Bignum
Class
Complex
File
Float
Hash
MatchData
Object
Rational
Regexp
String
Struct
Symbol
We’ll talk more about how these types are allocated in Ruby in the next section.
Data structures
We’ll now discuss the various data structures Ruby’s garbage collector uses to allocate objects and manage memory. We’ll start from the smallest unit (slots) and work our way to larger (heap pages) and larger (size pools) units of management.
The VALUE
type
Every pointer to a Ruby object is stored in a type called VALUE
. The VALUE
type is defined to be of the same size as a pointer on your system. For heap allocated objects, the VALUE
type directly points to the memory address of the object. For immediates, the bottom 3 bits of the pointer are used to store the type of the immediate and the remaining bits are used to store the data.
Slots
Every heap allocated object lives in a slot. Due to historical reasons, slots are of sizes that are a multiple of 40 bytes. We’ll discuss more about slot sizes in the “size pools” section.
Every slot begins with two fields, both of the VALUE
type called flags
and klass
:
flags
is used to store data about the object. On 64 bit systems, the first 12 bits are used to store various metadata about the object, such as frozen status, the age of the object (used for the generational garbage collector), and the write barrier protected status (which will be discussed in the “write barrier” section). It is then followed by 20 bits that the specific object type can use to store data. For example, small arrays use some of these bits to store the length of the array. The top 32 bits are used by object shapes to store the shape of the object (which mostly holds information about the instance variables).klass
is a pointer to aClass
object that is the class of this object.
The remaining space in the slot is used by the object to store data for that data type. For example, arrays use the remaining space to store the contents of the array while classes hold information such as the method table, constant table, and list of superclasses.
Heap page
The heap is divided up into pages, with each page being 64KB in size. These pages are further divided up into slots. The slot size for each page is fixed (i.e., there is only one slot size for each page). This is for simplicity and performance, as it means that when we iterate over the objects on the page, we don’t need to change slot sizes.
The garbage collector allocates memory from the system in units of heap pages rather than allocating each slot individually. There are many reasons for doing so, one of these reasons is for allocation performance. It is much more efficient to allocate large chunks of memory from the system rather than frequently allocating small pieces of memory.
Size pools
Introduced in Ruby 3.2, size pools are the core of the Variable Width Allocation feature that allows dynamic sized objects to be allocated through the garbage collector. Prior to this feature, all objects were allocated in 40 byte slots and any additional data had to be externally allocated through the system. This limited performance and decreased memory efficiency.
Size pools are used to manage heap pages that have slots of the same size. The current slot sizes used (on 64 bit systems) are power-of-two-multiples-of-40-bytes and there are five total size pools. That’s pretty hard to understand, but what that means is there are five size pools where each has slot sizes of 40 bytes, 80 bytes, 160 bytes, 320 bytes, and 640 bytes. Power-of-two-multiples were chosen to have a good balance of memory efficiency, performance, and being able to support large sizes. Using too many fine-grained sizes would mean frequent re-allocation, which degrades performance, and using too few sizes would mean inefficiency in memory usage.
Since Variable Width Allocation was introduced incrementally, not all object types support it, so in order to improve space efficiency for these types, the first size pool is 40 bytes in size, which perfectly fits objects that do not support this feature.
rb_objspace_t
struct
The rb_objspace_t
struct contains all the metadata about the garbage collector, including the size pools, internal garbage collector data, state of the garbage collector, and various statistics.
Allocating objects
Objects are allocated using the NEWOBJ_OF
macros. Information such as the type, class, and size of the object are passed into the garbage collector.
These macros call the garbage collector’s internal functions that allocate and initialize objects. The heart of object allocation in the garbage collector is the newobj_of0
function. What’s important to note in this function is that there are two allocation paths: the fast path and the slow path.
Fast path
In order to speed up allocation speed, there is a cache in each Ractor containing a linked list of free slots. By caching some free slots, slots can be allocated in parallel in different Ractors without the risk of race conditions, which eliminates the need to perform synchronization. When we need to allocate an object, a slot is removed from the head of the linked list. However, the cache will eventually run out of slots, which means we need to enter the slow path.
Slow path
When we run out of slots in the cache, we need to switch to the slow path. This is called the slow path because it requires locking the Ruby virtual machine, which means that only one Ractor can enter the slow path at a time. The slow path first looks at the linked list of heap pages with free slots. If there are such heap pages, it will remove the first one and move the linked list of free slots from the heap page to the Ractor cache.
If there aren’t any heap pages with empty slots, then it first checks if it’s allowed to allocate a new heap page. Every size pool has a allocatable_pages
counter which tracks the number of pages a size pool is allowed to allocate before it is required to trigger a new garbage collection cycle. allocatable_pages
is calculated using heuristics after major garbage collection cycles (which will be discussed later). However, once allocatable_pages
reaches 0, a new garbage collection cycle is triggered, which will be discussed in the following section.
Garbage collection lifecycle
The garbage collector is split into two phases: it first marks, then sweeps (and compacts, if it’s enabled).
In the marking phase, living objects are marked. Then, during the sweeping phase, the unmarked slots are freed and added to the linked list of free slots on the page.
Ruby uses a garbage collector known as a “stop-the-world” garbage collector, meaning all execution of Ruby code has to be paused while the garbage collector executes. This can cause long pauses in the Ruby code when the garbage collector is running, especially in a large Rails app, for example. Ruby solves this problem using two solutions, incrementally marking objects, and using a generational garbage collector, both discussed in later sections.
Marking phase
The marking phase traverses live Ruby objects by recursively marking children of live Ruby objects starting from root objects. Objects are marked using three colors: white, grey, and black. At a high level, the tri-color marking algorithm works as follows:
- All objects are initially white.
- Mark root objects as grey.
- While there is another object O that is grey:
- Mark O as black.
- Mark every child C of O as grey.
At the end of the marking phase, there are only black and white objects. All objects that are white are dead and can be reclaimed by the garbage collector.
graph TD classDef markedNode fill:black, stroke:white, color:white classDef markingNode fill:grey, stroke:black, color:black classDef unmarkedNode fill:white, stroke:black, color:black root --> A root --> B A --> C A --> D C --> E root[Root objects] style root fill:LightBlue, stroke:DarkBlue A("Object A (marked)") class A markedNode B("Object B (marked)") class B markedNode C("Object C (marking)") class C markingNode D("Object D (marking)") class D markingNode E("Object E (unmarked)") class E unmarkedNode F("Object F (unmarked)") class F unmarkedNode
In the figure above, objects A and B have their children marked, so they are marked black. Objects C and D have not yet been marked, so they are marked grey. And object E has not yet been marked, so it’s white. We also have object F that is not referenced by any Ruby objects, so it won’t be marked and will remain white until the end of the marking process, so it will be reclaimed during the sweeping phase.
Generational garbage collector
Generational garbage collectors are based on the weak generational hypothesis theory, which states that objects often die young, and objects that have been alive for a long time are likely to remain alive. Generational garbage collectors keep track of the number of garbage collection cycles a particular object has survived, and when this number exceeds a threshold, the object is promoted to the old generation. In Ruby’s garbage collector, an object is required to survive three garbage collection cycles before it is promoted to the old generation. To take advantage of this property to improve performance, we have two types of garbage collection cycles: minor and major.
Minor garbage collection cycles mark only the young objects, thus it can run faster by skipping old objects.
Major garbage collection cycles mark both the young and old objects, thus it can free more memory.
graph TD classDef oldNode fill:LightPink, stroke:Red classDef youngNode fill:Yellow, stroke:Orange root -- Traverse --> A root -- Traverse --> B A -. Ignore .-> C C -. Ignore .-> E B -- Traverse --> D root[Root objects] style root fill:LightBlue, stroke:DarkBlue A("Object A (old)") class A oldNode B("Object B (young)") class B youngNode C("Object C (old)") class C oldNode D("Object D (old)") class D oldNode E("Object E (old)") class E oldNode
In the figure above, in a minor garbage collection cycle, none of the descendants of object A will be traversed during marking.
The number of generations an object has survived is stored in the flags in the RBasic
struct. We can get the age of the object using RVALUE_FLAGS_AGE
and RVALUE_OLD_P_RAW
which returns the age and whether an object is old, respectively.
Write barrier
What if we added a child (that is a young object) to an old object? Then during minor garbage collection cycle the young object would not be marked and thus will be swept. This would be a really bad bug!
Here’s a diagram to illustrate the bug.
graph TD classDef oldNode fill:LightPink, stroke:Red classDef youngNode fill:Yellow, stroke:Orange root -- Traverse --> A root -- Traverse --> B A -. Ignore .-> C C -. Ignore .-> E C -. Ignore .-> F B -- Traverse --> D root[Root objects] style root fill:LightBlue, stroke:DarkBlue A("Object A (old)") class A oldNode B("Object B (young)") class B youngNode C("Object C (old)") class C oldNode D("Object D (old)") class D oldNode E("Object E (old)") class E oldNode F("Object F (young)") class F youngNode
In the diagram above, since objects A and C are old, they are not traversed during marking. However, object F is a young descendant of object C. So in a minor garbage collection cycle, object F won’t be marked so it will be swept.
To solve this issue, we introduce the concept of a write barrier. A write barrier is a callback that is invoked to let the garbage collector know whenever a reference is added to another object. When the write barrier is called with an old parent object and a young child object, the old parent object is flagged and placed in the remember set. The remember set is a list of objects that will be marked in the next minor garbage collection cycle. The remember set is cleared after each garbage collection cycle. The garbage collector will immediately promote all young descendant objects in the remember set to the old generation.
graph TD subgraph root[" "] rootTemp[Root objects] rememberSet[Remember set] style rootTemp fill:LightBlue, stroke:LightBlue end style root fill:LightBlue, stroke:DarkBlue classDef oldNode fill:LightPink, stroke:Red classDef youngNode fill:Yellow, stroke:Orange root -- Traverse --> A root -- Traverse --> B A -. Ignore .-> C C -. Ignore .-> E C -- Traverse --> F B -- Traverse --> D rememberSet -- Traverse --> C root[Root objects] A("Object A (old)") class A oldNode B("Object B (young)") class B youngNode C("Object C (old)") class C oldNode D("Object D (old)") class D oldNode E("Object E (old)") class E oldNode F("Object F (young)") class F youngNode
Generational garbage collection was introduced in Ruby 2.1. Since not all object types support write barriers, in order to introduce the feature gradually and to preserve backward compatibility with old native extensions, Ruby supports objects that don’t have a write barrier, which are called “write barrier unprotected” objects (and objects that support write barriers are called “write barrier protected” objects). Whenever an old write barrier protected object references a write barrier unprotected object, the parent object (old write barrier protected object) is added to the remember set. We add it to the remember set because without a write barrier, we don’t know when new references are added to that object.
If you’re interested in more details, Koichi Sasada (the author of write barriers) has an excellent paper about implementing gradual write barriers.
Incremental marking
Ruby 2.2 introduced the incremental garbage collector. Koichi Sasada (the author of incremental marking) has an excellent article explaining how the incremental garbage collector works, I will attempt to summarize it below.
The main idea of incremental marking is to not run the marking phase all at once, but divide it into chunks that can be interleaved with the execution of Ruby code. Since Ruby has a stop-the-world garbage collector (i.e., no Ruby code can run while the garbage collector is running), splitting the marking phase can significantly reduce the stutter caused by the garbage collector’s execution.
Incremental marking marks only a certain number of slots before continuing execution of Ruby code. So during the execution of Ruby code, some objects may be marked black, some may be marked grey, while some may have not yet been marked at all.
Incremental marking takes advantage of the write barrier to know when a reference from a black object is added to a white object. The write barrier will then mark the white object as grey. For all other cases, we don’t need to do anything since if the parent object is grey or white, its references have not yet been traversed, and if the child object is grey or black, then it has already been marked.
However since not all objects support write barriers, incremental marking has the additional overhead of needing to mark all marked write barrier unprotected objects at the end.
Incremental marking only occurs during major garbage collection cycles because minor garbage collection cycles are relatively fast since it does not traverse old objects and adding incremental marking would introduce additional overhead.
Sweeping phase
After the marking phase, all live objects are marked black and all dead objects are unmarked. The sweeping phase then iterates over each slot of every page and frees the unmarked objects. The object is freed in the obj_free
function which performs operations like freeing memory allocated from the system or closing file descriptors. The freed slots are then placed in the linked list of free slots in that page.
Lazy sweeping
Sweeping all of the dead objects could potentially cause a lengthy pause in Ruby code execution. Similar to how marking can be done incrementally in several steps to reduce the consecutive time spent in the garbage collector, lazy sweeping divides the sweeping process into numerous short steps, with each step sweeping 2048 slots. This process is performed lazily. That is, when there’s a need to allocate an object but no free slots are available, we perform a sweeping step before allocating the object.
Compaction phase
As our Ruby program allocates and deallocates objects, it’s inevitable that the heap memory (i.e., slots inside pages) have “holes”, as some objects are longer lived while others are swept. This is known as fragmentation, where we have many free slots between live objects. Fragmentation increases memory usage as it prevents pages that are mostly empty from being freed. Having a lot of pages also slows down both the garbage collector and regular execution of code. It’s pretty obvious that it would slow down the garbage collector, as both the marking and sweeping phases iterate over the pages. Fragmentation could also slow down the execution of Ruby code, as CPU caches become less effective due to caching more empty slots rather than useful data. If we also had some way of moving related objects closer to one another, we might also be able to improve performance by increasing cache hits (this is not currently being done in the Ruby garbage collector).
The compaction phase, introduced in Ruby 2.7, attempts to solve this problem by moving Ruby objects across different heap pages.
The Variable Width Allocation feature, introduced in Ruby 3.2, also takes advantage of the compaction phase to resize objects by moving objects across different size pools, placing objects in the most optimal slot size to improve runtime performance and space efficiency.
Compaction is currently an opt-in feature that can be manually triggered by calling GC.compact
or can be enabled to automatically run during major garbage collection cycles by setting GC.auto_compact = true
.
Pinning objects
Certain objects cannot be moved for various reasons. One reason is that compaction is a relatively new feature, so there are many types in Ruby and in gems with native extensions that don’t support compaction. We also cannot move objects that are currently being used and are on the stack. These are known as pinned objects. If we moved pinned objects, then there may be pointers to broken or to different objects, which can cause bugs and crashes!
This issue was addressed by changing the original C API function for marking Ruby objects, rb_gc_mark
, to both mark and pin the object. A new function, rb_gc_mark_movable
, was introduced for types that do support compaction. This way, object types that don’t support compaction will continue to work and new types can be gradually introduced to support compaction.
Algorithm
To illustrate the compaction algorithm, I will be using the following heap as an example. This is a simplified example, as it only shows two size pools. The arrows are references between objects. For example, there are references from objects A
and B
to object F
.
Marking for major garbage collection cycle
Before we can do compaction, we must first do the marking phase of a major garbage collection cycle. This will mark all of the live objects and pin objects that cannot be moved.
Object movement phase
The compactor uses a two finger compaction algorithm. As the name suggests, we have two cursors (or fingers), one cursor that moves from the right to the left called the scan cursor that looks for live objects and a cursor that moves from the left to the right called the free cursor that looks for free slots.
We iterate over every size pool and attempt to move one object at the scan cursor from each size pool. (Technically, the implementation moves a whole page of objects from each size pool at a time for performance reasons, but this detail is not crucial for explaining the algorithm.) For each object we attempt to move, we analyze the optimal size and attempt to move it to the free cursor of that size pool. However, if the size pool it tries to move to has already completed compaction, then we don’t move the object.
We only move one object from each size pool at a time (rather than moving all objects from one size pool and then moving to the next size pool) in order to ensure that all objects have an equal chance of moving to their optimal size. If we started moving all objects from the smallest size pool, then we would have a bias to upsize objects (since smaller objects generally want to be larger). On the other hand, if we started moving all objects from the largest size pool, then we would have a bias to downsize objects (since larger objects generally want to be smaller). But you might wonder, can’t we remove this bias by adding pages to a size pool during compaction when it’s full? You’d be right, however it would defeat one of the purposes of compaction, which is to reduce memory usage, not to increase it.
After we move an object, we leave a T_MOVED
at the original slot of the object. This is called a forwarding address and is used in the reference update phase to look up the new address of the object.
Once the scan and free cursors meet, then object movement is complete for the size pool. Once the scan and free cursors meet in all size pools, then the movement phase is complete.
In Ruby pseudocode, the algorithm looks like this (of course, the real version is implemented in C).
# Continue looping until all size pools have finished compacting.
while !size_pool.all?(&:finished_compacting?)
# Loop through the size pools
size_pools.each do |size_pool|
# Skip this size pool if it has finished compacting (i.e., the scan and
# free cursors have met).
next if size_pool.finished_compacting?
# Move the scan cursor forwards.
obj = size_pool.next_live_object
# Get the optimal size for this object.
optimal_size = obj.optimal_size
# Find the size pool that fits the destination size.
dest_size_pool = size_pools.find { |s| optimal_size <= s.slot_size }
# Move the free cursor forwards for the destination size pool.
dest = dest_size_pool.next_free_slot
# Skip moving this object if the destination size pool has finished
# compacting.
next unless dest
# Write obj into dest.
dest.write(obj)
# Leave a T_MOVED forwarding object in obj that points to dest.
obj.write(T_MOVED.new(dest))
end
end
Here’s an animation that illustrates the object movement phase on an example heap.
Reference update phase
After compaction, we traverse all the objects in the heap and update their references (pointers) that have moved (the pointers that point to a T_MOVED
) and update the pointer to point directly to the moved object.
Finish sweeping
At this point, everything to the left of the free cursor are guaranteed to be live objects (since we filled all empty slots to the left of the free cursor with live objects). The heap is still unswept to the right of the free cursor, and contains a mix of live, dead, and T_MOVED
objects. We then start sweeping from the free cursor and reclaim the dead and T_MOVED
objects.
Conclusion
That’s all folks! To recap, in this blog post, we looked at some of the important data structures in the garbage collector, we discussed how Ruby objects are represented and created, and we also discussed the various phases in a garbage collection cycle (marking, sweeping, and compaction).