The Ruby Inplace Bug

If you’re not familiar with Ruby’s C string API, I strongly recommend reading my article on the ways to create Ruby strings in C extensions first.

Here’s a story of string corruption in MRI when the wrong function is used to create the Ruby string. But before that, let me explain a few Ruby features that you may not know.

Ruby features that you may not know

Editing a file in place

Consider the following script:

while gets
  puts $_.gsub(/perl/, "ruby")
end

And then we run:

$ echo "I like perl, it is my favourite language." > temp.txt

$ ruby -i script.rb temp.txt

$ cat temp.txt
I like ruby, it is my favourite language.

As you can see, with the -i flag, gets reads the text file line by line, and whatever is written to standard output (here, by puts) replaces the contents of the file.

Backup files

Let’s run the script above again, but instead of the -i we pass to Ruby, let’s pass -i.bak:

$ echo "I like perl, it is my favourite language." > temp.txt

$ ruby -i.bak script.rb temp.txt

$ cat temp.txt
I like ruby, it is my favourite language.

$ cat temp.txt.bak
I like perl, it is my favourite language.

All this does is create a backup file (with the extension .bak in our case) with the original contents before modifying it.
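To make the behaviour of -i and -i.bak concrete, here is a minimal sketch that imitates it for a single file. The helper name edit_in_place and its arguments are hypothetical, and this is only an approximation of the observable effect, not how MRI implements the flag.

```ruby
require "tmpdir"

# Hypothetical helper: approximates the visible effect of `ruby -i[suffix]`
# on one file. The block plays the role of the script body.
def edit_in_place(path, backup_suffix = nil)
  original = File.read(path)

  # Keep the original contents only when a suffix was given (-i.bak vs. plain -i).
  if backup_suffix && !backup_suffix.empty?
    File.write(path + backup_suffix, original)
  end

  # Rewrite the file with whatever the block produces for each line.
  File.open(path, "w") do |out|
    original.each_line { |line| out.print(yield(line)) }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "temp.txt")
  File.write(path, "I like perl, it is my favourite language.\n")

  edit_in_place(path, ".bak") { |line| line.gsub(/perl/, "ruby") }

  puts File.read(path)          # the edited file
  puts File.read(path + ".bak") # the untouched backup
end
```

Passing nil (or an empty string) as the suffix mirrors plain -i: the file is rewritten and no backup is kept.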

Run the Ruby script line-by-line

Did you know we can do the above, but with a single line of Ruby code? Consider the following Ruby script:

$_.gsub!(/perl/, "ruby")

And then we run it as the following (notice we are running the script with an extra -p flag):

$ echo "I like perl, it is my favourite language." > temp.txt

$ ruby -pi.bak script.rb temp.txt

$ cat temp.txt
I like ruby, it is my favourite language.

$ cat temp.txt.bak
I like perl, it is my favourite language.

The -p flag conveniently wraps your code in an implicit while gets(); ...; print $_; end loop.
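The implicit loop can be sketched explicitly. The helper below is hypothetical and works on any IO-like objects instead of the real $_ and ARGF, so treat it as an illustration of the control flow only: read a line, run the script body, print the result.

```ruby
require "stringio"

# Hypothetical helper: mimics the loop that -p wraps around a script.
# The block stands in for the script body and returns the (possibly edited) line.
def run_like_p(input, output)
  while (line = input.gets)   # `while gets` in the real loop
    line = yield(line)        # the script body, e.g. $_.gsub!(...)
    output.print(line)        # the automatic print at the end of each iteration
  end
end

input  = StringIO.new("I like perl, it is my favourite language.\n")
output = StringIO.new

run_like_p(input, output) { |line| line.gsub(/perl/, "ruby") }

puts output.string
```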

BEGIN blocks

One question you might ask yourself is, if I use the -p flag, how do I do global setup? Ruby’s got a feature for that! Enter BEGIN blocks.

Note: Ruby’s keywords are case sensitive, so BEGIN is not the same as begin.

So now, consider the following script:

BEGIN {
  puts "It is starting!"
}

$_.gsub!(/perl/, "ruby")

And then we can run it as follows:

$ echo "I like perl, it is my favourite language." > temp.txt

$ ruby -pi.bak script.rb temp.txt
It is starting!

$ cat temp.txt
I like ruby, it is my favourite language.

$ cat temp.txt.bak
I like perl, it is my favourite language.

Notice we get the It is starting! output in the terminal, and not in the file.
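To check that the BEGIN block runs once, before the loop, and not once per line, the sketch below launches a child Ruby process with -p and feeds it two lines on standard input. The helper name run_with_p is hypothetical. No -i flag is used here, so everything ends up on standard output.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper: writes the script to a temporary file, runs it with
# `ruby -p`, and returns whatever the child process printed.
def run_with_p(script_source, input)
  Tempfile.create(["script", ".rb"]) do |file|
    file.write(script_source)
    file.flush
    output, _status = Open3.capture2(RbConfig.ruby, "-p", file.path, stdin_data: input)
    output
  end
end

script = <<~'RUBY'
  BEGIN {
    puts "It is starting!"
  }

  $_.gsub!(/perl/, "ruby")
RUBY

puts run_with_p(script, "I like perl.\nperl is fine.\n")
```

The greeting should appear a single time, followed by one edited line per input line.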

Shebang

We want to minimize the number of characters we have to type. So we can omit the flags from the terminal and instead include them in the file using a shebang line.

#!/usr/bin/ruby -pi.bak

BEGIN {
  puts "It is starting!"
}

$_.gsub!(/perl/, "ruby")

And then we can run it as follows:

$ echo "I like perl, it is my favourite language." > temp.txt

$ ruby script.rb temp.txt
It is starting!

$ cat temp.txt
I like ruby, it is my favourite language.

$ cat temp.txt.bak
I like perl, it is my favourite language.

In fact, Ruby doesn’t actually check that the binary in the shebang is valid, just that it ends in ruby. So the following shebang would have worked too:

#!/some/invalid/dir/ruby -pi.bak
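The rule described above (only the name at the end of the interpreter path matters) can be modelled in a few lines. This is a simplified sketch based purely on that description. The helper shebang_switches is hypothetical and is not MRI’s actual parsing code, which handles more cases.

```ruby
# Hypothetical helper: returns the switches found on a shebang line whose
# interpreter path ends in "ruby", or nil when there is no such line.
def shebang_switches(source)
  first_line = source.lines.first.to_s.chomp
  return nil unless first_line.start_with?("#!")

  # Split "#!/path/to/ruby -pi.bak" into the interpreter and the rest.
  interpreter, switches = first_line[2..].strip.split(/\s+/, 2)
  return nil unless interpreter.to_s.end_with?("ruby")

  switches.to_s
end

puts shebang_switches("#!/usr/bin/ruby -pi.bak\nputs 1\n")
puts shebang_switches("#!/some/invalid/dir/ruby -pi.bak\nputs 1\n")
p    shebang_switches("#!/bin/sh\necho hi\n")
```

In this model both Ruby shebangs yield the same switches regardless of whether the path exists, and the non-Ruby shebang yields nil.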

The Ruby inplace bug

Ok, we finally have everything we need to reproduce the bug. Consider the following script:

#!/usr/bin/ruby -pi.bak

BEGIN {
  GC.start(full_mark: true)
  arr = []
  1000000.times do |x|
    arr << "fooo#{x}"
  end
}

$_.gsub!(/perl/, "ruby")

So what do we expect it to do? Well, this is pretty much the same script as the previous one from the Shebang section (plus some seemingly useless work in the BEGIN block).

So let’s run this script:

$ echo "I like perl, it is my favourite language." > temp.txt

$ ruby script.rb temp.txt

$ cat temp.txt
I like ruby, it is my favourite language.

$ cat temp.txt.bak
cat: temp.txt.bak: No such file or directory

$ ls
script.rb
temp.txt
temp.txto106

Wait what!?!? Where is our backup file temp.txt.bak? And what is temp.txto106? Let’s inspect these files.

$ cat temp.txt
I like ruby, it is my favourite language.

$ cat temp.txto106
I like perl, it is my favourite language.

It seems like temp.txto106 contains the original file (the one that should have been the backup file). So what’s going on? You might have guessed it: the wrong C function was used to create a Ruby string!

So, what’s wrong?

When you pass the flag -i.bak in the shebang, Ruby parses the arguments from a Ruby string read from the contents of the Ruby script. It then sets the extension of the backup file by calling ruby_set_inplace_mode. This function is really simple; it’s defined like this:

void
ruby_set_inplace_mode(const char *suffix)
{
    ARGF.inplace = !suffix ? Qfalse : !*suffix ? Qnil : rb_fstring_cstr(suffix);
}

The problem here is the use of rb_fstring_cstr, which (similar to rb_str_new_static) expects a pointer to a region of memory that is not freed before this Ruby string is swept, because it points the Ruby string directly at the string passed in instead of copying it.

To explain using diagrams (don’t we all love diagrams?), we have a Ruby string of the flags in the shebang stored in argv. We then call ruby_set_inplace_mode and set ARGF.inplace to a Ruby string that points to .bak.

But then we no longer need argv, while ARGF.inplace is still pointing to the memory that argv was referencing.

So then we call GC.start, which guarantees that argv is swept, and then we create a large number of fooo#{x} strings. This helps make sure the memory that held -pi.bak is overwritten, so ARGF.inplace ends up pointing at some other value. Once the script terminates, the backup extension that gets used is o106 (or some other gibberish).
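The difference between borrowing memory and copying it can be imitated in plain Ruby. The sketch below is a toy model, not real C memory: a mutable String stands in for the buffer, and the hypothetical BorrowedString struct stores only an offset and a length into it, in the way a non-copying string would. The replacement contents fooo106 are chosen for illustration.

```ruby
# Toy stand-in for a string that borrows memory: it remembers where to look
# in the buffer but owns no bytes of its own.
BorrowedString = Struct.new(:memory, :offset, :length) do
  def to_s
    memory.byteslice(offset, length)
  end
end

# Hypothetical simulation of the stale suffix.
def simulate_stale_suffix
  memory   = +"-pi.bak"                        # buffer holding the shebang flags
  borrowed = BorrowedString.new(memory, 3, 4)  # "points at" .bak without copying
  copied   = memory.byteslice(3, 4)            # a real copy with its own bytes

  before = borrowed.to_s

  # The buffer is released and reused for an unrelated string.
  memory.replace("fooo106")

  { before: before, borrowed: borrowed.to_s, copied: copied }
end

result = simulate_stale_suffix
puts result[:before]    # what the borrowed string read while the buffer was intact
puts result[:borrowed]  # what it reads after the buffer was reused
puts result[:copied]    # the copy is unaffected
```

In this model the borrowed view reads .bak at first and o106 once the buffer has been reused, while the copy keeps reading .bak. That is the same kind of corruption seen in the backup file name above.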

The fix

The fix changes only a few characters on one line! Instead of calling rb_fstring_cstr to create the string, we use rb_str_new, which allocates memory for the string and copies the contents over. Once we do that, we no longer need to ensure that the original string is not freed.

Acknowledgements

I would like to acknowledge Matt Valentine-House as the co-discoverer of the bug and the co-author of the fix.