Day 3 in the #vDM30in30
It all started with a hanging build…
https://travis-ci.org/voxpupuli/puppet-nginx/builds/166701701
We were having an issue with a hanging build with mocked tests for Facter in the Nginx module for Vox Pupuli. Since I was the last person who touched it, I took responsibility for the issue and dug in.
The code was a basic unit test for checking Ruby code to determine the version of nginx installed:
require 'spec_helper'
describe Facter::Util::Fact do
before { Facter.clear }
describe 'nginx_version' do
before do
Facter::Util::Resolution.stubs(:exec)
end
context 'neither nginx or openresty in path' do
before do
Facter::Util::Resolution.stubs(:which).with('nginx').returns(false)
Facter::Util::Resolution.stubs(:which).with('openresty').returns(false)
end
it { expect(Facter.fact(:nginx_version).value).to eq(nil) }
end
context 'nginx in path' do
context 'with current version output format' do
before do
Facter::Util::Resolution.stubs(:which).with('nginx').returns(true)
Facter::Util::Resolution.stubs(:exec).with('nginx -v 2>&1').returns('nginx version: nginx/1.8.1')
end
it { expect(Facter.fact(:nginx_version).value).to eq('1.8.1') }
end
context 'with old version output format' do
before do
Facter::Util::Resolution.stubs(:which).with('nginx').returns(true)
Facter::Util::Resolution.stubs(:exec).with('nginx -v 2>&1').returns('nginx: nginx version: nginx/0.7.0')
end
it { expect(Facter.fact(:nginx_version).value).to eq('0.7.0') }
end
end
context 'openresty in path' do
context 'with current version output format' do
before do
Facter::Util::Resolution.stubs(:which).with('nginx').returns(false)
Facter::Util::Resolution.stubs(:which).with('openresty').returns(true)
Facter::Util::Resolution.stubs(:exec).with('openresty -v 2>&1').returns('nginx version: openresty/1.11.2.1')
end
it { expect(Facter.fact(:nginx_version).value).to eq('1.11.2.1') }
end
context 'with old version output format' do # rubocop:disable RSpec/EmptyExampleGroup
# Openresty never used the old format as far as I can find, no point testing
end
end
end
end
However, I couldn’t reproduce it locally. Regardless of Ruby version, it seemed fine for me.
Eventually I realised the gem had been updated, and as the gem wasn’t pinned in the Gemfile and Travis installs gems from a clean state. There had been a recent release in v1.2.0
that caused the failure.
However, it didn’t look like an issue with our code. The test was written in the same way as all the tests I’d written. In fact, it looked like a bug with the mocking tool itself: mocha.
So I did a bit of digging. I’d seen a blog post about ruby-stacktrace, a Rust tool that allows you to do a stacktrace on a Ruby process. Seemed like a perfect fit. With a bit of debugging with ruby-stacktrace, I could see the following where the hang was happening in Ruby:
self | tot | method
100.0% | 33.3% | hide_original_method : /root/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/mocha-1.2.0/lib/mocha/class_method.rb
Looking through the code, mocha had changed the hide_original_method
in the 1.2.0 release. So looked like a mocha issue.
So I logged as much info as I could to the mocha ticket, and left it to the experts, ie. the Mocha author James Mead.
To my surprise, James ended up figuring out the issue: it was a live one, a real life bug in Ruby! And not just a small one, a biggie:
I can reproduce this. Seems like a infinite loop inside
ofrb_callable_method_entry_without_refinements
. I think this is a showstopper bug.
https://bugs.ruby-lang.org/issues/12832
Now identified, the issue was fixed in both Mocha and Ruby itself, and all was well.
Honestly, I didn’t understand the actual cause or issues why it happened, but I’m glad that I was a small part of helping fix a bug in Ruby!