Preventing Flaky Tests With A Custom RuboCop Rule
In this blog post, I will talk about how I found and fixed a flaky test in our RSpec test suite, and how I wrote a custom RuboCop rule to prevent it from happening again.
DocSpring uses Stripe to handle credit card processing. We have a large test suite that covers lots of different billing flows and edge cases. We use Stripe's billing customer portal, which allows customers to update their credit card details and view invoices. Stripe hosts this portal for us, so we don't have to build all of this functionality ourselves.
I wrote an integration test that makes sure we can generate a link to this billing portal page. This test was "flaky", which means that it was randomly failing from time to time. Flaky tests can be really annoying, especially when they only fail once every few weeks. It's very difficult to fix something if you don't know how to make it happen consistently.
I figured out that this test failure was order-dependent, which means that it was only failing whenever another test ran before it. We run our RSpec tests in a random order, which helps us find issues like this. The problem was happening because we use the stripe-ruby-mock gem to mock requests to Stripe's API. This mocking behavior is enabled by calling StripeMock.start
, and turned off by calling StripeMock.stop
As you might have guessed by now, we had a test that wasn't tidying up after itself by calling StripeMock.stop
.
The stripe-ruby-mock docs suggest using it like this:
describe MyApp do
let(:stripe_helper) { StripeMock.create_test_helper }
before { StripeMock.start }
after { StripeMock.stop }
I had started with this approach, but it looks like I had accidentally moved the StripeMock.stop
call into a nested block. This means that only some of the tests in this spec file were calling StripeMock.stop
. If we got unlucky and tests were run in a different order, we'd end up with StripeMock
enabled for the rest of our test suite. This didn't actually cause too many problems, apart from this one billing portal test. This test makes a live request to Stripe's API, which we record/replay using VCR.
If you look at our test failure logs in CircleCI above, you'll notice StripeMock complaining that it doesn't know about the post /v1/billing_portal/sessions
API endpoint. This is how I figured out the issue, because I realized that StripeMock shouldn't have been running at all.
The solution was to make sure that our tests always call StripeMock.stop
whenever they call StripeMock.start
. But maybe there's a better way to do this, instead of using separate before
and after
blocks?
We can actually group these calls together by using an around
block, like this:
around(:each) do |ex|
StripeMock.start
ex.run
StripeMock.stop
end
This around
block will start StripeMock, run the test, and then stop StripeMock once the test is finished. This convention keeps the #start
and #stop
calls right next to each other, so it helps to prevent us from accidentally removing the #stop
call or moving it somewhere else during a refactor.
Now that I've fixed the flaky test, I also want to make sure that it never happens again. I want to prevent myself (and other developers on our team) from calling StripeMock.start
in a before
block, and only allow it inside an around
block.
We use RuboCop, which is a Ruby static code analyzer (a.k.a. linter
) and code formatter. RuboCop comes with many cops that can check your code for style, formatting, security issues, code complexity, among many other things. Many gems also have their own set of RuboCop rules. We currently use the RuboCop extensions for:
- Rails
- Rake
- RSpec
- Capybara
- FactoryBot
I decided to try writing a custom RuboCop rule (or "cop") that would scan our RSpec test files for StripeMock.start
. I would like it to warn us whenever we use it in a before
block instead of an around
block.
Here's the code I wrote for our custom cop (with some help from ChatGPT!) I saved this file at lib/rubocop/cop/stripe_mock_usage.rb
# frozen_string_literal: true
require 'rubocop'
# This cop checks that StripeMock.start and StripeMock.stop are only used within an around block (instead of before/after.)
# This prevents us from accidentally starting StripeMock in a before(:all) or before(:each) block and
# forgetting to stop it, which can cause flaky tests.
module RuboCop
module Cop
module RSpec
class StripeMockUsage < Cop
MSG = 'Only use `StripeMock.start` and `StripeMock.stop` within an `around(:each)` block.'
def_node_search :stripe_mock_start?, <<~PATTERN
(send (const nil? :StripeMock) :start)
PATTERN
def_node_search :stripe_mock_stop?, <<~PATTERN
(send (const nil? :StripeMock) :stop)
PATTERN
def on_block(node)
return unless %i[before after].include? node.send_node.method_name
# Search within the before/after block for StripeMock.start and StripeMock.stop calls
node.each_descendant(:send) do |send_node|
if stripe_mock_start?(send_node) || stripe_mock_stop?(send_node)
add_offense(send_node, message: MSG)
end
end
end
end
end
end
end
See the "Development" page in the RuboCop docs for more information on writing new cops.
I then enabled this custom cop by adding the following to our .rubocop.yml
:
require:
- ./lib/rubocop/cop/stripe_mock_usage.rb
RSpec/StripeMockUsage:
Enabled: true
Include:
- 'spec/**/*_spec.rb'
Now we get a warning whenever we call StripeMock.start
or StripeMock.stop
in a before
or after
block:
It's not a perfect check, but I think this could be a nice reminder. My personal philosophy is that I try to fully stamp out any flaky tests, bugs, or security issues so that they can never happen again. I haven't truly fixed an issue unless I've also added a new test or linting rule.
I enjoyed learning more about the Ruby AST and how RuboCop works internally. I have a few other ideas for cops that might be helpful, so I might be writing some more custom cops in the future. Thanks for reading!