How to use a Gemfile.initial file to speed up Docker builds with an extra cached layer

We recently wrote a blog post about a script that updates version strings in a Dockerfile. We've since identified a better way to achieve the same goals.

A Rails application lists its gem dependencies in two files, Gemfile and Gemfile.lock. Installing those gems can take a long time when you're building a Docker image. To speed this up, you can put these lines near the top of your Dockerfile:

COPY Gemfile Gemfile.lock ./
RUN bundle install

Docker caches the layer produced by the bundle install step and re-runs it only when Gemfile or Gemfile.lock changes.

However, you probably use some core gems that don't change too often, such as Rails, Rake, Nokogiri, etc. These do get updated from time to time, but probably not as often as you are adding or updating other dependencies. It would be great if you could cache these gems separately so that you don't have to download them every time your Gemfile.lock changes.

You can achieve this by adding a second Gemfile to your application and calling it something like Gemfile.initial. This file should contain only the few gems that you would like to cache independently. Make sure that you set explicit versions for these gems; otherwise, Bundler might install a different version from the one pinned in Gemfile.lock.

The nice thing about a Gemfile is that it's just a Ruby file, so we can write some code to automate this and parse the versions from the original Gemfile.lock:

source 'https://rubygems.org'

INITIAL_CACHED_GEMS = %w[
  rails rake sprockets mysql2 bootsnap nokogiri newrelic_rpm rack-cors
].freeze

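# Prefer the main Gemfile.lock (local development); fall back to
# Gemfile.initial.lock (the only lock file present in the first Docker layer).
lock_file = [
  File.join(__dir__, 'Gemfile.lock'),
  File.join(__dir__, 'Gemfile.initial.lock')
].find { |f| File.file?(f) }
raise 'No Gemfile.lock or Gemfile.initial.lock found' unless lock_file

lockfile_parser = Bundler::LockfileParser.new(Bundler.read_file(lock_file))

INITIAL_CACHED_GEMS.each do |gem_name|
  spec = lockfile_parser.specs.find { |s| s.name == gem_name }
  raise "#{gem_name} is not in #{lock_file}" unless spec

  # Pin the exact locked version so both Gemfiles install the same gem.
  gem gem_name, spec.version.to_s
end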

This might not look like any Gemfile you've seen before, but rest assured that it's a valid Gemfile, and you can use it with bundle install. The INITIAL_CACHED_GEMS constant is an array of the gem names that we would like to cache first. The code parses the lock file to find the pinned version of each gem and sets it as an explicit version requirement. That way, we can be sure that Gemfile and Gemfile.initial will install exactly the same versions of these gems.
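
To see the version lookup in isolation, here is a minimal sketch that runs outside of a Gemfile. The sample lock file contents and the pinned_versions helper are made up for illustration; only Bundler::LockfileParser and its specs list come from the Gemfile.initial code above.

```ruby
require 'bundler'

# Hypothetical sample lock file, trimmed to the sections the parser needs.
SAMPLE_LOCKFILE = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      nokogiri (1.13.9)
        racc (~> 1.4)
      racc (1.6.0)
      rake (13.0.6)

  PLATFORMS
    ruby

  DEPENDENCIES
    nokogiri
    rake
LOCK

# Hypothetical helper: maps each requested gem name to the version that the
# lock file pins it to, and raises if a gem is not locked at all.
def pinned_versions(lockfile_contents, gem_names)
  specs = Bundler::LockfileParser.new(lockfile_contents).specs
  gem_names.to_h do |name|
    spec = specs.find { |s| s.name == name }
    raise ArgumentError, "#{name} is not in the lock file" unless spec

    [name, spec.version.to_s]
  end
end

pinned_versions(SAMPLE_LOCKFILE, %w[rake nokogiri]).each do |name, version|
  puts "gem '#{name}', '#{version}'"
end
```

Each printed line is the explicit requirement that Gemfile.initial effectively declares for that gem.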

Here's how we can put this all together in an updated Dockerfile:

COPY Gemfile.initial Gemfile.initial.lock ./
RUN bundle install --gemfile Gemfile.initial

COPY Gemfile Gemfile.lock ./
RUN bundle install

Limitations and Next Steps

The main downside of this approach is that you need to remember to run bundle install --gemfile Gemfile.initial to regenerate Gemfile.initial.lock whenever one of the cached gem versions changes. You could add a CI job that checks for this and fails the build when the file is out of date (or even commits and pushes the updated file to the branch automatically).
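
One possible shape for such a CI check, as a rough sketch rather than a finished tool: compare the versions of the cached gems in the two lock files and report any that differ. The stale_initial_gems and sample_lock helpers and the sample versions are hypothetical; a real job would read Gemfile.lock and Gemfile.initial.lock from disk.

```ruby
require 'bundler'

# Hypothetical helper: returns the names from gem_names whose locked version
# differs between the two lock files, or that are missing from either one.
def stale_initial_gems(main_lock, initial_lock, gem_names)
  versions = lambda do |contents|
    Bundler::LockfileParser.new(contents).specs.to_h { |s| [s.name, s.version.to_s] }
  end
  main = versions.call(main_lock)
  initial = versions.call(initial_lock)
  gem_names.reject { |name| main[name] && main[name] == initial[name] }
end

# Builds a tiny lock file for the demo below (illustration only).
def sample_lock(versions)
  specs = versions.map { |name, version| "    #{name} (#{version})" }.join("\n")
  deps = versions.keys.map { |name| "  #{name}" }.join("\n")
  "GEM\n  remote: https://rubygems.org/\n  specs:\n#{specs}\n\n" \
    "PLATFORMS\n  ruby\n\nDEPENDENCIES\n#{deps}\n"
end

main_lock = sample_lock('rails' => '7.0.4', 'rake' => '13.0.6')
initial_lock = sample_lock('rails' => '7.0.3', 'rake' => '13.0.6')

stale = stale_initial_gems(main_lock, initial_lock, %w[rails rake])
puts "Out of date in Gemfile.initial.lock: #{stale.join(', ')}" unless stale.empty?

# In a CI script, the two strings would come from the real files, for example:
#   stale = stale_initial_gems(File.read('Gemfile.lock'),
#                              File.read('Gemfile.initial.lock'),
#                              %w[rails rake])
#   abort "Regenerate Gemfile.initial.lock: #{stale.join(', ')}" unless stale.empty?
```

This only compares the gems you name explicitly. It does not check their transitive dependencies, which the two lock files may resolve differently.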

We may be able to write a Bundler plugin that takes care of all of these steps. It would be great if there were a Bundler command that could install a subset of your gems and use or generate a separate lock file.

If you have any thoughts or comments, please feel free to send us an email: engineering@docspring.com