Fast Docker Builds for Rails and Webpack
This blog post demonstrates how to optimize your Docker builds for Rails and webpack. The demo app uses react_on_rails, but most of these techniques can be used with a plain Rails application. I’ve also included some tips for Dockerizing a Rails application.
Disclaimer: This is a very complex setup, and includes a lot of optimizations. Some of these are optional (e.g. creating small diff layers), or they might not be relevant to your application (e.g. webpack.) But check out the examples and see if you can use some of these ideas to speed up your Docker builds and create smaller images.
Overview
- Use Docker’s cached layers for gems, npm packages, and assets, if the relevant files have not been modified (`Gemfile`, `Gemfile.lock`, `package.json`, etc.)
- Use a multi-stage build so that Rails assets and the webpack build are cached independently. (See the Dockerfile sketch after this list.)
  - e.g. Changing assets doesn’t invalidate the webpack layers, and vice versa.
- Use the webpack DllPlugin to split the main dependencies into a separate file. This means that we only need to compile the main libraries once (e.g. React, Redux)
  - I used a separate `package.json` to take advantage of Docker’s caching.
- If there are any changes to `Gemfile` or `package.json`, re-use the gems and packages from the first build. (Don’t download everything from scratch.)
- If there are any changes to assets, re-use the assets and cache from the first build.
- Only include necessary files in the final image.
  - A production Rails app doesn’t need any files in `app/assets`, `node_modules`, or front-end source code. A lot of gems also have some unnecessary files that can be safely removed (e.g. `spec/`, `test/`, `README.md`)
- Include the bootsnap cache in the final image, to speed up server boot and rake tasks.
- After building a new image, create a small “diff layer” between the new image and the previous image. This layer only includes the changed files.
- Create a nested sequence of diff layers, but reset the sequence if there are too many layers (> 30), or if the diff layers add up to more than 80% of the base layer’s size.
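To make the caching ideas above concrete, here is a minimal multi-stage Dockerfile sketch. The stage names and paths are simplified for illustration; the real `Dockerfile.app` in the example repo is more involved.

```dockerfile
# Minimal sketch of the layer-caching idea (not the full Dockerfile.app).
# Copy only the dependency manifests first, so the expensive install layers
# stay cached until Gemfile.lock or yarn.lock actually change.
FROM demoapp/base:latest AS gems
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install --deployment --jobs 4

FROM demoapp/base:latest AS node_packages
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

FROM demoapp/base:latest AS app
WORKDIR /app
# Re-use the cached dependency layers from the stages above.
COPY --from=gems /app/vendor/bundle ./vendor/bundle
COPY --from=node_packages /app/node_modules ./node_modules
# Copy the application source last, so code changes don't invalidate
# the dependency layers.
COPY . .
RUN RAILS_ENV=production SECRET_KEY_BASE=placeholder \
    bundle exec rake assets:precompile
```

Because the full source tree is copied last, editing application code only rebuilds the final layers.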
Background
DocSpring was launched on Heroku, but I migrated to AWS to use some free AWS credits. I decided to use Convox, which is like Heroku for your own AWS account. When you deploy an app, Convox builds a Docker image and uses ECS to launch containers on your EC2 instances. I started with a simple `Dockerfile` that had some basic caching, but deploys were often taking a long time. Every time I changed the `Gemfile` or `package.json`, it would have to download and compile all the gems and npm packages from scratch. The `assets:precompile` task was also very slow, because it wasn’t using any cached assets from previous builds.
Heroku caches the results of `bundle install` and `rake assets:precompile`, so you don’t have to run these from scratch every time. This is also true when you’re using Capistrano or Chef to deploy your application to a VM.
I really wanted to use Convox to manage my AWS infrastructure, so Dockerizing my application was a requirement. The Convox CLI is incredibly easy to use, and I can set up a new staging environment from scratch by running a few commands. Docker also has some real advantages, such as being able to run the same environment on my local machine, CI, staging, and production. I run my tests on GitLab CI, and it uses the same base Docker image as my production servers. I can also run a CI build on my local machine with `gitlab-runner exec docker rspec`.
However, Docker can be tricky to work with. Setting up a cluster is more complicated than just deploying your app to a VM. If you’re not careful, you might end up creating a Docker image with too many layers, duplicated files, or unnecessary files. I had to take some extra steps to get proper caching for gems, npm packages, and assets. And Docker’s `ADD` and `COPY` commands don’t check for changed files, so it’s difficult to create a new layer that only includes the changed files.
Build Scripts and Dockerfiles
I’ve created a rails_docker_example repo that contains some example build scripts and Dockerfiles. I used shakacode/react-webpack-rails-tutorial as the demo Rails app, so this example supports `react_on_rails` and `webpack`.
The main idea is that you build Docker images on your local machine, and use some intermediate images as a cache to speed up your builds. (This approach might not work so well if you build your images on a CI server, etc.)
Scripts:
Dockerfiles:
- `Dockerfile.ruby-node`
- `Dockerfile.base`
- `Dockerfile.app` (for react_on_rails / webpack)
- `Dockerfile.app-simple` (for a plain Rails app)
Image Tags
The `build_app` script uses the following Docker tags:

`demoapp/ruby-node:latest`

Contains specific versions of Ruby, Node.js, and Yarn. (I started by using some `ruby-node` images from Docker Hub, but I prefer to have full control over the versions.)
`demoapp/base:latest`

Based on `demoapp/ruby-node`. Installs Linux packages, such as `build-essential`, `postgresql-client`, and `rsync`. It also sets up some directories and environment variables.
`demoapp/app:base-webpack-build`

The base image for the webpack build. The initial build uses `demoapp/base` as the base image, and then tags the resulting image with `demoapp/app:base-webpack-build`. All the subsequent builds use this first build as the base image. We only set `base-webpack-build` once and don’t update it very often, because if it keeps changing then Docker can’t cache any layers.
`demoapp/app:base-assets-build`

The base image for the assets build. The first build will use `demoapp/base:latest` as the base image (a clean slate), and the following builds will use the first build as the base image.
`demoapp/app:latest-webpack-build` and `:latest-assets-build`

The most recent build environments. We run `docker build` multiple times, targeting different stages in `Dockerfile.app`. If you change a lot of gems or npm packages, you can update the base images to point to these tags: `docker tag demoapp/app:latest-assets-build demoapp/app:base-assets-build`. Next time you change the `Gemfile`, you won’t have to install as many gems. But don’t update this too often, because most of the time you’ll be using Docker’s cached layers.
`demoapp/app:current`

The in-progress production build that contains the final squashed layer. We don’t override the `demoapp/app:latest` tag immediately, because the last step is to produce a small diff layer between `demoapp/app:latest` and `demoapp/app:current`.
`demoapp/app:latest`

This is the final production image after running `./scripts/build_app`. It contains a small diff layer between the new image and the previous image.
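Roughly, the tag flow looks like this. This is a simplified sketch: the stage names are hypothetical, and the real `scripts/build_app` handles the base-image caching, diff layers, and error handling.

```bash
#!/bin/bash
set -e
# Simplified sketch of the tag flow in a build script (stage names are
# hypothetical; see scripts/build_app in the example repo for the real thing).

# Build the webpack and assets stages. Their FROM lines point at the
# base-webpack-build / base-assets-build tags, so cached layers are re-used.
docker build --target webpack-build -t demoapp/app:latest-webpack-build -f Dockerfile.app .
docker build --target assets-build  -t demoapp/app:latest-assets-build  -f Dockerfile.app .

# Build the final production stage as the in-progress image.
docker build -t demoapp/app:current -f Dockerfile.app .

# ...create a small diff layer between demoapp/app:latest and
# demoapp/app:current (see "Limitations" below), then publish the result:
docker tag demoapp/app:current demoapp/app:latest
```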
Dockerization Tips
Where Should I Put My Code?
For a Docker image, you should put your code in `/app`. This is also the directory that Heroku uses.

I’ve worked with quite a few teams, and so far I’ve deployed code to: `/var/www/myapp`, `/srv/www/myapp`, `/usr/src/myapp`, `/opt/myapp`, `/home/deploy/myapp`, `/home/deploy/src/myapp`, and probably a few others. It’s always a bit annoying when you don’t know where to put something and everyone does it differently. So you should use the `/app` directory in your Docker images.
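In a Dockerfile that just means setting the working directory once in the base image; a trivial sketch:

```dockerfile
# In the base image: make /app the working directory. WORKDIR creates the
# directory if it doesn't exist, and every later stage and `docker exec`
# session will start there.
WORKDIR /app
```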
Rails Assets
Avoid ERB templates
Don’t use `application.js.erb`. Instead, you should use a `javascript_tag` in your layout or view, and use this to configure the behavior at runtime. It’s very helpful if you can use exactly the same Docker image for staging and production, and you should configure different behavior with environment variables.
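For example, something like this in the layout (the `window.AppConfig` name and the `API_HOST` environment variable are just illustrative):

```erb
<%# app/views/layouts/application.html.erb %>
<%# Pass runtime configuration to the front-end, instead of baking it into %>
<%# an application.js.erb at asset-precompile time. %>
<%= javascript_tag do %>
  window.AppConfig = {
    environment: "<%= Rails.env %>",
    apiHost:     "<%= ENV['API_HOST'] %>"
  };
<% end %>
```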
Use Relative Paths in CSS
When I first Dockerized my application, I was setting `config.asset_host` during builds. I’m using the `image-url` helper, so the compiled assets would have hard-coded URLs that included the CloudFront domain. This made it impossible to use the same Docker image on staging and production.

Now I don’t set `config.asset_host` during the build, and it is only set when running the container. This means that my CSS only has relative paths to images. It works fine as long as the initial CSS is loaded from the correct CloudFront domain.
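A minimal sketch of what that looks like in `config/environments/production.rb` (`ASSET_HOST` is just an example variable name):

```ruby
# config/environments/production.rb
# Don't set asset_host at image build time; read it from the environment
# when the container boots, so the same image works on staging and production.
config.asset_host = ENV["ASSET_HOST"] if ENV["ASSET_HOST"].present?
```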
Should I Run Nginx In Front Of Puma?
There’s usually no need to run Nginx if you run Puma in clustered mode behind a load balancer and serve your assets from a CDN. You should be fine to just run Puma directly behind the load balancer.
If your assets are cached with CloudFront, then Puma only needs to serve each asset once. The first request will be a bit slower than Nginx, but not by a significant amount.
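The standard Rails settings for this live in `config/environments/production.rb`; a minimal sketch:

```ruby
# config/environments/production.rb
# Let Puma serve the precompiled assets itself (no Nginx in front).
# CloudFront caches each asset after the first request.
config.public_file_server.enabled = ENV["RAILS_SERVE_STATIC_FILES"].present?
config.public_file_server.headers = {
  "Cache-Control" => "public, max-age=#{1.year.to_i}"
}
```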
Remove Unnecessary Files From Ruby Gems
I started thinking about more ways to make my images smaller, and I noticed that gems with native extensions were leaving some log files (224KB). I found a lot of other unnecessary files, such as `spec`, `test`, and `docs` directories, `README`, `LICENCE`, `.gitignore`, etc. These files added up to 23MB, which was 8% of all the files in `vendor/bundle` (292MB). At this point I felt like Lon Chaney Jr., but I was having fun and the cleanup step is very fast.
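Here’s a rough sketch of that cleanup step. The patterns are the ones I found safe to remove; check them against your own gems before deleting anything.

```bash
# Remove files that a production Rails app never reads from vendor/bundle.
# Run this after `bundle install`, before building the final image layer.
cd /app/vendor/bundle

# Test suites and docs shipped inside gems.
find . -type d \( -name spec -o -name test -o -name docs \) -prune -exec rm -rf {} +

# Build leftovers from native extensions, plus READMEs, licences, etc.
find . -type f \( -name '*.log' -o -name '*.o' -o -name '*.c' \) -delete
find . -type f \( -iname 'README*' -o -iname 'LICENCE*' -o -iname 'LICENSE*' -o -name '.gitignore' \) -delete
```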
I didn’t investigate `node_modules`, because my app doesn’t use any npm packages in production. It might be worth taking a look at `node_modules` for a Node.js app.
Limitations Of This Docker Setup
This build script is very complex, so it might be difficult to debug any issues. You also have to be very careful about adding files at the right time in your `Dockerfile`. For example, my Active Admin assets weren’t being compiled, because I forgot to add `config/initializers/active_admin.rb` before running the `assets:precompile` task.
The diff layer uses some Docker internals that might change in the future (it inspects a container to find the host directory of a volume). Hopefully Docker will fix this issue so that we don’t need to use `rsync` and `docker commit`.
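For reference, the core of the diff-layer trick looks roughly like this. It is heavily simplified: the real script inspects the container’s volume directory on the host, restores the image’s CMD, and handles more edge cases.

```bash
# 1. Export the new build's /app to a directory on the host.
docker create --name current_build demoapp/app:current
docker cp current_build:/app ./new_app
docker rm current_build

# 2. Start a container from the *previous* image with the new files mounted
#    read-only, and rsync them over /app. --checksum only rewrites files whose
#    contents changed, so the container's filesystem diff stays small.
#    (rsync is already installed in the demoapp/base image.)
docker run --name diff_build -v "$PWD/new_app:/new_app:ro" demoapp/app:latest \
  rsync -a --checksum --delete /new_app/ /app/

# 3. Commit the stopped container: the new layer contains only the changed
#    files. (The real script also resets CMD/ENTRYPOINT after the commit.)
docker commit diff_build demoapp/app:latest
docker rm diff_build
```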
Another limitation is that you have to build images on your local machine. Convox normally packages up your source code and sends it to an EC2 instance, runs the build, then pushes the image to ECR. I’m using a custom build script with multiple steps, so I have to build the image locally and push it manually, and then tell Convox to use this image.
In practice, this has not been an issue for me. My laptop is a bit faster than my EC2 instances, and I’m pushing diff layers to ECR, so I’m sending less data than a standard Convox deploy. I also have locally cached layers that are used for every build. With Convox, you can either set up a dedicated build instance, or it will pick one of your existing instances to run the build. A dedicated build instance costs money, and picking a random instance means that it might have an empty Docker cache.
Conclusion
I like to invest a lot of time into something that I write once and use every day. I deploy new code very frequently, so even small improvements are usually worthwhile.
I hope this blog post has been helpful, and please open an issue or send a pull request to DocSpring/rails_docker_example if you have any feedback or suggestions!