Automatic Coding

Thoughts on programming, software engineering, and Emacs

Lessons in developing with Vagrant


At work, we use Vagrant to create and manage development environments. Once you’ve experienced working like this, you probably won’t want to work the ‘old’ way ever again (I’ve begun to transition my personal projects to using Vagrant now, too). Having each project come with its own complete, isolated environment allows any developer to pick it up and work on it immediately; it also makes deployment a part of the development process, as each provisioning run of the Vagrant VM is effectively a miniature deployment in itself. When the project is to be deployed to a staging or production environment, you can use much the same tried and tested processes to do so (we use Chef).

From what I’m hearing and reading, more and more developers are working like this, but it’s still not a particularly well-understood way of doing things. The usual way it’s done is that your editor and tools live on your host machine, you interact with the project code on the guest machine using a shared folder, and you run tests and other development processes on the guest machine via ssh.
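For concreteness, here’s a minimal sketch of that kind of Vagrantfile – the box name and provisioning step are placeholders rather than our real setup:

# Vagrantfile – a minimal sketch of the usual arrangement (box name and
# provisioner are placeholders, not our actual config)
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  # The project directory on the host appears inside the guest, so the
  # editor runs on the host while the code executes in the VM.
  config.vm.synced_folder ".", "/vagrant"

  # Provisioning doubles as a miniature deployment on every `vagrant up`.
  config.vm.provision :shell, inline: "apt-get update -y"
end

# Day to day: edit on the host, then run things in the guest over ssh, e.g.
#   vagrant ssh -c "cd /vagrant && bundle exec rake"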

Sounds pretty straightforward, but there are some subtle problems. One is filesystem events, which don’t work as they normally would.

Watching the filesystem

In most modern operating systems, there’s some kind of mechanism for a process to receive notifications when files change. In Linux it’s inotify, for example. These mechanisms allow processes to watch a set of files for changes without polling the filesystem (i.e. repeatedly checking the modification times of the watched files). Because they work natively, and without polling, they’re very fast.
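As an illustration (the rb-inotify gem isn’t part of our setup – it’s just the most direct way to see the mechanism from Ruby), a minimal sketch of watching a directory via inotify:

# Minimal inotify sketch using the rb-inotify gem (Linux only; illustrative)
require 'rb-inotify'

notifier = INotify::Notifier.new

# Ask the kernel to notify us about writes, new files and deletions in ./app
notifier.watch("app", :modify, :create, :delete) do |event|
  puts "#{event.absolute_name} changed"
end

notifier.run # blocks, dispatching events as they arrive – no polling involved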

In the standard Vagrant development setup, though, you’re making changes to the files on the host machine, but running the watching processes in the guest machine. And unfortunately that means that mechanisms like inotify don’t work.

If you use Guard, for example, to run tests in a Ruby app when files change, it uses Listen to watch the filesystem. Listen ships with adaptors for many different notification systems, including a fallback polling adaptor. Until quite recently, the only way to use Guard inside a Vagrant VM was to use the polling adaptor – which is very slow, and very resource-intensive. Polling the files in a decent-sized Rails app at an interval of 1 second will most likely pin the CPU of the guest machine; also, in my experience it just wasn’t reliable (changes often wouldn’t seem to be noticed, or would be noticed late).
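To make the difference concrete, here’s roughly what sits underneath Guard – a sketch using the Listen gem directly (force_polling and latency are real Listen options; the paths are just examples):

require 'listen'

# Native adaptor (inotify on Linux): changes are reported near-instantly,
# with negligible CPU cost.
listener = Listen.to("app", "spec") do |modified, added, removed|
  puts "changed: #{(modified + added + removed).join(', ')}"
end
listener.start

# The fallback you're forced into over a Vagrant shared folder: walk the
# whole tree every second and compare timestamps. Slow to notice changes,
# heavy on the guest CPU.
# listener = Listen.to("app", "spec", force_polling: true, latency: 1.0) { |m, a, r| puts m }

sleep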

If you’re using something like guard-rspec to do continuous TDD, then having to repeatedly nudge the filesystem and then wait several seconds for changes to be picked up becomes, well, painful. There’s a way round this, though: Listen and Guard can listen for filesystem events on one machine and forward them to another over the network – in short, a listen process on the host broadcasts change events over TCP, and guard on the guest receives them. I won’t describe the setup in detail, because it’s been done elsewhere.

There are a couple of niggling inconveniences with this solution, though. Firstly, it’s just cumbersome: You need to start a listen process on your host machine, then start a guard process on your guest machine, and then remember to shut them both down when you’re done. In a traditional setup you just run guard and away you go.

Secondly, the guard process needs to know the full path to the watched directory on the host machine, which means it’s hard to make the setup portable (it’s a near certainty that the path will be different for every developer on the project).

vagrant-triggers

Enter vagrant-triggers, which lets you run arbitrary commands around the lifecycle events of a Vagrant VM. We can use this to start and stop listen on the host machine for us, which solves the first problem. And we can set up some environment variables inside the guest machine to solve the second – let’s do that first.

# In Vagrantfile:
LISTEN_PORT = 4000
config.vm.network :forwarded_port, guest: LISTEN_PORT, host: LISTEN_PORT
config.vm.provision :shell, inline: <<-END
echo "export HOST_ROOT=#{File.dirname(__FILE__)}" > /etc/profile.d/host_root.sh
echo "export LISTEN_PORT=#{LISTEN_PORT}" > /etc/profile.d/listen_port.sh
END

That creates the environment variables HOST_ROOT and LISTEN_PORT in the guest machine, and forwards LISTEN_PORT to the guest. Next we create a couple of simple functions in the Vagrantfile:

# Return a shell command which kills any listen process forwarding to our
# port on the host, or a no-op if none is running.
def stop_listen_script
  listen_exe = `which listen`.chomp
  ps = `ps aux | grep '#{listen_exe}'`
    .match(/\w*\s+(\d+)\s+.*127\.0\.0\.1:#{LISTEN_PORT}/)
  if (pid = (ps && ps[1]))
    "kill #{pid}"
  else
    "true"
  end
end

# Return a shell command which starts listen on the host in the background,
# forwarding filesystem change events to LISTEN_PORT.
def start_listen_script
  "listen -f 127.0.0.1:#{LISTEN_PORT} > /dev/null 2>&1 &"
end

start_listen_script starts listen and forwards change notifications to LISTEN_PORT; because we’ve forwarded that port to the guest machine, the guest machine will receive the notifications.

stop_listen_script checks for a running process in the host machine which matches the listen executable and arguments, and if it finds one, kills it. We need to do this so that Vagrant can run its lifecycle operations correctly, and so we don’t end up with lots of orphan listen processes.

Now we’re almost ready to create some triggers, but we need to make some additional gems available to Vagrant. Run the following in your host machine:

vagrant plugin install celluloid-io && vagrant plugin install thor

celluloid-io and thor are necessary for listen to work correctly when started as part of Vagrant’s lifecycle (interesting to note here that vagrant plugin install is just gem install in disguise – it can make arbitrary gems available to Vagrant).

Next we need to make sure listen is available on our host machine:

gem install listen

And finally install vagrant-triggers:

vagrant plugin install vagrant-triggers

Now we can create the following triggers in our Vagrantfile:

# Stop listen whenever we shut down or re-forward ports (a running listen
# will prevent port forwarding).
[:up, :resume, :suspend, :halt, :destroy, :reload].each do |cmd|
  config.trigger.before cmd do
    run stop_listen_script
  end
end

# Start listen when we start the machine
[:up, :resume, :reload].each do |cmd|
  config.trigger.after cmd do
    run start_listen_script
  end
end

That will start listen on the host when we bring our VM up, and stop listen when we take it down or cycle it. All we need now is a way to properly run guard inside the guest machine, picking up the correct watch directory and ports. We can do that in our project’s Rakefile:

namespace :guard do
  desc "Start guard, listening for changes on a given port at the default gateway"
  task :remote do
    cmd = "guard --clear -o '10.0.2.2:#{ENV['LISTEN_PORT']}' -w '#{ENV['HOST_ROOT']}'"
    system(cmd)
  end
end

That works thanks to the environment variables we pushed into the guest machine earlier. Note that I hardcode the IPs to 127.0.0.1 on the host and 10.0.2.2 on the guest, because they’re the defaults (10.0.2.2 is the host as seen from a VirtualBox NAT’d guest) – you can change them, or make them configurable if you want. Now we can run Guard in the guest machine like so:

bundle exec rake guard:remote

Much better.
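If you do want to make the gateway configurable rather than hardcoding 10.0.2.2, a minimal tweak to the rake task might look like this (LISTEN_GATEWAY is a variable of our own invention here, not something Guard or Vagrant defines):

namespace :guard do
  desc "Start guard, listening for changes on a configurable gateway and port"
  task :remote do
    # Fall back to VirtualBox's default NAT gateway if no override is given
    gateway = ENV.fetch("LISTEN_GATEWAY", "10.0.2.2")
    system("guard --clear -o '#{gateway}:#{ENV['LISTEN_PORT']}' -w '#{ENV['HOST_ROOT']}'")
  end
end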

The front end

We don’t only develop Rails apps, though. We also have a burgeoning front-end estate, and for that we use npm, Browserify and Karma, among other things. These all present their own issues. Firstly, as far as file-watching goes, we’re stuck with polling on the front end: to my knowledge, none of the JS file-watching solutions provide anything out-of-the-box like Listen’s network forwarding. So if you’re using Watchify to run incremental Browserify builds, make sure to pass the poll option (see the Watchify docs). Continuous testing with Karma defaults to polling automatically, as does live reloading with Browsersync. There is one big headache remaining, though.

npm

When we started using npm and Browserify to build our front-end projects, we were, ah, dismayed by how long it took to run a complete npm install. The turnaround could be minutes – sometimes into double figures – which made any change in dependencies agonising. To boot, it quite often hung or failed entirely. We entertained a few potential solutions (e.g. running a local caching server, or adjusting the npm cache settings) before we noticed something odd.

nfs

A new front-end developer we’d taken on wasn’t using Vagrant, and had resisted switching to it. It turned out that his resistance was down to how long npm install runs took to complete inside the VM – on his host machine, they were fast. Where our runs would take 7 minutes, his took 40 seconds. So it was immediately apparent that the problem wasn’t just npm: it was Vagrant, too (or, to be more accurate, VirtualBox).

We did a bit of research into what the problem could be, and it occurred to me that some time ago, when I’d been trying to get Guard to work, I’d read about using nfs rather than the default VirtualBox filesystem to share folders between host and guest. Using nfs had caused more problems than it seemed to solve, so I gave up, but I recalled that during that research I’d seen some Vagrant users suggest the VirtualBox filesystem could be slow for certain operations. So we tried nfs again. Bam: 40-second npm runs.

It turns out that VirtualBox Shared Folders (vboxsf), the default filesystem when using VirtualBox with Vagrant, is extremely slow for almost all operations (see e.g. http://mitchellh.com/comparing-filesystem-performance-in-virtual-machines). With a tool like npm, which in the course of an install reads and writes thousands of files, this is disastrous. We’d never noticed the issue in our Rails apps, using Bundler, but npm’s architecture (which installs a full copy of every subdependency in the tree), combined with the JavaScript fashion for lots of very small modules, was enough to throw the deficiencies of vboxsf into sharp relief.

Just switching to nfs, though, wasn’t enough to solve all our problems. When I’d used it before, I’d had issues with unwanted caching (files not appearing to change in the guest machine when changed on the host). So we had to do a bit more research to figure out how to tweak the nfs setup to suit. This is what we ended up with:

# VirtualBox needs this to use nfs
config.vm.network "private_network", type: "dhcp"
config.vm.synced_folder ".", "/vagrant", nfs: true,
  mount_options: %w{nolock,vers=3,udp,noatime,actimeo=1}

Note the mount_options parameter: Vagrant passes these options through to the mount command it runs in the guest to mount the nfs share. Here’s what they do:

  • nolock prevents the guest and host filesystems from sharing file-locking information. It’s a workaround that allows the use of older nfs servers – we found it necessary to enable sharing between our Ubuntu 12 guests and OS X hosts.
  • vers=3 forces the use of NFS protocol version 3. Again we found this necessary, but you may not.
  • udp forces the use of udp rather than tcp, which we found gave a performance boost.
  • noatime stops the last-accessed timestamp being updated every time a file is read, which again gives a performance boost.
  • actimeo=1 sets the attribute cache timeout to one second, which fixed the issues we were having with unwanted caching.
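One small aside on the %w form above: because %w splits on whitespace, that literal is actually a single comma-separated string rather than five separate options. It still works because, as far as I can tell, Vagrant joins the mount_options array with commas when building the mount command, so the more explicit spelling below should be equivalent:

config.vm.synced_folder ".", "/vagrant", nfs: true,
  mount_options: %w{nolock vers=3 udp noatime actimeo=1}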

When the Vagrant machine is started or cycled, you may be asked for your password, because Vagrant has to modify the /etc/exports file on your host system to enable the share. Otherwise, this setup works well for us – we get fast npm runs and file watches that don’t completely pin the guest CPU.

Finally

This way of doing dev work is still fairly immature, and we’ve had to find our own solutions to the problems it poses. There are still things that don’t work – something like Robe for example, which can run a Ruby process and use it to provide code completion and navigation, has so far been too difficult to get working across the host/guest boundary.

That’s a nice-to-have though; the benefits of working this way make it more than worthwhile to work on solutions to the problems.
