An OpenStack Crime Story solved by tcpdump, sysdig, and iostat – Episode 3

— 5 minute read


Previously on OpenStack Crime Investigation … Two load balancers running as virtual machines in our OpenStack-based cloud, sharing a keepalived-based highly available IP address, started to flap, switching the IP address back and forth. After ruling out a misconfiguration of keepalived and issues in the virtual network, I finally got the hint that the problem might originate not in the virtual, but in the bare metal world of our cloud. Maybe high IO was causing the gap between the keepalived VRRP packets.

When I arrived at bare metal host node01, hosting the virtual machine loadbalancer01, I was anxious to see the IO statistics. The machine had to be under heavy IO load if the virtual machine's messages were waiting for up to five seconds.

I switched on my iostat flashlight and saw this:

$ iostat
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              12.00         0.00       144.00          0        144
sdc               0.00         0.00         0.00          0          0
sdb               6.00         0.00        24.00          0         24
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0
sdf              20.00         0.00       118.00          0        118
sdg               0.00         0.00         0.00          0          0
sdi              22.00         0.00       112.00          0        112
sdh               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdk              21.00         0.00        96.50          0         96
sdl               0.00         0.00         0.00          0          0
sdm               9.00         0.00        64.00          0         64
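(If you want to shine this flashlight yourself: the extended mode of iostat is usually more revealing than the plain summary above. Both flags below are standard sysstat options; the one-second interval is just my habit.)

$ iostat -x 1
# -x  print extended per-device statistics, including await and %util
# 1   print a new report every second instead of a single summary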

Nothing? Nothing at all? No IO on the disks? Maybe my bigger flashlight iotop could help:

$ iotop

Unfortunately, what I saw was too ghastly to show here and therefore I decided to omit the screenshots of iotop[1]. It was pure horror. Six qemu processes eating the physical CPUs alive in IO.
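(For the record, and since the screenshots are gone: the view I used boils down to something like the following. Both flags are standard iotop options.)

$ sudo iotop -o -P
# -o  only list processes that are actually doing IO right now
# -P  aggregate per process instead of per thread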

So, no disk IO, but super high IO caused by qemu. It must be network IO then. But all performance counters showed almost no network activity. What if this IO wasn't real, but virtual? It could be the virtual network driver! It had to be the virtual network driver.

I checked the OpenStack configuration. It was set to use the para-virtualized network driver vhost_net.
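(On the compute nodes this boils down to nova giving the guests virtio network interfaces; a quick grep over nova.conf was enough to confirm it. The exact option names vary between OpenStack releases, so take this as a sketch rather than a recipe.)

$ grep -i virtio /etc/nova/nova.conf
# look for the virtio-related options; virtio interfaces are the
# prerequisite for vhost_net being used at all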

I checked the running qemu processes. They were also configured to use the para-virtualized network driver.

$ ps aux | grep qemu
libvirt+  6875 66.4  8.3 63752992 11063572 ?   Sl   Sep05 4781:47 /usr/bin/qemu-system-x86_64
 -name instance-000000dd -S ... -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 ...
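(The same thing can be cross-checked from the libvirt side; the instance name is the one from the process list above, and the grep is just a quick-and-dirty filter.)

$ virsh dumpxml instance-000000dd | grep -A4 '<interface'
# a para-virtualized setup shows <model type='virtio'/> on the guest
# interface; with vhost_net available, libvirt then puts vhost=on on
# the qemu command line, exactly as seen above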

I was getting closer! I checked the kernel modules. Kernel module vhost_net was loaded and active.

$ lsmod | grep net
vhost_net              18104  2
vhost                  29009  1 vhost_net
macvtap                18255  1 vhost_net
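(Had the module been missing, loading it would have been a one-liner; the /etc/modules entry just makes it survive a reboot.)

$ sudo modprobe vhost_net                    # load the module right away
$ echo vhost_net | sudo tee -a /etc/modules  # and again on every boot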

I checked the qemu-kvm configuration and froze.

$ cat /etc/default/qemu-kvm
# To disable qemu-kvm's page merging feature, set KSM_ENABLED=0 and
# sudo restart qemu-kvm
KSM_ENABLED=1
SLEEP_MILLISECS=200
# To load the vhost_net module, which in some cases can speed up
# network performance, set VHOST_NET_ENABLED to 1.
VHOST_NET_ENABLED=0

# Set this to 1 if you want hugepages to be available to kvm under
# /run/hugepages/kvm
KVM_HUGEPAGES=0

vhost_net was disabled by default for qemu-kvm. We were pushing all packets through userspace and qemu instead of handing them to the kernel directly as vhost_net does! That's where the lag was coming from!

I acted immediately to rescue the victims. I made the huge, extremely complicated, full one-byte change on all our compute nodes by turning VHOST_NET_ENABLED=0 into VHOST_NET_ENABLED=1, restarted all virtual machines and finally, after days of constant screaming in pain, the flapping between the two load balancers stopped.
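(For anyone who wants to replicate the rescue: on each compute node the change boils down to something like the following, followed by a restart of the affected virtual machines. The sed call is just one way to make the one-byte edit, and the second line is the rough sanity check I ran afterwards.)

$ sudo sed -i 's/^VHOST_NET_ENABLED=0/VHOST_NET_ENABLED=1/' /etc/default/qemu-kvm
$ ps aux | grep qemu-system | grep -c vhost=on   # every restarted instance should be counted here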

I did it! I saved them!

But I couldn't stop there. I wanted to find out who had done that to the poor little load balancers. Who was behind this conspiracy of crippled network latency?

I knew there was only one way to finally catch the guy. I set a trap. I installed a fresh, clean, virgin Ubuntu 14.04 in a virtual machine and then, well, then I waited — for apt-get install qemu-kvm to finish:

$ sudo apt-get install qemu-kvm
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  acl cpu-checker ipxe-qemu libaio1 libasound2 libasound2-data libasyncns0
  libbluetooth3 libboost-system1.54.0 libboost-thread1.54.0 libbrlapi0.6
  libcaca0 libfdt1 libflac8 libjpeg-turbo8 libjpeg8 libnspr4 libnss3
  libnss3-nssdb libogg0 libpulse0 librados2 librbd1 libsdl1.2debian
  libseccomp2 libsndfile1 libspice-server1 libusbredirparser1 libvorbis0a
  libvorbisenc2 libxen-4.4 libxenstore3.0 libyajl2 msr-tools qemu-keymaps
  qemu-system-common qemu-system-x86 qemu-utils seabios sharutils
Suggested packages:
  libasound2-plugins alsa-utils pulseaudio samba vde2 sgabios debootstrap
  bsd-mailx mailx
The following NEW packages will be installed:
  acl cpu-checker ipxe-qemu libaio1 libasound2 libasound2-data libasyncns0
  libbluetooth3 libboost-system1.54.0 libboost-thread1.54.0 libbrlapi0.6
  libcaca0 libfdt1 libflac8 libjpeg-turbo8 libjpeg8 libnspr4 libnss3
  libnss3-nssdb libogg0 libpulse0 librados2 librbd1 libsdl1.2debian
  libseccomp2 libsndfile1 libspice-server1 libusbredirparser1 libvorbis0a
  libvorbisenc2 libxen-4.4 libxenstore3.0 libyajl2 msr-tools qemu-keymaps
  qemu-kvm qemu-system-common qemu-system-x86 qemu-utils seabios sharutils
0 upgraded, 41 newly installed, 0 to remove and 2 not upgraded.
Need to get 3631 kB/8671 kB of archives.
After this operation, 42.0 MB of additional disk space will be used.
Do you want to continue? [Y/n] <
...
Setting up qemu-system-x86 (2.0.0+dfsg-2ubuntu1.3) ...
qemu-kvm start/running
Setting up qemu-utils (2.0.0+dfsg-2ubuntu1.3) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up qemu-kvm (2.0.0+dfsg-2ubuntu1.3) ...
Processing triggers for libc-bin (2.19-0ubuntu6.3) ...

And then, I let the trap snap:

$ cat /etc/default/qemu-kvm
# To disable qemu-kvm's page merging feature, set KSM_ENABLED=0 and
# sudo restart qemu-kvm
KSM_ENABLED=1
SLEEP_MILLISECS=200
# To load the vhost_net module, which in some cases can speed up
# network performance, set VHOST_NET_ENABLED to 1.
VHOST_NET_ENABLED=0

# Set this to 1 if you want hugepages to be available to kvm under
# /run/hugepages/kvm
KVM_HUGEPAGES=0

I could not believe it! It was Ubuntu's own default setting. Ubuntu, the very foundation of our cloud, had decided to turn vhost_net off by default, despite all modern hardware supporting it. Ubuntu was convicted and I could finally rest.


This is the end of my detective story. I found and arrested the criminal Ubuntu default setting and was able to prevent it from further crippling our virtual network latencies.

Please feel free to leave comments and ask questions about the details of my journey. I'm already negotiating to sell the movie rights. But maybe there will be another season of OpenStack Crime Investigation in the future. So stay tuned on the codecentric Blog.

Footnotes

[1] Eh, and because I lost them.


First published at codecentric Blog.