04 October, 2015

Linux, Virtualisation and some Performance monitoring

N.B.  This post is more of a noob's voyage towards better virtualisation on Linux than professional guidance.
 
A few days ago I was facing an issue on my Linux machine where performance suddenly dropped to near unusability while the hard disk LED was on overdrive.  My first thought was that there was some excessive swapping going on.  The problem, though, was identifying what was causing it rather than just what was happening.



I could have guessed what the reason was, since I had just turned on maybe the 10th VM in VMware Workstation.  Even so, it was not immediately obvious which VM might be swapping rapidly or why it was doing so (there shouldn't be much memory usage during startup).

I still haven't totally found out what it was, but messing with some config files did the trick up to a certain point.  First of all I limited VMware to 16GB of RAM (out of 32) and configured it to swap as much as possible.  I was led to believe that VMware's and the kernel's swapping mechanisms weren't on the same terms, which ended up with me bashing (excuse the pun) the whole system.  A few miraculous key presses (Ctrl+Alt+F1) took me to a virtual terminal from which I could, at least, get a list of CPU-heavy processes and kill them.  Unfortunately it was not just the vmware-vmx processes but also kswapd0 - an integral part of the kernel which won't easily let you kill -9 it.  So basically this was the first indication of a memory issue.
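From a terminal like that, something along these lines (assuming a standard procps install, as on most distros) is enough to see who the offenders are:

```shell
# List the ten most CPU-hungry processes, heaviest first
ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 11
```

Seeing kswapd0 near the top is itself a hint: it points at heavy paging rather than a single runaway process.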

After some googling I reconfigured swapping etc., but I wasn't able to replicate the issue, and quite frankly I really did not want to spend 15 minutes every time to recover my system.  So the process of finding a solution took days - not continuously trying to fix it, of course.  The best solution I could come up with was buying a small 50GB SSD and dedicating all of it to swap.  Apart from that I also set vm.swappiness to a nice 100.  The memory configuration in VMware was set to swap as much as possible too.  My idea was to let everything swap as much as it could, since the disk was much faster now.  That way I'd also have as little occupied memory as possible.
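For reference, the kernel side of that setup looks roughly like this (the device name /dev/sdb1 is a stand-in for whatever partition the SSD actually got):

```shell
# Format the SSD partition as swap and enable it (hypothetical device name)
sudo mkswap /dev/sdb1
sudo swapon /dev/sdb1

# Tell the kernel to swap aggressively (the default is usually 60)
sudo sysctl vm.swappiness=100

# Persist the setting across reboots
echo 'vm.swappiness = 100' | sudo tee /etc/sysctl.d/99-swappiness.conf
```

The VMware side is a preference in Workstation's memory settings rather than a config file you'd normally edit by hand.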

I thought I'd start seeing a lot of fast swapping this time, so if I got into the same situation again it would be much easier to recover.  In fact it did happen once again, but this time the system was under much more stress, so the extra swapping did help.  This time I had a little script prepared, so ten seconds' worth of keypresses typing in all the arguments would not waste my time.  I used the following script to see what was hogging the CPU, network, disks - almost every possible bottleneck I could think of:

#!/bin/bash
# dstat: -c cpu, -d disk, -n net, -p procs, -m mem, -g paging, -s swap,
# plus the heaviest block-I/O, CPU and memory process (--top-*)
dstat -cdnpmgs --top-bio --top-cpu --top-mem

Short and sweet, just calling dstat with canned arguments!  Calling jtop is a lot shorter than typing all those arguments, that's for sure.  Again, the result pointed to a swapping issue.

dstat, however, showed me something I was not really expecting.  RAM usage wasn't actually that bad - just looking at the numbers it was great, less than 50% usage.  But there were some more numbers, and at that point I was not sure whether I was actually using ~40% or 97%.

Reading up on Linux memory management taught me another thing.  Linux actually makes use of much more RAM, but the bulk of it is cache.  This cache is reclaimed when processes need more memory.  Effectively I would see that there is less than 2-3% free RAM, but that is not the correct way to read it.  So there is some silver lining to this issue - I got to learn quite a lot more about memory management on Linux.
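The distinction shows up directly in free - on reasonably recent procps versions there is even an "available" column that does the maths for you:

```shell
# The Mem: row counts cache as used; the "available" column estimates
# how much memory processes could actually claim right now
free -h

# The same figures, straight from the kernel
grep -E 'MemTotal|MemFree|MemAvailable' /proc/meminfo
```

MemAvailable (not MemFree) is the number that tells you whether you are about to start swapping.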

Following this result I started looking for a virtualisation solution that did not try to re-implement what the kernel was built to do.  Not that I have anything in particular against VMware or how it is implemented, but I was quite sure that the problem was originating from it.  After a bit more educated reading on virtualisation, and a bit more courage to move out of my (then) GUI-based comfort zone (a few weeks before the said case I was mostly a Windows user..), I came to the conclusion that the Linux-based systems were potentially much better.


The logo is cool though
Here I introduced myself to KVM and Xen.  Both appear to be more ingrained into the system and have potentially better memory management.  I read up on the general performance and history of both systems, and KVM appeared to have the upper hand.  Being a more integral part of the Linux eco-system (and marginally faster: https://major.io/2014/06/22/performance-benchmarks-kvm-vs-xen/), I opted to base my future VMs on KVM.  I'm happy to say that I've never looked back since then, and the impressive performance I enjoy on KVM is (on my system) unparalleled.
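For anyone tempted by the same switch, checking that the hardware and kernel are ready for KVM is quick (the paths here are standard, though package names vary by distro):

```shell
# CPU virtualisation extensions: vmx = Intel VT-x, svm = AMD-V
grep -Ec '(vmx|svm)' /proc/cpuinfo || echo "no hardware virtualisation"

# Is the kvm kernel module loaded?
lsmod | grep kvm || echo "kvm module not loaded"

# The device node that user-space hypervisors (QEMU) talk to
ls -l /dev/kvm 2>/dev/null || echo "/dev/kvm missing"
```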

I'll let the kernel manage my memory
There is no particular conclusion here, except that maybe you should be familiar with your options before making decisions.  I've got nothing against VMware; as I said, I simply found something that works better for me.  Management tools are far better on the VMware side, but I'm satisfied with what VM Manager offers in terms of management and monitoring.  Oh, you may also make use of the "script" I have.  It's convenient when you need to see some performance details without keying in some 5 arguments.  I might write something on KVM next time, since it allows one to define many more options than a few clicks and done.