James Farrugia's Blog

18 November, 2017

Operation routines

It isn't unusual that I'm setting up a VPS, or quickly spinning up a VM, and having to go through two or three websites (partially) documenting the process of installing a database or a systemd service for example. It even happens with my own tools, such as trying to find the README explaining how to install and set up some Spring Boot application.

I'm trying to solve this little problem by launching kurlibro.com, a simple side project that I built over the course of four Saturdays or so. All it does is allow the user to easily create a step-by-step document which can later be used as a reference.

So go ahead and try it out now :)

https://kurlibro.com

Thanks!

03 December, 2015

Linux Performance Monitoring and Tuning

A few weeks ago I went over an issue I faced when deploying a number of VMs on a Linux OS. I did find a solution, thanks to the open nature of Linux itself, but I promised myself to learn more about performance monitoring and to write about it. Today I feel much more comfortable analysing and dealing with issues that come up, and this list of utilities helped me tune the performance of my systems.

Just like KVM, a virtualisation solution available directly on the Linux Kernel, numerous tools exist right out of the box on many Linux distributions that help one monitor and tune performance. When these are not enough, a simple package installation will make available even more powerful tools.

When dealing with performance issues we would typically look at CPU usage, memory consumption, and disk and network utilisation.

The CPU is by far the fastest component in your system. In order to make the most efficient use of the system we would need to have it at a high usage percentage (without saturating it, of course). If we are running some heavy load service and it is performing badly while the CPU is sitting happily at 4%, then something else must be going very wrong. I'll go through some commands and methods that I came across that helped detect and solve some severe performance issues.

Pedal to the metal

General overview

First things first - before digging into configuration files and whatnot, we should get a general overview of what is happening on our system. I would generally start with some tools that provide a good context of the processes, such as a list of processes and respective resource utilisation, general memory availability and overall system load.

top

The top utility quickly gives a good indication of which processes are using too many resources. It is good to note that, by default, top shows each process's CPU usage as a percentage of a single CPU; if you have 4 CPUs, a process at 99% CPU usage is consuming about 25% of all available processing power. We'll get into this later on when we see how much each CPU is being utilised.

Top can be easily brought up by typing in just top. Various other parameters may be used for finer control; these are listed in the manual, which you can read by running man top. When done, hit q to exit.

dstat

Similar to top, but this is system focused rather than process oriented. It shows general metrics on CPU, disk, network and memory. This utility is extremely extensible, with plugins enabling many more features, even integrating with MySQL, for example. Despite these features, simply firing up dstat with zero arguments is enough to provide a good overview. A favourite config of mine is the one below which I found very useful when analysing my VM problem. It highlights the top CPU consuming process as well as the blocking IO (which is typically slo-o-ow). Additionally, I also get some nice metrics on memory usage, including buffers and caches.

Errors

Gotta catch 'em all

Performance may not always suffer because the hardware is not able to keep up. Application errors may be causing software to underperform while unfortunately not making evident that something is going wrong. Applications will typically log any errors or problems they encounter, but where are they stored?

/var/log/

On Debian or Ubuntu based distributions it is normal to find application and daemon logs in this directory. Navigating to it and listing all the files (and grepping the result) will probably yield the log files of your underperforming service. In this example, I simply list all the files in this log directory and look for the apt directory - a trivial command which may be easily extended. MySQL for example has a slow query log file, which may be useful in case of a slow database. It is a matter of simply opening up the log file and looking for warnings and errors to find a potential problem.

dmesg | tail

Another great command is dmesg. This is just like calling cat on a log file, however it is callable from anywhere just like a normal command. What dmesg does is list the log messages from the kernel, such as driver errors, hardware faults and processes killed because the system ran out of memory. Errors from daemons themselves (a misconfigured nginx failing to launch, for example) usually end up in the service's own log file or in the system log rather than here, so it is worth checking both. To bring up the last few lines of the kernel log (which can be very long), simply pass the output to the tail command. The syntax is simply dmesg | tail.

Detailed Analysis

The root of so many problems

Once we have a better idea of what's malfunctioning, we can start digging deeper into the metrics. As mentioned earlier on, the general areas are the CPU, memory, and IO. From this point, it is better to make use of another package not typically available out of the box. I'll deal with Debian based distributions (Ubuntu, Mint, Elementary, etc.), however such packages are available on others via their respective package managers.

The sysstat package offers numerous performance monitoring tools - install it using sudo apt-get install sysstat.

pidstat

pidstat is quite similar to top in the sense that it offers an overview of top processes and their related metrics. The main difference is that this command will keep writing to the output rather than refreshing the list - making it easier to keep outputs in a file or try to find out patterns.

To invoke pidstat and keep a rolling output, simply pass pidstat 1.

free

This will not set anything free, but only show some numbers on how much memory is free. This is the one exception that is actually available out of the box rather than requiring an extra package. free displays some numbers on memory usage, however it might be confusing to newcomers or users who are typically accustomed to total = free + used. In the case of Linux, a considerable part of memory is used by the cache when not being used by any applications. This helps the system open up files from disk much more quickly.

As a result, the free command will display another column and show that there is actually very little memory that is free. This can also be seen in the dstat output where the total RAM available is calculated by adding all 4 columns rather than just 2. The cache is quite flexible and will be cleared as soon as more memory is required by processes, meaning that practical free RAM is roughly equal to the free + cached memory. To run this command, simply type free -m. With no parameters the values are shown in kilobytes; pass -m for megabytes and -g for gigabytes.

mpstat

On multiprocessor systems, it is vital to monitor each processor's utilisation when things get ugly. Sometimes you may note that one CPU is handling all the work while the others are basking in its heat. This is a bad sign indicating that some process is not handling multiple processors correctly and, worse, hogging one of them to unusability. The great mpstat command will show a breakdown of the utilisation of each CPU on your system.

Similar to pidstat, this is from the sysstat package and may be used to produce a rolling output. An excellent way to relate CPU usage to slow IO is the iowait column. The lower this is the greater the efficiency, since it means that the CPU is actually doing work rather than waiting uselessly.

Using this command is simple: mpstat -P ALL 1

Do use this command when things are getting slow since it may quickly lead to either a problem in IO or simply an improperly configured application.

iostat

In case of slow IO identified from mpstat, iostat will provide further details on what device is functioning slowly. This utility shows which devices are being used at an instant and their utilisation. Ideal utilisation is below 60%, otherwise it is likely that the device is being saturated. This mostly applies to physical block devices - a virtual device that maps to multiple physical ones may simply be used heavily while the physical backend may be capable of handling much more load (i.e. things are working quite efficiently).

On relatively basic systems though, a high utilisation which is accompanied by a high iowait is very likely a case of very bad IO performance. I noted that (unsurprisingly) switching to SSDs dropped utilisation from 99% down to about 15%. SSDs may not always be available, however in my case I was able to map a region of memory as a filesystem. Of course, many cases will not have (or want) to be mapped to RAM, but finding a better physical device will most probably fix issues in this metric (or, if possible, implementing efficient buffers and writing to disk on separate threads).

In order to produce a nice rolling update, just issue: iostat -xz 1

sar

IO performance may suffer also on the network side. This however is less likely, at least from my experience, but also mostly because networks do not deal with any mechanical devices such as hard disks. It is also much more likely that an application is not correctly managing its network handling rather than a slow TCP stack or network card. Tools exist though that allow monitoring of network performance, one of which is sar, also available from the sysstat package.

The amount of data going through each network interface can be monitored in, again, a rolling output. This may be useful to check if the NIC is being used to its potential or if it is able to handle many more connections before getting saturated.

This can be called using sar -n DEV 1

Conclusion

This post was probably no revelation for many, however it can be a good starting point if you're feeling lost in a world of tools and sometimes weird commands. Linux offers numerous metrics which many utilities use to provide a good picture of the system's efficiency. This list of utilities is by no means exhaustive - it is simply a collection of utilities that I used along the way and found useful when performing load testing on various servers. Feel free to comment on any other tools, suggestions or even corrections.

16 October, 2015

Java Programming Tutorial - Unit 2 - Methods and Variables

Let's start from where Unit 1 left off. During the first steps, we instructed our computer to write "Hello World!" to the display. Through that unit, we went through quite some material, despite it being so simple. In this unit, we'll cover more practical points rather than theoretical ones, so get ready to write some more code this time!

Variables

So, what is a variable? As the name clearly indicates, it is something that varies. A variable is just a label which you can use to store something. Let's say we want to store our user's name: we create a new label named username and assign the user's input to this label. A user types in their name and we instruct the computer to store the input somewhere which can be addressed using the word "username".

To better understand and appreciate how useful that little label is, try imagining having hundreds of such variables all without a human-friendly name. You do not need to go far: older languages had no such concept and used memory addresses exclusively.

With this knowledge, you can now think of your computer's memory as being a large room full of P.O. Boxes. Each P.O. Box may be referred to by its number. In modern languages you can give each P.O. Box its own unique name too, so it's easier for you to know what you're working with.

The next bit is theoretical, however it's good to know about variable terminology.
Java is known as being strongly, statically typed. This strongly named description simply means that each variable can have one type, and one type only. If we declared our username variable as being of type text, it can only contain text. If we had another one for storing a number, it can only store a number. It's a restriction, but it's convenient. There are languages that have variables whose types can change during runtime; these are known as dynamically typed (and are often loosely called weakly typed). It's convenient too, but it's easier to shoot yourself in the foot if you're new to it.

Let's make use of a variable in a more practical example. In this task, we want our text to be defined as a variable, rather than passing a direct value to System.out.println.

As you can see, the change is minor. In the new line, the only thing which may be new is the String label. A String is simply a series of characters; it is a type of variable which you'll find in the vast majority of programming languages.
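A minimal sketch of that change, assuming the Unit 1 program was a single class with a main method (the class name HelloWorld and the variable name text are only illustrative):

```java
class HelloWorld {
    public static void main(String[] args) {
        // The new line: a variable named text, of type String.
        String text = "Hello World!";

        // We now pass the variable rather than the direct value.
        System.out.println(text);
    }
}
```

The output is the same as before; the only difference is that the text now has a name we can reuse.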

Variable types

Now that you have declared your first variable, and hopefully got to understand the relation between the type of the variable and the content it stores, it is safe to introduce the list of primitive types in Java. As you now know, Java is object oriented and everything is defined as a class. This implies that every instance in our program is an object. However, this is not entirely true, since objects need to be made up of something. If we keep going deeper into what constitutes an object, we find that there are only 8 primitive types. Each primitive type is made up of some number of bits. These are as follows; afterwards we'll go through them:

boolean - no specific number of bits, but practically 1
byte - 8 bits
char - 16 bits
short - 16 bits
int - 32 bits
long - 64 bits
float - 32 bits
double - 64 bits

As you can see (assuming you're familiar with bits, the basic units of information), all types are practically increasing sizes of numbers. No letters, no images, nothing but numbers. Later I'll explain how everything can be made from these primitives, but first let's see how we can organise them into roughly three categories.

First we have the boolean type. This can have just two values, conceptually 1 or 0. In Java we write these as true or false, and the type is mostly used for setting states and flags.

Next come the whole numbers (integers). All primitives from the byte to the long fall under this category. Values stored by these types cannot have any digits after the decimal point. One thing to note about the char type is that it does store a numeric value, however it is treated as a character. Note also that it holds a 16-bit Unicode (UTF-16) code unit and, unlike the others, cannot be negative.

Finally we have the real numbers; the float and double. Double, as the name implies, is just double the size of a float. It is usually much more practical to work with a double unless you're working on a high performance system where memory is precious (not all systems have gigabytes of memory to waste).

Primitive types can be easily declared or have a value assigned to them. If you want an integer with a value of 10, simply enter:

int myNumber = 10;
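The same pattern works for all eight primitives. Here is a small sketch; the variable names and values are made up for illustration:

```java
class PrimitiveTypes {
    public static void main(String[] args) {
        boolean isReady = true;
        byte smallNumber = 127;               // largest value a byte can hold
        char letter = 'A';                    // a character, stored as a number
        short mediumNumber = 32767;           // largest value a short can hold
        int myNumber = 10;
        long bigNumber = 9000000000L;         // too big for an int, note the L
        float approximate = 3.14f;            // note the f
        double precise = 3.141592653589793;

        System.out.println(isReady);
        System.out.println(smallNumber);
        // Casting the char to int reveals the number behind the character.
        System.out.println(letter + " is stored as " + (int) letter);
        System.out.println(mediumNumber);
        System.out.println(myNumber);
        System.out.println(bigNumber);
        System.out.println(approximate);
        System.out.println(precise);
    }
}
```

Note the L and f suffixes: without them Java reads a whole number as an int and a decimal number as a double.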

Composites

Now that you have the most granular types, it is possible to mix and match to create more complex types. A composite is basically another name for an object. The String type, for example, is a composite. In order to explain this composite, we need to introduce another programming term: arrays. An array is just a contiguous series of memory cells, each containing a value of the same type. Java has native support for arrays and supports creating new arrays during runtime (older languages did not support this directly). The next snippet shows how we can use an array of characters to emulate a String, albeit in a less practical way.
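A minimal sketch of the idea, assuming we want to hold the word "Hello" (the class and variable names are illustrative):

```java
class CharArrayExample {
    public static void main(String[] args) {
        // An array of five characters, one per cell.
        char[] letters = {'H', 'e', 'l', 'l', 'o'};

        // The same text as a String.
        String text = "Hello";

        // An array knows how many cells it has.
        System.out.println(letters.length);

        // Each cell is addressed by its position, starting from 0.
        System.out.println(letters[0]);

        // println accepts a char array and prints the characters in order.
        System.out.println(letters);

        // Building a String from the array gives text equal to the original.
        System.out.println(String.valueOf(letters).equals(text));
    }
}
```

The array does the job, but the String is clearly nicer to work with, which is why it exists.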

Unlike primitives, composites (or, to use the proper name, objects) are created using the new keyword. The declaration also follows this convention:
Type myTypeVariable = new Type();

As you can see, there is the type, the name, followed by the assignment to a new instance of the class (or type). Note though, that the String is an exceptional case in Java and can be declared like a primitive. This is only an exception and does not apply to any other class.

Probably the String is not enough, so let's go through some more examples. Let's say we want to show a picture. What constitutes a picture? Pixels, the number of pixels in width, and in height. Width and height are just numbers. The pixels are an array of the Pixel object (so we also have nested composites). And after that, what is in each pixel? Three values for the primary colours Red, Green and Blue; again, three numbers.

Let's define our own Picture type. First, we need a Pixel. Then we'll create a Picture and we'll find its area. Using this area, we'll set the value of the pixels in our Picture, since initially this is null.

Now we'll create the program "body". The main class this time will create the image, the pixel array, and print out the area. Note how we concatenated the text and a variable using the '+' symbol. I'll explain the operators later on in this unit.
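Put together, the three classes could look something like the sketch below. The field names, the class name HelloWorldPicture and the 4 by 3 size are assumptions made for illustration; in a real project each class would normally live in its own file.

```java
// One pixel: three numbers for red, green and blue.
class Pixel {
    int red;
    int green;
    int blue;
}

// A picture: two numbers and an array of pixels.
class Picture {
    int width;
    int height;
    Pixel[] pixels; // null until we assign an array to it
}

class HelloWorldPicture {
    public static void main(String[] args) {
        Picture picture = new Picture();
        picture.width = 4;
        picture.height = 3;

        // The area tells us how many pixels the picture needs.
        int area = picture.width * picture.height;

        // Replace the initial null with an array of the right size.
        picture.pixels = new Pixel[area];

        // Text and a variable joined with '+'.
        System.out.println("The area of the picture is " + area);
    }
}
```

Note that the array only reserves room for the pixels; each cell is still empty until a new Pixel is placed in it.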

So you see, pretty much anything can be reduced to a number.

Operators

As I mentioned earlier, I'll give an introduction to operators. These are not so complex so there is not much else to learn about them.

Operators are the symbols used in code, such as the '+', '-', etc. The plus is used both for adding numbers and for concatenating text. For example, let's say we have variables a and b. a + b could mean the following:
If a and b are both numeric primitives, the result is their sum, itself a primitive. If either a or b is a String, the other is converted to its text representation and the result is always a String.

Other operators are only reserved for primitives:

The minus '-' is used to subtract;
The star '*' is used to multiply;
The slash '/' is used to divide. Between whole-number types the fractional part is dropped rather than rounded (7 / 2 gives 3); only float and double keep it;
The percent '%' is used for obtaining the remainder of a division (the modulo);
The hat '^' is used for bitwise XOR;
The pipe '|' is used for bitwise OR;
The ampersand '&' is used for bitwise AND;
The exclamation mark '!' is used for logical NOT, and works only on boolean values;
The greater than '>' and less than '<', for...well greater or less than;
The double less than and greater than ('<<' and '>>') for bit shifting.

You shall not be using many of these in the early days. However you should be familiar with the computing terms used here (such as shifting and bitwise operations).
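To see the operators in action, here is a short sketch using two made-up values, 7 and 2. The comments show what each line prints:

```java
class OperatorExamples {
    public static void main(String[] args) {
        int a = 7; // 111 in binary
        int b = 2; // 010 in binary

        System.out.println(a + b);       // 9
        System.out.println(a - b);       // 5
        System.out.println(a * b);       // 14
        System.out.println(a / b);       // 3, the fraction is dropped
        System.out.println(a / 2.0);     // 3.5, a double keeps the fraction
        System.out.println(a % b);       // 1, the remainder of 7 / 2
        System.out.println(a ^ b);       // 5, 111 XOR 010 = 101
        System.out.println(a | b);       // 7, 111 OR 010 = 111
        System.out.println(a & b);       // 2, 111 AND 010 = 010
        System.out.println(a << 1);      // 14, bits moved one place left
        System.out.println(a >> 1);      // 3, bits moved one place right
        System.out.println(a > b);       // true
        System.out.println(!(a > b));    // false
        System.out.println("a is " + a); // a is 7
    }
}
```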

Methods

We mentioned something about methods during the first unit, mostly trying to relate them to the methods in your recipe books. This time, we shall add more methods to our little picture program. At first it might seem like overkill to have too many methods for a simple task, but as your project grows you'll come to appreciate shorter and more frequent methods.

So, the first task - adding new methods. But why, what are they going to do? Imagine we want our program to accept a user input. For this task, we'll use methods that were written by others - we'll be calling those methods. Afterwards we'll break up our program into smaller methods so that later on we can follow better programming practice. In this case we'll write our own methods too.

The program

Our next task will be to add on to the Hello World Picture program. This time the area will be calculated by the picture, rather than us having to calculate it in our main program. We'll also let the user specify the width and height of the picture. This user input will need some processing, as we shall see next.

First, we'll extend the Picture class so that it can support its own methods. We shall call this PictureExtended to avoid confusion for now.

Next we'll upgrade the main program. As you can see, it has many more methods and the functionality is more granular. If we had two pictures for example, we could still call the same createPicture, thus avoiding duplicate code.
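A sketch of how the two pieces might fit together follows. Only PictureExtended, getArea and createPicture are names taken from this unit; readNumber, printArea, the prompts and the main class name are assumptions for illustration.

```java
import java.util.Scanner;

class Pixel {
    int red;
    int green;
    int blue;
}

// A picture that can work out its own area.
class PictureExtended {
    int width;
    int height;
    Pixel[] pixels;

    int getArea() {
        return width * height;
    }
}

class HelloWorldPictureExtended {
    public static void main(String[] args) {
        // Read from the standard system input, i.e. the keyboard.
        Scanner scanner = new Scanner(System.in);
        int width = readNumber(scanner, "Enter the width: ");
        int height = readNumber(scanner, "Enter the height: ");

        PictureExtended picture = createPicture(width, height);
        printArea(picture);
    }

    // Shows a prompt and turns the text the user typed into a number.
    static int readNumber(Scanner scanner, String prompt) {
        System.out.print(prompt);
        return Integer.parseInt(scanner.nextLine().trim());
    }

    // Can be called once per picture, so the set-up code is never duplicated.
    static PictureExtended createPicture(int width, int height) {
        PictureExtended picture = new PictureExtended();
        picture.width = width;
        picture.height = height;
        picture.pixels = new Pixel[picture.getArea()];
        return picture;
    }

    static void printArea(PictureExtended picture) {
        System.out.println("The area of the picture is " + picture.getArea());
    }
}
```

Typing 4 and then 3 when prompted should report an area of 12. Entering something that is not a whole number will stop the program with an error; handling that gracefully is left for a later unit.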

Static vs not static

Note how we put static in front of methods in the main class, while we did not put any in the PictureExtended. Now that we do have some methods and classes, it is safe to explain it.

Static methods are those methods that can be called without having an instance of the enclosing class. For example, we never create a new System object, yet we can reach System.out, because out is declared as a static variable of the System class (it is of type PrintStream, and println is a regular, non-static method called on it). In the same way, the static methods in our main class can be called without creating an instance of that class. However, we cannot call getArea() on PictureExtended by itself. We must create a new PictureExtended and place it in a variable. We are then able to call it from the variable.

This is basically the difference between static and non-static: if it is static, it can be called without an instance of the enclosing class; however, it cannot access the non-static members of the class. Let's say we make getArea() static: in that case, it cannot access the width and height values of the PictureExtended instance.
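The difference can be seen side by side in a small sketch. The class StaticExample and the method areaOf are made up for this illustration:

```java
class StaticExample {
    int width = 4;
    int height = 3;

    // Non-static: works on one particular instance and can read its fields.
    int getArea() {
        return width * height;
    }

    // Static: no instance is involved, so it only sees what is passed in.
    static int areaOf(int width, int height) {
        return width * height;
    }

    public static void main(String[] args) {
        // Called on the class itself, no "new" needed.
        System.out.println(StaticExample.areaOf(5, 2)); // 10

        // Called on an instance stored in a variable.
        StaticExample example = new StaticExample();
        System.out.println(example.getArea()); // 12
    }
}
```

Trying to call StaticExample.getArea() without an instance would be rejected by the compiler, since there would be no width and height to read.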

Accepting user input

We are able to accept user input via the Scanner class. Again, this is just like System, a class already provided with Java (although we had to create a new Scanner, unlike System). Note how we passed System.in to it, telling it that we expect to receive input from the standard system input; the keyboard.

The difference from System is that Scanner resides in what is known as a package which is different from ours. We will go through packages in a later Unit, however note how we needed to import the class. The import statement has to be at the very top, outside the class declaration.

Conclusion

This was quite a long unit and covers quite a lot. We created new classes, instances of these classes, or objects, static methods and imported some other ones too. In the next units we shall go over further interesting bits of programming in Java, such as loops, cases and conditionals.

Other tutorials (which are just as good or better) may hold off explaining the details of classes and objects initially. I believe that this might send off the wrong message about Java. It is understandable that it is initially complicated, however it will embed the idea that in Java one should follow an object oriented methodology, otherwise the code will not be up to standard. Not that it is incorrect, but as projects grow, not following conventions will make Java very frustrating.

So, as a precaution, I'm giving out fairly detailed descriptions of classes and objects, and why we use them, before going further into the traditional loops and conditionals. Hopefully the descriptions coupled with the actual code will make it more natural.

Thank you!