Thursday, November 07, 2013

Understanding Vmstat Output

Below is a Vmstat output:
procs    -----------memory-------------- ---swap--  -----io----  --system--   -----cpu--------
r  b          swpd    free    buff   cache     si     so    bi     bo       in      cs  us  sy id wa st
2 0    2573144 12404  1140  47128  185  263  185  299  3173  3705  92  8  0    0 0
3 0    2574708 12304  1188  47436  192  187  192  234  3079  3468  92  8  0    0 0
Under Procs we have
       r: The number of processes waiting for run time or placed in run queue or are already executing (running)
       b: The number of processes in uninterruptible sleep. (b=blocked queue, waiting for resource (e.g. filesystem I/O blocked, inode lock))

If runnable threads (r) divided by the number of CPU is greater than one -> possible CPU bottleneck

(The (r) coulmn should be compared with number of CPUs (logical CPUs as in uptime) if we have enough CPUs or we have more threads.)

High numbers in the blocked processes column (b) indicates slow disks.

(r) should always be higher than (b); if it is not, it usually means you have a CPU bottleneck

Note: “cat /proc/cpuinfo” dispalys the cpu info on the machine
>cat /proc/cpuinfo|grep processor|wc -l
output: 16

Remember that we need to know the number of CPUs on our server because the vmstat r value must never exceed the number of CPUs. r value of 13 is perfectly acceptable for a 16-CPU server, while a value of 16 would be a serious problem for a 12-CPU server.

Whenever the value of the r column exceeds the number of CPUs on the server, tasks are forced to wait for execution.There are several solutions to managing CPU overload, and these alternatives are:
1.      Add more processors (CPUs) to the server.
2.      Load balance the system tasks by rescheduling large batch tasks to execute during off-peak hours.

Under Memory we have:

swpd: shows how many blocks are swapped out to disk (paged). The amount of Virtual memory used.
            Note: you can see the swap area configured in server using "cat proc/swaps"
>cat /proc/meminfo
>cat /proc/swaps
Filename                        Type            Size    Used    Priority
/dev/dm-7                       partition       16777208        21688   -1

free: The amount of Idle Memory
buff: Memory used as buffers, like before/after I/O operations
cache: Memory used as cache by the Operating System

Under Swap we have:
si: Amount of memory swapped in from disk (/s). This shows page-ins
so: Amount of memory swapped to disk (/s). This shows page-outs. The so column is zero consistently, indicating there are no page-outs.

In Ideal condition, si and so should be at 0 most of the time, and we definitely don’t like to see more than 10 blocks per second.

Under IO we have:
bi: Blocks received from block device - Read (like a hard disk)(blocks/s)
bo: Blocks sent to a block device – Write(blocks/s)

Under System we have:
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.

(A context switch occurs when the currently running thread is different from the previously running thread, so it is taken off of the CPU.)

It is not uncommon to see the context switch rate be approximately the same as device interrupt rate (in column)

If cs is high, it may indicate too much process switching is occurring, thus using memory inefficiently.
If cs is higher then sy, system is doing more context switching than actual work.

High r with high cs -> possible lock contention
Lock contention occurs whenever one process or thread attempts to acquire a lock held by another process or thread. The more granular the available locks, the less likely one process/thread will request a lock held by the other. (For example, locking a row rather than the entire table, or locking a cell rather than the entire row.)

When you are seeing blocked processes or high values on waiting on I/O (wa), it usually signifies either real I/O issues where you are waiting for file accesses or an I/O condition associated with paging due to a lack of memory on your system.

Note: the memory, swap, and I/O statistics are in blocks, not in bytes. In Linux, blocks are usually 1,024 bytes (1 KB).

Under CPU we have:
These are percentages of total CPU time.
       us: % of CPU time spent in user mode (not using kernel code, not able to acces to kernel resources). Time spent running non-kernel code. (user time, including nice time)
       sy: % of CPU time spent running kernel code. (system time)
       id: % of CPU  idle time
       wa: % of CPU time spent waiting for IO.

Note: the memory, swap, and I/O statistics are in blocks, not in bytes. In Linux, blocks are usually 1,024 bytes (1 KB).

To measure true idle time measure id+wa together:
- if id=0%, it does not mean all CPU is consumed, because "wait" (wa) can be 100% and waiting for an I/O to complete
- if wait=0%, it does not mean I have no I/O waiting issues, because as long I have threads which keep the CPU busy I could have additional threads waiting for I/O, but this will be masked by the running threads

If process A is running and process B is waiting on I/O, the wait% still would have a 0 number.
A 0 number doesn't mean I/O is not occurring, it means that the system is not waiting on I/O.
If process A and process B are both waiting on I/O, and there is nothing that can use the CPU, then you would see that column increase.

- if wait% is high, it does not mean I have io performance problem, it can be an indication that I am doing some IO but the cpu is not kept busy at all
- if id% is high then likely there is no CPU or I/O problem

To measure cpu utilization measure us+sy together (and compare it to physc):
- if us+sy is always greater than 80%, then CPU is approaching its limits 
- if us+sy = 100% -> possible CPU bottleneck
- if sy is high, your appl. is issuing many system calls to the kernel and asking the kernel to work. It measures how heavily the appl. is using kernel services.
- if sy  is higher than us, this means your system is spending less time on real work (not good)

Mointor System with Vmstat:
>nohup vmstat -n 10 604879 > myvmstatfile.dat &
To generate one week of Virtual Memory stats spaced out at ten second intervals (less the last one) is 60,479 10 second intervals

> nohup vmstat -n 3 5|awk '{now=strftime("%Y-%m-%d %T "); print now $0}' > vmstat.data &
Append timestamp to the vmstat output.