Monitoring data transfers through pipes


Posted by Diego Assencio on 2014.02.07 under Linux (Shell)

A friend of mine recently introduced me to a nifty Linux tool which monitors the progress of data transfers through pipes: pv. There are lots of cool things that can be done using pv; this post describes a few of them.

Before jumping to the applications, a few words about pv are necessary: as said above, pv monitors the progress of data being transferred through a pipe. It reads its input from a pipe and copies it to standard output; the progress is shown on standard error. To clarify, if you run:

cat input.txt | pv > output.txt

the contents of input.txt will be sent to pv, which will then copy this data to its standard output (redirected here to the file output.txt). On standard error (your screen), pv will show the transfer progress:

1.11GB 0:00:10 [88.9MB/s] [         <=>                                  ]

The output shows the amount of data already transferred, the time elapsed and the current data transfer rate through the pipe. Since pv is reading from a pipe here, it cannot determine the total amount of data to expect, so the <=> marker simply bounces left and right; when pv does know the total transfer size, it displays a progress bar with a percentage and an estimated time remaining instead.
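
One way to let pv know the total transfer size is to have it open the file itself instead of feeding it through cat; pv then checks the file's size and can display a percentage and an estimated time remaining. A minimal example (the same transfer as above):

pv input.txt > output.txt

OK, now it's fun time!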

1) Measure maximum sequential reading speed of a hard disk

If the device node through which your hard disk is accessible is /dev/sda, you can measure its maximum sequential reading speed with the following command:

sudo cat /dev/sda | pv > /dev/null

On my laptop, the maximum sequential reading speed is about 100MB/s:

 961MB 0:00:10 [ 102MB/s] [            <=>                               ]

The device node above (/dev/sda) is what you will typically find on Ubuntu/Debian; on other systems, or with other types of disks, the device may appear under a different node (e.g. /dev/nvme0n1 for an NVMe drive), so adjust the command accordingly.
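
One caveat: if part of the disk has been read recently, Linux may serve those blocks from the page cache, inflating the measured speed. Assuming your version of pv supports the -s (expected size) and -S (stop at size) options, you can drop the caches and then read only the first gigabyte of the disk for a quicker measurement:

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

sudo pv -s 1G -S /dev/sda > /dev/null

Since pv now knows how much data to expect, it will also display a percentage and an estimated time remaining.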

2) Measure the speed of /dev/zero

On Linux, the file /dev/zero is a device which constantly outputs null bytes (0x00). One common use of /dev/zero is to completely destroy the data on a disk by writing zeros over its entire extent (do not run the command below unless you know what you are doing):

sudo dd if=/dev/zero of=/dev/sda
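
Since wiping a large disk can take hours, pv is handy here too: inserting it between a dd that reads /dev/zero and a dd that writes to the disk shows how the wipe is progressing (again, do not run this unless you know what you are doing):

dd if=/dev/zero | pv | sudo dd of=/dev/sda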

A valid question is: how fast does /dev/zero generate zeros? Let's find out:

cat /dev/zero | pv > /dev/null

The output I get on my laptop is similar to this:

22.1GB 0:00:08 [2.85GB/s] [         <=>                                  ]

In other words, /dev/zero generates zeros at a rate of 2.85GB/s (it actually oscillates between 2.5GB/s and 3GB/s). That's pretty fast, but I have seen even faster: 3.95GB/s on my desktop computer.
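
Strictly speaking, this measures the whole cat-to-pv pipeline rather than /dev/zero alone. pv can open /dev/zero directly, which removes the extra cat process and one pipe from the picture; you may see a somewhat higher rate this way, since the data is copied one less time:

pv /dev/zero > /dev/null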

3) Measure the rate at which entropy is produced

Linux has an entropy pool which stores environmental noise collected from several devices (e.g. the hard disk, the keyboard, the mouse). The kernel keeps an estimate of the number of bits of gathered noise in this entropy pool.

The gathered entropy is used by the kernel to generate random numbers. For instance, if you run:

cat /dev/random

you should see a bunch of characters on the screen, formed from random bits taken out of the entropy pool. Every byte that is output drains bits of gathered noise from the pool; when the pool is depleted, the output stops, and it resumes as soon as more entropy is gathered.
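
You can also inspect the kernel's current entropy estimate directly; it is exposed, in bits, through procfs:

cat /proc/sys/kernel/random/entropy_avail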

Now, back to pv: to see the rate at which entropy is produced on your computer, run:

cat /dev/random | pv > /dev/null

When my laptop is idle, no entropy is generated. However, as soon as I start to type, the entropy generation jumps to approximately 8B/s. By moving my mouse erratically, the entropy generation starts oscillating between 8B/s and 16B/s. To generate entropy without having to act like a maniac, I ran the following command:

ls -R /

The command above will recursively list all system files and therefore generate lots of hard disk activity. On my laptop, this gets the entropy generation rate to oscillate between 8B/s and 16B/s.
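
If you run this listing in a second terminal while pv runs in the first, you may want to discard the listing output (including permission errors) so that it does not flood your screen:

ls -R / > /dev/null 2>&1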

NOTE: readers who would like to learn more about this topic should take a look at the man page of /dev/random with the following command:

man 4 random

Comments

Martin Seener on Mar 31, 2014:
Enough entropy is essential to modern computing, not only for encryption but also for every process or fork started, since modern OSes use ASLR, which also requires some entropy. On most Linux distributions, like Debian, the entropy pool often has only about 124 bits available, which is quite low.

To greatly improve that situation, it is recommended to use either a hardware random number generator or the good old HAVEGE daemon (apt-get install haveged), which is available as a standard package, for example, for squeeze (backports) and newer. This algorithm uses CPU metrics to generate quite good random numbers, feeds them into the Linux entropy pool and ensures (by default) that the pool does not run lower than 1024 bits. For an even fuller entropy pool, one can adjust that threshold in /etc/default/haveged with the -w parameter.

By default, the Linux entropy pool can hold up to 4096 bits, so you can safely set the haveged threshold to 4096; your pool will then always have full entropy available.
