Monitoring data transfers through pipes


Posted by Diego Assencio on 2014.02.07 under Linux (Shell)

Linux has a nifty tool that monitors the progress of data transfers through pipes: pv (short for "pipe viewer"). There are lots of cool things you can do with pv; this post describes a few of them.

Before jumping to the applications, a few words about pv are necessary: as said above, pv monitors the progress of data flowing through a pipe. It reads from standard input, writes the data unchanged to standard output, and prints the progress on standard error. To clarify, if you run:

cat input.txt | pv > output.txt

the contents of input.txt will be sent to pv, which will then output this data to its standard output (the file output.txt). The screen output will show the transfer progress:

1.11GB 0:00:10 [88.9MB/s] [         <=>                                  ]

The output shows the amount of data already transferred, the time elapsed and the current data transfer rate through the pipe. The last field shows the progress of the transfer: when pv knows the total size, it draws a progress bar that fills up from left to right; when it does not (as here, where the data arrives through a pipe), a <=> marker bounces right and left to show that data is still flowing. OK, now it's fun time!
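Since pv cannot know how much data will arrive through a pipe, it helps to tell it. The following is a minimal sketch, assuming GNU coreutils and a throwaway test file (the names input.bin and output.bin are made up for this example), which passes the total size with the -s option so that pv can show a percentage and an estimated time of arrival:

```shell
# Create a 10 MiB test file (input.bin is a hypothetical name).
dd if=/dev/zero of=input.bin bs=1M count=10 2>/dev/null

# pv cannot know how much data will arrive through a pipe, so pass
# the total size in bytes with -s to get a percentage bar and an ETA.
size=$(wc -c < input.bin)
cat input.bin | pv -s "$size" > output.bin

# pv copies its input to its output unchanged; cmp prints nothing
# when the two files are identical.
cmp input.bin output.bin && echo "files are identical"
```

Running pv input.bin > output.bin should have the same effect without cat: when given a file name, pv can look up the size by itself.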

1) Measure the maximum sequential reading speed of a hard disk

If the device node through which your hard disk is accessible is /dev/sda, you can measure its maximum sequential reading speed with the following command:

sudo cat /dev/sda | pv > /dev/null

On my laptop, the maximum sequential reading speed is about 100MB/s:

 961MB 0:00:10 [ 102MB/s] [            <=>                               ]

The command above works on Ubuntu/Debian; if you are using another distribution, you might have to change it to have it work on your system.

2) Measure how fast data can be transferred through a pipe

On Linux, the file /dev/zero is a device which outputs an endless stream of ASCII null characters (0x00). This device is commonly used to completely destroy the data on a disk by writing zeros over its entire extent (do not run the command below unless you know what you are doing):

sudo dd if=/dev/zero of=/dev/sda

Because /dev/zero outputs null characters extremely fast, it can be used to measure the maximum speed at which data can be transferred through a pipe. The command below shows how this can be done (you will eventually need to press Ctrl+C to stop it since /dev/zero will never stop generating input for pv):

cat /dev/zero | pv > /dev/null

The output I get on my laptop is similar to this:

22.1GB 0:00:08 [2.85GB/s] [         <=>                                  ]

In other words, data can be piped at a rate of 2.85GB/s (it actually oscillates between 2.5GB/s and 3GB/s on my system). That's pretty fast, but I have seen even faster speeds: 3.95GB/s on my desktop computer.
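If you would rather not stop the test by hand, the amount of data can be bounded. This is a small sketch, assuming GNU head; the 1 GiB figure is an arbitrary choice, and since head may read in different chunk sizes than cat, the rate it yields may differ somewhat from the one above:

```shell
# Send exactly 1 GiB of null bytes through the pipe; head exits after
# that many bytes, so pv sees the end of its input and no Ctrl+C is needed.
bytes=$((1024 * 1024 * 1024))
head -c "$bytes" /dev/zero | pv -s "$bytes" > /dev/null
```

Because pv is told the total size with -s, it shows a percentage and an ETA instead of the bouncing marker.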

As a side note, the maximum speed at which /dev/zero can produce null characters can be estimated with the dd command:

dd if=/dev/zero of=/dev/null bs=1M status=progress

The speed shown by dd will take a few seconds to stabilize. Here is what the output typically looks like:

106460872704 bytes (106 GB, 99 GiB) copied, 10 s, 10.6 GB/s

As the output shows, /dev/zero can indeed generate null characters really fast. The speed observed with the command above will, however, vary depending on the value passed to the bs parameter (which specifies how many bytes are transferred from /dev/zero to /dev/null at a time), so you may want to try larger and smaller values to get a better estimate of the actual maximum speed of /dev/zero.
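One way to explore the effect of bs is to copy the same amount of data with several block sizes and compare the rates. The loop below is only a sketch, assuming GNU dd; the total of 256 MiB and the four block sizes are arbitrary picks, and a larger total should give steadier numbers:

```shell
# Copy the same total amount (256 MiB) with different block sizes.
total=$((256 * 1024 * 1024))
for bs in 4096 65536 1048576 16777216; do
    # Number of blocks needed to reach the same total for this block size.
    count=$((total / bs))
    printf 'bs=%s: ' "$bs"
    # dd prints its statistics on standard error; keep only the summary line.
    dd if=/dev/zero of=/dev/null bs="$bs" count="$count" 2>&1 | tail -n 1
done
```

Each line of output ends with the rate dd measured for that block size, so the fastest setting on a given machine is easy to spot.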

3) Measure the rate at which entropy is produced

Linux has an entropy pool which stores collected environmental noise from several devices (e.g. the hard disk, the keyboard etc.). The kernel keeps an estimate of the number of bits of gathered noise in this entropy pool.
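The kernel exposes this estimate through the /proc file system. As a quick sketch (the variable names are just for illustration), the current estimate and the pool size can be read like this:

```shell
# The kernel's current estimate of the entropy in the pool, in bits.
entropy=$(cat /proc/sys/kernel/random/entropy_avail)
# The total size of the pool, in bits.
poolsize=$(cat /proc/sys/kernel/random/poolsize)
echo "entropy available: $entropy of $poolsize bits"
```

The numbers depend heavily on the kernel version, so do not be surprised if both values are identical and never change on a recent system.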

The gathered entropy is used by the kernel to generate random numbers. For instance, if you run:

cat /dev/random

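you should see a bunch of characters on the screen which are formed from random bits taken out of the entropy pool. By outputting random characters (bytes), the entropy pool loses bits of gathered noise. When it is depleted, the output will stop. You should get more output as soon as more entropy is gathered. (This is how kernels older than 5.6 behave; since version 5.6, /dev/random only blocks until the pool has been initialized at boot, so the output no longer stops.)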

Now, back to pv: to see the rate at which entropy is produced on your computer, run:

cat /dev/random | pv > /dev/null

When my laptop is idle, no entropy is generated. However, as soon as I start to type, the entropy generation jumps to approximately 8B/s. By moving my mouse erratically, the entropy generation starts oscillating between 8B/s and 16B/s. To generate entropy without having to act like a maniac, I ran the following command:

ls -R /

The command above will recursively list all system files and therefore generate lots of hard disk activity. On my laptop, this gets the entropy generation rate to oscillate between 8B/s and 16B/s.

NOTE: readers who would like to learn more about this topic should take a look at the man page of /dev/random with the following command:

man 4 random

Comments

Martin Seener on Mar 31, 2014:
Enough entropy is essential to modern computing, not only for encryption but also for every process or fork started, since modern OSes use ASLR, which also requires some entropy. On most Linux distributions like Debian the entropy pool mostly has only about 124 bits, which is quite low.

To greatly improve that situation it is always recommended to use either a hardware random number generator or the good old havege daemon (apt-get install haveged), which is for example available for squeeze (backports) and newer as a standard package. This algorithm uses some more CPU metrics to generate quite good random numbers, feeds them into the Linux entropy pool and ensures (by default) that it does not run lower than 1024 bits. For even better entropy pools one can adjust that in /etc/default/haveged with the -w parameter.

By default, the Linux entropy pool can hold up to 4096 bits, so you can safely set the haveged threshold to 4096 and your pool will always have full entropy available.

aNeutrino on May 28, 2020:
try

dd if=/dev/zero of=/dev/null bs=64k count=1M

if you want to see the speed of /dev/zero.

Using '|' slows things down, so `pv` cannot show the real speed, only the speed of the pipe.

Diego Assencio on May 29, 2020:
@aNeutrino: Thank you very much for pointing that out. I fixed the post accordingly.