Posts on Mathematics

Matrix compression: the singular value decomposition


Posted by Diego Assencio on 2014.02.09 under Mathematics (Linear algebra)

I expect most readers of this blog to be familiar with the concept of lossy data compression. For instance, whenever you rip a song from a CD to a lossy format such as MP3 or Ogg Vorbis, you are effectively discarding some data from the song to make it smaller while still preserving most of the audio. A good compression technique yields a song which sounds almost like the original one.

Matrices can sometimes also be compressed in the sense above. For instance, given a matrix $A$, it is sometimes possible to cleverly compute another matrix $\tilde{A} \approx A$ such that $\tilde{A}$ can be represented in a way which requires much less data storage than the original matrix $A$. This approximation can be obtained from a very powerful tool in linear algebra: the singular value decomposition (SVD). This post will not present techniques for computing SVDs, but merely discuss this tool in the context of matrix compression.

The SVD of an $m \times n$ real matrix $A$ is the factorization $A = U\Sigma V^T$, where $U$ is an $m \times m$ real orthogonal matrix, $\Sigma$ is an $m \times n$ diagonal matrix with real nonnegative values along its diagonal (called the singular values of $A$) and $V$ is an $n \times n$ real orthogonal matrix. Every matrix has an SVD (for a proof of this fact, see [1], theorem 3.2).

To illustrate how one can use the SVD to approximate a matrix, I will use Octave. On Ubuntu/Debian, you can install Octave by opening a terminal and running the following command:

sudo apt-get install octave

Now start Octave. I have generated an example matrix, shown below (in what follows, user input is whatever appears after the octave:N> prompt):

octave:1> A
A =

   1.02650   0.92840   0.54947   0.98317   0.71226   0.55847
   0.92889   0.89021   0.49605   0.93776   0.62066   0.52473
   0.56184   0.49148   0.80378   0.68346   1.02731   0.64579
   0.98074   0.93973   0.69170   1.03432   0.87043   0.66371
   0.69890   0.62694   1.02294   0.87822   1.29713   0.82905
   0.56636   0.51884   0.65096   0.66109   0.82531   0.55098

You can generate the matrix above by typing 'A=[' (without the quotes), pasting the entries of $A$ shown above, and closing the expression with another square bracket ']'.

Octave provides a function for computing the SVD of $A$:

octave:2> [U,Sigma,V] = svd(A)
U =

  -0.418967  -0.449082   0.754882   0.062414  -0.211638   0.065260
  -0.386832  -0.456685  -0.393449   0.517538   0.420908  -0.204913
  -0.369990   0.429218   0.277696  -0.322484   0.605310  -0.362449
  -0.455375  -0.257776  -0.427049  -0.676528  -0.289362  -0.048923
  -0.469840   0.546519  -0.099679   0.407065  -0.521257  -0.182266
  -0.331389   0.201011  -0.076997   0.029526   0.237081   0.887000

Sigma =

Diagonal Matrix

   4.6780e+00            0            0            0            0            0
            0   8.9303e-01            0            0            0            0
            0            0   4.5869e-02            0            0            0
            0            0            0   7.2919e-03            0            0
            0            0            0            0   1.5466e-16            0
            0            0            0            0            0   4.5360e-17

V =

  -0.418967  -0.449082   0.726833   0.183844  -0.240614   0.053029
  -0.386832  -0.456685  -0.363745  -0.694211  -0.126874  -0.107069
  -0.369990   0.429218  -0.101603  -0.070784  -0.356422   0.732468
  -0.455375  -0.257776  -0.373669   0.487768   0.559525   0.188602
  -0.469840   0.546519   0.309430  -0.289389   0.439192  -0.328915
  -0.331389   0.201011  -0.306111   0.396987  -0.541307  -0.552684

As the output above shows, the singular values of $A$ (the diagonal entries of $\Sigma$) are placed in decreasing order along the diagonal of $\Sigma$. Compared to the first two singular values ($\Sigma_{11}$ and $\Sigma_{22}$), the other four ($\Sigma_{33}$, $\Sigma_{44}$, $\Sigma_{55}$ and $\Sigma_{66}$) are relatively small. Since they are so small and $A = U\Sigma V^T$, setting them to zero should still give $A \approx U\Sigma V^T$ for the modified $\Sigma$. Let's discard them and see what happens:

octave:3> Sigma(3,3) = Sigma(4,4) = Sigma(5,5) = Sigma(6,6) = 0
Sigma =

Diagonal Matrix

   4.67800         0         0         0         0         0
         0   0.89303         0         0         0         0
         0         0   0.00000         0         0         0
         0         0         0   0.00000         0         0
         0         0         0         0   0.00000         0
         0         0         0         0         0   0.00000

Discarding $\Sigma_{33}$ means the third column of $U$ and the third row of $V^T$ (i.e., the third column of $V$) have no effect on $U\Sigma V^T$, since they contain the only entries of $U$ and $V$ respectively which are multiplied by $\Sigma_{33}$. We can therefore discard these entries, along with the third row and the third column of $\Sigma$ altogether. Similarly, we can discard the fourth, fifth and sixth columns of $U$ and $V$, as well as the fourth, fifth and sixth rows and columns of $\Sigma$:

octave:4> U = U(1:6,1:2)
U =

  -0.41897  -0.44908
  -0.38683  -0.45668
  -0.36999   0.42922
  -0.45538  -0.25778
  -0.46984   0.54652
  -0.33139   0.20101

octave:5> V = V(1:6,1:2)
V =

  -0.41897  -0.44908
  -0.38683  -0.45669
  -0.36999   0.42922
  -0.45537  -0.25778
  -0.46984   0.54652
  -0.33139   0.20101

octave:6> Sigma = Sigma(1:2,1:2)
Sigma =

Diagonal Matrix

   4.67800         0
         0   0.89303

Now let's compute $U\Sigma V^T$:

octave:7> A_tilde = U*Sigma*V'
A_tilde =

   1.00125   0.94131   0.55302   0.99588   0.70167   0.56888
   0.94131   0.88626   0.49448   0.92918   0.62733   0.51770
   0.55302   0.49448   0.80491   0.68936   1.02269   0.65062
   0.99588   0.92918   0.68936   1.02940   0.87507   0.65967
   0.70168   0.62733   1.02269   0.87507   1.29940   0.82647
   0.56888   0.51770   0.65062   0.65967   0.82647   0.54982

The values of $\tilde{A}$ are very close to those of the original matrix $A$. Let's then compute the relative error for each entry of $\tilde{A}$ (below, the './' operator performs element-wise division of $(A - \tilde{A})$ by $A$):

octave:8> (A - A_tilde) ./ A
ans =

   2.4599e-02  -1.3907e-02  -6.4608e-03  -1.2934e-02   1.4859e-02  -1.8656e-02
  -1.3375e-02   4.4309e-03   3.1586e-03   9.1550e-03  -1.0756e-02   1.3384e-02
   1.5708e-02  -6.1061e-03  -1.4028e-03  -8.6418e-03   4.4995e-03  -7.4838e-03
  -1.5442e-02   1.1226e-02   3.3826e-03   4.7511e-03  -5.3225e-03   6.0837e-03
  -3.9746e-03  -6.3450e-04   2.4899e-04   3.5946e-03  -1.7525e-03   3.1091e-03
  -4.4631e-03   2.1875e-03   5.2814e-04   2.1558e-03  -1.3991e-03   2.1170e-03

As the numbers above show, the relative errors are small, so indeed $\tilde{A} \approx A$. However, while storing $A$ requires storing $N_{A} = 36$ elements, $\tilde{A}$ need not be stored as a $6 \times 6$ matrix. Instead, we can merely store $U$, $\Sigma$ and $V$ after the removal of the entries discarded above. This yields: $$ N_{\tilde{A}} = \textrm{size}(U) + \textrm{size}(V) + \textrm{size}(\Sigma) = 12 + 12 + 2 = 26 $$ so we can recover $\tilde{A}$ by storing only $N_{\tilde{A}} = 26$ instead of $N_A = 36$ elements. We have $\textrm{size}(\Sigma) = 2$ because $\Sigma$ is a diagonal matrix, so its off-diagonal entries need not be stored: we know they are zero. This represents a $28\%$ reduction in the required storage space. For large matrices, the method above can lead to even better compression ratios.
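The steps above generalize naturally to keeping the $k$ largest singular values of any $m \times n$ matrix. Below is a minimal Octave sketch of this idea (the function name svd_compress is my own, not a built-in):

# A: the m x n matrix to be compressed
# k: the number of singular values to keep (k <= min(m,n))
function [Uk, Sigmak, Vk] = svd_compress(A, k)

	[U, Sigma, V] = svd(A);

	# keep the k largest singular values and the corresponding
	# columns of U and V (svd places the singular values along
	# the diagonal of Sigma in decreasing order)
	Uk     = U(:, 1:k);
	Sigmak = Sigma(1:k, 1:k);
	Vk     = V(:, 1:k);

end

The approximation we computed by hand above is then obtained with [U,Sigma,V] = svd_compress(A,2) followed by A_tilde = U*Sigma*V'. In general, the truncated factors require storing $km + kn + k = k(m + n + 1)$ elements instead of $mn$; for our $6 \times 6$ example with $k = 2$, this gives exactly the $N_{\tilde{A}} = 26$ computed above.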

It is possible to use the technique above to compress images while still preserving most of their visual properties. If you want to know how, please visit this blog again in the future :-)

References

[1] James W. Demmel, Applied Numerical Linear Algebra, SIAM, 1st edition (1997).

The birthday paradox


Posted by Diego Assencio on 2013.12.16 under Mathematics (Statistics and probability)

If $n$ people are randomly selected from a very large population, what is the probability that at least two of them will have the same birthday?

This problem is a very interesting one and is commonly known as the birthday paradox (the reason for the "paradox" will become clear as we solve the problem). The solution is relatively simple if we assume that the probability $P$ that a randomly selected person is born on a given day $D$ of the year is the same for every day of the year: $P(D) = 1/365$ (in other words, if we assume a uniform distribution; we will ignore years with $366$ days).

More generally, one could ask: if $n$ elements are randomly selected from a set $S$ with $N$ elements (all elements of $S$ are distinct), with each selection being independent of all the others, what is the probability that at least one element is selected two or more times?

In the original question, a birthday is an element of the set $Y = \{1,2,\ldots,365\}$ which represents all days of the year ($Y$ contains $365$ elements). Selecting a person at random is equivalent to randomly selecting an element from $Y$. In what follows, the event of selecting a given element more than once will be called a collision.

Let's solve the problem in its general formulation (not necessarily thinking of birthdays). We first randomly select an element $E_1$ from a set which contains $N$ elements. The total number of possible $E_1$ values is $N$. Then we randomly select the second element $E_2$. The probability that this second element is distinct from the first one is: $$ P^{(2)} := P(E_2 \notin \{E_1\}) = \displaystyle\frac{N-1}{N} $$ since there are $N-1$ elements which are distinct from $E_1$ (above I wrote $E_2 \notin \{E_1\}$ instead of $E_2 \neq E_1$ for reasons which will become clear below). Then we randomly select the third element $E_3$. The probability that this element is distinct from the first two selected elements is: $$ P^{(3)} := P(E_3 \notin \{E_1,E_2\}) = \displaystyle\frac{N-2}{N} $$ since there are $N-2$ elements distinct from both $E_1$ and $E_2$ if $E_1 \neq E_2$. We keep on going this way until we select the $n$-th element. Assuming the $(n-1)$ already selected elements are all distinct from each other, the probability that the $n$-th element is distinct from the $(n-1)$ previously selected elements is: $$ P^{(n)} := P(E_n \notin \{E_1,E_2,\ldots,E_{n-1}\}) = \displaystyle\frac{N-(n-1)}{N} $$ since after the first $(n-1)$ elements are selected, there are still $N - (n-1)$ elements left which were not yet selected (assuming, as mentioned above, that the $(n-1)$ elements already selected are all distinct from each other).
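For instance, in the birthday setting ($N = 365$), these probabilities are $P^{(2)} = 364/365$, $P^{(3)} = 363/365$, and so on: each newly selected person must avoid the birthdays of all the previously selected ones.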

Since each random element selection is independent of all the others, the probability $P^*(n)$ of selecting $n$ distinct elements is the product of the probabilities $P^{(i)}$ computed above ($i = 2,\ldots,n$). To emphasize, $P^{(i)}$ is the probability of selecting an $i$-th element which is distinct from the $(i-1)$ previously selected elements, assuming those are all distinct from each other: $$ P^*(n) = P^{(2)} \times \ldots \times P^{(n)} $$ The probability $P(n)$ of at least two among the $n$ selected elements being equal is then: $$ \begin{eqnarray} P(n) &=& 1 - P^*(n) \nonumber\\[5pt] &=& 1 - P^{(2)} \times \ldots \times P^{(n)} \nonumber\\[5pt] &=& 1 - \displaystyle\left(\frac{N-1}{N}\right)\left(\frac{N-2}{N}\right) \ldots \left(\frac{N-(n-1)}{N}\right) \nonumber\\[5pt] &=& 1 - \prod_{i=1}^{n-1}\left(\frac{N-i}{N}\right) \nonumber\\[5pt] &=& 1 - \frac{\prod_{i=0}^{n-1}(N-i)}{N^n} \nonumber\\[5pt] &=& 1 - \frac{\prod_{i=0}^{N-1}(N-i)}{N^n\prod_{i=n}^{N-1}(N-i)} \label{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_deriv_pn} \end{eqnarray} $$ And therefore, noticing that the top and bottom products on the last line of equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_deriv_pn} are the definitions of $N!$ and $(N-n)!$ respectively, we get: $$ \boxed{ \displaystyle P(n) = 1 - \prod_{i=1}^{n-1}\left(\frac{N-i}{N}\right) = 1 - \frac{N!}{N^n(N - n)!} } \label{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_pn} $$ The product form from the fourth line of equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_deriv_pn} has been kept in the final result because it makes $P(n)$ very easy to compute on a computer. To clarify, computing $P(n)$ directly from the version involving factorials is not a trivial task, since $N!$ can be an immense number (it definitely is for $N = 365$) which most calculators/programs cannot handle due to their use of finite precision arithmetic. Roughly speaking, computers and calculators usually store numbers with a fixed number of precision digits and are therefore incapable of representing very large numbers. I will show below how one can use the product version of $P(n)$ (the middle expression in equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_pn}) to easily compute $P(n)$ with Octave (see the "bonus" section).

Consider again the original problem. How many people do we need to select for the probability of two or more of them having the same birthday to be larger than or equal to $50\%$? Perhaps $365/2 \approx 182$? Or $150$? The answer is, surprisingly, $23$: $$ P(23) = 1 - \frac{365!}{(365)^{23}(365 - 23)!} \approx 0.5073 $$ In other words, with only $n = 23$ randomly selected people we can already expect two or more of them to have the same birthday with more than $50\%$ probability! If the "paradox" in "birthday paradox" was not yet clear to you, it probably is now. Figure 1 shows the probability $P(n)$ of a collision (two or more people having the same birthday) for different values of $n$ and $N = 365$.

Fig. 1: Birthday paradox: collision probability $P(n)$ versus the number $n$ of elements randomly selected from a set $Y = \{1,2,\ldots,365\}$ representing the days of the year. $Q(n) = 1 - e^{-n(n-1)/(2N)}$ is a lower bound estimate for $P(n)$.

Also interestingly, we can expect a collision with $99\%$ probability if we select as few as $n = 57$ people: $$ P(57) \approx 0.9901 $$
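Both of these values are easy to verify with the product form of equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_pn}. In Octave, for instance (prompt numbering and the exact number of displayed digits may differ), you should see something like:

octave:1> N = 365; n = 23; 1 - prod((N - (1:n-1)) / N)
ans = 0.5073
octave:2> n = 57; 1 - prod((N - (1:n-1)) / N)
ans = 0.9901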

A lower bound estimate for $P(n)$

There is a simple lower bound estimate for the probability of a collision $P(n)$ when $n$ elements are randomly selected from a set of size $N$. From equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_pn} we get: $$ \displaystyle P(n) = 1 - \prod_{i=1}^{n-1}\left(\frac{N-i}{N}\right) = 1 - \prod_{i=1}^{n-1}\left(1 - \frac{i}{N}\right) \geq 1 - \prod_{i=1}^{n-1}e^{-i/N} $$ where above we used the fact that $1 - x \leq e^{-x}$ for all $x \geq 0$ (indeed, for $x = 0$ both sides are equal to $1$; the derivative of $(1-x)$ is $-1$ while the derivative of $e^{-x}$ is $-e^{-x} \geq -1$ for all $x \geq 0$, so $(1-x)$ decreases faster than $e^{-x}$ for $x \geq 0$). Then: $$ \displaystyle P(n) \geq 1 - e^{-\frac{1}{N}\sum_{i=1}^{n-1} i} $$ But since: $$ \sum_{i=1}^{n-1} i = \displaystyle\frac{n(n-1)}{2} $$ we finally obtain: $$ \boxed{ P(n) \geq Q(n) := 1 - e^{-n(n-1)/(2N)} } \label{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_approx_p} $$ You can compare how close $Q(n)$ is to $P(n)$ in figure 1. Equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_approx_p} is very interesting since it makes it easy to compute a value of $n$ for which we will have a collision with probability larger than or equal to $p$: $$ \begin{eqnarray} p = 1 - e^{-n(n-1)/(2N)} &\Longrightarrow& \displaystyle\frac{-n(n-1)}{2N} = \log(1 - p) \nonumber\\[5pt] &\Longrightarrow& \displaystyle\frac{n(n-1)}{2N} = \log\left(\displaystyle\frac{1}{1-p}\right) \nonumber\\[5pt] &\Longrightarrow& n^2 \geq 2N\log\left(\displaystyle\frac{1}{1-p}\right) \end{eqnarray} $$ Therefore: $$ \boxed{ n \geq \sqrt{2N\log\left(\displaystyle\frac{1}{1-p}\right)} } \label{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_n} $$ will yield a collision with probability larger than or equal to $p$. For $N = 365$ and $p = 0.5$, we get $n \approx 22.49$, so indeed for $n = 23$ we will have collisions with at least $50\%$ probability.
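As a quick numerical check of equations \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_approx_p} and \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_n} in Octave (again, prompt numbers and displayed digits may differ):

octave:1> N = 365; p = 0.5;
octave:2> sqrt(2*N*log(1/(1-p)))
ans = 22.494
octave:3> 1 - exp(-23*22/(2*N))
ans = 0.5000

Note that $Q(23) \approx 0.5000$ indeed lies slightly below the exact value $P(23) \approx 0.5073$, as a lower bound must.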

The birthday paradox gets even stranger if we take $N = 10^6$ (a million) and $p = 0.5$: for those parameters, $n \approx 1177.4$, so we will have a collision with more than $50\%$ probability with as few as $1178$ elements! In general, to produce a collision with $50\%$ probability for a given $N$, we can select a number of elements given by: $$ \boxed{ n \geq 1.2\sqrt{N} } \label{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_n_approx} $$ The equation above is obtained by setting $p = 0.5$ on equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_n} and rounding up the constant factor. For $N = 365$, equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_n_approx} yields $n \geq 22.92$ (no surprises here).
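Both numbers for $N = 10^6$ are one-liners in Octave (noting that $\log(1/(1-p)) = \log 2$ for $p = 0.5$):

octave:1> ceil(sqrt(2*1e6*log(2)))
ans = 1178
octave:2> ceil(1.2*sqrt(1e6))
ans = 1200

The rounded-up rule of thumb slightly overshoots, which is harmless: any larger $n$ still yields a collision with at least $50\%$ probability.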

Equation \eqref{post_c59e4c7dd11d1d61bf902136ee9cafcb_eq_n_approx} closes our discussion of the problem. To have a collision with more than $50\%$ probability when randomly selecting elements from a set of size $N$, we must merely select around $1.2\sqrt{N}$ elements. The paradox originates from the fact that for large values of $N$, $\sqrt{N}$ is significantly smaller than $N$ itself.

Bonus: using Octave to compute $P(n)$

On Ubuntu/Debian, you can install Octave by opening a terminal and running the following command:

sudo apt-get install octave

Now create a file called birthday.m with the following contents:

# n: number of randomly selected elements
# N: number of elements in the set from which elements are selected
function p = birthday(n,N)

	# we need to choose at least two elements
	if n < 2
		p = 0;
	else
		pc = 1.0;
		for i = 1:(n-1)
			pc = pc * (N-i)/N;
		end
		p = 1 - pc;
	end

end

Now start Octave:

octave

and compute $P(n)$ as in the example below (here I am assuming $n = 23$ and $N = 365$):

octave:1> birthday(23,365)
ans =  0.50730

How much can $i^i$ buy?


Posted by Diego Assencio on 2013.11.12 under Mathematics (Complex numbers)

Suppose you want to buy some candy at a store. You look in your pocket and find you have (in your local currency) an amount of money equal to $i^i$ ($i$ here really is the complex number $i$). You choose a snack and check its price: it costs $0.10$. Do you have enough money to buy it?

If $i^i$ looks absurd to you, you might be impressed by the fact that there is such a thing as complex exponentiation. It turns out that complex exponentiation is extremely important in mathematics, in the natural sciences and in engineering. Being a physicist myself, I have dealt with complex exponentiation countless times when learning topics such as waves, circuits, fluids, classical mechanics, quantum mechanics and many others.

But back to the question: how do we compute $i^i$? The answer starts with the definition of the exponential function for a pure imaginary number $i\theta$, where $\theta$ is a real number: $$ \boxed{ e^{i\theta} := \cos\theta + i\sin\theta } \label{post_827f3801cb1a0f2b336b0fc67f9e1abd_euler_eq} $$ Equation \eqref{post_827f3801cb1a0f2b336b0fc67f9e1abd_euler_eq} is called Euler's formula. If you are wondering what could motivate such a strange definition, consider the Taylor series for $e^x$, where $x$ is a real number: $$ e^x = 1 + \displaystyle\frac{x}{1!} + \displaystyle\frac{x^2}{2!} + \displaystyle\frac{x^3}{3!} + \displaystyle\frac{x^4}{4!} + \ldots = \sum_{n=0}^{\infty} \displaystyle\frac{x^n}{n!} \label{post_827f3801cb1a0f2b336b0fc67f9e1abd_exp_real_def} $$ What would happen if we lost our senses and decided to replace $x$ with $i\theta$ in equation \eqref{post_827f3801cb1a0f2b336b0fc67f9e1abd_exp_real_def}? Well, we would obtain: $$ \begin{eqnarray} e^{i\theta} & = & 1 + \displaystyle\frac{(i\theta)}{1!} + \displaystyle\frac{(i\theta)^2}{2!} + \displaystyle\frac{(i\theta)^3}{3!} + \displaystyle\frac{(i\theta)^4}{4!} + \displaystyle\frac{(i\theta)^5}{5!} + \ldots \nonumber\\[5pt] & = & 1 + i\theta - \displaystyle\frac{\theta^2}{2!} - i\displaystyle\frac{\theta^3}{3!} + \displaystyle\frac{\theta^4}{4!} + i\displaystyle\frac{\theta^5}{5!} + \ldots \nonumber\\[5pt] & = & \left(1 - \displaystyle\frac{\theta^2}{2!} + \displaystyle\frac{\theta^4}{4!} + \ldots\right) + i\left(\theta - \displaystyle\frac{\theta^3}{3!} + \displaystyle\frac{\theta^5}{5!} + \ldots\right) \nonumber\\[5pt] & = & \cos\theta + i\sin\theta \end{eqnarray} $$ where above we used the Taylor series for both $\cos\theta$ and $\sin\theta$. This is exactly what equation \eqref{post_827f3801cb1a0f2b336b0fc67f9e1abd_euler_eq} states!
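Sticking with Octave (used in earlier posts on this blog), you can sanity-check Euler's formula numerically: up to floating point rounding, $e^{i\pi/2}$ should equal $\cos(\pi/2) + i\sin(\pi/2) = i$. You should see something like:

octave:1> exp(i*pi/2)
ans = 6.1232e-17 + 1.0000e+00i

The tiny real part is just $\cos(\pi/2)$ computed in finite precision arithmetic.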

In general, since any complex number $z$ is such that $z = a + ib$ with both $a$ and $b$ being real numbers, then we can compute $e^{a+ib}$ through: $$ \boxed{ e^z = e^{a+ib} := e^ae^{ib} = e^a(\cos{b} + i\sin{b}) } $$

Now we can define the logarithm of a complex number. Since every nonzero complex number $z$ can be uniquely written in the form $z = |z|(\cos\theta + i\sin\theta)$ for some $\theta$ such that $0 \leq \theta \lt 2\pi$, we can write: $$ z = |z|(\cos\theta + i\sin\theta) = |z|e^{i\theta} = e^{\log|z|}e^{i\theta} = e^{\log|z| + i\theta} $$ This leads us to define the logarithm of a complex number $z \neq 0$ as: $$ \boxed{ \log{z} = \log(e^{\log|z| + i\theta}) := \log|z| + i\theta } $$ The value of $\log{z}$ is uniquely defined provided that we enforce $0 \leq \theta \lt 2\pi$. This restriction is important since in $z = |z|(\cos\theta + i\sin\theta)$ there are infinitely many possible values of $\theta$, as $$ \cos(\theta + 2\pi n) = \cos\theta,\quad \sin(\theta + 2\pi n) = \sin\theta $$ for any integer $n$. As an example, let's compute $\log i$. Since $$ i = 0 + i = \cos(\pi/2) + i\sin(\pi/2) $$ then: $$ \log i = \log|i| + i\pi/2 = \log 1 + i\pi/2 = i\pi/2 \label{post_827f3801cb1a0f2b336b0fc67f9e1abd_log_i} $$
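Octave's built-in log accepts complex arguments, so this result can also be checked numerically (note that Octave uses the principal branch with $-\pi \lt \theta \leq \pi$ rather than $0 \leq \theta \lt 2\pi$, but the two conventions agree for $z = i$):

octave:1> log(i)
ans = 0 + 1.5708i

and indeed $\pi/2 \approx 1.5708$.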

The last ingredient we need is how to compute $z^w$ for two complex numbers $z$ and $w$ with $z \neq 0$. This can be done according to the following definition: $$ \boxed{ z^w := e^{w\log z} } $$ which is motivated by the fact that for real numbers $x \gt 0$ and $y \neq 0$ we have $x^y = e^{\log x^y} = e^{y\log x}$.

If you have endured all of this, you are probably eager to know what the value of $i^i$ is. So let's compute it: $$ \boxed{ i^i = e^{i\log i} = e^{i(i\pi/2)} = e^{-\pi/2} \approx 0.2078 } $$ where equation \eqref{post_827f3801cb1a0f2b336b0fc67f9e1abd_log_i} was used. Interestingly, $i^i$ is actually a real number!
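As a final check, Octave evaluates complex powers through this same definition ($z^w = e^{w\log z}$, with the principal branch of the logarithm), so it can confirm the result directly:

octave:1> i^i
ans = 0.2079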

Well, there you have it. You can now go ahead and buy your candy. You deserve it! :-)
