# Posts on Computer science

## The set of languages over {0,1} is not countable

Posted by Diego Assencio on 2015.12.01 under Computer science (Automata theory)

In this post, we will prove that the set $\mathcal{L}_{\{0,1\}}$ of all languages over the set $\{0,1\}$ is not countable, i.e., we cannot enumerate the infinitely many languages in $\mathcal{L}_{\{0,1\}}$. A language over $\{0,1\}$ is a set of finite-length strings formed using only the symbols $0$ and $1$. For example, $\{0, 10, 01, 101\}$ and $\{1, 10, 100, 1000, \ldots\}$ are languages over $\{0,1\}$. Notice that while each string in a language must have finite length, the language itself may have infinitely many strings as illustrated in the second example just given.

Our proof will be based on the fact that a contradiction is obtained if $\mathcal{L}_{\{0,1\}}$ is countable. To start, notice that we can enumerate the strings in the set $W$ of all finite-length strings over $\{0,1\}$. To see that, let $W' = \{1s \mid s \in W\}$ be the set which is built from $W$ by adding a $1$ in front of each of its strings. We can interpret every string in $W'$ as the binary representation of some natural number. As an example, for a string $0110 \in W$, we have $10110 \in W'$, and $10110$ represents the natural number $22$ in the decimal base. The reason why we need to build $W'$ comes from the fact that distinct strings such as $0010$ and $10$ are both in $W$ but represent the same natural number in binary representation because they differ only by leading zeros; $W'$ does not have this issue and shows us how we can generate a one-to-one mapping from $W$ to $\mathbb{N}$: just add a $1$ in front of each string in $W$ and interpret the resulting strings as binary numbers (distinct strings in $W$ are mapped to distinct numbers in $\mathbb{N}$). Since $W$ has infinitely many strings and since a one-to-one mapping from $W$ to $\mathbb{N}$ exists, $W$ is countable. In other words, the strings in $W$ can be enumerated and we can therefore write $W = \{s_1, s_2, s_3, \ldots\}$, with $s_j$ being the $j$-th string over $\{0,1\}$.

Now assume that $\mathcal{L}_{\{0,1\}}$ is countable, i.e., that that $\mathcal{L}_{\{0,1\}} = \{L_1, L_2, L_3, \ldots\}$ with each $L_i$ being a language over $\{0,1\}$. Given that each $L_i$ is a set whose elements are strings from $W$, and since $W$ is countable, we can build a table whose row indices are language indices and whose column indices are string indices as follows: for each table cell with row index $i$ and column index $j$, write $1$ if the language $L_i$ contains the string $s_j$ or $0$ otherwise. This table completely specifies which strings $s_j \in W$ are contained in each language $L_i \in \mathcal{L}_{\{0,1\}}$. Below is an example of what such a table would look like:

$s_1$ $s_2$ $s_3$ $s_4$ $\ldots$
$L_1$ $1$ $0$ $1$ $1$ $\ldots$
$L_2$ $0$ $0$ $0$ $1$ $\ldots$
$L_3$ $1$ $1$ $0$ $0$ $\ldots$
$L_4$ $0$ $0$ $1$ $1$ $\ldots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\ddots$

Consider now the language built through the following procedure: flip the value of every diagonal cell on the table above, then collect all strings $s_j$ such that the diagonal cell on the column of $s_j$ has a $1$ after flipping; let $L_{\textrm{diag}}$ be the set of all such strings.

To clarify the way $L_{\textrm{diag}}$ is built, take a look at the table below which is built by flipping the diagonal entries of the table above:

$s_1$ $s_2$ $s_3$ $s_4$ $\ldots$
$L_1$ $0$ $0$ $1$ $1$ $\ldots$
$L_2$ $0$ $1$ $0$ $1$ $\ldots$
$L_3$ $1$ $1$ $1$ $0$ $\ldots$
$L_4$ $0$ $0$ $1$ $0$ $\ldots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\ddots$

From the procedure just described, $L = \{s_2, s_3, \ldots\}$, so $L$ does not contain $s_1$ and $s_4$ but contains $s_2$ and $s_3$.

$L$ is a language with a special property: it is different from every language $L_i \in \mathcal{L}_{\{0,1\}}$. Indeed, for every $L_i \in \mathcal{L}_{\{0,1\}}$, if $L_i$ contains $s_i$, $L$ does not, but if $L_i$ does not contain $s_i$, $L$ does. This implies $L \neq L_i$ for all $L_i \in \mathcal{L}_{\{0,1\}}$ and therefore $L \notin \mathcal{L}_{\{0,1\}}$. However, since $L$ is a set of strings which are in $W$, $L$ is a language over $\{0,1\}$ and therefore $L \in \mathcal{L}_{\{0,1\}}$, a contradiction. Hence $\mathcal{L}_{\{0,1\}}$ cannot be countable.

## Numerically stable computation of arithmetic means

Posted by Diego Assencio on 2015.07.26 under Computer science (Numerical methods)

Consider a set of $n$ values $x_i$ for $i = 1, 2, \ldots n$. The arithmetic mean $\mu_n$ of this collection of values is defined as: $$\mu_n = \displaystyle\frac{1}{n}\sum_{i=1}^n x_i \label{post_c34d06f4f4de2375658ed41f70177d59_mean}$$ The simplicity of equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} hides an important issue: computing the sum of all values $x_i$ numerically is not a good idea since accuracy errors which are inherent in floating point arithmetic might degrade the accuracy of the computed mean value. This is due to the fact that as we sum the values $x_i$, the partial sum can become very large, so adding another value $x_i$ to it might amount to adding a small number to a large number. Given the finite precision involved in each of these additions, each computed partial sum may become less and less accurate; if this is the case, the accuracy of the computed mean value will suffer as well.

Notice, however, that equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} can be written as below: $$\mu_n = \displaystyle\frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\left(x_n + \sum_{i=1}^{n-1} x_i\right) = \frac{1}{n}\left(x_n + (n-1)\mu_{n-1}\right)$$ where $\mu_{n-1}$ is the arithmetic mean of the first $(n-1)$ values $x_1, x_2, \ldots, x_{n-1}$: $$\mu_{n-1} = \displaystyle\frac{1}{n-1}\sum_{i=1}^{n-1} x_i$$ Therefore, we have that: $$\boxed{ \displaystyle\mu_n = \mu_{n-1} + \frac{1}{n}(x_n - \mu_{n-1}) } \label{post_c34d06f4f4de2375658ed41f70177d59_mean_stable}$$ Equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} gives us a recursive formula for computing $\mu_n$ from the values of $\mu_{n-1}$ and $x_n$. This means we will need to compute $\mu_{n-1}$ before computing $\mu_n$. This recursive approach requires us then to compute $\mu_{n-2}$ to compute $\mu_{n-1}$, and so on. Therefore, to compute $\mu_n$, we will need to compute all of $\mu_1, \mu_2, \ldots, \mu_{n-1}$ first. This technique is a bit more expensive than directly using \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} since we have to perform more arithmetic operations to compute $\mu_n$, but the overall time complexity is still $O(n)$.

Why is the recusive formula \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} better than the sum formula \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean}? The reason is simple: the recursive formula avoids doing arithmetic operations with large and small numbers. The only issue there is that the factor $1/n$ can make the second term too small compared to the first one if $n$ is very large, but in practice this is much less of a problem than the accuracy issues discussed above.

To exemplify, suppose we throw a dice with six faces $n$ times and compute the mean value of the face which falls upwards (the dice needs not be fair). Assume that each face $k = 1, 2, \ldots, 6$ falls $n_k$ times upwards. For this particular example, the exact mean value of the top face can be computed directly: $$\mu^e_n = \displaystyle\frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{n}\sum_{k=1}^6 k n_k \label{post_c34d06f4f4de2375658ed41f70177d59_dice_mean}$$ where $x_i$ is the result of the $i$-th throw. Denoting the mean values computed using equations \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} and \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} as $\mu_n^s$ and $\mu_n^r$ (for "sum" and "recursive") respectively, we can then see which one is more accurate by comparing their values with the exact value $\mu^e_n$. In what follows, we will use single-precision floating-point numbers to make the effects of finite precision arithmetic more visible, but the results shown below are also true for double-precision numbers even though larger values of $n$ may be necessary for the effects to become significant. Table 1 shows some simulation results obtained for different sets of values $(n_1, n_2, \ldots, n_6)$.

$n\;(\times 10^6)$ $(n_1, n_2, n_3, n_4, n_5, n_6)\;(\times 10^6)$ $\mu_n^s$ $\mu_n^r$ $\mu_n^e$
$6$$(1,1,1,1,1,1)$$3.4856$$3.4997$$3.5000$
$15$$(4,2,1,4,1,3)$$3.2263$$3.2376$$3.3333$
$18$$(3,3,3,3,3,3)$$3.4037$$3.5002$$3.5000$
$20$$(20,0,0,0,0,0)$$0.8389$$1.0000$$1.0000$
 Table 1: Mean values $\mu_n^s$ and $\mu_n^r$ computed using equations \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean} and \eqref{post_c34d06f4f4de2375658ed41f70177d59_mean_stable} respectively, and exact mean values $\mu_n^e$ computed using equation \eqref{post_c34d06f4f4de2375658ed41f70177d59_dice_mean}. All values of $n$ and $n_k$ for $k = 1, 2, \ldots, 6$ are shown divided by $10^6$. Notice how the values of $\mu_n^r$ are significantly better than those of $\mu_n^s$.

For completeness, here is the Python (version 3) script used for computing $\mu_n^s$, $\mu_n^r$ and $\mu_n^e$:

import random
import numpy

# dice face values
face = [1, 2, 3, 4, 5, 6]

# number of times each dice face falls upwards
n_face = [4000000, 2000000, 1000000, 4000000, 1000000, 3000000]

# simulate n dice throws (with counts for each face given by n_face)
values = []
for i in range(0, 6):
values += [face[i]] * n_face[i]
random.seed(0)
random.shuffle(values)

# mean computed using the sum formula
mean = numpy.sum(values, dtype=numpy.float32) / numpy.float32(len(values))
print("mu^s: %.4f" % mean)

# mean computed using the recursive formula
mean = numpy.float32(0.0)
n = 1
for x in values:
mean += (numpy.float32(x) - mean) / numpy.float32(n)
n += 1
print("mu^r: %.4f" % mean)

# exact mean value (up to float32 precision)
mean = numpy.float32(numpy.dot(face, n_face) / len(values))
print("mu^e: %.4f" % mean)


## Denoising data without using fitting functions

Posted by Diego Assencio on 2015.07.19 under Computer science (Numerical methods)

Suppose you are given a sequence of $n$ data points $x_i$ for $i = 1, 2, \ldots, n$ which are corrupted by noise. Suppose also that you do not have any knowledge regarding what kind of shape the original data had before noise was added to it, i.e., that you do not know which type of function (e.g. sine, linear, etc.) would model the original data well. This post describes a technique which can be used to denoise data in this type of situation. An implementation of the algorithm presented below can be found here.

Our goal here is to determine $n$ new data points $\tilde{x}_i$ which achieve the two goals below:

 1 each value $\tilde{x}_i$ should not be too far from the original value $x_i$ it approximates 2 the sequence of points $\tilde{x}_i$ should be smooth

One way to satisfy the first goal is by not allowing the Euclidean distance between the two sequences to be too large. In other words, we would like to keep $$\|{\bf \tilde{x}} - {\bf x}\|^2_2 = \sum_{i=1}^n(\tilde{x}_i - x_i)^2 \label{post_df0bdbd936cfac191141770bf91a6b6e_term1}$$ small, where ${\bf x}$ and ${\bf\tilde{x}}$ are vectors which hold the sequence of values $x_i$ and $\tilde{x}_i$ respectively for $i = 1, 2, \ldots, n$. However, if we force this term to be too small, we will end up forcing each $\tilde{x}_i$ to be too close to the value $x_i$ it approximates, but this means we are doing nothing but generating the original noisy sequence. As an extreme case, we can take ${\bf \tilde{x}} = {\bf x}$, in which case $\|{\bf \tilde{x}} - {\bf x}\|_2$ will be identically zero but nothing useful will have been achieved.

Now let us focus on the second goal. Noisy data is often characterized by large differences between adjacent values, i.e., $x_i$ and $x_{i+1}$ can be very different. When the data is smooth, $x_i \approx x_{i+1}$, so ideally our generated denoised sequence $\tilde{x}_i$ will be such that $$\sum_{i=1}^{n-1}(\tilde{x}_{i+1} - \tilde{x}_i)^2 \label{post_df0bdbd936cfac191141770bf91a6b6e_term2}$$ is also not too large. Since the values $x_i$ are usually the results of measurements which are conducted at regular time intervals (e.g. every $\Delta{t}$ seconds), we can interpret the minimization of the term above as an attempt to make the approximate derivative values of the sequence $\tilde{x}_i$ be small: $$\sum_{i=1}^{n-1}(\tilde{x}_{i+1} - \tilde{x}_i)^2 = (\Delta{t})^2\sum_{i=1}^{n-1} \left(\frac{\tilde{x}_{i+1} - \tilde{x}_i}{\Delta{t}}\right)^2$$ Notice that a perfect minimization of this sum is achieved when all $\tilde{x}_i$ are equal, i.e., $\tilde{x}_1 = \tilde{x}_2 = \ldots = \tilde{x}_n = c$, but a constant sequence is of little value to us unless the original sequence itself is approximately constant.

Collecting the facts above, we see that our desired goals are in direct conflict with each other, so our best approach to the problem is to compromise on both sides and provide a solution which achieves each goal only partially but well enough for our purposes. One good strategy is to minimize the sum of both terms in \eqref{post_df0bdbd936cfac191141770bf91a6b6e_term1} and \eqref{post_df0bdbd936cfac191141770bf91a6b6e_term2} simultaneously, i.e., minimize the following cost function: $$J_\mu({\bf x}, {\bf\tilde{x}}) = \sum_{i=1}^n(\tilde{x}_i - x_i)^2 + \mu\sum_{i=1}^{n-1}(\tilde{x}_{i+1} - \tilde{x}_i)^2 \label{post_df0bdbd936cfac191141770bf91a6b6e_cost_func}$$ The factor $\mu$ allows us to tune how much we want to denoise the original data. If $\mu = 0$, we will only minimize the first term and generate the original sequence $\tilde{x}_i = x_i$. As $\mu$ becomes larger, the second term becomes more important and, in the limit $\mu \rightarrow \infty$, minimizing $J_\mu({\bf x}, {\bf\tilde{x}})$ becomes equivalent to minimizing this second term; as we have seen above, the generated sequence becomes then a constant sequence $\tilde{x}_i = c$ for $i = 1,2,\ldots,n$.

Since we now have reduced our problem to finding the vector ${\bf\tilde{x}}$ which minimizes the value of $J_\mu({\bf x}, {\bf\tilde{x}})$ ($\mu$ is a constant here), we must now find a way to solve this minimization problem. Luckily, as we will see, this is nothing more than a linear least squares problem in disguise. Indeed, if we define: $${\bf b} = \left( \begin{matrix} {\bf x} \\ {\bf 0}_{n-1} \end{matrix} \right) \label{post_df0bdbd936cfac191141770bf91a6b6e_b}$$ where ${\bf 0}_{n-1}$ is the $(n-1)$-dimensional zero vector and also $$D_{n-1} = \left(\begin{matrix} -1 & 1 & 0 & 0 & \ldots & 0 \\ 0 & -1 & 1 & 0 & \ldots & 0 \\ 0 & 0 & -1 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & -1 & 1 \end{matrix}\right) \label{post_df0bdbd936cfac191141770bf91a6b6e_D}$$ where $D_{n-1}$ is an $(n-1) \times n$ matrix which can be interpreted as a "differentiation matrix", then we can rewrite equation \eqref{post_df0bdbd936cfac191141770bf91a6b6e_cost_func} as below: $$J_\mu({\bf x}, {\bf\tilde{x}}) = \|A{\bf\tilde{x}} - {\bf b}\|^2_2$$ where: $$A = \left(\begin{matrix} I_{n\times n} \\ \sqrt{\mu}D_{n-1}\end{matrix}\right) \label{post_df0bdbd936cfac191141770bf91a6b6e_A}$$ is a $(2n-1) \times n$ matrix, with $I_{n \times n}$ being the $n$-dimensional identity matrix. Indeed, from equations \eqref{post_df0bdbd936cfac191141770bf91a6b6e_b}, \eqref{post_df0bdbd936cfac191141770bf91a6b6e_D} and \eqref{post_df0bdbd936cfac191141770bf91a6b6e_A}, we have that: $$A{\bf\tilde{x}} - {\bf b} = \left(\begin{matrix} I_{n\times n} \bf\tilde{x} - {\bf x}\\ \sqrt{\mu}D_{n-1}\bf\tilde{x}\end{matrix}\right) = \left(\begin{matrix} \bf\tilde{x} - {\bf x}\\ \sqrt{\mu}D_{n-1}\bf\tilde{x}\end{matrix}\right) \label{post_df0bdbd936cfac191141770bf91a6b6e_Ax_tilde_b}$$ and the square of the Euclidean norm of the vector in equation \eqref{post_df0bdbd936cfac191141770bf91a6b6e_Ax_tilde_b} is equal to $J_\mu({\bf x}, {\bf\tilde{x}})$. At this stage, all we need to do is use any of the readily-available linear least squares solving algorithms to determine ${\bf\tilde{x}}$ and we are done.

Although somewhat outside the scope of this post, it is worth mentioning that finding the vector ${\bf\tilde{x}}$ which minimizes $J_\mu({\bf x}, {\bf\tilde{x}})$ is equivalent to solving the following linear system (these are called the normal equations for this linear least squares problem): $$(A^T A){\bf\tilde{x}} = A^T{\bf b} \label{post_df0bdbd936cfac191141770bf91a6b6e_linsys}$$ This linear system has a unique solution since $A^T A$ is invertible. Indeed, from the definition of $A$ in equation \eqref{post_df0bdbd936cfac191141770bf91a6b6e_A}, we must have $A{\bf y} \neq {\bf 0}$ unless ${\bf y} = {\bf 0}$, so if $A^T A{\bf y} = {\bf 0}$, we must as well have: $${\bf y}^TA^T A{\bf y} = (A{\bf y})^T(A{\bf y}) = \|A{\bf y}\|^2_2 = {\bf 0} \Longrightarrow A{\bf y} = {\bf 0}$$ and therefore ${\bf y} = {\bf 0}$, so $A^T A{\bf y} = {\bf 0}$ if and only if ${\bf y} = {\bf 0}$ (i.e., $A^TA$ is invertible).

Now we can summarize our method for producing a denoised version ${\bf\tilde x}$ of a noisy sequence ${\bf x}$:

 1 compute ${\bf b}$ and $A$ as given by equations \eqref{post_df0bdbd936cfac191141770bf91a6b6e_b} and \eqref{post_df0bdbd936cfac191141770bf91a6b6e_A} respectively 2 solve the linear system given on equation \eqref{post_df0bdbd936cfac191141770bf91a6b6e_linsys}; its solution ${\bf\tilde{x}}$ is the denoised version of the original sequence ${\bf x}$

If the final solution is either too smooth or not smooth enough, try using a different values of $\mu$: increasing $\mu$ will make the generated sequence ${\bf\tilde{x}}$ smoother, while smaller values of $\mu$ will enforce less smoothing, i.e., more confidence is put on the precision of the original data ${\bf x}$.

Figure 1 shows a typical result for the algorithm described above. Here, Gaussian noise is added to data representing a sine wave to simulate a noisy data sequence. The algorithm we developed is then used to produce a denoised version of the noisy sequence.

 Fig. 1: Gaussian noise with mean zero and standard deviation $\sigma = 0.1$ is added to a pure sine wave. The algorithm described above is then used to (partially) filter out this added noise component.

### Extra notes on the differentiation term

As mentioned above, $D_{n-1}{\bf x}$ represents a derivative which is computed at each point $x_i$ for $i = 2, 3, \ldots, n$. This is a first order accurate backward difference approximation to the derivatives computed. We can also use different formulas to approximate the derivatives. For instance, if we use the second order accurate central difference formula, we will get the following cost function: $$J^c_\mu({\bf x}, {\bf\tilde{x}}) = \sum_{i=1}^n(\tilde{x}_i - x_i)^2 + \mu\sum_{i=2}^{n-1}(\tilde{x}_{i+1} - \tilde{x}_{i-1})^2 \label{post_df0bdbd936cfac191141770bf91a6b6e_cost_func2}$$ The differentiation matrix associated with $J^c_\mu({\bf x}, {\bf\tilde{x}})$ is now an $(n-2)\times n$ matrix: $$D^c_{n-2} = \left(\begin{matrix} -1 & 0 & 1 & 0 & \ldots & \ldots & 0 \\ 0 & -1 & 0 & 1 & \ldots & \ldots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & -1 & 0 & 1 \end{matrix}\right) \label{post_df0bdbd936cfac191141770bf91a6b6e_Dc}$$ Figure 2 shows the results obtained for the same data sequence from figure 1 and the same $\mu$ value.

 Fig. 2: Results obtained when the cost function given in equation \eqref{post_df0bdbd936cfac191141770bf91a6b6e_cost_func2} is used.

The results look rather odd. Where is this oscillation on the denoised sequence coming from? Well, consider first the original matrix $D_{n-1}$. If $D_{n-1}{\bf z} = {\bf 0}$ for some $n$-dimensional vector ${\bf z}$, then by the definition of $D_{n-1}$ we must have $z_2 - z_1 = 0$, $\ldots$, $z_n - z_{n-1} = 0$, i.e., $z_1 = z_2 = \ldots = z_n$, so the only vector in the null space of $D_{n-1}$ is the constant vector (as should be the case for a good derivative operator). But $D^c_{n-2}$ suffers from a bad disease: $D^c_{n-2}{\bf z} = {\bf 0}$ implies only that $z_1 = z_3$, $z_2 = z_4$, $z_3 = z_5$, $z_4 = z_6$ and so on, meaning as long as all the odd and even components of ${\bf z}$ are set to two distinct constants $c_{\textrm{odd}}$ and $c_{\textrm{even}}$ respectively, the resulting vector will be in the null space of $D^c_{n-2}$. But this means a highly oscillatory vector such as ${\bf z} = (1,-1,1,-1,\ldots)$ is in the null space of $D^c_{n-2}$ and therefore such a vector produces no error on the second term of equation \eqref{post_df0bdbd936cfac191141770bf91a6b6e_cost_func2}. This explains why the resulting "denoised sequence" is plagued by a clearly visible oscillatory noise which is actually noticeable even for large values of $\mu$.

Instead of trying to generate a smooth sequence by limiting the values which the first derivative of the denoised sequence can have, we can instead try limiting the values of its second derivative. Using the second order accurate central difference approximation to the second derivative, we get the following cost function: $$J^{c,2}_\mu({\bf x}, {\bf\tilde{x}}) = \sum_{i=1}^n(\tilde{x}_i - x_i)^2 + \mu\sum_{i=2}^{n-1}(\tilde{x}_{i+1} - 2\tilde{x}_i + \tilde{x}_{i-1})^2 \label{post_df0bdbd936cfac191141770bf91a6b6e_cost_func3}$$ The resulting differentiation matrix has dimensions $(n-2)\times n$ and has the same form as the discrete Laplacian operator in one dimension: $$L_{n-2} = \left(\begin{matrix} 1 & -2 & 1 & 0 & \ldots & \ldots & 0 \\ 0 & 1 & -2 & 1 & \ldots & \ldots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & 1 & -2 & 1 \end{matrix}\right) \label{post_df0bdbd936cfac191141770bf91a6b6e_Lc}$$ Figure 3 shows the results obtained for the same data sequence from figure 1 and the same $\mu$ value. They are clearly better than the results show in figure 2 because the null space of $L_{n-2}$ does not contain highly oscillatory vectors (but this fact will not be proved here). Notice that since in this case we are not trying to limit variations in the denoised data (i.e., the distance between $\tilde{x}_i$ and $\tilde{x}_{i+1}$), but variations on the first derivative values instead, the resulting sequence ${\bf\tilde{x}}$ tends to be more linear than the one from figure 1.

 Fig. 3: Results obtained when the cost function given in equation \eqref{post_df0bdbd936cfac191141770bf91a6b6e_cost_func3} is used.