## The set of languages over {0,1} is not countable

Posted by Diego Assencio on 2015.12.01 under Computer science (Automata theory)

In this post, we will prove that the set $\mathcal{L}_{\{0,1\}}$ of all languages over the set $\{0,1\}$ is not countable, i.e., we cannot enumerate the infinitely many languages in $\mathcal{L}_{\{0,1\}}$. A language over $\{0,1\}$ is a set of finite-length strings formed using only the symbols $0$ and $1$. For example, $\{0, 10, 01, 101\}$ and $\{1, 10, 100, 1000, \ldots\}$ are languages over $\{0,1\}$. Notice that while each string in a language must have finite length, the language itself may have infinitely many strings as illustrated in the second example just given.

Our proof will be based on the fact that a contradiction is obtained if $\mathcal{L}_{\{0,1\}}$ is countable. To start, notice that we can enumerate the strings in the set $W$ of all finite-length strings over $\{0,1\}$. To see that, let $W' = \{1s \mid s \in W\}$ be the set which is built from $W$ by adding a $1$ in front of each of its strings. We can interpret every string in $W'$ as the binary representation of some natural number. As an example, for a string $0110 \in W$, we have $10110 \in W'$, and $10110$ represents the natural number $22$ in the decimal base. The reason why we need to build $W'$ comes from the fact that distinct strings such as $0010$ and $10$ are both in $W$ but represent the same natural number in binary representation because they differ only by leading zeros; $W'$ does not have this issue and shows us how we can generate a one-to-one mapping from $W$ to $\mathbb{N}$: just add a $1$ in front of each string in $W$ and interpret the resulting strings as binary numbers (distinct strings in $W$ are mapped to distinct numbers in $\mathbb{N}$). Since $W$ has infinitely many strings and since a one-to-one mapping from $W$ to $\mathbb{N}$ exists, $W$ is countable. In other words, the strings in $W$ can be enumerated and we can therefore write $W = \{s_1, s_2, s_3, \ldots\}$, with $s_j$ being the $j$-th string over $\{0,1\}$.

Now assume that $\mathcal{L}_{\{0,1\}}$ is countable, i.e., that that $\mathcal{L}_{\{0,1\}} = \{L_1, L_2, L_3, \ldots\}$ with each $L_i$ being a language over $\{0,1\}$. Given that each $L_i$ is a set whose elements are strings from $W$, and since $W$ is countable, we can build a table whose row indices are language indices and whose column indices are string indices as follows: for each table cell with row index $i$ and column index $j$, write $1$ if the language $L_i$ contains the string $s_j$ or $0$ otherwise. This table completely specifies which strings $s_j \in W$ are contained in each language $L_i \in \mathcal{L}_{\{0,1\}}$. Below is an example of what such a table would look like:

$s_1$ $s_2$ $s_3$ $s_4$ $\ldots$
$L_1$ $1$ $0$ $1$ $1$ $\ldots$
$L_2$ $0$ $0$ $0$ $1$ $\ldots$
$L_3$ $1$ $1$ $0$ $0$ $\ldots$
$L_4$ $0$ $0$ $1$ $1$ $\ldots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\ddots$

Consider now the language built through the following procedure: flip the value of every diagonal cell on the table above, then collect all strings $s_j$ such that the diagonal cell on the column of $s_j$ has a $1$ after flipping; let $L_{\textrm{diag}}$ be the set of all such strings.

To clarify the way $L_{\textrm{diag}}$ is built, take a look at the table below which is built by flipping the diagonal entries of the table above:

$s_1$ $s_2$ $s_3$ $s_4$ $\ldots$
$L_1$ $0$ $0$ $1$ $1$ $\ldots$
$L_2$ $0$ $1$ $0$ $1$ $\ldots$
$L_3$ $1$ $1$ $1$ $0$ $\ldots$
$L_4$ $0$ $0$ $1$ $0$ $\ldots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\ddots$

From the procedure just described, $L = \{s_2, s_3, \ldots\}$, so $L$ does not contain $s_1$ and $s_4$ but contains $s_2$ and $s_3$.

$L$ is a language with a special property: it is different from every language $L_i \in \mathcal{L}_{\{0,1\}}$. Indeed, for every $L_i \in \mathcal{L}_{\{0,1\}}$, if $L_i$ contains $s_i$, $L$ does not, but if $L_i$ does not contain $s_i$, $L$ does. This implies $L \neq L_i$ for all $L_i \in \mathcal{L}_{\{0,1\}}$ and therefore $L \notin \mathcal{L}_{\{0,1\}}$. However, since $L$ is a set of strings which are in $W$, $L$ is a language over $\{0,1\}$ and therefore $L \in \mathcal{L}_{\{0,1\}}$, a contradiction. Hence $\mathcal{L}_{\{0,1\}}$ cannot be countable.