In this post, I will show how solving a
Sudoku puzzle
is equivalent to solving an
integer linear programming
(ILP) problem. This equivalence allows us to solve a Sudoku puzzle using any of
the many freely available ILP solvers; an implementation of a solver (in Python 3)
which follows the formulation described in this post can be found
found here.
A Sudoku puzzle is an $N \times N$ grid divided in blocks of size $m \times n$, i.e.,
each block contains $m$ rows and $n$ columns, with $N = mn$ since the number of
cells in a block is the same as the number of rows/columns on the puzzle.
The most commonly known version of Sudoku is a $9 \times 9$ grid
(i.e., $N = 9$) with $3 \times 3$ blocks (i.e., $m = n = 3$).
Initially, a cell can be either empty or contain an integer value
in the interval $[1,N]$; non-empty cells are fixed and cannot be modified
as the puzzle is solved. The rules for solving the puzzle are:
1. | each integer value $k \in [1,N]$ must appear exactly once in each row |
2. | each integer value $k \in [1,N]$ must appear exactly once in each column |
3. | each integer value $k \in [1,N]$ must appear exactly once in each block. |
Each rule above individually implies that every cell of the puzzle will have a
number assigned to it when the puzzle is solved, and that each row/column/block
of a solved the puzzle will represent some permutation of the sequence $\{1, 2, \ldots, N\}$.
The version of Sudoku outlined by these rules is the standard one. There are many variants
of Sudoku, but I hope that after reading this post, you will realize that the
these variants can also be expressed as ILP problems using the same ideas
presented here.
The rules above can be directly expressed as constraints of an ILP problem. Our
formulation will be such that the constraints will enforce everything needed to
determine a valid solution to the puzzle, and the objective function will therefore
be irrelevant since any point which satisfies the constraints will represent a
solution to the problem (notice, however, that some Sudoku puzzles may contain
more than one
solution, but I am assuming the reader will be content with finding a single one of those).
Therefore, our objective function will be simply ${\bf 0}^T{\bf x} = 0$,
where ${\bf 0}$ is a vector with all elements set to $0$ and ${\bf x}$ is a vector
representing all variables used in the ILP formulation below.
The puzzle grid can be represented as an $N \times N$ matrix, and each grid
cell can be naturally assigned a pair of indices $(i,j)$, with $i$ and $j$ representing the cell
row and column respectively (see figure 1).
The top-left grid cell has $(i,j) = (1,1)$,
and the bottom-right one has $(i,j) = (N,N)$, with $i$ increasing downwards and
$j$ increasing towards the right.
Fig. 1: |
A Sudoku puzzle with blocks of size
$m \times n = 2 \times 3$. The cell indices $(i,j)$ are
shown inside every cell, and the block indices $(I,J)$ are shown on the
left and top sides of the grid respectively. Both the height (number of rows)
and width (number of columns) of the puzzle are equal to $N = m n = 6$.
The puzzle has $n = 3$ blocks along the vertical direction and $m = 2$ blocks
along the horizontal direction.
|
Let us then define $N^3$ variables as follows: $x_{ijk}$ is an integer variable
which is restricted to be either $0$ or $1$, with $1$ meaning the
value at cell $(i,j)$ is equal to $k$, and $0$ meaning the value at
cell $(i,j)$ is not $k$. Rule (1) above, i.e.,
the requirement that each $k \in [1,N]$ must appear exactly once per row, can be
expressed as:
$$
\sum_{j=1}^N x_{ijk} = 1 \quad \textrm{ for } \quad i,k = 1,2,\ldots,N
\label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_row_constraint}
$$
In other words, for a fixed row $i$ and a fixed $k \in [1,N]$, only a single
$x_{ijk}$ will be $1$ on that row for $j = 1, 2, \ldots, N$.
If the fact that the constraints above do not have any "$\leq$" is bothering
you, remind yourself of the fact that $x = a$ can be expressed as $a \leq x \leq a$,
which in turn is equivalent to the combination of $-x \leq -a$ and $x \leq a$, i.e.,
any equality constraint can be expressed as a pair of "less-than-or-equal-to" constraints
like the ones we need in linear programming problems.
Rule (2), i.e., the requirement that each $k \in [1,N]$ must appear
exactly once per column, can be expressed as:
$$
\sum_{i=1}^N x_{ijk} = 1 \quad \textrm{ for } \quad j,k = 1,2,\ldots,N
\label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_column_constraint}
$$
Expressing rule (3), i.e., the requirement that each $k \in [1,N]$ must appear
exactly once per block, is a bit more complicated. A way to simplify this task
is by assigning pairs of indices $(I, J)$ to each block in a similar way as
we did for cells (see figure 1):
$(I,J) = (1,1)$ represents the top-left block, and
$(I, J) = (n,m)$ represents the bottom-right block, with $I$ increasing
downwards and ranging from $1$ to $n$ (there are $n$ blocks along the vertical
direction) and $J$ increasing towards the right and ranging from $1$ to $m$
(there are $m$ blocks along the horizontal direction). Block $(I,J)$ will
therefore contain cells with row indices $i = (I-1)m + 1, \ldots, Im$ and
column indices $j = (J-1)n + 1, \ldots, Jn$.
Therefore, rule (3) can be expressed as:
$$
\sum_{i=(I-1)m + 1}^{Im} \sum_{j=(J-1)n + 1}^{Jn} x_{ijk} = 1
\quad \textrm{ for }\quad
\left\{
\begin{matrix}
\; I = 1,2,\ldots,n \\[5pt]
\; J = 1,2,\ldots,m \\[5pt]
\; k = 1,2,\ldots,N
\end{matrix}
\right.
\label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_block_constraint}
$$
Notice that both equations \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_row_constraint} and \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_column_constraint}
represent $N^2$ constraints each. As it turns out, equation \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_block_constraint}
contains $nmN = N^2$ constraints as well.
So far, our formulation does not prevent $x_{ijk}$ from being equal to $1$
for two or more distinct values $k$ at the same cell $(i,j)$. We need therefore
to impose these constraints by hand:
$$
\sum_{k=1}^N x_{ijk} = 1 \quad \textrm{ for } \quad i,j = 1,2,\ldots,N
\label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_cell_constraint}
$$
Not all cells are initially empty on a Sudoku puzzle. Some cells will already
contain values at the beginning, and those values are necessary so that the solution to the puzzle
can be deduced logically (ideally, there should be a single valid solution).
Let $\mathcal{C}$ be the set of tuples $(i,j,k)$ representing
the fact that a cell $(i,j)$ contains the value $k$ at the beginning. We then have:
$$
x_{ijk} = 1 \quad \textrm{ for } (i,j,k) \in \mathcal{C}
\label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_initial_puzzle_constraint}
$$
Our last set of constraints limits the values which each variable
$x_{ijk}$ can take: it's either $0$ or $1$ (our ILP formulation then technically
defines a binary integer linear programming problem, or BILP for short):
$$
0 \leq x_{ijk} \leq 1 \quad \textrm{ for } \quad i,j,k = 1,2,\ldots,N
\label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_binary_constraint}
$$
Since most ILP solvers allow bounds on the values which each $x_{ijk}$ can take
to be set directly, this last set of constraints often does not need to be specified
in the same manner as the previous ones.
We now have a complete ILP formulation of a Sudoku puzzle: our goal is
to minimize the objective function $f(x_{111}, \ldots, x_{ijk}, \ldots x_{NNN}) = 0$
subject to all constraints specified on equations \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_row_constraint}, \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_column_constraint},
\eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_block_constraint}, \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_cell_constraint},
\eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_initial_puzzle_constraint} and \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_binary_constraint}.
After solving the ILP problem outlined above, the solution to the Sudoku puzzle
can be constructed directly by placing, at each cell $(i,j)$, the value $k$ such
that $x_{ijk} = 1$.