In this post, I will show how solving a Sudoku puzzle is equivalent to solving an integer linear programming (ILP) problem. This equivalence allows us to solve a Sudoku puzzle using any of the many freely available ILP solvers; an implementation of a solver (in Python 3) which follows the formulation described in this post can be found found here.

A Sudoku puzzle is an $N \times N$ grid divided in blocks of size $m \times n$, i.e., each block contains $m$ rows and $n$ columns, with $N = mn$ since the number of cells in a block is the same as the number of rows/columns on the puzzle. The most commonly known version of Sudoku is a $9 \times 9$ grid (i.e., $N = 9$) with $3 \times 3$ blocks (i.e., $m = n = 3$). Initially, a cell can be either empty or contain an integer value in the interval $[1,N]$; non-empty cells are fixed and cannot be modified as the puzzle is solved. The rules for solving the puzzle are:

1. | each integer value $k \in [1,N]$ must appear exactly once in each row |

2. | each integer value $k \in [1,N]$ must appear exactly once in each column |

3. | each integer value $k \in [1,N]$ must appear exactly once in each block. |

Each rule above individually implies that every cell of the puzzle will have a number assigned to it when the puzzle is solved, and that each row/column/block of a solved the puzzle will represent some permutation of the sequence $\{1, 2, \ldots, N\}$.

The version of Sudoku outlined by these rules is the standard one. There are many variants of Sudoku, but I hope that after reading this post, you will realize that the these variants can also be expressed as ILP problems using the same ideas presented here.

The rules above can be directly expressed as constraints of an ILP problem. Our formulation will be such that the constraints will enforce everything needed to determine a valid solution to the puzzle, and the objective function will therefore be irrelevant since any point which satisfies the constraints will represent a solution to the problem (notice, however, that some Sudoku puzzles may contain more than one solution, but I am assuming the reader will be content with finding a single one of those). Therefore, our objective function will be simply ${\bf 0}^T{\bf x} = 0$, where ${\bf 0}$ is a vector with all elements set to $0$ and ${\bf x}$ is a vector representing all variables used in the ILP formulation below.

The puzzle grid can be represented as an $N \times N$ matrix, and each grid cell can be naturally assigned a pair of indices $(i,j)$, with $i$ and $j$ representing the cell row and column respectively (see figure 1). The top-left grid cell has $(i,j) = (1,1)$, and the bottom-right one has $(i,j) = (N,N)$, with $i$ increasing downwards and $j$ increasing towards the right.

Fig. 1: | A Sudoku puzzle with blocks of size $m \times n = 2 \times 3$. The cell indices $(i,j)$ are shown inside every cell, and the block indices $(I,J)$ are shown on the left and top sides of the grid respectively. Both the height (number of rows) and width (number of columns) of the puzzle are equal to $N = m n = 6$. The puzzle has $n = 3$ blocks along the vertical direction and $m = 2$ blocks along the horizontal direction. |

Let us then define $N^3$ variables as follows: $x_{ijk}$ is an integer variable which is restricted to be either $0$ or $1$, with $1$ meaning the value at cell $(i,j)$ is equal to $k$, and $0$ meaning the value at cell $(i,j)$ is not $k$. Rule (1) above, i.e., the requirement that each $k \in [1,N]$ must appear exactly once per row, can be expressed as: $$ \sum_{j=1}^N x_{ijk} = 1 \quad \textrm{ for } \quad i,k = 1,2,\ldots,N \label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_row_constraint} $$ In other words, for a fixed row $i$ and a fixed $k \in [1,N]$, only a single $x_{ijk}$ will be $1$ on that row for $j = 1, 2, \ldots, N$.

If the fact that the constraints above do not have any "$\leq$" is bothering you, remind yourself of the fact that $x = a$ can be expressed as $a \leq x \leq a$, which in turn is equivalent to the combination of $-x \leq -a$ and $x \leq a$, i.e., any equality constraint can be expressed as a pair of "less-than-or-equal-to" constraints like the ones we need in linear programming problems.

Rule (2), i.e., the requirement that each $k \in [1,N]$ must appear exactly once per column, can be expressed as: $$ \sum_{i=1}^N x_{ijk} = 1 \quad \textrm{ for } \quad j,k = 1,2,\ldots,N \label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_column_constraint} $$ Expressing rule (3), i.e., the requirement that each $k \in [1,N]$ must appear exactly once per block, is a bit more complicated. A way to simplify this task is by assigning pairs of indices $(I, J)$ to each block in a similar way as we did for cells (see figure 1): $(I,J) = (1,1)$ represents the top-left block, and $(I, J) = (n,m)$ represents the bottom-right block, with $I$ increasing downwards and ranging from $1$ to $n$ (there are $n$ blocks along the vertical direction) and $J$ increasing towards the right and ranging from $1$ to $m$ (there are $m$ blocks along the horizontal direction). Block $(I,J)$ will therefore contain cells with row indices $i = (I-1)m + 1, \ldots, Im$ and column indices $j = (J-1)n + 1, \ldots, Jn$. Therefore, rule (3) can be expressed as: $$ \sum_{i=(I-1)m + 1}^{Im} \sum_{j=(J-1)n + 1}^{Jn} x_{ijk} = 1 \quad \textrm{ for }\quad \left\{ \begin{matrix} \; I = 1,2,\ldots,n \\[5pt] \; J = 1,2,\ldots,m \\[5pt] \; k = 1,2,\ldots,N \end{matrix} \right. \label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_block_constraint} $$ Notice that both equations \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_row_constraint} and \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_column_constraint} represent $N^2$ constraints each. As it turns out, equation \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_block_constraint} contains $nmN = N^2$ constraints as well.

So far, our formulation does not prevent $x_{ijk}$ from being equal to $1$ for two or more distinct values $k$ at the same cell $(i,j)$. We need therefore to impose these constraints by hand: $$ \sum_{k=1}^N x_{ijk} = 1 \quad \textrm{ for } \quad i,j = 1,2,\ldots,N \label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_cell_constraint} $$ Not all cells are initially empty on a Sudoku puzzle. Some cells will already contain values at the beginning, and those values are necessary so that the solution to the puzzle can be deduced logically (ideally, there should be a single valid solution). Let $\mathcal{C}$ be the set of tuples $(i,j,k)$ representing the fact that a cell $(i,j)$ contains the value $k$ at the beginning. We then have: $$ x_{ijk} = 1 \quad \textrm{ for } (i,j,k) \in \mathcal{C} \label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_initial_puzzle_constraint} $$ Our last set of constraints limits the values which each variable $x_{ijk}$ can take: it's either $0$ or $1$ (our ILP formulation then technically defines a binary integer linear programming problem, or BILP for short): $$ 0 \leq x_{ijk} \leq 1 \quad \textrm{ for } \quad i,j,k = 1,2,\ldots,N \label{post_25ea1e49ca59de51b4ef6885dcc3ee3b_binary_constraint} $$ Since most ILP solvers allow bounds on the values which each $x_{ijk}$ can take to be set directly, this last set of constraints often does not need to be specified in the same manner as the previous ones.

We now have a complete ILP formulation of a Sudoku puzzle: our goal is to minimize the objective function $f(x_{111}, \ldots, x_{ijk}, \ldots x_{NNN}) = 0$ subject to all constraints specified on equations \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_row_constraint}, \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_column_constraint}, \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_block_constraint}, \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_cell_constraint}, \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_initial_puzzle_constraint} and \eqref{post_25ea1e49ca59de51b4ef6885dcc3ee3b_binary_constraint}.

After solving the ILP problem outlined above, the solution to the Sudoku puzzle can be constructed directly by placing, at each cell $(i,j)$, the value $k$ such that $x_{ijk} = 1$.

## Comments

Also, LPs don't have to be formulated as inequality problems. Any decent solver will handle equalities (and two-sided inequalities) efficiently.