# Sudoku Matroids and Graph Colouring Verification

As this is my first post, I feel obliged to say the obligatory Hello World!. With that aside, the problem I am going to discuss comes from the popular game Sudoku. Suppose that we are given a filled Sudoku $S$ (which we cannot see), and we wish to verify its correctness. To do so, we are granted access to an oracle which can tell us if $S$ is consistent on any row, column or block.  Consistent simply means that each number from 1 to 9 appears exactly once.

Question 1.  How many checks are necessary to verify that $S$ is correct?

Certainly, 27 checks is sufficient, but can we do better? Here’s a short proof that we can in fact do better.

Proposition 2.  Every Sudoku can be verified with at most 21 checks.

Proof.  It will be convenient to fix some notation. Let

$$K:= \{r_1, \dots, r_9\} \cup \{c_1, \dots, c_9\} \cup \{b_1, \dots, b_9\}$$

be the set of rows, columns and blocks of a Sudoku. By convention, the blocks are labelled as you (an English reader) would read the words along a page. Now suppose that a Sudoku $S$ is consistent on $b_1, b_2, b_3, r_1$ and $r_2$. We claim that this implies $S$ is also consistent on $r_3$.

Suppose not. Then some number, say 1, appears at least twice in $r_3$. However, since $S$ is consistent on $r_1$ and $r_2$, 1 appears exactly once in each of $r_1$ and $r_2$. Thus, 1 appears at least 4 times in $b_1 \cup b_2 \cup b_3$.  By the Pigeonhole Principle, 1 appears at least twice in $b_1, b_2$ or $b_3$, which is a contradiction. Thus, to verify $S$ it suffices to check $K \setminus (\{r_3, r_6, r_9\} \cup \{c_3, c_6, c_9\})$. $\square$

Can we do better than 21 checks? At first this seems rather tricky.  One can use information theory considerations to prove lower bounds.  For example, clearly at least 9 checks are necessary, since otherwise there will be a cell $x$ for which the row, column and box containing $x$ are all unchecked.  However, it seems difficult to get to a lower bound of 21 in this way. See this MathOverflow question for someone trying to carry this out.

At this point though, our hero Matroid Theory comes to the rescue. That is, somewhat remarkably, there is a matroid structure underlying this problem, which I will now describe. Define a subset $V$ of $K$ to be a verifier if every Sudoku which is consistent on $V$ is consistent everywhere.

Theorem 3.  The set of minimal (under inclusion) verifiers $\mathcal{V}$ is the set of bases of a matroid on $K$.

There is a nice proof of this fact by Emil Jeřábek as an answer to the same MathOverflow question above.  It turns out that this matroid is actually representable (over $\mathbb{Q}$) and that the proof is also valid for the more general $n \times n$ versions of Sudoku. We will not say anything more about the proof due to space limitations.  However, using the fact that $M=(K, \mathcal{V})$ is a matroid, it is now easy to show that 21 is in fact the answer to Question 1.

Proposition 4.  The minimum number of checks needed to verify the correctness of a Sudoku is 21.

Proof.  Translated into matroid theory language, the original question is simply asking what the rank of $M$ is. We have already shown that $B:=K \setminus (\{r_3, r_6, r_9\} \cup \{c_3, c_6, c_9\})$ is a verifier, so it suffices to show that it is a minimal verifier. Certainly, we cannot remove a block from $B$ and remain verifying, since there would then be a completely unchecked cell. On the other hand, for any two rows (or columns) $a$ and $b$ in the same band, it is easy to construct a Sudoku which is consistent everywhere except $a$ and $b$. Thus, we also cannot remove a row or column from $B$ and remain verifying. $\square$

Of course, there is nothing special about the Sudoku graph, and we can attempt to play this game on an arbitrary graph. Let $G$ be a graph and suppose that we wish to verify the correctness of a (not necessarily proper) $\chi(G)$-colouring $C$ of $G$. Again we are given access to an oracle which can tell us whether $C$ is consistent on any maximal clique of $G$. As before, we can define the set family $(K(G), \mathcal{V}(G))$, where $K(G)$ is the set of maximal cliques of $G$ and $\mathcal{V}(G)$ is the family of minimal verifiers.

We now arrive at our main problem.

Problem 5.  Characterize the graphs $G$ for which $(K(G), \mathcal{V}(G))$ is a matroid.

We now show that there are many such graphs.

Proposition 6.  For all bipartite graphs $G$, $(K(G), \mathcal{V}(G))$ is a matroid.

Proof.  We may assume that $G$ is connected. Since $G$ is bipartite, its set of maximal cliques is simply its set of edges. Let $T \subseteq E(G)$ be a minimal verifier. Evidently, $T$ must span all the vertices of $G$. On the other hand, if a 2-colouring of $G$ is correct on a spanning tree, then it is correct everywhere. Thus, $T$ is a spanning tree, and so $(K(G), \mathcal{V}(G))$ is actually the graphic matroid of $G$. $\square$

It would be nice if there were a unified proof that worked for both the bipartite case and the Sudoku graphs. Such a proof would have to utilize some common structure of these two classes, since it is not true that for all graphs $G$, $(K(G), \mathcal{V}(G))$ is a matroid.

Proposition 7.  If $G=K_{2,2,2}$, then $(K(G), \mathcal{V}(G))$ is not a matroid.

Proof.  It will be convenient to regard $K_{2,2,2}$ as the octahedron $O$. Let $1234$ be the middle 4-cycle of $O$ and $t$ and $b$ be the top and bottom vertices.  Now, it is easy to see that $B_1:=\{t12, t34, b23\}$ and $B_2:=\{t23, t14, b34\}$ are minimal verifiers. On the other hand, one easily checks that basis exchange fails for $B_1$ and $B_2$. $\square$

There are also many interesting questions that one can ask about the framework we have set up for graph colouring verification.  For example, we can define the verification number of a graph $G$ to be the minimum size of a verifier (this is the the same thing as the size of a minimal verifier if $(K(G), \mathcal{V}(G))$ is a matroid).

Question 8.  Among all $n$-vertex graphs, which ones have the largest verification number?

I was not able to find much in the literature, so any comments or references would be appreciated. Thanks for reading and see you in $3 + \epsilon$ months!

Acknowledgements.  Parts of this post are based on contributions from François Brunault, Emil Jeřábek and Zack Wolske on MathOverflow and from Ross Kang, Rohan Kapadia, Peter Nelson and Irene Pivotto at Winberie’s in Princeton.