The geometry of cocircuits and circuits. A tombeau for Henry Crapo 1933 — 2019

A guest post by Joseph Kung.


The road less traveled

Henry Howland Crapo died on September 3, 2019, at the age of 86. Henry’s mathematical work has had significant influence in at least three areas: his early results were landmarks in matroid theory, he was one of the creators of structural topology, and he was one of the first to apply exterior or Cayley algebra methods to matroid theory, computational geometry and automatic theorem proving.

Henry’s life did not follow the usual path of an academic mathematician in the second half of the twentieth century. After a short more-or-less conventional academic career, ending as professor at the University of Waterloo, he chose to resign and pursue a peripatetic career in France, taking inappropriately junior positions. This was probably not because he needed a job, as he had private means, but to gain residency in France. He achieved this, eventually retiring to La Vacquerie (Hérault), a small village in the mountains near Montpellier. His beautifully restored house, on the Grand’Rue, was the venue for several private conferences — the Moutons Matheux — first in structural topology and then in matroid theory. Henry was a contrarian, not only in mathematics, but in life. Henry’s politics were to the left (in the French sense), and they were never dogmatic, but infused with a deep humanity. He was fond of conspiracy theories, but as intellectual exercises, so that it was possible to discuss issues like the Kennedy assassination with him, reasonably and humorously. He was constantly exploring the possibilities and improbabilities of the human mind, individual and collective; for example, among the guests and performers at a musical evening at a Moutons Matheux, one found a violist da gamba playing J.S. Bach and Tobias Hume, and a Dervish musician singing traditional melodies, accompanying himself on the Saz.

On a more personal note, my introduction to matroids was through Henry’s book with Rota “On the Foundations of Combinatorial Theory: Combinatorial Geometries”, the flimsy preliminary edition which, to be honest, taught me a minimum about matroids in the technical sense, but a maximum about their deep structures. I also “lifted” much of the chapter on strong maps in the book “Theory of Matroids” from his Bowdoin lecture notes. I should also apologize to Henry for not using the ineffable effable effanineffable name “combinatorial geometry” instead of “matroid”, but the latter name has stuck, and there is no doing anything about it.

I should also mention two expository or survey papers Henry wrote. The first is “Structural topology, or the fine art of rediscovery” (MR1488863) and the second is “Ten abandoned gold mines” (MR1854472). These papers are well worth reading.

The geometry of circuits

I had intended to write in some detail about Henry’s work, but cold reflection soon indicated that this is a task which will almost certainly exceed my own mortality. One way to remember Henry is to write about one of his idées fixes, the problem of defining a matroid on the set of circuits or the set of cocircuits of a matroid. The program of finding (all) the dependencies among the circuits of a matroid was stated by Gian-Carlo Rota, in the 1960s, as a part of his vision that matroid theory is an “arithmetical” or discrete counterpart of classical invariant theory. Thus, this program asks an analogue of the algebraic geometry question of “finding the syzygies among the syzygies”. This is an easy task when the matroid comes with a representation over a field, but, I believe, impossible in general. Henry’s favorite approach was to build a free exterior algebra with “formal” coordinates from the matroid. One can then perform algebraic or symbolic calculations with formal coordinates as if the matroid were represented in a vector space. The ultimate version of this is the Whitney algebra, which Henry developed in joint work with Bill Schmitt. See the paper “The Whitney algebra of a matroid” (MR1779781). However, I think that there are intrinsic difficulties in this approach because the combinatorial interpretation of a single algebraic identity is sufficiently ambiguous that, when it has to be repeated, as it must be in the course of an algebraic derivation, the interpretation usually becomes meaningless. However, there are exceptions — the Rota basis conjecture, for example — which show that this approach could be fruitful. Henry wrote an introductory article “The geometry of circuits” (unpublished) about this approach.

The geometry of cocircuits

In the remainder of this note, I will describe what seems to me to be practical in Henry’s ideas. I begin by describing a way of building a matroid of cocircuits from a given representation of a matroid. Before starting, I should say that all of this is elementary linear algebra. Some would say, and they would be right, that it is a special case of Tutte’s theory of chain groups.

Recall that a cocircuit is the complement of a copoint. (A copoint is also known as a hyperplane.) Let $M^{(0)}$ be a matroid of rank $r$ with a specific representation as a multiset $S$ of vectors in $\mathbb{K}^r$, where $\mathbb{K}$ is a field. For each cocircuit $D$, the copoint $S \backslash D$ spans a unique (linear) hyperplane in $\mathbb{K}^r$ and hence, each cocircuit $D$ defines a linear functional $\alpha_D$, unique up to a non-zero multiplicative factor, whose value on a vector $x$ we denote by
\[
\langle \alpha_D | x \rangle.
\]
For each vector $x$ in $S$, $\langle \alpha_D | x \rangle \neq 0$ if and only if $x$ is in the cocircuit $D$. It is easy to check that the cocircuit matrix, with columns indexed by $S$ and rows by the set $\mathcal{D}$ of cocircuits, with the $D,x$-entry equal to $\langle \alpha_D | x \rangle$, is a representation of $M^{(0)}$. Transposing the matrix, one obtains a matroid $M^{(1)}$ on the set $\mathcal{D}$ of cocircuits. This matroid is the cocircuit matroid of $M^{(0)}$. It has the same rank $r$ as $M^{(0)}$. Among the cocircuits of $M^{(1)}$ are the sets $E_a$, where $a$ is a vector in $S$ and
\[
E_a = \{D: a \in D\},
\]
the set of all cocircuits of $M^{(0)}$ containing the vector $a$. If $M^{(0)}$ is simple, the map $a \mapsto E_a$ embeds $S$ injectively into the set of cocircuits of $M^{(1)}$, and hence into the ground set of $M^{(2)}$.

Deleting rows from the transpose of the cocircuit matrix so that there remains a matrix having $r$ rows and rank $r$, one obtains a representation of $M^{(1)}$ on which one can repeat the construction. Iterating, one constructs a sequence — I will call it the Crapo sequence —
\[
M^{(0)}, M^{(1)}, M^{(2)}, M^{(3)}, \,\,\ldots
\]
with $M^{(i)}$ embedded naturally in $M^{(i+2)},$ so that there are two non-decreasing sequences of matroids, all having the same rank $r$ as the original matroid $M^{(0)}$, one starting with the matroid and the other starting with the cocircuit matroid.

It is not easy to describe the Crapo sequence given $M^{(0)}$, but an example might clarify the situation. Let $M^{(0)} = M(K_4)$, the cycle matroid of the complete graph on $4$ vertices, or as Henry would have preferred, the complete quadrilateral, with the specific representation $e_1,e_2,e_3,e_1-e_2, e_1-e_3,e_2 - e_3$. The matroid $M(K_4)$ has $7$ copoints (four $3$-point lines and three $2$-point lines), so if one takes the representation over $\mathrm{GF}(2),$ then $M^{(1)}$ is the Fano plane $F_7$ and the Crapo sequence stabilizes, with $M^{(i)}= F_7$ for $i \geq 1$. On the other hand, if one takes the representation over $\mathrm{GF}(3)$, then $M^{(1)}$ is the non-Fano configuration $F_7^-$. The non-Fano configuration has $9$ copoints ($6$ are $3$-point lines and $3$ are $2$-point lines) with $4$ points on (exactly) $3$ lines, and $3$ points on $4$ lines. From this, one sees that $M^{(2)}$ has nine points, three $4$-point lines, four $3$-point lines, and perhaps more lines. One concludes that $M^{(2)}$ is the rank-$3$ Dowling matroid $Q_3$ on the group $\{+1,-1\}$. The matroid $Q_3$ has six additional $2$-point lines, making a total of thirteen lines. Thus, $M^{(3)}$ is the ternary projective plane $\mathrm{PG}(2,3)$. The Crapo sequence now stabilizes and $M^{(i)} = \mathrm{PG}(2,3)$ forever after. Over $\mathrm{GF}(5)$ and larger fields, the situation is much more complicated and I do not know simple descriptions, although as will be shown shortly, the Crapo sequence of $M(K_4)$ eventually stabilizes over any finite field. I might note that since $M(K_n)$ has $2^{n-1}-1$ copoints, its Crapo sequence stabilizes at $i=1$ over $\mathrm{GF}(2)$.
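The construction above is easy to carry out by machine. Here is a minimal sketch in plain Python (no matroid package; the helper names are mine). It uses the shortcut that, because $S$ spans $\mathbb{K}^r$, the functionals $\alpha_D$ themselves give a representation of $M^{(1)}$, so the row-deletion step is not needed; over $\mathrm{GF}(3)$ it reproduces the point counts of the example just described.

```python
from itertools import product

def rank_mod_p(vectors, p):
    """Rank of a list of vectors over GF(p), p prime, by Gaussian elimination."""
    rows = [list(v) for v in vectors]
    r, ncols = 0, (len(rows[0]) if rows else 0)
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] % p), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][c], p - 2, p)               # inverse mod p
        rows[r] = [(inv * e) % p for e in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] % p:
                t = rows[i][c]
                rows[i] = [(e - t * u) % p for e, u in zip(rows[i], rows[r])]
        r += 1
    return r

def cocircuit_matroid(S, p):
    """One functional alpha_D per cocircuit D of the matroid represented by the
    spanning multiset S in GF(p)^r.  Because S spans, these functionals are
    themselves a representation of the cocircuit matroid M^(1)."""
    r = len(S[0])
    seen, functionals = set(), []
    for alpha in product(range(p), repeat=r):
        if not any(alpha):
            continue
        D = frozenset(i for i, x in enumerate(S)
                      if sum(u * v for u, v in zip(alpha, x)) % p)
        complement = [S[i] for i in range(len(S)) if i not in D]
        # D is a cocircuit exactly when the complementary flat has rank r - 1
        if D not in seen and rank_mod_p(complement, p) == r - 1:
            seen.add(D)
            functionals.append(alpha)
    return functionals

# M(K_4) represented by e1, e2, e3, e1-e2, e1-e3, e2-e3 over GF(3)
S0 = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 2, 0), (1, 0, 2), (0, 1, 2)]
M1 = cocircuit_matroid(S0, 3)
M2 = cocircuit_matroid(M1, 3)
M3 = cocircuit_matroid(M2, 3)
print(len(S0), len(M1), len(M2), len(M3))  # 6 7 9 13: non-Fano, Q_3, then PG(2,3)
```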

I should make two remarks at this point. The first is that the construction of $M^{(1)}$ from $M^{(0)}$ may change if one chooses an inequivalent representation. An example is $U_{6,3}$ and its two inequivalent representations. The second is that it is possible to define a looser structure, that of $2$- or line-closure, on the set of cocircuits; the closed sets in this structure are the linear subclasses of Tutte. Being the closed sets of a closure, the linear subclasses form a lattice, with the points (that is, the elements covering the minimum, in this case the empty set) being the one-element linear subclasses consisting of one copoint. Henry discussed this theory in his papers “Single-element extensions of matroids” (MR0190045) and “Erecting geometries” (MR0272655, 0277407).

It is almost true that if one takes a matroid with a specific representation over a finite field $\mathrm{GF}(q)$, then the Crapo sequence $M^{(i)}$ eventually stabilizes at $\mathrm{PG}(r-1,q^{\prime})$, where $\mathrm{GF}(q^{\prime})$ is a subfield of $\mathrm{GF}(q)$. It is clear that if the sequence reaches a projective geometry, then the sequence stabilizes. For a sequence to stabilize at $M^{(i)}$, the number of points and the number of copoints must be the same, and a theorem (discovered many times) says that $M^{(i)}$ must be modular, and hence, by a theorem of G. Birkhoff, a direct sum of projective geometries. Thus, the assertion is true except when $M^{(0)} = U_{r,r}$. In particular, with one exception, if one starts with a representation of $M^{(0)}$ over a finite field $\mathbb{K}$ and $r \geq 3,$ then from the projective geometry in the Crapo sequence, one can recover a subfield of $\mathbb{K}$. When one takes a representation over a field of characteristic $0$ (and $M^{(0)} \neq U_{r,r}$), then the Crapo sequence cannot stabilize at a finite index $i$, but one should be able to take an infinite limit to get a geometric structure in which points are in bijection with copoints.

The matroid of circuits of a matroid $M$ with a representation as a multiset of vectors can be defined as the cocircuit matroid of the dual $M^{\perp}$ with an orthogonal dual of the given representation. If the original matroid has size $n$ and rank $r$, the circuit matroid has rank $n-r$ and iterating this construction usually gives a sequence which blows up. See the papers of Longyear (MR566870) and Oxley and Wang (MR4014616). There are specific problems about circuit matroids which are interesting. For example, one can ask whether the matroid of circuits of a transversal matroid is a gammoid. (The orthogonal dual of a transversal matroid is a gammoid, and the transpose of a circuit matrix gives a representation of the dual, so the answer is probably affirmative.) Or, one can ask for a construction (which has to be combinatorial) of a circuit matroid of a paving matroid (if one exists).

Adjoints

The adjoint is an attempt at constructing the cocircuit matroid without using a representation. I will work with geometric lattices rather than matroids. The opposite $P^{\mathrm{opp}}$ of a partial order $P$ is the partial order obtained by reversing the order relation. An adjoint $\mathrm{Adj}(L)$ of a geometric lattice $L$ is a geometric lattice of the same rank as $L$ such that there is a one-to-one order-preserving map from $L^{\mathrm{opp}}$ into $\mathrm{Adj}(L)$ sending the points of $L^{\mathrm{opp}}$ (which are the copoints of $L$) to points of $\mathrm{Adj}(L)$.

Alan Cheung (MR0373976) showed that the lattice of flats of the Vámos matroid (or the “twisted cube”) does not have an adjoint. Alan proved this using the lattice of linear subclasses. The Vámos matroid is the smallest rank-$4$ paving matroid which is not representable over any field, as it is a relaxation of the configuration given by the bundle theorem in projective geometry. It might be worthwhile to look at bigger rank-$4$ paving matroids: do the non-representable ones also not have adjoints? The Vámos matroid is self-dual. A natural question is whether the existence of an adjoint for the lattice of flats of a matroid implies the existence of an adjoint for the lattice of flats of its dual.

The question now arises of what happens in rank $3$. In rank $3$ or the plane, if two lines do not intersect at a point, then it is always possible to add that point of intersection. Thus adjoints exist at rank $3$. When one iterates taking adjoints, the Crapo sequence does not necessarily stabilize at a finite projective plane, but it is possible to construct the infinite limit. This limit is the free closure of D.R. Hughes; see his book “Projective Planes”, written with Piper (MR0333959). It might be of interest to extend the Hughes construction for rank $4.$

I end with matroids which can be said to be hiding in plain sight. Henry was the first to look at what are now called paving matroids. He and Rota called them Hartmanis partitions, which is correct, but hardly “catchy”. (Incidentally, Hartmanis’ papers (MR0098358, 0086045) are well worth studying.) In his work with Blackburn and Higgs (MR0419270) on cataloging all simple matroids having at most $8$ elements, a remarkably far-sighted effort for its time, it was observed that paving matroids seem to “predominate”, an observation which has now been formalized as a conjecture. However, while much work has been done on asymptotic predominance, the matroids themselves have hardly been studied. One way to study paving matroids in detail (suggested by Cheung’s work) is to look at the lattice of linear subclasses, which should answer questions such as representability or the existence of adjoints.


Signed difference analysis

I’d like to discuss an application of oriented matroids to psychology that I’ve been working on with John Dunn. John is a mathematical psychologist who, along with Ralph James, started this ball rolling — I got an email from him out of the blue one day beginning “I hope you don’t think I’m a crackpot, but I think I have an application of oriented matroids”.

The type of psychological model we consider has three types of variables. Two of them are things one can measure in the lab: independent variables represent factors one can change to affect a subject’s responses, and dependent variables represent the responses. Between these, typically impossible to measure, are latent variables — theoretical constructs that can be viewed as depending on the independent variables and determining the dependent variable. For instance, a memory experiment might ask people to memorize words under different conditions (the conditions are independent variables), and then give them a test whose score is the dependent variable. The theoretical construct would describe mental processes involved in memory (the latent variables), which might be changed by changes in the independent variables, and would describe how values of the latent variables determine values of the dependent variables. We assume all latent variables and all dependent variables to be numerical.

If a theory were to propose that the functional relationship between latent and dependent variables has a particular algebraic form, then we could easily test the theory. For instance, if the relationship between the vector $\mathbf{x}$ of latent variables and $\mathbf{y}$ of dependent variables were theorized to be $\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{b}$ for some specified matrix $\mathbf{A}$ and vector $\mathbf{b}$, then the theory predicts that each $\mathbf{y}$ will lie in the affine space ${\mathrm{col}}(\mathbf{A})+\mathbf{b}$. If we expanded the theory to leave $\mathbf{b}$ unspecified, then we could still test this theory: it predicts that a set $\{\mathbf{y}_1, \ldots, \mathbf{y}_N\}$ of $N$ measurements of the vector of dependent variables should satisfy $\mathbf{y}_i-\mathbf{y}_j\in{\mathrm{col}}(\mathbf{A})$ for all $i$ and $j$.

But, if you think it seems odd to propose an algebraic relationship between numbers directly describing mental processes and numbers arising as test scores, then you are not alone.

However, in some situations a reasonable theory proposes that the vector of dependent variables $\mathbf{y}$ is related to the vector of latent variables $\mathbf{x}$ by a function \[\mathbf{y}=f(\mathbf{A}\mathbf{x})\]
where $\mathbf{A}$ is a specified matrix and $f$ is a function that the model does not specify but proposes to be componentwise monotonic. That is, $f$ is assumed to have the form $f(z_1, \ldots,z_m)=(f_1(z_1),\ldots, f_m(z_m))$, where either each $f_i$ is a strictly monotonically increasing function or each $f_i$ is a strictly monotonically decreasing function. I’ll give some examples of such theories shortly, and I’ll also expand this idea to a broader range of theories. First I’ll discuss how this form of hypothesis leads us to oriented matroids.

Signed difference analysis

For a vector $\mathbf{x}\in\mathbb{R}^m$, let $\mathrm{sign}(\mathbf{x})\in\{0,+,-\}^m$ be the componentwise sign. For a linear subspace $V$ of $\mathbb{R}^m$, let $\mathcal{V}^*(V)=\{\mathrm{sign}(\mathbf{v}): \mathbf{v}\in V\}$. Thus ${\mathcal{V}}^*(V)$ is the signed covector set of an oriented matroid realized by $V$.

Proposition 1([3])
Let $g$ be a function with codomain ${\mathbb{R}^n}$, and let $f:{\mathbb{R}^n}\to{\mathbb{R}^n}$ be a componentwise monotonic function. Then for any $\mathbf{x}$ and $\tilde{\mathbf{x}}$ in the domain of $g$,
\[\mathrm{sign}(f(g(\mathbf{x}))-f(g(\tilde{\mathbf{x}})))=\pm\mathrm{sign}(g(\mathbf{x})-g(\tilde{\mathbf{x}})).\]

This is really more of an observation than a proposition — the proof is immediate. But it tells us that for models of the form I described earlier, oriented matroids provide an appropriate test for the model.

Corollary 2 Let $\mathbf{A}$ be an $m\times n$ matrix over $\mathbb{R}$, and $f:{\mathbb{R}^m}\to{\mathbb{R}^m}$ a componentwise monotonic function. Then for any $\mathbf{x},\tilde{\mathbf{x}}\in{\mathbb{R}^n}$,
\[\mathrm{sign}(f(\mathbf{A}\mathbf{x})-f(\mathbf{A}\tilde{\mathbf{x}}))\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A})).\]
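A quick numerical illustration of Corollary 2 may help; this is a toy sketch in which the matrix, the points, and the particular monotone $f$ are all made up for the purpose.

```python
import numpy as np

sign = lambda v: tuple(int(np.sign(t)) for t in v)

# a hypothetical 4 x 3 matrix and two latent-variable vectors
A  = np.array([[1., 0., 2.], [0., 1., -1.], [1., 1., 0.], [-1., 2., 1.]])
x  = np.array([0.3, -1.2, 0.5])
xt = np.array([1.0, 0.4, -0.2])

# any componentwise strictly increasing f will do; this one mixes several shapes
f = lambda z: np.array([np.tanh(z[0]), z[1]**3, np.exp(z[2]), np.arctan(z[3])])

print(sign(A @ x - A @ xt))        # a covector of col(A), by definition
print(sign(f(A @ x) - f(A @ xt)))  # equal to it, as Corollary 2 predicts
```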

As a very simple example, a theory might propose a single latent variable $x$ such that one’s score at each of several tests is a monotonically increasing function of $x$. Thus one proposes that the vector $\mathbf{y}$ of test scores satisfies
\[\mathbf{y}=f\left(\begin{pmatrix}
1\\1\\
\vdots\\1
\end{pmatrix}x\right)\]
for some componentwise increasing $f$. Our theory and Corollary 2 say that
if $\mathbf{y}$ is the vector of test scores for Maria and $\tilde{\mathbf{y}}$ is the vector of test scores for Fred, then $\mathrm{sign}(\mathbf{y}-\tilde{\mathbf{y}})^T$ is in $\{(+,\ldots, +), (0,\ldots, 0),(-,\ldots,-)\}$.

This end result is obvious: our original theory was that the test-taker with the larger value of $x$ should do better on all of the tests. The point of bringing this example up is to note that this theory is more realistic than a theory that proposes some specific formula relating latent and dependent variables. By considering only the sign of $\mathbf{y}-\tilde{\mathbf{y}}$, we discard numerical issues and consider only ordinal relationships between corresponding components of $\mathbf{y}$ and $\tilde{\mathbf{y}}$.

To add some realism: Perhaps Maria has a larger value of $x$ than Fred, but not all of the tests are sensitive enough to detect her superior skill: on some tests they get the same score. Thus, perhaps $f$ is only componentwise weakly increasing. To encompass this we have the following observation. We order $\{0,+,-\}$ so that 0 is the unique minimum and $+,-$ are incomparable, and we order $\{0,+,-\}^n$ componentwise. For a subset $P$ of $\{0,+,-\}^n$, we will let $P_\downarrow$ be the order ideal generated by $P$ and $P^\uparrow$ be the filter generated by $P$. As another easy proposition we have the following.

Proposition 3 Let $g$ be a function with codomain ${\mathbb{R}^n}$, and let $f:{\mathbb{R}^n}\to{\mathbb{R}^n}$ be a componentwise weakly monotonic function. Then for any $\mathbf{x}$ and $\tilde{\mathbf{x}}$ in the domain of $g$,
\[\mathrm{sign}(f(g(\mathbf{x}))-f(g(\tilde{\mathbf{x}})))\leq\pm\,\mathrm{sign}(g(\mathbf{x})-g(\tilde{\mathbf{x}})).\]

Corollary 4 Let $\mathbf{A}$ be an $m\times n$ matrix over ${\mathbb{R}}$, and $f:{\mathbb{R}^m}\to{\mathbb{R}^m}$ a componentwise weakly monotonic function. Then for any $\mathbf{x},\tilde{\mathbf{x}}\in{\mathbb{R}^n}$,
\[\mathrm{sign}(f(\mathbf{A}\mathbf{x})-f(\mathbf{A}\tilde{\mathbf{x}}))\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow.\]

Thus, if we have a theory that proposes that dependent variables $\mathbf{y}$ and latent variables $\mathbf{x}$ are related by $\mathbf{y}=f(\mathbf{A}\mathbf{x})$ for some matrix $\mathbf{A}$ and componentwise weakly increasing $f$, and we have data $\{\mathbf{y}_1, \ldots, \mathbf{y}_N\}$, where each $\mathbf{y}_i$ is the vector of dependent variables measured in one trial, then to test our theory we do the following:

  1. Find all signed difference vectors $\mathrm{sign}(\mathbf{y}_i-\mathbf{y}_j)$.
  2. For each signed difference vector $S$, see if there is an $X\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))$ such that $S$ and $X$ coincide on the support of $S$. Any signed difference vector for which there is no such $X$ is a violation of the model. (A computational sketch of this test follows the list.)
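Checking Step 2 does not require enumerating ${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))$: a sign vector $S$ agrees with some covector on its support exactly when there is an $\mathbf{x}$ with $S_i(\mathbf{A}\mathbf{x})_i>0$ for every $i$ in the support of $S$, and that is a linear feasibility question. The following is a minimal sketch, assuming NumPy and SciPy; the helper names and the made-up test scores are mine, and the example matrix is the all-ones column from the single-latent-variable theory above.

```python
import numpy as np
from scipy.optimize import linprog
from itertools import combinations

def sign_vector(v, tol=1e-9):
    return tuple(0 if abs(t) < tol else (1 if t > 0 else -1) for t in v)

def fits_some_covector(S, A):
    """Is there an X in V*(col(A)) agreeing with S on the support of S?
    Equivalently: is there an x with sign((Ax)_i) == S_i whenever S_i != 0?
    Decided by a small LP: maximise t subject to S_i*(Ax)_i >= t, -1 <= x_j <= 1."""
    m, n = A.shape
    supp = [i for i, s in enumerate(S) if s != 0]
    if not supp:
        return True                        # the zero sign vector is always a covector
    c = np.zeros(n + 1); c[-1] = -1.0      # variables (x_1, ..., x_n, t); minimise -t
    A_ub = np.zeros((len(supp), n + 1))
    for row, i in enumerate(supp):
        A_ub[row, :n] = -S[i] * A[i, :]    # -S_i*(Ax)_i + t <= 0
        A_ub[row, n] = 1.0
    bounds = [(-1, 1)] * n + [(None, 1)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(supp)), bounds=bounds, method="highs")
    return bool(res.success and -res.fun > 1e-9)   # a strictly positive margin exists

def violations(A, Y):
    """Pairs (i, j) whose signed difference y_i - y_j falls outside V*(col(A))_down."""
    return [(i, j) for i, j in combinations(range(len(Y)), 2)
            if not fits_some_covector(sign_vector(Y[i] - Y[j]), A)]

# the single-latent-variable example: four test scores, all increasing in x
A = np.ones((4, 1))
scores = {"Maria": (0.9, 0.8, 0.7, 0.95),   # made-up scores, purely for illustration
          "Fred":  (0.6, 0.5, 0.4, 0.70),
          "Ana":   (0.7, 0.9, 0.3, 0.80)}
names, Y = list(scores), np.array(list(scores.values()))
print([(names[i], names[j]) for i, j in violations(A, Y)])
# [('Maria', 'Ana'), ('Fred', 'Ana')]: mixed-sign score differences violate the model
```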

We can carry this out more efficiently (at least, when $|{\mathcal{C}}({\mathrm{col}}(\mathbf{A}))|<|{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))|$) using the following result.
Proposition 5 ([2]) For any oriented matroid on elements $E$ with signed covector set ${\mathcal{V}}^*$ and signed circuit set ${\mathcal{C}}$, we have that $\{{\mathcal{V}}^*_\downarrow,{\mathcal{C}}^\uparrow\}$ is a partition of $\{0,+,-\}^E$.

Thus Step 2 above can be replaced with

2. For each signed difference vector $S$, see if there is a $Z\in{\mathcal{C}}({\mathrm{col}}(\mathbf{A}))$ such that $S$ and $Z$ coincide on the support of $Z$. Any signed difference vector for which such a $Z$ exists is a violation of the model.

Example: the Remember-Know problem

Think about the last time you saw someone and thought, “I know I’ve seen that person before, but I can’t remember where”. Some theories hold that remembering — recalling details such as “Oh yeah, I see that person working at the grocery store sometimes” — is an essentially different mental process from knowing — having a sense of familiarity without any context around it.

Another theory proposes that the only difference between remembering and knowing is the strength of the recall. More specifically, a signal detection model proposes that each of the above questions elicits in your mind a response (a signal) of some strength, and that the strength of this signal determines whether you “remember” the answer, merely “know” the answer, or don’t know the answer.
A testable instantiation of this ([2]) is as follows. On Day 1, a subject is given a list of words. On Day 2, the subject is given a new list and is asked, for each word on the new list, whether they “remember” seeing it on the old list, whether they “know” it was there despite not “remembering”, or whether they don’t recall that it was on the list.

A signal detection model theorizes that each word on the new list elicits a signal, so that

  •  if the word was on the old list, there is a probability distribution $p(s)$ for the strength of this signal, and
  • if the word was not on the old list, the probability distribution for the strength of this signal is $p(s+d)$ for some $d$.

The model then says that there are numbers $a<b$ such that the subject will “remember” the word from the list if the strength of the signal is greater than $b$, will merely “know” the word was on the list if the strength was in the interval $[a,b]$, and will not recall the word if the strength is less than $a$.

Often the function $p(x)$ is assumed to be a normal distribution. But there’s frequently little theoretical basis for this assumption, and the use of signed difference analysis allows us to avoid any such assumption. We have latent variables $a$, $b$, and $d$, and we have that

  •  the probability ${\mathrm{Prob}}(\mbox{rem:true})$ of “remembering” a word that was on the old list is $\int_b^\infty p(t) dt=1-P(b)$. Here $P$ is the cumulative probability distribution for $p$,
  •  the probability ${\mathrm{Prob}}(\mbox{know:true})$ of either “remembering” or “knowing” a word that was on the old list is $\int_a^\infty p(t) dt=1-P(a)$,
  •  the probability ${\mathrm{Prob}}(\mbox{rem:false})$ of “remembering” a word that was not on the old list is $\int_b^\infty p(t+d) dt=1-P(b+d)$, and
  •  the probability ${\mathrm{Prob}}(\mbox{know:false})$ of either “remembering” or “knowing” a word that was not on the old list is $\int_a^\infty p(t+d) dt=1-P(a+d)$.

Let $f(x)=1-P(x)$, a monotonically decreasing function, and let $F:{\mathbb{R}}^4\to{\mathbb{R}}^4$ act by $f$ in each component. Thus our model proposes that
\[\begin{pmatrix}
{\mathrm{Prob}}(\mbox{rem:true})\\
{\mathrm{Prob}}(\mbox{know:true})\\
{\mathrm{Prob}}(\mbox{rem:false})\\
{\mathrm{Prob}}(\mbox{know:false})
\end{pmatrix}
=F\left(\begin{pmatrix}
0&1&0\\
1&0&0\\
0&1&1\\
1&0&1
\end{pmatrix}
\begin{pmatrix}
a\\b\\d
\end{pmatrix}\right).
\]
So, if we conduct this experiment under different conditions (i.e., with different values for the independent variables) and record our data in terms of the four variables on the left-hand side of this equation, the model predicts that the signed differences of data points will be in ${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow$, where $A$ is the matrix on the right-hand side of the above equation.

This is a pretty low bar: the only two elements of $\{0,+,-\}^4$ which are not in ${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow$ are $\pm (+,-,-,+)$. But that is part of the point. Previous work had dismissed signal-detection models for the Remember-Know problem based on results of experiments similar to the type I’ve described, but their analyses assumed a particular form for the probability distribution $p(x)$, and hence assumed data points must lie on a curve of a certain form. Signed difference analysis suggests that without making such an assumption, a signal-detection model for this problem is very difficult to falsify.
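The count of excluded sign vectors can be confirmed by brute force. A short sketch, assuming NumPy; the grid below is my shortcut, and it is adequate only because this particular arrangement is small enough that the grid reaches every maximal covector of ${\mathrm{col}}(\mathbf{A})$.

```python
import numpy as np
from itertools import product

# the Remember-Know matrix from the display above
A = np.array([[0, 1, 0], [1, 0, 0], [0, 1, 1], [1, 0, 1]], dtype=float)

# sign vectors of Ax for x = (a, b, d) on a grid; each one is a covector of col(A),
# and for this small matrix the grid reaches every maximal covector (tope)
grid = [k / 2 for k in range(-4, 5)]                     # -2, -1.5, ..., 2
covectors = {tuple(int(s) for s in np.sign(A @ np.array(x)))
             for x in product(grid, repeat=3)}

def in_ideal(S):
    """S lies in V*(col(A))_down iff some covector agrees with S on supp(S)."""
    return any(all(s == 0 or s == X[i] for i, s in enumerate(S)) for X in covectors)

excluded = [S for S in product((0, 1, -1), repeat=4) if not in_ideal(S)]
print(excluded)   # [(1, -1, -1, 1), (-1, 1, 1, -1)], i.e. the two vectors named above
```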

Nonlinear structure in SDA

We can broaden signed difference analysis further by considering models of the form $\mathbf{y}=f(g(\mathbf{x}))$, where $\mathbf{y}$ is the vector of dependent variables, $\mathbf{x}$ is the vector of latent variables, $f$ is an unspecified componentwise monotonic function, and $g$ is close enough to some affine function $\mathbf{x}\to\mathbf{A}\mathbf{x}+\mathbf{b}$ that, for all $\mathbf{x}$ and $\tilde{\mathbf{x}}$,
\[\mathrm{sign}(g(\mathbf{x})-g(\tilde{\mathbf{x}}))\in{\mathcal{V}}^*({\mathrm{col}}(A)).\]
Proposition 1 then promises us that for all $\mathbf{x}$ and $\tilde{\mathbf{x}}$,
\[\mathrm{sign}(f(g(\mathbf{x}))-f(g(\tilde{\mathbf{x}})))\in{\mathcal{V}}^*({\mathrm{col}}(A)).\]

For instance, if $g:{\mathbb{R}^m}\to{\mathbb{R}^n}$ is differentiable and the column space $V$ of the Jacobian $Dg$ is constant, then for every $\mathbf{x},\tilde{\mathbf{x}}\in{\mathbb{R}^m}$ and differentiable path $\lambda:[0,1]\to{\mathbb{R}^m}$ from $\mathbf{x}$ to $\tilde{\mathbf{x}}$ we have
\[ g(\tilde{\mathbf{x}})-g(\mathbf{x})=\int_0^1 Dg(\lambda(t))\lambda'(t) dt\in V.\]

For instance, in ([2]) we consider a signal-detection model for inductive and deductive reasoning that predicts data of the form
$\mathbf{y}=f(d,-d,(d-c)/s, -d+c)$, where $d$, $c$, and $s$ are latent variables and $f$ is componentwise monotonic. The Jacobian of the function $g(d,c,s)=(d,-d,(d-c)/s, -d+c)$ is
\[Dg=\begin{pmatrix}
1&0&0\\
-1&0&0\\
1/s&-1/s&(c-d)/s^2\\
-1&1&0
\end{pmatrix}\]
which has the same column space as
\[\mathbf{A}=\begin{pmatrix}
1&0&0\\
-1&0&0\\
0&0&1\\
0&1&0
\end{pmatrix}\]
and thus for every $(d,c,s)$ and $(\tilde d,\tilde c,\tilde s)$ and every componentwise monotonic $f$, the model proposes that the signed difference $\mathrm{sign}(f(g(d,c,s))-f(g(\tilde d,\tilde c,\tilde s)))$ is in
${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow$.
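The column-space claim is easy to confirm symbolically. A small sketch, assuming SymPy: equal ranks of $Dg$, $\mathbf{A}$, and the two matrices placed side by side mean the column spaces coincide (for generic values of the latent variables).

```python
import sympy as sp

d, c, s = sp.symbols('d c s', real=True, nonzero=True)

g  = sp.Matrix([d, -d, (d - c)/s, -d + c])
Dg = g.jacobian([d, c, s])
A  = sp.Matrix([[1, 0, 0], [-1, 0, 0], [0, 0, 1], [0, 1, 0]])

# col(Dg) == col(A)  iff  rank(Dg) == rank(A) == rank([Dg | A])
print(Dg.rank(), A.rank(), Dg.row_join(A).rank())   # 3 3 3 (generic pivot (c-d)/s^2)
```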

As another example, one model for reading ([1]) proposes that in reading single words one uses one or both of two skills. To oversimplify a bit, either one sounds out a word phonetically or one compares the word to known words. The model was proposed, in part, to account for cases of acquired dyslexia affecting either the ability to read phonetically regular words but not phonetically irregular words, or the ability to read known words but not phonetically regular unknown words. Let $a$ denote the probability that a particular reader will be successful at phonetically reading a phonetically regular word, and let $b$ be the probability that this reader will successfully read a word based on comparison to known words. These are the latent variables. Let $\mathbf{y}=(y_1, y_2, y_3)^T$ be the reader’s scores on three tests: $y_1$ is the score on a test of phonetically regular nonsense words, $y_2$ is the score on a test of phonetically irregular common words, and $y_3$ is the score on a test of words that are both phonetically regular and common. The model proposes that $\mathbf{y}=f(a,b,a+b-ab)$, where $f$ is componentwise weakly increasing. The Jacobian of $g(a,b)=(a,b,a+b-ab)$ is
\[Dg=\begin{pmatrix}
1&0\\
0&1\\
1-b&1-a
\end{pmatrix}\]
which no longer has constant column space. However, we can still see that, for all $(a,b)$ and $(\tilde a,\tilde b)\in [0,1]^2$, the signed difference $\mathrm{sign}(g(a,b)-g(\tilde a, \tilde b))$ is in
\[{\mathcal{V}}^*\left({\mathrm{col}}\begin{pmatrix}
1&0\\
0&1\\
1&1
\end{pmatrix}\right)_\downarrow\]
by calculating
\begin{eqnarray*}
g(a,b)-g(\tilde a, \tilde b)&=&\int_0^1 Dg\bigl((1-t)(\tilde a,\tilde b)+t(a,b)\bigr)\begin{pmatrix}a-\tilde a\\ b-\tilde b\end{pmatrix}\, dt\\
&=&\begin{pmatrix}
1&0\\
0&1\\
\int_0^1 1-\bigl((1-t)\tilde b+tb\bigr)\, dt&\int_0^1 1-\bigl((1-t)\tilde a+ta\bigr)\, dt
\end{pmatrix}\begin{pmatrix}a-\tilde a\\ b-\tilde b\end{pmatrix}\\
&=&\begin{pmatrix}
1&0\\
0&1\\
1-\frac{1}{2}(b+\tilde b)&1-\frac{1}{2}(a+\tilde a)
\end{pmatrix}\begin{pmatrix}a-\tilde a\\ b-\tilde b\end{pmatrix}
\end{eqnarray*}
and noting that
\[{\mathcal{V}}^*\left({\mathrm{col}}\begin{pmatrix}
1&0\\
0&1\\
1-\frac{1}{2}(b+\tilde b)&1-\frac{1}{2}(a+\tilde a)
\end{pmatrix}\right)
\subseteq
{\mathcal{V}}^*\left({\mathrm{col}}\begin{pmatrix}
1&0\\
0&1\\
1&1
\end{pmatrix}\right)_\downarrow.\]
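The integration step can also be checked symbolically; a small sketch, assuming SymPy, with `atilde` and `btilde` standing for $\tilde a$ and $\tilde b$.

```python
import sympy as sp

a, b, at, bt, t = sp.symbols('a b atilde btilde t', real=True)

g  = lambda u, v: sp.Matrix([u, v, u + v - u*v])               # the reading model
Dg = lambda u, v: sp.Matrix([[1, 0], [0, 1], [1 - v, 1 - u]])  # its Jacobian

assert g(a, b).jacobian([a, b]) == Dg(a, b)

# straight-line path from (atilde, btilde) to (a, b), as in the display above
lam = [(1 - t)*at + t*a, (1 - t)*bt + t*b]
integrand = Dg(lam[0], lam[1]) * sp.Matrix([a - at, b - bt])
lhs = integrand.applyfunc(lambda e: sp.integrate(e, (t, 0, 1)))

# the difference with g(a, b) - g(atilde, btilde) expands to the zero vector
print((lhs - (g(a, b) - g(at, bt))).applyfunc(sp.expand))   # Matrix([[0], [0], [0]])
```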

As an aside: ${\mathcal{V}}^*({\mathrm{col}}(Dg))$ is constant on $(0,1)^2$, and one might hope that this already implies that all signed differences $\mathrm{sign}(f(g(a,b))-f(g(\tilde a,\tilde b)))$ are in ${\mathcal{V}}^*({\mathrm{col}}(Dg))$. This would render the above integration argument unnecessary. If $Dg$ were rank 1, such a result would follow from the Mean Value Theorem. However, in higher rank this fails — see ([2]) for an example.

References

  1. Coltheart, M., “Cognitive neuropsychology and the study of reading”, in Attention and performance XI, M. I. Posner and O. S. M. Marin, eds. (1985)
  2. Dunn, J. C. and Anderson, L., Signed difference analysis: Testing for structure under monotonicity, Journal of Mathematical Psychology 85 (2018), 36-54
  3. Dunn, J. C. and James, R. N., Signed difference analysis: Theory and application, Journal of Mathematical Psychology 47 (2003), 389–416