I’d like to discuss an application of oriented matroids to psychology that I’ve been working on with John Dunn. John is a mathematical psychologist who, along with Ralph James, started this ball rolling — I got an email from him out of the blue one day beginning “I hope you don’t think I’m a crackpot, but I think I have an application of oriented matroids”.
The type of psychological model we consider has three types of variables. Two of them are things one can measure in the lab: independent variables represent factors one can change to affect a subject’s responses, and dependent variables represent the responses. Between these, typically impossible to measure, are latent variables — theoretical constructs that can be viewed as depending on the independent variables and determining the dependent variables. For instance, a memory experiment might ask people to memorize words under different conditions (the conditions are independent variables), and then give them a test whose score is the dependent variable. The theoretical construct would describe mental processes involved in memory (the latent variables), which might be changed by changes in the independent variables, and would describe how values of the latent variables determine values of the dependent variables. We assume all latent variables and all dependent variables to be numerical.
If a theory were to propose that the functional relationship between latent and dependent variables has a particular algebraic form, then we could easily test the theory. For instance, if the relationship between the vector $\mathbf{x}$ of latent variables and $\mathbf{y}$ of dependent variables were theorized to be $\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{b}$ for some specified matrix $\mathbf{A}$ and vector $\mathbf{b}$, then the theory predicts that each $\mathbf{y}$ will lie in the affine space ${\mathrm{col}}(\mathbf{A})+\mathbf{b}$. If we expanded the theory to leave $\mathbf{b}$ unspecified, then we could still test this theory: it predicts that a set $\{\mathbf{y}_1, \ldots, \mathbf{y}_N\}$ of $N$ measurements of the vector of dependent variables should satisfy $\mathbf{y}_i-\mathbf{y}_j\in{\mathrm{col}}(\mathbf{A})$ for all $i$ and $j$.
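For a specified $\mathbf{A}$, this test is a few lines of linear algebra. Here is a minimal sketch in Python with NumPy (my own illustration, not taken from the papers cited below; the tolerance is an arbitrary choice), which accepts the data exactly when every pairwise difference $\mathbf{y}_i-\mathbf{y}_j$ has negligible least-squares residual against $\mathbf{A}$.

```python
import numpy as np

def differences_in_col_space(Y, A, tol=1e-8):
    """Check whether y_i - y_j lies in col(A) (up to tol) for all pairs of columns of Y.

    Y : (n, N) array, one measured vector of dependent variables per column.
    A : (n, m) matrix specified by the theory.
    """
    n, N = Y.shape
    for i in range(N):
        for j in range(i + 1, N):
            d = Y[:, i] - Y[:, j]
            # Least-squares solve A x ~ d; the residual vanishes exactly when d is in col(A).
            x, *_ = np.linalg.lstsq(A, d, rcond=None)
            if np.linalg.norm(A @ x - d) > tol * max(1.0, np.linalg.norm(d)):
                return False
    return True
```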
But, if you think it seems odd to propose an algebraic relationship between numbers directly describing mental processes and numbers arising as test scores, then you are not alone.
However, in some situations a reasonable theory proposes that the vector of dependent variables $\mathbf{y}$ is related to the vector of latent variables $\mathbf{x}$ by a function \[\mathbf{y}=f(\mathbf{A}\mathbf{x})\]
where $\mathbf{A}$ is a specified matrix and $f$ is a function that the model does not specify but proposes to be componentwise monotonic. That is, $f$ is assumed to have the form $f(z_1, \ldots,z_m)=(f_1(z_1),\ldots, f_m(z_m))$, where either each $f_i$ is a strictly monotonically increasing function or each $f_i$ is a strictly monotonically decreasing function. I’ll give some examples of such theories shortly, and I’ll also expand this idea to a broader range of theories. First I’ll discuss how this form of hypothesis leads us to oriented matroids.
Signed difference analysis
For a vector $\mathbf{x}\in\mathbb{R}^m$, let $\mathrm{sign}(\mathbf{x})\in\{0,+,-\}^m$ be the componentwise sign. For a vector space $V$, let $\mathcal{V}^*(V)=\{\mathrm{sign}(\mathbf{v}): \mathbf{v}\in V\}$. Thus ${\mathcal{V}}^*(V)$ is the signed covector set of an oriented matroid realized by $V$.
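In computations one also needs a convention for when a coordinate counts as zero. The sketches later in this post assume a small helper along these lines (the tolerance is an arbitrary choice of mine).

```python
def sign_vector(v, tol=1e-8):
    """Componentwise sign of v as a tuple over {-1, 0, +1}, treating |v_i| <= tol as 0."""
    return tuple(0 if abs(x) <= tol else (1 if x > 0 else -1) for x in v)
```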
Proposition 1 ([3])
Let $g$ be a function with codomain ${\mathbb{R}^n}$, and let $f:{\mathbb{R}^n}\to{\mathbb{R}^n}$ be a componentwise monotonic function. Then for any $\mathbf{x}$ and $\tilde{\mathbf{x}}$ in the domain of $g$,
\[\mathrm{sign}(f(g(\mathbf{x}))-f(g(\tilde{\mathbf{x}})))=\pm\mathrm{sign}(g(\mathbf{x})-g(\tilde{\mathbf{x}})).\]
This is really more of an observation than a proposition — the proof is immediate. But it tells us that for models of the form I described earlier, oriented matroids provide an appropriate test for the model.
Corollary 2 Let $\mathbf{A}$ be an $n\times m$ matrix over $\mathbb{R}$, and $f:{\mathbb{R}^n}\to{\mathbb{R}^n}$ a componentwise monotonic function. Then for any $\mathbf{x},\tilde{\mathbf{x}}\in{\mathbb{R}^m}$,
\[\mathrm{sign}(f(\mathbf{A}\mathbf{x})-f(\mathbf{A}\tilde{\mathbf{x}}))\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A})).\]
As a very simple example, a theory might propose a single latent variable $x$ such that one’s score at each of several tests is a monotonically increasing function of $x$. Thus one proposes that the vector $\mathbf{y}$ of test scores satisfies
\[\mathbf{y}=f\left(\begin{pmatrix}
1\\1\\
\vdots\\1
\end{pmatrix}x\right)\]
for some componentwise increasing $f$. Our theory and Corollary 2 say that
if $\mathbf{y}$ is the vector of test scores for Maria and $\tilde{\mathbf{y}}$ is the vector of test scores for Fred, then $\mathrm{sign}(\mathbf{y}-\tilde{\mathbf{y}})^T$ is in $\{(+,\ldots, +), (0,\ldots, 0),(-,\ldots,-)\}$.
This end result is obvious: our original theory was that the test-taker with the larger value of $x$ should do better on all of the tests. The point of bringing this example up is to note that this theory is more realistic than a theory that proposes some specific formula relating latent and dependent variables. By considering only the sign of $\mathbf{y}-\tilde{\mathbf{y}}$, we discard numerical issues and consider only ordinal relationships between corresponding components of $\mathbf{y}$ and $\tilde{\mathbf{y}}$.
To add some realism: Perhaps Maria has a larger value of $x$ than Fred, but not all of the tests are sensitive enough to detect her superior skill: on some tests they get the same score. Thus, perhaps $f$ is only componentwise weakly increasing. To encompass this we have the following observation. We order $\{0,+,-\}$ so that 0 is the unique minimum and $+,-$ are incomparable, and we order $\{0,+,-\}^n$ componentwise. For a subset $P$ of $\{0,+,-\}^n$, we will let $P_\downarrow$ be the order ideal generated by $P$ and $P^\uparrow$ be the filter generated by $P$. As another easy proposition we have the following.
Proposition 3 Let $g$ be a function with codomain ${\mathbb{R}^n}$, and let $f:{\mathbb{R}^n}\to{\mathbb{R}^n}$ be a componentwise weakly monotonic function. Then for any $\mathbf{x}$ and $\tilde{\mathbf{x}}$ in the domain of $g$,
\[\mathrm{sign}(f(g(\mathbf{x}))-f(g(\tilde{\mathbf{x}})))\leq\mathrm{sign}(g(\mathbf{x})-g(\tilde{\mathbf{x}})).\]
Corollary 4 Let $\mathbf{A}$ be an $n\times m$ matrix over ${\mathbb{R}}$, and $f:{\mathbb{R}^n}\to{\mathbb{R}^n}$ a componentwise weakly monotonic function. Then for any $\mathbf{x},\tilde{\mathbf{x}}\in{\mathbb{R}^m}$,
\[\mathrm{sign}(f(\mathbf{A}\mathbf{x})-f(\mathbf{A}\tilde{\mathbf{x}}))\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow.\]
Thus, if we have a theory that proposes that dependent variables $\mathbf{y}$ and latent variables $\mathbf{x}$ are related by $\mathbf{y}=f(\mathbf{A}\mathbf{x})$ for some matrix $\mathbf{A}$ and componentwise weakly increasing $f$, and we have data $\{\mathbf{y}_1, \ldots, \mathbf{y}_N\}$, where each $\mathbf{y}_i$ is the vector of dependent variables measured in one trial, then to test our theory we do the following:
1. Find all signed difference vectors $\mathrm{sign}(\mathbf{y}_i-\mathbf{y}_j)$.
2. For each signed difference vector $S$, see if there is an $X\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))$ such that $S$ and $X$ coincide on the support of $S$. Any signed difference vector for which there is no such $X$ is a violation of the model. (See the sketch below.)
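Here is one way to carry out Step 2, sketched in Python with NumPy/SciPy (my own formulation, not taken from [2] or [3]): a sign vector $S$ lies in ${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow$ exactly when some $\mathbf{A}\mathbf{x}$ has the sign prescribed by $S$ on the support of $S$, which can be decided by a small linear program with a positive-margin variable.

```python
import numpy as np
from scipy.optimize import linprog

def in_covector_ideal(S, A):
    """Decide whether the sign vector S lies in V*(col(A))_down, i.e. whether some
    v in col(A) has sign S_i in every coordinate i where S_i is nonzero.

    S : sequence over {-1, 0, +1} of length n;  A : (n, m) array.
    We look for x and a margin t with S_i * (A x)_i >= t > 0 on the support of S;
    t is capped at 1 so the linear program stays bounded.
    """
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    support = [i for i in range(n) if S[i] != 0]
    if not support:
        return True  # the all-zero sign vector is always a covector
    # Variables (x_1, ..., x_m, t); maximize t, i.e. minimize -t.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Constraints: -S_i * (A x)_i + t <= 0 for each i in the support.
    A_ub = np.zeros((len(support), m + 1))
    for row, i in enumerate(support):
        A_ub[row, :m] = -S[i] * A[i, :]
        A_ub[row, m] = 1.0
    b_ub = np.zeros(len(support))
    bounds = [(None, None)] * m + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.status == 0 and res.x[-1] > 1e-9
```

With this helper and the `sign_vector` function from earlier, the whole test is a double loop over pairs $(i,j)$, calling `in_covector_ideal(sign_vector(Y[:, i] - Y[:, j]), A)`.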
We can carry this out more efficiently (at least, when $|{\mathcal{C}}({\mathrm{col}}(\mathbf{A}))|<|{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))|$) using the following result.
Proposition 5 ([2]) For any oriented matroid on elements $E$ with signed covector set ${\mathcal{V}}^*$ and signed circuit set ${\mathcal{C}}$, we have that $\{{\mathcal{V}}^*_\downarrow,{\mathcal{C}}^\uparrow\}$ is a partition of $\{0,+,-\}^E$.
Thus Step 2 above can be replaced with
2. For each signed difference vector $S$, see if there is a $Z\in{\mathcal{C}}({\mathrm{col}}(\mathbf{A}))$ such that $S$ and $Z$ coincide on the support of $Z$. Any signed difference vector for which such a $Z$ exists is a violation of the model.
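To illustrate the partition in Proposition 5 with a toy example of my own (not from [2]): take the oriented matroid realized by $V=\mathrm{span}\{(1,1)\}\subseteq\mathbb{R}^2$. Then
\[{\mathcal{V}}^*=\{(0,0),\,(+,+),\,(-,-)\},\qquad {\mathcal{C}}=\{(+,-),\,(-,+)\},\]
so ${\mathcal{V}}^*_\downarrow$ consists of the seven sign vectors $(0,0)$, $(\pm,0)$, $(0,\pm)$, $(+,+)$, $(-,-)$, while ${\mathcal{C}}^\uparrow=\{(+,-),(-,+)\}$ (the circuits have full support, so the filter adds nothing), and together these account for all $3^2=9$ elements of $\{0,+,-\}^2$.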
Example: the Remember-Know problem
Think about the last time you saw someone and thought, “I know I’ve seen that person before, but I can’t remember where”. Some theories hold that remembering — recalling details such as “Oh yeah, I see that person working at the grocery store sometimes” — is an essentially different mental process from knowing — having a sense of familiarity without any context around it.
Another theory proposes that the only difference between remembering and knowing is the strength of the recall. More specifically, a signal detection model proposes that each such act of recognition elicits in your mind a response (a signal) of some strength, and that the strength of this signal determines whether you “remember” the answer, merely “know” the answer, or don’t know the answer.
A testable instantiation of this ([2]) is as follows. On Day 1, a subject is given a list of words. On Day 2, the subject is given a new list and is asked, for each word on the new list, whether they “remember” seeing it on the old list, whether they “know” it was there despite not “remembering”, or whether they don’t recall that it was on the list.
A signal detection model theorizes that each word on the new list elicits a signal, so that
- if the word was on the old list, there is a probability distribution $p(s)$ for the strength of this signal, and
- if the word was not on the old list, the probability distribution for the strength of this signal is $p(s+d)$ for some $d$.
The model then says that there are numbers $a<b$ such that the subject will “remember” the word from the list if the strength of the signal is greater than $b$, will merely “know” the word was on the list if the strength was in the interval $[a,b]$, and will not recall the word if the strength is less than $a$.
Often the function $p(x)$ is assumed to be a normal distribution. But there’s frequently little theoretical basis for this assumption, and the use of signed difference analysis allows us to avoid any such assumption. We have latent variables $a$, $b$, and $d$, and we have that
- the probability ${\mathrm{Prob}}(\mbox{rem:true})$ of “remembering” a word that was on the old list is $\int_b^\infty p(t) dt=1-P(b)$. Here $P$ is the cumulative probability distribution for $p$,
- the probability ${\mathrm{Prob}}(\mbox{know:true})$ of either “remembering” or “knowing” a word that was on the old list is $\int_a^\infty p(t) dt=1-P(a)$,
- the probability ${\mathrm{Prob}}(\mbox{rem:false})$ of “remembering” a word that was not on the old list is $\int_b^\infty p(t+d) dt=1-P(b+d)$, and
- the probability ${\mathrm{Prob}}(\mbox{know:false})$ of either “remembering” or “knowing” a word that was not on the old list is $\int_a^\infty p(t+d) dt=1-P(a+d)$.
Let $f(x)=1-P(x)$, a monotonically decreasing function, and let $F:{\mathbb{R}}^4\to{\mathbb{R}}^4$ act by $f$ in each component. Thus our model proposes that
\[\begin{pmatrix}
{\mathrm{Prob}}(\mbox{rem:true})\\
{\mathrm{Prob}}(\mbox{know:true})\\
{\mathrm{Prob}}(\mbox{rem:false})\\
{\mathrm{Prob}}(\mbox{know:false})
\end{pmatrix}
=F\left(\begin{pmatrix}
0&1&0\\
1&0&0\\
0&1&1\\
1&0&1
\end{pmatrix}
\begin{pmatrix}
a\\b\\d
\end{pmatrix}\right).
\]
So, if we conduct this experiment under different conditions (i.e., with different values for the independent variables) and record our data in terms of the four variables on the left-hand side of this equation, the model predicts that the signed differences of data points will be in ${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow$, where $\mathbf{A}$ is the matrix on the right-hand side of the above equation.
This is a pretty low bar: the only two elements of $\{0,+,-\}^4$ which are not in ${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow$ are $\pm (+,-,-,+)$. But that is part of the point. Previous work had dismissed signal-detection models for the Remember-Know problem based on results of experiments similar to the type I’ve described, but those analyses assumed a particular form for the probability distribution $p(x)$, and hence assumed that data points must lie on a curve of a certain form. Signed difference analysis suggests that without making such an assumption, a signal-detection model for this problem is very difficult to falsify.
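Where does the excluded pair $\pm(+,-,-,+)$ come from? The signed circuits of the oriented matroid realized by ${\mathrm{col}}(\mathbf{A})$ are the sign vectors of minimal-support nonzero vectors in the orthogonal complement of ${\mathrm{col}}(\mathbf{A})$, which here is one-dimensional. A quick numerical check of this (my own sketch, using NumPy/SciPy):

```python
import numpy as np
from scipy.linalg import null_space

# The matrix A from the Remember-Know model above.
A = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
], dtype=float)

# col(A) has a 1-dimensional orthogonal complement, spanned by a single vector w;
# the circuits of the oriented matroid realized by col(A) are then the two signings of w.
W = null_space(A.T)                     # orthonormal basis of col(A)^perp, shape (4, 1)
w = W[:, 0] / np.max(np.abs(W[:, 0]))   # rescale for readability
print(np.round(w, 6))                   # proportional to (1, -1, -1, 1), i.e. +-(+,-,-,+)
```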
Nonlinear structure in SDA
We can broaden signed difference analysis further by considering models of the form $\mathbf{y}=f(g(\mathbf{x}))$, where $\mathbf{y}$ is the vector of dependent variables, $\mathbf{x}$ is the vector of latent variables, $f$ is an unspecified componentwise monotonic function, and $g$ is close enough to some affine function $\mathbf{x}\mapsto\mathbf{A}\mathbf{x}+\mathbf{b}$ that, for all $\mathbf{x}$ and $\tilde{\mathbf{x}}$,
\[\mathrm{sign}(g(\mathbf{x})-g(\tilde{\mathbf{x}}))\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A})).\]
Proposition 1 then promises us that for all $\mathbf{x}$ and $\tilde{\mathbf{x}}$,
\[\mathrm{sign}(f(g(\mathbf{x}))-f(g(\tilde{\mathbf{x}})))\in{\mathcal{V}}^*({\mathrm{col}}(\mathbf{A})).\]
For instance, if $g:{\mathbb{R}^m}\to{\mathbb{R}^n}$ is differentiable and the column space $V$ of the Jacobian $Dg$ is constant, then for every $\mathbf{x},\tilde{\mathbf{x}}\in{\mathbb{R}^m}$ and differentiable path $\lambda:[0,1]\to{\mathbb{R}^m}$ from $\mathbf{x}$ to $\tilde{\mathbf{x}}$ we have
\[ g(\tilde{\mathbf{x}})-g(\mathbf{x})=\int_0^1 Dg(\lambda(t))\lambda'(t) dt\in V.\]
As a first example, in [2] we consider a signal-detection model for inductive and deductive reasoning that predicts data of the form
$\mathbf{y}=f(d,-d,(d-c)/s, -d+c)$, where $d$, $c$, and $s$ are latent variables and $f$ is componentwise monotonic. The Jacobian of the function $g(d,c,s)=(d,-d,(d-c)/s, -d+c)$ is
\[Dg=\begin{pmatrix}
1&0&0\\
-1&0&0\\
1/s&-1/s&(c-d)/s^2\\
-1&1&0
\end{pmatrix}\]
which, for $s\neq 0$, has column space contained in the column space of
\[\mathbf{A}=\begin{pmatrix}
1&0&0\\
-1&0&0\\
0&0&1\\
0&1&0
\end{pmatrix}\]
and thus, for every $(d,c,s)$ and $(\tilde d,\tilde c,\tilde s)$ with $s,\tilde s>0$ and every componentwise monotonic $f$, the model proposes that the signed difference $\mathrm{sign}(f(g(d,c,s))-f(g(\tilde d,\tilde c,\tilde s)))$ is in
${\mathcal{V}}^*({\mathrm{col}}(\mathbf{A}))_\downarrow$.
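As a quick numerical sanity check (my own, and only verifying the containment ${\mathrm{col}}(Dg)\subseteq{\mathrm{col}}(\mathbf{A})$ that the argument actually uses), one can confirm at random points that appending the columns of $Dg$ to $\mathbf{A}$ does not raise the rank:

```python
import numpy as np

def jacobian_g(d, c, s):
    """Jacobian of g(d, c, s) = (d, -d, (d - c)/s, -d + c)."""
    return np.array([
        [1.0,      0.0,      0.0],
        [-1.0,     0.0,      0.0],
        [1.0 / s, -1.0 / s, (c - d) / s**2],
        [-1.0,     1.0,      0.0],
    ])

A = np.array([
    [1, 0, 0],
    [-1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
for _ in range(1000):
    d, c = rng.normal(size=2)
    s = rng.uniform(0.5, 2.0)  # keep s away from 0
    Dg = jacobian_g(d, c, s)
    # col(Dg) is contained in col(A) iff appending Dg's columns does not raise the rank.
    assert np.linalg.matrix_rank(np.hstack([A, Dg])) == np.linalg.matrix_rank(A)
print("col(Dg) is contained in col(A) at all sampled points")
```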
As another example, one model for reading ([1]) proposes that in reading single words one uses one or both of two skills. To oversimplify a bit, either one sounds out a word phonetically or one compares the word to known words. The model was proposed, in part, to account for cases of acquired dyslexia affecting either the ability to read phonetically regular words but not phonetically irregular words, or the ability to read known words but not phonetically regular unknown words. Let $a$ denote the probability that a particular reader will be successful at phonetically reading a phonetically regular word, and let $b$ be the probability that this reader will successfully read a word based on comparison to known words. These are the latent variables. Let $\mathbf{y}=(y_1, y_2, y_3)^T$ be the reader’s scores on three tests: $y_1$ is the score on a test of phonetically regular nonsense words, $y_2$ is the score on a test of phonetically irregular common words, and $y_3$ is the score on a test of words that are both phonetically regular and common. The model proposes that $\mathbf{y}=f(a,b,a+b-ab)$, where $f$ is componentwise weakly increasing. The Jacobian of $g(a,b)=(a,b,a+b-ab)$ is
\[Dg=\begin{pmatrix}
1&0\\
0&1\\
1-b&1-a
\end{pmatrix}\]
which no longer has constant column space. However, we can still see that, for all $(a,b)$ and $(\tilde a,\tilde b)\in [0,1]^2$, the signed difference $\mathrm{sign}(g(a,b)-g(\tilde a, \tilde b))$ is in
\[{\mathcal{V}}^*\left({\mathrm{col}}\begin{pmatrix}
1&0\\
0&1\\
1&1
\end{pmatrix}\right)_\downarrow\]
by calculating
\begin{eqnarray*}
g(a,b)-g(\tilde a, \tilde b)&=&\int_0^1 Dg\big((1-t)(\tilde a,\tilde b)+t(a,b)\big)\begin{pmatrix}a-\tilde a\\ b-\tilde b\end{pmatrix}\, dt\\
&=&\begin{pmatrix}
1&0\\
0&1\\
1-\int_0^1 \big((1-t)\tilde b+tb\big)\, dt&1-\int_0^1 \big((1-t)\tilde a+ta\big)\, dt
\end{pmatrix}\begin{pmatrix}a-\tilde a\\ b-\tilde b\end{pmatrix}\\
&=&\begin{pmatrix}
1&0\\
0&1\\
1-\frac{1}{2}(b+\tilde b)&1-\frac{1}{2}(a+\tilde a)
\end{pmatrix}\begin{pmatrix}a-\tilde a\\ b-\tilde b\end{pmatrix}
\end{eqnarray*}
and noting that
\[{\mathcal{V}}^*\left({\mathrm{col}}\begin{pmatrix}
1&0\\
0&1\\
1-\frac{1}{2}(b+\tilde b)&1-\frac{1}{2}(a+\tilde a)
\end{pmatrix}\right)
\subseteq
{\mathcal{V}}^*\left({\mathrm{col}}\begin{pmatrix}
1&0\\
0&1\\
1&1
\end{pmatrix}\right)_\downarrow.\]
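This conclusion can also be sanity-checked numerically (a random-sampling check of my own, not a proof): the only sign vectors excluded by the covector ideal above are the circuits $\pm(+,+,-)$ of the rank-2 oriented matroid on three elements realized by that column space, and no sampled difference of $g$ realizes them.

```python
import numpy as np

def g(a, b):
    """The latent-to-dependent map of the reading model (before the monotonic f)."""
    return np.array([a, b, a + b - a * b])

def sign_vector(v, tol=1e-12):
    return tuple(0 if abs(x) <= tol else (1 if x > 0 else -1) for x in v)

# The only sign vectors NOT in V*(col([[1,0],[0,1],[1,1]]))_down are the circuits +-(+,+,-).
forbidden = {(1, 1, -1), (-1, -1, 1)}

rng = np.random.default_rng(1)
for _ in range(100_000):
    a, b, ta, tb = rng.uniform(0.0, 1.0, size=4)
    assert sign_vector(g(a, b) - g(ta, tb)) not in forbidden
print("no sampled signed difference matches a forbidden circuit")
```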
As an aside: ${\mathcal{V}}^*({\mathrm{col}}(Dg))$ is constant on $(0,1)^2$, and one might hope that this already implies that all signed differences $\mathrm{sign}(f(g(a,b))-f(g(\tilde a,\tilde b)))$ are in ${\mathcal{V}}^*({\mathrm{col}}(Dg))$, which would render the above integration argument unnecessary. If $Dg$ were rank 1, such a result would follow from the Mean Value Theorem. However, in higher rank this fails; see [2] for an example.
References
1. Coltheart, M., “Cognitive neuropsychology and the study of reading”, in Attention and Performance XI, M. I. Posner and O. S. M. Marin, eds. (1985).
2. Dunn, J. C. and Anderson, L., “Signed difference analysis: Testing for structure under monotonicity”, Journal of Mathematical Psychology 85 (2018), 36–54.
3. Dunn, J. C. and James, R. N., “Signed difference analysis: Theory and application”, Journal of Mathematical Psychology 47 (2003), 389–416.