Picture of Simone.

A model-theoretic scroll through the fields

The goal of this post is to explore some of the ideas behind model theory and its analysis of algebraic structures, and given an application of these methods in the form of the Ax-Grothendieck theorem. It is especially directed at an audience of students in mathematics with some notions of algebra but none of logic. Some of the ideas might have been slightly deformed to make them more relatable for this specific audience, so caveat lector if you know model theory.

What is model theory, anyway?

Many definitions of model theory have been given over the (few) years of its existence. As with all tentative definitions, it has pretty soon become clear that giving a precise delineation of the field is nearly impossible; thus, we have to make do with different slogans, some older, some newer. Historically, one of the oldest ideas was \[ \text{model theory} \; = \; \text{algebraic geometry} \; - \; \text{fields}. \] (One could point out, a bit tongue-in-cheek, that indeed this is true if interpreted to mean that model theory does not have all the Fields medals that algebraic geometry has.) The problem with this slogan is, of course, that we now know what algebraic geometry without fields is -- namely, it is algebraic geometry. We clearly arrive to a contradiction, as model theory and algebraic geometry are indeed distinct subjects. We have to find another way.
A more modern slogan was proposed by Hrushovski, describing model theory as the geography of tame mathematics. This is, however, a bit too far ahead for the purpose of this post -- we will get nowhere near the matter of tameness of mathematical theories and objects. So, we better settle for something less catchy, but quite descriptive: let's think of model theory as the linguistics of mathematics.
Along this intuition, we first establish a formal language which we will use to express mathematical properties of our favourite objects. For this, it is perhaps useful to go back to our first Algebra class, and think of the way groups were defined, for example. Typically, the lecturer would say something along the lines of ''a group is a set \(G\) together with an operation \(\cdot\) and a special element \(e\), which satisfies the following axioms: \(\forall x (x\cdot e = x)\), ...''. They would then proceed to write down a list of symbolic statements like \(\forall x (x \cdot e = x)\), expressing some properties of the operation \(\cdot\) and the special constant \(e\). The crucial insight behind mathematical logic is that we are allowed to treat statements like \(\forall x (x \cdot e = x)\) as mathematical objects themselves, and thus do mathematics with them.

Formulas or formulae?

Let's give a precise definition of these statements in a special case, namely the one useful to treat rings and fields. We choose a language \(\mathcal{L} = \{+, \cdot, -, 0, 1\}\), which as the name suggests is a choice of special symbols that we will use to talk about our objects. In this case, we choose two binary functions \(+,\cdot\), a unary function \(-\), and two special elements \(0,1\). This is a very natural process -- when we first introduce groups, we consider their operation \(\cdot\) and their neutral element \(e\). These are enough symbols to express their properties (or at least, their very basic ones).
Similarly, if we want to talk about fields, we better be able to talk about their operations \(+\) and \(\cdot\), their neutral elements \(0\) and \(1\), and out of laziness and exposition, also about the operation of additive inverse \(-\) (this is not really crucial, as it can be recovered from the others, but this is irrelevant for us today). We then recursively define what a formula in the language \(\mathcal{L}\) is.
  1. If \(f(X_1, \dots X_n)\) is a polynomial over \(\mathbb{Z}\), then the equality \[ f(X_1, \dots X_n) = 0 \] is a formula in \(\mathcal{L}\) (using the variables \(X_1, \dots X_n\)).
  2. If \(\varphi\), \(\psi\) are formulas (for example, \(\varphi\) is \(f(X) = 0\) and \(\psi\) is \(g(Y) = 0\), for some polynomials \(f,g\)), then we can combine them together using \(\land\) (''and''), \(\lor\) (''or''), \(\to\) (''implies''), and negate them using \(\neg\) (''not''). In other words, we declare \(\varphi\land\psi\), \(\varphi\lor\psi\), \(\varphi\to\psi\), \(\neg\varphi\) and \(\neg\psi\) to be formulas as well.
  3. If \(\varphi\) is a formula and \(X\) is a variable, then \(\forall X\varphi\) (''for all \(X\), \(\varphi\)'') and \(\exists X\varphi\) (''there exists \(X\) such that \(\varphi\)'') are formulas. This step is really meaningful only if \(\varphi\) uses the variable \(X\).
We are allowed to repeat and intertwine these steps as many times as we like. The first step isolates the basic building blocks of our process; as for steps two and three, they are built along our intuition of what mathematical statements look like. Before, we saw an example: \(\forall x(x \cdot e = x)\). Let us see some further examples in our chosen language \(\mathcal{L}\), and try to understand what they mean in plain English.
The first three examples were all sentences, on the other hand, so we could check whether they were true in our favourite structure. But what does true mean, precisely?

Structures and models

Suppose I gave you the sentence \(\forall X_1 \forall X_2 \forall X_3 \exists Y(X_1Y^2+X_2Y+X_3=0)\) and I asked you to check whether it was true in \(\mathbb{R}\). You would take any \(a_1, a_2, a_3 \in \mathbb{R}\) and then try and figure out if the equation \(a_1Y^2+a_2Y+a_3 = 0\) had a solution in \(\mathbb{R}\). This process can naturally take place in any so-called \(\mathcal{L}\)-structure.
A structure in the language \(\mathcal{L}\) is nothing but some set \(M\) together with interpretations of the symbols of \(\mathcal{L}\). In our particular case, this means that \(M\) comes together with two functions \(+^M,\cdot^M\colon M^2 \to M\), a function \(-^M\colon M \to M\), and two special elements \(0^M,1^M\in M\). Together, \[ \mathcal{M} = (M,+^M,\cdot^M,-^M,0^M,1^M) \] is a structure in the language \(\mathcal{L}\). Note that these are just functions and special elements -- \(\mathcal{M}\) need not be a field, or even a ring! For example, \[ (\mathbb{R}, \dagger, \star, \sin, 3, \frac{9}{2}), \] where \(\dagger\) is the binary function \((a,b) \mapsto \exp(ab)\), \(\star\) is the binary function \((a,b) \mapsto \cos(a+b)\), is a honest \(\mathcal{L}\)-structure. So is, of course, the ''real'' real numbers, \[ (\mathbb{R}, +, \cdot, -, 0, 1). \] The difference between these two structures is which sentences are true in them. While the second one will satisfy all field axioms, the first one will not even satisfy the ring axioms! This is encoded in their theories: given a structure \(\mathcal{M}\) in the language \(\mathcal{L}\), we denote by \(\operatorname{Th}(\mathcal{M})\) the set of sentences in the language \(\mathcal{L}\) which are true in the structure. The process of defining what it means to be true in an arbitrary structure is the same one we would use if we wanted to check truth in \(\mathbb{R}\): we reduce to checking whether a polynomial (correctly interpreted, using the interpretations of \(+, \cdot, -, 0, 1\) in our structure) vanishes. This ''correctly interpreted'' point is a bit tricky, and perhaps a fault of the chosen presentation of the construction of formulas using polynomials over \(\mathbb{Z}\) (which carry some ambiguity if, for example, we don't choose associative operations). Since all structures we will consider are indeed rings, we will ignore this issue for today. Whenever we consider a field \(K\), we will write \(\operatorname{Th}(K)\) to mean the theory of the ''obvious'' structure of \(K\) (i.e., where \(+,\cdot,-\) are the corresponding field operations, and \(0,1\) are the corresponding neutral elements).

Algebraically closed fields

We have already seen a reason why \(\operatorname{Th}(\mathbb{R})\) and \(\operatorname{Th}(\mathbb C)\) are different: namely, the sentence \(\exists X(X^2+1=0)\) is an element of the latter, but not of the former. A bit naïvely, one could say that the point of model theory is understanding how much \(\operatorname{Th}(\mathcal{M})\) knows about \(\mathcal{M}\), and how well the theory can be described. As you can imagine, the set \(\operatorname{Th}(\mathbb{C})\) contains a lot of information, a priori described by very complicated sentences: imagine a sentence using a million quantifiers and two million connectives. These sentences talk about all sorts of polynomial equations having solutions, projections of varieties intersecting other varieties -- in a sense, the whole of (classical, complex) algebraic geometry is encoded there. Strikingly, however, it can be completely described by an infinite set of natural sentences, that make sense in plain English (I believe the human brain struggles beyond two or three quantifiers, as witnessed by everyone's horrific first encounter with \(\epsilon-\delta\) definitions of limits).
We let \(\mathrm{ACF}_0\) be the following set of sentences (which we usually call a theory):
  1. all field axioms,
  2. \(\neg(2=0)\), \(\neg(3=0)\), and so on -- in other words, infinitely many sentences expressing that ''the characteristic is not \(p\)'', for all primes \(p\),
  3. the sentence expressing ''all polynomials of degree at most \(d\) have a root'', for every \(d \geq 1\).
The last point is the trickiest of them all, but it is actually very natural: for example, for \(d=2\), it is simply \[ \forall X_1 \forall X_2 \forall X_3 \exists Y (X_1Y^2+X_2Y+X_3=0), \] which we have already seen before. As the degree goes up, it becomes a long list of quantifiers, but they come with a reasonable interpretation in plain English, so it is not too bad. We can then define \(\mathrm{ACF}_p\) to be the same set of sentences, but stopping the list in 2. at the prime before \(p\), and then adding \(p=0\).
Briefly going back to this idea of tame mathematics, the theory \(\mathrm{ACF}_0\) is one of the tamest theories we know. For us today, the crucial fact -- which is not at all obvious -- is that \(\mathrm{ACF}_0\) and \(\mathrm{ACF}_p\) are complete. Without defining what ''complete'' is, I want to highlight a crucial consequence of it: given any sentence \(\varphi\), if \(\varphi\) is true in \(\mathbb{C}\) (or in the algebraic closure \(\overline{\mathbb{F}_p}\) of a field with \(p\) elements), then there is a proof of \(\varphi\) which only uses finitely many sentences from \(\mathrm{ACF}_0\) (or \(\mathrm{ACF}_p\)).

Between characteristic \(0\) and characteristic \(p\)

This fact has a fundamental consequence. Let us take a sentence \(\varphi\). If it is true in \(\mathbb{C}\), then there is a proof of it that uses only finitely many sentences from \(\mathrm{ACF}_0\). Note that \(\mathrm{ACF}_0\) and \(\mathrm{ACF}_p\) differ only in the sentences about the characteristic. This means that, in particular, only finitely many of the sentences \(\neg(p=0)\) are used in this proof, and hence, if we choose \(p'\) big enough, we can run the same proof inside of \(\overline{\mathbb{F}_{p'}}\). In other words, if \(\varphi\) is true in \(\mathbb{C}\), then it must be true in \(\overline{\mathbb{F}_p}\), for \(p\) big enough.
This is a precise example of an overarching philosophy in algebra, number theory, and algebraic geometry: phenomena observed in characteristic \(0\) should also be observed in characteristic \(p\), for \(p\) big enough; and viceversa. As an example, we will prove the Ax-Grothendieck theorem.

Theorem. Let \(n \geq 0\) and suppose \(f \colon \mathbb{C}^n \to \mathbb{C}^n\) is a function such that, for some polynomials \(f_1, \dots f_n \in \mathbb{C}[X_1, \dots X_n]\), we have for all \(\overline{x} \in \mathbb{C}^n\), \[ f(\overline{x}) = (f_1(\overline{x}), \dots f_n(\overline{x})). \] If \(f\) is injective, it is also surjective.
Proof. We aim to use what we have just observed and reduce the question from \(\mathbb{C}\) to \(\overline{\mathbb{F}_p}\). To do that, for any \(n \geq 0\) and \(d \geq 0\) we build a sentence \(\Sigma_{n,d}\) such that \(\Sigma_{n,d}\) holds if and only if for every polynomial function \(f\) in dimension \(n\) given by polynomials of degree at most \(d\), if \(f\) is injective then it is surjective. Writing out each \(\Sigma_{n,d}\) is a bit tedious, so let's see what \(\Sigma_{1,1}\) looks like: \[ \underbrace{\forall X_1\forall X_2}_{\text{coefficients}} \\ (\underbrace{\forall Y \forall Z((X_1Y+X_2 = X_1Z + X_2) \to (Y = Z))}_{f \; \text{is injective}} \to \underbrace{\forall W \exists U(X_1U+X_2 = W)}_{f \; \text{is surjective}}). \] As \(n\) and \(d\) grow, the sentences get lengthier, but the pattern is the same.
Now, let us assume that the theorem is false. This means that we can find a counterexample, so for some dimension \(m > 0\) and degree \(g > 0\) we have that \(\Sigma_{m,g}\) is false in \(\mathbb{C}\). Indeed, this means that \(\neg\Sigma_{m,g}\) is true in \(\mathbb{C}\), so by our remark above, it must be true in \(\overline{\mathbb{F}_p}\), for \(p\) big enough. So once again, this means that \(\Sigma_{m,g}\) must be false in \(\overline{\mathbb{F}_p}\), for \(p\) big enough -- in other words, from a counterexample to the theorem in \(\mathbb{C}\), we obtain that there must be one in \(\overline{\mathbb{F}_p}\), for \(p\) big enough.
We now argue that this cannot be, because the theorem is true if we substitute \(\mathbb{C}\) with \(\overline{\mathbb{F}_p}\), for any \(p\). First of all, it is true if we substitute \(\mathbb{C}\) with \(\mathbb{F}_q\), for \(q\) some prime power: indeed, we then have that \(f\colon \mathbb{F}_q^n \to \mathbb{F}_q^n\) is an injective map between finite sets of the same cardinality, so it must be surjective. We now have to remember something about \(\overline{\mathbb{F}_p}\): it can be described as \[ \overline{\mathbb{F}_p} = \bigcup_{n \geq 1} \mathbb{F}_{p^n}. \] Then, if \(f\colon \overline{\mathbb{F}_p}^n \to \overline{\mathbb{F}_p}^n\) is given by polynomials over \(\overline{\mathbb{F}_p}\), there is \(m\) big enough so that the coefficients of these polynomials actually come from \(\mathbb{F}_{p^m}\). Suppose \(f\) is injective and we want to prove it is surjective: we then take some \(\overline{b} \in \overline{\mathbb{F}_p}^n\) and look for \(\overline{a} \in \overline{\mathbb{F}_p}^n\) such that \(f(\overline{a}) = \overline{b}\). But once again, there is \(\ell\) big enough so that \(\overline{b} \in \mathbb{F}_{p^\ell}^n\).
Pick now your favourite \(k \geq m, l\). We can see \(f\) as a map \(\mathbb{F}_{p^k}^n \to \mathbb{F}_{p^k}^n\), with \(\overline{b}\) in its codomain. We are back to the beginning of our argument: \(f\vert_{\mathbb{F}_{p^k}^n}\) must be surjective, and so there is \(\overline{a} \in \mathbb{F}_{p^k}^n\) such that \(f(\overline{a}) = \overline{b}\). Since \(\overline{a} \in \overline{\mathbb F_{p}}^n\), we have found our desired preimage.
We have then shown that the theorem holds in \(\overline{\mathbb{F}_p}\), so each \(\Sigma_{n,d}\) must be true in \(\overline{\mathbb{F}_p}\), against our assumption that \(\Sigma_{m,g}\) is false in \(\overline{\mathbb{F}_p}\). \(\square\)