Random Permutations

Representation theory, for the impatient

2016-02-23T23:21:00.000+00:00

$\def\C{\mathbf{C}}$

Let $G$ be a finite group. The group algebra $\C G$ is the complex algebra with basis $G$, with multiplication defined by linearly extending the group law of $G$. The description of $\C G$ as an algebra is called representation theory. Relatedly, we want to understand the structure of $\C G$-modules. We often call $\C G$-modules (complex) representations of $G$, and simple $\C G$-modules irreducible representations.

Lemma: (Schur's lemma) If $V$ and $V'$ are simple $\C G$-modules, then every nonzero homomorphism $V\to V'$ is an isomorphism. Moreover every homomorphism $V\to V$ is a scalar multiple of the identity.

Proof: Suppose that $f:V\to V'$ is a nonzero homomorphism. Then $\ker f$ is a submodule, so $\ker f = 0$, and similarly $f(V) = V'$, so $f$ is an isomorphism. Now suppose $V=V'$. Since every eigenspace of $f$ is a submodule, by simplicity every eigenspace is either $0$ or $V$. From the spectral theorem we deduce that $f$ is a scalar multiple of the identity.

It turns out that we can ask for a unitary structure in $\C G$-modules for free.

Lemma: (Weyl's averaging trick) Every $\C G$-module $V$ has a $G$-invariant inner product. Moreover if $V$ is simple then this inner product is unique up to scaling.

Proof: Let $V$ be a $\C G$-module and let $(,)$ be any inner product on $V$. Define a new inner product $\langle,\rangle$ by

$$\langle u, v\rangle = \frac{1}{|G|} \sum_{g\in G} (g\cdot u, g\cdot v).$$

Clearly $\langle,\rangle$ has the desired properties. Now suppose that $V$ is simple, and that $\langle,\rangle'$ is another $G$-invariant inner product. Let $f:V\to V$ be the adjoint of the formal identity

$$(V,\langle,\rangle)\to(V,\langle,\rangle').$$

In other words let $f$ be the unique function $V\to V$ satisfying

$$\langle u,v\rangle' = \langle u, f(v)\rangle$$

for all $u,v\in V$. Then $f$ is a homomorphism, so by Schur's lemma it must be a multiple of the identity, so $\langle,\rangle'$ must be a multiple of $\langle,\rangle$.

Lemma: Let $U$ be a finite-dimensional $\C G$-module with $G$-invariant inner product $\langle,\rangle$, and for each simple $\C G$-module $V$ let $U_V$ be the sum of all submodules of $U$ isomorphic to $V$ (the isotypic component of $U$ corresponding to $V$). Then $U_V\cong V^{\oplus m}$ for some $m$, and $U$ is the orthogonal direct sum of the submodules $U_V$.

Proof: Since the orthogonal complement of a submodule is a submodule, it follows by induction on dimension that $U$ is an orthogonal direct sum of simple submodules, so $U = \bigoplus U'_V$, where $V$ runs over simple $\C G$-modules and each $U'_V \cong V^{\oplus m}$ for some $m\geq 0$. Moreover by Schur's lemma any two nonisomorphic simple submodules must be orthogonal, as the orthogonal projection from one to the other is a homomorphism, so every submodule of $U$ isomorphic to $V$ must be contained in $U'_V$, so $U'_V=U_V$.

The above lemma is particularly interesting when applied to the $\C G$-module $\C G$ itself, the regular representation. We give $\C G$ a $G$-invariant inner product by declaring the basis $G$ to be orthonormal. From the above lemma we know then that $\C G = \bigoplus \C G_V$, where $V$ runs over simple $\C G$-modules, and $\C G_V\cong V^{\oplus m}$ for some $m\geq 0$. Note then by Schur's lemma that $\dim\text{Hom}(\C G,V)=m$. On the other hand every homomorphism $\C G\to V$ is determined uniquely by the destination of the unit $1$ in $V$, so $\dim\text{Hom}(\C G,V)=\dim V$. We deduce that as $\C G$-modules

$$\C G \cong \bigoplus V^{\oplus \dim V}.$$

In particular there are only finitely many simple $\C G$-modules, and their dimensions obey

$$|G| = \sum (\dim V)^2.$$

One can also prove $\C G_V\cong V^{\oplus\dim V}$ in a more informative way, as follows. Fix an invariant inner product $\langle,\rangle_V$ on $V$, and consider any homomorphism $f:V\to\C G$. The adjoint $f^*:\C G\to V$ is also a homomorphism, so

$$ f(v) = \sum_{g\in G}\langle f(v),g\rangle_{\C G} g = \sum_{g\in G}\langle v, f^*(g)\rangle_V g = \sum_{g\in G}\langle v,g f^*(1)\rangle_V g. $$

Conversely for any $u\in V$ we may define a homomorphism $V\to\C G$ by

$$ f_{V,u}(v) = \sum_{g\in G} \langle v,g\cdot u\rangle_V g.$$

We deduce therefore that $\C G_V$ is the subspace of $\C G$ spanned by the elements $f_{V,u}(v)$, where $u,v\in V$. Moreover using Schur's lemma one can show that the images of $f_{V,u}$ and $f_{V,u'}$ are orthogonal whenever $u$ and $u'$ are orthogonal in $V$, so by letting $u$ range over a basis of $V$ we thus see that $\C G_V$ is the orthogonal direct sum of $\dim V$ copies of $V$.

In any case we now understand the structure of $\C G$ as a $\C G$-module, and we are only a short step away from understanding its structure as an algebra. Consider the obvious map

$$\C G \longrightarrow \bigoplus \text{End}(V),$$

where as always the sum runs over a complete set of irreducible representations up to isomorphism. We claim this map is an isomorphism. Since we already know the dimensions agree, it suffices to prove injectivity. Thus suppose $x\in\C G$ maps to zero, i.e., that $x$ acts trivially on each simple $\C G$-module $V$. Then $x$ acts trivially on $\C G$, so $x = x1 = 0$. Thus we have proved the following theorem.

Theorem: As complex algebras, $\C G \cong \bigoplus \text{End}(V)$.

$\def\tr{\text{tr}}$

Finally, it is useful to understand how to project onto isotypic components. Given $g\in G$, we can compute the trace of $g$ as an operator on $\C G$ in two different ways. On the one hand, by looking at the basis $G$,

$$ \tr_{\C G} (g) = \begin{cases} |G| & \text{if }g=1,\\ 0 &\text{if }g\neq 1.\end{cases}$$

On the other hand from the decomposition $\C G\cong\bigoplus V^{\oplus\dim V}$ we have

$$ \tr_{\C G} (g) = \sum_V (\dim V) \tr_V(g).$$

As a consequence, for every $x\in \C G$ we have

$$ x = \sum_V P_V x, $$

where $P_V : \C G \to \C G$ is the operator defined by

$$ P_V x = \frac{\dim V}{|G|} \sum_{g\in G} \tr_V(xg^{-1}) g = \frac{\dim V}{|G|} \sum_{g\in G} \tr_V(g^{-1}) gx.$$

These identities are most easily verified first for $x\in G$, then extending to all of $\C G$ by linearity. Now if $x \in \C G_U$ for $U\neq V$ then $x$ acts as zero on $V$, so $\tr_V(x g^{-1}) = 0$, so $P_V x = 0$. On the other hand one can verify directly that $P_V$ is a homomorphism, so by Schur's lemma the image of $P_V$ must be contained in $\C G_V$. We deduce therefore from $x = \sum_V P_V x$ that $P_V$ is the orthogonal projection onto $\C G$.

The function $\chi_V(g) = \tr_V(g)$ is usually called the character of $V$. From the relations $P_V^2 = P_V$ and $P_U P_V = 0$ for $U\neq V$ one can deduce the well known orthogonality relations for characters. In fact the distinction between $P_V$ and $\chi_V$ is hardly more than notational. Often we identify functions $f:G\to \C$ with elements $\sum_{g\in G} f(g) g \in \C G$, in which case the operation of convolution corresponds to multiplication in the group algebra. The operator $P_V$ then is just convolution with $(\dim V)\chi_V$. So, in brief, to project onto the $V$-isotypic component you convolve with the character of $V$ and multiply by $\dim V$.

We have kept to almost the bare minimum in the above discussion: the complex numbers $\C$ and finite groups $G$. There are a number of directions we could try to move in. We could replace $\C$ with a different field, say one which is not algebraically closed, or one which has positive characteristic. Alternatively we could replace $G$ with an infinite group, say with a locally compact topology. We mention two such generalisations.

Theorem: (Artin--Wedderburn) Every semisimple ring is isomorphic to a product $\prod_{i=1}^k M_{n_i}(D_i)$ of matrix rings, where the $n_i$ are integers and the $D_i$ are division rings. In particular every semisimple $\C$-algebra is isomorphic to a product $\prod_{i=1}^k M_{n_i}(\C)$.

When defining unitary representations for compact groups $G$ we demand that the map $G\to U(V)$ be continuous, where $U(V)$ is given the strong operator topology.$\def\HS{\text{HS}}$

Theorem: (Peter--Weyl) Let $G$ be a compact group and $\mu$ its normalised Haar measure. Let $\widehat{G}$ be the set of all irreducible unitary representations of $G$ up to isomorphism. Then $\widehat{G}$ is countable, every $V\in\widehat{G}$ is finite-dimensional, and the algebra $L^2(G)$ of square-integrable functions with the operation of convolution decomposes as a Hilbert algebra as $$ L^2(G) \cong \bigoplus_{V\in\widehat{G}} (\dim V) \cdot \HS(V), $$ where $\HS(V)$ is the space $\text{End}(V)$ together with the Hilbert-Schmidt inner product.

Group limits

2015-02-09T23:50:00.000+00:00

A few months ago on his blog Terry Tao explained how one could, by analogy with the theory of graph limits, replace the explicit use of arithmetic regularity with a soft device which he calls an additive limit. Roughly speaking, the additive limit, or Kronecker factor, of a sequence of finite abelian groups $G_i$ is a compact quotient of the ultraproduct $\prod G_i$ which controls the convolutions. The result is that many theorems from additive combinatorics, such as Roth's theorem, which are usually proved using quantitative tools like Fourier analysis can instead be proved using soft tools more like the Lebesgue differentiation theorem.

The purpose of this post is to extend Tao's construction to the nonabelian setting. Tao already stated in his post that this should be possible, so one could say that this is just an exercise in nonabelian Fourier analysis. On the other hand the proof in the nonabelian setting more or less forces a more categorical point of view, so certain points of this exercise are instructive.

1. Measurable Bohr compactification

$\def\B{\text{Bohr}}\def\Ba{\text{Baire}}$Given any topological group $G$ there is a compact group $\B(G)$, called the Bohr compactification of $G$, such every continuous homomorphism from $G$ to a compact group $K$ factors uniquely through $\B(G)$. One can think of $\B(G)$ as the 'largest' compact group in which $G$ has dense image. We need a variant of this definition for groups $G$ endowed only with a $\sigma$-algebra instead of a topology.

By a measurable group we mean a group $G$ together with a $\sigma$-algebra $\Sigma$ of subsets of $G$. Note that we do not make any measurability assumptions about the group operation or even the left- or right-shifts, though certainly it would be sensible to do so in other contexts. For us the only role of $\Sigma$ is to distinguish among all homomorphisms the measurable homomorphisms. The analogue of Bohr compactification for measurable groups is given by the following theorem.

Theorem 1 (Existence of measurable Bohr compactification): For every measurable group $G$ there is a compact group $\B_m(G)$ together with a $(\Sigma,\Ba)$-measurable homomorphism $\pi:G\to \B_m(G)$ such that every $(\Sigma,\Ba)$-measurable homomorphism from $G$ to a compact group $K$ factors uniquely as the composition of $\pi:G\to \B_m(G)$ and a continuous homomorphism $\B_m(G)\to K$.

In category-theoretic terms $\B_m$ is a left adjoint to the functor $\Ba$ from compact groups to measurable groups which replaces a group's topology with its Baire $\sigma$-algebra. By the adjoint functor theorem it suffices to check that the functor $\Ba$ is continuous, which boils down to the following lemma. Incidentally the analogue of this lemma fails for the Borel $\sigma$-algebra, which is why we must consider the Baire $\sigma$-algebra instead.

Lemma 2: For compact Hausdorff spaces $X_i$ we have $\Ba(\prod X_i) = \prod \Ba(X_i)$. In words, the Baire $\sigma$-algebra of the product is the product of the Baire $\sigma$-algebras.

Proof: The containment $\prod \Ba(X_i) \subset \Ba(\prod X_i)$ is immediate from the defintions. To prove the opposite containment it suffices to check that every continuous function $f:\prod X_i \to \mathbf{R}$ is measurable with respect to $\prod\Ba(X_i)$. This is certainly true of functions $f$ which depend on only finitely many coordinates, and thus for all continuous functions $f$ by the Stone-Weierstrass theorem.$\square$

Those who, like me, are not used to thinking in terms of the adjoint functor theorem will appreciate a more pedestrian proof of Theorem 1. To this end, let $\mathcal{K}$ be the set of all pairs $(f,K)$, where $K$ is a compact group and $f:G\to K$ is a $(\Sigma,\Ba)$-measurable homomorphism with $f(G)$ dense in $K$. Then if$$\pi:G\to\prod_{(f,K)\in\mathcal{K}} K$$ is the diagonal map then $\pi:G\to\overline{\pi(G)}$ is the Bohr compactification of $G$. Indeed, the measurability of $\pi$ follows from Lemma 2, and the universal property is essentially obvious: given a $(\Sigma,\Ba)$-measurable homomorphism $f:G\to K$ with $K$ compact, the pair $(f,\overline{f(G)})$ appears in $\mathcal{K}$, so we have continuous maps$$\overline{\pi(G)}\to\prod_{(g,L)\in\mathcal{K}} L \to \overline{f(G)}\to K$$whose composition $h$ satisfies $h\pi = f$; moreover $h$ is unique because $\pi(G)$ is dense in $\overline{\pi(G)}$.

Alternatively, by the Peter-Weyl theorem, $\B_m(G)$ can be defined as the inverse limit of all measurable finite-dimensional unitary representations of $G$.

It is natural to ask what relation the measurable Bohr compactification $\B_m$ bears to the usual Bohr compactification $\B$. In particular, is $\B$ just the composition $\B_m\circ\Ba$? Clearly $\B(G) = \B_m(\Ba(G))$ if and only if every measurable homomorphism $f:G\to K$ from $G$ to a compact group $K$ is continuous. This follows from Steinhaus's theorem if $G$ is locally compact, but it certainly fails in general, for instance for $G=\mathbf{Q}$ with the topology inherited from $\mathbf{R}$.

2. The Bohr compactification of an ultrafinite group

Let $G_1,G_2,\dots$ be a sequence of finite groups and let $p\in \beta\mathbf{N} \setminus\mathbf{N}$ be a nonprincipal ultrafilter. We form the ultraproduct $G=\prod_{n\to p} G_n$ and make it into a measurable group by giving it the Loeb $\sigma$-algebra $\mathcal{L}_G$, the $\sigma$-algebra generated by internal sets $\prod_{n\to p} A_n$, where $A_n\subset G_n$. We define the Loeb measure $\mu_G$ on internal sets $\prod_{n\to p}A_n$ by putting$$\mu_G\left(\prod_{n\to p}A_n\right)=\text{st}\lim_{n\to p} |A_n|/|G_n|,$$and we define $\mu_G$ on $\mathcal{L}_G$ by extension.

While the group operation $G\times G\to G$ is not generally measurable with respect to the product $\sigma$-algebra $\mathcal{L}_G\times\mathcal{L}_G$, it is measurable with respect to the larger $\sigma$-algebra $\mathcal{L}_{G\times G}$. Moreover this latter $\sigma$-algebra is still 'product-like' in the sense that all $\mathcal{L}_{G\times G}$-measurable $f:G\times G\to\mathbf{R}_{\geq0}$ obey Fubini's theorem, so we have a sensibly defined convolution operation $L^1(G)\times L^1(G)\to L^1(G)$ given $$f*g(x) = \int f(y)g(y^{-1}x)\,d\mu_G(y).$$

Now consider the Bohr compactification $\B_m(G)$ of $G$. The first thing to notice is that $\pi_*\mu_G$ is a $\pi(G)$-invariant Baire probability measure on $\B_m(G)$. Since $\pi(G)$ is dense in $\B_m(G)$ we conclude that $\pi_*\mu_G$ is in fact $\B_m(G)$-invariant, so by the uniqueness of Haar measure we must have$$\pi_*\mu_G=\mu_{\B_m(G)}.$$

In the remainder of this section we relate the two convolution algebras $L^2(G)$ and $L^2(\B_m(G))$. Given $f\in L^2(\B_m(G))$ we can form the pullback $\pi^*f = f\circ\pi$. Since$$ \|f\circ\pi\|_{L^2(G)}^2 = \int |f\circ\pi|^2\,d\mu_G = \int |f|^2 \,d\mu_{\B_m(G)} = \|f\|_{L^2(\B_m(G))}^2,$$we see that $\pi^*f$ is a well defined element of $L^2(G)$, and in fact $\pi^*$ defines an isometric embedding$$\pi^*:L^2(\B_m(G))\to L^2(G).$$In the other direction we have the pushforward$$\pi_*:L^2(G)\to L^2(\B_m(G)),$$ defined as the adjoint of $\pi_*$. The identities $$\pi_*\pi^*f=f\quad\text{for }f\in L^2(\B_m(G)),$$$$\pi^*(f*g)=\pi^*f*\pi^*f\quad\text{for }f,g\in L^2(\B_m(G)),$$$$\pi_*(f*g)=\pi_*f*\pi_*g\quad\text{for }f,g\in L^2(G)$$are readily verified. For instance, the last of these is verified by the following computation, valid for $h\in L^2(\B_m(G))$ by $\mathcal{L}_{G\times G}$-Fubini:$$\langle h, \pi_*(f*g)\rangle = \langle \pi^* h, f*g\rangle = \int h(\pi(xy)) f(x)g(y)\,d\mu_G(x)\,d\mu_G(y)$$$$=\int h(x'y') \pi_*f(x')\pi_*g(y')\,d\mu_{\B_m(G)}(x')\,d\mu_{\B_m(G)}(y') = \langle h,\pi_*f*\pi_*g\rangle.$$We can summarise the situation as an isometric Banach algebra isomorphism$$L^2(G) \cong L^2(\B_m(G))\oplus \ker\pi_*.$$

The following theorem asserts that in fact $\B_m(G)$ alone determines convolutions in $G$, and thus $\B_m(G)$ will more generally control all 'first-order configurations' in $G$.

Theorem 3: We have $(\ker\pi_*)*L^2(G)=0$. Thus all convolutions in $L^2(G)$ can be computed in $L^2(\B_m(G))$, in the sense that$$f*g=\pi^*(\pi_*f*\pi_*g)$$for all $f,g\in L^2(G)$.

The theorem follows from the following lemma.

Lemma 4: For all $f,g\in L^2(G)$ and $d\in\mathbf{R}$ we have$$\|f*g\|^2_{L^2(G)} \leq \left(\frac{1}{d}\|f\|_{L^2(G)}^2 + d\|\pi_*f\|_{L^2(\B_m(G))}^2\right)\|g\|_{L^2(G)}^2.$$In particular by optimising $d$ we have$$\|f*g\|^2_{L^2(G)} \leq 2\|\pi_*f\|_{L^2(\B_m(G))}\|f\|_{L^2(G)}\|g\|_{L^2(G)}^2.$$

$\def\st{\text{st}}$Proof: We borrow nonabelian harmonic analysis notation from Tao. Certainly we may assume that $f$ and $g$ are internal, say $f=\st\lim_{n\to p}f_n$ and $g=\st\lim_{n\to p}g_n$. Then by nonabelian Plancherel,$$\|f*g\|_{L^2(G)}^2 = \st\lim_{n\to p} \|f_n*g_n\|_{L^2(G_n)}^2= \st\lim_{n\to p} \sum_{\xi\in\widehat{G_n}} \dim V_\xi \|\hat{f_n}(\xi)\hat{g_n}(\xi)\|^2_{\text{HS}(V_\xi)}$$$$\leq\st\lim_{n\to p} \sup_\xi\|\hat{f_n}(\xi)\|^2_{\text{HS}(V_\xi)} \|g_n\|_{L^2(G_n)}^2 = \sup_\xi\st\lim_{n\to p}\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} \|g\|_{L^2(G)}^2,$$where in the last line the supremum is taken over all $\xi=(\xi_n)$. Fixing some such $\xi$, by Plancherel again$$\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} \leq \frac{1}{\dim V_{\xi_n}} \|f_n\|_{L^2(G_n)}^2,$$so we may assume that $\dim V_{\xi_n}\leq d$ for $p$-most $n$. But then the representations $\rho_{\xi_n}:G_n\to U(d)$ induce a measurable representation $\rho_\xi:G\to U(d)$, which in turn by the universal property of $\B_m(G)$ factors through a continuous representation $\rho'_\xi:\B_m(G)\to U(d)$. Thus$$\st\lim_{n\to p}\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} = \st\lim_{n\to p} \int f_n(x)f_n(y) \text{tr}\rho_{\xi_n}(xy^{-1})\,d\mu_{G_n}(x)\,d\mu_{G_n}(y)$$$$= \int f(x) f(y) \text{tr}\rho_\xi(xy^{-1})\,d\mu_G(x)\,d\mu_G(y) = \int f(x) f(y) \text{tr}\rho'_\xi(\pi(xy^{-1}))\,d\mu_G(x)\,d\mu_G(y)$$$$= \int \pi_*f(x') \pi_*f(y') \text{tr}\rho'_\xi(x'y'^{-1})\,d\mu_{\B_m(G)}(x)\,d\mu_{\B_m(G)}(y) \leq d \|\pi_*f\|_{L^1(\B_m(G))}^2\leq d\|\pi_*f\|_{L^2(\B_m(G))}^2.\square$$

3. Quasirandomness

From an additive combinatorics point of view, nonabelian groups obey a structure versus randomness principle: the asymptotic behaviour with respect to linear configurations can usually be described as some combination of abelian and random-like behaviour. Following Gowers, we call a sequence of finite groups $(G_n)$ quasirandom if the least dimension of a nontrivial representation of $G_n$ tends to infinity with $n$. For example for $n\geq 7$ every nontrivial representation of the alternating group $\text{Alt}(n)$ has dimension at least $n-1$, so the sequence $(\text{Alt}(n))$ is quasirandom.

Theorem 5: The ultrafinite group $G_p=\prod_{n\to p} G_n$ has trivial Bohr compactification if and only if the least dimension of a nontrivial representation of $G_n$ tends to infinity as $n\to p$. In particular, $(G_n)$ is quasirandom if and only if $\B_m(G_p)$ is trivial for every $p\in\beta\mathbf{N}\setminus\mathbf{N}$.

First we need a simple lemma.

Lemma 6: Let $G$ and $H$ be groups with $G$ finite and $f:G\to H$ a map which satisfies $f(xy)=f(x)f(y)$ for $1-o(1)$ of the pairs $(x,y)\in G^2$. Then there is a homomorphism $h:G\to H$ such that $f(x)=h(x)$ for $1-o(1)$ of the points $x\in G$.

Proof: For every $x\in G$ and for $1-o(1)$ of the pairs $(y,z)\in G^2$ we have$$f(xyz)f(yz)^{-1} = f(xy)f(z)(f(y)f(z))^{-1} = f(xy)f(y)^{-1},$$so for each $x\in G$ there is a unique $h(x)\in H$ such that$$h(x) = f(xy)f(y)^{-1}$$ for $1-o(1)$ of the points $y\in G$. Clearly $h(x)=f(x)$ for $1-o(1)$ of the points $x\in G$, and for $x_1,x_2\in G$ we have$$h(x_1)h(x_2)^{-1} = f(x_1y)f(y)^{-1}f(y)f(x_2 y)^{-1} = f(x_1y)f(x_2y)^{-1}=h(x_1x_2^{-1})$$for $1-o(1)$ of the points $y\in G$, in particular for at least one $y\in G$, so $h$ is a homomorphism.$\square$

Proof of Theorem 5: Suppose we have a sequence of nontrivial homomorphisms $f_n:G_n\to U(d)$ for all $n$ on some neighbourhood of $p$. Then $(f_n)$ induces a measurable homomorphism $f:G_p\to U(d)$, and since $U(d)$ has no small subgroups the induced homomorphism $f:G_p\to U(d)$ will also be nontrivial, so $\B_m(G_p)$ must be nontrivial.

Conversely suppose the least dimension of a nontrivial representation of $(G_n)$ tends to infinity as $n$ tends to $p$, and let $f:G_p\to U(d)$ be a measurable homomorphism. By a countable saturation argument there is an internal function $g:G_p\to U(d)$, say $g=\st\lim_{n\to p}g_n$, such that $f=g$ almost everywhere. Then $g_n$ satisfies $g_n(xy)=g_n(x)g_n(y)$ for $1-o(1)$ of the pairs $(x,y)\in G_n$, so by the lemma there is a homomorphism $h_n:G_n\to U(d)$ such that $g_n(x) = h_n(x)$ for $1-o(1)$ of the points $x\in G_n$. But by assumption any such homomorphism $h_n$ must be trivial for $n$ near enough to $p$, so we must have $g=1$ almost everywhere, so $f=1$ almost everywhere. Moreover since $f(x) = f(xy)f(y)^{-1}$ we must in fact have $f=1$ identically. Since the Peter-Weyl theorem implies that $\B_m(G_p)$ is an inverse limit of matrix groups this implies that $\B_m(G_p)$ is trivial.$\square$

Equations are generally easy to solve in quasirandom groups. We illustrate this point with the following theorem.

Theorem 7: Let $(G_n)$ be quasirandom and let $\epsilon>0$. Then there exists $n_0$ such that if $n\geq n_0$ and $A_n\subset G_n$ has density $|A_n|/|G_n|\geq\epsilon>0$ then we can find $x,y,z\in A_n$ with $xy=z$.

Proof: Let $p\in\beta\mathbf{N}\setminus\mathbf{N}$, let $G=\prod_{n\to p}G_n$, and let $f$ be the internal function $\st\lim_{n\to p} 1_{A_n}$. Then $\int f\,d\mu_G \geq \epsilon$, so if $\pi:G\to\B_m(G)$ is the Bohr compactification then $\int\pi_* f\,d\mu_{\B_m(G)}\geq\epsilon$. But by the previous theorem $\B_m(G)$ is trivial, so $\pi_* f$ is a constant $\geq\epsilon$, so by Theorem 3 we have$$\langle f*f,f\rangle_{L^2(G)} = \langle \pi^*(\pi_*f*\pi_*f),f\rangle_{L^2(G)} = \langle \pi_*f*\pi_*f,\pi_*f\rangle_{L^2(\B_m(G))} \geq \epsilon^3.$$In other words the number of pairs $(x,y)\in G_n^2$ such that $x,y,xy\in A_n$ is at least $(\epsilon^3-o(1))|G_n|$ as $n\to p$, but since $p$ was arbitrary this must hold as $n\to\infty$.$\square$

Here is another nice criterion for quasirandomness, which can be found in Gowers's original paper: $(G_n)$ is not quasirandom if and only if the groups $G_n$ have nontrivial abelian quotients or nontrivial small quotients. In our setup we can write this the following way.

Theorem 8: Let $G$ be an ultrafinite group. Then one of the following three alternatives hold:
1. $\B_m(G)$ is trivial.
2. $\B_m(G)$ has a nontrivial abelian quotient.
3. $\B_m(G)$ has a nontrivial finite quotient.

Proof: Suppose $G=\prod_{n\to p} G_n$. By the previous theorem if $\B_m(G)$ is nontrivial then the groups $G_n$ have bounded-dimensional nontrivial representations $\pi_n:G_n\to U(d)$ as $n\to p$. By Jordan's theorem, $\pi_n(G_n)$ has a normal abelian subgroup $A_n$ of bounded index. If $A_n=\pi_n(G_n)$ as $n\to p$ then 2 holds, while if $A_n<\pi_n(G_n)$ as $n\to p$ then 3 holds.$\square$

Using this theorem one can prove a sort of converse to Theorem 7. If $(G_n)$ is not quasirandom then there are arbitrarily large $n$ and product-free subsets $A_n\subset G_n$ of density bounded away from $0$. We leave the details to the reader.

4. Roth's theorem

As an application of group limits we can prove the following version of Roth's theorem.

Theorem 9: Let $G$ be a finite group on which the squaring map $s:y\mapsto y^2$ is $O(1)$-to-$1$. Let $A\subset G$ be a subset of density $|A|/|G|\geq\epsilon>0$. Then there are $\gtrsim_\epsilon |G|^2$ solutions to $y^2=xz$ in $A$. Equivalently, there are $\gtrsim_\epsilon|G|^2$ pairs $(a,b)\in G^2$ such that $a,ab,bab\in A$.

Proof: If the theorem fails then we have finite groups $G_n$, some $\epsilon>0$, and subsets $A_n\subset G_n$ of density $|A_n|/|G_n|\geq\epsilon$ for which there are fewer than $|G_n|^2/n$ pairs $(a,b)\in G_n^2$ such that $a,ab,bab\in A_n$. Let $G=\prod_{n\to p} G_n$ and $f = \st\lim_{n\to p} 1_{A_n}$. Then $\int_G f \,d\mu_G \geq \epsilon$ but$$\int_{G^2} f(a)f(ab)f(bab)\,d\mu_G^2 = 0.(*)$$But note$$\int_{G^2} f(a)f(ab)f(bab)\,d\mu_G^2 = \int_{G^2} f(x) f(y) f(x^{-1}y^2)\,d\mu_G^2= \langle f,(f*f)\circ s\rangle_{L^2(G)},$$and by Theorem 3 this becomes$$\langle f,\pi^*(\pi_*f*\pi_*f)\circ s\rangle_{L^2(G)} = \langle f, \pi^*((\pi_*f*\pi_*f)\circ s)\rangle_{L^2(G)}$$$$= \langle \pi_*f,(\pi_*f*\pi_*f)\circ s\rangle_{L^2(B)} = \int_{B^2} \pi_*f(a)\,\pi_*f(ab)\,\pi_*f(bab)\,d\mu_B^2,(**)$$where $B=\B_m(G)$.

Fix a small $\delta>0$, and let $g:B\to[0,1]$ be a continuous function such that $\|\pi_*f-g\|_{L^1(G)} \leq \delta$. Since a continuous function on a compact space is uniformly continuous we can find a neighbourhood $U$ of $1$ such that$$|g(x)-g(ux)|\leq \delta\quad\text{and}\quad|g(x)-g(xu)|\leq\delta$$ for all $x\in B$ and $u\in U$. Now note$$\int_B\int_{xU\times U} |\pi_*f(a)-g(a)| \,d\mu_B(a)d\mu_B(b)d\mu_B(x) = \mu_B(U)^2\|\pi_*f-g\|_{L^1(B)} \leq \mu_B(U)^2\delta,$$$$\int_B\int_{xU\times U} |\pi_*f(ab)-g(ab)| \,d\mu_B(a)d\mu_B(b)d\mu_B(x) = \mu_B(U)^2\|\pi_*f-g\|_{L^1(B)} \leq \mu_B(U)^2\delta,$$$$\int_B\int_{xU\times U} |\pi_*f(bab)-g(bab)| \,d\mu_B(a)d\mu_B(b)d\mu_B(x) = \mu_B(U)^2\|\pi_*f-g\|_{L^1(B)} \leq \mu_B(U)^2\delta,$$so if $R$ is the set of $x\in B$ such that$$\int_{xU\times U}|\pi_*f(a)-g(a)|\,d\mu_B(a)d\mu_B(b) \leq \delta^{1/2}\mu_B(U)^2,$$$$\int_{xU\times U}|\pi_*f(ab)-g(ab)|\,d\mu_B(a)d\mu_B(b) \leq \delta^{1/2}\mu_B(U)^2,$$$$\int_{xU\times U}|\pi_*f(bab)-g(bab)|\,d\mu_B(a)d\mu_B(b) \leq \delta^{1/2}\mu_B(U)^2,$$then $\mu_B(R^c)\leq 3\delta^{1/2}$. Thus$$\int_R g(x)\,d\mu_B(x) \geq \int_B g(x)\,d\mu_B(x) - 3\delta^{1/2}\geq \epsilon-4\delta^{1/2},$$
so there is some $x\in R$ such that$$g(x)\geq\epsilon-4\delta^{1/2}.$$Now using all our information about $x$ we can bound $(**)$ below:$$\int \pi_*f(a)\pi_*f(ab)\pi_*f(bab) \,d\mu_B(a)d\mu_B(b)\geq \int_{xU\times U} \pi_*f(a)\pi_*f(ab)\pi_*f(bab)\,d\mu_B(a)d\mu_B(b)$$$$\geq \int_{xU\times U} g(a)g(ab)g(bab)\,d\mu_B(a)d\mu_B(b) - 3\delta^{1/2}\mu_B(U)^2$$$$\geq \int_{xU\times U} g(x)^3\,d\mu_B(a)d\mu_B(b) - 6\delta\mu_B(U)^2 - 3\delta^{1/2}\mu_B(U)^2$$$$\geq\left( (\epsilon-4\delta^{1/2})^3 - 6\delta - 3\delta^{1/2}\right)\mu_B(U)^2.$$Now if $\delta$ is sufficiently small depending on $\epsilon$ this figure is positive, contradicting $(*)$.$\square$

We chose to count configurations of the form $(a,ab,bab)$ precisely because they are alternatively described by the rather simple equation $y^2=xz$. If instead we chose to count the more "obvious" nonabelian analgues of three-term arithmetic progressions, namely configurations of the form $(a,ab,ab^2)$, then we would be counting solutions to the more complicated equation $z = yx^{-1}y$. The problem is that the count of these configurations is not obviously controlled by convolutions, so we can't easily transport the problem to the Bohr compactification. In fact the situation is delicate and not completely understood: see for example this paper of Tao for the case of $\text{SL}_d(F)$.

Commuting probability of compact groups

2015-02-06T21:33:00.000+00:00

I mentioned before the following theorem of Hofmann and Russo, extending earlier work by Levai and Pyber on the profinite case.

Theorem (Hofmann and Russo): If $G$ is a compact group of positive commuting probability then the FC-center $F(G)$ is an open subgroup of $G$ with finite-index center $Z(F(G))$.

(I actually stated this theorem incorrectly previously, asserting the conclusion $G=F(G)$ as well; this is clearly false in general, for instance for $G=O(2)$.)

Here the FC-center of a group is the subgroup of elements with finitely many conjugates. In general a group is called FC if each of its elements has finitely many conjugates, and BFC if its elements have boundedly many conjugates. A theorem of Bernhard Neumann states that a group $G$ is BFC if and only if $[G,G]$ is finite.

I noticed today that one can prove this theorem rather easily by adapting the proof of Peter Neumann's theorem that a finite group with commuting probability bounded away from $0$ is small-by-abelian-by-small. Some parts of the argument below are present in scattered places in the above two papers, but I repeat them for completeness.

Proof: Let $\mu$ be the normalised Haar measure of $G$, and suppose that$$\mu(\{(x,y):xy=yx\})\geq\epsilon.$$Let $X_n$ be the set of elements in $G$ with at most $n$ conjugates. Then $X_n$ is closed, since any element $x$ with at least $n+1$ distinct conjugates $g_i^{-1}xg_i$ has a neighbourhood $U$ such that for all $u\in U$ the points $g_i^{-1}ug_i$ are distinct. Since$$\mu(\{(x,y):xy=yx\}) = \int 1/|x^G| \,d\mu(x) \leq \mu(X_n) + 1/n,$$we see that $\mu(X_n)\geq\epsilon/2$ for all $n\geq 2/\epsilon$. This implies that the group $H_n$ generated by $X_n$ is generated in at most $6/\epsilon$ steps, i.e., $H_n = X_n^{\lfloor 6/\epsilon\rfloor}$, which implies that $H_n$ is an open BFC subgroup of $G$. Since $(H_n)$ is an increasing sequence of finite-index subgroups it must terminate with some subgroup $F$, and in fact $F$ must be the FC-center of $G$. This proves that $F(G)$ is an open BFC subgroup of $G$.

In particular in its own right $F$ is a compact group with $[F,F]$ finite (by the theorem of Bernhard Neumann mentioned at the top of the page). Since the commutator map $[,]:F\times F\to[F,F]$ is a continuous map to a discrete set satisfying $[F,1]=1$ there must be a neighbourhood $U$ of $1$ such that $[F,U]=1$. This implies that $Z(F)$ is open, hence of finite-index in $F$.$\square$

For me, the Hofmann-Russo theorem is a negative result: it states that commuting probability does not extend in an interesting way to the category of compact groups. To be more specific we have the following corollary.

Corollary: If $G$ is a compact group of commuting probability $p>0$ then there is a finite group $H$ also of commuting probability $p$.

We need a simple lemma before proving the corollary.

Lemma: For each $n>0$ there is a finite group $K_n$ of commuting probability $1/n$.

Proof: If $n$ is odd then $D_n$ has commuting probability $(n+3)/(4n)$. We can use this formula alone and induction on $n$ to define appropriate groups $K_n$. Take $K_1=D_1$ and $K_2=D_3$. If $n>2$ is even take $K_n = K_2\times K_{n/2}$. If $n = 4k+1>2$ take $K_n = K_{k+1}\times D_n$. If $n=4k+3>2$ take $K_n = K_{k+1}\times D_{3n}$.$\square$

An isoclinism between two groups $G$ and $H$ is a pair of isomorphisms $G/Z(G)\to H/Z(H)$ and $[G,G]\to [H,H]$ which together respect the commutator map $[,]:G/Z(G)\times G/Z(G)\to[G,G]$. Clearly isoclinism preserves commuting probability. A basic theorem on isoclinism, due to Hall, is that every group $G$ is isoclinic to a stem group, a group $H$ satisfying $Z(H)\leq [H,H]$.

Proof of corollary: By the theorem the FC-center $F$ of $G$ has finite-index, say $n$, and moreover $F$ has finite-index center $Z(F)$ and therefore finite commutator subgroup $[F,F]$. Let $E$ be a stem group isoclinic to $F$. Then $E$ and $F$ have the same commuting probability, and $E$ is finite since $E/Z(E) \cong F/Z(F)$, $[E,E]\cong [F,F]$, and $Z(E)\leq [E,E]$, so we can take $H=K_{n^2}\times E$.$\square$

Leon Green's theorem

2015-01-29T01:29:00.000+00:00

The fundamental objects of study in higher-order Fourier analysis are nilmanifolds, or in other words spaces given as a quotient $G/\Gamma$ of a connected nilpotent Lie group $G$ by a discrete cocompact subgroup $\Gamma$. Starting with Furstenberg's work on Szemeredi's theorem and the multiple recurrence theorem, work by Host and Kra, Green and Tao, and several others has gradually established that nilmanifolds control higher-order linear configurations in the same way that the circle, as in the Hardy-Littlewood circle method, controls first-order linear configurations.

Of basic importantance in the study of nilmanifolds is equidistribution: one needs to know when the sequence $g^n x$ equidistributes and when it is trapped inside a subnilmanifold. It turns out that this problem was already studied by Leon Green in the 60s. To describe the theorem note first that the abelianisation map $G\to G/G_2$ induces a map from $G/\Gamma$ to a torus $G/(G_2\Gamma)$ which respects the action of $G$, and recall that equidistribution on tori is well understood by Weyl's criterion. Leon Green's beautiful theorem then states that $g^n x$ equidistributes in the nilmanifold if and only if its image in the torus $G/(G_2\Gamma)$ equidistributes.

Today at our miniseminar Aled Walker showed us Parry's nice proof of this theorem, which is more elementary than Green's original proof. During the talk there was some discussion about the importantance of various hypotheses such as 'simply connected' and 'Lie'. It turns out that the proof works rather generally for connected locally compact nilpotent groups, so I thought I would record the proof here with minimal hypotheses. The meat of the argument is exactly as in Aled's talk and, presumably, Parry's paper.

Let $G$ be an arbitrary locally compact connected nilpotent group, say with lower central series$$G=G_1\geq G_2 \geq\cdots\geq G_s\geq G_{s+1}=1,$$and let $\Gamma\leq G$ be a closed cocompact subgroup. Under these conditions the Haar measure $\mu_G$ of $G$ induces a $G$-invariant probability measure $\mu_{G/\Gamma}$ on $G/\Gamma$. We say that $x_n\in G/\Gamma$ is equidistributed if for every $f\in C(G/\Gamma)$ we have$$\frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \to \int f \,d\mu_{G/\Gamma}.$$We fix our attention on the sequence$$x_n = g^n x$$for some $g\in G$ and $x\in G/\Gamma$. As before we have an abelianisation map$$\pi: G/\Gamma\to G/G_2\Gamma$$from the $G$-space $G/\Gamma$ to the compact abelian group $G/G_2\Gamma$. We define equidistribution on $G/G_2\Gamma$ similarly. The theorem is then the following.

Theorem: (Leon Green) For $g\in G$ and $x\in G/\Gamma$ the following are equivalent.
1. The sequence $g^n x$ is equidistributed in $G/\Gamma$.
2. The sequence $\pi(g^n x)$ is equidistributed in $G/G_2\Gamma$.
3. The orbit of $\pi(g)$ is dense in $G/G_2\Gamma$.
4. $\chi(\pi(g))\neq 0$ for every nontrivial character $\chi:G/G_2\Gamma\to\mathbf{R}/\mathbf{Z}$.

Item 1 above trivially implies every other item. The implication 4$\implies$3 (a generalised Kronecker theorem) follows by pulling back any nontrivial character of $(G/G_2\Gamma)/\overline{\langle\pi(g)\rangle}$. The implication 3$\implies$2 (a generalised Weyl theorem) follows from the observation that every weak* limit point of the sequence of measures$$ \frac{1}{N}\sum_{n=0}^{N-1} \delta_{\pi(g^n x)}$$must be shift-invariant and thus equal to the Haar measure. So the interesting content of the theorem is 2$\implies$1.

A word about the relation to ergodicity: By the ergodic theorem the left shift $\tau_g:x\mapsto gx$ is ergodic if and only if for almost every $x$ the sequence $g^n x$ equidistributes; on the other hand $\tau_g$ is uniquely ergodic, i.e., the only $\tau_g$-invariant measure is the given one, if and only if for every $x$ the sequence $g^n x$ equidistributes. Thus to prove the theorem above we must not only prove that $\tau_g$ is ergodic but that it is uniquely ergodic. Fortunately one can prove these two properties are equivalent in this case.

Lemma: If $\tau_g:G/\Gamma\to G/\Gamma$ is ergodic then it's uniquely ergodic.

Proof: (Furstenberg) By the ergodic theorem the set $A$ of $\mu_{G/\Gamma}$-generic points, in other words points $x$ for which$$\frac{1}{N}\sum_{n=0}^{N-1} f(g^n x) \to \int f \,d\mu_{G/\Gamma}$$for every $f\in C(G/\Gamma)$, has $\mu_{G/\Gamma}$-measure $1$, and clearly if $x\in A$ and $c\in G_s$ then $xc\in A$, so $A = p^{-1}(p(A))$, where $p$ is the projection of $G/\Gamma$ onto $G/G_s\Gamma$.

Now let $\mu'$ be any $\tau_g$-invariant ergodic measure. By induction we may assume that $\tau_g:G/G_s\Gamma\to G/G_s\Gamma$ is uniquely ergodic, so we must have $p_*\mu' = p_*\mu_{G/\Gamma}$, so$$\mu'(A) = p_*\mu'(p(A)) = p_*\mu_{G/\Gamma}(p(A)) = \mu_{G/\Gamma}(A) = 1.$$But by the ergodic theorem the set of $\mu'$-generic points must also have $\mu'$-measure $1$, so there must be some point which is both $\mu_{G/\Gamma}$- and $\mu'$-generic, and this implies that $\mu'=\mu_{G/\Gamma}$.$\square$

We need one more preliminary lemma about topological groups before we really get started on the proof.

Lemma: If $H$ and $K$ are connected subgroups of some ambient topological group then $[H,K]$ is also connected.

Proof: Since $(h,k)\mapsto [h,k]=h^{-1}k^{-1}hk$ is continuous certainly $C = \{[h,k]:h\in H,k\in K\}$ is connected, so $C^n = CC\cdots C$ is also connected, so because $1\in C^n$ for all $n$ we see that $[H,K]=\bigcup_{n=1}^\infty C^n$ is connected.$\square$

Thus if $G$ is connected then every term $G_1,G_2,G_3,\dots$ in the lower central series of $G$ is connected.

Proof of Leon Green's theorem: As noted it suffices to prove that $\tau_g$ acts ergodically on $G/\Gamma$ whenever it acts ergodically on $G/G_2\Gamma$. By induction we may assume that $\tau_g$ acts ergodically on $G/G_s\Gamma$. So suppose that $f\in L^2(G/\Gamma)$ is $\tau_g$-invariant. By decomposing $L^2(G/\Gamma)$ as a $\overline{G_s\Gamma}/\Gamma$-space we may assume that $f$ obeys$$f(cx)=\gamma(c)f(x)\quad(c\in G_s, x\in G/\Gamma)$$for some character $\gamma:G_s\Gamma/\Gamma\to S^1$. In particular $|f|$ is both $G_s$-invariant and $\tau_g$-invariant, so it factors through a $\tau_g$-invariant function $G/G_s\Gamma\to\mathbf{R}$, so it must be constant, say $1$. Moreover for every $b\in G_{s-1}$ the function$$\Delta_bf(x) = f(bx)\overline{f(x)}$$is $G_s$-invariant, and also a $\tau_g$ eigenvector:$$\Delta_bf(gx) = \gamma([b,g])\Delta_bf(x).$$By integrating this equation we find that either $\gamma([b,g])=1$, so $\Delta_bf$ is constant, or $\int\Delta_bf \,d\mu_{G/\Gamma}= 0$, so either way we have$$\int \Delta_bf\,d\mu_{G/\Gamma}\in \{0\}\cup S^1.$$But since $\int\Delta_bf\,d\mu_{G/\Gamma}$ is a continuous function of $b$ and equal to $1$ when $b=1$ we must have $\gamma([b,g])=1$ for all sufficiently small $b$, and thus for all $b$ by connectedness of $G_{s-1}$ and the identity$$[b_1b_2,g]=[b_1,g][b_2,g].$$Thus setting $\gamma(b)=\Delta_bf$ extends $\gamma$ to a homomorphism $G_{s-1}\to S^1$. In fact we can extend $\gamma$ still further to a function $G\to D_1$, where $D_1$ is the unit disc in $\mathbf{C}$, by setting$$\gamma(a) = \int \Delta_af\,d\mu_{G/\Gamma}.$$Now if $a\in G$ and $b\in G_{s-1}$ then$$\gamma(ba) = \int f(bax) \overline{f(x)}\,d\mu_{G/\Gamma} = \int \gamma(b)f(ax)\overline{f(x)}\,d\mu_{G/\Gamma}=\gamma(b)\gamma(a),$$and$$\gamma(ab) = \int f(abx)\overline{f(x)}\,d\mu_{G/\Gamma} = \int f(ax) \overline{f(b^{-1}x)}\,d\mu_{G/\Gamma} = \int f(ax) \overline{\gamma(b^{-1})}\overline{f(x)}\,d\mu_{G/\Gamma} = \gamma(b)\gamma(a),$$so$$\gamma(b)\gamma(a)=\gamma(ab)=\gamma(ba[a,b]) = \gamma(ba)\gamma([a,b])=\gamma(b)\gamma(a)\gamma([a,b]).$$Since $|\gamma(b)|=1$ we can cancel $\gamma(b)$, so$$\gamma(a)(\gamma([a,b])-1) = 0.$$Finally observe that $\gamma(a)$ is a continuous function of $a$, and $\gamma(1)=1$, so we must have $\gamma([a,b])=1$ for all sufficiently small $a$, and thus by connectedness of $G$ and the identity$$[a_1a_2,b]=[a_1,b][a_2,b]$$we must have $\gamma([a,b])=1$ identically. But this implies that $\gamma$ vanishes on all $s$-term commutators and thus on all of $G_s$, so in fact $f$ factors through $G/G_s\Gamma$, so it must be constant.$\square$

A remark is in order about the possibility that some of the groups $G_i$ and $G_i\Gamma$ are not closed. This should not matter. One could either read the above proof as it is written, noting carefully that I never said groups should be Hausdorff, or, what's similar, instead modify it so that whenever you quotient by a group $H$ you instead quotient by the group $\overline{H}$.

Embarrassingly, it's difficult to come up with a non-Lie group to which this generalised Leon Green's theorem applies. It seems that many natural candidates have the property that $G$ is not connected but $G/\Gamma$ is: for example consider$$\left(\begin{array}{ccc}1&\mathbf{R}\times\mathbf{Q}_2&\mathbf{R}\times\mathbf{Q}_2\\0&1&\mathbf{R}\times\mathbf{Q}_2\\0&0&1\end{array}\right)/\left(\begin{array}{ccc}1&\mathbf{Z}[1/2]&\mathbf{Z}[1/2]\\0&1&\mathbf{Z}[1/2]\\0&0&1\end{array}\right).$$So it would be interesting to know whether the theorem extends to such a case. Or perhaps there are no interesting non-Lie groups for this theorem, which would be a bit of a let down.

Joseph's conjectures on commuting probability, an ultrafinitary perspective

2014-11-19T20:41:00.000+00:00

The commuting probability $\Pr(G)$ of a finite group $G$ is the proportion of pairs $(x,y)\in G^2$ which commute. In 1977 Keith Joseph made three conjectures about the set
$$\mathcal{P} = \{\Pr(G) : G \text{ a finite group}\},$$
namely the following.

Joseph's conjectures:

J1. All limit points of $\mathcal{P}$ are rational.
J2. $\mathcal{P}$ is well ordered by >.
J3. $\{0\}\cup\mathcal{P}$ is closed.

Earlier this month I uploaded a preprint to the arxiv which proves the first two of these conjectures, and yesterday I gave a talk at the algebra seminar in Oxford about the proof. While preparing the talk I noticed that some aspects of the proof are simpler from an ultrafinitary perspective, basically because ultrafilters can be used to streamline epsilon management, and I gave one indication of this perspective during the talk. In this post I wish to lay out the ultrafinitary approach in greater detail.

Throughout this post we fix a nonprincipal ultrafilter $u\in\beta\mathbf{N}\setminus\mathbf{N}$, and we let $\mathbf{R}^*$ be the ultrapower $\mathbf{R}^\mathbf{N}/u$, where two elements of $\mathbf{R}^\mathbf{N}$ are considered equivalent iff they are equal in $u$-almost-every coordinate. The elements of $\mathbf{R}^*$ are called "nonstandard reals" or "hyperreals". There is a principle at work in nonstandard analysis, possibly called Łoś's theorem, which asserts, without going into the finer details, that all "first-order" things that you do with reals carry over in a natural way to the field of hyperreals, and everything more-or-less works just how you'd like it to. For instance, if $r$ and $s$ are hyperreals then $r<s$ naturally means that the inequality holds in $u$-almost-every coordinate, and similarly the field operations of $\mathbf{R}$ extend naturally, and with these definitions $\mathbf{R}^*$ becomes a totally ordered field. We will seldom spell out so explicitly how to naturally extend first-order properties in this way.

An ultrafinite group $G$ is an ultraproduct $\prod_u G_i = \prod_{i=1}^\infty G_i/u$ of finite groups. Its order $|G| = (|G_i|)/u$ is a nonstandard natural number, and its commuting probability $\Pr(G) = (\Pr(G_i))/u$ is a nonstandard real in the interval $[0,1]$. Joseph's first two conjectures can now be stated together in the following way.

Theorem: The commuting probability of every ultrafinite group $G$ has the form $q+\epsilon$, where $q$ is a standard rational and $\epsilon$ is a nonnegative infinitesimal.

Somewhat similarly, Joseph's third conjecture can be stated in an ultrafinitary way as follows: For every ultrafinite group $G$ there is a finite group $H$ such that $\text{st}\Pr(G)=\Pr(H)$. Here $\text{st}$ is the standard part operation, which maps a finite hyperreal to the nearest real. When phrased in this way it resembles a known result about compact groups. Every compact group $G$ has a unique normalised Haar measure, so we have a naturally defined notion of commuting probability $\Pr(G)$. However every compact group $G$ with $\Pr(G)>0$ has a finite-index abelian subgroup, and with a little more work one can actually find a finite group $H$ with $\Pr(G)=\Pr(H)$. This is a theorem of Hofmann and Russo. Nevertheless, I find Joseph's third conjecture rather hard to believe.

For us the most important theorem about commuting probability is a theorem of Peter Neumann, which states that if $\epsilon>0$ then every finite group $G$ such that $\Pr(G)\geq\epsilon$ has a normal subgroup $H$ such that $|G/H|$ and $|[H,H]|$ are both bounded in terms of $\epsilon$. To prove the above theorem we need the following "amplified" version:

Theorem (Neumann's theorem, amplified, ultrafinitary version): Every ultrafinite group $G$ has an internal normal subgroup $H$ such that $[H,H]$ is finite and such that almost every pair $(x,y)\in G^2$ such that $[x,y]\in[H,H]$ is contained in $H^2$.

Here if $G = \prod_u G_i$ we say that $S\subset G$ is internal if $S$ is itself an ultraproduct $\prod_u S_i$ of subsets $S_i\subset G_i$, and "almost every" needs little clarification because the set of pairs in question is an internal subset of $G^2$. (Otherwise we would need to introduce Loeb measure.)

Proof: If $\text{st}\Pr(G)=0$ the theorem holds with $H=1$, so we may assume $\text{st}\Pr(G)>0$. By Neumann's theorem $G$ has an internal normal subgroup $K_0$ of finite index such that $[K_0,K_0]$ is finite. Since $G/K_0$ is finite there are only finitely many normal subgroups $K\leq G$ containing $K_0$ and each of them is internal, so we may find normal subgroups $K,L\leq G$ containing $K_0$ such that $[K,L]$ is finite, and which are maximal subject to these conditions.

Suppose that a positive proportion of pairs $(x,y)\in G^2$ outside of $K\times L$ satisfied $[x,y]\in[K,L]$. Then we could find $(x,y)\in G^2\setminus (K\times L)$, say with $x\notin K$, such that for a positive proportion of $(k,l)\in K\times L$ we have $[xk,yl]\in[K,L]$. After a little commutator algebra one can show then that for a positive proportion of $l\in L$ we have $[x,l]\in[K,L]$, or in other words that the centraliser
$$ N_0 = C_{L/[K,L]}(x) = C_{L/[K,L]}(\langle K,x\rangle)$$
of $x$ in $L/[K,L]$ has finite index. But this implies that the largest normal subgroup contained in $N_0$, namely
$$ N = C_{L/[K,L]}(K'),$$
where $K'$ is the normal subgroup of $G$ generated by $K$ and $x$, also has finite index. Since certainly $K\leq C_{K'/[K,L]}(L)$ a classical theorem of Baer implies that
$$[K'/[K,L], L/[K,L]] = [K',L]/[K,L]$$
is finite, and hence that $[K',L]$ is finite, but this contradicts the maximality of $K$ and $L$.

Hence almost every pair $(x,y)\in G^2$ such that $[x,y]\in[K,L]$ is contained in $K\times L$, and thus also in $L\times K$, so the theorem holds for $H=K\cap L$.$\square$

Now let $G$ be any ultrafinite group $G$ and let $H$ be as in the theorem. Then
$$\Pr(G) = \frac1{|G/H|^2} \Pr(H) + \epsilon,$$
where $\epsilon$ is nonnegative and infinitesimal. Thus it suffices to show that $\Pr(H)$ has the form (standard rational) + (nonnegative infinitesimal) whenever $[H,H]$ is finite. Note in this case that Hall's theorem implies that the second centre
$$Z_2(H) = \{h\in H : [h,H]\subset Z(H)\}$$
has finite index. One can complete the proof using a little duality theory of abelian groups, but the ultrafinite perspective adds little here so I refer the reader to my paper.

The other thing I noticed while preparing my talk is that the best lower bound I knew for the order type of $\mathcal{P}$, $\omega^2$, is easy to improve to $\omega^\omega$, just by remembering that $\mathcal{P}$ is a subsemigroup of $(0,1]$. In fact the order type of a well ordered subsemigroup of $(0,1]$ is heavily restricted: it's either $0$, $1$, or $\omega^{\omega^\alpha}$ for some ordinal $\alpha$. This observation reduces the possibilities for the order type of $\mathcal{P}$ to $\{\omega^\omega,\omega^{\omega^2}\}$. I have no idea which it is!

The idempotent theorem

2014-11-13T00:13:00.000+00:00

Let $G$ be a locally compact abelian group and let $M(G)$ be the Banach algebra of regular complex Borel measures on $G$. Given $\mu\in M(G)$ its Fourier transform
$$\hat{\mu}(\gamma) = \int \overline{\gamma}\,d\mu,$$
is a continuous function defined on the Pontryagin dual $\hat{G}$ of $G$. If the measure $\mu$ is "nice" in some way then we expect some amount of regularity from the function $\hat{\mu}$. For instance if $\mu$ is actually an element of the subspace $L^1(G)\subset M(G)$ of measures absolutely continuous with respect to the Haar measure of $G$ then the Riemann-Lebesgue lemma asserts $\hat{\mu}\in C_0(\hat{G})$.

The idempotent theorem of Cohen, Helson, and Rudin describes the structure of measures $\mu$ whose Fourier transform $\hat{\mu}$ takes a discrete set of values, or equivalently, since $\|\hat{\mu}\|_\infty\leq\|\mu\|$, a finite set of values. To describe the theorem, note that we can define $P(\mu)$ for any polynomial $P$ by taking appropriate linear combinations of convolution powers of $\mu$, and moreover we have the relation $\widehat{P(\mu)} = P(\hat{\mu})$, where on the right hand side we apply $P$ pointwise. Thus if $\hat{\mu}$ takes only the values $a_1,\dots,a_n$ then by setting
$$P_i(x) = \prod_{j\neq i} (x-a_j)/(a_i-a_j)$$
we obtain a decomposition $\mu = a_1\mu_1 + \cdots + a_n\mu_n$ of $\mu$ into a linear combination of measures $\mu_i=P_i(\mu)$ whose Fourier transforms $\hat{\mu_i} = P_i(\hat{\mu})$ take only values $0$ and $1$. Such measures are called idempotent, because they are equivalently defined by $\mu\ast\mu=\mu$. By the argument just given it suffices to characterise idempotent measures: this explains the name of the theorem.

The most obvious example of an idempotent measure is the Haar measure $m_H$ of a compact subgroup $H\leq G$. Moreover we can multiply any idempotent measure $\mu$ by a character $\gamma\in\hat{G}$ to obtain a measure $\gamma\mu$ defined by
$$\int f \,d(\gamma\mu) = \int f\gamma\,d\mu.$$
This measure $\gamma\mu$ will again be idempotent, as
$$\int f\,d(\gamma \mu\ast\gamma \mu) = \int\int f(x+y)\gamma(x)\gamma(y)\,d\mu(x)d\mu(y) = \int\int f(x+y) \gamma(x+y)\,d\mu(x)d\mu(y) = \int f\gamma\,d\mu.$$
If we add or subtract two idempotent measures then though we may not have again an idempotent measure we certainly have a measure whose Fourier transform takes integer values. On reflection, it feels more natural in the setting of harmonic analysis to require that $\hat{\mu}$ takes values in a certain discrete subgroup than to require that it take values in $\{0,1\}$, so we relax our restriction so. The idempotent theorem states that we have already accounted for all those $\mu$ such that $\hat{\mu}$ is integer-valued.

Theorem (the idempotent theorem): For every $\mu\in M(G)$ such that $\hat{\mu}$ is integer-valued there is a finite collection of compact subgroups $G_1,\dots,G_k\leq G$, characters $\gamma_1,\dots,\gamma_k\in\hat{G}$, and integers $n_1,\dots,n_k\in\mathbf{Z}$ such that

$$\mu = n_1\gamma_1 m_{G_1} + \cdots + n_k\gamma_k m_{G_k}.$$

As a consequence we deduce a structure theorem for $\mu$ with $\hat{\mu}$ taking finitely many values, as we originally wanted: for every such $\mu$ there is a finite collection of compact subgroups $G_1,\dots,G_k\leq G$, characters $\gamma_1,\dots,\gamma_k\in\hat{G}$, and complex numbers $a_1,\dots,a_k\in\mathbf{C}$ such that
$$\mu = a_1\gamma_1 m_{G_1} + \cdots + a_k \gamma_k m_{G_k}.$$

The theorem was first proved in the case of $G=\mathbf{R}/\mathbf{Z}$ by Helson in 1953: in this case the theorem states simply that if $\hat{\mu}$ is integer-valued then it differs from some periodic function in finitely many places. In 1959 Rudin gave the theorem its present form and proved it for $(\mathbf{R}/\mathbf{Z})^d$. Finally in 1960 Cohen proved the general case, in the same paper in which he made the first substantial progress on the Littlewood problem. The proof was subsequently simplified a good deal, particularly by Amemiya and Ito in 1964. We reproduce their proof here.

First note that if $\hat{\mu}$ is integer-valued then $\mu$ is supported on a compact subgroup. Indeed by inner regularity there is a compact set $K$ such that $|\mu|(K^c)<0.1$, the set $U$ of all $\gamma\in\hat{G}$ such that $|1-\gamma|<0.1/\|\mu\|$ on $K$ is then open, and if $\gamma\in U$ then
$$\|\gamma\mu-\mu\| = \int_G |\gamma-1|\,d|\mu| \leq \int_K + \int_{K^c} < 0.1 + 0.1 < 1.$$
But if $\gamma\mu\neq\mu$ then
$$\|\gamma\mu-\mu\|\geq \|\widehat{\gamma\mu}-\hat{\mu}\|_\infty \geq 1,$$
so $\gamma\mu=\mu$ for all $\gamma\in U$. Thus $\Gamma=\{\gamma\in\hat{G}: \gamma\mu=\mu\}$ is an open subgroup of $\hat{G}$, so by Pontryagin duality its preannihilator $\Gamma^\perp = \{g\in G: \gamma(g)=1 \text{ for all }\gamma\in\Gamma\}$ is a compact subgroup of $G$. Clearly $\mu$ is supported on $\Gamma^\perp$. Thus from now on we assume $G$ is compact.

Fix a measure $\mu\in M(G)$ and let $A=\{\gamma\mu: \gamma\in\hat{G}\}$.

Lemma 1: If $\nu$ is a weak* limit point of $A$ then $\|\nu\|<\|\mu\|$.

Proof: Fix $\epsilon>0$ and suppose we could find $f\in C(G)$ such that $\|f\|_\infty\leq 1$ and $\int f\,d\nu > (1-\epsilon)\|\mu\|$. Let $\gamma\mu$ be close enough to $\nu$ that $\Re\int f\gamma\,d\mu > (1-\epsilon)\|\mu\|$. Write $\mu = \theta|\mu|$ and $f\gamma\theta = g + ih$. Then if $Z$ is the complex number $Z = \int (g+i|h|)\,d|\mu|$, then $|Z|\leq\|\mu\|$ and
$$\Re Z = \int g \,d|\mu| = \Re\int f\gamma\,d\mu > (1-\epsilon)\|\mu\|,$$
so we must have
$$\Im Z = \int |h|\,d|\mu| \leq (1-(1-\epsilon)^2)^{1/2}\|\mu\| \leq 2\epsilon^{1/2}\|\mu\|.$$
Thus also
$$\int |1 - f\gamma\theta| \,d|\mu| \leq \int |1 - g|\,d|\mu| + \int |h|\,d|\mu| \leq 3\epsilon^{1/2}\|\mu\|.$$
But if this holds for both $\gamma_1\mu$ and $\gamma_2\mu$, say with $\gamma_1\mu\neq\gamma_2\mu$, then we have
$$ 1\leq \|\gamma_1\mu-\gamma_2\mu\| \leq \int |\gamma_1 - f\gamma_1\gamma_2\theta|\,d|\mu| + \int |\gamma_2 - f\gamma_1\gamma_2\theta|\,d|\mu| \leq 6\epsilon^{1/2}\|\mu\|,$$
so $\epsilon \geq 1/(36\|\mu\|^2)$, so
$$\|\nu\| \leq \|\mu\| - \frac{1}{36\|\mu\|}.\square$$

Lemma 2: If $\nu$ is a weak* limit point of $A$ then $\nu$ is singular with respect to the Haar measure $m_G$ of $G$.

Proof: By the Radon-Nikodym theorem we have a decomposition $\mu = f m_G + \mu_s$ for some $f\in L^1(G)$ and some $\mu_s$ singular with respect to $m_G$. By the Riemann-Lebesgue lemma then $\nu$ is a limit point of $\{\gamma\mu_s:\gamma\in\hat{G}\}$. Thus for any open set $U$ and $f\in C(G)$ such that $\|f\|_\infty\leq 1$ and $f=0$ outside of $U$ we have
$$\left|\int f\,d\nu\right| \leq \sup_\gamma \left|\int f\gamma \,d\mu_s\right| \leq |\mu_s|(U),$$
so $|\nu|(U)\leq |\mu_s|(U)$. This inequality extends to Borel sets in the usual way, so $\nu$ is singular.$\square$

The theorem follows relatively painlessly from the two lemmas. Fix $\mu\in M(G)$ with $\hat{\mu}$ integer-valued and let $A = \{\gamma\mu: \int\gamma\,d\mu\neq 0\}$. Then $\overline{A}$ is weak* compact, so because $\|\cdot\|$ is lower semicontinuous in the weak* topology there is some $\nu\in\overline{A}$ of minimal norm. Since $\int d\nu$ is an integer different from $0$ we must have $\nu\neq 0$. Thus by Lemma 1 the set $\{\gamma\nu: \int\gamma\,d\nu\neq 0\}$ is finite. But this implies that
$$\nu = (n_1 \gamma_1 + \cdots + n_k \gamma_k) m_H\qquad(\ast)$$
for some $n_1,\dots,n_k\in\mathbf{Z}$, $\gamma_1,\dots,\gamma_k\in\hat{G}$, and $H=\{\gamma:\gamma\nu=\nu\}^\perp$ the support group of $\nu$. In particular $\nu$ is absolutely continuous with respect to $m_H$, so because $\nu|_H$ is in the weak* closure of $\{\gamma\mu|_H:\gamma\in\hat{G}\}$ we deduce from Lemma 2 that $\nu|_H = \gamma\mu|_H$ for some $\gamma$. Thus $\mu|_H$ is a nonzero measure of the form $(\ast)$ and we have an obvious mutually singular decomposition
$$\mu = \mu|_H + (\mu-\mu|_H).$$
Since $\|\mu-\mu|_H\| = \|\mu\| - \|\mu|_H\|\leq\|\mu\|-1$ the theorem follows by induction.

Erdős-Turán statistical group theory

2014-04-01T00:24:00.002+01:00

What is Erdős-Turán "statistical group theory"? Erdős and Turán published a series of seven papers with this title, from 1965 to 1972, in which they proved many beautiful statistical or counting results about permutations. For example, if $|\sigma|$ denotes the order of a permutation $\sigma\in S_n$, they showed that when $\sigma$ is chosen uniformly at random $\log|\sigma|$ is approximately normally distributed, with mean $(\log n)^2/2$ and variance $(\log n)^3/3$. Another typical result, though much simpler, is that if $A$ is any subset of $\{1,\dots,n\}$, then the probability that a random permutation contains no cycle with length in $A$ is at most $2\left(\sum_{a\in A} 1/a\right)^{-1}$.

Although most of their results are of the above approximate nature, they prove at least one beautiful exact counting result, and I thought I might relate it here.

Theorem: If $q$ is a prime power then the proportion of $\sigma\in S_n$ with order not divisible by $q$ is exactly

\[\left(1-\frac{1}{q}\right)\left(1-\frac{1}{2q}\right)\cdots\left(1-\frac{1}{\lfloor n/q\rfloor q}\right).\]

Proof: The number of $\sigma\in S_n$ with $m_1$ cycles of length $v_1$, $m_2$ cycles of length $v_2$, $\dots$, and $m_k$ cycles of length $v_k$, where $m_1 v_1 + \cdots m_k v_k = n$, is
\[\frac{n!}{m_1!\cdots m_k! v_1^{m_1}\cdots v_k^{m_k}}:\]
indeed, one can partition $\{1,\dots,n\}$ into $m_i$ sets of size $v_i$ for each $i$ in
\[\frac{n!}{m_1!\cdots m_k! v_1!^{m_1}\cdots v_k!^{m_k}}\]
ways, and then one can arrange each of the $m_i$ sets of size $v_i$ for each $i$ into cycles in
\[(v_1 - 1)!^{m_1} \cdots (v_k - 1)!^{m_k}\]
ways. Moreover, the order of every such $\sigma$ is $\text{lcm}(v_1,\dots,v_k)$, so the order of $\sigma$ is divisible by $q$ if and only if some $v_i$ is divisible by $q$. (This is the where we use the hypothesis that $q$ is a prime power.) Thus the proportion of $\sigma\in S_n$ with order not divisible by $q$ is
\[\sum\frac{1}{m_1!\cdots m_k! v_1^{m_1} \cdots v_k^{m_k}},\]
where the sum runs over all $k\geq 0$ and $2k$-tuples $(m_1,\dots,m_k,v_1,\dots,v_k)$ of positive integers such that $m_1 v_1 + \cdots m_k v_k = n$ and such that no $v_i$ is divisible by $q$. But this is just the coefficient of $X^n$ in
\[\prod_{v:q\nmid v} \sum_{m\geq 0} \frac{X^{mv}}{m! v^m}\\=\prod_{v:q\nmid v} \exp\left(\frac{X^v}{v}\right)\\= \exp\left(\sum_{v:q\nmid v} \frac{X^v}{v}\right)\\= \exp\left(\sum_v \frac{X^v}{v} - \sum_v \frac{X^{qv}}{qv}\right)\\= \exp\left(-\log(1-X) + \log(1-X^q)/q\right)\\ = \frac{(1-X^q)^{1/q}}{1-X}\\= (1 + X + \cdots + X^{q-1}) (1-X^q)^{-\frac{q-1}{q}}\\= (1+X+\cdots+X^{q-1}) \left(1 + \left(1-\frac{1}{q}\right) X^q + \left(1-\frac{1}{q}\right)\left(1-\frac{1}{2q}\right)X^{2q} + \cdots\right),\]
where the last equality follows from the binomial formula. This completes the proof.

Such an explicit formula should make us feel foolish for having used generating functions. Is there a direct combinatorial proof?

By the way, for those who are not already aware of this invaluable resource, almost all of Erdős's papers have been made freely available by the Rényi institute here: http://www.renyi.hu/~p_erdos/Erdos.html.

How many lattices are there?

2013-11-25T17:15:00.000+00:00

The basic question is the one in the title: How many lattices are there? I intend to answer two quite different interpretations of this question.

How many sublattices of the standard lattice $\mathbf{Z}^n$ are there? Here sublattice just means subgroup. There are infinitely many of course. Less trivially, how many sublattices of $\mathbf{Z}^n$ are there of index at most $m$?
How many lattices are there in $\mathbf{R}^n$ of covolume $1$? Here a lattice means a discrete subgroup of rank $n$, and covolume refers to the volume of a fundamental domain. Again, the answer is infinitely many. Less trivially, what (in an appropriate sense) is the volume of the set of covolume-$1$ lattices?

Question number 2 obviously needs some explanation. Covolume, as can easily be proved, is equal to the determinant of any complete set of spanning vectors, from which it follows that covolume-$1$ lattices are in some way parameterised by $\text{SL}_n(\mathbf{R})$. Actually, we are counting each lattice several times here, because each lattice has a stabliser isomorphic to $\text{SL}_n(\mathbf{Z})$. Thus, the space of covolume-$1$ lattices can be identified with the quotient space $\text{SL}_n(\mathbf{R})/\text{SL}_n(\mathbf{Z})$.

This subgroup $\text{SL}_n(\mathbf{Z})$ itself is also considered a lattice, in the group $\text{SL}_n(\mathbf{R})$. Generally, let $G$ be a locally compact group, and recall that $G$ possesses a unique (up to scale) left-invariant regular Borel measure, its Haar measure $\mu$. Suppose moreover that $G$ is unimodular, i.e., that $\mu$ is also right-invariant. If $H\leq G$ is a unimodular subgroup, and we fix a Haar measure on $H$ as well, then $\mu$ descends to a unique $G$-invariant regular Borel measure on the homogeneous space $G/H$ which we also denote by $\mu$. If $\mu(G/H)<\infty$, we say that $H$ has finite covolume. For example, if $H$ is a discrete subgroup then we can fix the counting measure as a bi-invariant Haar measure. If in this case $H$ has finite covolume then we say that $H$ is a lattice.

It is a classical theorem due to Minkowski, which we prove in a moment, that $\text{SL}_n(\mathbf{Z})$ is a lattice in the unimodular group $\text{SL}_n(\mathbf{R})$. We thus consider the volume of $\text{SL}_n(\mathbf{R})/\text{SL}_n(\mathbf{Z})$, for a natural choice of Haar measure, to be the answer to question 2 above, at least in some sense. The actual computation of this volume, I understand, is originally due to Siegel, but I learnt it from this MO answer.

Let $\lambda$ be the usual Lebesgue measure on $\mathbf{R}^{n^2}$, normalized (as usual) so that $\lambda(\mathbf{R}^{n^2}/\mathbf{Z}^{n^2}) = 1$, and note that $\lambda$ is invariant under the left multiplication action of $\text{SL}_n(\mathbf{R})$. Given closed $E\subset\text{SL}_n(\mathbf{R})$, we denote $\text{cone}(E)$ the union of all the line segments which start at $0\in\mathbf{R}^{n^2}$ and end in $E$, and we define
$$\mu(E) = \lambda(\text{cone}(E)).$$
This function $\mu$ extends to a nontrivial Borel measure in the usual way, and inherits regularity and both left- and right-invariance from $\lambda$. Thus $\text{SL}_n(\mathbf{R})$ is unimodular, and $\mu$ is a Haar measure. We consider this $\mu$ to be the natural choice of Haar measure.

Now let $E\subset\text{SL}_n(\mathbf{R})$ be a measurable fundamental domain for $\text{SL}_n(\mathbf{Z})$. Then, for $R>0$,
\[
\mu(\text{SL}_n(\mathbf{R})/\text{SL}_n(\mathbf{Z})) = \mu(E) = \lambda(\text{cone}(E)) = \frac{\lambda(R\,\text{cone}(E))}{R^{n^2}}.
\]
But $\lambda(R\,\text{cone}(E))$ is asymptotic, as $m\rightarrow\infty$, to $|R\,\text{cone}(E)\cap M_n(\mathbf{Z})|$, which is just the number of sublattices of $\mathbf{Z}^n$ of index at most $R^n$. Thus we have reduced question 2 to question 1.

To finish the proof of Minkowski's theorem, it suffices to show that there are $O(R^{n^2})$ sublattices of $\mathbf{Z}^n$ of index at most $R^n$, and this much can be accomplished by a simple inductive argument. If $\Gamma\leq\mathbf{Z}^n$ has index at most $R^n$, then by Minkowski's convex bodies theorem there exists some nonzero $\gamma\in\Gamma$ of $\|\gamma\|_\infty\leq R$. Without loss of generality, $\Gamma\cap\mathbf{R}\gamma = \mathbf{Z}\gamma$. But then $\Gamma^\prime = \Gamma/(\mathbf{Z}\gamma)$ is a lattice in $\mathbf{Z}^n/(\mathbf{Z}\gamma)\cong\mathbf{Z}^{n-1}$ of index at most $R^n$. Since there are no more than $(2R+1)^n$ choices for $\gamma$ and, by induction, at most $O(R^{n(n-1)})$ choices for $\Gamma^\prime$, there are at most $O(R^{n^2})$ choices for $\Gamma$.

But with a more careful analysis, this calculation can actually be carried out explicitly. An elementary(-ish) calculation shows that the number $a_m(\mathbf{Z}^n)$ of subgroups of $\mathbf{Z}^n$ of index exactly $m$ is the coefficient of $m^{-s}$ in the Dirichlet series of
\[
\zeta_{\mathbf{Z}^n}(s) = \zeta(s)\zeta(s-1)\cdots\zeta(s-n+1),
\]
where $\zeta(s) = \sum_{n\geq1} n^{-s}$ is the Riemann zeta function. Thus, by Perron's formula, if $x\notin\mathbf{Z}$,
\[
\frac{1}{x^n}\sum_{1\leq m< x} a_m(\mathbf{Z}^n) = \frac{1}{i2\pi} \int_{n+1/2-i\infty}^{n+1/2+i\infty} \zeta_{\mathbf{Z}^n}(z) \frac{x^{z-n}}{z}\,dz.
\]
But, by Cauchy's residue theorem, this integral differs from
\[
\text{Res}_{z=n}\left(\frac{\zeta_{\mathbf{Z}^n}(z) x^{z-n}}{z}\right) = \frac{1}{n}\zeta(2)\zeta(3)\cdots\zeta(n)
\]
by
\[
\frac{1}{i2\pi} \int_{n-1/2-i\infty}^{n-1/2+i\infty} \zeta_{\mathbf{Z}^n}(z) \frac{x^{z-n}}{z}\,dz,
\]
which tends to $0$ as $x\to\infty$. Thus there are about
\[
\frac{1}{n}\zeta(2)\zeta(3)\cdots\zeta(n) m^n
\]
sublattices of $\mathbf{Z}^n$ of index at most $m$, and
\[
\mu(\text{SL}_n(\mathbf{R})/\text{SL}_n(\mathbf{Z})) = \frac{1}{n}\zeta(2)\zeta(3)\cdots\zeta(n).
\]

Total ergodicity

2013-05-28T17:54:00.000+01:00

A measure-preserving transformation $T$ of a (finite) measure space $(X,\Sigma,\mu)$ is called ergodic if whenever $A\subset X$ satisfies $T^{-1}A=A$ we have $\mu(A)=0$ or $\mu(A^c)=0$. To aid intuition, it is useful to keep in mind the ergodic theorem, which roughly states that ergodicity equals equidistribution.

Following a discussion today, I'm concerned here with the ergodicity of $T^n$ for $n\in\mathbf{N}$. This is an entertaining topic, and I thank Rudi Mrazovic for bringing it to my attention. Here's an amusing example: While $T$ being ergodic does not generally imply that $T^2$ is ergodic, $T^2$ being ergodic does imply that $T^4$ is ergodic.

Let's start with a quite simple observation.

Lemma: If $n\in\mathbf{N}$ and $T^n$ is ergodic then $T$ is ergodic.

Proof: Suppose $T^{-1}A=A$. Then $T^{-n}A=A$, so $\mu(A)=0$ or $\mu(A^c)=0$. $\square$

What is the picture of $T$ being ergodic and $T^n$ not being ergodic? Certainly one picture to imagine is a partition $X = E\cup T^{-1}E\cup\cdots\cup T^{-(d-1)}E$ with $1<d\mid n$ and $T^{-d}E=E$, or in other words a factor $X\to \mathbf{Z}/d\mathbf{Z}$. Possibly surprisingly, this is the whole story.

Theorem: If $T$ is ergodic and $T^n$ is not ergodic then for some divisor $d>1$ of $n$ there exists a partition of $X$ as $E\cup T^{-1}E\cup\cdots\cup T^{-(d-1)}E \cup N$ where $T^{-d}E=E$ and $\mu(N)=0$.

Proof: Since $T^n$ is not ergodic there exists a set $A$ with $T^{-n}A=A$ and $\mu(A)>0$ and $\mu(A^c)>0$. For any subset $S\subset \mathbf{Z}/n\mathbf{Z}$ let $X_S$ consist of those $x\in X$ such that
$$\{j\in\mathbf{Z}/n\mathbf{Z} : T^j x \in A\} = t + S$$
for some $t\in\mathbf{Z}/n\mathbf{Z}$. (Note that because $x\in A$ iff $T^n x\in A$, the sentence "$T^j x \in A$" makes sense for $j\in \mathbf{Z}/n\mathbf{Z}$.) Note that $T^{-1}X_S = X_S$, so $\mu(X_S)=0$ or $\mu(X_S^c)=0$. Since $X = \bigcup_S X_S$, we must have $\mu(X_S^c)=0$ for some $S$.

Let $d$ be the smallest positive period of $S$. Note that $d=1$ iff $S=\emptyset$, contradicting $\mu(A)>0$, or $S=\mathbf{Z}/n\mathbf{Z}$, contradicting $\mu(A^c)>0$, so $d>1$. Let
$$E = \bigcap_{j\in S} T^{-j} A.$$
(Again note that $T^{-j}A$ makes sense for $j\in\mathbf{Z}/n\mathbf{Z}$.) Then $\mu(T^{-i}E \cap T^{-j} E) = 0$ unless $-i+S = -j+S$, i.e., iff $i-j$ is divisible by $d$, in which case $T^{-i}E=T^{-j}E$. Thus we have the desired partition
$$X_S = E \cup T^{-1}E \cup \cdots \cup T^{-(d-1)}E.\square$$

Note as a corollary that "only the primes matter" insofar as far as which $n\in N$ make $T^n$ ergodic: if $n>1$ and $T^n$ is not ergodic then for some prime $p|n$, $T^p$ is not ergodic. Combining this with the initial lemma, which shows that if $T^n$ is ergodic then $T^p$ is ergodic for every $p|n$, we have thus proved one half of the following theorem.

Theorem: Given a set $S\subset\mathbf{N}$, there exists a measure-preserving system $(X,\Sigma,\mu,T)$ such that $S = \{n\in\mathbf{N} : T^n \text{ is ergodic}\}$ if and only if $S$ is empty or the set of $n\in\mathbf{N}$ not divisible by any $p\in\mathcal{P}$, for some set of primes $\mathcal{P}$.

It remains only to construct a measure-preserving system for a given set $\mathcal{P}$ of primes. For $\mathcal{P}$ finite this can be achieved even with finite spaces. In general it suffices to look at $\prod_{p\in\mathcal{P}} \mathbf{Z}/p\mathbf{Z}$.

Fekete's lemma and sum-free sets

2013-01-13T21:38:00.001+00:00

Just a quick post to help popularise a useful lemma which seems to be well known to researchers but not to undergraduates. It would make for a good Analysis I exercise. Let $\mathbf{N}$ exclude $0$ for the purpose of this post.

Fekete's lemma: If $f:\mathbf{N}\to\mathbf{R}$ satisfies $f(m+n)\leq f(m)+f(n)$ for all $m,n\in\mathbf{N}$ then $f(n)/n\longrightarrow\inf_n f(n)/n$.

Proof: The inequality $\liminf f(n)/n \geq \inf_d f(d)/d$ is immediate from the definition of $\liminf$, so it suffices to prove $\limsup f(n)\leq f(d)/d$ for each $d\in\mathbf{N}$. Fix such a $d$ and set $M=\max\{0,f(1),f(2),\ldots,f(d-1)\}$. Set $f(0)=0$. Now for a given $n$ choose $k$ so that $0 \leq n-kd < d$. Then
\[\frac{f(n)}{n} \leq \frac{f(n-kd)}{n} + \frac{f(kd)}{n} \leq \frac{M}{n} + \frac{kf(d)}{kd} \longrightarrow \frac{f(d)}{d},\]
so $\limsup f(n)/n \leq f(d)/d$.

Example: For $n\in\mathbf{N}$ let $f(n)$ denote the largest $k\in\mathbf{N}$ such that every set of $n$ nonzero real numbers contains a subset of size $k$ containing no solutions to $x+y=z$. We call such a subset sum-free.

Claim: $f(n)/n$ converges as $n\to\infty$.

Proof: Let $A$ be a set of size $m$ containing no sum-free subset of size larger than $f(m)$, and $B$ a set of size $n$ containing no sum-free subset of size larger than $f(n)$. Then for large enough $M\in\mathbf{N}$, the set $A\cup MB$ is a set of size $m+n$ containing no sum-free subset of size larger than $f(m)+f(n)$, so $f(m+n)\leq f(m)+f(n)$. Thus by Fekete's lemma $f(n)/n$ converges to $\inf f(n)/n$.

The value of $\sigma = \lim f(n)/n$ not easy to compute, but certainly $0\leq\sigma\leq\frac{1}{2}$. A famously simple argument of Erdos shows $\sigma\geq\frac{1}{3}$: Fix a set $A$, and for simplicity assume $A\subset\mathbf{N}$. For $\theta\in[0,1]$ consider those $n\in A$ such that the fractional part of $\theta n$ lies between $\frac{1}{3}$ and $\frac{2}{3}$. This set $A_\theta$ is sum-free, and if $\theta$ is chosen uniformly at random then $A_\theta$ has size $|A|/3$ on average.

The example $A=\{1,2,3,4,5,6,8,9,10,18\}$, due to Malouf, shows $\sigma\leq\frac{2}{5}$. The current record is due to Lewko, who showed by example $\sigma\leq\frac{11}{28}$. Since these examples are somewhat ad hoc, while Erdos's argument is beautiful, one is naturally led to the following conjecture (indeed many people have made this conjecture):

Conjecture: $\sigma=\frac{1}{3}$.

Ben Green, Freddie Manners, and I have just proved this conjecture: see http://arxiv.org/abs/1301.4579.

Constructible regular polygons

2012-09-13T01:11:00.001+01:00

The proof presented below, giving an (almost) complete characterisation of constructible regular polygons, is such a beautiful gem of a proof that I can't help but record it here so that I might not forget it. I'll go over far more details than is usually done, simply because it amuses me how many different areas of undergraduate mathematics are touched upon.

Theorem: The regular $n$-gon is constructible by ruler and compass if and only if $n$ has the form $2^k p_1 \cdots p_l$, where $p_1,\ldots,p_l$ are distinct primes of the form $2^{2^m} + 1$.

Primes of this form are called Fermat primes. The only known Fermat primes are $3,5,17,257,65537$, and heuristics suggest that there may be no others, though this is an open problem.

We must say what we mean by "constructible". We assume we are given a plane (a sheet of paper, say), with two points already labelled. We parameterise the plane by points of $\mathbf{C}$, and we rotate and scale so that the two labelled points are $0$ and $1$. There are three things we are allowed to do:

Whenever we have two labelled points we can draw the straight line through them.
Whenever we have two labelled points we can draw the circle with one as centre and the other on the circumference.
We can label any point of intersection (of two lines, two circles, or a line and a circle).

Any point of the plane (considered as an element of $\mathbf{C}$) that can be labelled through a sequence of the above operations is called constructible. Denote by $\mathbf{E} \subset \mathbf{C}$ the set of constructible numbers.

Lemma: $\mathbf{E}$ is the smallest subfield of $\mathbf{C}$ closed under taking square roots.

The lemma can be proved as follows.

Show closure under $(x,y)\mapsto x+y$ by constructing the parallelogram with vertices $0,x,y,x+y$.
Show closure under the rotation $x\mapsto ix$.
Show closure under taking real parts. Hence $\mathbf{E} = \mathcal{R}\mathbf{E} + i \mathcal{R}\mathbf{E}$, where $\mathcal{R}\mathbf{E} = \mathbf{E}\cap\mathbf{R}$.
Show closure under $(x,y)\mapsto xy$ for $x,y\in\mathcal{R}\mathbf{E}$ by constructing the right triangle with leg lengths $1,x$ and the similar right triangle with base $y$. Deduce that $\mathbf{E}$ is a ring.
Show that if $0\neq x\in\mathbf{E}$ then $1/x\in\mathbf{E}$ by doing so first for $x\in\mathcal{R}\mathbf{E}$ (another right triangle construction) and then using $1/x = \bar{x}/|x|^2$. Thus $\mathbf{E}$ is a field.
Show that $\mathcal{R}\mathbf{E}$ is closed under taking square roots (yet another right triangle construction). Thus show that $\mathbf{E}$ is closed under taking square roots by bisecting an appropriate angle.
Finally show that any subfield $K$ of $\mathbf{C}$ closed under taking square roots is closed under the "constructing" rules we listed when defining $\mathbf{E}$. The idea here is that one never has to solve an equation worse than a quadratic (when intersecting two circles, first construct the perpendicular bisector to the segment connecting the two centres, and then compute the intersection of this line with one of the circles).

The following is our algebraic characterisation of constructibility.

Lemma: $x\in\mathbf{E}$ if and only if the degree $[K:\mathbf{Q}]$ of the normal closure $K$ of $\mathbf{Q}(x)$ is a power of $2$.

Proof: By the previous lemma, $\mathbf{E}$ can be described as the union of all fields $\mathbf{Q}(\alpha_1,\ldots,\alpha_m)$ such that $\alpha_k^2\in\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})$ for all $k=1,\ldots,m$. Let $x\in\mathbf{E}$. Thus $x$ lies in some such field $L = \mathbf{Q}(\alpha_1,\ldots,\alpha_m)$. But note by the tower law that
$$[L:\mathbf{Q}] = \prod_{k=1}^{m} [\mathbf{Q}(\alpha_1,\ldots,\alpha_k):\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})],$$where each factor $[\mathbf{Q}(\alpha_1,\ldots,\alpha_k):\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})]\leq 2$. Thus $[L:\mathbf{Q}]$ is a power $2$, and by another application of the tower law so is $[K:\mathbf{Q}]$.

Conversely suppose that $[K:\mathbf{Q}]$ is a power of $2$. Then the Galois group $\text{Gal}(K/\mathbf{Q})$ is a $2$-group. Recall that any $p$-group $G$ (except the trivial group) has nontrivial centre: by counting conjugacy class sizes, the size of $Z(G)$ must be divisible by $p$. Thus by induction (and classifying abelian $p$-groups) it follows that in every $p$-group $G$ there is a sequence of normal subgroups $G_i \triangleleft G$ such that $G_0 = \{e\}$, $G_n = G$, and $|G_k/G_{k-1}| = p$ for each $k=1,\ldots,n$. Specialising to $p=2$, Galois theory now implies that $K$ is the top of a tower of quadratic extensions starting from $\mathbf{Q}$, so $x\in K\subset\mathbf{E}$.

By saying "the $n$-gon is constructible" we mean that the vertices of some regular $n$-gon are constructible. Clearly this is equivalent to saying $\zeta_n = \exp(i2\pi/n)$ is an element of $\mathbf{E}$. Note that $\Phi_n(\zeta_n) = 0$, where the $n$th cyclotomic polynomial $\Phi_n$ is defined by $$\Phi_n(X) = \prod_{(k,n)=1} (X-\zeta_n^k).$$Note that $\prod_{d|n} \Phi_d(X) = X^n-1 \in \mathbf{Z}[X]$, so by the uniqueness of polynomial division and induction we have that $\Phi_n(X) \in \mathbf{Z}[X]$.

Lemma: $\Phi_n$ is irreducible over $\mathbf{Q}$.

Proof: Suppose $\Phi_n(X) = f(X) g(X)$ where $f$ is irreducible and $\deg f\geq 1$. We must show that each $\zeta_n^k$ is a root of $f$. Since some $\zeta_n^k$ is a root of $f$, it suffices to show that $f(z)=0$ implies $f(z^p)=0$ whenever $p$ is a prime not dividing $n$. Towards this end, suppose that $p$ is such a prime and $f(z)=0$ but $f(z^p)\neq 0$. Then $g(z^p) = 0$ since $\Phi_n(z^p)=0$. Since $f$ is irreducible this implies that $f(X)$ divides $g(X^p)$. Denoting reductions modulo $p$ by a tilde and using Freshman's dream, $\tilde{f}$ divides $\tilde{g}(X^p) = \tilde{g}(X)^p$. Thus $\tilde{f}$ and $\tilde{g}$ are not coprime, so $\tilde{\Phi_n}$, and thus $X^n-1$, has a multiple root over $\mathbf{F}_p$. But this is a contradiction since $X^n-1$ and its formal derivative $nX^{n-1}$ are coprime.

Thus $\zeta_n$ has minimal polynomial $\Phi_n$, and moreover $\mathbf{Q}(\zeta_n)/\mathbf{Q}$ is a normal extension. We have therefore reduced our task to finding those $n$ for which $[\mathbf{Q}(\zeta_n):\mathbf{Q}] = \deg\Phi_n = \varphi(n)$ is a power of $2$.

Writing $n$ as a product of primes $n=p_1^{e_1}\cdots p_m^{e_m}$ with each $e_i>0$, we have (OK, the details must stop somewhere) $$\varphi(n) = p_1^{e_1 - 1}(p_1 - 1) \cdots p_m^{e_m-1} (p_m-1).$$This is a power of $2$ iff every $p_i\neq 2$ which occurs has $e_i = 1$ and $p_i - 1$ a power of $2$. But if $p = 2^k + 1$ is prime, then $k$ itself must be a power of $2$, since for every odd $q$ we have $$x^q + 1 = (x + 1)(x^{q-1} - x^{q-2} + - \cdots + 1).$$

Generating direct powers

2012-03-27T15:51:00.001+01:00

Before continuing, prove or disprove: If $G$ is a group, the direct power $G^{\times n}$ is never generated by fewer than $n$ elements.

Certainly this is the case if $G$ is abelian, as in this case $G^{\times n}$ has a quotient of the form $\mathbf{F}_p^n$, i.e., an $n$-dimensional vector space over $\mathbf{F}_p$. This is also the case if $n\leq 2$. Examples of small size are therefore a little hard to find.

Here is a nice example: Denote the $m$th prime by $p_m$, and let $G = A_{p_m}$. Then I claim that $G^{\times(m-1)}$ can be generated with $3$ elements. Indeed, take $\alpha,\beta\in G^{\times(m-1)}$ to be in each factor equal to some two generators $\tilde{\alpha},\tilde{\beta}$ of $G$, and take $\gamma$ to be an element of the form (3-cycle, 5-cycle, 7-cycle, ... , $p_m$-cycle). Then the $(p_2\cdots p_j p_{j+2}\cdots p_m)$th power of $\gamma$ is a $p_{j+1}$-cycle in the $j$th factor. With $\alpha$ and $\beta$ we can generate all the conjugates of this cycle, and these together generate the whole $j$th factor by simplicity.

The probability of coprimality

2012-03-24T11:11:00.000+00:00

A famous result: Two positive integers chosen at random will be coprime with probability $6/\pi^2$.

One has to say what one means by this, since there is no uniform distribution on the positive integers. What we mean is that if we choose $x$ and $y$ uniformly and independently at random from the integers $\{1,\ldots,N\}$, then $x$ and $y$ will be coprime with probability tending to $6/\pi^2$ as $N\to\infty$.

Here is one classic proof: Consider the points $(x,y)\in\{1,\ldots,N\}^2$ such that $\gcd(x,y)=1$. If we scale this picture by $M$, we will have exactly the points $(X,Y)\in\{1,\ldots,MN\}^2$ such that $\gcd(X,Y)=M$. Therefore, whatever the limiting probability $P$ of being coprime is, the probability of having $\gcd = M$ is exactly $P/M^2$. The law of total probability then implies
\[ 1 = \lim_{N\to\infty}\mathbf{P}(\gcd(x,y)\in\mathbf{N}) = P \sum_{M=1} \frac{1}{M^2} = P\,\zeta(2).\]

Another way of imagining a random element on a noncompact group $G$ is to look at the profinite completion $\hat{G}$, which is always compact so always possesses a uniform distribution. Now by the Chinese Remainder Theorem,
\[ \hat{\mathbf{Z}} = \prod_p \mathbf{Z}_p,\]
so choosing an element at random from $\hat{\mathbf{Z}}$ is the same as choosing an element at random from $\mathbf{Z}_p$ for each $p$. The probability that a random element of $\mathbf{Z}_p$ is divisible by $p$ is precisely $1/p$, so the probably that two elements from $\hat{\mathbf{Z}}$ are coprime is precisely
\[ \prod_p (1-1/p^2) = 1/\zeta(2).\]
This does not imply the asymtotic result in $\mathbf{Z}$ mentioned above, but it is a nice way of thinking about it.

EDIT (28/4/12): As pointed out to me by Freddie Manners, the "classic proof" suggested above suffers from the fatal defect of not proving that the limit exists. Here is an alternative proof which does prove that the limit exists. More generally, let $d>1$ and consider positive integers $x_1,\ldots,x_d\leq X$ chosen uniformly at random. Let $G(X)$ be the number of $(x_1,\ldots,x_d)\in[1,X]^d$ such that $\gcd(x_1,\ldots,x_d)=1$, and observe that the number of $(x_1,\ldots,x_d)\in[1,X]^d$ such that $\gcd(x_1,\ldots,x_d) = n$ is precisely $G(X/n)$. Then
\[ \lfloor X\rfloor^d = \sum_{n\geq1} G(X/n) \]
for all $X$. Inverting this relationship,
\[ G(X) = \sum_{n\geq1} \mu(n) \lfloor X/n\rfloor^d. \]
Thus, the probability that $d$ randomly and uniformly chosen integers $x_1,\ldots,x_d\leq X$ are all coprime is exactly
\[P(X) = \sum_{n\geq1} \mu(n) \left(\frac{\lfloor X/n\rfloor}{\lfloor X\rfloor}\right)^d. \]
The $n$th term here is bounded by $n^{-d}$, so by the dominated convergence theorem $P(X)\to\sum_{n\geq1} \mu(n) n^{-d} = \zeta(d)^{-1}$.

Volume of the $n$-ball

2012-03-13T22:39:00.000+00:00

The following is a simple and beautiful proof, shown to me to my great delight while I was in high school, that the $n$-ball of radius $r$ has $n$-volume

\[\frac{\pi^{n/2} r^n}{\Gamma(\frac{n}{2}+1)}\]

Although I expect most people will know it, I believe that everybody should see it. I don't know the history of it, and would be interested in learning.

Suppose that the $n$-ball $B^n$ of radius $1$ has $n$-volume $V_n(1)$. Then, by considering linear transformations, the $n$-ball $rB^n$ of radius $r$ has $n$-volume $V_n(r) = V_n(1) r^n$. Moreover, differentiating this with respect to $r$ should produce the surface $(n-1)$-volume of the $(n-1)$-sphere $S^{n-1} = \partial B^n$: thus we expect $\text{vol}^{(n-1)}(S^{n-1}) = V_n(1) n r^{n-1}$.

Now consider the integral

\[I = \int_{\mathbf{R}^n} e^{-\pi|x|^2}\,dx.\]

We will compute $I$ in two different ways. On the one hand,

\[I = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} e^{-\pi x_1^2 -\cdots - \pi x_n^2}\,dx_1\cdots dx_n = \left(\int_{-\infty}^{+\infty} e^{-\pi x^2}\,dx\right)^n = 1\]

On the other hand, the form $I$ suggests introducing a radial coordinate $r=|x|$. Computing this way,

\[I = \int_0^\infty e^{-\pi r^2} (V_n(1) n r^{n-1})\,dr = \frac{n V_n(1)}{\pi^{n/2}} \int_0^\infty e^{-t^2} t^{n-1}\, dt\qquad\quad\]

\[\qquad\quad= \frac{n V_n(1)}{2 \pi^{n/2}} \int_0^\infty e^{-s} s^{n/2-1}\,ds = \frac{n V_n(1)}{2 \pi^{n/2}} \Gamma(n/2) = \frac{V_n(1)\Gamma(n/2+1)}{\pi^{n/2}}.\]

Rationals are repeating $p$-adics

2012-01-19T21:32:00.000+00:00

I've been amusing myself with $p$-adic arithmetic lately: I've never really got to know it until now.

We are all familiar with the fact that every $q\in\mathbf{Q}$ gets represented in $\mathbf{R}$ as a repeating decimal, or whatever your favourite base is. (Here I count terminating decimals as decimals which eventually repeat $000...$.) The converse is of course true as well: repeating decimals are rationals.

Is this true in $\mathbf{Q}_p$ as well? That is, are the rationals precisely those $p$-adics which are eventually repeating (in the other direction, of course)? One direction, that repeating $p$-adics are rational, is pretty obvious: if $a\in p^{-t}\mathbf{Z}$ and $b\in\mathbf{Z}$ then
\[
a + b p^r + b p^{r+s} + b p^{r+2s} + \cdots = a + b \frac{p^r}{1 - p^s}
\]
is rational. What about the converse?

The converse seems trickier. How again did we do it in $\mathbf{R}$? I don't even remember: it's one of those things that we know so fundamentally (until very recently, $\mathbf{Q}$ was almost defined in my brain as the reals which eventually repeat) that we forget how to prove it.

Who cares how to prove it? It is true, and it says, in base $p$, that if $q\in\mathbf{Q}$ then there exists $a\in p^{-t}\mathbf{Z}$ and $b\in\mathbf{Z}$ such that
\[
q = a + b p^r + b p^{r-s} + b p^{r-2s} + \cdots = a + b \frac{p^r}{1 - p^{-s}}.
\]
But hey, this implies that
\[
q = a - b \frac{p^{r+s}}{1 - p^s},
\]
which we already know is a repeating $p$-adic.

Inducing a Haar measure from a quotient

2012-01-19T17:23:00.001+00:00

Suppose that $G$ is a locally compact group and that $H$ is a closed subgroup with an $H$-left-invariant regular Borel measure $\mu_H$ such that $G/H$ possesses a $G$-left-invariant regular Borel measure $\mu_{G/H}$. For instance, $G = \mathbf{R}$, $H=\mathbf{Z}$, and $G/H= S^1$. The following is how you then induce a Haar measure on $G$. (Technically, it's easier to construct Haar measure on compact groups, so this extends that construction slightly.)

For $f\in C_c(G)$, define $T_H(f) : G\to \mathbf{C}$ by
\[
T_H(f)(x) = \int_H f(xh) d\mu_H.
\]
Let $K=\text{supp}f$. Then $f(xh)$, as a function of $h$ is supported on $x^{-1} K$, so the above integral is finite. Moreover, if $x,y\in U$ and $U$ is compact, then
\[
|T_H(f)(x) - T_H(f)(y)| = \left| \int_H (f(xh)-f(yh))d\mu_H\right| \leq \mu_H(H\cap U^{-1} K) \sup_h |f(xh)-f(yh)|.
\]
Because a continuous function on a compact set is uniformly continuous (in the sense that there exists a neighbourhood $V$ of $e$ such that $gh^{-1} \in V$ implies $|f(g)-f(h)| <\epsilon$), $T_H(f)$ is continous. Since $T_H(f)$ is $H$-right-invariant, $T_H(f)$ descends to a continuous function $\hat{T}_H(f)$ defined on $G/H$. Moreover, if $q$ is the quotient map $G\to G/H$, then $\hat{T}_H(f)$ is supported on $q(K)$, so $\hat{T}_H:C_c(G)\to C_c(G/H)$. Finally, define $\lambda: C_c(G)\to \mathbf{C}$ by
\[
\lambda(f) = \int_{G/H} \hat{T}_H(f) d\mu_{G/H}.
\]
This linear functional $\lambda$ is positive (in the sense that $f\geq0$ implies $\lambda(f)\geq0$), so the Riesz representation theorem guarantees the existence of a regular Borel measure $\mu_G$ on $G$ such that
\[
\lambda(f) = \int_G f d\mu_G
\]
for all $f\in C_c(G)$. It is now a simple matter to check that $\mu_G$ is $G$-left-invariant.

Deriving the normal distribution

2012-01-03T15:34:00.000+00:00

When I was in high school, I asked several a person, "Why the normal distribution?" After all, the function $e^{-x^2/2}/\sqrt{2\pi}$ looks like a pretty bizarre function to guess, when there are other functions like $1/(1+x^2)$ which produce perfectly fine bell-shaped curves. One answer I received was that the normal distribution is in some sense the limit of the binomial distribution. While this answer seems fair enough, I tried my hand at the mathematics and did not succeed, so I was still confounded. None the less I believed the answer, and it satisfied me for the time.

The real answer that I was looking for but did not appreciate until university was the central limit theorem. For me, the central limit theorem is the explanation of the normal distribution. In any case, the calculation that I attempted was basically a verification of the central limit theorem in a simple case, and it is a testement to the force of the central limit theorem that that simple case is difficult to work out by hand.

In this post, I rectify that calculation that I should have accomplished in high school (with the benefit of hind-sight being the correct factor of $\sqrt{n}$ rather than $n$). On the way, I will also check both laws of large numbers in this simple case.

Consider a "random walk" on $\bf Z$. Specifically, let $X$ be a random variable taking the values $1$ and $-1$ each with probability $1/2$, and let
\[
S_n = X_1+X_2+\ldots+X_n,
\]
where each $X_i$ is an independent copy of $X$. Note that $(X+1)/2$ is Bernoulli, so $S_n/2 + n/2$ is binomial, so
\[
{\bf P}(S_n = k) = {\bf P}(S_n/2 + n/2 = k/2 + n/2) = \left(\begin{array}{c}n\\ k/2+n/2\end{array}\right) 2^{-n},
\]
where we make the convention that the binomial coefficient is zero when it doesn't make sense. Hence, if $x>0$,
\[
{\bf P}(S_n/n \geq x) = \sum_{k\geq x} {\bf P}(S_n/n = k) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n}.
\]
By Stirling's formula, $m! \sim \sqrt{2\pi m} (m/e)^m$, so
\[
\left(\begin{array}{c}m\\k\end{array}\right) \sim \sqrt{\frac{m}{2\pi k(m-k)}} \frac{m^m}{k^k (m-k)^{m-k}}
\]
as $m,k\rightarrow\infty$. Hence, for constant $x>0$,
\[
{\bf P}(S_n/n \geq x) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n} \sim \frac{\sqrt{n}}{\sqrt{2\pi(1-x^2)}} \left((1+x)^{1+x} (1-x)^{1-x}\right)^{-n/2}.

\]

Let $f(x) = (1+x)^{1+x} (1-x)^{1-x}$. Note that $f(x)>0$ for $0<x<1$, that $f(0)=1$, and that
\[
\frac{f^\prime(x)}{f(x)} = (\log f(x))^\prime = \log(1+x) + 1 - \log(1-x) - 1 = \log\left(\frac{1+x}{1-x}\right) > 0,
\]
so $f(x)>1$ for all $0<x<1$. Hence
\[
{\bf P}(S_n/n\geq x) \rightarrow 0,
\]
and by symmetry,
\[
{\bf P}(S_n/n\leq -x) \rightarrow 0,
\]
as $n\rightarrow\infty$. Thus we have verified the weak law.

In fact, because the convergence above is geometric,
\[
\sum_{n=1}^\infty {\bf P}(S_n/n\geq x) < \infty,
\]
whence
\[
{\bf P}(S_n/n\geq x\text{ infinitely often}) = \lim_{m\rightarrow\infty} {\bf P}(S_n/n\geq x\text{ for some }n\geq m) \leq \lim_{m\rightarrow\infty} \sum_{n=m}^\infty {\bf P}(S_n/n\geq x) =0.
\]
(This is the Borel-Cantelli lemma.) Thus $S_n/n\rightarrow 0$ almost surely. Thus we have verified the strong law.

It remains to check the central limit theorem, i.e., to investigate the limiting distribution of $S_n/\sqrt{n}$. Now, for constant $x<y$,
\[
{\bf P}(x\leq S_n/\sqrt{n} \leq y) = \int_x^y\frac{\sqrt{n}}{2} \left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \,d\mu_n(t),
\]
where $\mu_n$ is the atomic measure assigning a mass $2/\sqrt{n}$ to each point of $(n+2{\bf Z})/\sqrt{n}$. Now Stirling's formula strikes again, giving
\[
\frac{\sqrt{n}}{2}\left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \sim \frac{1}{\sqrt{2\pi(1-t^2/n)}} \left((1+t/\sqrt{n})^{1+ t/\sqrt{n}} (1- t/\sqrt{n})^{1- t/\sqrt{n}}\right)^{-n/2} \longrightarrow \frac{1}{\sqrt{2\pi}} e^{-t^2/2},
\]
as
\[
\lim_{m\rightarrow\infty} \left((1+t/m)^{1+t/m} (1-t/m)^{1-t/m}\right)^{m^2/t^2} = \lim_{z\rightarrow\pm\infty} (1-1/z^2)^{z^2} (1+1/z)^z (1-1/z)^{-z} = e.
\]
Hence
\[
{\bf P}(x\leq S_n/\sqrt{n} \leq y) = \frac{1}{\sqrt{2\pi}} \int_x^y e^{-t^2/2}\,d\mu_n(t) + \int_x^y R_n(t)\,d\mu_n(t),
\]
where $R_n(t)\rightarrow 0$ as $n\rightarrow\infty$. Moreover this convergence is uniform in $t\in[x,y]$, so
\[
\left|\int_x^y R_n(t)\,d\mu_n(t)\right| \leq \mu_n([x,y]) \max_{x\leq t\leq y} |R_n(t)| \leq (y-x + 2/\sqrt{n})\max_{x\leq t\leq y} |R_n(t)| \rightarrow 0.
\]
Hence
\[
{\bf P}(x\leq S_n/\sqrt{n}\leq y) \rightarrow \frac{1}{\sqrt{2\pi}} \lim_{n\rightarrow\infty}\int_x^y e^{-t^2/2}\,d\mu_n(t) = \frac{1}{\sqrt{2\pi}}\int_x^y e^{-t^2/2}\,dt,
\]
where the last equality follows from the theorem that continuous functions are Riemann integrable. Thus we have verified the central limit theorem.

A simpler example of a nonmeasurable set

2011-11-14T13:52:00.001+00:00

I don't claim that I'm the first person to have come up with this example; I only claim that yesterday is the first time I thought of it. This is a rather sorry state of affairs, because it's an obvious and nice example.

First, some background, the example of a nonmeasurable set that I was given when I first learnt measure theory went basically as follows. Start with $[0,1]$, and call $x,y\in[0,1]$ equivalent if $x-y\in{\bf Q}$. This defines an equivalence relation, and thus we can define a set $Z$ to consist of exactly one representative from each equivalence class. Then the interval $[0,1]$ is just the disjoint union of translates mod $1$ of $Z$ by rational numbers mod $1$ (or something). Because the rationals in $[0,1]$ are countably infinite this is incompatible with $Z$ being measurable: if $\mu(Z)>0$ then then $\mu([0,1])=\infty$, and if $\mu(Z)=0$ then $\mu([0,1])=0$.

This example is fine and beautiful and all that, but I find it rather hard to remember and especially difficult to visualize. Thus, yesterday, when I was explaining to my girlfriend in an intuitive way why not all sets are measurable, the example I produced was, by accident, not the above example.

From an intuitive point of view, if we're talking about $[0,1]$ and we're generally thinking about mod $1$ then we're probably better off just talking about the circle $S^1$ and not confusing the issue. Moreover, now translation mod $1$ is just rotation, which is much easier to visualize and, importantly, much easier to believe is measure-preserving. Now we just need to decompose $S^1$ into a countably infinite collection of copies of some set $Z$. Towards this end, let $T$ be any irrational rotation of $S^1$. (When I was talking yesterday, I started with the concrete example of rotation by $1$ degree, and then hastily changed this to rotation by $1$ radian.) Then for any point $x$ we can consider the orbit $\{T^n(x)\}_{n\in\mathbb{Z}}$, and we thus decompose $S^1$ into a union of orbits under the action of $T$. Let $Z$ be a set consisting of exactly one representative of each of these orbits. Since every orbit is infinite, $S^1$ is the disjoint union of
\[\ldots,T^{-2}(Z), T^{-1}(Z), Z, T(Z), T^2(Z),\ldots.\]
Since $T$ is measure-preserving, we have a contradiction as before.

A question in general topology

2011-10-26T13:01:00.000+01:00

Recall the following theorem: Given a set $X$ with two topologies $\cal U$ and $\cal V$, with $\cal U$ weaker than $\cal V$, if $\cal U$ is Hausdorff and $\cal V$ is compact then in fact $\cal U = \cal V$.

In general, is there a "right" topology, in this sense? Specifically, given a Hausdorff topology $\cal U$ can you weaken it to a compact topology which is still Hausdorff, and given a compact topology $\cal V$ can you strengthen it to a Hausdorff topology which is still compact?

For example, consider the cofinite topology on $\omega$. The cofinite topology is always compact. Can we extend this topology to a Hausdorff topology? Yes we can: $\omega$ could just as well have been any countably infinite set, say $\{0\}\cup\{1/n: n\in{\bf Z}^+\}$, and now one checks that the usual topology (induced by $\mathbb{R}$, say) is compact, Hausdorff, and stronger than the cofinite topology (because finite sets are closed).

EDIT: A more or less complete answer can be found on http://mathoverflow.net/questions/15841/how-do-the-compact-hausdorff-topologies-sit-in-the-lattice-of-all-topologies-on-a.