Random Permutations: February 2015

A few months ago on his blog Terry Tao explained how one could, by analogy with the theory of graph limits, replace the explicit use of arithmetic regularity with a soft device which he calls an additive limit. Roughly speaking, the additive limit, or Kronecker factor, of a sequence of finite abelian groups $G_i$ is a compact quotient of the ultraproduct $\prod G_i$ which controls the convolutions. The result is that many theorems from additive combinatorics, such as Roth's theorem, which are usually proved using quantitative tools like Fourier analysis can instead be proved using soft tools more like the Lebesgue differentiation theorem.

The purpose of this post is to extend Tao's construction to the nonabelian setting. Tao already stated in his post that this should be possible, so one could say that this is just an exercise in nonabelian Fourier analysis. On the other hand the proof in the nonabelian setting more or less forces a more categorical point of view, so certain points of this exercise are instructive.

1. Measurable Bohr compactification

$\def\B{\text{Bohr}}\def\Ba{\text{Baire}}$Given any topological group $G$ there is a compact group $\B(G)$, called the Bohr compactification of $G$, such that every continuous homomorphism from $G$ to a compact group $K$ factors uniquely through $\B(G)$. One can think of $\B(G)$ as the 'largest' compact group in which $G$ has dense image. We need a variant of this definition for groups $G$ endowed only with a $\sigma$-algebra instead of a topology.

By a measurable group we mean a group $G$ together with a $\sigma$-algebra $\Sigma$ of subsets of $G$. Note that we do not make any measurability assumptions about the group operation or even the left- or right-shifts, though certainly it would be sensible to do so in other contexts. For us the only role of $\Sigma$ is to distinguish among all homomorphisms the measurable homomorphisms. The analogue of Bohr compactification for measurable groups is given by the following theorem.

Theorem 1 (Existence of measurable Bohr compactification): For every measurable group $G$ there is a compact group $\B_m(G)$ together with a $(\Sigma,\Ba)$-measurable homomorphism $\pi:G\to \B_m(G)$ such that every $(\Sigma,\Ba)$-measurable homomorphism from $G$ to a compact group $K$ factors uniquely as the composition of $\pi:G\to \B_m(G)$ and a continuous homomorphism $\B_m(G)\to K$.

In category-theoretic terms $\B_m$ is a left adjoint to the functor $\Ba$ from compact groups to measurable groups which replaces a group's topology with its Baire $\sigma$-algebra. By the adjoint functor theorem it suffices to check that the functor $\Ba$ is continuous, which boils down to the following lemma. Incidentally the analogue of this lemma fails for the Borel $\sigma$-algebra, which is why we must consider the Baire $\sigma$-algebra instead.

Lemma 2: For compact Hausdorff spaces $X_i$ we have $\Ba(\prod X_i) = \prod \Ba(X_i)$. In words, the Baire $\sigma$-algebra of the product is the product of the Baire $\sigma$-algebras.

Proof: The containment $\prod \Ba(X_i) \subset \Ba(\prod X_i)$ is immediate from the definitions. To prove the opposite containment it suffices to check that every continuous function $f:\prod X_i \to \mathbf{R}$ is measurable with respect to $\prod\Ba(X_i)$. This is certainly true of functions $f$ which depend on only finitely many coordinates, and thus for all continuous functions $f$ by the Stone-Weierstrass theorem.$\square$

Those who, like me, are not used to thinking in terms of the adjoint functor theorem will appreciate a more pedestrian proof of Theorem 1. To this end, let $\mathcal{K}$ be the set of all pairs $(f,K)$, where $K$ is a compact group and $f:G\to K$ is a $(\Sigma,\Ba)$-measurable homomorphism with $f(G)$ dense in $K$. If$$\pi:G\to\prod_{(f,K)\in\mathcal{K}} K$$ is the diagonal map, then $\pi:G\to\overline{\pi(G)}$ is the measurable Bohr compactification of $G$. Indeed, the measurability of $\pi$ follows from Lemma 2, and the universal property is essentially obvious: given a $(\Sigma,\Ba)$-measurable homomorphism $f:G\to K$ with $K$ compact, the pair $(f,\overline{f(G)})$ appears in $\mathcal{K}$, so we have continuous maps$$\overline{\pi(G)}\to\prod_{(g,L)\in\mathcal{K}} L \to \overline{f(G)}\to K$$whose composition $h$ satisfies $h\pi = f$; moreover $h$ is unique because $\pi(G)$ is dense in $\overline{\pi(G)}$.

Alternatively, by the Peter-Weyl theorem, $\B_m(G)$ can be defined as the inverse limit of the closed images $\overline{\rho(G)}$ of all measurable finite-dimensional unitary representations $\rho$ of $G$.

It is natural to ask what relation the measurable Bohr compactification $\B_m$ bears to the usual Bohr compactification $\B$. In particular, is $\B$ just the composition $\B_m\circ\Ba$? Clearly $\B(G) = \B_m(\Ba(G))$ if and only if every measurable homomorphism $f:G\to K$ from $G$ to a compact group $K$ is continuous. This follows from Steinhaus's theorem if $G$ is locally compact, but it certainly fails in general, for instance for $G=\mathbf{Q}$ with the topology inherited from $\mathbf{R}$.

2. The Bohr compactification of an ultrafinite group

Let $G_1,G_2,\dots$ be a sequence of finite groups and let $p\in \beta\mathbf{N} \setminus\mathbf{N}$ be a nonprincipal ultrafilter. We form the ultraproduct $G=\prod_{n\to p} G_n$ and make it into a measurable group by giving it the Loeb $\sigma$-algebra $\mathcal{L}_G$, the $\sigma$-algebra generated by internal sets $\prod_{n\to p} A_n$, where $A_n\subset G_n$. We define the Loeb measure $\mu_G$ on internal sets $\prod_{n\to p}A_n$ by putting$$\mu_G\left(\prod_{n\to p}A_n\right)=\text{st}\lim_{n\to p} |A_n|/|G_n|,$$and we define $\mu_G$ on $\mathcal{L}_G$ by extension.

While the group operation $G\times G\to G$ is not generally measurable with respect to the product $\sigma$-algebra $\mathcal{L}_G\times\mathcal{L}_G$, it is measurable with respect to the larger $\sigma$-algebra $\mathcal{L}_{G\times G}$. Moreover this latter $\sigma$-algebra is still 'product-like' in the sense that all $\mathcal{L}_{G\times G}$-measurable $f:G\times G\to\mathbf{R}_{\geq0}$ obey Fubini's theorem, so we have a sensibly defined convolution operation $L^1(G)\times L^1(G)\to L^1(G)$ given by $$f*g(x) = \int f(y)g(y^{-1}x)\,d\mu_G(y).$$
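For readers who like to see the convolution formula in a concrete case, here is a minimal sketch on a single finite group, $\text{Sym}(3)$, with the uniform probability measure playing the role of $\mu_G$. The names `mul`, `inv` and `convolve` are my own, and nothing here touches the ultraproduct itself; it only illustrates the normalisation and the order of the factors.

```python
from fractions import Fraction
from itertools import permutations

# Elements of Sym(3) as tuples p, where p[i] is the image of i.
G = list(permutations(range(3)))

def mul(p, q):
    """Product pq, meaning: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def convolve(f, g):
    """(f*g)(x) = average over y of f(y) g(y^{-1} x), uniform probability measure."""
    n = len(G)
    return {x: sum(f[y] * g[mul(inv(y), x)] for y in G) / n for x in G}

identity = tuple(range(3))
one = {x: Fraction(1) for x in G}
# With this normalisation, |G| times the indicator of the identity is the unit.
delta = {x: Fraction(len(G)) if x == identity else Fraction(0) for x in G}
f = {x: Fraction(i) for i, x in enumerate(G)}  # an arbitrary test function
```

With these definitions `convolve(delta, f)` and `convolve(f, delta)` both return `f`, and `convolve(one, one)` returns `one`, as one expects for a probability measure.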

Now consider the Bohr compactification $\B_m(G)$ of $G$. The first thing to notice is that $\pi_*\mu_G$ is a $\pi(G)$-invariant Baire probability measure on $\B_m(G)$. Since $\pi(G)$ is dense in $\B_m(G)$ we conclude that $\pi_*\mu_G$ is in fact $\B_m(G)$-invariant, so by the uniqueness of Haar measure we must have$$\pi_*\mu_G=\mu_{\B_m(G)}.$$

In the remainder of this section we relate the two convolution algebras $L^2(G)$ and $L^2(\B_m(G))$. Given $f\in L^2(\B_m(G))$ we can form the pullback $\pi^*f = f\circ\pi$. Since$$ \|f\circ\pi\|_{L^2(G)}^2 = \int |f\circ\pi|^2\,d\mu_G = \int |f|^2 \,d\mu_{\B_m(G)} = \|f\|_{L^2(\B_m(G))}^2,$$we see that $\pi^*f$ is a well-defined element of $L^2(G)$, and in fact $\pi^*$ defines an isometric embedding$$\pi^*:L^2(\B_m(G))\to L^2(G).$$In the other direction we have the pushforward$$\pi_*:L^2(G)\to L^2(\B_m(G)),$$ defined as the adjoint of $\pi^*$. The identities $$\pi_*\pi^*f=f\quad\text{for }f\in L^2(\B_m(G)),$$$$\pi^*(f*g)=\pi^*f*\pi^*g\quad\text{for }f,g\in L^2(\B_m(G)),$$$$\pi_*(f*g)=\pi_*f*\pi_*g\quad\text{for }f,g\in L^2(G)$$are readily verified. For instance, the last of these is verified by the following computation, valid for $h\in L^2(\B_m(G))$ by $\mathcal{L}_{G\times G}$-Fubini:$$\langle h, \pi_*(f*g)\rangle = \langle \pi^* h, f*g\rangle = \int h(\pi(xy)) f(x)g(y)\,d\mu_G(x)\,d\mu_G(y)$$$$=\int h(x'y') \pi_*f(x')\pi_*g(y')\,d\mu_{\B_m(G)}(x')\,d\mu_{\B_m(G)}(y') = \langle h,\pi_*f*\pi_*g\rangle.$$We can summarise the situation as an isometric Banach algebra isomorphism$$L^2(G) \cong L^2(\B_m(G))\oplus \ker\pi_*.$$
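The pullback and pushforward are easiest to picture in a finite toy model. The sketch below is only an analogy: it replaces $\pi:G\to\B_m(G)$ by the quotient map $\mathbf{Z}/6\to\mathbf{Z}/2$, with uniform probability measures on both sides, so that the pushforward is simply averaging over fibres. The function names and test data are hypothetical.

```python
from fractions import Fraction

N, M = 6, 2  # toy model: G = Z/6, B = Z/2, pi(x) = x mod 2

def pi(x):
    return x % M

def pullback(h):
    """pi^* h = h o pi, a function on G."""
    return [h[pi(x)] for x in range(N)]

def pushforward(f):
    """pi_* f: the average of f over each fibre of pi."""
    fibre_size = N // M
    return [sum(f[x] for x in range(N) if pi(x) == c) / fibre_size
            for c in range(M)]

def inner(u, v):
    """L^2 inner product of real functions, uniform probability measure."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def convolve(u, v):
    """(u*v)(x) = average over y of u(y) v(x - y) on a cyclic group."""
    n = len(u)
    return [sum(u[y] * v[(x - y) % n] for y in range(n)) / n for x in range(n)]

f = [Fraction(k) for k in (3, 1, 4, 1, 5, 9)]
g = [Fraction(k) for k in (2, 7, 1, 8, 2, 8)]
h = [Fraction(5), Fraction(-2)]
```

In this model one can check directly that `pushforward` is the adjoint of `pullback`, that `pushforward(pullback(h))` is `h`, and that `pushforward` respects convolution.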

The following theorem asserts that in fact $\B_m(G)$ alone determines convolutions in $G$, and thus $\B_m(G)$ will more generally control all 'first-order configurations' in $G$.

Theorem 3: We have $(\ker\pi_*)*L^2(G)=0$. Thus all convolutions in $L^2(G)$ can be computed in $L^2(\B_m(G))$, in the sense that$$f*g=\pi^*(\pi_*f*\pi_*g)$$for all $f,g\in L^2(G)$.

The theorem follows from the following lemma.

Lemma 4: For all $f,g\in L^2(G)$ and $d>0$ we have$$\|f*g\|^2_{L^2(G)} \leq \left(\frac{1}{d}\|f\|_{L^2(G)}^2 + d\|\pi_*f\|_{L^2(\B_m(G))}^2\right)\|g\|_{L^2(G)}^2.$$In particular by optimising $d$ we have$$\|f*g\|^2_{L^2(G)} \leq 2\|\pi_*f\|_{L^2(\B_m(G))}\|f\|_{L^2(G)}\|g\|_{L^2(G)}^2.$$

$\def\st{\text{st}}$Proof: We borrow nonabelian harmonic analysis notation from Tao. Certainly we may assume that $f$ and $g$ are internal, say $f=\st\lim_{n\to p}f_n$ and $g=\st\lim_{n\to p}g_n$. Then by nonabelian Plancherel,$$\|f*g\|_{L^2(G)}^2 = \st\lim_{n\to p} \|f_n*g_n\|_{L^2(G_n)}^2= \st\lim_{n\to p} \sum_{\xi\in\widehat{G_n}} \dim V_\xi \|\hat{f_n}(\xi)\hat{g_n}(\xi)\|^2_{\text{HS}(V_\xi)}$$$$\leq\st\lim_{n\to p} \sup_\xi\|\hat{f_n}(\xi)\|^2_{\text{HS}(V_\xi)} \|g_n\|_{L^2(G_n)}^2 = \sup_\xi\st\lim_{n\to p}\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} \|g\|_{L^2(G)}^2,$$where in the last line the supremum is taken over all $\xi=(\xi_n)$. Fixing some such $\xi$, by Plancherel again$$\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} \leq \frac{1}{\dim V_{\xi_n}} \|f_n\|_{L^2(G_n)}^2,$$so we may assume that $\dim V_{\xi_n}\leq d$ for $p$-most $n$. But then the representations $\rho_{\xi_n}:G_n\to U(d)$ induce a measurable representation $\rho_\xi:G\to U(d)$, which in turn by the universal property of $\B_m(G)$ factors through a continuous representation $\rho'_\xi:\B_m(G)\to U(d)$. Thus$$\st\lim_{n\to p}\|\hat{f_n}(\xi_n)\|^2_{\text{HS}(V_{\xi_n})} = \st\lim_{n\to p} \int f_n(x)f_n(y) \text{tr}\rho_{\xi_n}(xy^{-1})\,d\mu_{G_n}(x)\,d\mu_{G_n}(y)$$$$= \int f(x) f(y) \text{tr}\rho_\xi(xy^{-1})\,d\mu_G(x)\,d\mu_G(y) = \int f(x) f(y) \text{tr}\rho'_\xi(\pi(xy^{-1}))\,d\mu_G(x)\,d\mu_G(y)$$$$= \int \pi_*f(x') \pi_*f(y') \text{tr}\rho'_\xi(x'y'^{-1})\,d\mu_{\B_m(G)}(x')\,d\mu_{\B_m(G)}(y') \leq d \|\pi_*f\|_{L^1(\B_m(G))}^2\leq d\|\pi_*f\|_{L^2(\B_m(G))}^2.\square$$
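The first step of the proof, bounding $\|f*g\|^2$ by the largest Fourier coefficient of $f$ times $\|g\|^2$, can be checked numerically in the simplest case. The sketch below assumes an abelian group $\mathbf{Z}/8$, where every $V_\xi$ is one-dimensional and the Hilbert-Schmidt norm is just the absolute value; the test functions are arbitrary and the helper names are mine.

```python
import cmath

def fourier(f):
    """Fourier coefficients on Z/n with respect to uniform probability measure."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * k / n) for x in range(n)) / n
            for k in range(n)]

def convolve(f, g):
    """(f*g)(x) = average over y of f(y) g(x - y)."""
    n = len(f)
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) / n for x in range(n)]

def norm_sq(f):
    """Squared L^2 norm with respect to uniform probability measure."""
    return sum(abs(v) ** 2 for v in f) / len(f)

f = [1, 0, 0, 1, 1, 0, 1, 0]
g = [0, 1, 1, 1, 0, 0, 1, 0]

f_hat, g_hat = fourier(f), fourier(g)
lhs = norm_sq(convolve(f, g))                                     # ||f*g||^2
plancherel = sum(abs(a * b) ** 2 for a, b in zip(f_hat, g_hat))   # sum |f^ g^|^2
bound = max(abs(a) ** 2 for a in f_hat) * norm_sq(g)              # sup |f^|^2 ||g||^2
```

Up to floating-point error, `lhs` agrees with `plancherel` and is at most `bound`. In the nonabelian case the same inequality holds with the extra factors of $\dim V_\xi$ as in the display above.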

3. Quasirandomness

From an additive combinatorics point of view, nonabelian groups obey a structure versus randomness principle: the asymptotic behaviour with respect to linear configurations can usually be described as some combination of abelian and random-like behaviour. Following Gowers, we call a sequence of finite groups $(G_n)$ quasirandom if the least dimension of a nontrivial representation of $G_n$ tends to infinity with $n$. For example for $n\geq 7$ every nontrivial representation of the alternating group $\text{Alt}(n)$ has dimension at least $n-1$, so the sequence $(\text{Alt}(n))$ is quasirandom.

Theorem 5: The ultrafinite group $G_p=\prod_{n\to p} G_n$ has trivial Bohr compactification if and only if the least dimension of a nontrivial representation of $G_n$ tends to infinity as $n\to p$. In particular, $(G_n)$ is quasirandom if and only if $\B_m(G_p)$ is trivial for every $p\in\beta\mathbf{N}\setminus\mathbf{N}$.

First we need a simple lemma.

Lemma 6: Let $G$ and $H$ be groups with $G$ finite and $f:G\to H$ a map which satisfies $f(xy)=f(x)f(y)$ for $1-o(1)$ of the pairs $(x,y)\in G^2$. Then there is a homomorphism $h:G\to H$ such that $f(x)=h(x)$ for $1-o(1)$ of the points $x\in G$.

Proof: For every $x\in G$ and for $1-o(1)$ of the pairs $(y,z)\in G^2$ we have$$f(xyz)f(yz)^{-1} = f(xy)f(z)(f(y)f(z))^{-1} = f(xy)f(y)^{-1},$$so for each $x\in G$ there is a unique $h(x)\in H$ such that$$h(x) = f(xy)f(y)^{-1}$$ for $1-o(1)$ of the points $y\in G$. Clearly $h(x)=f(x)$ for $1-o(1)$ of the points $x\in G$, and for $x_1,x_2\in G$ we have$$h(x_1)h(x_2)^{-1} = f(x_1y)f(y)^{-1}f(y)f(x_2 y)^{-1} = f(x_1y)f(x_2y)^{-1}=h(x_1x_2^{-1})$$for $1-o(1)$ of the points $y\in G$, in particular for at least one $y\in G$, so $h$ is a homomorphism.$\square$
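The proof is effectively a majority-vote decoding procedure, and it can be tried out on a small example. The sketch below assumes $G=H=\mathbf{Z}/101$ written additively, so that $f(xy)f(y)^{-1}$ becomes $f(x+y)-f(y)$; the map $x\mapsto 7x$ and the five corrupted values are arbitrary choices made for illustration.

```python
from collections import Counter

n = 101  # G = H = Z/101, written additively
corrupted = {3: 50, 17: 0, 42: 99, 64: 1, 90: 33}  # arbitrary errors

def f(x):
    """An almost-homomorphism: x -> 7x mod n except at a few corrupted points."""
    return corrupted.get(x, (7 * x) % n)

def h(x):
    """Most popular value of f(x+y) - f(y) over y, as in the proof of Lemma 6."""
    votes = Counter((f((x + y) % n) - f(y)) % n for y in range(n))
    return votes.most_common(1)[0][0]
```

For each $x$ at most ten of the 101 votes involve a corrupted point, so the popular value is always $7x$: here `h` is a genuine homomorphism and differs from `f` only at the five corrupted points.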

Proof of Theorem 5: Suppose we have a sequence of nontrivial homomorphisms $f_n:G_n\to U(d)$ for all $n$ on some neighbourhood of $p$. Then $(f_n)$ induces a measurable homomorphism $f:G_p\to U(d)$, and since $U(d)$ has no small subgroups the induced homomorphism $f:G_p\to U(d)$ will also be nontrivial, so $\B_m(G_p)$ must be nontrivial.

Conversely suppose the least dimension of a nontrivial representation of $G_n$ tends to infinity as $n$ tends to $p$, and let $f:G_p\to U(d)$ be a measurable homomorphism. By a countable saturation argument there is an internal function $g:G_p\to U(d)$, say $g=\st\lim_{n\to p}g_n$, such that $f=g$ almost everywhere. Then $g_n$ satisfies $g_n(xy)=g_n(x)g_n(y)$ for $1-o(1)$ of the pairs $(x,y)\in G_n^2$, so by the lemma there is a homomorphism $h_n:G_n\to U(d)$ such that $g_n(x) = h_n(x)$ for $1-o(1)$ of the points $x\in G_n$. But by assumption any such homomorphism $h_n$ must be trivial for $n$ near enough to $p$, so we must have $g=1$ almost everywhere, so $f=1$ almost everywhere. Moreover since $f(x) = f(xy)f(y)^{-1}$ we must in fact have $f=1$ identically. Since the Peter-Weyl theorem implies that $\B_m(G_p)$ is an inverse limit of matrix groups this implies that $\B_m(G_p)$ is trivial.$\square$

Equations are generally easy to solve in quasirandom groups. We illustrate this point with the following theorem.

Theorem 7: Let $(G_n)$ be quasirandom and let $\epsilon>0$. Then there exists $n_0$ such that if $n\geq n_0$ and $A_n\subset G_n$ has density $|A_n|/|G_n|\geq\epsilon>0$ then we can find $x,y,z\in A_n$ with $xy=z$.

Proof: Let $p\in\beta\mathbf{N}\setminus\mathbf{N}$, let $G=\prod_{n\to p}G_n$, and let $f$ be the internal function $\st\lim_{n\to p} 1_{A_n}$. Then $\int f\,d\mu_G \geq \epsilon$, so if $\pi:G\to\B_m(G)$ is the Bohr compactification then $\int\pi_* f\,d\mu_{\B_m(G)}\geq\epsilon$. But by the previous theorem $\B_m(G)$ is trivial, so $\pi_* f$ is a constant $\geq\epsilon$, so by Theorem 3 we have$$\langle f*f,f\rangle_{L^2(G)} = \langle \pi^*(\pi_*f*\pi_*f),f\rangle_{L^2(G)} = \langle \pi_*f*\pi_*f,\pi_*f\rangle_{L^2(\B_m(G))} \geq \epsilon^3.$$In other words the number of pairs $(x,y)\in G_n^2$ such that $x,y,xy\in A_n$ is at least $(\epsilon^3-o(1))|G_n|^2$ as $n\to p$, but since $p$ was arbitrary this must hold as $n\to\infty$.$\square$

Here is another nice criterion for quasirandomness, which can be found in Gowers's original paper: $(G_n)$ is not quasirandom if and only if the groups $G_n$ have nontrivial abelian quotients or nontrivial small quotients. In our setup we can write this the following way.

Theorem 8: Let $G$ be an ultrafinite group. Then one of the following three alternatives holds:
1. $\B_m(G)$ is trivial.
2. $\B_m(G)$ has a nontrivial abelian quotient.
3. $\B_m(G)$ has a nontrivial finite quotient.

Proof: Suppose $G=\prod_{n\to p} G_n$. By the previous theorem if $\B_m(G)$ is nontrivial then the groups $G_n$ have bounded-dimensional nontrivial representations $\pi_n:G_n\to U(d)$ as $n\to p$. By Jordan's theorem, $\pi_n(G_n)$ has a normal abelian subgroup $A_n$ of bounded index. If $A_n=\pi_n(G_n)$ as $n\to p$ then 2 holds, while if $A_n<\pi_n(G_n)$ as $n\to p$ then 3 holds.$\square$

Using this theorem one can prove a sort of converse to Theorem 7. If $(G_n)$ is not quasirandom then there are arbitrarily large $n$ and product-free subsets $A_n\subset G_n$ of density bounded away from $0$. We leave the details to the reader.
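To give the flavour of such a construction, here is one concrete instance, which is my own choice of example rather than the general argument: the symmetric groups are not quasirandom because of the sign homomorphism, and the odd permutations form a product-free set of density $1/2$. The sketch below checks this by brute force in $\text{Sym}(4)$.

```python
from itertools import permutations

def mul(p, q):
    """Product pq, meaning: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

def is_odd(p):
    """Parity of a permutation, computed from its inversion count."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 1

G = list(permutations(range(4)))
A = {p for p in G if is_odd(p)}  # pullback of {1} under Sym(4) -> Z/2

density = len(A) / len(G)
# Number of pairs (x, y) in A^2 with xy in A as well.
product_triples = sum(1 for x in A for y in A if mul(x, y) in A)
```

Here `density` is one half and `product_triples` is zero, since the product of two odd permutations is even. The same idea, pulling back a product-free set from an abelian or small quotient, is what one would expect to use in general.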

4. Roth's theorem

As an application of group limits we can prove the following version of Roth's theorem.

Theorem 9: Let $G$ be a finite group on which the squaring map $s:y\mapsto y^2$ is $O(1)$-to-$1$. Let $A\subset G$ be a subset of density $|A|/|G|\geq\epsilon>0$. Then there are $\gtrsim_\epsilon |G|^2$ solutions to $y^2=xz$ in $A$. Equivalently, there are $\gtrsim_\epsilon|G|^2$ pairs $(a,b)\in G^2$ such that $a,ab,bab\in A$.

Proof: If the theorem fails then we have finite groups $G_n$, some $\epsilon>0$, and subsets $A_n\subset G_n$ of density $|A_n|/|G_n|\geq\epsilon$ for which there are fewer than $|G_n|^2/n$ pairs $(a,b)\in G_n^2$ such that $a,ab,bab\in A_n$. Let $G=\prod_{n\to p} G_n$ and $f = \st\lim_{n\to p} 1_{A_n}$. Then $\int_G f \,d\mu_G \geq \epsilon$ but$$\int_{G^2} f(a)f(ab)f(bab)\,d\mu_G^2 = 0.\qquad(*)$$But note$$\int_{G^2} f(a)f(ab)f(bab)\,d\mu_G^2 = \int_{G^2} f(x) f(y) f(x^{-1}y^2)\,d\mu_G^2= \langle f,(f*f)\circ s\rangle_{L^2(G)},$$and by Theorem 3 this becomes$$\langle f,\pi^*(\pi_*f*\pi_*f)\circ s\rangle_{L^2(G)} = \langle f, \pi^*((\pi_*f*\pi_*f)\circ s)\rangle_{L^2(G)}$$$$= \langle \pi_*f,(\pi_*f*\pi_*f)\circ s\rangle_{L^2(B)} = \int_{B^2} \pi_*f(a)\,\pi_*f(ab)\,\pi_*f(bab)\,d\mu_B^2,\qquad(**)$$where $B=\B_m(G)$.

Fix a small $\delta>0$, and let $g:B\to[0,1]$ be a continuous function such that $\|\pi_*f-g\|_{L^1(B)} \leq \delta$. Since a continuous function on a compact space is uniformly continuous we can find a neighbourhood $U$ of $1$ such that$$|g(x)-g(ux)|\leq \delta\quad\text{and}\quad|g(x)-g(xu)|\leq\delta$$ for all $x\in B$ and $u\in U$. Now note$$\int_B\int_{xU\times U} |\pi_*f(a)-g(a)| \,d\mu_B(a)d\mu_B(b)d\mu_B(x) = \mu_B(U)^2\|\pi_*f-g\|_{L^1(B)} \leq \mu_B(U)^2\delta,$$$$\int_B\int_{xU\times U} |\pi_*f(ab)-g(ab)| \,d\mu_B(a)d\mu_B(b)d\mu_B(x) = \mu_B(U)^2\|\pi_*f-g\|_{L^1(B)} \leq \mu_B(U)^2\delta,$$$$\int_B\int_{xU\times U} |\pi_*f(bab)-g(bab)| \,d\mu_B(a)d\mu_B(b)d\mu_B(x) = \mu_B(U)^2\|\pi_*f-g\|_{L^1(B)} \leq \mu_B(U)^2\delta,$$so if $R$ is the set of $x\in B$ such that$$\int_{xU\times U}|\pi_*f(a)-g(a)|\,d\mu_B(a)d\mu_B(b) \leq \delta^{1/2}\mu_B(U)^2,$$$$\int_{xU\times U}|\pi_*f(ab)-g(ab)|\,d\mu_B(a)d\mu_B(b) \leq \delta^{1/2}\mu_B(U)^2,$$$$\int_{xU\times U}|\pi_*f(bab)-g(bab)|\,d\mu_B(a)d\mu_B(b) \leq \delta^{1/2}\mu_B(U)^2,$$then $\mu_B(R^c)\leq 3\delta^{1/2}$. Thus$$\int_R g(x)\,d\mu_B(x) \geq \int_B g(x)\,d\mu_B(x) - 3\delta^{1/2}\geq \epsilon-4\delta^{1/2},$$
so there is some $x\in R$ such that$$g(x)\geq\epsilon-4\delta^{1/2}.$$Now using all our information about $x$ we can bound $(**)$ below:$$\int \pi_*f(a)\pi_*f(ab)\pi_*f(bab) \,d\mu_B(a)d\mu_B(b)\geq \int_{xU\times U} \pi_*f(a)\pi_*f(ab)\pi_*f(bab)\,d\mu_B(a)d\mu_B(b)$$$$\geq \int_{xU\times U} g(a)g(ab)g(bab)\,d\mu_B(a)d\mu_B(b) - 3\delta^{1/2}\mu_B(U)^2$$$$\geq \int_{xU\times U} g(x)^3\,d\mu_B(a)d\mu_B(b) - 6\delta\mu_B(U)^2 - 3\delta^{1/2}\mu_B(U)^2$$$$\geq\left( (\epsilon-4\delta^{1/2})^3 - 6\delta - 3\delta^{1/2}\right)\mu_B(U)^2.$$Now if $\delta$ is sufficiently small depending on $\epsilon$ this figure is positive, contradicting $(*)$.$\square$

We chose to count configurations of the form $(a,ab,bab)$ precisely because they are alternatively described by the rather simple equation $y^2=xz$. If instead we chose to count the more "obvious" nonabelian analogues of three-term arithmetic progressions, namely configurations of the form $(a,ab,ab^2)$, then we would be counting solutions to the more complicated equation $z = yx^{-1}y$. The problem is that the count of these configurations is not obviously controlled by convolutions, so we can't easily transport the problem to the Bohr compactification. In fact the situation is delicate and not completely understood: see for example this paper of Tao for the case of $\text{SL}_d(F)$.
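The two algebraic identities used here are easy to confirm by brute force. The following sketch, with helper names of my own choosing, runs over all pairs $(a,b)$ in $\text{Sym}(4)$ and checks that $(x,y,z)=(a,ab,bab)$ satisfies $y^2=xz$ while $(x,y,z)=(a,ab,ab^2)$ satisfies $z=yx^{-1}y$.

```python
from itertools import permutations

def mul(*ps):
    """Left-to-right product p1 p2 ... pk, where (pq)(i) = p(q(i))."""
    result = ps[0]
    for q in ps[1:]:
        result = tuple(result[q[i]] for i in range(len(q)))
    return result

def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

G = list(permutations(range(4)))

def check_bab(a, b):
    """Does (x, y, z) = (a, ab, bab) satisfy y^2 = xz?"""
    x, y, z = a, mul(a, b), mul(b, a, b)
    return mul(y, y) == mul(x, z)

def check_ab2(a, b):
    """Does (x, y, z) = (a, ab, ab^2) satisfy z = y x^{-1} y?"""
    x, y, z = a, mul(a, b), mul(a, b, b)
    return z == mul(y, inv(x), y)

bab_ok = all(check_bab(a, b) for a in G for b in G)
ab2_ok = all(check_ab2(a, b) for a in G for b in G)
```

Both flags come out true. Of course a check in one small group proves nothing, but the identities $abab=a\cdot bab$ and $ab\cdot a^{-1}\cdot ab=ab^2$ hold in any group.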

Monday, 9 February 2015

Group limits

Friday, 6 February 2015

Commuting probability of compact groups