
Thursday, 29 January 2015

Leon Green's theorem

The fundamental objects of study in higher-order Fourier analysis are nilmanifolds, or in other words spaces given as a quotient $G/\Gamma$ of a connected nilpotent Lie group $G$ by a discrete cocompact subgroup $\Gamma$. Starting with Furstenberg's work on Szemeredi's theorem and the multiple recurrence theorem, work by Host and Kra, Green and Tao, and several others has gradually established that nilmanifolds control higher-order linear configurations in the same way that the circle, as in the Hardy-Littlewood circle method, controls first-order linear configurations.

Of basic importance in the study of nilmanifolds is equidistribution: one needs to know when the sequence $g^n x$ equidistributes and when it is trapped inside a subnilmanifold. It turns out that this problem was already studied by Leon Green in the 60s. To describe the theorem, note first that the abelianisation map $G\to G/G_2$ induces a map from $G/\Gamma$ to a torus $G/(G_2\Gamma)$ which respects the action of $G$, and recall that equidistribution on tori is well understood by Weyl's criterion. Leon Green's beautiful theorem then states that $g^n x$ equidistributes in the nilmanifold if and only if its image in the torus $G/(G_2\Gamma)$ equidistributes.

Today at our miniseminar Aled Walker showed us Parry's nice proof of this theorem, which is more elementary than Green's original proof. During the talk there was some discussion about the importance of various hypotheses such as 'simply connected' and 'Lie'. It turns out that the proof works rather generally for connected locally compact nilpotent groups, so I thought I would record the proof here with minimal hypotheses. The meat of the argument is exactly as in Aled's talk and, presumably, Parry's paper.

Let $G$ be an arbitrary locally compact connected nilpotent group, say with lower central series$$G=G_1\geq G_2 \geq\cdots\geq G_s\geq G_{s+1}=1,$$and let $\Gamma\leq G$ be a closed cocompact subgroup. Under these conditions the Haar measure $\mu_G$ of $G$ induces a $G$-invariant probability measure $\mu_{G/\Gamma}$ on $G/\Gamma$. We say that $x_n\in G/\Gamma$ is equidistributed if for every $f\in C(G/\Gamma)$ we have$$\frac{1}{N}\sum_{n=0}^{N-1} f(x_n) \to \int f \,d\mu_{G/\Gamma}.$$We fix our attention on the sequence$$x_n = g^n x$$for some $g\in G$ and $x\in G/\Gamma$. As before we have an abelianisation map$$\pi: G/\Gamma\to G/G_2\Gamma$$from the $G$-space $G/\Gamma$ to the compact abelian group $G/G_2\Gamma$. We define equidistribution on $G/G_2\Gamma$ similarly. The theorem is then the following.

Theorem: (Leon Green) For $g\in G$ and $x\in G/\Gamma$ the following are equivalent.
1. The sequence $g^n x$ is equidistributed in $G/\Gamma$.
2. The sequence $\pi(g^n x)$ is equidistributed in $G/G_2\Gamma$.
3. The orbit of $\pi(g)$ is dense in $G/G_2\Gamma$.
4. $\chi(\pi(g))\neq 0$ for every nontrivial character $\chi:G/G_2\Gamma\to\mathbf{R}/\mathbf{Z}$.

Item 1 above trivially implies every other item. The implication 4$\implies$3 (a generalised Kronecker theorem) follows by pulling back any nontrivial character of $(G/G_2\Gamma)/\overline{\langle\pi(g)\rangle}$. The implication 3$\implies$2 (a generalised Weyl theorem) follows from the observation that every weak* limit point of the sequence of measures$$ \frac{1}{N}\sum_{n=0}^{N-1} \delta_{\pi(g^n x)}$$must be shift-invariant and thus equal to the Haar measure. So the interesting content of the theorem is 2$\implies$1.
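To make this concrete, here is a rough numerical sanity check (purely illustrative, and no substitute for the proof below) of the implication 2$\implies$1 in the simplest nontrivial case, the Heisenberg nilmanifold $H(\mathbf{R})/H(\mathbf{Z})$. The parameters $\alpha=\sqrt{2}-1$, $\beta=\sqrt{3}-1$, $\gamma=0$ are arbitrary choices making $1,\alpha,\beta$ rationally independent, so that condition 4 holds; the sketch tracks a fundamental-domain representative of $g^n\Gamma$ in $[0,1)^3$ and compares Birkhoff averages of a few test functions with their Haar integrals. (The test in the central coordinate uses only a Riemann-integrable function on $G/\Gamma$, so this is a heuristic check rather than a literal test of the definition.)

```python
import math

# g = [[1, alpha, gamma], [0, 1, beta], [0, 0, 1]]; the values below are
# illustrative choices with 1, alpha, beta rationally independent.
alpha, beta, gamma = math.sqrt(2) - 1, math.sqrt(3) - 1, 0.0

def step(x, y, z):
    # left multiplication by g: (x, y, z) -> (x + alpha, y + beta, z + gamma + alpha*y)
    z = z + gamma + alpha * y
    x = x + alpha
    y = y + beta
    # reduce into the fundamental domain [0,1)^3 by right multiplication
    # by an element of Gamma = H(Z)
    b = -math.floor(y)
    z = z + x * b
    y = y + b
    x = x - math.floor(x)
    z = z - math.floor(z)
    return x, y, z

N = 200000
x = y = z = 0.0          # start at the identity coset
sx = sy = sz = 0.0
for _ in range(N):
    sx += math.cos(2 * math.pi * x)
    sy += math.cos(2 * math.pi * y)
    sz += math.cos(2 * math.pi * z)
    x, y, z = step(x, y, z)

# cos(2 pi x) and cos(2 pi y) only see the abelianisation; cos(2 pi z) depends on
# the fibre coordinate of the fundamental-domain representative.  If the orbit
# equidistributes, all three averages should be close to 0, the Haar integral.
print(sx / N, sy / N, sz / N)
```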

A word about the relation to ergodicity: By the ergodic theorem the left shift $\tau_g:x\mapsto gx$ is ergodic if and only if for almost every $x$ the sequence $g^n x$ equidistributes; on the other hand $\tau_g$ is uniquely ergodic, i.e., the only $\tau_g$-invariant measure is the given one, if and only if for every $x$ the sequence $g^n x$ equidistributes. Thus to prove the theorem above we must not only prove that $\tau_g$ is ergodic but that it is uniquely ergodic. Fortunately one can prove these two properties are equivalent in this case.
Lemma: If $\tau_g:G/\Gamma\to G/\Gamma$ is ergodic then it's uniquely ergodic.
Proof: (Furstenberg) By the ergodic theorem the set $A$ of $\mu_{G/\Gamma}$-generic points, in other words points $x$ for which$$\frac{1}{N}\sum_{n=0}^{N-1} f(g^n x) \to \int f \,d\mu_{G/\Gamma}$$for every $f\in C(G/\Gamma)$, has $\mu_{G/\Gamma}$-measure $1$, and clearly if $x\in A$ and $c\in G_s$ then $xc\in A$, so $A = p^{-1}(p(A))$, where $p$ is the projection of $G/\Gamma$ onto $G/G_s\Gamma$.

Now let $\mu'$ be any $\tau_g$-invariant ergodic measure. By induction we may assume that $\tau_g:G/G_s\Gamma\to G/G_s\Gamma$ is uniquely ergodic, so we must have $p_*\mu' = p_*\mu_{G/\Gamma}$, so$$\mu'(A) = p_*\mu'(p(A)) = p_*\mu_{G/\Gamma}(p(A)) = \mu_{G/\Gamma}(A) = 1.$$But by the ergodic theorem the set of $\mu'$-generic points must also have $\mu'$-measure $1$, so there must be some point which is both $\mu_{G/\Gamma}$- and $\mu'$-generic, and this implies that $\mu'=\mu_{G/\Gamma}$.$\square$

We need one more preliminary lemma about topological groups before we really get started on the proof.
Lemma: If $H$ and $K$ are connected subgroups of some ambient topological group then $[H,K]$ is also connected.
Proof: Since $(h,k)\mapsto [h,k]=h^{-1}k^{-1}hk$ is continuous certainly $C = \{[h,k]:h\in H,k\in K\}$ is connected, so $C^n = CC\cdots C$ is also connected, so because $1\in C^n$ for all $n$ we see that $[H,K]=\bigcup_{n=1}^\infty C^n$ is connected.$\square$

Thus if $G$ is connected then every term $G_1,G_2,G_3,\dots$ in the lower central series of $G$ is connected.

Proof of Leon Green's theorem: As noted it suffices to prove that $\tau_g$ acts ergodically on $G/\Gamma$ whenever it acts ergodically on $G/G_2\Gamma$. By induction we may assume that $\tau_g$ acts ergodically on $G/G_s\Gamma$. So suppose that $f\in L^2(G/\Gamma)$ is $\tau_g$-invariant. By decomposing $L^2(G/\Gamma)$ as a $\overline{G_s\Gamma}/\Gamma$-space we may assume that $f$ obeys$$f(cx)=\gamma(c)f(x)\quad(c\in G_s, x\in G/\Gamma)$$for some character $\gamma:G_s\Gamma/\Gamma\to S^1$. In particular $|f|$ is both $G_s$-invariant and $\tau_g$-invariant, so it factors through a $\tau_g$-invariant function $G/G_s\Gamma\to\mathbf{R}$, so it must be constant, say $1$. Moreover for every $b\in G_{s-1}$ the function$$\Delta_bf(x) = f(bx)\overline{f(x)}$$is $G_s$-invariant, and also a $\tau_g$ eigenvector:$$\Delta_bf(gx) = \gamma([b,g])\Delta_bf(x).$$By integrating this equation we find that either $\gamma([b,g])=1$, so $\Delta_bf$ is constant, or $\int\Delta_bf \,d\mu_{G/\Gamma}= 0$, so either way we have$$\int \Delta_bf\,d\mu_{G/\Gamma}\in \{0\}\cup S^1.$$But since $\int\Delta_bf\,d\mu_{G/\Gamma}$ is a continuous function of $b$ and equal to $1$ when $b=1$ we must have $\gamma([b,g])=1$ for all sufficiently small $b$, and thus for all $b$ by connectedness of $G_{s-1}$ and the identity$$[b_1b_2,g]=[b_1,g][b_2,g].$$Thus setting $\gamma(b)=\Delta_bf$ extends $\gamma$ to a homomorphism $G_{s-1}\to S^1$. In fact we can extend $\gamma$ still further to a function $G\to D_1$, where $D_1$ is the unit disc in $\mathbf{C}$, by setting$$\gamma(a) = \int \Delta_af\,d\mu_{G/\Gamma}.$$Now if $a\in G$ and $b\in G_{s-1}$ then$$\gamma(ba) = \int f(bax) \overline{f(x)}\,d\mu_{G/\Gamma} = \int \gamma(b)f(ax)\overline{f(x)}\,d\mu_{G/\Gamma}=\gamma(b)\gamma(a),$$and$$\gamma(ab) = \int f(abx)\overline{f(x)}\,d\mu_{G/\Gamma} = \int f(ax) \overline{f(b^{-1}x)}\,d\mu_{G/\Gamma} = \int f(ax) \overline{\gamma(b^{-1})}\overline{f(x)}\,d\mu_{G/\Gamma} = \gamma(b)\gamma(a),$$so$$\gamma(b)\gamma(a)=\gamma(ab)=\gamma(ba[a,b]) = \gamma(ba)\gamma([a,b])=\gamma(b)\gamma(a)\gamma([a,b]).$$Since $|\gamma(b)|=1$ we can cancel $\gamma(b)$, so$$\gamma(a)(\gamma([a,b])-1) = 0.$$Finally observe that $\gamma(a)$ is a continuous function of $a$, and $\gamma(1)=1$, so we must have $\gamma([a,b])=1$ for all sufficiently small $a$, and thus by connectedness of $G$ and the identity$$[a_1a_2,b]=[a_1,b][a_2,b]$$we must have $\gamma([a,b])=1$ identically. But this implies that $\gamma$ vanishes on all $s$-term commutators and thus on all of $G_s$, so in fact $f$ factors through $G/G_s\Gamma$, so it must be constant.$\square$

A remark is in order about the possibility that some of the groups $G_i$ and $G_i\Gamma$ are not closed. This should not matter. One could either read the above proof as it is written, noting carefully that I never said groups should be Hausdorff, or, what amounts to much the same thing, modify it so that whenever you quotient by a group $H$ you instead quotient by its closure $\overline{H}$.

Embarrassingly, it's difficult to come up with a non-Lie group to which this generalised Leon Green's theorem applies. It seems that many natural candidates have the property that $G$ is not connected but $G/\Gamma$ is: for example consider$$\left(\begin{array}{ccc}1&\mathbf{R}\times\mathbf{Q}_2&\mathbf{R}\times\mathbf{Q}_2\\0&1&\mathbf{R}\times\mathbf{Q}_2\\0&0&1\end{array}\right)/\left(\begin{array}{ccc}1&\mathbf{Z}[1/2]&\mathbf{Z}[1/2]\\0&1&\mathbf{Z}[1/2]\\0&0&1\end{array}\right).$$So it would be interesting to know whether the theorem extends to such a case. Or perhaps there are no interesting non-Lie groups for this theorem, which would be a bit of a let down.

Tuesday, 1 April 2014

Erdős-Turán statistical group theory

What is Erdős-Turán "statistical group theory"? Erdős and Turán published a series of seven papers with this title, from 1965 to 1972, in which they proved many beautiful statistical or counting results about permutations. For example, if $|\sigma|$ denotes the order of a permutation $\sigma\in S_n$, they showed that when $\sigma$ is chosen uniformly at random $\log|\sigma|$ is approximately normally distributed, with mean $(\log n)^2/2$ and variance $(\log n)^3/3$. Another typical result, though much simpler, is that if $A$ is any subset of $\{1,\dots,n\}$, then the probability that a random permutation contains no cycle with length in $A$ is at most $2\left(\sum_{a\in A} 1/a\right)^{-1}$.
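Neither result is hard to probe experimentally. As a quick illustration (not taken from the papers), here is a Monte Carlo check of the second statement; the choices $n=1000$, $A=\{1,\dots,10\}$ and the number of trials are arbitrary. The empirical frequency should land comfortably below the bound $2\left(\sum_{a\in A}1/a\right)^{-1}$ (heuristically it is close to $\exp(-\sum_{a\in A}1/a)$).

```python
import random

def cycle_lengths(n, rng):
    # cycle type of a uniformly random permutation of {0, ..., n-1}
    perm = list(range(n))
    rng.shuffle(perm)
    seen, lengths = [False] * n, []
    for i in range(n):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

rng = random.Random(0)
n, A, trials = 1000, set(range(1, 11)), 2000
avoided = sum(1 for _ in range(trials)
              if not any(l in A for l in cycle_lengths(n, rng)))
bound = 2 / sum(1 / a for a in A)
print(avoided / trials, "<=", bound)
```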

Although most of their results are of the above approximate nature, they prove at least one beautiful exact counting result, and I thought I might relate it here.

Theorem: If $q$ is a prime power then the proportion of $\sigma\in S_n$ with order not divisible by $q$ is exactly
\[\left(1-\frac{1}{q}\right)\left(1-\frac{1}{2q}\right)\cdots\left(1-\frac{1}{\lfloor n/q\rfloor q}\right).\]

Proof: The number of $\sigma\in S_n$ with $m_1$ cycles of length $v_1$, $m_2$ cycles of length $v_2$, $\dots$, and $m_k$ cycles of length $v_k$, where $v_1 < v_2 < \cdots < v_k$ and $m_1 v_1 + \cdots + m_k v_k = n$, is
\[\frac{n!}{m_1!\cdots m_k! v_1^{m_1}\cdots v_k^{m_k}}:\]
indeed, one can partition $\{1,\dots,n\}$ into $m_i$ sets of size $v_i$ for each $i$ in
\[\frac{n!}{m_1!\cdots m_k! v_1!^{m_1}\cdots v_k!^{m_k}}\]
ways, and then one can arrange each of the $m_i$ sets of size $v_i$ for each $i$ into cycles in
\[(v_1 - 1)!^{m_1} \cdots (v_k - 1)!^{m_k}\]
ways. Moreover, the order of every such $\sigma$ is $\text{lcm}(v_1,\dots,v_k)$, so the order of $\sigma$ is divisible by $q$ if and only if some $v_i$ is divisible by $q$. (This is where we use the hypothesis that $q$ is a prime power.) Thus the proportion of $\sigma\in S_n$ with order not divisible by $q$ is
\[\sum\frac{1}{m_1!\cdots m_k! v_1^{m_1} \cdots v_k^{m_k}},\]
where the sum runs over all $k\geq 0$ and $2k$-tuples $(m_1,\dots,m_k,v_1,\dots,v_k)$ of positive integers such that $v_1<\cdots<v_k$, $m_1 v_1 + \cdots + m_k v_k = n$, and no $v_i$ is divisible by $q$. But this is just the coefficient of $X^n$ in
\[\begin{aligned}
\prod_{v:\,q\nmid v} \sum_{m\geq 0} \frac{X^{mv}}{m!\, v^m}
&=\prod_{v:\,q\nmid v} \exp\left(\frac{X^v}{v}\right)
= \exp\left(\sum_{v:\,q\nmid v} \frac{X^v}{v}\right)
= \exp\left(\sum_v \frac{X^v}{v} - \sum_v \frac{X^{qv}}{qv}\right)\\
&= \exp\left(-\log(1-X) + \tfrac{1}{q}\log(1-X^q)\right)
= \frac{(1-X^q)^{1/q}}{1-X}
= (1 + X + \cdots + X^{q-1})\, (1-X^q)^{-\frac{q-1}{q}}\\
&= (1+X+\cdots+X^{q-1}) \left(1 + \left(1-\frac{1}{q}\right) X^q + \left(1-\frac{1}{q}\right)\left(1-\frac{1}{2q}\right)X^{2q} + \cdots\right),
\end{aligned}\]
where the last equality follows from the binomial formula. The coefficient of $X^n$ in this last product picks up $X^{n-q\lfloor n/q\rfloor}$ from the first factor and $X^{q\lfloor n/q\rfloor}$ from the second, so it is exactly the product in the statement of the theorem. This completes the proof.
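Since the formula is exact, it can also be verified directly by brute force over $S_n$ for small $n$; here is a quick sketch (the test values of $n$ and $q$ below are arbitrary).

```python
from itertools import permutations
from math import factorial, lcm, prod

def cycle_lengths(perm):
    # cycle type of a permutation of {0, ..., n-1} given in one-line notation
    seen, lengths = set(), []
    for i in range(len(perm)):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

def proportion_vs_formula(n, q):
    # proportion of permutations in S_n with order not divisible by q,
    # compared against the product in the theorem
    count = sum(1 for p in permutations(range(n))
                if lcm(*cycle_lengths(p)) % q != 0)
    formula = prod(1 - 1 / (k * q) for k in range(1, n // q + 1))
    return count / factorial(n), formula

print(proportion_vs_formula(7, 2))   # the two numbers should agree
print(proportion_vs_formula(8, 3))
```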

Such an explicit formula should make us feel foolish for having used generating functions. Is there a direct combinatorial proof?

By the way, for those who are not already aware of this invaluable resource, almost all of Erdős's papers have been made freely available by the Rényi institute here: http://www.renyi.hu/~p_erdos/Erdos.html.

Saturday, 24 March 2012

The probability of coprimality

A famous result: Two positive integers chosen at random will be coprime with probability $6/\pi^2$.

One has to say what one means by this, since there is no uniform distribution on the positive integers. What we mean is that if we choose $x$ and $y$ uniformly and independently at random from the integers $\{1,\ldots,N\}$, then $x$ and $y$ will be coprime with probability tending to $6/\pi^2$ as $N\to\infty$.

Here is one classic proof: Consider the points $(x,y)\in\{1,\ldots,N\}^2$ such that $\gcd(x,y)=1$. If we scale this picture by $M$, we will have exactly the points $(X,Y)\in\{1,\ldots,MN\}^2$ such that $\gcd(X,Y)=M$. Therefore, whatever the limiting probability $P$ of being coprime is, the probability of having $\gcd = M$ is exactly $P/M^2$. The law of total probability then implies
\[ 1 = \lim_{N\to\infty}\mathbf{P}(\gcd(x,y)\in\mathbf{N}) = P \sum_{M=1}^\infty \frac{1}{M^2} = P\,\zeta(2).\]
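For the sceptical, here is a tiny numerical illustration (no part of the argument): counting coprime pairs up to a cutoff and comparing with $6/\pi^2$. The cutoff $N=2000$ is an arbitrary choice.

```python
from math import gcd, pi

N = 2000
# count pairs (x, y) in {1, ..., N}^2 with gcd(x, y) = 1
coprime = sum(1 for x in range(1, N + 1)
                for y in range(1, N + 1) if gcd(x, y) == 1)
print(coprime / N**2, 6 / pi**2)   # the two numbers should be close for large N
```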

Another way of imagining a random element of a noncompact group $G$ is to look at the profinite completion $\hat{G}$, which is always compact and so always possesses a uniform distribution. Now by the Chinese Remainder Theorem,
\[ \hat{\mathbf{Z}} = \prod_p \mathbf{Z}_p,\]
so choosing an element at random from $\hat{\mathbf{Z}}$ is the same as choosing an element at random from $\mathbf{Z}_p$ for each $p$. The probability that a random element of $\mathbf{Z}_p$ is divisible by $p$ is precisely $1/p$, so the probability that two elements of $\hat{\mathbf{Z}}$ are coprime is precisely
\[ \prod_p (1-1/p^2) = 1/\zeta(2).\]
This does not imply the asymptotic result in $\mathbf{Z}$ mentioned above, but it is a nice way of thinking about it.


EDIT (28/4/12): As pointed out to me by Freddie Manners, the "classic proof" suggested above suffers from the fatal defect of not proving that the limit exists. Here is an alternative proof which does prove that the limit exists. More generally, let $d>1$ and consider positive integers $x_1,\ldots,x_d\leq X$ chosen uniformly at random. Let $G(X)$ be the number of $(x_1,\ldots,x_d)\in[1,X]^d$ such that $\gcd(x_1,\ldots,x_d)=1$, and observe that the number of $(x_1,\ldots,x_d)\in[1,X]^d$ such that $\gcd(x_1,\ldots,x_d) = n$ is precisely $G(X/n)$. Then
\[ \lfloor X\rfloor^d = \sum_{n\geq1} G(X/n) \]
for all $X$. Inverting this relationship,
\[ G(X) = \sum_{n\geq1} \mu(n) \lfloor X/n\rfloor^d. \]
Thus, the probability that $d$ randomly and uniformly chosen integers $x_1,\ldots,x_d\leq X$ are all coprime is exactly
\[P(X) = \sum_{n\geq1} \mu(n) \left(\frac{\lfloor X/n\rfloor}{\lfloor X\rfloor}\right)^d. \]
The $n$th term here is bounded by $n^{-d}$, so by the dominated convergence theorem $P(X)\to\sum_{n\geq1} \mu(n) n^{-d} = \zeta(d)^{-1}$.
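The inversion formula is also easy to check by machine for small parameters. Here is a short sketch (illustrative only; the values of $X$ and $d$ are arbitrary) comparing $\sum_{n\geq1}\mu(n)\lfloor X/n\rfloor^d$ with a direct count of coprime $d$-tuples.

```python
from functools import reduce
from itertools import product
from math import gcd

def mobius_upto(n):
    # linear sieve for the Moebius function mu(1), ..., mu(n)
    mu = [1] * (n + 1)
    is_comp = [False] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if not is_comp[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            is_comp[i * p] = True
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu

def G_direct(X, d):
    # brute-force count of d-tuples in [1, X]^d with gcd 1
    return sum(1 for t in product(range(1, X + 1), repeat=d)
               if reduce(gcd, t) == 1)

def G_mobius(X, d):
    # the Moebius-inversion formula: terms with n > X vanish
    mu = mobius_upto(X)
    return sum(mu[n] * (X // n) ** d for n in range(1, X + 1))

X, d = 30, 3
print(G_direct(X, d), G_mobius(X, d))   # the two counts should be equal
```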

Tuesday, 3 January 2012

Deriving the normal distribution


When I was in high school, I asked several people, "Why the normal distribution?" After all, the function $e^{-x^2/2}/\sqrt{2\pi}$ looks like a pretty bizarre function to guess, when there are other functions like $1/(1+x^2)$ which produce perfectly fine bell-shaped curves. One answer I received was that the normal distribution is in some sense the limit of the binomial distribution. While this answer seems fair enough, I tried my hand at the mathematics and did not succeed, so I was still confounded. Nonetheless I believed the answer, and it satisfied me for the time.

The real answer that I was looking for but did not appreciate until university was the central limit theorem. For me, the central limit theorem is the explanation of the normal distribution. In any case, the calculation that I attempted was basically a verification of the central limit theorem in a simple case, and it is a testament to the force of the central limit theorem that even this simple case is difficult to work out by hand.

In this post, I carry out the calculation that I should have managed in high school (with the benefit of hindsight supplying the correct normalisation $\sqrt{n}$ rather than $n$). Along the way, I will also check both laws of large numbers in this simple case.

Consider a "random walk" on $\bf Z$. Specifically, let $X$ be a random variable taking the values $1$ and $-1$ each with probability $1/2$, and let
\[
S_n = X_1+X_2+\ldots+X_n,
\]
where each $X_i$ is an independent copy of $X$. Note that $(X+1)/2$ is Bernoulli, so $S_n/2 + n/2$ is binomial, so
\[
{\bf P}(S_n = k) = {\bf P}(S_n/2 + n/2 = k/2 + n/2) = \left(\begin{array}{c}n\\ k/2+n/2\end{array}\right) 2^{-n},
\]
where we make the convention that the binomial coefficient is zero when it doesn't make sense. Hence, if $x>0$,
\[
{\bf P}(S_n/n \geq x) = \sum_{k\geq xn} {\bf P}(S_n = k) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n}.
\]
By Stirling's formula, $m! \sim \sqrt{2\pi m} (m/e)^m$, so
\[
\left(\begin{array}{c}m\\k\end{array}\right) \sim \sqrt{\frac{m}{2\pi k(m-k)}} \frac{m^m}{k^k (m-k)^{m-k}}
\]
as $m,k\rightarrow\infty$. Hence, for constant $x>0$,
\[
{\bf P}(S_n/n \geq x) \leq \frac{n}{2} \left(\begin{array}{c} n\\\lceil (1+x)n/2\rceil \end{array}\right) 2^{-n} \sim \frac{\sqrt{n}}{\sqrt{2\pi(1-x^2)}} \left((1+x)^{1+x} (1-x)^{1-x}\right)^{-n/2}.
\]

Let $f(x) = (1+x)^{1+x} (1-x)^{1-x}$. Note that $f(x)>0$ for $0<x<1$, that $f(0)=1$, and that
\[
\frac{f^\prime(x)}{f(x)} = (\log f(x))^\prime = \log(1+x) + 1 - \log(1-x) - 1 = \log\left(\frac{1+x}{1-x}\right) > 0,
\]
so $f(x)>1$ for all $0<x<1$. Hence
\[
{\bf P}(S_n/n\geq x) \rightarrow 0,
\]
and by symmetry,
\[
{\bf P}(S_n/n\leq -x) \rightarrow 0,
\]
as $n\rightarrow\infty$. Thus we have verified the weak law.

In fact, because the convergence above is geometric,
\[
\sum_{n=1}^\infty {\bf P}(S_n/n\geq x) < \infty,
\]
whence
\[
{\bf P}(S_n/n\geq x\text{ infinitely often}) = \lim_{m\rightarrow\infty} {\bf P}(S_n/n\geq x\text{ for some }n\geq m)  \leq \lim_{m\rightarrow\infty} \sum_{n=m}^\infty {\bf P}(S_n/n\geq x) =0.
\]
(This is the Borel-Cantelli lemma.) Thus $S_n/n\rightarrow 0$ almost surely. Thus we have verified the strong law.

It remains to check the central limit theorem, i.e., to investigate the limiting distribution of $S_n/\sqrt{n}$. Now, for constant $x<y$,
\[
{\bf P}(x\leq S_n/\sqrt{n} \leq y) = \int_x^y\frac{\sqrt{n}}{2} \left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \,d\mu_n(t),
\]
where $\mu_n$ is the atomic measure assigning a mass $2/\sqrt{n}$ to each point of $(n+2{\bf Z})/\sqrt{n}$. Now Stirling's formula strikes again, giving
\[
\frac{\sqrt{n}}{2}\left(\begin{array}{c} n\\ (1+t/\sqrt{n})n/2\end{array}\right) 2^{-n} \sim \frac{1}{\sqrt{2\pi(1-t^2/n)}} \left((1+t/\sqrt{n})^{1+ t/\sqrt{n}} (1- t/\sqrt{n})^{1- t/\sqrt{n}}\right)^{-n/2} \longrightarrow \frac{1}{\sqrt{2\pi}} e^{-t^2/2},
\]
as
\[
\lim_{m\rightarrow\infty} \left((1+t/m)^{1+t/m} (1-t/m)^{1-t/m}\right)^{m^2/t^2} = \lim_{z\rightarrow\pm\infty} (1-1/z^2)^{z^2} (1+1/z)^z (1-1/z)^{-z} = e.
\]
Hence
\[
{\bf P}(x\leq S_n/\sqrt{n} \leq y) = \frac{1}{\sqrt{2\pi}} \int_x^y e^{-t^2/2}\,d\mu_n(t) + \int_x^y R_n(t)\,d\mu_n(t),
\]
where $R_n(t)\rightarrow 0$ as $n\rightarrow\infty$. Moreover this convergence is uniform in $t\in[x,y]$, so
\[
\left|\int_x^y R_n(t)\,d\mu_n(t)\right| \leq \mu_n([x,y]) \max_{x\leq t\leq y} |R_n(t)| \leq (y-x + 2/\sqrt{n})\max_{x\leq t\leq y} |R_n(t)| \rightarrow 0.
\]
Hence
\[
{\bf P}(x\leq S_n/\sqrt{n}\leq y) \rightarrow \frac{1}{\sqrt{2\pi}} \lim_{n\rightarrow\infty}\int_x^y e^{-t^2/2}\,d\mu_n(t) = \frac{1}{\sqrt{2\pi}}\int_x^y e^{-t^2/2}\,dt,
\]
where the last equality follows from the theorem that continuous functions are Riemann integrable. Thus we have verified the central limit theorem.
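As an empirical companion to this calculation (a simulation, not a verification), one can sample the walk and compare the empirical distribution of $S_n/\sqrt{n}$ with the standard normal; the sample sizes below are arbitrary.

```python
import random
from math import erf, sqrt

def empirical_vs_normal(n, trials, x, y, seed=0):
    # fraction of sampled walks with x <= S_n/sqrt(n) <= y, versus P(x <= Z <= y)
    # for a standard normal Z
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if x <= s / sqrt(n) <= y:
            hits += 1
    normal = 0.5 * (erf(y / sqrt(2)) - erf(x / sqrt(2)))
    return hits / trials, normal

# the two numbers in each pair should be close, up to Monte Carlo error
print(empirical_vs_normal(n=500, trials=4000, x=-1.0, y=1.0))
print(empirical_vs_normal(n=500, trials=4000, x=0.5, y=2.0))
```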