Sunday, 13 January 2013

Fekete's lemma and sum-free sets

Just a quick post to help popularise a useful lemma which seems to be well known to researchers but not to undergraduates. It would make for a good Analysis I exercise. Let $\mathbf{N}$ exclude $0$ for the purpose of this post.
Fekete's lemma: If $f:\mathbf{N}\to\mathbf{R}$ satisfies $f(m+n)\leq f(m)+f(n)$ for all $m,n\in\mathbf{N}$ then $f(n)/n\longrightarrow\inf_n f(n)/n$.
 Proof: The inequality $\liminf f(n)/n \geq \inf_d f(d)/d$ is immediate from the definition of $\liminf$, so it suffices to prove $\limsup f(n)\leq f(d)/d$ for each $d\in\mathbf{N}$. Fix such a $d$ and set $M=\max\{0,f(1),f(2),\ldots,f(d-1)\}$. Set $f(0)=0$. Now for a given $n$ choose $k$ so that $0 \leq n-kd < d$. Then
\[\frac{f(n)}{n} \leq \frac{f(n-kd)}{n} + \frac{f(kd)}{n} \leq \frac{M}{n} + \frac{kf(d)}{kd} \longrightarrow \frac{f(d)}{d},\]
so $\limsup f(n)/n \leq f(d)/d$.

Example: For $n\in\mathbf{N}$ let $f(n)$ denote the largest $k\in\mathbf{N}$ such that every set of $n$ nonzero real numbers contains a subset of size $k$ containing no solutions to $x+y=z$. We call such a subset sum-free.
Claim: $f(n)/n$ converges as $n\to\infty$.
Proof: Let $A$ be a set of size $m$ containing no sum-free subset of size larger than $f(m)$, and $B$ a set of size $n$ containing no sum-free subset of size larger than $f(n)$. Then for large enough $M\in\mathbf{N}$, the set $A\cup MB$ is a set of size $m+n$ containing no sum-free subset of size larger than $f(m)+f(n)$, so $f(m+n)\leq f(m)+f(n)$. Thus by Fekete's lemma $f(n)/n$ converges to $\inf f(n)/n$.

The value of $\sigma = \lim f(n)/n$ not easy to compute, but certainly $0\leq\sigma\leq\frac{1}{2}$. A famously simple argument of Erdos shows $\sigma\geq\frac{1}{3}$: Fix a set $A$, and for simplicity assume $A\subset\mathbf{N}$. For $\theta\in[0,1]$ consider those $n\in A$ such that the fractional part of $\theta n$ lies between $\frac{1}{3}$ and $\frac{2}{3}$. This set $A_\theta$ is sum-free, and if $\theta$ is chosen uniformly at random then $A_\theta$ has size $|A|/3$ on average.

The example $A=\{1,2,3,4,5,6,8,9,10,18\}$, due to Malouf, shows $\sigma\leq\frac{2}{5}$. The current record is due to Lewko, who showed by example $\sigma\leq\frac{11}{28}$. Since these examples are somewhat ad hoc, while Erdos's argument is beautiful, one is naturally led to the following conjecture (indeed many people have made this conjecture):
Conjecture: $\sigma=\frac{1}{3}$.
Ben Green, Freddie Manners, and I have just proved this conjecture: see http://arxiv.org/abs/1301.4579.

Thursday, 13 September 2012

Constructible regular polygons


The proof presented below, giving an (almost) complete characterisation of constructible regular polygons, is such a beautiful gem of a proof that I can't help but record it here so that I might not forget it. I'll go over far more details than is usually done, simply because it amuses me how many different areas of undergraduate mathematics are touched upon.
Theorem: The regular $n$-gon is constructible by ruler and compass if and only if $n$ has the form $2^k p_1 \cdots p_l$, where $p_1,\ldots,p_l$ are distinct primes of the form $2^{2^m} + 1$.
Primes of this form are called Fermat primes. The only known Fermat primes are $3,5,17,257,65537$, and heuristics suggest that there may be no others, though this is an open problem.

We must say what we mean by "constructible". We assume we are given a plane (a sheet of paper, say), with two points already labelled. We parameterise the plane by points of $\mathbf{C}$, and we rotate and scale so that the two labelled points are $0$ and $1$. There are three things we are allowed to do:

  1. Whenever we have two labelled points we can draw the straight line through them.
  2. Whenever we have two labelled points we can draw the circle with one as centre and the other on the circumference.
  3. We can label any point of intersection (of two lines, two circles, or a line and a circle).
Any point of the plane (considered as an element of $\mathbf{C}$) that can be labelled through a sequence of the above operations is called constructible. Denote by $\mathbf{E} \subset \mathbf{C}$ the set of constructible numbers.
Lemma: $\mathbf{E}$ is the smallest subfield of $\mathbf{C}$ closed under taking square roots.
The lemma can be proved as follows.

  1. Show closure under $(x,y)\mapsto x+y$ by constructing the parallelogram with vertices $0,x,y,x+y$.
  2. Show closure under the rotation $x\mapsto ix$.
  3. Show closure under taking real parts. Hence $\mathbf{E} = \mathcal{R}\mathbf{E} + i \mathcal{R}\mathbf{E}$, where $\mathcal{R}\mathbf{E} = \mathbf{E}\cap\mathbf{R}$.
  4. Show closure under $(x,y)\mapsto xy$ for $x,y\in\mathcal{R}\mathbf{E}$ by constructing the right triangle with leg lengths $1,x$ and the similar right triangle with base $y$. Deduce that $\mathbf{E}$ is a ring.
  5. Show that if $0\neq x\in\mathbf{E}$ then $1/x\in\mathbf{E}$ by doing so first for $x\in\mathcal{R}\mathbf{E}$ (another right triangle construction) and then using $1/x = \bar{x}/|x|^2$. Thus $\mathbf{E}$ is a field.
  6. Show that $\mathcal{R}\mathbf{E}$ is closed under taking square roots (yet another right triangle construction). Thus show that $\mathbf{E}$ is closed under taking square roots by bisecting an appropriate angle.
  7. Finally show that any subfield $K$ of $\mathbf{C}$ closed under taking square roots is closed under the "constructing" rules we listed when defining $\mathbf{E}$. The idea here is that one never has to solve an equation worse than a quadratic (when intersecting two circles, first construct the perpendicular bisector to the segment connecting the two centres, and then compute the intersection of this line with one of the circles).
The following is our algebraic characterisation of constructibility.
Lemma: $x\in\mathbf{E}$ if and only if the degree $[K:\mathbf{Q}]$ of the normal closure $K$ of $\mathbf{Q}(x)$ is a power of $2$.
Proof: By the previous lemma, $\mathbf{E}$ can be described as the union of all fields $\mathbf{Q}(\alpha_1,\ldots,\alpha_m)$ such that $\alpha_k^2\in\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})$ for all $k=1,\ldots,m$. Let $x\in\mathbf{E}$. Thus $x$ lies in some such field $L = \mathbf{Q}(\alpha_1,\ldots,\alpha_m)$. But note by the tower law that
$$[L:\mathbf{Q}] = \prod_{k=1}^{m} [\mathbf{Q}(\alpha_1,\ldots,\alpha_k):\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})],$$where each factor $[\mathbf{Q}(\alpha_1,\ldots,\alpha_k):\mathbf{Q}(\alpha_1,\ldots,\alpha_{k-1})]\leq 2$. Thus $[L:\mathbf{Q}]$ is a power $2$, and by another application of the tower law so is $[K:\mathbf{Q}]$.

Conversely suppose that $[K:\mathbf{Q}]$ is a power of $2$. Then the Galois group $\text{Gal}(K/\mathbf{Q})$ is a $2$-group. Recall that any $p$-group $G$ (except the trivial group) has nontrivial centre: by counting conjugacy class sizes, the size of $Z(G)$ must be divisible by $p$. Thus by induction (and classifying abelian $p$-groups) it follows that in every $p$-group $G$ there is a sequence of normal subgroups $G_i \triangleleft G$ such that $G_0 = \{e\}$, $G_n = G$, and $|G_k/G_{k-1}| = p$ for each $k=1,\ldots,n$. Specialising to $p=2$, Galois theory now implies that $K$ is the top of a tower of quadratic extensions starting from $\mathbf{Q}$, so $x\in K\subset\mathbf{E}$.

By saying "the $n$-gon is constructible" we mean that the vertices of some regular $n$-gon are constructible. Clearly this is equivalent to saying $\zeta_n = \exp(i2\pi/n)$ is an element of $\mathbf{E}$. Note that $\Phi_n(\zeta_n) = 0$, where the $n$th cyclotomic polynomial $\Phi_n$ is defined by $$\Phi_n(X) = \prod_{(k,n)=1} (X-\zeta_n^k).$$Note that $\prod_{d|n} \Phi_d(X) = X^n-1 \in \mathbf{Z}[X]$, so by the uniqueness of polynomial division and induction we have that $\Phi_n(X) \in \mathbf{Z}[X]$.
Lemma: $\Phi_n$ is irreducible over $\mathbf{Q}$.
Proof: Suppose $\Phi_n(X) = f(X) g(X)$ where $f$ is irreducible and $\deg f\geq 1$. We must show that each $\zeta_n^k$ is a root of $f$. Since some $\zeta_n^k$ is a root of $f$, it suffices to show that $f(z)=0$ implies $f(z^p)=0$ whenever $p$ is a prime not dividing $n$. Towards this end, suppose that $p$ is such a prime and $f(z)=0$ but $f(z^p)\neq 0$. Then $g(z^p) = 0$ since $\Phi_n(z^p)=0$. Since $f$ is irreducible this implies that $f(X)$ divides $g(X^p)$. Denoting reductions modulo $p$ by a tilde and using Freshman's dream, $\tilde{f}$ divides $\tilde{g}(X^p) = \tilde{g}(X)^p$. Thus $\tilde{f}$ and $\tilde{g}$ are not coprime, so $\tilde{\Phi_n}$, and thus $X^n-1$, has a multiple root over $\mathbf{F}_p$. But this is a contradiction since $X^n-1$ and its formal derivative $nX^{n-1}$ are coprime.

Thus $\zeta_n$ has minimal polynomial $\Phi_n$, and moreover $\mathbf{Q}(\zeta_n)/\mathbf{Q}$ is a normal extension. We have therefore reduced our task to finding those $n$ for which $[\mathbf{Q}(\zeta_n):\mathbf{Q}] = \deg\Phi_n = \varphi(n)$ is a power of $2$.

Writing $n$ as a product of primes $n=p_1^{e_1}\cdots p_m^{e_m}$ with each $e_i>0$, we have (OK, the details must stop somewhere) $$\varphi(n) = p_1^{e_1 - 1}(p_1 - 1) \cdots p_m^{e_m-1} (p_m-1).$$This is a power of $2$ iff every $p_i\neq 2$ which occurs has $e_i = 1$ and $p_i - 1$ a power of $2$. But if $p = 2^k + 1$ is prime, then $k$ itself must be a power of $2$, since for every odd $q$ we have $$x^q + 1 = (x + 1)(x^{q-1} - x^{q-2} + - \cdots + 1).$$


Thursday, 6 September 2012

A clarifying view of Weyl's criterion

Consider an arbitrary sequence $(x_n)$ on the circle $\mathbf{T} = \mathbf{R}/\mathbf{Z}$. If $(x_n)$ is sufficiently well behaved, then often it will have a sort of limiting distribution, which would be usefully represented by a measure $\mu$. For instance, if $x_n$ alternates between $0$ and $1/2$ somewhat regularly, then this limiting distribution $\mu$ is an atomic measure localised at $0$ and $1/2$. Other sequences, such as $\alpha n \text{ mod $1$}$ with $\alpha$ irrational, are equidistributed, by which I mean that the limiting distribution $\mu$ is just the uniform measure. Finally, some sequences do not have a well defined limiting distribution at all, e.g. the sequence which spends 1 year at home, then 10 years at its summer home, then 100 years at home, then 1000 years at the summer home, etc.


There are two ways to make this rigorous:
  1. We say $(x_n)$ has a limiting distribution if the limit $$D(f)  =  \lim_{N\to\infty}   \frac{1}{N}\sum_{n=1}^N f(x_n)$$exists for every continuous function $f:\mathbf{T}\to\mathbf{C}$. In this case $D$ defines a linear functional on $C(\mathbf{T})$ such that $D(1)=1$ and $f\geq 0$ implies $D(f)\geq 0$, so Riesz's representation theorem implies that $D$ is represented by a probability measure $\mu$: $$D(f) = \int_\mathbf{T} f\, d\mu.$$
  2. There's another way of saying (pretty much) the same thing, in case you don't like Riesz. Namely, for each $N$ consider the atomic measure $\mu_N$ which gives a mass $1/N$ to each of $x_1, ..., x_N$, and then take a weak limit of some subsequence of $(\mu_N)$ (recall that the space of regular Borel measures on $\mathbf{T}$ is weakly compact). We say that $(x_n)$ has a limiting distribution if this weak limit is independent of subsequence, i.e. if $(\mu_N)$ weakly converges.
Next we say that $(x_n)$ is equidistributed if it has the uniform distribution as its limiting distribution. Now a probability measure on $\mathbf{T}$ is the uniform distribution iff all its nonzero Fourier coefficients vanish. But if $\mu$ is the limiting distribution of $(x_n)$ then $$\hat{\mu}(k) = \int_0^1 e^{-i2\pi k x} d\mu(x) = \lim_{N\to\infty}  \frac{1}{N}\sum_{n=1}^N e^{-i2\pi k x_n},$$so this comes down to exactly Weyl's criterion.

Taking the weak limit point of view rather than Riesz's, we can additionally conclude the existence of the limiting distribution from Weyl's criterion: if you take the weak limit $\mu$ of some subsequence, Weyl's criterion implies that this $\mu$ is uniform. Since all convergent subsequences then converge to the same limit, the sequence is actually convergent.

In fact, now that I'm thinking about it, this reasoning gives a sort of generalized Weyl's criterion for the existence of a limiting distribution. Namely, the "Fourier coefficients" $$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N e^{-i2\pi k x_n}$$all exist (they need not be $0$) iff $(x_n)$ has a limiting distribution.