Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond
Boris Alexeev, Kevin Barreto, Yanyang Li, Jared Duker Lichtman, Liam Price, Jibran Iqbal Shah, Quanyu Tang, and I have just uploaded to the arXiv our paper Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond. This paper (which is a work in progress) represents our efforts to digest and document the recent flurry of developments around the following problem of Erdős, Sárközy, and Szemerédi on primitive sets:
Conjecture 1 (Erdős problem #1196) Suppose that
is a primitive set, which means that no element of
divides another. Then
as
.
One can show that the upper bound of is best possible up to the
error by taking
to be the set of products of
primes for some suitable parameter
. This was one of the most well-known open problems in the study of primitive sets, and had attracted some number of partial results (for instance, Lichtman was able to show the upper bound of
). It was thus notable that this problem was first solved by an autonomous AI query (by the fifth author) a few weeks ago. This solution introduced a proof technique – based on Markov chains in the divisibility poset – which in retrospect is very natural for controlling primitive sets, but which had not been explicitly used in previous literature, though in retrospect many of the arguments in that literature involved a specific Markov chain which we call the downwards Mertens chain. The proof instead revolved around a different Markov chain, which we call the downwards von Mangoldt chain, which manages to neatly avoid the “
” type losses in the previous Mertens-based arguments, and resolve Conjecture 1. In this paper we develop the Markov chain approach more systematically, and show that it settles several further conjectures concerning primitive sets, and also provides simpler proofs of some previous results in the literature. More precisely, in addition to Conjecture 1, we establish the following:
Theorem 2 (Erdős primitive set conjecture, #164) For any primitive
consisting of numbers greater than
,
Theorem 3 (Odd Banks–Martin) Let
and suppose
is a primitive set consisting of odd numbers with at most
prime factors. Then
![]()
where
denotes the primes appearing as factors of elements of
, and
is the collection of products of
primes from
.
Theorem 4 (
is Erdős-strong) If
is a primitive set consisting of even numbers, then
Theorem 5 (Ahlswede–Khachatrian–Sárközy) If
is a primitive set, then
whenever
.
Theorem 6 (Erdős–Sárközy–Szemerédi, #1217) Let
be such that the upper doubly logarithmic density
is positive. Then there exists a strictly increasing infinite divisibility chain
in
such that
Theorem 2 and Theorem 5 had been previously established by Lichtman and Ahlswede–Khachatrian–Sárközy respectively, but the Markov chain formalism gives shorter (and more unified) proofs of both. Theorems 3, 4, 6 were open conjectures that can now be settled by this method. These results were obtained with varying levels of AI involvement, ranging from completely autonomous AI queries to traditional pen-and-paper calculations, to various hybrid approaches (for instance, with humans suggesting key inequalities that could then be rapidly tested numerically or even proved by various AI tools).
— 1. Chain/antichain duality and Markov chains —
I’ll now discuss the basic method of proof and try to motivate the main ideas, which have become much clearer in retrospect. Primitive sets can be viewed as antichains in the divisibility poset , in which the partial ordering is given by the divisibility relation
. So, one can pose the following more abstract question: given a general poset
and a weight function
, what is the maximal value of
as
ranges over all antichains in
?
One can attack this problem using the well known duality between antichains and chains (totally ordered subsets of
): every antichain and chain can meet in at most one point, thus one has
for any chain and any antichain
. In particular, if one has a measure
on the space
of all chains (viewed as a compact subspace of the power set of
, equipped with the product topology) with the property that
for all , then by integrating the previous inequality against
and using Tonelli’s theorem one would obtain the upper bound
In fact this duality is completely tight:
Proposition 7 (Chain/antichain duality) Let
be a poset, let
be a weight function, and let
. Then the following are equivalent:
Proof: We have already indicated how (ii) implies (i). Now we need to show that (i) implies (ii). A standard compactness argument allows us to reduce to the case when is finite. If (i) holds, then we also have
for all in the Stanley chain polytope, defined as the convex hull of the indicator functions of antichains. By a classic result of Stanley, this polytope can also be defined as the space of all
obeying the inequalities
Applying linear programming duality (or the Farkas lemma), we conclude that the inequality (2) must be a non-negative linear combination of the inequalities (3), (4) (as well as the trivial inequality ). Equivalently, we can find non-negative weights
for each chain
such that
and
for all . The claim follows by viewing
as a measure on the space of chains.
Thus, the “universal” problem of obtaining a uniform upper bound on for all antichains
is replaced with the equivalent “existential” dual problem of exhibiting a single measure
on chains of controlled mass, which hits each element
of the poset with a mass of at least the original weight
. Thus, such problems are now reduced to that of finding a sufficiently clever construction of such a measure
. If the mass of
was normalized to equal
, this becomes a probability problem: find a random chain process in the poset
that hits each element
of the poset with a sufficiently high probability. (Though in our paper we found it more convenient for technical reasons to not normalize the measure, and allow the mass to take values other than
.)
It turns out (in a manner that was not explicitly appreciated in past literature) that particularly good choices of random chain to use here can come from Markov chains. (Here, the term “chain” is now being used in two different ways, but fortunately the order-theoretic concept of a chain and the Markov process-theoretic concept of a chain will be quite compatible in this discussion.) There will be two types of Markov chains on the poset that will be relevant: downwards Markov chains and upwards Markov chains. Here is our notation for a downwards Markov chain:
Definition 8 (Downwards Markov chain) Let
be a poset, and suppose we designate some subset
of
to be the “absorbing states” (in practice these will be the minimal elements of
, although they do not have to be). A downwards Markov chain on
with absorbing states
is a collection of transition probabilities
for
obeying the following axioms:
(Thus for instance one has
for any absorbing state
.)
Given such a downwards Markov chain and an initial state , one can generate a random decreasing sequence
by having each
transition to
with probability
after conditioning on the past history
of the chain. This sequence will (almost surely) be strictly decreasing until it hits an absorbing state, in which case it stays there forever, although if the descending chain condition is not satisfied it is also possible for the sequence to be strictly decreasing indefinitely. We let
denote the law of this decreasing sequence. This construction is already enough to recover the Lubell-Yamomoto-Meshalkin (LYM) inequality:
Theorem 9 (LYM inequality) If
is the power set of
with the inclusion partial ordering, and
is an antichain in this poset (i.e., a Sperner family), then
Proof: We introduce the downwards Markov chain with absorbing state in which each non-empty subset of
of some cardinality
transitions to a
-element subset chosen uniformly at random (i.e., with probability
of transitioning to each). If we start the descending sequence from the maximal element
of
, then one can easily check that each
is hit with probability
. Applying Proposition 7 with
,
, and
, we obtain the claim.
In the above argument we fixed the initial location of the Markov chain, but more generally one can start with any source mass
and work with the measure
for the purposes of applying Proposition 7.
One can also define upwards Markov chains in exact analogy with downwards Markov chains
(reversing the order in the poset), which now generate random increasing sequences
rather than decreasing sequences. There is a useful adjoint construction that can convert a downwards Markov chain into an upwards Markov chain: if we have a positive weight
which is invariant under the chain in the sense that
for any , then we can define an adjoint upwards Markov chain
(with no absorbing state) by the formula
for any in
. More generally, if
is merely sub-invariant in the sense that
for all , one can still construct an adjoint upwards chain as before, but now one must also add an additional absorbing maximal state
to ensure that the transition probabilities still sum to one.
A downwards or upwards Markov chain, when equipped with an invariant or sub-invariant measure, also induces a flow network on the poset, in which an edge from to
is assigned a flow capacity of
One can rewrite the Markov chain arguments in the paper in terms of such flow networks, in which case the arguments often boil down to an application of the discrete divergence theorem, giving very short proofs of many of the above results; see the paper for more discussion. However, we chose to focus more on the Markov chain approach in our presentation, as this formalism is also natural and could potentially be more flexible for further applications.
— 2. The Mertens and von Mangoldt chains —
For the purpose of analyzing primitive sets, there are two downward chains on the natural numbers (with absorbing state )that, in retrospect, are particularly natural to use:
The von Mangoldt weight is a natural choice here thanks to the fundamental identity
which encodes the fundamental theorem of arithmetic. The two chains are similar in many way to each other: the von Mangoldt process favors the division by the largest prime factor, but does not require it.
The Mertens chain generates deterministic downward divisibility chains
starting from a product of primes
, and as such this process was implicit in much of the previous literature on primitive sets. However, it does not quite interact well with the
weight, which is not invariant or subinvariant with respect to this process. Intuitively, the process of dividing
by
tends to increasingly select for numbers
for which
is smaller than expected. Instead, the natural invariant measure for this chain is the Mertens weight
the verification that this is indeed an invariant weight is a nice exercise in telescoping series. Taking the adjoint of the downwards Mertens chain with respect to this weight and running that chain from gives the upwards Mertens divisibility chain
in which each transitions to
for some prime
with probability
. A routine induction shows that each
is hit by this chain with a probability of
; this for instance gives a weak version
of Theorem 1, and similarly for the other results discussed above.
The key innovation (which was uncovered by the AI-assisted proofs, though not quite in the notation and framework presented here) is to switch to the von Mangoldt chain, which removes the bias towards numbers whose largest prime factor is small. Indeed, the weight now turns out to be sub-invariant (after removing
) under this chain, and there is a modification
of this weight which turns out to be perfectly invariant. (We have an interpretation of this formula in terms of a zeta process that couples together various zeta distributions into a continuous divisibility chain; see the paper for further details.) Taking the adjoint with respect to this weight (or with the original weight ) can eliminate the
loss in the previous argument, and give one of the proofs of Theorem 1 recorded in our paper (there are several other variants of this method that we also present).
One slight defect of the von Mangoldt chain, as compared to the Mertens chain, is that it can “jump over” primitive sets (such as the set of products of primes) due to the fact that it will sometimes multiply or divide by a power of a prime rather than a prime itself. This turns out to be a technical difficulty for many of our applications, resulting in a need to make various small ad hoc modifications to the von Mangoldt chain to eliminate this type of jump.
In order to establish some crucial sub-invariance properties, it turns out (after some standard manipulations) to be useful to obtain good bounds on the negative log-derivative
of the Riemann zeta function in the region
. Here it turns out that there is a clean (and rather efficient) upper bound
which is equivalent to the non-decreasing nature of the Dirichlet eta function
in this region. There are many proofs of this fact in the literature, but I would like to record a particularly cute proof, that is in the spirit of other arguments in the paper: one can interpret probabilistically as
where is a gamma random variable of shape
and scale
. Because the sum of independent copies of
and
has the distribution of
, one can couple together all the
so that they are increasing in
, at which point the claim follows.
— 3. Further directions —
Will Sawin and Ofir Gorodetsky have obtained analogues of several of the above results for function field models or permutation models respectively; we briefly discuss these in the paper, although we do not plan to cover these models in depth. We also note another recent use of this technique to solve a separate Erdős problem (#858) relating to antichains in a variant of the divisibility poset.
The zeta process that we have discovered hints at an emerging theory of the “developmental anatomy of integers”, which differs from the existing topic of anatomy of integers in that it views a large integer (and its prime factor “organs”) not as a static entity, but rather as an evolving process in which primes (or powers of primes, in the case of the von Mangoldt process) are added or removed to the integer over time. With this perspective, primitive sets can be viewed as singular moments in such a developmental process, which are only encountered at most once in the life cycle of a given integer. It seems of interest to study this developmental perspective further.
The paper is currently a work in progress; we have released an early version due to the public interest in this problem. We plan to explore some further applications, and also to formalize more of the above results in Lean (currently two of the six main theorems are formalized), before submitting the paper for publication. The situation here highlights a distinction I have recently made between three components of the problem solving process in mathematics, namely proof generation, proof verification, and proof digestion. In this particular case, the first two steps were extremely rapid due to modern AI tools; however, properly digesting the AI-generated proofs into a coherent exposition that places the arguments in context with both past literature and future directions remains a slower process that requires expert human attention.



