Understanding the Grand List Decoding Challenge

2026-06-12

This document covers Reed-Solomon list decoding from basic principles, and explains the concepts needed to understand the grand list decoding problem for interleaved and folded Reed-Solomon codes. We start from the basic picture, where data becomes a polynomial, and build up gradually to the decoding question and the open challenge it leads to. No prior coding theory is assumed.

Why does this matter? Solving the grand list decoding problem, as defined by the Ethereum Foundation, would lead to more efficient STARK-style proof systems. STARKs are transparent, hash-based, plausibly post-quantum proof systems, and modern SNARK/STARK constructions rely heavily on Reed-Solomon proximity properties through protocols such as FRI and WHIR. Their security depends on how many codewords can crowd near a corrupted word: the list size. The smaller that list stays as we allow more errors, the fewer checks a verifier needs, and the smaller the proof can get. So pinning down the list size for these structured codes is not just a coding-theory curiosity; it directly controls how compact these proof systems can be.

0. The High-Level Picture

A Reed-Solomon code turns data into evaluations of a low-degree polynomial.

data  →  polynomial  →  evaluations  →  Reed-Solomon codeword

List decoding asks:

Given a corrupted received word, how many valid Reed-Solomon codewords could be close to it?

The Grand List Decoding problem studies this question for an interleaved or folded Reed-Solomon code.

The central mathematical object is the list size:

Λ(C,w,E)|\Lambda(C,w,E)|

which means:

the number of codewords in C within distance E of received word w

The worst-case version is:

BC(E)=maxwΛ(C,w,E).B_C(E) = \max_w |\Lambda(C,w,E)|.

This quantity is the base Reed-Solomon list-size problem.

1. Reed-Solomon Codes: The Basic Idea

A Reed-Solomon code starts with data written on a “tape.”

Think of the tape as holding field elements:

tape:
+-------+-------+-------+-------+
|  a_0  |  a_1  |  a_2  |  a_3  |
+-------+-------+-------+-------+

Instead of treating these values as just a list, Reed-Solomon interprets them as the coefficients of a polynomial:

p(X)=a0+a1X+a2X2+a3X3.p(X) = a_0 + a_1X + a_2X^2 + a_3X^3.

So the tape

[a_0, a_1, a_2, a_3]

becomes the polynomial

a_0 + a_1 X + a_2 X^2 + a_3 X^3

If there are kk values on the tape, then the polynomial has at most kk coefficients:

p(X)=a0+a1X+a2X2++ak1Xk1.p(X) = a_0 + a_1X + a_2X^2 + \cdots + a_{k-1}X^{k-1}.

The largest exponent is k1k-1, so the polynomial has degree less than kk:

deg(p)<k.\deg(p) < k.

That is what the parameter kk means:

kk is the number of message coefficients, or equivalently the dimension of the Reed-Solomon code.

Then Reed-Solomon evaluates this polynomial at many points:

x1,x2,,xn.x_1,x_2,\dots,x_n.

The output is:

(p(x1),p(x2),,p(xn)).\bigl(p(x_1),p(x_2),\dots,p(x_n)\bigr).

That output is the Reed-Solomon codeword.

flowchart LR
    A["Data tape<br/>a₀, a₁, a₂, ..., aₖ₋₁"]
    B["Interpret as polynomial<br/>p(X)=a₀+a₁X+...+aₖ₋₁X^(k−1)"]
    C["Evaluate at many points<br/>x₁, x₂, ..., xₙ"]
    D["RS codeword<br/>(p(x₁), p(x₂), ..., p(xₙ))"]

    A --> B --> C --> D

Here is the same idea as a table:

Tape positionTape valuePolynomial term
00a0a_0a0a_0
11a1a_1a1Xa_1X
22a2a_2a2X2a_2X^2
k1k-1ak1a_{k-1}ak1Xk1a_{k-1}X^{k-1}

So:

k values on the tape

polynomial with degree < k

evaluate at n points

codeword of length n

Usually nn is larger than kk. That is the whole point: we start with kk pieces of data and expand them into nn evaluations.

The ratio

ρ=k/n\rho = k/n

is called the rate.

  • If kk is close to nn, the code has little redundancy.
  • If kk is much smaller than nn, the code has more redundancy.
  • More redundancy usually means better error tolerance.

A Reed-Solomon code is written as:

C=RS[F,L,k]C = RS[\mathbb{F}, L, k]

where:

  • F\mathbb{F} is the finite field.
  • L={x1,,xn}L = \{x_1,\dots,x_n\} is the evaluation domain.
  • n=Ln = |L| is the block length.
  • kk is the number of polynomial coefficients.
  • ρ=k/n\rho = k/n is the rate.

The code consists of all vectors

(p(x))xL\bigl(p(x)\bigr)_{x \in L}

where pp ranges over all polynomials with

deg(p)<k.\deg(p)<k.

Code, codeword, and received word

Three objects are worth pinning down right away, because they are easy to blur, and the rest of these notes leans on the difference.

A codeword (singular) is one valid evaluated vector,

c=(p(x))xL,c = \bigl(p(x)\bigr)_{x \in L},

coming from a single message polynomial pp. It has length nn.

A Reed-Solomon code is the set of all such codewords, one for every message:

C={(p(x))xL:degp<k}.C = \{\, (p(x))_{x \in L} : \deg p < k \,\}.

So a codeword is a single element, and the code is the whole collection it lives in: cCc \in C. The "-word" ending is the tell: [5,9,7,8][5, 9, 7, 8] is a codeword, while the code is the entire pile of valid vectors it belongs to.

It also helps to keep the message separate from its codeword. The message has length kk; the codeword has length nn. The map from one to the other is the encoder, written Enc\mathrm{Enc}. There is exactly one codeword per message, so the two sets have the same size, but they live in different spaces:

message    msg ∈ F^k                     (k symbols)
            │ Enc

codeword   c = Enc(msg) ∈ F^n            (n symbols: one valid vector)
code       C = { Enc(m) : all m } ⊆ F^n  (the set of ALL codewords)

Finally, a received word ww is what actually arrives after transmission. It has the same length nn as a codeword, so it sits in the same space Fn\mathbb{F}^n. But here is what a Reed-Solomon code is not: the received word is not assumed to be a codeword.

possibly wC.\text{possibly } w \notin C.

If transmission was clean, then w=cCw = c \in C and there is nothing to do. If symbols were corrupted, ww can be knocked off the code, to a point in Fn\mathbb{F}^n that is not a valid codeword at all. Recovering the original cc from such a ww is exactly the job of decoding.

How far that recovery can be pushed is the whole subject of these notes. To make "how far" precise, we first need a way to measure distance.

2. Distance, Hamming Balls, and List Size

Decoding needs a way to say how far a received word is from a codeword. The tool is Hamming distance, which counts how many positions differ.

For example, a codeword next to a corrupted received word:

received word: w = [5, 9, 2, 8]
codeword:      c = [5, 9, 7, 8]

These differ in one coordinate:

w = [5, 9, 2, 8]
c = [5, 9, 7, 8]
          ^
       different

So their Hamming distance is 1.

The Hamming distance simply counts how many positions are different. If two words differ in at most EE positions, we say they are within distance EE.

Absolute distance vs relative distance

There are two common ways to measure closeness. Throughout, cc is a codeword (a valid element of CC), ww is the received word, and both have length nn.

The first way is absolute Hamming distance:

Δ(c,w)=number of coordinates where c and w differ.\Delta(c,w) = \text{number of coordinates where } c \text{ and } w \text{ differ}.

This is the raw number of errors.

The second way is relative Hamming distance, the same count divided by the length:

δ(c,w)=Δ(c,w)n.\delta(c,w) = \frac{\Delta(c,w)}{n}.

So when people say a received word ww is δ\delta-close to a codeword cc, they mean

Δ(c,w)δn,equivalentlyΔ(c,w)nδ.\Delta(c,w) \le \delta n, \qquad\text{equivalently}\qquad \frac{\Delta(c,w)}{n} \le \delta.

In words:

ww is δ\delta-close to cc if the fraction of positions where they differ is at most δ\delta.

Example: if n=100n=100 and δ=0.25\delta=0.25, then ww and cc may differ in at most 0.25100=250.25 \cdot 100 = 25 positions.

So:

absolute radius: E errors
relative radius: δ = E / n

For an interleaved or folded Reed-Solomon code the same idea applies, but the length is the number of bundled coordinates. Each bundled coordinate may hold several field elements, yet Hamming distance still counts how many coordinate positions differ.

How far does corruption push the received word?

We saw in §1 that a received word ww is not necessarily a codeword. With distance in hand we can be precise, but there are really two questions here, and the trick to this section is keeping them apart:

  1. Is ww itself a codeword?
  2. How many codewords sit near ww, and is the nearest one unique?

They have different answers, set by two different radii.

How spread out the code is. The governing quantity is the code's minimum distance dd: the fewest positions in which two distinct codewords can differ. For a Reed-Solomon code,

d=nk+1.d = n - k + 1.

This is the largest a minimum distance can possibly be (the Singleton bound). A code that reaches this maximum is called MDS, for maximum distance separable, and Reed-Solomon codes are the standard example. We unpack what MDS buys in §7; for now all we need is that codewords are spread far apart, at least dd positions from one another.

A picture. Imagine each codeword as a star in a huge sky, with a bubble of radius

t=d12t = \left\lfloor \tfrac{d-1}{2} \right\rfloor

around it. Since any two codewords are at least dd apart, these bubbles never overlap. This tt is the unique-decoding radius. The received word ww is just the sent codeword cc nudged by ee errors, so it sits ee steps away from cc. Everything depends on the size of that nudge:

errors e:   0            t                      d                    n
            |------------|----------------------|--------------------|
          w = c      bubble edge          a different
                  (unique-decode          codeword can
                     radius)              be reached

  e = 0         w is exactly c
  1 ≤ e ≤ t     w is off the code, and c is the ONLY codeword near it   → unique decoding
  t < e < d     w is off the code, but SEVERAL codewords may be near it → list decoding
  e ≥ d         w might itself BE another (wrong) codeword

Reading the bands:

  • e=0e = 0: nothing was corrupted, w=cCw = c \in C, done.
  • 1et1 \le e \le t: ww has left the code (wCw \notin C) but is still inside cc's bubble. No other codeword is that close, so cc is the unique nearest codeword. This is unique decoding.
  • t<e<dt < e < d: ww is still not a codeword, but it has drifted out of cc's bubble into the gaps between stars. Now several codewords can lie within a search radius, and from ww alone you cannot tell which was sent. This is where list decoding lives: the candidates are exactly the set Λ(C,w,E)\Lambda(C,w,E).
  • ede \ge d: the nudge is large enough that ww can land on a different codeword. Then wCw \in C again, but it is the wrong one, and nothing about ww reveals that.

Two takeaways. First, "ww is not a codeword" and "the nearest codeword is unique" are set by different thresholds: ww stays off the code for the whole range 0<e<d0 < e < d, but uniqueness is only guaranteed up to td/2t \approx d/2. Second, the band past tt is the entire reason list decoding exists. Once corruption exceeds the unique-decoding radius, the honest answer to "which codeword was sent?" is a list, not a single word.

Hamming balls

A Hamming ball is the set of all words within some radius of a center word.

Think of an ordinary ball:

all points within radius E of a center point

A Hamming ball is the same idea, except distance means the number of coordinates where two words differ. Formally, the full Hamming ball of radius EE around a received word ww is

B(w,E)={yFn:Δ(y,w)E}.B(w,E)=\{y \in \mathbb{F}^n : \Delta(y,w) \le E\}.

That ball contains all ambient words near ww, not just valid codewords. The list-decoding object we care about is the part of that ball that intersects the code:

B(w,E)C.B(w,E) \cap C.

So, informally, when we say "the codewords inside the Hamming ball," we mean

all valid codewords that differ from w in at most E positions

Visually:

flowchart TD
    W["received word w<br/>possibly corrupted"]

    C1["codeword c₁<br/>distance ≤ E"]
    C2["codeword c₂<br/>distance ≤ E"]
    C3["codeword c₃<br/>distance ≤ E"]
    C4["codeword c₄<br/>distance > E"]

    W --> C1
    W --> C2
    W --> C3
    W -. "outside ball" .-> C4

The full Hamming ball contains every nearby ambient word. In this picture, the codewords c1,c2,c3c_1,c_2,c_3 are the valid codewords inside that ball. The codeword c4c_4 is outside it.

The list of nearby codewords

The list of nearby codewords is denoted

Λ(C,w,E)={cC:Δ(c,w)E}.\Lambda(C,w,E) = \{c \in C : \Delta(c,w) \le E\}.

Equivalently,

Λ(C,w,E)=B(w,E)C.\Lambda(C,w,E)=B(w,E)\cap C.

Read this as:

Λ(C,w,E) = all codewords c in C that are within distance E of w

This is a subset of the code: Λ(C,w,E)C\Lambda(C,w,E) \subseteq C. The code CC is the full catalog of codewords from §1; the list is just the part of that catalog sitting near a given ww.

Here are the symbols:

SymbolMeaning
CCthe code
wwthe received word
EEthe error radius
Δ(c,w)\Delta(c,w)Hamming distance between cc and ww
Λ(C,w,E)\Lambda(C,w,E)the list of codewords near ww

The number of nearby codewords,

Λ(C,w,E),|\Lambda(C,w,E)|,

is called the list size.

Worst-case list size

For one received word ww the list might be small; for another it might be larger. So we care about the worst case over all possible received words:

BC(E)=maxwΛ(C,w,E).B_C(E) = \max_w |\Lambda(C,w,E)|.

Read this as:

B_C(E) = the biggest list size that can happen at radius E

In words:

BC(E)B_C(E) is the largest number of Reed-Solomon codewords that can sit inside any Hamming ball of radius EE.

Why this quantity matters

The Grand List Decoding problem is not about one lucky received word. It is about the worst case.

The reason is that this list bound is used as a security guarantee. The setting behind the problem is proof systems and cryptographic commitments (such as FRI and STARKs), where the list size controls soundness, roughly the chance that a cheating party can slip a bad proof past a verifier.

In that setting the received word is not produced by random noise. A party trying to cheat the system gets to hand over the most confusing word possible, chosen to make the list as large as it can be. So we picture an adversary choosing ww, and we demand that the guarantee hold even against that worst-case choice. This is exactly what the maxw\max_w in the definition is doing:

BC(E)=maxwΛ(C,w,E).B_C(E) = \max_w |\Lambda(C,w,E)|.

So the important question is:

What is the largest list size that can happen?

If BC(E)B_C(E) is small, every received word has only a small number of nearby codewords. If BC(E)B_C(E) is huge, then some received word has many nearby codewords.

3. Unique Decoding vs List Decoding

When a message is encoded, there is one true Reed-Solomon codeword that was meant to be sent. But the decoder does not see that clean codeword. It sees a received word ww, which may have been corrupted in some positions.

So the decoding question is:

From the corrupted word ww, what valid codeword or codewords could have produced it?

There are two possible situations.

In the best case, there is exactly one nearby codeword.

received word w

exactly one nearby codeword c

return c

That is unique decoding.

In the more ambiguous case, there may be several nearby codewords.

received word w

several nearby codewords c₁, c₂, c₃

return [c₁, c₂, c₃]

That is list decoding.

List decoding does not mean that the decoder randomly guesses until it finds the true message. It means the decoder returns the full set of valid codewords that are close enough to ww:

Λ(C,w,E)={cC:Δ(c,w)E}.\Lambda(C,w,E)=\{c\in C:\Delta(c,w)\le E\}.

The true transmitted codeword should be somewhere in this list, assuming the corruption was within the chosen radius EE. But the list decoder may not know which candidate is the true one. A later step may use extra information to choose among the candidates, such as a checksum, a hash, a commitment, a proof check, or some other protocol rule.

So the real issue is not just whether we can find candidates. The real issue is whether the candidate list stays small.

Unique decoding

Unique decoding asks:

Is there exactly one codeword close to the received word?

If yes, decoding is easy. You return that one codeword.

flowchart TD
    W["received word w"]
    C1["one nearby codeword c"]

    W --> C1

This is the cleanest situation. The received word points clearly to one valid codeword.

Example: unique decoding

Suppose the code contains the following valid codeword:

c = [5, 9, 7, 8, 4, 2]

During transmission, one symbol gets corrupted:

w = [5, 9, 0, 8, 4, 2]

Compare them:

c = [5, 9, 7, 8, 4, 2]
w = [5, 9, 0, 8, 4, 2]
          ^
       one error

So:

Δ(c,w)=1.\Delta(c,w)=1.

If the decoding radius is E=1E=1, and no other codeword is within distance 11 of ww, then the list of nearby codewords is

Λ(C,w,1)={c},\Lambda(C,w,1)=\{c\},

with list size Λ(C,w,1)=1|\Lambda(C,w,1)|=1. That is unique decoding.

In plain English:

There is only one valid Reed-Solomon codeword close enough to the received word, so we treat it as the intended message.

List decoding

List decoding asks a more general question:

Which codewords are close to the received word?

Sometimes there may be several possible nearby codewords:

flowchart TD
    W["received word w"]

    C1["nearby codeword c₁"]
    C2["nearby codeword c₂"]
    C3["nearby codeword c₃"]

    W --> C1
    W --> C2
    W --> C3

Then the decoder returns a list:

[c₁, c₂, c₃]

This does not mean the decoder knows which one was originally sent. It means:

these are the candidates that are consistent with the received word

That is still useful if the list is small. It is not useful if the list is enormous.

Example: list decoding

Suppose the received word is:

w = [5, 9, 0, 8, 4, 2]

Now suppose the code contains three valid codewords that are all within distance 22 of ww:

c₁ = [5, 9, 7, 8, 4, 2]   distance 1 from w
c₂ = [5, 3, 0, 8, 4, 6]   distance 2 from w
c₃ = [1, 9, 0, 8, 4, 9]   distance 2 from w

(These vectors are illustrative. In a real Reed-Solomon code the minimum distance nk+1n-k+1 from §2 forces any two distinct codewords to differ in many positions, so genuine codewords cannot cluster arbitrarily close together, though the numbers above are fine for seeing the shape of the list.)

If the radius is E=2E=2, then the list of nearby codewords is

Λ(C,w,2)={c1,c2,c3},\Lambda(C,w,2)=\{c_1,c_2,c_3\},

with list size Λ(C,w,2)=3|\Lambda(C,w,2)|=3. So the decoder returns three candidates. It does not magically know which one was the original message.

The point is that the uncertainty has been reduced from:

all possible codewords

to:

these 3 nearby codewords

If some outside rule can identify the correct candidate, then list decoding is useful. For example, a larger protocol might say:

only one candidate satisfies the next proof check

or:

only one candidate has the expected hash / checksum / commitment

or:

only one candidate is consistent with the rest of the transcript

So list decoding is often a first stage:

received word

small list of possible codewords

use extra information to choose one, if needed

From the list back to the message

A natural worry: if list decoding hands back several codewords, how does anyone recover the one original message with no errors? The answer is that list decoding deliberately stops one step short, and two separate steps finish the job.

First, picking the true codeword out of the list is not done from ww alone. Past the unique-decoding radius it is provably impossible to single one out from the received word by itself, so a selector uses information from outside ww: a hash or checksum sent separately, a commitment the sender published earlier, a proof check in the surrounding protocol (the FRI or STARK setting from §2), or plain side information such as "the real message is English text and the others are gibberish." That outside information is what collapses the list to one.

Second, once you hold a single codeword, recovering its message is exact and easy. A codeword is nn evaluations of a degree-<k<k polynomial, and any kk of them interpolate that polynomial uniquely; its coefficients are the message. This is the only step where "zero errors" even applies, and it is the trivial one.

w ──list-decode──▶ {c₁, c₂, c₃} ──selector (outside info)──▶ c₁ ──interpolate──▶ msg

This document is about the first arrow only: how large that candidate list can be, and how far the radius can grow before it explodes. The selector and the interpolation are not our focus; we mention them just so the full path from received word back to message is clear.

Is list decoding probabilistic?

Not inherently. The mathematical definition is deterministic:

Λ(C,w,E)={cC:Δ(c,w)E}.\Lambda(C,w,E)=\{c\in C:\Delta(c,w)\le E\}.

This set is just the collection of all codewords inside a Hamming ball. There is no probability in that definition. Some actual decoding algorithms may use randomness internally for efficiency, but that is separate from the concept of list decoding.

The core idea is combinatorial:

How many valid codewords can fit inside a ball of radius EE around a received word ww?

That question has a definite answer for each fixed code CC, received word ww, and radius EE.

Why list size matters

Suppose the list size is 3.

received word w has 3 nearby codewords

That is manageable. But suppose the list size is huge.

received word w has 1,000,000 nearby codewords

Then the received word is too ambiguous. So the central question is not just:

Can we decode?

The real question is:

How many possible codewords could be close?

That is why we study Λ(C,w,E)|\Lambda(C,w,E)| and BC(E)B_C(E).

The decoding radius tradeoff

As the radius EE grows, the Hamming ball gets bigger, and a bigger ball can contain more codewords.

flowchart LR
    A["small radius E<br/>few nearby codewords"]
    B["larger radius E<br/>more nearby codewords"]
    C["too large radius<br/>possibly huge list"]

    A --> B --> C

Unique decoding usually works only for a smaller radius. List decoding tries to go farther. The tradeoff is:

larger decoding radius

more errors tolerated

possibly more candidate codewords

So list decoding is about finding how far we can increase the radius while keeping the list size controlled.

There is a classical threshold that makes this precise, called the Johnson radius. Below it, the list size is provably bounded. For a Reed-Solomon code of rate ρ=k/n\rho = k/n, it sits at approximately

δJ=1ρ.\delta_J = 1-\sqrt{\rho}.

For now, just hold onto the picture: there is a "safe" radius up to which the number of nearby codewords cannot blow up. We will cover the Johnson radius in depth in §4, where it becomes the floor of the band that the Grand List Decoding challenge cares about.

Summary of the objects

ObjectMeaning
CCthe Reed-Solomon code
wwa received word
EEthe number of errors allowed
Λ(C,w,E)\Lambda(C,w,E)all codewords within distance EE of ww
Λ(C,w,E)\lvert\Lambda(C,w,E)\rvertlist size around ww
BC(E)B_C(E)worst-case list size over all received words

In one sentence:

List decoding studies how many valid codewords can be close to one corrupted word.

4. The Johnson Radius

The Johnson radius is the classical safe radius for Reed-Solomon list decoding.

It is usually written as a relative radius, meaning a fraction of the block length nn.

For a Reed-Solomon code of rate

ρ=k/n,\rho = k/n,

the Johnson radius is approximately

δJ=1ρ.\delta_J = 1-\sqrt{\rho}.

Below this radius, the theory guarantees that a Hamming ball cannot contain too many Reed-Solomon codewords.

So if

δ<δJ,\delta < \delta_J,

then every received word ww has only a controlled number of codewords cc satisfying

Δ(c,w)δn.\Delta(c,w) \le \delta n.

What "safe" means here. "Safe" does not mean likely to work, and it does not mean decoding succeeds. It means provably bounded: there is a theorem guaranteeing the list stays small, bounded by a polynomial in nn, and that guarantee holds for every received word, with no exceptions.

The list size is not a single fixed number. It depends on where the received word ww sits:

  • many ww have an empty list, with no codeword within the radius;
  • most ww have a small list, often 00 or 11;
  • a few awkwardly-placed ww can have a larger list, with several codewords crowding into one ball.

"Safe" is a statement about the ceiling across all of these. Below δJ\delta_J, no received word anywhere, not even the adversary's worst choice, pushes the list above the guaranteed bound. Above δJ\delta_J, that ceiling is no longer promised: some ww might have an exploding list, and ruling that out is the hard problem this document builds toward.

Two edges worth keeping in mind. A bounded list can still hold several candidates, so "safe" is not the same as unique decoding; and at very small radii, below the unique-decoding radius, every ball has at most one codeword.

Example:

If

ρ=14,\rho = \frac14,

then

δJ=1ρ=112=12.\delta_J = 1-\sqrt{\rho}=1-\frac12=\frac12.

So the Johnson bound controls the list size up to about 50% errors. If n=1000n=1000, this means a radius of about 500500 coordinate errors.

Again, this does not mean there is a 50% probability of success. It means that, at this rate, every Hamming ball of radius about 500500 has a controlled number of Reed-Solomon codewords.

The hard region is beyond Johnson:

δ>1ρ.\delta > 1-\sqrt{\rho}.

The capacity-style ceiling is roughly

1ρ.1-\rho.

So the important band is

1ρ<δ<1ρ.1-\sqrt{\rho}<\delta<1-\rho.

This is the Johnson-to-capacity band. It is exactly the zone the Grand List Decoding challenge (§8) will care about:

Johnson radius       capacity-style ceiling
1 - sqrt(ρ)    < δ < 1 - ρ

The question is whether interleaved or folded Reed-Solomon codes (the bundled constructions introduced next, in §5) over structured domains can still have small list size in this beyond-Johnson region.

5. Interleaving and Folding

Up to here, every coordinate of a codeword has held a single field element (§1). The Grand List Decoding challenge is about bundled Reed-Solomon codes instead, where each coordinate holds a small tuple of field elements. There are two standard ways to build such a code, interleaving and folding. This section explains both, says why anyone bundles in the first place, and shows how the Hamming distance of §2 carries over to the bundled setting.

Why bundle at all?

There are two reasons, one practical and one mathematical.

The practical reason is that the fast proof systems that motivate this whole problem (the FRI/STARK setting from the introduction) naturally produce bundled Reed-Solomon objects. The codes that actually need analyzing are therefore bundled ones, not the plain code of §1.

The mathematical reason is the interesting one. Recall the two thresholds from §4 that the rest of the document orbits around: the Johnson radius 1ρ1-\sqrt{\rho}, below which the list is provably small, and the capacity ceiling 1ρ1-\rho, above which no guarantee is possible. For the plain code, the classical guarantee runs out at Johnson. Folding in particular is the classical lever for pushing the achievable decoding radius up out of that Johnson region and toward capacity. So bundling is not idle repackaging; it is the construction that puts "can we decode all the way up to 1ρ1-\rho?" on the table at all.

The question this raises, and the one §8 finally pins down, is: does grouping symbols move the list-decoding frontier, and how far?

One reassurance before the mechanics. Bundling does not change the rate. Counting field elements, both constructions keep the ratio of message symbols to codeword symbols at k/nk/n, so ρ\rho is exactly what it was in §1. We are not buying error tolerance by adding redundancy; we are only regrouping the same symbols and asking whether the regrouping helps.

Interleaving

Interleaving takes mm codewords from the same Reed-Solomon code C=RS[F,L,k]C = RS[\mathbb{F}, L, k] — the same evaluation domain LL and the same degree bound kk, but mm different message polynomials p1,,pmp_1, \dots, p_m — and stacks them coordinate by coordinate.

Write the mm codewords as rows, one polynomial per row, evaluated across the shared domain:

columns →        x₁        x₂        x₃        x₄
c₁  (from p₁):  p₁(x₁)    p₁(x₂)    p₁(x₃)    p₁(x₄)
c₂  (from p₂):  p₂(x₁)    p₂(x₂)    p₂(x₃)    p₂(x₄)
c₃  (from p₃):  p₃(x₁)    p₃(x₂)    p₃(x₃)    p₃(x₄)

Interleaving reads off the columns: each column becomes one bundled coordinate.

bundle each column ↓

[ (p₁(x₁),p₂(x₁),p₃(x₁)), (p₁(x₂),p₂(x₂),p₃(x₂)), (p₁(x₃),p₂(x₃),p₃(x₃)), (p₁(x₄),p₂(x₄),p₃(x₄)) ]

Writing ai=p1(xi)a_i = p_1(x_i), bi=p2(xi)b_i = p_2(x_i), di=p3(xi)d_i = p_3(x_i) for short, that is just

[ (a₁,b₁,d₁), (a₂,b₂,d₂), (a₃,b₃,d₃), (a₄,b₄,d₄) ]

So for an mm-fold interleaving:

  • the block length stays n=Ln = |L| — there are still nn coordinates;
  • each coordinate is now an mm-tuple, an element of Fm\mathbb{F}^m rather than of F\mathbb{F};
  • coordinate ii bundles the ii-th symbol of every layer, so the mm codewords are forced to share a single set of coordinate positions.

That last property is the entire point of interleaving. The layers are locked together at the same positions.

Folding

Folding starts from a single Reed-Solomon codeword and bundles consecutive symbols of it. The word "consecutive" is doing quiet work, though: it only means anything once we fix how the evaluation domain is ordered, and the standard ordering is exactly the multiplicative one from §6.

Take the domain to be the nn-th roots of unity,

L=μn={1,ω,ω2,,ωn1},ωn=1,L = \mu_n = \{1, \omega, \omega^2, \dots, \omega^{n-1}\}, \qquad \omega^n = 1,

listed in that order, so the codeword is

c = [ p(1), p(ω), p(ω²), p(ω³), p(ω⁴), p(ω⁵) ]

Folding by mm (shown here with m=2m = 2) groups the symbols into consecutive blocks:

[ (p(1),p(ω)), (p(ω²),p(ω³)), (p(ω⁴),p(ω⁵)) ]

So for folding by mm:

  • the block length shrinks from nn to n/mn/m — there are now n/mn/m bundled coordinates (we take mm to divide nn, which it does in the FFT case);
  • each coordinate is an mm-tuple in Fm\mathbb{F}^m, exactly as in interleaving;
  • the symbols inside a single fold sit at evaluation points related by ω\omega: a fold has the form (p(y), p(ωy), p(ω2y), , p(ωm1y))\bigl(p(y),\ p(\omega y),\ p(\omega^2 y),\ \dots,\ p(\omega^{m-1} y)\bigr), where yy is the first point of that fold. Stepping from one symbol of a fold to the next multiplies the evaluation point by ω\omega.

The third property is the one the word "adjacent" hides, and it is what makes folding more than relabeling. The bundled symbols are not arbitrary neighbors; they are evaluations at a little geometric run of points y,ωy,ω2y,y, \omega y, \omega^2 y, \dots that steps through the domain by multiplying by the generator ω\omega. That rigid multiplicative relationship is precisely the roots-of-unity structure of §6, and it is exactly what a folded-RS decoder leans on. So folding is built directly on top of the structured, FFT-friendly domains the whole challenge is concerned with.

The two constructions side by side

interleaving   m codewords, one shared RS code     →  bundle the columns
                length stays n                          each coordinate ∈ Fᵐ
                layers share coordinate positions

folding        1 codeword, root-of-unity domain    →  bundle consecutive symbols
                length shrinks to n/m                   each coordinate ∈ Fᵐ
                                                        symbols in a fold related by ω

Both end at the same kind of object: a code over the alphabet Fm\mathbb{F}^m, whose coordinates are bundles of mm field elements. But they get there by different routes — interleaving stacks many codewords sideways, folding chops one codeword into blocks — and they are genuinely different constructions, not two names for one thing.

Distance on bundled coordinates

The Hamming distance of §2 carries over with one adjustment, already hinted at there: it counts bundled coordinates that differ, and a bundled coordinate counts as differing if any one of its mm entries is wrong.

received:  w = [ (5,9), (2,1), (7,0), (8,4) ]
codeword:  c = [ (5,9), (2,3), (7,0), (8,4) ]
                          ^^^^^
                  one entry differs, so this whole
                  bundled coordinate counts as 1 error

Only a single field element is off here, but it spoils one bundled coordinate, so the bundled Hamming distance is 11. The relative radius δ\delta from §2 is then taken against the bundled length: nn for interleaving, n/mn/m for folding.

This metric is the precise reason the optimistic hope from "Why bundle at all?" is even plausible. For a bundled codeword to land within radius EE of ww, it has to agree with ww on at least (length E- E) bundled coordinates — and agreeing on a bundled coordinate means agreeing on all mm entries there at once. A candidate must thread a far narrower needle than in the unbundled code, so one expects fewer candidates to crowd into any single Hamming ball.

What this section claims, and what it does not

Two cautions before we move on.

First, that "stricter agreement, so fewer candidates" argument is an intuition, not a theorem. Whether it actually survives in the beyond-Johnson band depends on the evaluation domain, and the structured domains of §6 may hand an adversary exactly the algebraic coincidences the intuition was quietly assuming away. Turning the hope into a guarantee is the open problem of §8.

Second, interleaving and folding are not interchangeable, even though they land on similar-looking objects. For the rest of the document, CmC^{\equiv m} denotes the specific bundled code used in the Grand List Decoding challenge, with bundling parameter mm. The interleaving and folding pictures above are here to explain why bundled coordinates arise, and why they might move the frontier — not to claim that every bundling construction behaves identically.

6. Smooth Multiplicative Cosets and Generic RS

Many fast Reed-Solomon systems use roots of unity, or multiplicative cosets of roots of unity, as evaluation points.

Let

μn={1,ω,ω2,,ωn1},\mu_n = \{1,\omega,\omega^2,\dots,\omega^{n-1}\},

where

ωn=1.\omega^n = 1.

A common evaluation domain is either the subgroup itself,

L=μn,L = \mu_n,

or a multiplicative coset of it,

L=αμn={α,αω,αω2,,αωn1}.L = \alpha \cdot \mu_n = \{\alpha,\alpha\omega,\alpha\omega^2,\dots,\alpha\omega^{n-1}\}.

For simplicity, you can keep α=1\alpha=1 in mind. When n=2sn=2^s, this is an FFT-friendly domain. This is what people mean here by a smooth multiplicative domain or smooth multiplicative coset.

Such a domain is highly structured rather than random. Its evaluation points are not scattered freely across the field; they are roots of unity, or a shifted copy of roots of unity, locked into a tight web of algebraic relations with one another. That structure is exactly what makes the FFT fast, but it is also what sets these domains apart from the random-looking points the capacity theory was built on.

Generic or random Reed-Solomon codes use evaluation points that behave like random points, and random points usually avoid algebraic coincidences. Smooth Reed-Solomon codes do the opposite: they use structured roots of unity, which have many algebraic relations. For example, in the subgroup case,

xn=1x^n = 1

for all xμnx \in \mu_n. In the coset case xαμnx \in \alpha\mu_n, the corresponding relation is

xn=αn.x^n = \alpha^n.

Also, whenever nn is even (which includes every FFT case n=2sn = 2^s), we have xμn-x \in \mu_n for every xμnx \in \mu_n, since (x)n=(1)nxn=1(-x)^n = (-1)^n x^n = 1. The same pairing holds inside a coset αμn\alpha\mu_n. In characteristic 2 this says nothing, since there x=x-x = x; the relation only bites in odd characteristic, where xx-x \neq x.

So smooth domains are special. This matters because modern random and generic Reed-Solomon capacity results rely on genericity assumptions.

7. Higher-Order MDS

Recall MDS from §2: a maximum distance separable code reaches the largest possible minimum distance, d=nk+1d = n - k + 1, and Reed-Solomon is the standard example. For higher-order MDS we need one more way to say the same thing. Since any kk evaluations of a degree-<k<k polynomial determine it (§1), any kk coordinates already pin down the whole codeword; in generator-matrix terms, every set of kk columns is linearly independent. That last phrasing is the one higher-order MDS generalizes.

Ordinary MDS says:

Small sets of Reed-Solomon columns behave independently.

Higher-order MDS asks a stronger question:

Do several groups of columns have intersections of the expected generic dimension?

A helpful analogy:

  • ordinary MDS asks whether one group of vectors behaves independently;
  • higher-order MDS asks whether several spans overlap only as much as random spans would.

In plain terms: ordinary MDS is about one set of coordinates behaving independently. Higher-order MDS is about many coordinate subsets interacting as if they were random. This matters because recent capacity-style list-decoding results for generic or randomly punctured Reed-Solomon codes rely on this kind of generic intersection behavior. Smooth domains may violate those assumptions because roots of unity satisfy rigid algebraic identities.

Generic Reed-Solomon codes satisfy higher-order MDS. This property is used in modern generic and random Reed-Solomon capacity proofs.

Therefore, one possible strategy is:

Prove smooth roots-of-unity domains also satisfy higher-order MDS, then import generic Reed-Solomon capacity results.

This strategy turns out to fail: smooth roots-of-unity domains do not inherit higher-order MDS the way generic domains do, precisely because of the algebraic relations from §6. So the generic capacity results cannot simply be imported, and the beyond-Johnson behavior of these structured codes stays genuinely open.

That tension, where the generic theory promises capacity but the structured FFT domains of §6 may not inherit it, is exactly what makes the following challenge worth stating precisely.

8. The Grand List Decoding Challenge

We now have every piece we need:

  • the list Λ(C,w,E)\Lambda(C,w,E) of codewords near a received word, and its worst-case size over all received words, BC(E)B_C(E) (§2);
  • the relative radius δ=E/n\delta = E/n (§2);
  • the bundled code CmC^{\equiv m} from interleaving or folding (§5);
  • the Johnson floor 1ρ1-\sqrt{\rho} and the capacity ceiling 1ρ1-\rho, with the interesting band in between (§4);
  • the fact that fast systems use structured, non-generic evaluation domains (§6), so the generic higher-order MDS machinery does not transfer for free (§7).

A note on notation before the statement. We write Λ(Cm,δ)\Lambda(C^{\equiv m},\delta) with two arguments, not three: the received word has been maximized away. So Λ(Cm,δ)\Lambda(C^{\equiv m},\delta) is the worst-case list (the BCB_C idea from §2), now measured at a relative radius δ\delta instead of an absolute one. And rather than asking for a constant bound on the list, we compare its size to the field size F|\mathbb{F}|: over a cryptographically large field (for instance F2256|\mathbb{F}|\approx 2^{256}), a bound of 2128F2^{-128}\cdot|\mathbb{F}| is still an astronomically large number of codewords in absolute terms, yet a negligible fraction of the field, which is exactly the soundness guarantee downstream protocols need.

With that, here is the question this whole document has been building toward.

The grand list decoding challenge

We are given a Reed-Solomon code CC over a smooth multiplicative domain (the structured, FFT-friendly setting of §6). For a given ε\varepsilon^*, say ε=2128\varepsilon^* = 2^{-128}, and a constant mm, determine the largest δC[0,1]\delta_C^* \in [0,1] such that Λ(Cm,δC)εF,|\Lambda(C^{\equiv m},\delta_C^*)| \le \varepsilon^* \cdot |\mathbb{F}|, assuming F|\mathbb{F}| is sufficiently large so that such a δC\delta_C^* exists.

In plain English:

How far can a received word drift from the bundled Reed-Solomon code before the worst-case list of nearby codewords stops being a negligible fraction of the field?

Or, from the adversary's side:

What is the highest corruption rate δ\delta at which no received word, however cleverly chosen, can be confused with more than a negligible fraction of the codewords?

The Johnson radius (§4) already guarantees a δC\delta_C^* that is at least about 1ρ1-\sqrt{\rho}. The capacity ceiling says we cannot hope for more than 1ρ1-\rho. The real challenge is everything in between, and whether the structured, FFT-friendly domains of §6 let us push δC\delta_C^* up toward capacity, or whether their algebraic relations hold it back.

So the frontier sits in the Johnson-to-capacity band: above 1ρ1-\sqrt{\rho} the classical guarantee runs out, below 1ρ1-\rho a guarantee is at least conceivable, and the open question is which edge the structured FFT domains actually land near. Whether their algebraic relations cost us the climb toward capacity is, for now, unresolved, and that is what makes the RS Grand List Decoding problem worth solving.

If you made it this far, pat yourself on the back! Thanks for reading!