by cahlen

GPU Matrix Enumeration: 175× Faster Zaremba Verification

The Finding

Reformulating continued fraction tree enumeration as batched 2×2 matrix multiplication moves the entire computation to GPU, eliminating all CPU bottlenecks. The result:

Rangev4 (CPU tree)v5 (GPU matrix)Speedup
10710^7157s0.9s175×
10810^8~hours7.5s~1000×
10910^{9}~days14.4s~10,000×
101010^{10}infeasible43s

The Key Insight

Every CF path [a1,a2,,ak][a_1, a_2, \ldots, a_k] with partial quotients in {1,,5}\{1,\ldots,5\} corresponds to the matrix product:

γ=ga1ga2gak,ga=(a110)\gamma = g_{a_1} \cdot g_{a_2} \cdots g_{a_k}, \qquad g_a = \begin{pmatrix} a & 1 \\ 1 & 0 \end{pmatrix}

The denominator dd is the (2,1)(2,1) entry of γ\gamma. So enumerating Zaremba denominators is equivalent to computing batched matrix products — exactly what GPUs are designed for.

The Algorithm

At each depth kk, we have a batch of 2×2 matrices. To advance to depth k+1k+1:

  1. Expand: multiply each matrix by g1,g2,g3,g4,g5g_1, g_2, g_3, g_4, g_5 (5 children per parent)
  2. Mark: write the (2,1)(2,1) entry (denominator) into a bitset via atomicOr
  3. Compact: discard matrices whose denominator exceeds dmaxd_{\max} (dead branches)

All three operations are fused into a single CUDA kernel. The compaction uses atomicAdd for output positions, keeping only live matrices for the next depth.

The compaction is critical: without it, the matrix count grows as 5k5^k (exponential). With it, the live count peaks around depth 12-13 and then shrinks as branches die off.

Why It’s Fast

  • No CPU recursion: the previous v4 approach walked the CF tree recursively on CPU, feeding results to GPU for bitset marking. v5 does EVERYTHING on GPU.
  • Massive parallelism: at peak depth, ~244M matrices are expanded simultaneously by 244M CUDA threads
  • Memory efficient: fused expand+compact means we never store more than 2 buffers of live matrices
  • Naturally pruning: the denominator bound eliminates dead branches at every step

References

  • Zaremba, S.K. (1972). “La méthode des ‘bons treillis’ pour le calcul des intégrales multiples.” Applications of Number Theory to Numerical Analysis, pp. 39–119.
  • Bourgain, J. and Kontorovich, A. (2014). “On Zaremba’s conjecture.” Annals of Mathematics, 180(1), pp. 137–196.

Computed on NVIDIA DGX B200. Code: github.com/cahlen/idontknow.

This work was produced through human–AI collaboration (Cahlen Humphreys + Claude). Not independently peer-reviewed. All code and data open for verification at github.com/cahlen/idontknow.

Recent Updates

updateAuto-generated changelog from git log — updates on every build