# GPU Matrix Enumeration: 175× Faster Zaremba Verification
## The Finding
Reformulating continued fraction tree enumeration as batched 2×2 matrix multiplication moves the entire computation to GPU, eliminating all CPU bottlenecks. The result:
| Range | v4 (CPU tree) | v5 (GPU matrix) | Speedup |
|---|---|---|---|
| | 157s | 0.9s | 175× |
| | ~hours | 7.5s | ~1000× |
| | ~days | 14.4s | ~10,000× |
| | infeasible | 43s | — |
## The Key Insight
Every CF path with partial quotients $a_i \in \{1, \dots, 5\}$ corresponds to the matrix product:

$$M = \prod_{i=1}^{k} \begin{pmatrix} a_i & 1 \\ 1 & 0 \end{pmatrix}$$

The denominator of $[0; a_1, \dots, a_k]$ is the top-left entry $M_{11}$ of $M$. So enumerating Zaremba denominators is equivalent to computing batched 2×2 matrix products — exactly what GPUs are designed for.
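The equivalence can be checked directly in a few lines of Python (the helper names are illustrative):

```python
from fractions import Fraction

def cf_denominator(quotients):
    """Denominator of [0; a1, ..., ak], computed by folding the CF."""
    x = Fraction(0)
    for a in reversed(quotients):
        x = Fraction(1, a + x)
    return x.denominator

def matrix_product_entry(quotients):
    """Top-left entry of the product of the matrices [[a_i, 1], [1, 0]]."""
    m = (1, 0, 0, 1)  # 2x2 identity, row-major (p, q, r, s)
    for a in quotients:
        p, q, r, s = m
        m = (p * a + q, p, r * a + s, r)  # right-multiply by [[a, 1], [1, 0]]
    return m[0]

path = [2, 1, 5, 3, 4]
assert matrix_product_entry(path) == cf_denominator(path) == 233
```

The all-ones path recovers the Fibonacci numbers, the slowest-growing denominators in the tree.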
## The Algorithm
At each depth $d$, we have a batch of 2×2 matrices. To advance to depth $d+1$:
- Expand: multiply each matrix by the five generators $\begin{pmatrix} a & 1 \\ 1 & 0 \end{pmatrix}$, $a \in \{1, \dots, 5\}$ (5 children per parent)
- Mark: write the $M_{11}$ entry (the denominator) into a bitset via atomicOr
- Compact: discard matrices whose denominator exceeds the bound $N$ (dead branches)
All three operations are fused into a single CUDA kernel. The compaction uses atomicAdd for output positions, keeping only live matrices for the next depth.
The compaction is critical: without it, the matrix count grows as $5^d$ (exponential). With it, the live count peaks around depth 12–13 and then shrinks as branches die off.
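A CPU sketch of one expand/mark/compact step, using NumPy batches in place of CUDA threads and a boolean array in place of the GPU bitset. The function names and the toy bound N are illustrative:

```python
import numpy as np

ALPHABET = np.arange(1, 6, dtype=np.uint64)  # partial quotients {1..5}
N = 1_000  # toy denominator bound for demonstration

def advance(batch, marked):
    """One depth step: expand each matrix by all 5 generators,
    mark reachable denominators, and compact out dead branches.

    batch:  (n, 2, 2) uint64 array of live matrices
    marked: boolean array of length N+1 standing in for the bitset
    """
    # Expand: right-multiply every matrix by [[a, 1], [1, 0]] for each a.
    gens = np.zeros((5, 2, 2), dtype=np.uint64)
    gens[:, 0, 0] = ALPHABET
    gens[:, 0, 1] = 1
    gens[:, 1, 0] = 1
    children = np.einsum('nij,ajk->naik', batch, gens).reshape(-1, 2, 2)

    # Mark: the top-left entry is the denominator.
    denoms = children[:, 0, 0]
    live = denoms <= N
    marked[denoms[live]] = True

    # Compact: keep only matrices whose denominator is within bound.
    return children[live]

# Seed with the identity and iterate until no live branches remain.
batch = np.eye(2, dtype=np.uint64)[None]
marked = np.zeros(N + 1, dtype=bool)
while len(batch):
    batch = advance(batch, marked)
```

Because denominators strictly increase along every branch, the loop terminates once all live matrices exceed N; `marked` then records every reachable denominator up to the bound.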
## Why It’s Fast
- No CPU recursion: the previous v4 approach walked the CF tree recursively on the CPU, feeding results to the GPU for bitset marking; v5 runs everything on the GPU
- Massive parallelism: at peak depth, ~244M matrices are expanded simultaneously by 244M CUDA threads
- Memory efficient: fused expand+compact means we never store more than 2 buffers of live matrices
- Naturally pruning: the denominator bound eliminates dead branches at every step
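A back-of-envelope check on the two-buffer memory claim, assuming each matrix is stored as four 64-bit words (a hypothetical layout; the actual kernel may pack matrices differently):

```python
# Peak footprint of the two ping-pong buffers of live matrices.
peak_matrices = 244_000_000      # ~244M live matrices at peak depth
bytes_per_matrix = 4 * 8         # four uint64 entries per 2x2 matrix (assumed)
total_gib = 2 * peak_matrices * bytes_per_matrix / 2**30
print(f"{total_gib:.1f} GiB")    # peak footprint across both buffers
```

Under these assumptions the peak working set is about 14.5 GiB, comfortably within a single B200's memory.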
## References
- Zaremba, S.K. (1972). “La méthode des ‘bons treillis’ pour le calcul des intégrales multiples.” Applications of Number Theory to Numerical Analysis, pp. 39–119.
- Bourgain, J. and Kontorovich, A. (2014). “On Zaremba’s conjecture.” Annals of Mathematics, 180(1), pp. 137–196.
Computed on NVIDIA DGX B200. Code: github.com/cahlen/idontknow.
This work was produced through human–AI collaboration (Cahlen Humphreys + Claude). Not independently peer-reviewed. All code and data open for verification at github.com/cahlen/idontknow.