Cache-Oblivious Ray Reordering

Bochang Moon 1 , Youngyong Byun 1 , Tae-Joon Kim 1 , Pio Claudio 1 , Hye-sun Kim 2 , Yun-ji Ban 2 , Seung Woo Nam 2 , Sung-Eui Yoon 1

KAIST 1 , Electronics and Telecommunications Research Institute (ETRI) 2

ACM Transactions on Graphics, Vol. 29, No. 3, 2010

The image shows the results of our method applied to path tracing. The image shows a St. Matthew model, two Lucy, and two David models in a Sponza model. This scene consists of 104 million triangles, requiring 12.8 GB for the original meshes and their acceleration hierarchies. This scene has 128 M triangles and takes 15.7 GB. This method generate many incoherent rays to render these images. By reordering such rays, we achieve more than one order of magnitude performance improvement in a machine with 4 GB main memory, compared to without reordering rays. This performance improvement is caused by the improved ray coherence.


We present a cache-oblivious ray reordering method for ray tracing. Many global illumination methods such as path tracing and photon mapping use ray tracing and generate lots of rays to simulate various realistic visual effects. However, these rays tend to be very incoherent and show lower cache utilizations during ray tracing of models. In order to address this problem and improve the ray coherence, we propose a novel hit point heuristic (HPH) to compute a coherent ordering of rays. The HPH uses the hit points between rays and the scene as a ray reordering measure.We reorder rays by using a space-filling curve based on their hit points. Since a hit point of a ray is available only after performing the ray intersection test with the scene, we compute an approximate hit point for the ray by performing an intersection test between the ray and simplified representations of the original models. Our method is a highly modular approach, since our reordering method is decoupled from other components of common ray tracing systems. We apply our method to photon mapping and path tracing and achieve more than an order of magnitude performance improvement for massive models that cannot fit into main memory, compared to rendering without reordering rays. Also, our method shows a performance improvement even for ray tracing small models that can fit into main memory. This performance improvement for small and massive models is caused by reducing cache misses occurring between different memory levels including the L1/L2 caches, main memory, and disk. This result demonstrates the cache-oblivious nature of our method, which works for various kinds of cache parameters. Because of the cache-obliviousness and the high modularity, our method can be widely applied to many existing ray tracing systems and show performance improvements with various models and machines that have different cache parameters.