Joint Self-Attention for Denoising Monte Carlo Rendering

Geunwoo Oh 1, Bochang Moon 1

Gwangju Institute of Science and Technology

The Visual Computer

We design a new transformer block that jointly extracts pixel similarities from two kinds of inputs, noisy colors and G-buffers (e.g., depths and normals), and build a denoising framework by hierarchically connecting our transformer blocks. By exploiting the rendering-specific auxiliary buffers more effectively via our joint self-attention scheme, our method preserves fine image details (e.g., hair strands) more faithfully than the state-of-the-art denoiser (AFGSA).


Image-space denoising of rendered images has become a widely adopted approach, since this post-rendering process often drastically reduces the sample counts (and thus rendering times) required to produce a visually pleasing image without noticeable noise. It is common practice to preserve image details during denoising by exploiting auxiliary information (e.g., G-buffers) in addition to the input colors. However, it remains challenging to devise a denoising framework that fully accounts for the differing characteristics of the two inputs: noisy but complete shading information in the input colors, and less noisy but partial shading information in the auxiliary buffers. This paper proposes a transformer-based denoising framework with a new self-attention mechanism that infers a joint self-attention map, i.e., self-similarity in input features, through dual attention scores: one from the noisy colors and another from the auxiliary buffers. We demonstrate that this separate consideration of the two inputs allows our framework to produce more accurate denoising results than state-of-the-art denoisers across various test scenes.
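To make the idea of dual attention scores concrete, the following is a minimal, projection-free sketch of a joint self-attention map computed from two feature sets: one derived from noisy colors and one from auxiliary G-buffers. The combination rule (summing the two score maps before the softmax) and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(color_feat, aux_feat):
    """Illustrative sketch: joint self-attention from dual scores.

    color_feat: (n_pixels, d) features from noisy colors.
    aux_feat:   (n_pixels, d) features from auxiliary buffers
                (e.g., depths, normals).
    """
    d = color_feat.shape[-1]
    # Attention scores from noisy colors (features act as their own
    # queries/keys in this simplified, projection-free sketch).
    s_color = color_feat @ color_feat.T / np.sqrt(d)
    # Attention scores from auxiliary buffers.
    s_aux = aux_feat @ aux_feat.T / np.sqrt(d)
    # Joint attention map: combine the two score maps, then normalize
    # so each pixel's weights over all pixels sum to one.
    attn = softmax(s_color + s_aux, axis=-1)
    # Aggregate color features with the joint map.
    return attn @ color_feat, attn
```

In this sketch, pixels that look similar in either the noisy colors or the geometry-aware auxiliary buffers receive higher joint weights, which is the intuition behind using both inputs rather than concatenating them into a single score.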