Gwangju Institute of Science and Technology
Image-space denoising of rendered images has become a commonly adopted approach since this post-rendering process can drastically reduce the sample counts (and thus rendering times) required to produce a visually pleasing image without noticeable noise. It is common practice to perform such denoising while preserving image details by exploiting auxiliary information (e.g., G-buffers) as well as input colors. However, it remains challenging to devise an ideal denoising framework that fully accounts for the differing characteristics of the two inputs, e.g., noisy but complete shading information in the input colors and less noisy but partial shading information in the auxiliary buffers. This paper proposes a transformer-based denoising framework with a new self-attention mechanism that infers a joint self-attention map, i.e., self-similarity in input features, through dual attention scores: one from noisy colors and another from auxiliary buffers. We demonstrate that this separate consideration of the two inputs allows our framework to produce more accurate denoising results than state-of-the-art denoisers for various test scenes.
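To make the dual-score idea concrete, below is a minimal, hypothetical PyTorch sketch of a joint self-attention layer that computes separate attention logits from color features and auxiliary (G-buffer) features and fuses them into a single attention map. The module name, projection layout, and the additive fusion of the two scores are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualScoreSelfAttention(nn.Module):
    """Illustrative joint self-attention: attention logits are computed
    separately from noisy-color features and auxiliary-buffer features,
    then fused into one attention map used to aggregate color values."""

    def __init__(self, color_dim: int, aux_dim: int, head_dim: int = 64):
        super().__init__()
        # Independent query/key projections for each input stream.
        self.q_color = nn.Linear(color_dim, head_dim)
        self.k_color = nn.Linear(color_dim, head_dim)
        self.q_aux = nn.Linear(aux_dim, head_dim)
        self.k_aux = nn.Linear(aux_dim, head_dim)
        # Values are drawn from the (noisy) color stream.
        self.v_color = nn.Linear(color_dim, head_dim)
        self.scale = head_dim ** -0.5

    def forward(self, color_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        # color_feat: (B, N, color_dim), aux_feat: (B, N, aux_dim),
        # where N is the number of tokens (e.g., pixels in a window).
        # Attention score from noisy colors.
        s_color = (self.q_color(color_feat) @ self.k_color(color_feat).transpose(-2, -1)) * self.scale
        # Attention score from auxiliary buffers (e.g., albedo, normal, depth).
        s_aux = (self.q_aux(aux_feat) @ self.k_aux(aux_feat).transpose(-2, -1)) * self.scale
        # Fuse the two scores into a joint self-attention map.
        # (Additive fusion of logits is an assumption for illustration only.)
        attn = F.softmax(s_color + s_aux, dim=-1)
        return attn @ self.v_color(color_feat)
```

In this sketch, the auxiliary score lets nearly noise-free geometric and material similarity reshape the attention map, while the color score retains the complete (but noisy) shading information, reflecting the separation of the two inputs described above.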