Patched Denoising Diffusion Models For High-Resolution Image Synthesis
ICLR 2024

¹UC San Diego   ²Stanford University

A 1024×512 image generated by a 148M-parameter model trained on 21k natural images.

Abstract

We propose an effective denoising diffusion model for generating high-resolution images (e.g., 1024×512) trained on small image patches (e.g., 64×64). We name our algorithm Patch-DM; it introduces a new feature collage strategy designed to avoid boundary artifacts when synthesizing large images. Feature collage systematically crops and combines partial features of neighboring patches to predict the features of a shifted image patch, allowing seamless generation of the entire image thanks to the overlap in the patch feature space. Patch-DM produces high-quality image synthesis results on our newly collected dataset of nature images (1024×512), as well as on the standard benchmarks LHQ (1024×1024) and FFHQ (1024×1024) and on smaller datasets (256×256), including LSUN-Bedroom, LSUN-Church, and FFHQ. We compare our method with previous patch-based generation methods and achieve state-of-the-art FID scores on all six datasets. Further, Patch-DM also reduces memory complexity compared to classic diffusion models.

Method Overview
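
To make the feature collage idea from the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes patch features are stored as a grid of shape (rows, cols, C, h, w) and that the shifted patch is offset by half a patch both down and to the right; the function names `tile_features` and `collage_shifted` are hypothetical. The shifted patch's features are assembled from the adjacent quadrants of its four neighbors, which is equivalent to cropping a shifted window from the stitched feature map.

```python
import numpy as np


def tile_features(feats):
    """Stitch a (rows, cols, C, h, w) grid of patch features into one (C, rows*h, cols*w) map."""
    rows, cols, C, h, w = feats.shape
    return feats.transpose(2, 0, 3, 1, 4).reshape(C, rows * h, cols * w)


def collage_shifted(feats, i, j):
    """Collage features for the patch shifted half a patch down and right of patch (i, j).

    Assumption (illustrative only): a half-patch shift, so the shifted patch
    overlaps exactly four neighbors and takes one quadrant from each.
    """
    rows, cols, C, h, w = feats.shape
    assert 0 <= i < rows - 1 and 0 <= j < cols - 1, "shifted patch must have four neighbors"
    hh, hw = h // 2, w // 2
    # Top half: bottom-right quadrant of (i, j) next to bottom-left quadrant of (i, j+1).
    top = np.concatenate(
        [feats[i, j, :, hh:, hw:], feats[i, j + 1, :, hh:, :hw]], axis=-1
    )
    # Bottom half: top-right quadrant of (i+1, j) next to top-left quadrant of (i+1, j+1).
    bottom = np.concatenate(
        [feats[i + 1, j, :, :hh, hw:], feats[i + 1, j + 1, :, :hh, :hw]], axis=-1
    )
    return np.concatenate([top, bottom], axis=-2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(2, 3, 4, 8, 8))  # 2x3 grid of patches, 4 channels, 8x8 features
    shifted = collage_shifted(feats, 0, 1)
    full = tile_features(feats)
    # The collage matches a half-patch-shifted crop of the stitched feature map.
    assert np.array_equal(shifted, full[:, 4:12, 12:20])
    print(shifted.shape)
```

Because every shifted patch shares features with its four neighbors, predictions made from these collaged features overlap across patch borders, which is the property the abstract credits for seamless generation. How the model handles patches at the image boundary is not covered by this sketch.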

More Results on High-Resolution Generation on Nature Dataset

BibTeX

@inproceedings{ding2024patched,
      title={Patched Denoising Diffusion Models For High-Resolution Image Synthesis},
      author={Zheng Ding and Mengqi Zhang and Jiajun Wu and Zhuowen Tu},
      booktitle={The Twelfth International Conference on Learning Representations},
      year={2024}
}