MULTIMODAL SEMANTIC-AWARE AUTOMATIC COLORIZATION WITH DIFFUSION PRIOR

Shanghai Jiao Tong University
*Corresponding Author

Abstract

Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results because of semantically incorrect or unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize colors with plausible semantics. To suppress the artifacts introduced by the diffusion prior, we apply luminance conditional guidance. Moreover, we adopt multimodal high-level semantic priors to help the model understand the image content and deliver saturated colors. In addition, a luminance-aware decoder is designed to restore details and enhance overall visual quality. The proposed pipeline synthesizes saturated colors while maintaining plausible semantics. Experiments indicate that our method balances diversity and fidelity, surpassing previous methods in perceptual realism and gaining the most human preference.

Qualitative

Qualitative comparisons among (1) GAN-based methods: InstColor, ChromaGAN, BigColor; (2) Transformer-based methods: ColTran, CT2; and (3) diffusion-based methods: ControlNet and ours.

User evaluations. To reflect human preferences, we randomly select 15 images from the COCO-Stuff val set for a user study. For each image, the 7 results and the ground truth are displayed to the user in random order. Our method has a vote rate of 22.59%, which significantly outperforms other methods.

BibTeX

@misc{wang2024multimodal,
  title={Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior},
  author={Han Wang and Xinning Chai and Yiwen Wang and Yuhong Zhang and Rong Xie and Li Song},
  year={2024},
  eprint={2404.16678},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Showcases produced by our pipeline.

Framework

Overview of the proposed automatic colorization pipeline. It combines a semantic prior generator (blue box), a high-level semantic guided diffusion model (yellow box), and a luminance-aware decoder (orange box).
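
The data flow between the three boxes can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: every function name, the contents of the prior dictionary, and the seed argument are hypothetical. The diffusion stage is replaced by a random RGB proposal, and the decoder is replaced by a simple luma-matching shift using the standard BT.601 weights, a swapped-in stand-in for the learned luminance-aware decoder.

```python
import numpy as np

# BT.601 luma weights (a standard definition, assumed here; not taken from the paper).
LUMA = np.array([0.299, 0.587, 0.114])


def semantic_prior_generator(gray):
    """Hypothetical stand-in: returns placeholder high-level priors for the image."""
    return {"text": "placeholder description", "spatial": np.ones(gray.shape, dtype=bool)}


def guided_diffusion_model(gray, priors, rng):
    """Hypothetical stand-in for the diffusion stage: a random RGB proposal in [0, 1].

    A real model would condition on both the grayscale input and the priors.
    """
    return rng.random(gray.shape + (3,))


def luminance_aware_decoder(proposal, gray):
    """Stand-in decoder: shift each pixel so its luma equals the grayscale input.

    Adding the same offset to all three channels changes the luma by exactly
    that offset, because the luma weights sum to 1. Clipping to [0, 1] can
    break the match for pixels that leave the valid range.
    """
    luma = proposal @ LUMA
    shifted = proposal + (gray - luma)[..., None]
    return np.clip(shifted, 0.0, 1.0)


def colorize(gray, seed=0):
    """Chain the three stages: priors -> guided generation -> decoding."""
    rng = np.random.default_rng(seed)
    priors = semantic_prior_generator(gray)
    proposal = guided_diffusion_model(gray, priors, rng)
    return luminance_aware_decoder(proposal, gray)


if __name__ == "__main__":
    gray = np.full((4, 4), 0.5)  # a flat mid-gray test image
    rgb = colorize(gray)
    print(rgb.shape)
```

The point of the sketch is the interface: the grayscale input is passed both to the generation stage and to the decoder, so luminance from the input is available when the final image is produced.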

Evaluations

Quantitative

Ablation

Discussion on parameter i.
