TY - GEN
T1 - Learning Co-segmentation by Segment Swapping for Retrieval and Discovery
AU - Shen, Xi
AU - Efros, Alexei A.
AU - Joulin, Armand
AU - Aubry, Mathieu
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - The goal of this work is to efficiently identify visually similar patterns in images, e.g. identifying an artwork detail copied between an engraving and an oil painting, or recognizing parts a night-time photograph visible in its daytime counterpart. Lack of training data is a key challenge for this co-segmentation task. We present a simple yet surprisingly effective approach to overcome this difficulty: we generate synthetic training pairs by selecting segments in an image and copy-pasting them into another image. We then learn to predict the repeated region masks. We find that it is crucial to predict the correspondences as an auxiliary task and to use Poisson blending and style transfer on the training pairs to generalize on real data.We analyse results with two deep architectures relevant to our joint image analysis task: a transformer-based architecture and Sparse Nc-Net, a recent network designed to predict coarse correspondences using 4D convolutions. We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset and achieves competitive performance on two place recognition benchmarks, Tokyo247 and Pitts30K. We also demonstrate the potential of our approach for unsupervised image collection analysis by introducing a spectral graph clustering approach to object discovery and demonstrating it on the object discovery dataset of [49] and the Brueghel dataset. Our code and data are available at http://imagine.enpc.fr/~shenx/SegSwap/.
AB - The goal of this work is to efficiently identify visually similar patterns in images, e.g. identifying an artwork detail copied between an engraving and an oil painting, or recognizing parts a night-time photograph visible in its daytime counterpart. Lack of training data is a key challenge for this co-segmentation task. We present a simple yet surprisingly effective approach to overcome this difficulty: we generate synthetic training pairs by selecting segments in an image and copy-pasting them into another image. We then learn to predict the repeated region masks. We find that it is crucial to predict the correspondences as an auxiliary task and to use Poisson blending and style transfer on the training pairs to generalize on real data.We analyse results with two deep architectures relevant to our joint image analysis task: a transformer-based architecture and Sparse Nc-Net, a recent network designed to predict coarse correspondences using 4D convolutions. We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset and achieves competitive performance on two place recognition benchmarks, Tokyo247 and Pitts30K. We also demonstrate the potential of our approach for unsupervised image collection analysis by introducing a spectral graph clustering approach to object discovery and demonstrating it on the object discovery dataset of [49] and the Brueghel dataset. Our code and data are available at http://imagine.enpc.fr/~shenx/SegSwap/.
UR - https://www.scopus.com/pages/publications/85137773871
U2 - 10.1109/CVPRW56347.2022.00556
DO - 10.1109/CVPRW56347.2022.00556
M3 - Conference contribution
AN - SCOPUS:85137773871
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 5078
EP - 5088
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
PB - IEEE Computer Society
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
Y2 - 19 June 2022 through 20 June 2022
ER -