
Zero-shot spatial layout conditioning for text-to-image diffusion models

  • Guillaume Couairon
  • Marlène Careil
  • Matthieu Cord
  • Stéphane Lathuilière
  • Jakob Verbeek
  • Meta AI
  • Sorbonne Université
  • Telecom Paris

Research output: Chapter in a book, report, anthology, or collection; Conference contribution; Peer-reviewed

Abstract

Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modeling and allow for an intuitive and powerful user interface to drive the image generation process. Expressing spatial constraints, e.g. to position specific objects in particular locations, is cumbersome using text, and current text-based image generation models are not able to accurately follow such instructions. In this paper we consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content. We propose ZestGuide, a "zero-shot" segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models and does not require any additional training. It leverages implicit segmentation maps that can be extracted from cross-attention layers, and uses them to align the generation with input masks. Our experimental results combine high image quality with accurate alignment of generated content with input segmentations, and improve over prior work both quantitatively and qualitatively, including methods that require training on images with corresponding segmentations. Compared to Paint with Words, the previous state of the art in image generation with zero-shot segmentation conditioning, we improve by 5 to 10 mIoU points on the COCO dataset with similar FID scores.
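
The mIoU figure quoted in the abstract refers to mean intersection-over-union, a standard segmentation metric. The following is a minimal sketch of that metric only, not the paper's evaluation code: the function name `mean_iou`, the choice to average over every class present in either map, and the toy label maps are all illustrative assumptions, and the record does not describe how the segmentation of a generated image is obtained.

```python
def mean_iou(target, predicted, ignore_label=None):
    """Mean intersection-over-union between two label maps of equal shape.

    Both arguments are lists of rows of integer class labels. The average
    runs over every class present in either map (an assumption; other
    protocols average only over the classes in the target layout).
    """
    intersection = {}
    union = {}
    for target_row, predicted_row in zip(target, predicted):
        for t, p in zip(target_row, predicted_row):
            if t == ignore_label:
                continue  # skip pixels that carry no target label
            if t == p:
                # A matching pixel counts once in both intersection and union.
                intersection[t] = intersection.get(t, 0) + 1
                union[t] = union.get(t, 0) + 1
            else:
                # A mismatched pixel enlarges the union of both classes.
                union[t] = union.get(t, 0) + 1
                union[p] = union.get(p, 0) + 1
    if not union:
        return 0.0
    ious = [intersection.get(c, 0) / union[c] for c in union]
    return sum(ious) / len(ious)


# Toy example: a two-class input layout and a hypothetical segmentation
# of the generated image that mislabels one pixel.
layout = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
segmentation = [[0, 0, 0, 1],
                [0, 0, 1, 1]]
score = mean_iou(layout, segmentation)  # class 0: 4/5, class 1: 3/4, mean 0.775
```

On this scale, an improvement of "5 to 10 mIoU points" corresponds to raising the score by 0.05 to 0.10 when it is expressed as a fraction.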

Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2174-2183
Number of pages: 10
ISBN (Electronic): 9798350307184
DOIs
Publication status: Published - 1 Jan 2023
Event: 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Paris, France
Duration: 2 Oct 2023 to 6 Oct 2023

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print): 1550-5499

Conference

Conference: 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Country/Territory: France
City: Paris
Period: 2 Oct 2023 to 6 Oct 2023
