Click to Move: Controlling Video Generation with Sparse Motion

  • Pierfrancesco Ardino
  • , Marco De Nadai
  • , Bruno Lepri
  • , Elisa Ricci
  • , Stéphane Lathuilière

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

This paper introduces Click to Move (C2M), a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks specifying simple object trajectories of the key objects in the scene. Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user. It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input. Notably, our proposed deep architecture incorporates a Graph Convolution Network (GCN) modelling the movements of all the objects in the scene in a holistic manner and effectively combining the sparse user motion information and image features. Experimental results show that C2M outperforms existing methods on two publicly available datasets, thus demonstrating the effectiveness of our GCN framework at modelling object interactions. The source code is publicly available at https://github.com/PierfrancescoArdino/C2M.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages14729-14738
Number of pages10
ISBN (Electronic)9781665428125
DOIs
Publication statusPublished - 1 Jan 2021
Event18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: 11 Oct 202117 Oct 2021

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
ISSN (Print)1550-5499

Conference

Conference18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Country/TerritoryCanada
CityVirtual, Online
Period11/10/2117/10/21

Fingerprint

Dive into the research topics of 'Click to Move: Controlling Video Generation with Sparse Motion'. Together they form a unique fingerprint.

Cite this