TY - GEN
T1 - Playable Environments
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
AU - Menapace, Willi
AU - Lathuiliere, Stephane
AU - Siarohin, Aliaksandr
AU - Theobalt, Christian
AU - Tulyakov, Sergey
AU - Golyanik, Vladislav
AU - Ricci, Elisa
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - We present Playable Environments-a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learnt in an unsupervised manner. The camera can be controlled to get the desired viewpoint. Our method builds an environment state for each frame, which can be manipulated by our proposed action mod-ule and decoded back to the image space with volumetric rendering. To support diverse appearances of objects, we extend neural radiance fields with style-based modulation. Our method trains on a collection of various monocular videos requiring only the estimated camera parameters and 2D object locations. To set a challenging benchmark, we in-troduce two large scale video datasets with significant cam-era movements. As evidenced by our experiments, playable environments enable several creative applications not at-tainable by prior video synthesis works, including playable 3D video generation, stylization and manipulation11willi-menapace.github.io/playable-environments-website.
AB - We present Playable Environments-a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learnt in an unsupervised manner. The camera can be controlled to get the desired viewpoint. Our method builds an environment state for each frame, which can be manipulated by our proposed action mod-ule and decoded back to the image space with volumetric rendering. To support diverse appearances of objects, we extend neural radiance fields with style-based modulation. Our method trains on a collection of various monocular videos requiring only the estimated camera parameters and 2D object locations. To set a challenging benchmark, we in-troduce two large scale video datasets with significant cam-era movements. As evidenced by our experiments, playable environments enable several creative applications not at-tainable by prior video synthesis works, including playable 3D video generation, stylization and manipulation11willi-menapace.github.io/playable-environments-website.
KW - Image and video synthesis and generation
KW - Machine learning
KW - Self-& semi-& meta- & unsupervised learning
U2 - 10.1109/CVPR52688.2022.00357
DO - 10.1109/CVPR52688.2022.00357
M3 - Conference contribution
AN - SCOPUS:85141633330
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 3574
EP - 3583
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE Computer Society
Y2 - 19 June 2022 through 24 June 2022
ER -