Valeo4Cast: A Modular Approach to End-to-End Forecasting

  • Yihong Xu
  • Éloi Zablocki
  • Alexandre Boulch
  • Gilles Puy
  • Mickael Chen
  • Florent Bartoccioni
  • Nermin Samet
  • Oriane Siméoni
  • Spyros Gidaris
  • Tuan Hung Vu
  • Andrei Bursuc
  • Eduardo Valle
  • Renaud Marlet
  • Matthieu Cord

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect and track the different elements of the scene from sensor data (cameras or LiDARs), recover their past trajectories, and predict their future locations. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting, and instead use a modular approach. We individually build and train detection, tracking and forecasting modules. We then use only consecutive finetuning steps to better integrate the modules and alleviate compounding errors. Our in-depth study of the finetuning strategies reveals that this simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Consequently, our solution ranks first in the Argoverse 2 End-to-end Forecasting Challenge, with 63.82 mAPf. In forecasting, we surpass last year’s winner by +17.1 points and this year’s runner-up by +13.3 points. This performance can be explained by our modular paradigm, which integrates finetuning strategies and significantly outperforms the end-to-end-trained counterparts.

Original language: English
Title of host publication: Computer Vision – ECCV 2024 Workshops, Proceedings
Editors: Alessio Del Bue, Cristian Canton, Jordi Pont-Tuset, Tatiana Tommasi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 1-14
Number of pages: 14
ISBN (Print): 9783031917660
Publication status: Published - 1 Jan 2025
Externally published: Yes
Event: Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sept 2024 → 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15629 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 → 4/10/24

Keywords

  • End-to-end motion forecasting
  • Finetuning
  • Modular approach
