Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving

Tongji University
*Indicates Equal Contribution
ICRA 2025

Abstract

Vehicle-to-everything (V2X) technologies have become an ideal paradigm to extend the perception range and see through occlusion. Existing efforts focus on single-frame cooperative perception; however, how to capture the temporal cues between frames with V2X to facilitate the prediction task, and even the planning task, is still underexplored. In this paper, we introduce Co-MTP, a general cooperative trajectory prediction framework with multi-temporal fusion for autonomous driving, which leverages the V2X system to fully capture the interactions among agents in both the history and future domains to benefit planning. In the history domain, V2X can complement the incomplete history trajectories in single-vehicle perception, and we design a heterogeneous graph transformer to learn the fusion of history features from multiple agents and capture the history interactions. Moreover, the goal of prediction is to support future planning. Thus, in the future domain, V2X can provide the prediction results of surrounding objects, and we further extend the graph transformer to capture the future interactions between the ego planning and the other vehicles' intentions, obtaining the final future scenario state under a certain planning action. We evaluate the Co-MTP framework on the real-world dataset V2X-Seq, and the results show that Co-MTP achieves state-of-the-art performance and that both history and future fusion greatly benefit prediction.

Co-MTP Framework


The overall architecture of Co-MTP. In this framework, the infrastructure shares its history observations and prediction results with the ego CAV. Then, we construct a heterogeneous scene graph from the processed trajectory data and map information, categorizing elements according to the types of objects and map features. Next, we initialize the features of nodes and edges in the relative coordinate system of each object. The CTCA Fusion module updates the features of the nodes and edges selected by the STSA module over K Transformer layers. Finally, we take the nodes' hidden features from the last layer and feed them into the Multimodal Decoder to obtain the multimodal trajectory prediction results.
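To make the fusion step above concrete, here is a toy sketch of one heterogeneous-graph attention update in pure Python. It is an illustration only, not the paper's implementation: the node names, feature sizes, residual connection, and two-layer stacking are all our own simplifying assumptions, and the real CTCA/STSA modules involve learned projections and selection that are omitted here.

```python
# Toy heterogeneous scene-graph fusion sketch (hypothetical, not Co-MTP's code):
# each node attends over its in-neighbors, so the ego node can absorb features
# from an infrastructure-observed vehicle and a lane element.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def attention_update(node_feats, edges, d=4):
    """One fusion layer over the scene graph.

    node_feats: {node_id: feature vector of length d}
    edges: list of (src, dst) pairs
    """
    new_feats = {}
    for dst, q in node_feats.items():
        nbrs = [src for src, d2 in edges if d2 == dst]
        if not nbrs:
            new_feats[dst] = q[:]  # isolated node keeps its feature
            continue
        scores = softmax([dot(q, node_feats[s]) / math.sqrt(d) for s in nbrs])
        agg = [sum(w * node_feats[s][i] for w, s in zip(scores, nbrs))
               for i in range(d)]
        # residual connection keeps the node's own history feature
        new_feats[dst] = [qi + ai for qi, ai in zip(q, agg)]
    return new_feats

# toy scene: ego CAV, one infrastructure-observed vehicle, one lane node
feats = {"ego": [1.0, 0.0, 0.0, 0.0],
         "veh": [0.0, 1.0, 0.0, 0.0],
         "lane": [0.0, 0.0, 1.0, 0.0]}
edges = [("veh", "ego"), ("lane", "ego"), ("ego", "veh"), ("lane", "veh")]

for _ in range(2):  # K = 2 stacked fusion layers, as a stand-in for the paper's K
    feats = attention_update(feats, edges)
```

After the two layers, the ego node's feature mixes in information from both the cooperatively observed vehicle and the map element, which is the intuition behind fusing multi-view history in one graph.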

Experiment


Performance comparison on the V2X-Seq dataset. TNT, HiVT, and V2X-Graph are existing methods evaluated on V2X-Seq. Co-HTTP is the baseline model, simplified from our Co-MTP model. The Co-MTP framework ranks first in minADE, minFDE, and MR on the dataset's benchmark.
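For readers unfamiliar with the benchmark metrics, the following is a hedged, pure-Python illustration of how minADE, minFDE, and miss rate (MR) are commonly computed over K predicted modes; the 2 m miss threshold is a common convention in trajectory-prediction benchmarks and an assumption here, not a detail taken from this paper.

```python
# Illustrative metric definitions (common conventions, not Co-MTP's exact code).
import math

def displacement(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def min_ade(modes, gt):
    """Lowest average displacement error over the K predicted modes."""
    return min(sum(displacement(p, q) for p, q in zip(m, gt)) / len(gt)
               for m in modes)

def min_fde(modes, gt):
    """Lowest final-point displacement error over the K predicted modes."""
    return min(displacement(m[-1], gt[-1]) for m in modes)

def miss_rate(cases, threshold=2.0):
    """Fraction of cases whose best mode ends more than `threshold`
    metres from the ground-truth endpoint (assumed 2 m threshold)."""
    misses = sum(1 for modes, gt in cases if min_fde(modes, gt) > threshold)
    return misses / len(cases)

# toy example: one agent, two predicted modes, three future waypoints
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
modes = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)],   # mode that drifts away
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],   # mode close to ground truth
]
```

With these toy numbers, the second mode dominates: minADE ≈ 0.167 m, minFDE = 0.5 m, and the case is not a miss under the 2 m threshold.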



Results of the model ablation study. We examine the effectiveness of the multi-view data processing strategies and the decoder, assessing Co-MTP variations separately along the history and future time dimensions.



Robustness assessment. We conduct robustness assessments by introducing noise and communication delays, assuming a positional deviation of 0.2 meters and a time delay of 0.5 seconds. We design experiments using the same Co-MTP model base, alongside two variants: Co-MTP-no fusion, which excludes the future fusion, and Co-HTTP-nofut, which simply stitches the trajectories without future information.
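The perturbation protocol described above can be sketched as follows, under our own simplifying assumptions: i.i.d. zero-mean Gaussian positional noise with sigma = 0.2 m, and a 0.5 s communication latency modeled by dropping the most recent shared frames at an assumed 10 Hz frame rate. The paper's exact noise and delay models may differ.

```python
# Hypothetical sketch of the robustness perturbations (assumed models, not
# necessarily the paper's exact protocol).
import random

random.seed(42)

def add_position_noise(traj, sigma=0.2):
    """Perturb each (x, y) waypoint with zero-mean Gaussian noise (sigma in m)."""
    return [(x + random.gauss(0.0, sigma), y + random.gauss(0.0, sigma))
            for x, y in traj]

def apply_delay(traj, delay_s=0.5, dt=0.1):
    """Simulate communication latency: the ego only receives shared frames
    up to `delay_s` seconds old, so the most recent frames are dropped."""
    k = int(delay_s / dt)
    return traj[:-k] if k > 0 else traj

shared = [(0.1 * i, 0.0) for i in range(20)]   # 2 s of 10 Hz shared history
delayed = apply_delay(shared)                  # loses the last 5 frames
noisy = add_position_noise(delayed)            # perturbed positions
```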


(a)


(b)


(c)


(d)

Qualitative examples of Co-MTP on the V2X-Seq dataset. The red boxes are the AV, while the orange ones are the predicted targets. The history ground truth is shown in blue, the predicted trajectories in green, and the future ground truth in brown.

BibTeX

@misc{zhang2025comtpcooperativetrajectoryprediction,
      title={Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving}, 
      author={Xinyu Zhang and Zewei Zhou and Zhaoyi Wang and Yangjie Ji and Yanjun Huang and Hong Chen},
      year={2025},
      eprint={2502.16589},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.16589}, 
}