Learning Sequential Modeling Algorithms for Motion Forecasting, Reinforcement Learning, and Vision-Language Understanding

Sequential modeling algorithms have made significant strides in a variety of domains, facilitating intelligent decision-making and planning in complex scenarios. This dissertation explores the potential and limitations of these algorithms, unveiling novel approaches to enhance their performance across diverse fields, from autonomous driving and trajectory forecasting to reinforcement learning and vision-language understanding.

Overview

Abstract

Sequential modeling algorithms have made significant strides in a variety of domains, facilitating intelligent decision-making and planning in complex scenarios. This dissertation explores the potential and limitations of these algorithms, unveiling novel approaches to enhance their performance across diverse fields, from autonomous driving and trajectory forecasting to reinforcement learning and vision-language understanding. This work first introduces HalentNet, a model that improves motion forecasting in autonomous driving through data augmentation, generating new driving trajectories by mixing driving modes. Additionally, we propose a new training objective, unlikelihood training, to mitigate the tendency of state-of-the-art models to assign high probabilities to unlikely sequences, thereby enhancing the safety and performance of autonomous driving applications. The dissertation then delves into the realm of reinforcement learning, presenting the Value Memory Graph (VMG), a discrete world model that abstracts complex environments for improved planning of future action sequences. The Action-Free Guide (AF-Guide) is also explored, demonstrating its efficacy in improving online reinforcement learning by using action-free offline sequences to build informative reward signals. Lastly, the dissertation examines the emergence of advanced multi-modal abilities in GPT-4, the most powerful sequence model at the time of this dissertation, and introduces MiniGPT-4, an open-source framework that possesses similar vision-language abilities, shedding light on the intricate relationship between large language models and enhanced multi-modal generation capabilities. Through comprehensive experiments and analysis, this dissertation contributes to the advancement of sequential modeling algorithms, paving the way for their widespread application and further development across various domains.

Brief Biography

Deyao Zhu is a PhD candidate specializing in artificial intelligence, with a keen focus on sequence models. His research explores artificial intelligence methodologies designed to interpret and process sequential data across multiple domains, including predicting vehicle trajectories in autonomous driving, optimizing action sequences for AI agents such as robots, and utilizing large language models to describe images in natural language. Through his work, Deyao aims to deepen our understanding and enhance the capabilities of sequence models across a range of applications, contributing to the advancement of the field of artificial intelligence.

Presenters