Abstract
The recognition of human behavior has driven most advances in video analysis over the past decade. However, these advances have focused exclusively on the visual stream and have targeted the recognition of human actions. Recently, this paradigm has shifted to account for the inherently multimodal nature of video, which contains audio and visual data, and whose audio stream can be automatically mapped into a text description. In this talk, we discuss some recent advances in video analysis developed at the Image and Video Understanding Lab (IVUL). Our focus is the integration of multimodal data (visual, audio, and text) for video analysis. We will present work in audiovisual detection (automatically matching audio events to visual sources) and continual learning (avoiding network retraining in the presence of novel data), with application to video data.
Brief Biography
Juan Carlos L. Alcazar is a Postdoctoral Research Fellow at the Visual Computing Center (VCC) at King Abdullah University of Science and Technology (KAUST), working with Professor Bernard Ghanem at the Image and Video Understanding Laboratory (IVUL). He holds a Ph.D. in Computer Science from the Universidad de los Andes and an M.Sc. from the Universidad Nacional de Colombia. He completed his Ph.D. dissertation on applied computer vision under the supervision of Prof. Pablo Arbeláez at the Biomedical Computer Vision Group. Before joining KAUST, he was a research assistant at the Universidad de los Andes in Bogotá, Colombia. His main research interest is the analysis of contextual information for "in the wild" video understanding.