Machine perception, the ability to understand the visual world based on input from sensors such as cameras, is one of the central problems in Artificial Intelligence. Recent years have witnessed tremendous progress in various instance-level recognition tasks with real-world applications in, e.g., robotics, autonomous driving, and surveillance. In this talk, I will first present our recent results towards understanding state-of-the-art deep learning-based visual recognition networks in terms of their robustness and generalizability. Next, I will present our results on learning visual recognition models with limited human supervision. Finally, I will discuss moving one step beyond instance-level recognition to understanding visual relationships between object pairs.
Fahad Khan is currently a faculty member at MBZUAI, UAE, and Linköping University, Sweden. He received an M.Sc. degree in Intelligent Systems Design from the Chalmers University of Technology, Sweden, and a Ph.D. degree in Computer Vision from the Computer Vision Center Barcelona and the Autonomous University of Barcelona, Spain. From 2012 to 2014, he was a postdoctoral fellow and then a research fellow (2014-2018) at the Computer Vision Laboratory, Linköping University, Sweden. From 2018 to 2020, he was a lead scientist at the Inception Institute of AI, UAE. He has achieved top ranks in various international challenges (Visual Object Tracking VOT: 1st 2014 and 2018, 2nd 2015, 1st 2016; VOT-TIR: 1st 2015 and 2016; OpenCV Tracking: 1st 2015; PASCAL VOC Segmentation and Action Recognition tasks: 1st 2010). He received the best paper award in the computer vision track at IEEE ICPR 2016. He has published over 120 reviewed conference papers, journal articles, and book contributions. He serves as a senior program committee member of several prestigious conferences such as CVPR, AAAI, and ACM Multimedia, and as an associate/guest editor of journals such as IEEE TPAMI and IEEE TNNLS. His research interests span a wide range of topics within computer vision, including object recognition, detection, segmentation, tracking, and action recognition.