Tuesday, September 03, 2024, 14:00 - 17:00
Building 2, Level 5, Room 5209
Deep Neural Networks (DNNs) have demonstrated exceptional performance in various fields, but their performance often degrades under domain shifts, which are common in real-world applications. This thesis addresses this challenge by exploring and improving robustness approaches.
Thursday, May 04, 2023, 07:30 - 09:00
KAUST
The growth of digital cameras and data communication has led to an exponential increase in video production and dissemination. As a result, automatic video analysis and understanding have become a crucial research topic in the computer vision community. However, localization, the problem of identifying a specific event in a large volume of data, remains a significant challenge, particularly in long-form videos.
Monday, April 10, 2023, 17:00 - 19:00
Building 3, Level 5, Room 5220
Deep Neural Networks (DNNs) have been highly successful over the years in solving many 2D computer vision tasks, driven by massive labeled 2D datasets and advancements in 2D vision models, but they have seen less success on 3D vision tasks. This dissertation proposes innovative approaches to enhance the robustness of DNNs for 3D understanding and in 3D settings. The research focuses on two main areas: adversarial robustness on 3D data and setups, and the robustness of DNNs to realistic 3D scenarios. Two paradigms for 3D understanding are discussed: representing 3D data as a set of 3D points, and performing 2D processing of multiple images of the 3D data.
Monday, January 23, 2023, 18:30 - 20:30
Building 2, Level 5, Room 5209
With video data dominating internet traffic, it is crucial to develop automated models that can analyze and understand what humans do in videos. Such models must solve tasks such as action classification, temporal activity localization, spatiotemporal action detection, and video captioning. This dissertation aims to identify the challenges hindering progress in human action understanding and to propose novel solutions to overcome them.
Monday, November 30, 2020, 12:00 - 13:00
KAUST
In this talk, I will give an overview of research done in the Image and Video Understanding Lab (IVUL) at KAUST. At IVUL, we work on topics that are important to the computer vision (CV) and machine learning (ML) communities, with emphasis on three research themes: Theme 1 (Video Understanding), Theme 2 (Visual Computing for Automated Navigation), Theme 3 (Fundamentals/Foundations).
Thursday, May 28, 2020, 16:00 - 18:00
KAUST
One of the main goals in computer vision is to achieve a human-like understanding of images. This understanding has recently been represented in various forms, including image classification, object detection, and semantic segmentation, among many others. Nevertheless, image understanding has mainly been studied in the 2D image frame, so more information is needed to relate it to the 3D world. With the emergence of 3D sensors (e.g. the Microsoft Kinect), which provide depth along with color information, the task of propagating 2D knowledge into 3D becomes more attainable and enables interaction between a machine (e.g. a robot) and its environment. This dissertation focuses on three aspects of indoor 3D scene understanding: (1) 2D-driven 3D object detection for single-frame scenes with inherent 2D information, (2) 3D object instance segmentation for 3D reconstructed scenes, and (3) using room and floor orientation for automatic labeling of indoor scenes that could be used for self-supervised object segmentation. These methods allow capturing the physical extents of 3D objects, such as their sizes and actual locations within a scene.
Monday, March 30, 2020, 18:00 - 20:00
KAUST
In this dissertation, we aim to study and analyze deep learning models theoretically. Since deep models vary substantially in their shapes and sizes, we restrict our work to a single fundamental block of layers that is common to almost all architectures. The block of interest is the composition of an affine layer, followed by a nonlinear activation function, followed by another affine layer. We study this block from three different perspectives. (i) An Optimization Perspective. We address the following question: Is it possible that the output of the forward pass through the block of layers described above is an optimal solution to a certain convex optimization problem? We show an equivalence between the forward pass through this block and a single iteration of certain types of deterministic and stochastic algorithms solving a particular class of tensor-formulated convex optimization problems.
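For readers unfamiliar with the block described in this abstract, the following is a minimal sketch of an affine layer, a nonlinear activation, and a second affine layer composed in sequence. It is an illustration only, not the dissertation's formulation: the choice of ReLU as the activation, the function names, and the example weights are all assumptions made here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weights @ x + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Elementwise max(0, v). ReLU is an assumed choice of activation."""
    return [max(0.0, v) for v in x]


def block_forward(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector,
                  x: Vector) -> Vector:
    """Forward pass: affine -> nonlinear activation -> affine."""
    hidden = relu(affine(w1, b1, x))
    return affine(w2, b2, hidden)


# Hypothetical weights, chosen only so the arithmetic is easy to follow.
W1 = [[1.0, -1.0], [0.5, 2.0]]
B1 = [0.0, -1.0]
W2 = [[1.0, 1.0]]
B2 = [0.5]

output = block_forward(W1, B1, W2, B2, [2.0, 1.0])
```

With these example values, the first affine layer maps the input to [1.0, 2.0], the activation leaves it unchanged, and the second affine layer returns [3.5].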
Tuesday, May 14, 2019, 16:00 - 17:00
B2 L5 Room 5220
This work investigates the problem of transfer from simulation to the real world in the context of autonomous navigation. To this end, we first present Sim4CV, a photo-realistic training and evaluation simulator that enables several applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator features cars and unmanned aerial vehicles (UAVs) with a realistic physics simulation and diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning.