Advancing Monocular Depth Estimation: Novel Formulations and Applications

PhD Dissertation Defense

Event Start

2024-07-17 - 12:00

Event End

2024-07-17 - 14:00

Location

Building 1, Level 3, Room 3119

Shariq Farooq Bhat

PhD Student, Computer Science

Abstract

Monocular depth estimation, the task of inferring depth from a single RGB image, is a fundamental yet challenging problem in computer vision because it is inherently ill-posed. This dissertation presents a series of approaches that significantly advance the state of the art in depth estimation. First, we discuss our novel formulation of depth estimation, the Adaptive Bins framework, which introduces depth bins that adjust dynamically to the depth distribution of each input scene and expresses depth as a weighted linear combination of bin centers. The Adaptive Bins framework has advanced the state of the art in depth estimation and serves as a foundation for ongoing research in the field. We then explore learning depth distributions at the level of local neighborhoods, which captures finer-grained depth variations within an image, and propose a multi-scale bin adjustment mechanism. Furthermore, we explore zero-shot generalization in depth estimation, which is particularly poor in existing models. By combining relative and metric depth estimation and proposing a superior bin adjustment mechanism, we design a framework that generalizes effectively across diverse datasets while maintaining metric depth accuracy. As a result, we achieve unprecedented zero-shot generalization performance on multiple unseen datasets, improving on the state of the art by up to 11x. Finally, we introduce LooseControl, a novel application of depth estimation to depth-conditioned image generation using diffusion models. LooseControl enables flexible content creation by allowing users to specify scene boundaries and object locations, opening new possibilities for creating complex environments with minimal effort.
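
As a rough illustration of the phrase "a weighted linear combination of bin centers", the following minimal sketch computes per-pixel depth from a set of bin widths and per-pixel scores. It is not the dissertation's implementation. The function names, the depth range arguments d_min and d_max, the use of a softmax to obtain the weights, and the toy inputs are all assumptions made for this example.

```python
import numpy as np

def bin_centers(widths, d_min, d_max):
    """Turn relative bin widths into bin centers inside [d_min, d_max].

    The widths stand in for a per-image prediction (hypothetical input);
    they are normalized so that the bins exactly tile the depth range.
    """
    widths = np.asarray(widths, dtype=float)
    widths = widths / widths.sum()
    # Bin edges are cumulative sums of the widths, scaled to the depth range.
    edges = d_min + (d_max - d_min) * np.concatenate(([0.0], np.cumsum(widths)))
    return 0.5 * (edges[:-1] + edges[1:])

def depth_from_bins(logits, centers):
    """Per-pixel depth as a probability-weighted sum of bin centers.

    logits:  array of shape (N, H, W), one score per bin and pixel.
    centers: array of shape (N,), one center per bin.
    """
    # Numerically stable softmax over the bin axis (an assumed choice).
    shifted = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)
    # Contract the bin axis: result has shape (H, W).
    return np.tensordot(centers, probs, axes=(0, 0))

# Toy usage: three bins on a 0 to 8 depth range, a 2x2 "image".
centers = bin_centers([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
uniform_depth = depth_from_bins(np.zeros((3, 2, 2)), centers)
```

Because the widths are re-estimated for every image in this sketch, the bin centers move with the scene while the weighted sum stays the same, which is the sense in which the bins are adaptive.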

Brief Biography

Shariq Farooq Bhat obtained his BS degree in Electronics & Communication Engineering from the National Institute of Technology Srinagar, India, in 2018. He worked as a Machine Learning Engineer at Harman International, Samsung, until joining KAUST in Spring 2020, where he obtained his MS degree in 2021. He is currently a CS PhD student working under the guidance of Prof. Peter Wonka. During his PhD, he worked as a Research Intern at Intel, Germany, in 2022 and as a Visiting Researcher at UCL, London, in 2023.