
Bridging 2D and 3D for Label-Efficient Learning in Scene Understanding
This talk presents a series of methods that bridge 2D and 3D data representations for robust, label-efficient scene understanding: unsupervised semantic and instance segmentation, open-vocabulary 3D scene graph generation, and RelationField, an extension of neural radiance fields that implicitly captures open-vocabulary object relationships.
Overview
Understanding complex scenes from limited or unlabeled data is a key challenge in modern deep learning. This talk explores recent advances in bridging the semantic and geometric gap between 2D and 3D data representations to achieve more efficient and robust scene understanding. First, I will introduce an approach to unsupervised semantic segmentation that leverages depth-guided feature correlations and 3D-informed sampling techniques, significantly improving segmentation quality without explicit annotations. Building upon this, I will present CutS3D, a method that improves unsupervised 2D instance segmentation by explicitly using 3D geometry to refine segmentation masks and produce spatially coherent instances. Next, I will discuss Open3DSG, the first method for generating open-vocabulary 3D scene graphs directly from point clouds without explicit labels, enabling richer semantic understanding and the description of complex inter-object relationships. Finally, I will present RelationField, which extends neural radiance fields to capture open-vocabulary object relationships implicitly, allowing models to answer sophisticated, relationship-based queries. Together, these methods demonstrate how integrating 2D and 3D information significantly enhances label efficiency and generalization in scene understanding tasks.
Presenters
Timo Ropinski, Professor, Institute of Media Informatics, Ulm University, Germany
Brief Biography
Timo Ropinski leads the Visual Computing Group at Ulm University. Previously, he served as a full professor at Linköping University in Sweden, where he led the Scientific Visualization Group. He earned his Ph.D. in computer science from the University of Münster in 2004 and completed his habilitation there in 2009.