By David Murphy
A joint KAUST-Stanford University research paper by Panos Achlioptas (Stanford), Ahmed Abdelreheem (KAUST), Fei Xia (Stanford), Mohamed Elhoseiny (KAUST) and Leonidas Guibas (Stanford) has been accepted for presentation at the 16th European Conference on Computer Vision (ECCV 2020).
The paper, titled “ReferIt3DNet: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes,” details research on designing neural networks that comprehend natural-language references in order to pick out a specific 3D object from multiple distractors of the same fine-grained object class in a real-world scene.
“We are happy that the paper got accepted for oral presentation at ECCV 2020, especially as it only has a 2% acceptance rate,” noted Abdelreheem’s supervisor Mohamed Elhoseiny, KAUST Assistant Professor of Computer Science. “For the work to be accepted by the ECCV 2020 judging panel may reflect the level of importance that reviewers are affording to the research.”
The KAUST-Stanford study found that architectures promoting object-to-object communication via graph neural networks outperform less context-aware alternatives. Elhoseiny believes the research addresses existing bottlenecks in language-assisted identification of specific 3D objects by AI.
“Imagine in our 3D environment, aka real life, that I want to ask a robot to bring me a cup that is on a table closest to the door,” Elhoseiny explained. “To get closer to achieving this task, we need to advance artificial intelligence’s (AI) ability to identify and visually locate this specific cup from natural language, e.g., ‘cup on the table closest to the door.’ What we propose here is a function that gets us a step closer towards accomplishing this aim.”
“The work was collaboratively executed and supported by Stanford University and KAUST.
“Ahmed Abdelreheem (a Ph.D. student at KAUST) has meaningfully contributed to the project by collecting the Sr3D, the synthetic part of the proposed dataset, and boosted the performance on natural data by 10%. He also put great effort into working with Panos Achlioptas (Stanford Ph.D. student) to develop baselines and the final best performing ReferIt3DNet model at KAUST,” Elhoseiny said.
“I fully believe we are proposing a future progression in the field with this innovative dataset/benchmark,” he added.
Due to the ongoing COVID-19 pandemic, ECCV 2020, a premier event in the field of computer vision, will be held online from August 23 to 28, 2020.