
Design of Neuromorphic Object Detection Systems
B3 L5 R5209
This dissertation advances event-based object detection by developing efficient deep learning frameworks such as ReYOLOv8 and Chimera, introducing novel encoding and augmentation techniques, releasing a new neuromorphic dataset, and demonstrating a real-time, low-power traffic monitoring system that highlights the practical potential of bio-inspired vision systems.
Overview
Object detection is a crucial task in many cutting-edge applications, such as autonomous vehicles and advanced robotics, which currently rely primarily on data from conventional frame-based RGB sensors. However, these sensors often suffer from motion blur and perform poorly in challenging lighting conditions. In response, event-based cameras have emerged as an innovative paradigm. Inspired by aspects of biological vision, they respond to changes in brightness rather than to absolute light intensity, which lets them perform better in scenes with fast motion and extreme lighting while consuming less power.
Several models have been proposed to apply event-based cameras to traditional vision tasks such as object recognition and segmentation. Among them, approaches that adapt frame-based deep learning models to the event domain have grown in popularity, with good accuracy reported at real-time speeds. Following this approach, the first contribution of this work was Recurrent YOLOv8 (ReYOLOv8), an object detection framework that extends a high-performing frame-based detector with spatiotemporal modeling capabilities by combining it with Recurrent Neural Networks (RNNs). To take advantage of the inherent sparsity of event data, a low-latency, memory-efficient encoding method was proposed. Moreover, to improve the training of deep learning models on this modality, a novel data augmentation technique tailored to the unique attributes of event data was also proposed. Evaluations on Prophesee's Generation 1 (GEN1) and Person Detection for Robotics (PeDRo) datasets showed that ReYOLOv8 outperformed all comparable approaches. On GEN1, the models improved mean Average Precision (mAP) by an average of 5%, 2.8%, and 2.5% at the nano, small, and medium scales, respectively, while reducing trainable parameters by 4.43% and keeping processing times between 9.2 ms and 15.5 ms, which is suitable for real-world requirements. On PeDRo, mAP improvements of 9.0%, 11.8%, and 17.9% were reported, with the models being, on average, 8.9x smaller than the previous reference and 1.67x faster.
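The summary does not specify how the encoding works. As a minimal sketch, assuming that a "ternary event image" stores, per pixel and per temporal bin, +1 for a positive (ON) event, -1 for a negative (OFF) event, and 0 where no event occurred, with later events overwriting earlier ones in the same bin, a volume of such images could be built as follows. The function name `ternary_event_volume`, the `(x, y, t_us, polarity)` tuple layout, and the integer microsecond timestamps are hypothetical choices for illustration, not the published VTEI implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout: (x, y, timestamp in microseconds, polarity),
# with polarity 1 = ON (brightness increase) and 0 = OFF (brightness decrease).
Event = Tuple[int, int, int, int]


def ternary_event_volume(
    events: Sequence[Event],
    num_bins: int,
    height: int,
    width: int,
    t_start: int,
    window_us: int,
) -> List[List[List[int]]]:
    """Illustrative ternary event volume (an assumption, not the VTEI reference code).

    Returns a [num_bins][height][width] nested list whose cells are +1 (ON event),
    -1 (OFF event) or 0 (no event). Events are assumed to be sorted by time, so the
    most recent event at a pixel within a bin overwrites earlier ones.
    """
    volume = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    for x, y, t, p in events:
        offset = t - t_start
        if not 0 <= offset < window_us:
            continue  # skip events outside the time window
        b = offset * num_bins // window_us  # integer temporal bin index
        volume[b][y][x] = 1 if p else -1
    return volume


if __name__ == "__main__":
    events = [(0, 0, 100, 1), (1, 0, 600, 0), (1, 0, 900, 1), (2, 1, 999, 0)]
    vol = ternary_event_volume(events, num_bins=2, height=2, width=3,
                               t_start=0, window_us=1000)
    print(vol[0])  # [[1, 0, 0], [0, 0, 0]]
    print(vol[1])  # [[0, 1, 0], [0, 0, -1]]
```

Because every cell takes only three values, such a volume fits in an 8-bit integer type, a quarter of the memory of a 32-bit float tensor of the same shape, and pixels without events stay at zero. This is offered as one plausible source of the memory efficiency and sparsity preservation described above, not as a confirmed detail of the method.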
To extend modeling capabilities beyond those of ReYOLOv8, Chimera, a block-based Neural Architecture Search (NAS) framework designed explicitly for event-based object detection, was introduced. While ReYOLOv8 demonstrated the effectiveness of deep learning for event data processing, Chimera builds on it by systematically exploring a space of neural blocks to exploit this potential more fully. The Chimera design space incorporates various macro blocks, including self-attention, convolutions, State Space Models, and MLP-Mixer-based architectures, offering trade-offs between local and global processing capabilities. Experiments on the GEN1 dataset achieved mAP among the best reported while reducing parameters by 1.6x and delivering a 2.1x speed-up compared with the current best-performing approaches.
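To make the idea of a block-based design space concrete, the sketch below enumerates backbones that choose one macro-block family per stage from the four families named above, then picks the best of a random sample under a user-supplied scoring function. The stage count, the block labels, and the use of plain random search are hypothetical simplifications; the summary does not describe Chimera's actual search strategy or evaluation criteria.

```python
import itertools
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Macro-block families named in the summary; the string labels are illustrative.
BLOCK_TYPES = ("self_attention", "convolution", "state_space", "mlp_mixer")

Architecture = Tuple[str, ...]


def design_space(block_types: Sequence[str] = BLOCK_TYPES,
                 num_stages: int = 4) -> List[Architecture]:
    """Enumerate every backbone that picks one macro block per stage
    (the number of stages is an assumed value, not taken from Chimera)."""
    return list(itertools.product(block_types, repeat=num_stages))


def random_search(score: Callable[[Architecture], float],
                  num_samples: int,
                  seed: Optional[int] = 0) -> Architecture:
    """Return the highest-scoring candidate among `num_samples` drawn without
    replacement. Random search stands in for whatever strategy Chimera uses;
    in practice `score` would train and evaluate each candidate (e.g. mAP vs. latency)."""
    rng = random.Random(seed)
    candidates = rng.sample(design_space(), num_samples)
    return max(candidates, key=score)


if __name__ == "__main__":
    space = design_space()
    print(len(space))  # 4 block types ** 4 stages = 256 candidates

    # Toy score: prefer convolutions in early stages (local features) and
    # self-attention in later stages (global context).
    def toy_score(arch: Architecture) -> float:
        return sum((b == "convolution") if i < 2 else (b == "self_attention")
                   for i, b in enumerate(arch))

    print(random_search(toy_score, num_samples=64))
```

The number of candidates grows as (number of block types)^(number of stages), so even this small example has 256 backbones; since each real evaluation requires training a detector, a practical search cannot afford to score them all exhaustively.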
A new object detection dataset, the KAUST-Neuromorphic Traffic Scenes (KAUST-NTS), is also presented. KAUST-NTS is the first neuromorphic dataset recorded in the Middle East. It features 90 sequences of 60 seconds each, containing around 60k objects across four classes: pedestrians, cars, buses, and two-wheelers. Compared with alternative datasets, it is the most diverse among the small-scale options. Finally, a case study of a fully neuromorphic object detection system for traffic monitoring is presented. Implemented on dedicated neuromorphic hardware, the system performed detection in real time while consuming only 66 mW, highlighting the potential of this technology.
This research demonstrates the significant potential of bio-inspired event-based vision sensors for advanced object detection applications. The developed frameworks, ReYOLOv8 and Chimera, provide efficient pipelines for processing event data while meeting real-time constraints. To improve processing efficiency, the Volume of Ternary Event Images (VTEI) offers a lightweight encoding method that preserves the inherent sparsity of event data. Additionally, Random Polarity Suppression (RPS) is a novel data augmentation technique designed specifically for event-based detection, leveraging the unique properties of this modality to improve model performance. The King Abdullah University of Science and Technology-Neuromorphic Traffic Scenes (KAUST-NTS) dataset contributes to the field by providing diverse traffic scenarios from the Middle East, supporting better model generalization. Finally, the fully neuromorphic traffic monitoring system achieved real-time operation with only 66 mW of power consumption, demonstrating the practical viability of this technology for energy-efficient, real-world applications.
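The summary names Random Polarity Suppression but does not define it. One plausible reading, offered only as an assumption, is an augmentation that, with some probability, removes all events of one randomly chosen polarity from a training sample, so the model cannot count on both ON and OFF events always being present. The function `random_polarity_suppression`, its `p_suppress` parameter, and the event tuple layout are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical event layout: (x, y, timestamp in microseconds, polarity),
# with polarity 1 = ON and 0 = OFF.
Event = Tuple[int, int, int, int]


def random_polarity_suppression(
    events: Sequence[Event],
    p_suppress: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Event]:
    """Illustrative augmentation (an assumed reading of RPS, not the reference code).

    With probability `p_suppress`, drop every event of one randomly chosen
    polarity; otherwise return a copy of the events unchanged.
    """
    rng = rng if rng is not None else random.Random()
    if rng.random() >= p_suppress:
        return list(events)  # no augmentation for this sample
    suppressed = rng.choice((0, 1))  # polarity to remove
    return [e for e in events if e[3] != suppressed]


if __name__ == "__main__":
    sample = [(0, 0, 10, 1), (1, 0, 20, 0), (2, 1, 30, 1), (3, 1, 40, 0)]
    out = random_polarity_suppression(sample, p_suppress=1.0, rng=random.Random(0))
    print(out)  # p_suppress=1.0 always suppresses, so only one polarity remains
```

Because it operates on raw events before any encoding, an augmentation like this can be combined with an encoder such as a ternary event volume, where a suppressed polarity simply produces images containing only +1 or only -1 values. For a recurrent detector, applying the same decision to every window of a training clip would keep the input consistent across time steps; whether the original method does so is not stated here.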
Presenters
Brief Biography
Diego Augusto Silva is a Ph.D. candidate in the Communications and Computing Systems Laboratory (CCSL) within the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) Division at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. His research focuses on event-based vision, object recognition, deep learning, and neuromorphic computing. He earned his Bachelor's degree in Electrical Engineering from the Federal University of São João del-Rei (UFSJ), Brazil, in 2018, and a Master of Science degree in Electronic Engineering and Computing, specializing in Electronic Devices and Systems, from the Technological Institute of Aeronautics (ITA), Brazil, in 2020. His technical expertise spans both theoretical research and practical implementation, including substantial experience in FPGA prototyping and asynchronous logic design. His work has been recognized with two hackathon awards in his research areas, reflecting his ability to translate complex neuromorphic computing and event-based vision concepts into practical solutions.