Image segmentation used to require large, compute-intensive neural networks. Running such deep learning models without a connection to cloud or GPU servers was a huge challenge. Now, researchers at DarwinAI and the University of Waterloo have designed a new neural network architecture called ‘AttendSeg’ that can perform image segmentation on edge devices with limited compute.

Image segmentation has a vital role to play in computer vision advancements. The goal of segmentation is to simplify or change the representation of an image into something more meaningful and easier to analyse. The use cases include self-driving vehicles, video surveillance, traffic control systems, etc.

In a paper titled ‘AttendSeg: A Tiny Attention Condenser Neural Network for Semantic Segmentation on the Edge,’ co-authored by Xiaoyu Wen, Mahmoud Famouri, Andrew Hryniowski and Alexander Wong, the researchers claimed AttendSeg could achieve segmentation accuracy comparable to much larger and more complex deep neural networks while having significantly lower architectural complexity, making it well-suited for TinyML applications on edge devices.

AttendSeg architecture 

AttendSeg is a self-attention network architecture consisting of lightweight attention condensers for improved spatial-channel selective attention at very low complexity.

In layman's terms, a self-attention mechanism, such as an attention condenser, lets the inputs interact with each other (self) and determine which of them deserve more attention (attention). The output is an aggregate of these interactions, weighted by the attention scores.
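To make the idea concrete, here is a minimal sketch of generic scaled dot-product self-attention in plain Python. It is not the attention condenser design from the paper: the function names, the toy input vectors and the use of the inputs themselves as queries, keys and values (no learned projections) are all assumptions made for illustration.

```python
import math

def softmax(row):
    """Turn raw scores into attention weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(inputs):
    """Generic scaled dot-product self-attention (illustrative only).

    inputs: list of n vectors of dimension d. Each vector serves as its
    own query, key and value, i.e. no learned projections are applied.
    """
    d = len(inputs[0])
    outputs, weights = [], []
    for query in inputs:
        # Score the query against every input, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in inputs
        ]
        w = softmax(scores)
        weights.append(w)
        # Output = attention-weighted mix of all input vectors.
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, inputs)) for i in range(d)]
        )
    return outputs, weights

# Hypothetical toy inputs: three 2-dimensional vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` holds the attention one input pays to every input, and each row of `outputs` is the corresponding weighted mix. In this toy case the first vector attends more to itself and the third vector than to the second, because its dot product with the second is zero.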

AttendSeg architecture (Source: arxiv.org)

The AttendSeg network architecture has several notable properties (as shown above). It uses a heterogeneous mix of lightweight attention condensers, depthwise convolutions and pointwise convolutions with unique micro-architecture designs, striking a strong balance between efficiency and representational power.
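A rough way to see why depthwise and pointwise convolutions help efficiency is to count weights. The sketch below compares a standard convolution with a depthwise convolution followed by a pointwise (1×1) convolution. The kernel size and channel counts are hypothetical and are not taken from the AttendSeg paper; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel filter
    per (input channel, output channel) pair."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a pointwise one."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer shape, chosen only for illustration.
kernel, c_in, c_out = 3, 64, 128
standard = standard_conv_params(kernel, c_in, c_out)         # 73728
separable = depthwise_separable_params(kernel, c_in, c_out)  # 8768
print(f"standard: {standard}, depthwise + pointwise: {separable}")
```

For this hypothetical layer, the depthwise plus pointwise pair needs roughly one eighth of the weights of the standard convolution.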

AttendSeg exhibits selective long-range connectivity where only select deeper layers are refined based on earlier layers, thus improving architectural efficiency by only refining at scales that benefit from it. One can observe very aggressive dimensionality reduction via convolutions with large strides, which reduces the complexity while preserving representational capacity. 
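The effect of large strides on cost can be sketched with the usual convolution output-size formula. The 512×512 input matches the resolution used in the experiments; the kernel size, padding, strides and channel counts below are assumptions for illustration and do not describe any specific AttendSeg layer.

```python
def conv_output_size(n, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_macs(n, kernel, stride, padding, c_in, c_out):
    """Multiply-accumulate operations for a square input of side n."""
    out = conv_output_size(n, kernel, stride, padding)
    return out * out * kernel * kernel * c_in * c_out

# Hypothetical 3x3 convolution with padding 1 on a 512x512 input.
size_s1 = conv_output_size(512, kernel=3, stride=1, padding=1)  # 512
size_s2 = conv_output_size(512, kernel=3, stride=2, padding=1)  # 256
size_s4 = conv_output_size(512, kernel=3, stride=4, padding=1)  # 128

# Hypothetical channel counts: 3 input channels, 16 output channels.
macs_s1 = conv_macs(512, 3, 1, 1, c_in=3, c_out=16)
macs_s4 = conv_macs(512, 3, 4, 1, c_in=3, c_out=16)
print(f"stride 1: {macs_s1} MACs, stride 4: {macs_s4} MACs")
```

Under these assumptions, moving from stride 1 to stride 4 shrinks the feature map from 512 to 128 per side and cuts the MACs of that layer by a factor of 16. Every later layer then also operates on the smaller feature map.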

Interestingly, these properties result from combining machine-driven design exploration with attention condensers to produce highly compact network architectures custom-built for edge scenarios.

Outcome

The researchers evaluated the new architecture on the Cambridge-driving Labeled Video Database (CamVid) to test the efficacy of AttendSeg for on-device semantic segmentation on the edge.

CamVid is a dataset with 32 different semantic classes, introduced for evaluating semantic segmentation performance. All experiments were conducted at 512×512 resolution in TensorFlow. For comparison, the results for ResNet-101 RefineNet [25] and EdgeSegNet [26], a state-of-the-art efficient deep semantic segmentation network, are also presented.

Comparison of AttendSeg based on 8-bit weights with other tested networks based on 32-bit weights. (Source: arxiv.org)

The table shows that AttendSeg achieved accuracy similar to that of ResNet-101 RefineNet and higher than that of EdgeSegNet, while having fewer parameters than the other networks. Also, because AttendSeg uses low-precision 8-bit weights, its weight memory requirements are lower than those of RefineNet and EdgeSegNet. Above all, AttendSeg requires fewer multiply-accumulate (MAC) operations than the others, which makes it more computationally efficient.
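The link between weight precision and weight memory is simple arithmetic: parameter count times bits per weight. The sketch below uses a hypothetical parameter count rather than any figure from the paper's table, and it ignores any per-tensor scale or zero-point overhead that a real quantisation scheme may add.

```python
def weight_memory_bytes(num_params, bits_per_weight):
    """Approximate memory needed to store the weights, in bytes."""
    return num_params * bits_per_weight // 8

# Hypothetical model size, chosen only for illustration.
num_params = 1_000_000
mem_8bit = weight_memory_bytes(num_params, 8)    # 1,000,000 bytes
mem_32bit = weight_memory_bytes(num_params, 32)  # 4,000,000 bytes
print(f"8-bit: {mem_8bit} bytes, 32-bit: {mem_32bit} bytes")
```

For the same number of parameters, 8-bit weights take a quarter of the memory of 32-bit weights. A network that has both fewer parameters and lower-precision weights saves on both counts.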

The post New Innovations In Image Segmentation For Edge devices appeared first on Analytics India Magazine.