Drop the zeros for a performance boost

An eight-fold speed up of deep machine learning can be achieved by skipping the transmission of zero values.

About

KAUST researchers have found a way to significantly increase the speed of training. Large machine learning models can be trained significantly faster by observing how frequently zero results are produced in distributed machine learning that use large training datasets.

AI models develop their “intelligence” by being trained on datasets that have been labelled to tell the model how to differentiate between different inputs and then respond accordingly. The more labelled data that goes in, the better the model becomes at performing whatever task it has been assigned to do. For complex deep learning applications, such as self-driving vehicles, this requires enormous input datasets and very long training times, even when using powerful and expensive highly parallel supercomputing platforms.

During training, small learning tasks are assigned to tens or hundreds of computing nodes, which then share their results over a communications network before running the next task. One of the biggest sources of computing overhead in such parallel computing tasks is actually this communication among computing nodes at each model step. 

“Communication is a major performance bottleneck in distributed deep learning,” explains Jiawei Fei from the KAUST team. “Along with the fast-paced increase in model size, we also see an increase in the proportion of zero values that are produced during the learning process, which we call sparsity. Our idea was to exploit this sparsity to maximize effective bandwidth usage by sending only non-zero data blocks.”

Read the full article