Inserting lightweight optimization code in high-speed network devices has enabled a KAUST-led collaboration to increase the speed of machine learning on parallelized computing systems five-fold.
This "in-network aggregation" technology, developed with researchers and systems architects at Intel, Microsoft and the University of Washington, can provide dramatic speed improvements using readily available programmable network hardware.
Much of the power of artificial intelligence (AI) to "understand" and interact with the world comes from the machine-learning step, in which the model is trained using large sets of labeled training data. The more data the AI is trained on, the better the model is likely to perform when exposed to new inputs.
The recent burst of AI applications is largely due to better machine learning and the use of larger models and more diverse datasets. Performing the machine-learning computations, however, is an enormously taxing task that increasingly relies on large arrays of computers running the learning algorithm in parallel.
“How to train deep-learning models at a large scale is a very challenging problem,” says Marco Canini from the KAUST research team. “The AI models can consist of billions of parameters, and we can use hundreds of processors that need to work efficiently in parallel. In such systems, communication among processors during incremental model updates easily becomes a major performance bottleneck.”
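To make the bottleneck Canini describes more concrete, the toy Python sketch below illustrates the general idea behind in-network aggregation: each worker sends its model update once, a single aggregation point (standing in here for the programmable network device) sums the updates element by element, and every worker receives the combined result. This is only an illustration under simplifying assumptions, not the collaboration's implementation. The function names, the averaging step and the learning rate are all hypothetical.

```python
from typing import List


def aggregate_in_network(worker_updates: List[List[float]]) -> List[float]:
    """Element-wise sum of per-worker updates.

    Stands in for the aggregation a programmable network device could
    perform (an assumption made for illustration only).
    """
    if not worker_updates:
        raise ValueError("need at least one worker update")
    length = len(worker_updates[0])
    if any(len(update) != length for update in worker_updates):
        raise ValueError("all updates must have the same length")
    return [sum(values) for values in zip(*worker_updates)]


def apply_update(params: List[float], aggregated: List[float],
                 num_workers: int, learning_rate: float) -> List[float]:
    """Average the summed update and take one gradient-descent step."""
    return [p - learning_rate * g / num_workers
            for p, g in zip(params, aggregated)]


# Three hypothetical workers, each holding an update for a two-parameter model.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
total = aggregate_in_network(updates)
new_params = apply_update([0.0, 0.0], total, num_workers=3, learning_rate=0.5)
print(total)       # [9.0, 12.0]
print(new_params)  # [-1.5, -2.0]
```

In this sketch each worker sends one update and gets one combined result back, rather than exchanging updates with every other worker.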