Teaching algorithms to see

The University's Image and Video Understanding Lab develops machine learning algorithms for computer vision and object tracking. Photo by Sarah Munshi.

Four hundred hours of video are uploaded to YouTube every minute. Beyond text search over a video's title, description or tags, there are few ways to search what a video actually contains. In addition to helping users find content more quickly and accurately, the ability to search video content is of paramount importance to video platforms and advertisers.

Last year, YouTube was embroiled in controversy after rolling video ads for Coca-Cola and Amazon, among others, were shown before racist and extremist content. Advertising is a major revenue stream for video platforms like YouTube, and that revenue comes under threat when an advertiser's content is associated with irrelevant and sometimes nefarious videos.

Bernard Ghanem, KAUST associate professor of electrical engineering in the University's Visual Computing Center and principal investigator of the Image and Video Understanding Lab, is developing machine learning techniques that include computer vision for automated navigation, object tracking and other areas. When it comes to making video content searchable, Ghanem and his team, including Ph.D. students Fabian Caba Heilbron, Victor Escorcia and Humam Alwassel, have a solution.

The group has developed machine learning algorithms that can learn to detect specific types of activity within a video without relying on metadata.

"Our algorithms can detect—based on what inputs have been set—what and where a specific activity happens. This can help video platforms with detecting unwanted content [and] also help advertisers deliver more relevant and timely content. For example, a company like Nike can localize its ad within a part of the video that relates to its content, such as running or exercise," explained Ghanem.

The algorithm can also be used in other applications, such as surveillance. Once an activity parameter has been defined, the algorithm can learn to trawl through hours of video for the relevant moments, automating the task and completing it more quickly.

The team's work has contributed significantly to the University's top-10 ranking in computer vision and computer graphics, according to CSRankings.
