CEMSE Big Data Open Day shows off fascinating discoveries

By Valentina De Vincenti

"Big data has many applications: to understand medicine better; to manage food supplies and to connect objects. Data is at the center of everything," said Dean Mootaz Elnozahy of the University's Computer, Electrical, Mathematical Science and Engineering (CEMSE) Division at the CEMSE Big Data Open Day held on December 4, 2016.
 
The fifth-anniversary event celebrated CEMSE's biggest successes and featured 20 KAUST discoveries, with more than 1,000 visitors from the community attending. On entering, visitors passed the Walk of Quotes on Big Data, a collection of quotes on the theme. These included one by Jake Porway, founder and CEO of DataKind, who said, "Data is like a bucket of crude oil, potentially great, but only if someone knows how to refine it and someone else has vehicles that will run on it."

As the Big Data Timeline highlighted, the digital fuel revolution began in 2004. Global data grew exponentially thanks to Web 2.0 and social media, which allowed users to generate, share and store data. During this period, data mining systems such as Apache Hadoop, Elastic Share and Mangoop were also developed, making it possible to identify patterns, establish relationships within huge datasets, find images and videos, and even uncover problems and facts that people had been completely unaware of.

In 2012, the challenge of mining big data grew even further, and a new era began with the internet of things (IoT). Electronic devices, objects, animals and people can now be tagged with sensors that collect and transfer data over a network without requiring human-to-human or human-to-computer interaction.
 

Highlights of the event included:

Big data and supercomputing: if data were grains of sand, the desert would be the combined data of bioinformatics, machine learning and artificial intelligence, to name just a few of the data sciences that require computers with different architectures, resource allocation and usage strategies.

 

Three stands showcased the latest toolkits for data scientists. KAUST Professor David Keyes introduced visitors to KAUST's computing facilities, which are capable of handling such huge digital datasets. These machines were also at the core of the Foundation of Big Data Processing stand, where KAUST Professor Panagiotis Kalnis demonstrated four software tools for processing extremely large datasets on large computers such as Shaheen II.

Finally, a trial of the ultimate search engine for agile, data-focused literature reviews was on show at the Data Retrieval System (DRS) stand hosted by KAUST Professor Xiangliang Zhang. This so-called "Google for data scientists" ranks relevant, cited and available datasets in two to three seconds. If document analysis is needed instead, the DRS allows researchers to visualize and predict relationships in dataset citation networks.

Big data and the environment: the Statistics Lab, along with the University's Biological and Environmental Science and Engineering (BESE) Division, presented new frontiers in modeling extreme natural events and in environmentally friendly alternative biotechnology.
 
KAUST Professor Marc Genton exhibited a statistical emulator for 3-D global spatiotemporal temperature simulations and big climate data compression. Based on a four-stage algorithm, the model reduces and simultaneously reorganizes an ensemble of more than 1 billion geophysical data points in less than 48 hours, cutting storage space by 98 percent.

At the Statistics for Big Data: Natural Hazard stand, KAUST Professor Raphael Huser presented two models for extreme precipitation and landslide susceptibility, which are applied to large, heterogeneous space-time datasets to produce risk maps at fine spatial resolution. The stand also offered insight into precipitation in Saudi Arabia.

"In the aftermath of the Japanese tsunami, a local stakeholder announced that casualties would have been avoided if only reliable prediction tools existed. That was the big revelation for our project," said KAUST Professor Ying Sun. Sun developed a realistic statistical model that simulates the possible range of likelihoods of tsunamis reaching the shore and quantifies the uncertainties, supporting hazard assessment, timely warnings and effective responses.

 

The Virtual Red Sea exhibit and the Climate Change, Coral Reef Fish and Big Data exhibit focused on integrated modeling systems for the Red Sea ecosystem. The former, presented by KAUST Professor Ibrahim Hoteit and colleague Omar Knio, centered on climate and circulation modeling to forecast large-scale variability in the marine ecosystem. In the latter, KAUST Professor Timothy Ravasi talked visitors through the molecular responses of colorful fish brains to rising CO2 levels and showed how parental tolerance to high CO2 is expressed in the offspring's molecular phenotype.

The Red Sea also served as an open-air lab for alternative biotechnologies. New sustainable, carbon-neutral technologies to cut greenhouse gas emissions featured in An Ocean of Seaweeds, an exhibit by KAUST Professor John Archer. The Arabian Peninsula's morphology and climate make it an excellent location for microalgal biotechnology to manufacture biomass-derived products such as metabolites, chemicals, materials, biomass feedstocks and, eventually, fuels.
 

 

Big data and identity: What data-R you?

Big data also had a fun side: the Egg-citing Big Data Game by KAUST Professor Takashi Gojobori explained to children the fascinating adventure of using genetic big data for individual identification. DNA sequences were used as barcodes to profile species, allowing any organism to be identified without further morphological or physiological characterization.
 
Focusing instead on human biometric identification, Faces and Voices by Velasquez Tobar showcased a photographic series based on data-driven manipulation of voices and faces.