Querying big data just got universal
A universal query engine for big data that works across computing platforms could accelerate analytics research.
To overcome one of the key obstacles in big-data science, KAUST researchers have created a framework for searching very large datasets that runs easily on different computing architectures. Their achievement allows researchers to concentrate on advancing the query engine itself rather than on painstakingly coding for specific computing platforms.
Big data is one of the most promising yet challenging aspects of today’s information-heavy world. While the huge and ever-expanding sets of information, such as online-collected data or genetic information, could hold powerful insights for science and humanity, processing and interrogating all this data require highly sophisticated techniques.
Many different approaches to querying big data have been explored. But one of the most powerful and computationally effective is based on analyzing data stored as subject-predicate-object triples, such as (apple, is a, fruit), in a structure known as a triplestore. This structure lends itself to being treated like a graph, with subjects and objects as vertices and predicates as the edges between them. Query engines have exploited this characteristic by being coded for specific computing architectures for maximum efficiency. However, such architecture-specific approaches cannot be readily ported to different platforms, limiting the opportunities for innovation and advancement in analytics.
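The graph view of triples described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy triples beyond (apple, is a, fruit) and the helper names `build_graph` and `objects_of` are hypothetical and do not come from the KAUST work.

```python
from collections import defaultdict

# Toy triples; only the first one appears in the article, the rest are invented.
triples = [
    ("apple", "is a", "fruit"),
    ("banana", "is a", "fruit"),
    ("fruit", "is a", "food"),
    ("apple", "has color", "red"),
]


def build_graph(triples):
    """Map each subject (vertex) to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def objects_of(graph, subject, predicate):
    """Follow every edge labelled `predicate` that leaves `subject`."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]


graph = build_graph(triples)
apple_types = objects_of(graph, "apple", "is a")
print(apple_types)
```

Answering a question such as "what is an apple?" then amounts to following the "is a" edges that leave the apple vertex.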
“Modern computing systems provide diverse platforms and accelerators, and programming them can be intimidating and time consuming,” say Fuad Jamour and Yanzhao Chen, Ph.D. candidates in Panos Kalnis’s group in KAUST’s Extreme Computing Research Center. “Our research group focuses on building systems and algorithms for processing and analyzing very large datasets. This research addresses the desire to write a program once and then use it across different platforms.”
Rather than relying on the previously used graph-traversal or exhaustive relational-indexing approaches, the group queried triplestore data using sparse-matrix algebra, an approach drawn from applied mathematics.
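To give a feel for how sparse-matrix algebra can answer a triplestore query, here is a minimal sketch. It is not the researchers' engine, whose internals the article does not describe. It assumes one Boolean matrix per predicate, stored sparsely as a set of (row, column) positions, and every function name and the toy data are hypothetical.

```python
# Toy triples; only the first one appears in the article, the rest are invented.
triples = [
    ("apple", "is a", "fruit"),
    ("banana", "is a", "fruit"),
    ("fruit", "is a", "food"),
    ("apple", "has color", "red"),
]


def build_matrices(triples):
    """Give every term an integer index and build one sparse Boolean
    matrix per predicate, stored as a set of (row, col) nonzero positions."""
    terms = sorted({term for s, _, o in triples for term in (s, o)})
    index = {term: i for i, term in enumerate(terms)}
    matrices = {}
    for s, p, o in triples:
        matrices.setdefault(p, set()).add((index[s], index[o]))
    return index, matrices


def bool_matmul(a, b):
    """Boolean sparse product: (i, k) is nonzero when some j has
    (i, j) nonzero in a and (j, k) nonzero in b."""
    rows_of_b = {}
    for j, k in b:
        rows_of_b.setdefault(j, set()).add(k)
    return {(i, k) for i, j in a for k in rows_of_b.get(j, ())}


index, matrices = build_matrices(triples)
names = {i: term for term, i in index.items()}

# Query: which ?x satisfy (?x, is a, ?y) and (?y, is a, food)?
# Multiplying the "is a" matrix by itself yields all two-hop "is a" pairs.
two_hop = bool_matmul(matrices["is a"], matrices["is a"])
answers = sorted(names[i] for i, k in two_hop if k == index["food"])
print(answers)
```

The appeal of this formulation is that the query reduces to a handful of generic matrix operations, which is the kind of workload that can be handed to optimized libraries on different hardware rather than rewritten for each platform.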