Mining Large Data Series Collections: Motif, Discord and Anomaly Discovery

Location
Building 3, Level 5, Room 5209 (sea-side)
Speaker: Michele Linardi, Ph.D. candidate at Paris Descartes University

Abstract

In the last fifteen years, data series motif and discord discovery have emerged as two useful primitives for data series mining, with applications in many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, state-of-the-art motif and discord discovery tools still require the user to provide the length of the motifs or discords to search for, and in several cases the choice of length is critical and unforgiving. In this presentation, we will introduce a new framework that provides an exact and scalable motif and discord discovery algorithm, which efficiently finds all motifs and discords in a given range of lengths. We have evaluated our approach on five diverse real datasets and demonstrated that it is up to 20 times faster than the state of the art. Our results also show that removing the unrealistic assumption that the user knows the correct length can often produce more intuitive and actionable results, which could otherwise have been missed.

Bio

Michele Linardi is a Ph.D. candidate at Paris Descartes University (LIPADE lab – diNo group), under the supervision of Professor Themis Palpanas. Michele’s main research interests span the Data Series Management and Analytics area, with a specific focus on Indexing and Sequential Scan techniques for fast and exact Similarity Search queries. This operation underpins various well-known algorithms, such as Motif Discovery, Outlier Detection, Classification, and Clustering in large collections of Data Series (a.k.a. Time Series). He received his B.Sc. and M.Sc. degrees in Computer Science from the University of Trento (Italy) in 2012 and 2014, respectively.

Federico Roncallo is a Research Engineer at Paris Descartes University (LIPADE lab – diNo group), working in collaboration with Safran under the supervision of Professor Themis Palpanas. Federico's current work focuses mainly on data series anomaly detection and analysis, with a special focus on data series produced by sensors recording vibrations in rotating machines. He received his B.Sc. and M.Sc. degrees in Computer Science from the University of Genoa (Italy) in 2015 and 2018, respectively.
