SEMINAR: Detecting Anomalies and Motifs in Environmental Data Streams

M.Sc. Thesis Proposal
Supervisor: Dr. Antonina Kolokolova

Department of Computer Science
Thursday, October 16, 2014, 1:00 p.m., Room EN 2022


Massive amounts of environmental data about ocean currents, weather patterns, and climate change are collected through remote sensing. The data arrive so fast that it is impossible to store all of them. Moreover, even if we could store everything, we might not have time to scan it before making judgements. This is a new kind of computing setting: processing a stream of data rather than static, multiple-access data. Data streams are temporally ordered, fast changing, massive, and potentially infinite. The data flow into and out of an observation platform continuously and at varying update rates.

Anomaly detection refers to the problem of finding patterns in data that deviate significantly from expected behavior. Motifs, on the other hand, are previously unknown, frequently occurring patterns in time series. They are used as a tool for visualizing and summarizing massive time-series databases. Detecting anomalies and motifs in such data can provide significant insight into the hidden patterns that may have caused such events.

Here, in collaboration with a local company, EMSAT Corporation, we examine the data streams generated by a real-time environmental monitoring system. Specifically, the time-series data from in-situ sensors contain spikes: sharp local increases or decreases in the measured value caused by unexpected events. At present, the spikes are removed manually by experts trained to distinguish visually between spikes due to noise and variations in the data that reflect actual events. For real-time monitoring tasks, however, such spikes need to be eliminated by an automated process. Within this collaboration, we plan to develop a method for detecting spikes and, in the future, to extend it to detect other significant features of the streams.
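
To make the spike-detection task concrete, here is a minimal sketch of one common baseline: a sliding-window median test, often called a Hampel-style filter. It is not the method proposed in the thesis, nor anything used by EMSAT Corporation; the function name detect_spikes, the parameters window and threshold, and the sample readings are all hypothetical and chosen only for illustration. The sketch keeps a bounded window of recent readings, so it never stores the whole stream.

```python
from collections import deque
from statistics import median


def detect_spikes(stream, window=7, threshold=3.0):
    """Yield (index, value, is_spike) for each reading in a stream.

    Illustrative baseline only (hypothetical names and defaults).
    A reading is flagged when it lies more than `threshold` scaled
    median absolute deviations (MADs) from the median of the
    previous `window` readings.
    """
    recent = deque(maxlen=window)  # bounded memory: only the last `window` readings
    for i, x in enumerate(stream):
        is_spike = False
        if len(recent) == window:
            med = median(recent)
            mad = median(abs(v - med) for v in recent)
            # 1.4826 rescales the MAD so that it estimates the standard
            # deviation when the data are normally distributed.
            scale = 1.4826 * mad
            # A perfectly flat window gives scale == 0; nothing is flagged then.
            is_spike = scale > 0 and abs(x - med) > threshold * scale
        yield i, x, is_spike
        # The reading joins the window even if flagged; the median
        # tolerates a few outliers in the window.
        recent.append(x)


# Made-up sample readings with one obvious spike at index 7.
readings = [10.0, 10.1, 9.9, 10.2, 10.0, 9.8, 10.1, 25.0, 10.0, 9.9]
spike_indices = [i for i, _, flagged in detect_spikes(readings, window=5) if flagged]
print(spike_indices)
```

A test of this kind cannot tell a noise spike from a short real event, which is exactly the distinction the experts currently make by eye; it only shows the single-pass, bounded-memory shape that an automated method would need.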