Seminar: Detecting Anomalies from Evolving Data Stream

Abdullah-Al-Mamun
M.Sc. Candidate
Supervisor: Dr. Antonina Kolokolova

Detecting Anomalies from Evolving Data Stream

Department of Computer Science
Monday, June 15, 2015, 1:00 pm, Room EN 2022


Abstract

This thesis stems from a project with the real-time environmental monitoring company EMSAT Corporation, which was looking for methods to automatically flag spikes and other anomalies in its data streams. The setting presented several challenges. First, the company wanted anomaly detection in real time, or as close to real time as possible. Second, there was no labeled data, which made supervised and semi-supervised machine learning methods less applicable. Third, the properties of the data changed over time, due to environmental events as well as sensor failures, so such changes had to be detected and incorporated into the anomaly detection.

In this project, we applied several statistical techniques for anomaly detection within a sliding-window framework. We explored both a parametric approach based on a Gaussian model and a nonparametric approach, Kernel Density Estimation (KDE). However, these techniques assume that all the data come from a single, unchanging distribution.
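
As a rough illustration of the two approaches named above, the following sketch scores each new reading against a fixed-size sliding window, once with a Gaussian z-score and once with a Gaussian-kernel density estimate. It is not the thesis implementation: the function names, window size, thresholds, and bandwidth are all assumptions chosen for the example.

```python
import math
from collections import deque


def gaussian_score(window, x):
    """Absolute z-score of x under a Gaussian fitted to the window values."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std == 0.0:
        # Degenerate window: any deviation at all is infinitely unlikely.
        return 0.0 if x == mean else math.inf
    return abs(x - mean) / std


def kde_density(window, x, bandwidth):
    """Gaussian-kernel density estimate at x built from the window values."""
    norm = len(window) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in window) / norm


def detect(stream, window_size=50, z_threshold=3.0,
           bandwidth=0.5, density_threshold=1e-3):
    """Return (gaussian_flags, kde_flags): indices flagged by each method.

    All parameter defaults are illustrative assumptions, not tuned values.
    """
    window = deque(maxlen=window_size)
    gaussian_flags, kde_flags = [], []
    for i, x in enumerate(stream):
        # Score only once the window is full, so the estimates are stable.
        if len(window) == window_size:
            if gaussian_score(window, x) > z_threshold:
                gaussian_flags.append(i)
            if kde_density(window, x, bandwidth) < density_threshold:
                kde_flags.append(i)
        # The reading enters the window whether or not it was flagged.
        window.append(x)
    return gaussian_flags, kde_flags
```

Calling `detect(readings)` on a list of floats returns the indices each method flagged. In this sketch a flagged reading still enters the window; excluding it would be an equally reasonable choice, and either way the window itself carries the assumption of a single unchanging distribution.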

The main contribution of this thesis is the extension of statistical anomaly detection methods to evolving data streams, in particular in the presence of concept drift. To this end, we developed a framework that integrates the Adaptive Windowing (ADWIN) change detection algorithm with the nonparametric methods above. We implemented and tested this approach on several real-world data sets and received positive feedback from our industry collaborator.
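
One way to picture such an integration is sketched below: an adaptive window replaces the fixed-size one, and each reading is scored by KDE against whatever the window currently holds. This is a simplified, assumption-laden illustration rather than the framework developed in the thesis. `SimpleAdwin` is a hypothetical naive variant that tests every split of the window with an ADWIN-style Hoeffding cut threshold and omits the compressed bucket structure of the published algorithm; `delta`, `value_range`, `max_size`, and the KDE settings are all illustrative.

```python
import math


class SimpleAdwin:
    """Naive ADWIN-style adaptive window (hypothetical; checks every split)."""

    def __init__(self, delta=0.002, value_range=1.0, max_size=500):
        self.delta = delta              # confidence parameter (assumed value)
        self.value_range = value_range  # assumed width of the value range
        self.max_size = max_size        # hard cap to bound the naive cost
        self.window = []

    def _cut_found(self):
        """True if some split shows sub-window means that differ too much."""
        n = len(self.window)
        total = sum(self.window)
        delta_prime = self.delta / n
        head = 0.0
        for n0 in range(1, n):
            head += self.window[n0 - 1]
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = self.value_range * math.sqrt(
                math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) > eps:
                return True
        return False

    def update(self, x):
        """Add x, drop stale readings, and report whether a change was seen."""
        self.window.append(x)
        if len(self.window) > self.max_size:
            self.window.pop(0)
        drift = False
        while len(self.window) > 1 and self._cut_found():
            self.window.pop(0)  # shrink from the oldest end
            drift = True
        return drift


def kde_density(values, x, bandwidth):
    """Gaussian-kernel density estimate at x built from the given values."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm


def detect_with_drift(stream, bandwidth=0.05, density_threshold=0.05,
                      min_points=10, delta=0.002, value_range=1.0):
    """Return (anomalies, drifts) as lists of stream indices."""
    adwin = SimpleAdwin(delta=delta, value_range=value_range)
    anomalies, drifts = [], []
    for i, x in enumerate(stream):
        # Score against the current adaptive window before adding x to it.
        if (len(adwin.window) >= min_points
                and kde_density(adwin.window, x, bandwidth) < density_threshold):
            anomalies.append(i)
        if adwin.update(x):
            drifts.append(i)
    return anomalies, drifts
```

With this design, the first readings after a genuine level shift are still flagged as anomalies, because the window has not yet adapted. Once the change detector starts discarding old readings, the density estimate follows the new regime.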