Sequential Nonparametric Anomaly Detection and Clustering

Name of the Speaker: Sreeram C Sreenivasan (EE17D404)
Guide: Dr. Srikrishna B
Venue/Online meeting link:
Date/Time: 18-10-2022, 3:00 PM

In this dissertation, we study two problems in the field of nonparametric hypothesis testing: anomaly detection and clustering. A finite collection of S data streams is observed. Each data stream is an i.i.d. sequence drawn from an arbitrary, unknown probability distribution. In the anomaly detection problem, a small subset of the streams is drawn from a distribution q while the remaining streams are drawn from a different distribution, and our task is to identify the anomalous streams (those drawn from q) using as few samples as possible while ensuring a certain accuracy. In the more general clustering problem, the distributions that generate the data streams themselves form clusters according to their proximity to each other under some underlying distance metric. Here, our objective is to design sequential tests that segregate the data streams into clusters using as few samples as possible while ensuring a certain accuracy. In our work, we propose universal sequential tests for both anomaly detection and clustering. The tests are universally exponentially consistent and stop in finite time almost surely. We also characterize the rate of growth of the stopping time in the limit of vanishing error probability. Computer simulations show that the proposed tests outperform the existing fixed sample size tests for these problems in all cases.