Online Clustering of Data Sequences with Bandit Information
Abstract: We study the problem of online clustering of data sequences in the multi-armed bandit (MAB) framework under the fixed-confidence setting. There are $M$ arms, each providing i.i.d. samples from a parametric distribution whose parameters are unknown. The $M$ arms form $K$ clusters based on the distance between the true parameters. In the MAB setting, one arm can be sampled at each time. The objective is to estimate the clusters of the arms using as few samples as possible from the arms, subject to an upper bound on the error probability. Our setting allows for: arms within a cluster to have non-identical distributions, vector parameter arms, vector observations, and $K \le M$ clusters. We propose and analyze the Average Tracking Bandit Online Clustering (ATBOC) algorithm. For multivariate Gaussian distributed arms, we show that ATBOC is asymptotically order-optimal, i.e., the expected sample complexity grows at most twice as fast as that of lower bound in asymptotic regime ($\delta\rightarrow 0$). This upper bound on expected sample complexity is also valid for multivariate subgaussian arms. For single-parameter exponential family distributed arms, we show that ATBOC is asymptotically optimal, i.e., the expected sample complexity growth matches the lower bound in the asymptotic regime. We also propose a computationally more efficient alternatives Lower and Upper Confidence Bound based Bandit Online Clustering Algorithm (LUCBBOC), and Bandit Online Clustering-Elimination (BOC-ELIM). We derive the computational complexity of all the proposed algorithms and also compare the per-sample run time through simulations. The LUCBBOC and BOC-ELIM algorithms require lesser per-sample run time compared to ATBOC and achieve comparable performance. All the proposed algorithms are $\delta$-Probably correct, i.e., the error probability of cluster estimate at the stopping time is upper bounded by $\delta$. We validate the asymptotic optimality guarantees through simulations, and present the comparison of our proposed algorithms with other related work through simulations on both synthetic and real-world datasets.
Event Details
Title: Online Clustering of Data Sequences with Bandit Information
Date: April 16, 2026 at 04:00 PM
Venue: ESB 234 (Malaviya Hall)
Speaker: Mr. Dinesh Chandran (EE22D200)
Guide: Dr. Srikrishna Bhashyam
Type: PHD seminar