GT Celeste
Maximilien Dreveton - GdT Céleste
June 2024
Speaker: Maximilien Dreveton
Institution: EPFL
Time: 15:30 - 16:30
Location: LMO Orsay, room 3L8

Title: Minimax Rates of Clustering Mixture Models and Stochastic Block Models


Clustering and community detection are key unsupervised machine learning tasks. Clustering groups data points in Euclidean space, while community detection identifies groups among the vertices of a graph. The performance of clustering methods is typically analyzed under mixture models, and that of community detection methods under stochastic block models (SBMs).

The minimax error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models depends on signal-to-noise ratios, while for homogeneous SBMs it involves Rényi divergences. In this talk, we connect these results, showing that the error rate for clustering any mixture model can be expressed in terms of the Chernoff information. Additionally, the rate for recovering blocks in an SBM can be reduced to a binary hypothesis test on the distribution of a random vector with independent components, highlighting the role of the Rényi divergence in homogeneous models and of the Chernoff-Hellinger divergence in inhomogeneous models. This offers a unified perspective on the error rates in clustering and community detection.
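
As a rough numerical illustration of the Chernoff information mentioned in the abstract, the sketch below evaluates the standard definition C(p, q) = -min over t in (0, 1) of log sum_x p(x)^t q(x)^(1-t) for two discrete distributions. The function names, the grid search over t, and the example distributions are assumptions chosen for illustration; they are not taken from the talk.

```python
import math


def chernoff_coefficient(p, q, t):
    """Return sum_x p(x)^t * q(x)^(1-t) for two discrete distributions.

    Terms where either probability is zero contribute nothing for t in (0, 1).
    """
    return sum(
        (pi ** t) * (qi ** (1 - t))
        for pi, qi in zip(p, q)
        if pi > 0 and qi > 0
    )


def chernoff_information(p, q, grid_size=1001):
    """Approximate the Chernoff information by a grid search over t in (0, 1).

    Illustrative only: a grid search is a simple stand-in for a proper
    one-dimensional optimizer.
    """
    ts = [i / (grid_size - 1) for i in range(1, grid_size - 1)]
    return max(-math.log(chernoff_coefficient(p, q, t)) for t in ts)


# Hypothetical example: two "mirror image" distributions on two outcomes.
p = [0.2, 0.8]
q = [0.8, 0.2]

# By symmetry the optimum is at t = 1/2, so the value is
# -log(2 * sqrt(0.2 * 0.8)) = -log(0.8), approximately 0.2231.
print(round(chernoff_information(p, q), 4))
```

For two Gaussian components with means mu_1, mu_2 and common covariance sigma^2 I, the same quantity has the closed form ||mu_1 - mu_2||^2 / (8 sigma^2), which is how a Chernoff-information rate reduces to a signal-to-noise ratio in the Gaussian case.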

