Model selection and estimator selection for statistical learning
Tutorial given at the Scuola Normale Superiore di Pisa in February 2011. Since this tutorial mostly comes from the "Cours Peccot" lectures I gave in January 2011 at the Collège de France (Paris), you can also have a look at my lecture notes for the Cours Peccot (in French).
- First lecture: Statistical learning [slides]
- the statistical learning problem
- examples: prediction, regression, classification, density estimation
- estimators: definition, consistency, examples
- universal learning rates and No Free Lunch Theorems [1]
- the estimator selection paradigm, bias-variance decomposition of the risk (a numerical sketch follows this lecture's outline)
- data-driven selection procedures and the unbiased risk estimation principle
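To make the bias-variance decomposition above concrete, here is a minimal numerical sketch, not taken from the lecture notes: the simulated data, the histogram (regressogram) models, and all names are illustrative assumptions. It shows the approximation error decreasing with the model dimension D while the estimation error grows roughly like sigma^2 * D / n.

```python
# Illustrative sketch only: bias-variance trade-off for histogram (regressogram)
# estimators of increasing dimension D, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 500, 0.3
x = rng.uniform(0, 1, n)
target = lambda t: np.sin(2 * np.pi * t)          # true regression function (assumption)
y = target(x) + sigma * rng.normal(size=n)

grid = np.linspace(0, 1, 2000)                    # fine grid to approximate the L2 risk

for D in (2, 5, 10, 50, 200):                     # model = piecewise constant on D bins
    edges = np.linspace(0, 1, D + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, D - 1)
    means = np.array([y[idx == j].mean() if np.any(idx == j) else 0.0 for j in range(D)])
    gidx = np.clip(np.digitize(grid, edges) - 1, 0, D - 1)
    risk = np.mean((means[gidx] - target(grid)) ** 2)      # excess risk on the grid
    # the bias term decreases with D, the variance term is of order sigma^2 * D / n
    print(f"D={D:4d}   risk={risk:.4f}   variance proxy={sigma**2 * D / n:.4f}")
```

Running it shows the risk first decreasing, then increasing once the variance term dominates: this is the trade-off that estimator selection must resolve.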
- Second lecture: Model selection for least-squares regression [slides]
- ideal penalty, Mallows' Cp (a code sketch follows this lecture's outline)
- oracle inequality for Cp (i.e., non-asymptotic optimality of the corresponding model selection procedure), corresponding learning rates [2]
- the variance estimation problem
- minimal penalties and data-driven calibration of penalties: the slope heuristics [3,4,13,14,15]
- algorithmic and other practical issues [5]
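As an informal illustration of Mallows' Cp and the unbiased risk estimation principle, here is a hedged sketch assuming the noise level sigma is known, so that the variance estimation problem mentioned above is left aside; the regressogram models, the simulated data, and all names are assumptions, not the exact setting of the lecture.

```python
# Illustrative sketch only: Mallows' Cp selection among regressogram models,
# assuming the noise level sigma is known.
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 400, 0.25
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + sigma * rng.normal(size=n)

def empirical_risk(D):
    """Least-squares empirical risk of the histogram estimator with D bins."""
    edges = np.linspace(0, 1, D + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, D - 1)
    fitted = np.array([y[idx == j].mean() if np.any(idx == j) else 0.0
                       for j in range(D)])[idx]
    return np.mean((y - fitted) ** 2)

dims = np.arange(1, 101)
# Cp criterion = empirical risk + 2 * sigma^2 * D / n, an unbiased estimate of the risk
# up to an additive constant that does not depend on the model
crit = [empirical_risk(D) + 2 * sigma**2 * D / n for D in dims]
print("dimension selected by Cp:", dims[int(np.argmin(crit))])
```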
- Third lecture: Linear estimator selection for least-squares regression [6] [slides]
- linear estimators: (kernel) ridge regression, smoothing splines, k-nearest neighbours, Nadaraya-Watson estimators
- bias-variance decomposition of the risk
- the linear estimator selection problem: the CL penalty (see the sketch after this lecture's outline)
- oracle inequality for CL
- data-driven calibration of penalties: a new light on the slope heuristics
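For the CL penalty item above, here is a hedged sketch of selecting among linear estimators: kernel ridge regression with a Gaussian kernel is used as the family of linear estimators, the noise level is assumed known, and the bandwidth, the grid of regularization parameters, and all names are illustrative assumptions.

```python
# Illustrative sketch only: C_L criterion for choosing the regularization parameter
# of kernel ridge regression (a linear estimator: y_hat = A(lambda) y).
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 200, 0.3
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + sigma * rng.normal(size=n)

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1 ** 2))   # Gaussian kernel matrix

best = None
for lam in np.logspace(-6, 0, 13):
    A = K @ np.linalg.inv(K + n * lam * np.eye(n))              # smoothing matrix
    y_hat = A @ y
    # C_L criterion = empirical risk + 2 * sigma^2 * trace(A) / n
    crit = np.mean((y - y_hat) ** 2) + 2 * sigma ** 2 * np.trace(A) / n
    if best is None or crit < best[0]:
        best = (crit, lam)
print("regularization parameter selected by C_L:", best[1])
```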
- Fourth lecture: Resampling and model selection [slides]
- regressograms in heteroscedastic regression: the penalty cannot be a function of the dimensionality of the models [7]
- resampling in statistics: general heuristics, the bootstrap, exchangeable weighted bootstrap [8]
- a case study: estimating the variance by resampling (illustrated after this lecture's outline)
- resampling penalties: why do they work for heteroscedastic regression? Oracle inequality, comparison of the resampling weights [9,13]
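As a small complement to the case study above (estimating a variance by resampling), here is a plain Efron bootstrap sketch; the data-generating distribution, sample size, and number of resamples are arbitrary assumptions made only for illustration.

```python
# Illustrative sketch only: bootstrap estimate of the variance of the empirical mean.
import numpy as np

rng = np.random.default_rng(3)
n = 100
sample = rng.exponential(scale=1.0, size=n)     # an arbitrary non-Gaussian sample

B = 2000                                        # number of bootstrap resamples
boot_means = np.empty(B)
for b in range(B):
    resample = rng.choice(sample, size=n, replace=True)   # Efron's (multinomial) bootstrap
    boot_means[b] = resample.mean()

print("bootstrap variance estimate:", boot_means.var(ddof=1))
print("plug-in estimate var(mean) :", sample.var(ddof=1) / n)
```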
- Fifth lecture: Cross-validation and model/estimator selection [10] [slides]
- cross-validation: principle, main examples
- cross-validation for estimating the prediction risk: bias, variance
- cross-validation for selecting among a family of estimators: main properties, how should the splits be chosen? (a V-fold sketch follows this lecture's outline)
- illustration of the robustness of cross-validation: detecting changes in the mean of a signal with unknown and non-constant variance [11]
- correcting the bias of cross-validation: V-fold penalization, oracle inequality [12]
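To fix ideas on V-fold cross-validation as a selection procedure, here is a hedged sketch choosing the dimension of a regressogram by 5-fold cross-validation; the simulated data, the number of folds, and all names are illustrative assumptions, and the V-fold penalization of [12] is not implemented here.

```python
# Illustrative sketch only: V-fold cross-validation for choosing the dimension
# of a regressogram on simulated data.
import numpy as np

rng = np.random.default_rng(4)
n, V = 300, 5
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
folds = rng.permutation(n) % V                  # random, roughly balanced split into V folds

def fit_predict(xtr, ytr, xte, D):
    """Train a D-bin regressogram on (xtr, ytr) and predict at the points xte."""
    edges = np.linspace(0, 1, D + 1)
    itr = np.clip(np.digitize(xtr, edges) - 1, 0, D - 1)
    means = np.array([ytr[itr == j].mean() if np.any(itr == j) else 0.0 for j in range(D)])
    return means[np.clip(np.digitize(xte, edges) - 1, 0, D - 1)]

def cv_risk(D):
    """Average held-out squared error over the V folds."""
    errs = [np.mean((y[folds == v]
                     - fit_predict(x[folds != v], y[folds != v], x[folds == v], D)) ** 2)
            for v in range(V)]
    return float(np.mean(errs))

dims = np.arange(1, 61)
print(f"dimension selected by {V}-fold CV:", dims[int(np.argmin([cv_risk(D) for D in dims]))])
```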
References:
[1] Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition, volume 31 of Applications of Mathematics (New York). Springer-Verlag, New York, 1996.
[2] Pascal Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003.
[3] Lucien Birgé and Pascal Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields, 138(1-2):33-73, 2007.
[4] Sylvain Arlot and Pascal Massart. Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res., 10:245-279 (electronic), 2009.
[5] Jean-Patrick Baudry, Cathy Maugis, and Bertrand Michel. Slope Heuristics: Overview and Implementation. Technical Report 7223, INRIA, 2010.
[6] Sylvain Arlot and Francis Bach. Data-driven calibration of linear estimators with minimal penalties. Proceedings of NIPS 2009.
[7] Sylvain Arlot. Choosing a penalty for model selection in heteroscedastic regression. Preprint. 2010.
[8] Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall, New York, 1993.
[9] Sylvain Arlot. Model selection by resampling penalization. Electron. J. Stat., 3:557-624 (electronic), 2009.
[10] Sylvain Arlot and Alain Celisse. A survey of cross-validation procedures for model selection. Statist. Surv., 4:40-79, 2010.
[11] Sylvain Arlot and Alain Celisse. Segmentation of the mean of heteroscedastic data via cross-validation. Statistics and Computing, 2010.
[12] Sylvain Arlot. V-fold cross-validation improved: V-fold penalization. arXiv:0802.0566v2.
[13] Matthieu Lerasle. Optimal model selection in density estimation. hal-00422655, 2009.
[14] Adrien Saumard. Nonasymptotic quasi-optimality of AIC and the slope heuristics in maximum likelihood estimation of density using histogram models. hal-00512310, 2010.
[15] Adrien Saumard. The slope heuristics in heteroscedastic regression. hal-00512306, 2010.
Abstract
Prediction is among the major problems in statistical learning. Given a sample of pairs of random variables (X_i, Y_i), i = 1, ..., n, the goal is to "predict" the value of Y (the variable of interest) from X alone (the explanatory variables).
Many estimators have been proposed for prediction, and each estimator usually depends on one or several parameters whose calibration is crucial for optimizing its statistical performance.
These lectures will address the problem of data-driven estimator selection, which includes the calibration problem as well as model selection (e.g., which variables in X are the most useful for predicting Y?).
Two main approaches will be considered: penalization of the empirical risk (with deterministic or with data-driven penalties), and (V-fold) cross-validation.
We will focus on two main kinds of questions:
Which theoretical results can be proved for these selection procedures, and how can these results help practitioners choose a selection procedure for a given statistical problem?
How can theory help design new selection procedures that improve on existing ones?
Back to index