Journées de Géostatistique - Sciencesconf.org

Les Journées de Géostatistique 2021

16-17 Sept. 2021, Fontainebleau (France)

sciencesconf.org:geostat21:364040

A Statistical Learning View of Simple Kriging. Connection with Kernel Ridge Regression

Emilia Siviero 1, Emilie Chautru 2, Stéphan Clémençon 3, 4

1 : Télécom Paris

LTCI, Télécom ParisTech

19 Place Marguerite Perey 91120 Palaiseau - France

2 : Mines ParisTech, centre de Géosciences, équipe Géostatistique

MINES ParisTech, PSL Research University

3 : Telecom Paris

Télécom ParisTech

4 : Laboratoire Traitement et Communication de l'Information [Paris] (LTCI)

Télécom ParisTech, CNRS : UMR5141

CNRS LTCI Télécom ParisTech 46 rue Barrault F-75634 Paris Cedex 13 - France

The practice of machine learning has developed successfully over the last decades, with the design of many efficient algorithms (e.g. boosting methods, SVM, deep neural networks) for carrying out various tasks such as classification, regression or clustering. It is supported by a sound probabilistic theory, essentially relying on the theory of empirical processes, i.e. collections of averages of independent and identically distributed random variables. In the Big Data era, we face situations where massive datasets contain geolocated, spatially dependent observations. In this context, the usual theory of statistical learning does not provide any theoretical guarantee on the generalization capacity of rules learnt from data. We consider here the simple kriging task, the flagship problem in geostatistics: the values of a square-integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations. The connection of this minimization problem with kernel ridge regression is highlighted, as well as the difficulties faced when trying to establish the generalization capacity of empirical risk minimizers. Particular attention is paid to a seminal example: the isotropic stationary Gaussian case. There, data collection is assumed to be performed at every point of a regular grid spanning the spatial domain $S$, which is assumed to be compact (in-fill setup). In this specific context, nonasymptotic bounds are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer. Numerical experiments illustrate the validity of these bounds and the role of each technical assumption. Though these assumptions may be restrictive, this result shows that simple kriging can be considered not only through the common parametric geostatistical modelling approach, but also from a predictive perspective, within a sound validity framework. This paves the way for further theoretical developments in statistical learning based on spatial data.
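To make the connection with kernel ridge regression concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a zero-mean field and a known isotropic exponential covariance, whereas the abstract deals with an unknown covariance replaced by a plug-in estimate. The names `exp_cov`, `simple_kriging`, `sill`, `scale` and `ridge`, and the 5 x 5 grid, are hypothetical choices made for illustration. With `ridge=0` the function returns the simple kriging predictor $c_n(s)^\top \Sigma_n^{-1} X_n$; with `ridge>0` it has the form of a kernel ridge regression predictor whose kernel is the covariance function.

```python
import numpy as np


def exp_cov(a, b, sill=1.0, scale=0.3):
    """Isotropic exponential covariance c(h) = sill * exp(-h / scale).

    This covariance model and its parameters are assumptions of the sketch.
    """
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-h / scale)


def simple_kriging(sites, values, targets, cov=exp_cov, ridge=0.0):
    """Predict a zero-mean field at `targets` from `values` observed at `sites`.

    ridge = 0 : simple kriging predictor c_n(s)^T Sigma_n^{-1} X_n.
    ridge > 0 : same formula with Sigma_n + ridge * I, i.e. the form of a
                kernel ridge regression predictor with kernel `cov`.
    """
    gram = cov(sites, sites) + ridge * np.eye(len(sites))
    weights = np.linalg.solve(gram, values)
    return cov(targets, sites) @ weights


# Observation sites: every point of a regular 5 x 5 grid on [0, 1]^2.
grid_1d = np.linspace(0.0, 1.0, 5)
sites = np.array([(x, y) for x in grid_1d for y in grid_1d])

# A single realization of a zero-mean Gaussian field at the grid points.
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(exp_cov(sites, sites))
values = chol @ rng.standard_normal(len(sites))

# Predictions at two unobserved locations, without and with regularization.
targets = np.array([[0.10, 0.10], [0.50, 0.62]])
pred = simple_kriging(sites, values, targets)
pred_ridge = simple_kriging(sites, values, targets, ridge=0.1)
```

Without regularization the predictor interpolates the data exactly at the observation sites; a positive `ridge` shrinks the fitted values towards the (zero) mean, which is also how a nugget effect acts in this formula.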

Type : oral
Topics : Machine Learning