The practice of machine learning has developed successfully over the last decades with the design of many efficient algorithms (e.g. boosting methods, SVM, deep neural networks) for carrying out various tasks such as classification, regression or clustering. It is supported by a sound probabilistic theory, essentially relying on the theory of empirical processes, i.e. collections of averages of independent and identically distributed random variables. In the Big Data era, we increasingly face situations where massive datasets contain geolocated, spatially dependent observations. In this context, the usual theory of statistical learning does not provide any theoretical guarantee of the generalization capacity of rules learnt from data. We consider here the simple kriging task, the flagship problem in geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations. The connection of this minimization problem with kernel ridge regression is highlighted, as well as the difficulties faced when trying to establish the generalization capacity of empirical risk minimizers. Particular attention is paid to a seminal example: the isotropic stationary Gaussian case. There, data collection is assumed to be performed at every point of a regular grid spanning the spatial domain $S$, which is assumed to be compact (in-fill setup). In this specific context, nonasymptotic bounds are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer. Numerical experiments illustrate the validity of these bounds and the role of each technical assumption. Though these assumptions may be restrictive, this result shows that simple kriging can be considered not only through the common parametric geostatistical modelling approach, but also from a predictive perspective, in a sound validity framework.
This paves the way for further theoretical developments in statistical learning based on spatial data.