Abstract
The Partial Least Squares (PLS) algorithm has, through a long line of applications, proven to be a good alternative for constructing predictors. The algorithm was developed in a non-statistical area of research (Wold, 1975, 1982), and the statistical properties of the regular sample PLS estimator in linear regression (e.g. Martens and Naes, 1989) have been hard to derive. However, a better understanding of its properties has been gained through studies of the population PLS model (e.g. Helland, 1990; Helland and Almøy, 1994). For instance, the population model has been shown to stop automatically once all the so-called relevant components have been extracted. The sample PLS algorithm can be regarded as a plug-in procedure in which population parameters are replaced by sample estimates. Theoretical arguments have been given that the sample PLS solution cannot be optimal for prediction, since this estimator falls outside the parameter space of the theoretical PLS algorithm (Helland, 2001). We therefore aim at developing a near-optimal predictor from the population PLS model, as an alternative to the sample PLS predictor. It may be shown that the best equivariant estimator under certain symmetry assumptions is the Bayes estimator with certain invariant priors. The Bayes estimator is found using Markov chain Monte Carlo methods, and the method is illustrated on small simulated data sets. The preliminary results show that the Bayes PLS predictor performs at least at the level of the sample PLS predictor, but that it is much less dependent on the choice of the number of latent components. This is a major advantage over not only sample PLS but also principal component regression (PCR) and ridge regression, for which prediction performance depends heavily on tuning parameters (number of components or ridge penalty). Also, the number of components needed for good prediction appears in general to be lower for Bayes PLS than for sample PLS and PCR.
Another benefit of Bayes PLS is that it provides an estimate of the error variance for linear models even when the number of variables is larger than the number of samples. Such an estimate is not readily available for the sample PLS estimator.
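
As a rough illustration of the plug-in view of sample PLS mentioned above, the following minimal sketch computes a single-response PLS regression vector from the sample covariance quantities, using the well-known Krylov-space characterisation of PLS. The function name `sample_pls_coefficients` and its arguments are assumptions made for this example; the sketch does not implement the Bayes PLS estimator or its MCMC sampler, and it is not taken from the work summarised here.

```python
import numpy as np


def sample_pls_coefficients(X, y, n_components):
    """Plug-in PLS regression vector for a single response (illustrative sketch).

    Population quantities (predictor covariance and predictor-response
    covariance) are replaced by their sample estimates S and s. The
    regression vector is then restricted to the Krylov space spanned by
    s, S s, ..., S^(a-1) s, where a = n_components.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Centre the data so that no intercept is needed.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    S = Xc.T @ Xc / (n - 1)  # sample covariance matrix of the predictors
    s = Xc.T @ yc / (n - 1)  # sample covariance between predictors and response

    # Build the Krylov basis column by column, rescaling each vector
    # to unit length so the columns stay on a comparable scale.
    K = np.empty((p, n_components))
    v = s.copy()
    for j in range(n_components):
        K[:, j] = v / np.linalg.norm(v)
        v = S @ K[:, j]

    # Orthonormalise the basis (same span) for numerical stability.
    Q, _ = np.linalg.qr(K)

    # Regression vector restricted to the Krylov space.
    return Q @ np.linalg.solve(Q.T @ S @ Q, Q.T @ s)
```

With as many components as predictors (and more samples than predictors) the sketch reproduces ordinary least squares; with fewer components it yields a regularised solution whose quality depends on the chosen number of components, which is the tuning sensitivity the abstract refers to.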