Scientific Research and Essays

  • Abbreviation: Sci. Res. Essays
  • Language: English
  • ISSN: 1992-2248
  • DOI: 10.5897/SRE
  • Start Year: 2006
  • Published Articles: 2768

Full Length Research Paper

Compared application of the new OPLS-DA statistical model versus partial least squares regression to manage large numbers of variables in an injury case-control study

Homayoun Sadeghi-Bazargani1,2*, Shrikant I. Bangdiwala2,3, Kazem Mohammad4, Hemmat Maghsoudi5 and Reza Mohammadi2  
1Neuroscience Research Center, Statistics and Epidemiology Department, Faculty of Health and Nutrition, Tabriz University of Medical Sciences, Tabriz, Iran. 2PHS Department, Karolinska Institute, Stockholm, Sweden. 3Department of Biostatistics, University of North Carolina at Chapel Hill, USA. 4Epidemiology and Biostatistics Department, Tehran University of Medical Sciences, Tehran, Iran. 5Injury Epidemiology and Prevention Research Center, Surgery Department, Tabriz University of Medical Sciences, Tabriz, Iran.  

  •  Accepted: 07 July 2011
  •  Published: 19 September 2011

Abstract

The use of modern statistical methods to overcome the known pitfalls of classical regression models in the analysis of large numbers of highly correlated variables has increased considerably in recent years. Statisticians in the fields of chemometrics and OMICS research have developed a new method called orthogonal projections to latent structures (OPLS). In comparison with regular partial least squares (PLS) regression, OPLS provides a simpler model with the additional advantage that the orthogonal variation can be analyzed separately. Use of the OPLS model has spread beyond its fields of origin, but it has not yet been applied in epidemiology, which is a wide field of research. In public health and clinical research, there are situations in which large numbers of correlated variables need to be modeled. The authors successfully applied OPLS discriminant analysis (OPLS-DA) to model large numbers of variables in a case-control study and compared it with discriminant analysis done by partial least squares regression (PLS-DA). Prior to fitting the models, the dataset was split into two parts: a training set and a prediction set. Models fitted on the training set were then tested for validity in the prediction set. OPLS-DA was compared with PLS-DA for model fit, diagnostics and model interpretability. Both models suited the data, but OPLS-DA was preferable. The authors encourage the use of these methods to increase study power and statistical validity in epidemiology and similar settings in which large numbers of correlated variables need to be modeled.

Key words: Partial least squares regression, orthogonal projections to latent structures, logistic regression, multicollinearity, injury epidemiology, burns.
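
The workflow summarized in the abstract (split the data into a training set and a prediction set, fit PLS-DA and OPLS-DA on the training set, then check both on the prediction set) can be illustrated with a minimal sketch. This is not the authors' analysis or software: the data are synthetic, the function names (`pls1_fit`, `opls_filter_fit`, `opls_filter_apply`) are hypothetical, variables are only mean-centered, a single orthogonal component is removed, and the 0.5 classification cutoff is an assumption.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS for a single response; X and y must be centered.
    Returns the regression coefficient vector."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        c = y @ t / tt                  # y loading
        X -= np.outer(t, p)             # deflate X
        y -= c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def opls_filter_fit(X, y):
    """Estimate one y-orthogonal component (OPLS-style filter)."""
    w = X.T @ y
    w /= np.linalg.norm(w)
    t = X @ w
    p = X.T @ t / (t @ t)
    w_o = p - (w @ p) * w               # part of the loading orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o
    p_o = X.T @ t_o / (t_o @ t_o)
    return w_o, p_o

def opls_filter_apply(X, w_o, p_o):
    """Remove the orthogonal component from (new or training) data."""
    return X - np.outer(X @ w_o, p_o)

# Synthetic stand-in for a case-control dataset with many correlated variables.
rng = np.random.default_rng(0)
n, n_vars = 300, 40
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, n_vars)) + 0.5 * rng.normal(size=(n, n_vars))
y = (latent[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)  # 1 = case, 0 = control

# Split into a training set and a prediction set.
idx = rng.permutation(n)
train, test = idx[:200], idx[200:]
x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
Xtr, ytr = X[train] - x_mean, y[train] - y_mean
Xte = X[test] - x_mean                  # center with training means only

# PLS-DA with two components.
b_pls = pls1_fit(Xtr, ytr, 2)
acc_pls = np.mean((Xte @ b_pls + y_mean > 0.5) == (y[test] == 1))

# OPLS-DA: one orthogonal component removed, then one predictive component.
w_o, p_o = opls_filter_fit(Xtr, ytr)
t_ortho = Xtr @ w_o                     # orthogonal scores, uncorrelated with y
b_opls = pls1_fit(opls_filter_apply(Xtr, w_o, p_o), ytr, 1)
Xte_f = opls_filter_apply(Xte, w_o, p_o)
acc_opls = np.mean((Xte_f @ b_opls + y_mean > 0.5) == (y[test] == 1))

print(f"PLS-DA prediction-set accuracy:  {acc_pls:.2f}")
print(f"OPLS-DA prediction-set accuracy: {acc_opls:.2f}")
```

In this sketch the orthogonal scores `t_ortho` carry variation in the predictors that is unrelated to case/control status, which is the part OPLS allows to be inspected separately from the predictive component.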