African Journal of Agricultural Research

  • Abbreviation: Afr. J. Agric. Res.
  • Language: English
  • ISSN: 1991-637X
  • DOI: 10.5897/AJAR
  • Start Year: 2006

Full Length Research Paper

Optimal sample size and data arrangement method in estimating correlation matrices with lesser collinearity: A statistical focus in maize breeding

Tiago Olivoto
  • Department of Agronomic and Environmental Sciences, Federal University of Santa Maria Frederico Westphalen, Rio Grande do Sul, Brazil.
Maicon Nardino
  • Department of Mathematics and Statistics, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil.
Ivan Ricardo Carvalho
  • Plant Genomics and Breeding Center, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil.
Diego Nicolau Follmann
  • Agronomy Department, Federal University of Santa Maria, Santa Maria, Rio Grande do Sul, Brazil.
Mauricio Ferrari
  • Plant Genomics and Breeding Center, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil.
Alan Junior de Pelegrin
  • Plant Genomics and Breeding Center, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil.
Vinicius Jardel Szareski
  • Department of Crop Science, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil.
Antônio Costa de Oliveira
  • Plant Genomics and Breeding Center, Federal University of Pelotas, Capão do Leão, Rio Grande do Sul, Brazil.
Braulio Otomar Caron
  • Department of Agronomic and Environmental Sciences, Federal University of Santa Maria Frederico Westphalen, Rio Grande do Sul, Brazil.
Velci Queiróz de Souza
  • Federal University of Pampa, Dom Pedrito, Rio Grande do Sul, Brazil.


  • Received: 06 October 2016
  • Accepted: 14 December 2016
  • Published: 12 January 2017

Abstract

Information on data arrangement methods and optimal sample size for estimating the Pearson correlation coefficient (r) among maize traits is still limited. Furthermore, some data arrangement methods currently in use may increase multicollinearity in multiple regression analysis. This study investigated the statistical behavior of r and the multicollinearity of correlation matrices among maize traits under different data arrangement scenarios and sample sizes. Data from 45 treatments [15 simple maize hybrids (Zea mays L.) grown in three locations] were used. Eleven traits were assessed and three datasets (scenarios) were formed: (1) all sampled observations (plants), n = 900; (2) averages of five plants per plot, n = 180; and (3) treatment averages, n = 45. In each scenario, 1,000 estimates of r were obtained for each of 60 sample sizes by bootstrap resampling with replacement, and confidence intervals (CI) were estimated. In total, 180 correlation matrices were estimated and their condition numbers (CN) calculated. Using plot averages and treatment averages overestimated r by up to 24 and 34%, respectively, and increased the CN of the matrices by 24 and 131%, respectively. Trait pairs with high r require fewer plants, since the CI width is inversely proportional to the magnitude of r. Two hundred and ten plants are sufficient to estimate r with a 95% CI narrower than 0.30.

Key words: Average values, bootstrap, confidence intervals, sample tracking, Zea mays L.
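
The bootstrap procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name bootstrap_r_ci, the use of a percentile interval, and the synthetic trait values are assumptions made only for illustration. It draws plants with replacement at a given sample size, recomputes r each time, and reports how the 95% interval narrows as the sample size grows.

```python
import numpy as np

def bootstrap_r_ci(x, y, n, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for Pearson's r at sample size n (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n plants with replacement and recompute r on the resample.
        idx = rng.integers(0, x.size, size=n)
        estimates[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Synthetic stand-ins for two correlated traits measured on 900 plants
# (hypothetical values, not the study's data).
rng = np.random.default_rng(42)
plant_height = rng.normal(size=900)
ear_height = 0.8 * plant_height + 0.6 * rng.normal(size=900)

# CI width at a small and a larger sample size.
widths = {}
for n in (30, 210):
    lower, upper = bootstrap_r_ci(plant_height, ear_height, n)
    widths[n] = upper - lower
    print(f"n = {n}: 95% CI width = {widths[n]:.3f}")
```

Repeating the call over a grid of sample sizes and trait pairs gives the kind of width-versus-n curve from which a sufficient sample size could be read off.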

Abbreviations

ASO, all sampled observations; AVP, average values of plot; AVT, average values of treatments; CD, cob diameter; CD/ED, cob diameter/ear diameter ratio; CL, cob length; ED, ear diameter; EH, ear height; EL, ear length; NKR, number of kernels per row; NRE, number of rows per ear; PH, plant height; TKW, thousand-kernel weight; TNK, total number of kernels per ear.
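
The effect of the three data arrangements (ASO, AVP, AVT) on the condition number can be sketched in Python as follows. Two assumptions are made here: CN is taken as the ratio of the largest to the smallest eigenvalue of the correlation matrix, which is a common definition but is not stated in the text above, and the data are synthetic, with each trait built as a shared treatment effect plus independent plant-level noise. The sketch shows only the direction of the effect, not the magnitudes reported in the study.

```python
import numpy as np

def condition_number(data):
    """Largest / smallest eigenvalue of the trait correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # returned in ascending order
    return eigvals[-1] / eigvals[0]

# Hypothetical layout: 45 treatments x 4 plots x 5 plants, 4 traits.
rng = np.random.default_rng(1)
n_treat, n_plots, n_plants, n_traits = 45, 4, 5, 4
treatment_effect = rng.normal(size=(n_treat, 1, 1, 1))  # shared by all traits
plant_noise = rng.normal(size=(n_treat, n_plots, n_plants, n_traits))
plants = treatment_effect + plant_noise

cn = {
    # ASO: every sampled plant is one row (n = 900).
    "ASO": condition_number(plants.reshape(-1, n_traits)),
    # AVP: average of the five plants in each plot (n = 180).
    "AVP": condition_number(plants.mean(axis=2).reshape(-1, n_traits)),
    # AVT: average over all plants of each treatment (n = 45).
    "AVT": condition_number(plants.mean(axis=(1, 2))),
}
for scenario, value in cn.items():
    print(f"{scenario}: CN = {value:.1f}")
```

Averaging removes plant-level noise but keeps the shared treatment effect, so correlations among traits rise and the smallest eigenvalue shrinks, which inflates CN under this toy model.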