Strategies for selecting soybean genotypes using mixed models and multivariate approach

The objective of this study was to select soybean genotypes with favorable agronomic traits, derived from crosses between conventional and transgenic Roundup Ready (RR) lines, by jointly using Restricted Maximum Likelihood/Best Linear Unbiased Prediction (REML/BLUP), factor analysis and principal component analysis, during the 2013/2014 growing season. Three agronomic selection processes were identified that discriminate genotypes with more specific properties. Process 1 (insertion height of first pod, HFP; number of branches, NB; number of pods, NP; number of nodes, NN; and grain yield, GY) was efficient in selecting earlier, smaller genotypes with good yield components and lodging resistance. The combination of mixed models via REML/BLUP with multivariate statistics based on factor analysis helped to select suitable, high-performing genotypes to carry forward in the soybean breeding program.


INTRODUCTION
Soybean (Glycine max (L.) Merrill), with its expanding commercial crop areas, has become very important in the world scenario (Cavalcante et al., 2010). The species has a complex production, storage, processing and marketing structure wherever it is grown on a large scale (Rezende and Carvalho, 2007).
In Brazil, soybean growth and improvements in productive capacity are tied to advances in breeding programs and to environmental conditions (Klahold et al., 2006).
Genetic improvement has contributed to increased soybean production, with genetic gains resulting from traditional methods involving hybridization and subsequent phenotypic selection. Currently, these methods are combined with the use of transgenics and molecular markers (Peluzio et al., 2009).
One of the most important features of soybean breeding programs is searching for cultivars with favorable traits to obtain significant productivity gains. However, genetic gains have become increasingly difficult to achieve for species submitted to long selection processes (Maia et al., 2009).
*Corresponding author. E-mail: bastos.andrea@yahoo.com. Tel: +55 16 3209 2666. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
More accurate statistical methodologies should be employed to obtain highly effective estimates of the genetic gain expected in each selection cycle. Plant breeding is also strongly linked to statistics: in addition to selection methods, good field trials and adequate genetic designs, the recent trend is to use more refined statistical procedures for a more detailed study of the mean and variance components of a trait (Maia et al., 2009).
For many years, plant breeding programs depended on selecting genotypes by analyzing each agronomic variable individually, estimating genetic parameters, applying selection indices for traits and analyzing the environments to check the genotype × environment interaction. In recent years, multivariate methods and mixed models have become more important due to advances in computer software, and are being applied to evaluate genetic divergence (Costa et al., 2006; Oliveira et al., 2008; Bizari et al., 2014), select genotypes and progenies (Vianna et al., 2013; Dallastra et al., 2014), study adaptability and stability of genotypes (Maia et al., 2006; Mendonça et al., 2007; Borges et al., 2010b; Gomez et al., 2014), and estimate genotypic values and study genetic parameters (Duarte et al., 2001; Lopes et al., 2008).
In view of the positive aspects of mixed models and multivariate analyses, this study aimed to select superior soybean genotypes originating from crosses between conventional and transgenic Roundup Ready® (RR) lines, using REML/BLUP, factor analysis and principal component analysis.

Genetic material, experimental site and agronomic traits
This study evaluated soybean segregating populations that resulted from crosses between conventional lines of the College of Agricultural and Veterinary Sciences/UNESP, Jaboticabal, which are widely adapted, and commercial cultivars carrying the RR gene of the MSOY RR group.
The trial evaluated 202 soybean genotypes of the F6 generation during the 2013/2014 growing season at the Experimental Farm of Education, Research and Production (FEPE) of the College of Agricultural and Veterinary Sciences/UNESP, Jaboticabal, SP.
The experimental design was Federer's augmented blocks, with 13 blocks; the plots were 5 m long rows of plants spaced 0.45 m apart. Four standard cultivars were used as additional checks, two conventional ('CODETEC-216' and 'Vmax') and two carriers of the RR gene ('BMX-Força RR' and BRS 'Valiosa RR').
The following agronomic traits were evaluated in six plants per plot, from the beginning of flowering (R1) to full flowering (R2):
a) Days to flowering (DF) - number of days elapsed from plant emergence until 50% of the flowers had opened;
b) Plant height at flowering (PHF) - the distance between plant insertion in the soil and the apex of the main stem, in centimeters (cm).
The following traits were evaluated at full maturity (R8):
a) Days to maturity (DM) (Fehr and Caviness, 1977) - period elapsed between sowing and the date when 50% of the plants displayed 95% of mature pods;
b) Plant height at maturity (PHM) - distance measured on the stem between plant insertion in the soil and the insertion of the uppermost pod (cm);
c) Insertion height of first pod (HFP) - distance between the soil surface and the insertion of the first pod (cm);
d) Lodging (LD) - evaluated by a visual score, ranging from 1 (all plants standing) to 5 (all plants lodged);
e) Agronomic value (AV) - assessed by a visual score, ranging from 1 (plants with poor agronomic traits) to 5 (plants with optimal agronomic traits). The scores evaluated a set of visual adaptive traits: plant architecture, number of filled pods, vigor and plant health, premature pod threshing and leaf retention at maturity;
f) Number of branches per plant (NB) - total number of branches attached to the main stem of the plant;
g) Number of nodes (NN) - total number of internodes per plant;
h) Number of pods (NP) - total number of pods with formed seeds per plant;
i) Grain yield (GY) - grain weight per individual plant obtained after plant harvesting, processing and drying of the grains (up to 13% moisture), expressed as grams per plant (g plant⁻¹);
j) Hundred seeds (grains) weight (HSW) - the average weight of four samples of one hundred seeds, determined using a precision balance (1 g).

Estimation of genetic parameters
Genetic parameters were estimated by restricted maximum likelihood (REML), and the genotypic means (F6 generation) were adjusted and predicted by Best Linear Unbiased Prediction (BLUP). In mixed model analysis with unbalanced data, the model effects are not tested via F tests, as is done in the analysis of variance. In this case, for the random effects, the recommended test is the likelihood ratio test (LRT) (Resende, 2007b). The LRT was used to evaluate the significance of each trait, judged against the chi-square distribution with one degree of freedom at 5% and 1% probability (Nelder and Wedderburn, 1972). Considering Federer's augmented block design, the data were analyzed according to the statistical model (Resende, 2007b):

$$y = Xf + Zg + Wb + e$$

where y is the data vector; f is the vector of fixed effects (general mean); g is the vector of genotypic effects (assumed to be random); b is the vector of block environmental effects (assumed to be random); e is the vector of errors or residuals (random); and X, Z and W are the incidence matrices for f, g and b, respectively.
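The distributions and structures of means and variances are given below, according to Barreto and Resende (2011):

$$E(y) = Xf; \quad g \sim N(0, I\sigma^2_g); \quad b \sim N(0, I\sigma^2_b); \quad e \sim N(0, I\sigma^2_e); \quad Var(y) = ZZ'\sigma^2_g + WW'\sigma^2_b + I\sigma^2_e$$

The mixed model equations for the adopted model are (Barreto and Resende, 2011):

$$\begin{bmatrix} X'X & X'Z & X'W \\ Z'X & Z'Z + I\lambda_1 & Z'W \\ W'X & W'Z & W'W + I\lambda_2 \end{bmatrix} \begin{bmatrix} \hat{f} \\ \hat{g} \\ \hat{b} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \\ W'y \end{bmatrix}$$

where $\lambda_1 = \sigma^2_e/\sigma^2_g$ and $\lambda_2 = \sigma^2_e/\sigma^2_b$; $\sigma^2_g$ = genotypic variance; $\sigma^2_b$ = variance between blocks; $\sigma^2_e$ = residual variance between plots.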
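As an illustration of how mixed model equations of this kind yield BLUPs, the following is a minimal sketch in Python with NumPy. It assumes the model has the form y = Xf + Zg + Wb + e with a general mean as the only fixed effect, and that the variance ratios are already known. The toy data, the function name `solve_mme` and the ratio values are hypothetical and are not taken from the study, where REML estimated the variance components.

```python
import numpy as np

def solve_mme(y, X, Z, W, lam_g, lam_b):
    """Solve Henderson-type mixed model equations for y = Xf + Zg + Wb + e.

    lam_g and lam_b are the variance ratios sigma2_e/sigma2_g and
    sigma2_e/sigma2_b, assumed known here (REML would estimate them).
    """
    p, q_g, q_b = X.shape[1], Z.shape[1], W.shape[1]
    lhs = np.block([
        [X.T @ X, X.T @ Z, X.T @ W],
        [Z.T @ X, Z.T @ Z + lam_g * np.eye(q_g), Z.T @ W],
        [W.T @ X, W.T @ Z, W.T @ W + lam_b * np.eye(q_b)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y, W.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:p + q_g], sol[p + q_g:]

# Hypothetical toy data: one check (genotype 0) repeated in both blocks and
# three unreplicated test lines (genotypes 1-3), as in an augmented design.
y = np.array([10.0, 12.0, 11.0, 14.0, 9.0])
genotype = np.array([0, 1, 0, 2, 3])
block = np.array([0, 0, 1, 1, 1])

X = np.ones((len(y), 1))   # general mean
Z = np.eye(4)[genotype]    # genotype incidence matrix
W = np.eye(2)[block]       # block incidence matrix

f_hat, g_hat, b_hat = solve_mme(y, X, Z, W, lam_g=2.0, lam_b=4.0)
blup_means = f_hat[0] + g_hat  # predicted genotypic values (mean + BLUP)
```

The larger the ratio `lam_g`, the more strongly the genotypic effects are shrunk toward zero, which is why low heritability leads to predicted genotypic means that differ little from the general mean.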
Statistical analysis of the mixed models was performed using the MIXED procedure (PROC MIXED) of the SAS software (SAS Institute, 2011).
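The likelihood ratio test used in this section can be sketched as follows. This is a minimal Python example that assumes the LRT statistic is computed as the deviance difference between a model without and a model with the genotypic effect, and that it is compared with a chi-square distribution with one degree of freedom. The log-likelihood values and function names are hypothetical.

```python
import math

def lrt_statistic(loglik_full, loglik_reduced):
    """LRT = -2 * (logL_reduced - logL_full), i.e. the deviance difference."""
    return -2.0 * (loglik_reduced - loglik_full)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

def classify(lrt):
    """Label the test result at the 1% and 5% probability levels."""
    p_value = chi2_sf_1df(lrt)
    if p_value < 0.01:
        return "significant at 1%"
    if p_value < 0.05:
        return "significant at 5%"
    return "not significant"

# Hypothetical REML log-likelihoods with and without the genotypic effect.
lrt = lrt_statistic(loglik_full=-512.3, loglik_reduced=-515.8)
result = classify(lrt)
```

An LRT equal to zero, as reported later for some traits, means that adding the genotypic effect did not improve the likelihood at all, so the test is non-significant.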

Selection of genotypes
The genotypes were selected using exploratory multivariate statistical techniques because of the dependence structure in the original set of variables. Factor analysis was performed with factors extracted by the principal component method from the correlation matrix, followed by varimax rotation (Manly, 2008). Each process is identified within a factor by the traits with the most representative loadings (greater than 0.50). The processes identified in the factors are called agronomic selection processes. The traits entered into the factor analysis were the genotypic means predicted by BLUP for DF, PHF, DM, PHM, HFP, LD, AV, NB, NN, NP, GY and HSW in the studied F6 generation.
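The factor extraction and rotation described above can be sketched as follows. This is a minimal Python/NumPy example under stated assumptions: it uses simulated data in place of the BLUP means, a raw (non-normalized) varimax rotation, and the absolute value of the loading for the 0.50 threshold so that inversely correlated traits are retained. All names and data are hypothetical and the study's software settings are not known.

```python
import numpy as np

def pc_loadings(R, n_factors):
    """Unrotated loadings: eigenvectors of R scaled by sqrt(eigenvalue)."""
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1][:n_factors]
    return eigvec[:, order] * np.sqrt(eigval[order])

def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns rotated loadings and the rotation matrix."""
    p, k = L.shape
    T = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ T
        grad = L.T @ (LR**3 - LR @ np.diag((LR**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        T = U @ Vt                     # nearest orthogonal matrix to the gradient
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return L @ T, T

# Simulated stand-in for BLUP means: 60 genotypes, 5 traits, 2 latent processes.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
signal = np.column_stack([latent[:, 0], latent[:, 0], -latent[:, 0],
                          latent[:, 1], latent[:, 1]])
blups = signal + rng.normal(scale=0.5, size=(60, 5))

R = np.corrcoef(blups, rowvar=False)          # correlation matrix of the traits
unrotated = pc_loadings(R, n_factors=2)
loadings, T = varimax(unrotated)
process_members = np.abs(loadings) > 0.50     # traits assigned to each factor
```

Because the rotation is orthogonal, each trait's communality (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors changes, which is what makes the processes easier to interpret.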
The discrimination of genotypes was performed by principal component analysis, first taking into account all traits and then each case individually (Cruz et al., 2012). The Kaiser criterion (Kaiser, 1958) was used to select the principal components: those with eigenvalues above unity were retained, since they carry a relevant amount of the original information.
Each graph displays two circles that resulted from the principal component analysis: a smaller one spanning -2 to 2 (α ≈ 5%) and a larger one spanning -4 to 4 (α < 0.01). Genotypes located outside each circle were considered to have specific properties for selection.
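The discrimination step above can be sketched as follows, as a minimal Python/NumPy example. It assumes that the component scores are standardized to unit variance (which would make limits of 2 and 4 comparable across components) and that a genotype is judged by its distance from the origin in the PC1 x PC2 plane. Both points are assumptions, since the scaling used for the graphs is not stated; the data and function names are hypothetical.

```python
import numpy as np

def standardized_pc_scores(data):
    """Eigenvalues of the correlation matrix and unit-variance PC scores."""
    Zs = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    R = np.corrcoef(Zs, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]          # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = (Zs @ eigvec) / np.sqrt(eigval)  # each column has variance 1
    return eigval, scores

def kaiser_count(eigval):
    """Number of components retained by the Kaiser criterion (eigenvalue > 1)."""
    return int((eigval > 1.0).sum())

def circle_region(pc1, pc2, inner=2.0, outer=4.0):
    """Locate a genotype relative to the two circles drawn on the graph."""
    r = float(np.hypot(pc1, pc2))
    if r > outer:
        return "outside larger circle"
    if r > inner:
        return "between circles"
    return "inside smaller circle"

# Simulated stand-in for BLUP means of 40 genotypes and 4 traits.
rng = np.random.default_rng(1)
data = rng.normal(size=(40, 4))
eigval, scores = standardized_pc_scores(data)
n_retained = kaiser_count(eigval)
regions = [circle_region(a, b) for a, b in scores[:, :2]]
```

Genotypes labelled as lying outside a circle would then be inspected trait by trait, as is done for the individual processes in the results.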

RESULTS
The genetic parameters and genotypic means of the traits estimated by REML/BLUP indicated that the coefficients of genetic (CVg) and environmental (CVe) variation ranged from 0.83 to 92.0% and from 6.70 to 69.5%, respectively (Table 2). The CVg/CVe ratio was greater than one for the following traits: DF, PHF, PHM, HFP, NN, NP, GY and HSW. However, this ratio was not greater than one for the traits that were not significant in the analysis of deviance (chi-square test): DM, LD, AV and NB.
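For reference, the coefficients of variation compared above can be computed as in the following minimal Python sketch. It assumes the usual definitions CVg = 100·sqrt(σ²g)/mean and CVe = 100·sqrt(σ²e)/mean, which the text does not spell out, and the variance components shown are hypothetical rather than values from Table 2.

```python
import math

def cv_percent(variance, mean):
    """Coefficient of variation (%) from a variance component and the trait mean."""
    return 100.0 * math.sqrt(variance) / mean

def cv_ratio(var_g, var_e, mean):
    """CVg/CVe ratio; the mean cancels, so it equals sqrt(var_g / var_e)."""
    return cv_percent(var_g, mean) / cv_percent(var_e, mean)

# Hypothetical variance components for one trait.
cv_g = cv_percent(variance=4.0, mean=10.0)
cv_e = cv_percent(variance=1.0, mean=10.0)
ratio = cv_ratio(var_g=4.0, var_e=1.0, mean=10.0)
```

Under these definitions, a ratio above one simply means that the genotypic variance exceeds the residual variance for that trait.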
The estimated heritability coefficients (h²) were low for all studied traits, which is undesirable in a breeding program. Overall, the individual variables had little or no variability on which to base genotypic selection and very low heritability estimates, especially the important soybean agronomic traits DM, AV and LD. Consequently, genetic gains were low because the studied population had already undergone several selection processes, which makes it difficult to select genotypes by selection index. Nevertheless, factor analysis and principal component analysis identified specific and important genotypes for the breeding program.
Table 3 shows the results of the factor analysis, in which three agronomic processes with distinct patterns for genotype selection were characterized according to the traits acting together in each process.
The first factor (F1), accounting for 29.08% of the original variability, identified a process that aggregated only production traits. In this process, NP, NB and GY were inversely correlated with HFP. The second factor (F2), accounting for 29.74% of the remaining variability, aggregated the traits DF, PHF, DM, PHM and NN, associated with plant cycle and size, which were directly correlated. The third factor (F3), accounting for 12.77% of the remaining variability, aggregated the traits HSW (yield component) and LD (lodging), which were directly correlated with each other but inversely correlated with AV (visual score of genotype quality).
The principal component analysis of Process 1, formed by HFP, NP, NB and GY, which discriminated genotypes regarding grain production, is presented in Figure 1. On PC1, the genotypes located outside the large circle to the left (1, 50, 88, 165, 171, 172, 189 and 196) had higher yield but lower HFP, in contrast to genotype 36, located to the right, which had lower yield and greater insertion height of the first pod.
Figure 2 shows that, for the second process (DF, PHF, DM, PHM and NN), genotype 126, located to the right on PC1 outside the larger circle, was characterized by higher PHF and DM, in contrast to genotype 152, located on the left near the zero reference line of PC1. Genotype 184, also located outside the circle, was characterized by lower DF, PHF and DM and by greater PHM and NN. The goal was to identify earlier genotypes with height ranging from 0.80 to 1.0 m, and the results indicated that genotype 152 is the closest to this ideal.
A principal component analysis without separation into processes was also performed with all variables to search for specific genotypes (Figure 4). On PC1, genotypes 36, 37, 126 and 170, which stood out in the region outside the larger circle, were characterized by greater DF, PHF, DM, PHM, NN, LD, HFP and AV, in addition to lower grain yield. In contrast, genotypes 1, 47, 49, 189, 50, 88, 152, 165, 171, 172, 183 and 196 displayed higher grain yield and lower HFP and AV. Moreover, they were earlier, shorter and more resistant to lodging.

DISCUSSION
The significant differences for DF, PHF, PHM, HFP, NN, NP, GY and HSW detected by the analysis of deviance (ANADEV) indicated high variability in the studied population. They also indicated that the variance components and their respective coefficients of determination were significantly different from zero, in agreement with Resende (2007a).
However, the non-significance of DM, LD, AV and NB may indicate a narrowing of genetic variation as a result of low divergence between the parents, which contrasted little for these traits. An LRT equal to zero was observed for DM, LD and NB, which corresponds to a lack of genetic variability.
The variability of DF may be explained by the presence of early and late cultivars among the parents of the crosses. However, DM, which is as important as DF, did not vary significantly. It is noteworthy that estimates of flowering date and of other soybean growth stages are highly relevant for crop management and for growth and yield modeling. This information can assist crop management under adverse conditions, such as lack of water and lodging (Rodrigues et al., 2001). Therefore, according to the climatic conditions of the region, it is possible to stagger planting and harvesting (Almeida et al., 2011). Evaluation of PHF is associated with the search for earlier cultivars with good productivity. Genotypes with the greatest height at flowering tend to have higher productivity and shorter cycles when accompanied by lower DF. Carvalho et al. (2002) reported that this trait may help in selecting for yield and is very effective in the selection of more productive lines. They also noted that PHM was positively correlated with productivity, but that PHF showed slightly higher correlation values with productivity.
Furthermore, the ideal HFP of soybean crops, under most conditions, is about 15.0 cm, although most modern harvesters can harvest well when the first pod insertion is as low as 10.0 cm (Rocha et al., 2012).However, this trait did not show significant correlation with grain yield (Muniz et al., 2002).
In addition, there is a positive correlation between PHM and LD. Buezzello et al. (2013) observed that a reduction in the height of soybean plants was strongly associated with reduced lodging, contributing to an increase in the grain yield of the crop.
Studies have shown that NN and NP are positively correlated with grain yield (Muniz et al., 2002; Arshad et al., 2006; Dalchiavon and Carvalho, 2012), and thus contribute to the indirect selection of genotypes. GY and HSW are highly correlated (Arshad et al., 2006): GY is the output of the individual plant, whereas HSW is related to seed vigor, and consequently plant vigor, and is also a yield component.
Regarding the studied traits, the ideal genotype should have high GY and AV. It should be earlier (lower DF and DM) and resistant to lodging (LD = 1), while PHM should range from 0.80 to 1.0 m and HFP should be greater than or equal to 10.0 cm. The other studied traits (DF, PHF, NB, NN, NP and HSW) are expected to enhance the selection of genotypes through their correlations with the more important traits. The low values obtained for the CVg/CVe ratio when studying the genetic parameters may indicate lower experimental precision or a higher number of genes controlling the trait. Values smaller than unity indicate unfavorable conditions for the selection of genotypes for these traits (Mistro et al., 2004). The heritability (h²), on the other hand, shows the potential for selection within experiments (Borges et al., 2010a). However, the values of h² in this study were obtained by REML, which avoids the overestimation of h². The desirable characteristics for a soybean breeding program, in addition to high productivity, are earliness and plant heights that do not cause lodging. However, if inadequate tools are used there is the risk of selecting genotypes with poor agronomic traits, such as lower AV scores and lower pod

Conclusions
The traits number of days to flowering, plant height at flowering, plant height at maturity, insertion height of first pod, number of nodes, number of pods, grain yield and hundred seeds weight are suitable for the selection process, since they showed high genetic variability. Three agronomic selection processes were identified that discriminate genotypes with more specific properties.
The selection strategy containing the variables insertion height of first pod, number of branches, number of pods, number of nodes and grain yield allowed the selection of earlier and smaller soybean genotypes with good yield components and lodging resistance.
The combination of mixed models via REML/BLUP with multivariate statistics based on factor analysis helped to select suitable, high-performing genotypes to carry forward in the soybean breeding program.

Figure 1. Principal component analysis of the first selection process for agronomic traits (HFP, NP, NB and GY), which discriminates genotypes regarding grain production, in soybean populations of the F6 generation. Jaboticabal, SP, 2013/2014.

Figure 2. Principal component analysis of agronomic traits of the second selection process, in soybean populations of the F6 generation. Jaboticabal, SP, 2013/2014.

Figure 3. Principal component analysis of the traits of the third agronomic selection process, in soybean populations of the F6 generation. Jaboticabal, SP, 2013/2014.

Table 2. Genetic parameters and descriptive statistics of agronomic traits evaluated in the studied soybean populations of the F6 generation. Jaboticabal, SP, 2013/2014.

Table 3. Factors and their factor loadings after rotation of the factorial axes using the varimax method for the studied traits in soybean populations of the F6 generation. Jaboticabal, SP, 2013/2014.