Association of statistical methods used to explore genotype × environment interaction (GEI) and cultivar stability

This study aims to acquaint breeders with the need to use statistical tools that help identify genotypes that perform consistently well across various environmental conditions. It also aims to reveal the relationship among the various statistical methods used to describe genotype × environment interaction (GEI) and cultivar stability. Twenty released bread wheat cultivars were evaluated during the 2009 main cropping season in a randomized complete block design (RCBD) with three replications at seven environments. A mixed model with fixed genotypes and random environments was used for the analysis of variance (ANOVA). The combined ANOVA revealed a highly significant GEI (p < 0.01) for grain yield, indicating its influence on cultivar selection and recommendation. Spearman's rank correlation coefficient revealed a perfect correspondence between Wricke's ecovalence ($W_i$) and Shukla's stability variance ($\sigma_i^2$). These stability measures also showed a highly significant positive rank correlation with deviation from regression ($S^2_{di}$), coefficient of determination ($r_i^2$), AMMI stability value (ASV), variance of ranks ($S_i^{(2)}$), rank sum (R-sum) and mean absolute rank difference ($S_i^{(1)}$), indicating their similarity in cultivar ranking. The principal component analysis (PCA) clearly showed three groupings of the statistical methods: the static concept of stability, the dynamic concept of stability and yield performance measures. Therefore, it is imperative to consider one stability measure from the dynamic concept and one from the yield performance measures for efficient cultivar recommendation.


INTRODUCTION
Agricultural production has increased during the past decades, mainly because of the innovative ideas and efforts of agricultural researchers. However, 870 million people in the world still suffer from food shortage and malnutrition. The problem is especially serious in sub-Saharan Africa, which is home to about 26.5% of the world's hungry people (FAO, 2012). This indicates that the increased production and productivity could not keep pace with the world's population growth, especially in the developing countries. The world population, currently 6.78 billion, is expected to reach 10 billion by the middle of the 21st century (IDB, 2008). To feed such a large and ever-increasing population, agricultural scientists and other concerned bodies are left with the task of at least doubling the current production of food crops. The key to doubling agricultural production is therefore increasing the efficiency of resource utilization (increased productivity per hectare and per dollar), and this includes a better understanding of the impacts of genotype × environment interaction (GEI) on cultivar recommendation and of ways to exploit it (Kang, 2002).
*Corresponding author. E-mail: mulukenbaya2010@gmail.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
There are three ways of handling GEI for cultivar recommendation in a crop breeding program (Eiseman et al., 1990). (i) Ignoring it, that is, using genotypic means across environments as the criterion for cultivar recommendation even when GEI exists. Interactions, however, should not be ignored when they are significant and of the crossover type (Crossa, 1990). (ii) Avoiding it, which involves reducing the influence of significant crossover interaction by grouping similar environments (forming mega-environments); the main goal of the breeding program in this case is cultivar recommendation for each mega-environment, or specific adaptation (Annicchiarico, 2002; Basford and Cooper, 1998). By clustering environments, however, potentially useful information may be lost, such as that needed by international and national breeders who aim to develop cultivars with broad adaptation (Kang, 2002), and the number of cultivars recommended becomes so large that it creates difficulties in the seed system of countries with diverse agro-ecologies like Ethiopia. (iii) Exploiting it, where the breeding program focuses on assessing the stability of genotypic performance across diverse environments by analyzing and interpreting the impact of GEI for broad adaptation. Breeders therefore have to exploit the potential of genotypes with minimal GEI in order to identify cultivars that perform well under different growing conditions.
To analyze and determine the extent of GEI under varying growing conditions, a number of univariate parametric and nonparametric as well as multivariate statistical methods have been developed by different researchers. The most commonly used are the parametric methods, which require the fulfillment of statistical assumptions such as normal distribution, independence, homogeneity of error variance and absence of outliers (Sabaghnia et al., 2006). Eberhart and Russell's regression coefficient ($b_i$) and deviation from regression ($S^2_{di}$), coefficient of determination ($r_i^2$), Wricke's ecovalence ($W_i$), Shukla's stability variance ($\sigma_i^2$), cultivar superiority measure ($P_i$), coefficient of variation (CV) and environmental variance ($S^2_{xi}$) fall under this category. AMMI stability value (ASV) belongs to the multivariate group. However, if the aforementioned assumptions are violated, the nonparametric stability measures might be a good option (Huehn, 1990). Mean absolute rank difference ($S_i^{(1)}$) and variance of ranks ($S_i^{(2)}$) are among the commonly applied nonparametric methods. The stratified ranking (TOP and LOW), where genotypes are ranked at each environment separately and the number of sites at which a cultivar occurred in the top and bottom third of the ranks is computed (Fox et al., 1990), and the rank sum (R-sum), where both yield and Shukla's stability variance are used as criteria for cultivar ranking (Kang, 1988), are also among the nonparametric methods.
However, different opinions still exist among leading scientists and users of the different statistical methods about the best and most suitable procedures for multi-location, multi-year or production-environment data sets. For example, Fox et al. (1990) criticized the Lin and Binns cultivar superiority measure, noting that it may be influenced by the scale of measurement; Freeman and Perkins (1971) noted that the joint linear regression approach has a number of statistical and biological limitations; and the parametric stability measures might not be appropriate when some assumptions are violated (Huehn, 1990). All of these indicate the need to assess the relationship among the stability measures developed for the analysis and interpretation of multi-environment data. Therefore, this study was carried out with the following objective: to examine the relationship among the various statistical methods used to describe GEI and stability.

MATERIALS AND METHODS
The study was conducted at seven environments during the 2009 main cropping season. The environments differ in soil type, altitude, mean maximum and minimum temperatures, amount of rainfall and relative humidity (Table 1). Twenty cultivars were planted at each environment in a randomized complete block design (RCBD) with three replications. Each experimental unit had six rows of 2.5 m length with 0.2 m spacing between rows (3 m²). A 1.5 m alley was left between blocks. A seeding rate of 150 kg/ha was used. The recommended fertilizer doses for each environment (92-46 kg N-P₂O₅/ha for Adet cambisol and nitosol; 138-46 kg N-P₂O₅/ha for Motta, Injibara, Debark and Debre Tabore; 64-46 kg N-P₂O₅/ha for Finote Selam) were applied in the form of urea and diammonium phosphate (DAP). All of the DAP was applied at planting, whereas the urea was split, with one third applied at planting and the remaining two thirds at the tillering stage. Other management practices were performed following the recommendations.
Combined analysis of variance (ANOVA) was carried out using PROC MIXED of the SAS program (SAS, 2002). Genotypes were assumed to be fixed and environmental effects random. Significance levels of the ANOVA procedure for the mixed model were determined as suggested by McIntosh (1983) and Romagosa and Fox (1993). Variance components were estimated using PROC VARCOMP of the SAS program. Fourteen stability measures were computed: Wricke's (1962) ecovalence ($W_i$) as cited in Becker and Leon (1988), Eberhart and Russell's (1966) coefficient of regression ($b_i$) and deviation from regression ($S^2_{di}$), Shukla's (1972a) stability variance ($\sigma_i^2$), Pinthus's (1973) coefficient of determination ($r_i^2$), Francis and Kannenberg's (1978) CV and environmental variance ($S^2_{xi}$), Lin and Binns' (1988) cultivar superiority measure ($P_i$), Nassar and Huehn's (1987) mean absolute rank difference ($S_i^{(1)}$) and variance of ranks ($S_i^{(2)}$), Kang's (1988) rank sum, Fox et al.'s (1990) TOP and LOW parameters and Purchase's (1997) ASV. Most of these stability measures were computed using the AGROBASE20 computer program (Agrobase, 2000), whereas the TOP and LOW stability measures of Fox et al. (1990) were computed using a SAS program called SASG × ESTAB (Hussein et al., 2000).
To assess the association among the stability measures, Spearman's rank correlation coefficients were computed between all possible pairs of stability measures, including grain yield, using the AGROBASE20 computer program, and principal component analysis (PCA) was performed using the Genstat program. To determine Spearman's rank correlation coefficient between the different procedures as outlined by Steel and Torrie (1980), all the genotypes evaluated were assigned stability values and ranked according to the procedure and definitions used. Ranks are whole numbers; when two or more genotypes tie, each receives the average of the ranks they would otherwise have received. Suppose n genotypes are listed in the same order for the two stability measures, $X_i$ is the rank of the $i$th genotype for the first stability measure and $Y_i$ is its rank for the second stability measure. Then $d_i = X_i - Y_i$ ($i = 1, 2, 3, \ldots, n$) and Spearman's rank correlation coefficient ($r_s$) can be described as:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
The significance of the rank correlation coefficient between any two stability measures was tested by means of Student's t test as described by Steel and Torrie (1980) with n-2 degrees of freedom:
$$t = r_s \sqrt{\frac{n - 2}{1 - r_s^2}}$$
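The ranking rule, the rank correlation and its t test can be illustrated with a minimal Python sketch. This is not the AGROBASE20 routine used in the study; the function names and the five example values are hypothetical, and the formula based on squared rank differences is exact only when there are no ties.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rs(x, y):
    """r_s = 1 - 6*sum(d_i^2) / (n*(n^2 - 1)), with d_i = rank(x_i) - rank(y_i)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def t_statistic(rs, n):
    """Student's t for r_s with n - 2 degrees of freedom (undefined for |r_s| = 1)."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))

# Hypothetical values of two stability measures for five genotypes.
measure_a = [0.12, 0.40, 0.25, 0.90, 0.33]
measure_b = [1.5, 3.1, 3.5, 7.2, 2.0]
rs = spearman_rs(measure_a, measure_b)
t = t_statistic(rs, len(measure_a))
print(f"r_s = {rs:.3f}, t = {t:.3f}, df = {len(measure_a) - 2}")
```

The t value would then be compared with the tabulated Student's t at n-2 degrees of freedom.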

Cultivar performance and genotype × environment interaction
The combined ANOVA for grain yield indicated highly significant differences among genotypes, among environments and for GEI. The significant GEI indicated that genotypes performed differently under different environments. That is, a particular genotype may not exhibit the same phenotypic performance under different environmental conditions, or different genotypes may respond differently to a specific environment. The grain yield of the cultivars ranged from 3.78 to 4.49 t/ha (Table 2). The variance component estimates for grain yield indicated that environments, genotypes and GEI contributed about 72.25, 5.35 and 10.87% of the total variation, respectively. This indicates that the test environments were highly variable and had the greatest influence on the yielding potential of the bread wheat cultivars. The variance component due to GEI was higher than the genotypic variance, indicating that the influence of GEI on cultivar recommendation for a specified growing condition cannot be ignored.

Association among stability measures
Spearman's rank correlation coefficients were computed for the various parametric and nonparametric stability measures, including mean grain yield, and are presented in Table 3. Mean grain yield had a highly significant positive rank correlation with the cultivar superiority measure ($P_i$), TOP, LOW and R-sum. It also had a significant rank correlation with the CV. Flores et al. (1998) reported a significant rank correlation of grain yield with $P_i$ and CV and suggested that yield has an important influence on the ranking of genotypes by these stability measures. The highly significant rank correlation of mean grain yield with $P_i$, LOW, R-sum and TOP indicates that selection for increased grain yield in bread wheat would change yield stability by decreasing $P_i$, LOW and R-sum but increasing the TOP value. This further indicates the need to develop genotypes that are specifically adapted to environments with optimal growing conditions. Similarly, a significant positive rank correlation of grain yield with TOP and R-sum was reported by Sabaghnia et al. (2006), whereas Solomon et al. (2007) reported a negative but non-significant rank correlation between grain yield and the TOP value in wheat genotypes. Although the correlation was neither strong nor significant, grain yield was negatively related to the coefficient of regression (r = -0.38). This result disagrees with the previous results of Piepho and Lotito (1992), Mekbib (2003) and Akcura et al. (2006), who reported a positive and significant rank correlation between grain yield and the coefficient of regression. Mean grain yield also had a non-significant negative rank correlation with Francis and Kannenberg's environmental variance. Conversely, mean grain yield had a weak positive correlation with the other stability measures.
Eberhart and Russell's regression coefficient ($b_i$) showed a highly significant positive rank correlation with Francis and Kannenberg's environmental variance. This indicates that the two stability measures are equivalent in genotype ranking. This result supports the findings of Akcura et al. (2006) and Ferney et al. (2006). Except with CV, however, the regression coefficient had a negative rank correlation with most of the stability measures. For example, it had a significant negative rank correlation with $\sigma_i^2$, $W_i$, $S^2_{di}$, $S_i^{(2)}$, $r_i^2$ and R-sum. This result supports the findings of Piepho and Lotito (1992), who reported a negative rank correlation of $b_i$ with most of the stability measures in sugar beet. However, it disagrees with the results of Mekbib (2003). The significant negative rank correlation between the regression coefficient ($b_i$) and the coefficient of determination ($r_i^2$) indicated that the genotypes that were highly responsive to high yielding environments were less responsive to low yielding environments, and vice versa.
Eberhart and Russell's deviation from regression showed a highly significant correspondence with Shukla's stability variance, Wricke's ecovalence, $S_i^{(1)}$, $S_i^{(2)}$, ASV, $r_i^2$ and R-sum, but a non-significant positive rank correlation with mean grain yield and CV. In line with this, Mekbib (2003) reported a significant positive correlation among $S^2_{di}$, $\sigma_i^2$ and $W_i$. It also had a negative but negligible rank correlation with TOP, LOW and the environmental variance. This negligible rank correlation suggests that the deviation from regression should be included whenever TOP, LOW and environmental variance are used as tools for cultivar stability assessment and recommendation.
Shukla's stability variance had a highly significant rank correlation with most of the stability measures (such as deviation from regression, mean absolute rank difference, variance of ranks, ASV, coefficient of determination and R-sum). This indicates that any one of these stability measures could be used for bread wheat genotype recommendation. The perfect rank correlation between Shukla's stability variance and Wricke's ecovalence (r = 1.00) indicates that these two stability measures are equivalent for genotype ranking purposes. This follows from their biometrical relationship: Shukla's stability variance is a linear function of the ecovalence. In line with this result, Solomon (2006) reported a perfect correspondence between them in maize.
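The linear relationship between the two measures can be checked numerically. The sketch below assumes a complete table of genotype means (p genotypes by q environments) and uses the standard textbook expression of Shukla's variance in terms of ecovalence, $\sigma_i^2 = [p(p-1)W_i - \sum_i W_i] / [(p-1)(p-2)(q-1)]$; the function names and the yield table are hypothetical and are not the study's data.

```python
def interaction_effects(y):
    """GE_ij = y_ij - genotype mean - environment mean + grand mean."""
    p, q = len(y), len(y[0])
    g_mean = [sum(row) / q for row in y]
    e_mean = [sum(y[i][j] for i in range(p)) / p for j in range(q)]
    grand = sum(g_mean) / p
    return [[y[i][j] - g_mean[i] - e_mean[j] + grand for j in range(q)]
            for i in range(p)]

def wricke_ecovalence(y):
    """W_i: sum over environments of the squared interaction effects."""
    return [sum(v * v for v in row) for row in interaction_effects(y)]

def shukla_variance(y):
    """Shukla's variance written as a linear function of W_i (needs p > 2)."""
    p, q = len(y), len(y[0])
    w = wricke_ecovalence(y)
    total = sum(w)  # interaction sum of squares
    return [(p * (p - 1) * wi - total) / ((p - 1) * (p - 2) * (q - 1)) for wi in w]

# Hypothetical mean yields (t/ha): 4 genotypes (rows) x 3 environments (columns).
yields = [
    [4.1, 3.2, 5.0],
    [3.9, 3.8, 4.4],
    [4.5, 2.9, 5.6],
    [4.0, 3.5, 4.6],
]
w = wricke_ecovalence(yields)
s = shukla_variance(yields)
for i, (wi, si) in enumerate(zip(w, s), start=1):
    print(f"genotype {i}: W = {wi:.4f}, sigma2 = {si:.4f}")
```

Because the slope of this linear function is positive, sorting genotypes by either measure gives the same order, which is why their rank correlation is exactly 1.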
The Lin and Binns cultivar superiority measure ($P_i$) showed a highly significant positive rank correlation with TOP, LOW and R-sum. This indicates that any one of these measures could be sufficient for cultivar stability assessment and recommendation. Care should be taken, however: because Lin and Binns (1988) defined the cultivar superiority measure ($P_i$) as the mean square distance between a cultivar's yield and the highest yield achieved, it may be influenced by the scale of observations, which becomes more important when the range of site mean yields is large, as is common in multi-environment trials. There was no significant relationship between $P_i$ and most of the parametric stability measures, indicating that $P_i$ is a performance indicator rather than a stability measure. A similar result was reported by Purchase et al. (2000).
Wricke's ecovalence showed a highly significant positive rank correlation with $S^2_{di}$, $r_i^2$, ASV, $S_i^{(2)}$, R-sum and $S_i^{(1)}$. Similarly, a positive correlation between $W_i$ and $S^2_{di}$ was reported by Duarte and Zimmermann (1995) and Mekbib (2003). A positive but negligible rank correlation was also observed between ecovalence and the LOW parameter. On the other hand, ecovalence had a negligible negative rank correlation with TOP and $S^2_{xi}$. Because of their biometrical relationship, the observed high correspondence among $W_i$, $S_i^{(1)}$ and $S_i^{(2)}$ is expected. The nonparametric stability measures $S_i^{(1)}$ and $S_i^{(2)}$ are based on the ranks of the values $X_{ij} - \bar{X}_{i.} + \bar{X}_{..}$, where $X_{ij}$, $\bar{X}_{i.}$ and $\bar{X}_{..}$ denote the observed value of genotype i in environment j, the mean of genotype i in all environments and the overall mean, respectively. Subtracting the environmental mean $\bar{X}_{.j}$ from the above term does not affect the ranking within environments, so ranking the values $X_{ij} - \bar{X}_{i.} + \bar{X}_{..}$ is equivalent to ranking $X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..}$ (Piepho and Lotito, 1992). Wricke's ecovalence ($W_i$) is the sum of squares of the term $X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..}$ over environments. This relationship explains why these stability measures rank genotypes almost identically.
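A short Python sketch of the two rank-based measures may make the construction concrete. It follows the commonly cited definitions, $S_i^{(1)} = 2\sum_{j<j'} |r_{ij} - r_{ij'}| / [q(q-1)]$ and $S_i^{(2)} = \sum_j (r_{ij} - \bar{r}_{i.})^2 / (q-1)$, where $r_{ij}$ is the rank of the corrected value of genotype i in environment j. The function names, the direction of ranking (1 = smallest) and the example tables are assumptions made for illustration.

```python
def ranks(values):
    """Average ranks, 1 = smallest; ties share the mean of their positions."""
    srt = sorted(values)
    return [srt.index(v) + 1 + (srt.count(v) - 1) / 2 for v in values]

def nassar_huehn(y):
    """Return (S1, S2) per genotype for a genotype x environment table y."""
    p, q = len(y), len(y[0])
    grand = sum(map(sum, y)) / (p * q)
    # Corrected values: remove the genotype main effect, keep the grand mean.
    corrected = [[v - sum(row) / q + grand for v in row] for row in y]
    # Rank genotypes within each environment (column by column).
    r = [[0.0] * q for _ in range(p)]
    for j in range(q):
        col_ranks = ranks([corrected[i][j] for i in range(p)])
        for i in range(p):
            r[i][j] = col_ranks[i]
    s1, s2 = [], []
    for row in r:
        pair_sum = sum(abs(row[a] - row[b])
                       for a in range(q) for b in range(a + 1, q))
        s1.append(2 * pair_sum / (q * (q - 1)))
        mean_rank = sum(row) / q
        s2.append(sum((v - mean_rank) ** 2 for v in row) / (q - 1))
    return s1, s2

# Purely additive hypothetical data (no interaction): every genotype is stable.
additive = [[0, 2, 4], [1, 3, 5], [2, 4, 6]]
print(nassar_huehn(additive))

# Hypothetical crossover between two genotypes in two environments.
crossover = [[1, 2], [2, 1]]
print(nassar_huehn(crossover))
```

With additive data all corrected values tie within each environment, so both measures are zero; a rank change between environments makes them positive.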
Similarly, because ecovalence may be partitioned into two components, the covariance between GEI effects and environmental effects and the deviations from regression (Becker and Leon, 1988), the high rank correlation between $W_i$ and $S^2_{di}$ indicates that the covariance component explains only a small portion of the ecovalence values. In other words, since the regression coefficients ($b_i$) in this study did not differ significantly from unity and the sum of squares of the environmental effects ($\bar{X}_{.j} - \bar{X}_{..}$) is constant for all genotypes, most of the ecovalence value was contributed by the deviation from regression.
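The partition can be verified numerically. The sketch below assumes the usual joint regression model, in which each genotype's yield is regressed on the environmental index $I_j = \bar{X}_{.j} - \bar{X}_{..}$, so that $W_i = (b_i - 1)^2 \sum_j I_j^2 + \sum_j d_{ij}^2$ with $d_{ij}$ the deviations from regression. It returns the raw deviation sum of squares rather than Eberhart and Russell's $S^2_{di}$, which additionally divides by the degrees of freedom and subtracts the pooled error. All names and data are hypothetical.

```python
def ecovalence_partition(y):
    """Split each W_i into a regression (heterogeneity) part and a deviation part."""
    p, q = len(y), len(y[0])
    e_mean = [sum(y[i][j] for i in range(p)) / p for j in range(q)]
    grand = sum(e_mean) / q
    index = [m - grand for m in e_mean]      # environmental index I_j
    ss_index = sum(v * v for v in index)     # identical for every genotype
    parts = []
    for row in y:
        g_mean = sum(row) / q
        # Least-squares slope of the genotype's yields on the environmental index.
        b = sum((row[j] - g_mean) * index[j] for j in range(q)) / ss_index
        dev = [row[j] - g_mean - b * index[j] for j in range(q)]
        ge = [row[j] - g_mean - e_mean[j] + grand for j in range(q)]
        parts.append({
            "b": b,
            "W": sum(v * v for v in ge),
            "heterogeneity": (b - 1) ** 2 * ss_index,
            "deviation_ss": sum(v * v for v in dev),
        })
    return parts

# Hypothetical mean yields (t/ha): 4 genotypes x 3 environments.
yields = [
    [4.1, 3.2, 5.0],
    [3.9, 3.8, 4.4],
    [4.5, 2.9, 5.6],
    [4.0, 3.5, 4.6],
]
parts = ecovalence_partition(yields)
for i, part in enumerate(parts, start=1):
    print(f"genotype {i}: b = {part['b']:.3f}, W = {part['W']:.4f}, "
          f"heterogeneity = {part['heterogeneity']:.4f}, "
          f"deviations = {part['deviation_ss']:.4f}")
```

When every slope is close to 1 the heterogeneity term is small, so the ecovalence is dominated by the deviations and the two measures rank genotypes similarly.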
Nassar and Huehn's mean absolute rank difference ($S_i^{(1)}$) and variance of ranks ($S_i^{(2)}$) showed a highly significant positive rank correlation with each other. These two nonparametric stability measures also had a highly significant positive rank correlation with the ASV, coefficient of determination, deviation from regression and R-sum. This result suggests their similarity; consequently, only one of these stability measures would be enough to identify stable genotypes in a breeding program. Kang's rank sum showed a significant positive rank correlation with most of the stability measures except $b_i$, with which it was negatively correlated. The ASV had a significant rank correlation with the coefficient of determination. In addition, the percentages of sites at which each genotype occurred in the top (TOP) and bottom (LOW) third of entries in each trial showed a significant positive correspondence with each other, indicating their similarity for genotype ranking purposes. A similar finding was reported by Solomon et al. (2007).
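The TOP and LOW parameters are simple counts, as the following Python sketch suggests. It is not the SAS program used in the study: the function name, the integer cut-off (the top and bottom p // 3 positions) and the handling of ties (left in input order) are assumptions made for illustration, and the yields are hypothetical.

```python
def top_low(y):
    """Percentage of environments in which each genotype ranks in the top
    and in the bottom third of entries (position 1 = highest yield)."""
    p, q = len(y), len(y[0])
    k = p // 3  # assumed size of the top and of the bottom third
    top = [0] * p
    low = [0] * p
    for j in range(q):
        # Genotype indices ordered from highest to lowest yield in environment j.
        order = sorted(range(p), key=lambda i: y[i][j], reverse=True)
        for pos, i in enumerate(order, start=1):
            if pos <= k:
                top[i] += 1
            elif pos > p - k:
                low[i] += 1
    return [100 * t / q for t in top], [100 * b / q for b in low]

# Hypothetical mean yields (t/ha): 3 genotypes x 2 environments.
yields_small = [
    [5.0, 5.0],
    [4.0, 6.0],
    [3.0, 4.0],
]
top_pct, low_pct = top_low(yields_small)
print("TOP (%):", top_pct)
print("LOW (%):", low_pct)
```

With twenty entries, as in this trial, the assumed cut-off would place positions 1 to 6 in the top third and positions 15 to 20 in the bottom third.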

Principal component analysis (PCA)
To understand the relationships among the various stability measures, PCA based on the rank correlation matrix was performed. The first two principal components explained 77.5% of the total variance of the original variables (41.9% by PCA1 and 35.6% by PCA2).
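The share of variance attributed to each principal component of a correlation matrix is its eigenvalue divided by the trace of the matrix. The sketch below shows this with plain power iteration and deflation; it is not the Genstat procedure used in the study, the 3 × 3 matrix is hypothetical, and the method assumes a symmetric positive semi-definite matrix whose leading eigenvalues are distinct.

```python
import math

def leading_component(a, iterations=500):
    """Power iteration: largest eigenvalue and its eigenvector."""
    n = len(a)
    v = [float(k + 1) for k in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        w = [sum(a[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' A v (v has unit length).
    eigenvalue = sum(v[r] * sum(a[r][c] * v[c] for c in range(n)) for r in range(n))
    return eigenvalue, v

def explained_shares(corr, n_components=2):
    """Percentage of total variance explained by the first components."""
    n = len(corr)
    trace = sum(corr[k][k] for k in range(n))
    a = [row[:] for row in corr]
    shares = []
    for _ in range(n_components):
        lam, v = leading_component(a)
        shares.append(100 * lam / trace)
        # Deflation: remove the component just found before the next search.
        a = [[a[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return shares

# Hypothetical rank correlation matrix of three stability measures:
# the first two agree closely, the third is unrelated to both.
corr = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
shares = explained_shares(corr)
print([round(s, 2) for s in shares])
```

Measures that rank genotypes alike load on the same component, which is the basis for reading groups of measures from a plot of the first two components.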
The relationships among the different stability measures are displayed graphically as a two-dimensional scatter plot of PCA1 and PCA2 (Figure 1). This scatter plot clearly reveals three different groups of stability measures. The mean grain yield, the cultivar superiority measure ($P_i$) and the TOP and LOW parameters were scattered together in one group, indicating that they are performance measures rather than stability measures. The second group consists of $S_i^{(1)}$, $S_i^{(2)}$, $W_i$, $\sigma_i^2$, $S^2_{di}$ and $r_i^2$ (representing the dynamic concept of stability), and the third group consists of the regression coefficient ($b_i$) and the environmental variance ($S^2_{xi}$), which represent the static concept of stability. The CV and the R-sum were not grouped in any of the three classes. They were placed separately, indicating either that they differ from the other stability measures in genotype ranking, like CV, or that they are associated with most of the stability measures, like R-sum. This biplot clustering indicates the similarity and dissimilarity of the various stability measures in cultivar ranking.

Conclusion
The observed strong positive association among ecovalence ($W_i$), stability variance ($\sigma_i^2$), deviation from regression ($S^2_{di}$), ASV and coefficient of determination ($r_i^2$) indicates their similarity in cultivar ranking; a breeder can therefore use only one of them, depending on their simplicity and the nature of the data set. In addition, mean absolute rank difference ($S_i^{(1)}$), variance of ranks ($S_i^{(2)}$) and R-sum, which showed a strong association with the aforementioned measures, can be good alternatives for cultivar stability assessment and recommendation. This is especially important where the data set contains outliers or violates assumptions such as normal distribution, independence and homogeneity of error variance. Besides these stability measures, measures of the genotypes' grain yield performance should always be considered together with the stability measures.