GGE biplots to analyze soybean multi-environment yield trial data in north Western Ethiopia

The study was undertaken to examine the nature and quantify the magnitude of genotype x environment interaction effects on soybean [Glycine max (L.) Merr.] grain yield and to determine the winning genotype(s) for test environments in north western Ethiopia. The experiment was conducted at four locations in Ethiopia for two consecutive years (2007 and 2008) using thirty two genotypes, including two checks. A randomized complete block design with three replicates was employed. The combined analysis of variance over environments showed that soybean grain yield was significantly (p<0.001) affected by environments (25.58%), genotypes (14.87%) and genotype x environment interaction (59.55%). The result indicated differential performance of soybean genotypes across test environments, and hence the interaction was of the crossover type. Genotype main effect plus genotype x environment interaction (GGE) biplots were applied to analyze and visualize the pattern of the interaction component. The first two principal components (PC1 and PC2) explained 63.4% of the GGE sum of squares (PC1 = 41.6% and PC2 = 21.8%) using the environment-standardized model. Genotypes G13 (TGX-1998-29F), G3 (TGX-849-313D) and G7 (TGX-1889-29F) combined high mean yield with high stability across the test environments and could be characterized as ideal genotypes.


INTRODUCTION
Soybean [Glycine max (L.) Merr.] is the world's leading source of oil (20%) and protein (40%). It is produced in a wide range of environments around the world. The crop was introduced into Ethiopia in the 1950s and has been grown in different agro-ecologies of the country. However, its production has not yet expanded in line with the country's potential. It is mainly constrained by a lack of improved and stable varieties suited to the different growing ecologies of the country and by a lack of popularization and market linkages (Asfaw et al., 2006). Soybean research in the country has been ongoing and has led to the release of some improved varieties tested across environments in Ethiopia. Nevertheless, the national programme has overlooked the effect of genotype x environment (GE) interaction and the concept of stability, and it capitalizes on varieties with only good mean performance across a wide array of environments and years. Moreover, the GE interaction is a common phenomenon in multi-environment yield trials and limits variety selection and recommendation for target environments; hence, it must be either exploited by selecting a superior genotype for each specific target environment or avoided by selecting widely adapted and stable genotypes across a wide range of environments (Ceccarelli, 1989). Previous studies in Ethiopia and elsewhere revealed the significant presence of genotype x environment interactions in soybean multi-environment yield trial data (Amira et al., 2013; Asfaw et al., 2009; Bueno et al., 2013; Gurmu et al., 2009; Tukamuhabwa et al., 2012). Numerous statistical methods have been proposed and used to analyze and visualize the nature and magnitude of genotype by environment interaction. In the recent literature, the additive main effects and multiplicative interaction (AMMI) model (Gauch, 2006; Gauch and Zobel, 1988; Zobel et al., 1988) and the genotype plus genotype x environment interaction (GGE) model proposed by Yan et al. (2000) have been emphasized for multi-environment trial data. However, GGE is best suited to mega-environment analysis (such as the 'which-won-where' pattern), genotype evaluation (mean vs. stability) and test environment evaluation, which assesses the discriminating power vs. representativeness of the test environments (Amira et al., 2013; Yan et al., 2007). GGE has been recognized and implemented as a useful method to analyse and visualize the pattern of genotype x environment interaction in multi-environment cultivar evaluation of different crops including wheat, maize, soybean and oilseeds (Asfaw et al., 2009; Brar et al., 2010; Fan et al., 2007; Jandong et al., 2011; Yan et al., 2000).
The aims of this study were to examine the nature and to quantify the magnitude of genotype x environment interaction effects on soybean grain yield and to determine the winning genotype (s) for test environments in north western Ethiopia.

MATERIALS AND METHODS
Two sets of experiments, each comprising 16 soybean genotypes, were used for this study. The experiments were conducted in the same two years (2007 and 2008) and at the same four locations: Pawe, Manbuk (Dangur), Dibate and Bullen. The experiments were also laid out with the same experimental set-up in the field. The only noted difference was that the genotypes were grouped by maturity category. The maturity grouping was based on the mean number of days to physiological maturity, which depends strongly on the environmental conditions. Sixteen of the genotypes were of the late maturity type and the remaining sixteen of the medium maturity type. Given the above conditions, the two experiments were merged, without changing the essence of the separate experiments, to obtain a sufficient number of observations and degrees of freedom for reliable results and possible recommendations. The locations were regional soybean testing sites in north western Ethiopia. Pawe is used as both a national and a regional soybean testing site in the country. The genotypes were obtained from the International Institute of Tropical Agriculture (IITA) as test material (Table 1), and Belesa 95 and Davis were used as checks. Taking year by location combinations, eight environments were considered in the study. A randomized complete block design with three replicates nested within each environment was employed. Plots consisted of five rows, each 4 m long, with 60 cm between rows and 5 cm between seeds within a row. Standard agronomic and plant protection treatments were applied uniformly across the plots for the duration of the experiment. The central three rows were harvested for grain yield measurement.
Grain yield data were subjected to combined analysis of variance using SAS GLM (SAS, 2004) to examine the variances due to the main effects of environment (E) and genotype (G) and their interaction (GE). A significant interaction (GE) variance justifies further partitioning of this variance component. The GE was partitioned and analyzed using the GGE model (Yan, 2001). The GGE biplot was constructed using the first two principal components (PC1 and PC2) derived by subjecting environment-centered yield data to singular value decomposition (Yan et al., 2000). The GGE model used was:

$$Y_{ij} - \mu - \beta_j = \lambda_1 \xi_{i1} \eta_{1j} + \lambda_2 \xi_{i2} \eta_{2j} + \varepsilon_{ij} \tag{1}$$

where $Y_{ij}$ is the measured mean of genotype $i$ $(= 1, 2, \ldots, n)$ in environment $j$ $(= 1, 2, \ldots, m)$; $\mu$ is the grand mean; $\beta_j$ is the main effect of environment $j$, with $\mu + \beta_j$ being the mean yield across all genotypes in environment $j$; $\lambda_1$ and $\lambda_2$ are the singular values (SV) for the first and second principal components (PC1 and PC2), respectively; $\xi_{i1}$ and $\xi_{i2}$ are the eigenvectors of genotype $i$ for PC1 and PC2, respectively; $\eta_{1j}$ and $\eta_{2j}$ are the eigenvectors of environment $j$ for PC1 and PC2, respectively; and $\varepsilon_{ij}$ is the residual associated with genotype $i$ in environment $j$. The PC1 and PC2 eigenvectors cannot be plotted directly to construct a meaningful biplot before the singular values are partitioned into the genotype and environment eigenvectors. Singular value partitioning was implemented by:

$$g_{il} = \lambda_l^{f_l}\, \xi_{il} \quad \text{and} \quad e_{lj} = \lambda_l^{1 - f_l}\, \eta_{lj} \tag{2}$$

where $f_l$ is the partition factor for PC $l$, which can range between 0 and 1. To visualize relationships among genotypes, the GGE biplot based on the genotype metric (that is, f = 1; SVP = 1) is appropriate, whereas the environment metric (f = 0; SVP = 2) is appropriate for visualizing relationships among environments. The following formula, derived from Equation (1), was used to generate the GGE biplot:

$$Y_{ij} - \mu - \beta_j = g_{i1} e_{1j} + g_{i2} e_{2j} + \varepsilon_{ij} \tag{3}$$

If the data are environment-standardized, the general formula for generating the GGE biplot is:

$$\frac{Y_{ij} - \mu - \beta_j}{s_j} = \sum_{l=1}^{k} g_{il} e_{lj} + \varepsilon_{ij} \tag{4}$$

where $s_j$ is the standard deviation in environment $j$; $l = 1, 2, \ldots, k$ indexes the principal components ($k = 2$ for a biplot); and $g_{il}$ and $e_{lj}$ are the PC $l$ scores for genotype $i$ and environment $j$, respectively. In the present study we used the environment-standardized model, Equation (4).
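The environment-standardized GGE model and the singular value partitioning described above can be illustrated with a short numerical sketch. It is not the authors' SAS workflow: the function name `gge_scores`, the layout of the input table (genotypes in rows, environments in columns) and the use of the sample standard deviation (`ddof=1`) for the within-environment standard deviation are assumptions made only for illustration.

```python
import numpy as np

def gge_scores(means, f=0.0, standardize=True, n_pc=2):
    """Genotype scores, environment scores and % of GGE variation per PC.

    means: genotype x environment table of mean yields (hypothetical input).
    f:     partition factor (1 = genotype metric, 0 = environment metric).
    """
    y = np.asarray(means, dtype=float)
    # Environment centering: subtract the environment mean (mu + beta_j).
    z = y - y.mean(axis=0)
    if standardize:
        # Assumption: s_j is the sample standard deviation within environment j.
        z = z / y.std(axis=0, ddof=1)
    # Singular value decomposition: z = U diag(lambda) V'.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    g = u[:, :n_pc] * s[:n_pc] ** f            # g_il = lambda_l^f * xi_il
    e = vt[:n_pc].T * s[:n_pc] ** (1.0 - f)    # e_lj = lambda_l^(1-f) * eta_lj
    explained = 100.0 * s**2 / np.sum(s**2)    # share of the GGE sum of squares
    return g, e, explained[:n_pc]
```

Calling `gge_scores(table, f=0.0)` would give environment-focused scores (SVP = 2) and `gge_scores(table, f=1.0)` genotype-focused scores (SVP = 1); in both cases the product of the genotype and environment scores approximates the standardized table, which is the property a biplot relies on.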

Combined analysis of variance
The combined analysis of variance over environments showed that soybean grain yield was significantly (p<0.001) affected by environments (E), genotypes (G) and genotype by environment interaction (GE) (Table 2). Environment accounted for about 25.58% of the variation. The GE explained about 59.55% of the variation, which is more than double the environmental effect and about four times the genotypic effect. The large GE effect in this study suggests the possible presence of different mega-environments with different winner genotypes (Yan and Kang, 2003). A similar result for the same crop in Nigeria was reported by Jandong et al. (2011). This result indicated that the performance of soybean genotypes differed among testing environments (different winners in different environments) owing to the large GE interaction. As revealed by the differential yield ranking of genotypes, the GE was of the crossover type (Table 3). Four of the eight environments had different winner genotypes. This situation complicates the selection process and cultivar recommendation in breeding programs (Comstock and Moll, 1963). The existence of significant and large GE in soybean in Ethiopia and other African countries has been reported previously (Asfaw et al., 2009; Bueno et al., 2013; Gurmu et al., 2009; Tukamuhabwa et al., 2012).
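The way such percentage contributions can be obtained from a genotype x environment table is sketched below. The three reported percentages (25.58, 14.87 and 59.55) sum to 100, which suggests that they are shares of the combined E, G and GE sums of squares; the sketch assumes that definition. The function name `ss_shares` and the balanced layout with equal replication are illustrative assumptions, not the authors' SAS GLM procedure.

```python
def ss_shares(cell_means, reps=3):
    """Percentage of the G + E + GE sum of squares due to each source.

    cell_means: list of rows, one per genotype, each holding that genotype's
                mean yield in every environment (hypothetical balanced data).
    reps:       number of replicates behind each cell mean.
    """
    n = len(cell_means)        # number of genotypes
    m = len(cell_means[0])     # number of environments
    grand = sum(map(sum, cell_means)) / (n * m)
    g_mean = [sum(row) / m for row in cell_means]
    e_mean = [sum(row[j] for row in cell_means) / n for j in range(m)]
    ss_g = reps * m * sum((gm - grand) ** 2 for gm in g_mean)
    ss_e = reps * n * sum((em - grand) ** 2 for em in e_mean)
    # Interaction: what remains in each cell after removing both main effects.
    ss_ge = reps * sum(
        (cell_means[i][j] - g_mean[i] - e_mean[j] + grand) ** 2
        for i in range(n)
        for j in range(m)
    )
    total = ss_g + ss_e + ss_ge
    return {"G": 100.0 * ss_g / total,
            "E": 100.0 * ss_e / total,
            "GE": 100.0 * ss_ge / total}
```

With the shares reported here, the ratios are 59.55/25.58 ≈ 2.3 and 59.55/14.87 ≈ 4.0, which is the basis for describing the GE effect as more than double the environmental effect and about four times the genotypic effect.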

GGE biplot analysis
GGE refers to the genotype main effect (G) plus the genotype x environment interaction (GE), which are the two most important sources of variation for cultivar evaluation in multi-environment trials (Yan et al., 2007). A GGE biplot displays the G and GE of a genotype-by-environment dataset (Yan et al., 2000). This biplot is particularly suited to mega-environment analysis, based on the genetic correlation between environments and the which-won-where pattern; test environment evaluation, based on discriminating ability and representativeness; and genotype evaluation, based on mean performance and stability across a mega-environment. The present data set showed a correlation of 0.901 between the primary effects and the genotype main effects, which justifies the use of the GGE biplot (Crossa et al., 2002; Yan et al., 2000). The first two principal components (PC1 and PC2) explained 63.4% of the GGE sum of squares (PC1 = 41.6% and PC2 = 21.8%) using the environment-standardized model.

Mega-environment analysis
The polygon view of the GGE biplot is an effective way to visualize the genotype x environment interaction pattern (Yan and Kang, 2003). Visualization of the 'which-won-where' pattern in the polygon view helps to assess the possible existence of different mega-environments in the target environment (Yan and Rajcan, 2002; Yan et al., 2000; Yan and Tinker, 2006). Figure 1a presents a polygon view of the thirty two soybean genotypes tested at eight environments. In this biplot, a polygon was constructed by connecting the vertex genotypes (those located farthest from the biplot origin in various directions) with straight lines, so that the rest of the genotypes fell inside the polygon. Genotypes G13, G20, G6, G1, G23 and G32 were the vertices of the polygon. In the polygon view of this biplot, test environments fell into three sectors and genotypes into four sectors. Three of the sectors in the polygon had no test environment. Pawe fell in one sector in both years, suggesting repeatable performance of genotypes at this location. A repeatable 'which-won-where' pattern was also observed in Figure 1b, which presents a polygon view of sixteen soybean genotypes tested at eight environments. These sixteen genotypes were a subset of the thirty two soybean genotypes considered in the main study. The necessary and sufficient condition for mega-environment delineation is a repeatable which-won-where pattern rather than merely a repeatable environment-grouping pattern (Yan and Rajcan, 2002; Yan and Kang, 2003). Hence, Pawe could be considered a separate mega-environment for soybean variety evaluation and recommendation. A similar result has been documented by Asfaw et al. (2009). Genotype G13 was the winner at Manbuk (2008) and Bullen (2007). G20 was the top yielder at Bullen (2008), Dibate (2007) and Manbuk (2007). Genotypes G6 and G1 were winners at Dibate (2008) and Pawe (2007 and 2008). Vertex genotypes G23 and G32 were not winners at any test environment.
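The construction of the polygon and the reading of winners can be sketched as follows, assuming that two-dimensional genotype and environment scores are already available. The vertex genotypes are the convex hull of the genotype markers, and the winner in an environment is approximated by the genotype with the largest projection onto that environment's vector. The function names and the dictionary inputs are hypothetical, and the winners are only as reliable as the rank-two approximation displayed by the biplot.

```python
def _cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def polygon_vertices(genotype_scores):
    """Names of the genotypes on the convex hull (the biplot polygon).

    genotype_scores: dict mapping genotype name -> (PC1, PC2) tuple.
    """
    pts = sorted((xy, name) for name, xy in genotype_scores.items())

    def half(points):
        # One pass of Andrew's monotone chain algorithm.
        chain = []
        for p in points:
            while len(chain) >= 2 and _cross(chain[-2][0], chain[-1][0], p[0]) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half(pts), half(reversed(pts))
    return [name for _, name in lower[:-1] + upper[:-1]]

def winners(genotype_scores, environment_scores):
    """Genotype with the largest projection onto each environment vector."""
    result = {}
    for env, (ex, ey) in environment_scores.items():
        result[env] = max(
            genotype_scores,
            key=lambda g: genotype_scores[g][0] * ex + genotype_scores[g][1] * ey,
        )
    return result
```

A vertex genotype that is not returned by `winners` for any environment corresponds to a polygon sector containing no test environment, as was the case for G23 and G32 in this study.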

Test environment evaluation
The purpose of test-environment evaluation is to identify test environments that effectively identify superior genotypes for a mega-environment. An "ideal" test environment should be both discriminating of the genotypes and representative of the mega-environment (Yan et al., 2007). Figure 2 presents the "discriminating power vs. representativeness" view of the GGE biplot of 32 soybean genotypes tested at eight test environments based on environment-focused scaling (Yan, 2002), with the singular values entirely partitioned into the environment scores (SVP = 2). In the biplot, when the data are not scaled (scaling = 0), the length of the vector connecting an environment marker to the biplot origin is proportional to the standard deviation of the genotype means in that environment. Test environments with longer vectors are more discriminating of the genotypes, whereas a test environment with a very short vector provides little or no information about genotype differences (Yan et al., 2007). Thus, in the present study, Dibate (2008), Bullen (2007) and Pawe (2007) were the most discriminating of the genotypes, whereas Dibate (2007) provided very little information about genotypic differences. The second important aspect of test environment evaluation is representativeness of the mega-environment. It is visualized by the angle between the environment vector and the abscissa of the average environment axis. When SVP = 2, the cosine of the angle between any environment vector and the average environment axis approximates the correlation coefficient between the genotype values in that environment and the genotype means across the environments. The smaller the angle, the more representative the test environment (Yan and Tinker, 2006; Yan et al., 2007). Hence, Bullen (2007), followed by Dibate (2007) and Manbuk (2008), were the more representative environments for soybean regional trials. Bullen (2007), which had a long vector and a small angle with the abscissa of the average environment axis, was ideal for selecting superior soybean genotypes for the north western soybean growing regions of Ethiopia.
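The two quantities read from this view of the biplot, vector length and the cosine of the angle with the average environment axis, can be computed directly from the environment scores. The sketch below assumes environment-focused scores (SVP = 2) stored in a dictionary; the function name and input layout are hypothetical, and the average environment is taken as the simple mean of the environment score coordinates.

```python
import math

def evaluate_environments(environment_scores):
    """Return {environment: (vector_length, cosine_with_average_axis)}.

    environment_scores: dict mapping environment name -> (PC1, PC2) tuple.
    A longer vector suggests more discriminating power; a cosine closer
    to 1 suggests a more representative environment.
    """
    n = len(environment_scores)
    # Average environment: mean of the environment markers.
    ax = sum(x for x, _ in environment_scores.values()) / n
    ay = sum(y for _, y in environment_scores.values()) / n
    a_len = math.hypot(ax, ay)
    result = {}
    for env, (x, y) in environment_scores.items():
        length = math.hypot(x, y)
        # Undefined (division by zero) for a marker exactly at the origin.
        cosine = (x * ax + y * ay) / (length * a_len)
        result[env] = (length, cosine)
    return result
```

Under these assumptions, an environment such as Bullen (2007) would show both a large length and a cosine near 1, whereas a short vector such as that of Dibate (2007) would show a small length whatever its angle.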

Genotype evaluation
Mean yield and stability performance of genotypes, and ranking relative to an ideal genotype: The ranking of the thirty two soybean genotypes based on their mean yield and stability performance is presented in Figure 3a. The line passing through the biplot origin and the average environment is called the average tester coordinate (ATC) axis (Yan and Kang, 2003). The double-arrowed line, which is perpendicular to the ATC axis and passes through the origin, represents the stability of genotypes. An ideal genotype should have the highest mean performance and be absolutely stable (Yan and Kang, 2003). The ranking of soybean genotypes based on both mean yield and stability relative to an ideal genotype is presented in Figure 3b. In the biplot, an ideal genotype is located at the center of the concentric circles. It is a location with the longest vector of all the genotypes and with a near-zero projection onto the ATC ordinate. Hence, genotypes G13, G3 and G7, in that order, were placed nearest to the center of the concentric circles and could be considered ideal soybean genotypes, with the highest mean yield and the greatest stability across the test environments. Other genotypes were ranked by distance from the ideal genotype as G4>G2>G25>G30>G8>G12>G11>G31>G5>G20>G16 (check)>…. >G28>G32 (check)>G23, where those ranked last were the least favourable as they were farthest from the ideal genotype.

Figure 2. The "discriminating power vs. representativeness" view of the GGE biplot based on 32 soybean genotypes tested at eight test environments (axes: PC1 and PC2).

Figure 3a. GGE biplot showing ranking of genotypes for both mean yield and stability performance across environments (axes: PC1 and PC2). G1-G32 are codes for soybean genotypes.

Figure 3b. Ranking genotypes based on both mean and stability relative to an ideal genotype (axes: PC1 and PC2). Putting the ideal genotype at the center, concentric circles were drawn to visualize how far each genotype is from the ideal genotype. G1-G32 are codes for soybean genotypes.
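The ranking relative to an ideal genotype can be expressed numerically, as sketched below. Each genotype marker is projected onto the ATC abscissa (approximating mean yield) and onto the ATC ordinate (approximating instability), and genotypes are then ordered by their distance from an ideal point. The sketch assumes that the ideal genotype lies on the ATC abscissa at a distance equal to the longest genotype vector on its positive side; the function name and inputs are hypothetical.

```python
import math

def rank_to_ideal(genotype_scores, environment_scores):
    """Order genotypes by distance from the ideal genotype.

    Both arguments map names to (PC1, PC2) tuples. Returns the ranked list
    of genotype names and each genotype's (mean_axis, stability_axis) pair.
    """
    n = len(environment_scores)
    ax = sum(x for x, _ in environment_scores.values()) / n
    ay = sum(y for _, y in environment_scores.values()) / n
    norm = math.hypot(ax, ay)
    ux, uy = ax / norm, ay / norm           # unit vector along the ATC abscissa
    coords = {}
    for name, (x, y) in genotype_scores.items():
        mean_axis = x * ux + y * uy         # projection on the ATC abscissa
        stab_axis = -x * uy + y * ux        # projection on the ATC ordinate
        coords[name] = (mean_axis, stab_axis)
    # Assumption: the ideal point sits on the abscissa, as far out as the
    # longest genotype vector among genotypes with above-average mean.
    ideal = max(math.hypot(m, s) for m, s in coords.values() if m > 0)
    dist = {g: math.hypot(m - ideal, s) for g, (m, s) in coords.items()}
    return sorted(dist, key=dist.get), coords
```

In this representation a genotype such as G13 would combine a large abscissa projection with an ordinate projection near zero, whereas a high-yielding but unstable genotype would be pushed down the ranking by its ordinate projection.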
Ranking genotypes relative to the highest yielding environment (Pawe08): The highest yielding environment, Pawe08, was used to evaluate the genotypes, and the ranking of genotypes relative to Pawe08 is presented in Figure 4. A line passing through the biplot origin and the Pawe08 marker was drawn to make the Pawe08 axis. A perpendicular line from each genotype to this axis was also drawn and used to compare the relative yield of the genotypes. The genotypes were ranked based on the length of their projections onto the Pawe08 axis. Rank increases toward the positive end (Yan et al., 2000). Hence, sixteen genotypes, including G13 (the best yielding genotype) and G16 (one of the two checks), had above-average yields, while the remaining genotypes yielded below average.
Relative adaptation of G13, which is the best yielding genotype: Figure 5 shows the test environments ranked relative to the highest yielding genotype (G13). The relative adaptation of G13 was studied by drawing a line through the biplot origin and the G13 marker, and the environments were ranked along this G13 axis (Yan et al., 2000). The length of the environment projections onto the G13 axis indicates the performance of G13 in the different environments relative to the other genotypes. Thus, G13 had above-average yields in seven of the eight testing environments, the exception being Manbuk (2007).

Conclusion
The study revealed that soybean yield performance was significantly influenced by genotype x environment interaction, followed by environment and genotype effects. The magnitude of the GE effect was about two and four times that of the environmental and genotypic effects, respectively. The GE effect was of the crossover type, as revealed by the differential yield ranking of the genotypes across environments. GGE biplots were effective for analyzing and visualizing the patterns of GE in the soybean multi-environment data with respect to test environment and genotype evaluation. Thus, genotypes G13 (TGX-1998-29F), G3 (TGX-849-313D) and G7 (TGX-1889-29F) combined high mean yield with high stability across the test environments and could be characterized as ideal genotypes.

Figure 4. Ranking genotypes relative to Pawe08, which is the highest soybean yielding environment (axes: PC1 and PC2). G1-G32 are codes for soybean genotypes.