Germplasm-regression-combined marker-trait association identification in plants

In the past 20 years, the major effort in plant breeding has shifted from quantitative to molecular genetics, with emphasis on quantitative trait loci (QTL) identification and marker assisted selection (MAS). However, results have been modest. Several factors explain this, including the absence of tight linkage between markers and QTL, the unavailability of mapping populations and the substantial time needed to develop such populations. To overcome these limitations, and as an alternative to planned populations, molecular marker-trait associations have been identified by combining germplasm with regression analysis. In the present review, we first survey successful applications of germplasm-regression-combined (GRC) molecular marker-trait association identification in plants. Secondly, we describe how to conduct GRC analysis and how it differs from mapping QTL on a linkage map constructed from planned populations. Thirdly, we consider the factors that affect GRC association identification, including the selection of optimal germplasm and molecular markers and the testing of the identification efficiency of trait-associated markers. Finally, we discuss the future prospects of GRC marker-trait association analysis in plant MAS/QTL breeding programs, especially in long-juvenile woody plants for which no other genetic information, such as linkage maps and QTL, is available.


INTRODUCTION
The improvement of quantitative traits (that is, productivity, disease resistance, abiotic stress tolerance and/or quality) has long intrigued plant breeders. Many desirable varieties and lines have been selected and bred through conventional breeding programs such as systematic selection, cross breeding and mutation breeding. In a pedigree breeding program, the breeder crosses two parents and practices selection until advanced-generation lines with the best phenotype for the quantitative trait under selection are identified. These lines are then entered into a series of replicated trials to further evaluate the material, with the goal of releasing the best lines as cultivars. Such conventional breeding programs require large inputs of labor, land and financial resources. For these reasons, plant breeders are motivated to identify promising lines as early as possible in the selection process.
Abbreviations: QTL, quantitative trait loci; MAS, marker assisted selection; GRC, germplasm-regression-combined; RAPD, random amplified polymorphic DNA; RFLP, restriction fragment length polymorphism; AFLP, amplified fragment length polymorphism; ISSR, inter-simple sequence repeats; SAMPL, selective amplification of microsatellite polymorphic loci; SSR, simple sequence repeat; SCAR, sequence characterized amplified region; SNP, single nucleotide polymorphism; PCR, polymerase chain reaction.
In the past 20 years, the major effort in breeding has shifted from traditional phenotypic, pedigree-based selection systems to molecular genetics, with emphasis on quantitative trait loci (QTL) identification and marker assisted selection (MAS). MAS, which uses DNA markers to select optimal genotypes, is an excellent tool for selecting beneficial traits that are difficult to measure, exhibit low heritability and/or are expressed late in development (Ribaut and Hoisington, 1998; Davies et al., 2006; Wilde et al., 2007; Ender et al., 2008; Knoll and Ejeta, 2008), as well as for assessing the genetic potential of specific genotypes prior to phenotypic evaluation (Gebhardt et al., 2004). Molecular markers linked to QTL/major genes for traits of interest are being routinely developed in many crops and trees, using materials derived from planned crosses such as F1s (trees), F2s (progeny from F1 selfing), BCs (backcross progeny), BILs (backcross inbred lines; backcrossed and selfed progeny), RILs (recombinant inbred lines; progeny from repeated selfing of F2 plants) and NILs (near-isogenic lines; backcrossed and selfed progeny). Some of these markers have been successfully applied in plant breeding programs (Collard and Mackill, 2008) through marker-assisted backcrossing (Helguera et al., 2003; Benchimol et al., 2005) and pyramiding (Ashikari et al., 2005; Zhang et al., 2006; Barloy et al., 2007), and others may be used in future MAS (Ren et al., 2005; Song et al., 2007; Hansen et al., 2008).
However, the results of MAS/QTL studies have been modest (Kearsey and Farquhar, 1998; Collard and Mackill, 2008; Hospital, 2009), and there are still few reported successes in long-juvenile woody plants. This may be because (i) in linkage-based QTL analyses, the unavailability of mapping populations and the substantial time needed to develop planned populations (especially for long-juvenile woody plants) are major limitations in identifying molecular markers for specific traits, and the few genotypes used as parents of mapping populations may not be effective in different genetic backgrounds (Liao et al., 2001; Steele et al., 2006); (ii) the success of MAS/QTL depends largely on the extent of genetic linkage between markers and the relevant QTL (Virk et al., 1996; Misztal, 2006), yet tight linkage is absent in most linkage-based studies (Thomas, 2003); and (iii) QTL × environment interactions exist (Bouchez et al., 2002). To overcome these limitations, and as an alternative to planned populations, molecular marker-trait associations have been identified by combining existing germplasm with regression analysis (Wright and Mowers, 1994; Yonash et al., 2000; Chatterjee and Pradeep, 2003; Chatterjee and Mohandas, 2003; Pradeep et al., 2007; Srivastava et al., 2007), an approach increasingly adopted in many plants (Maureira-Butler et al., 2007). Germplasm-regression-combined (GRC) association studies not only allow genes/QTLs to be mapped with a higher level of confidence, but also allow the detection of genes/QTLs that would otherwise escape detection in linkage-based QTL studies of planned populations.
In the present review, we first survey successful applications of GRC molecular marker-trait association identification in plants, and then describe how to conduct this analysis and how it differs from mapping QTL on a linkage map constructed from a planned population. We then consider the factors that affect GRC association identification, including the selection of optimal germplasm and molecular markers and the testing of markers associated with desirable traits. Finally, we briefly discuss the future prospects of GRC marker-trait association analysis in plant MAS/QTL breeding programs, especially in long-juvenile and long-lived woody plants for which no other genetic information, such as linkage maps and QTL, is available.
Traits in the GRC studies summarized in Table 1 include yield and quality traits, disease and pest resistance, and drought tolerance. Molecular markers used in these studies include random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), inter-simple sequence repeats (ISSR), selective amplification of microsatellite polymorphic loci (SAMPL) and simple sequence repeats (SSR). The regression method most often used in GRC association identification is multiple regression analysis (MRA); single linear regression analysis and general and mixed linear models have also been used. The sampled accessions with differing phenotypes shown in Table 1 were drawn from (i) different genotypes of one species growing in different regions, (ii) different genotypes of multiple species and (iii) different individuals and clones of the same genotype of one species.

HOW TO DO GRC MARKER-TRAIT ASSOCIATION IDENTIFICATION
In linkage-based QTL development, two parents that differ markedly in a particular quantitative character are selected, and associations between markers and that character are then determined in F2 or backcross progeny (Figure 1). In contrast, GRC marker development is based on the phenotypic evaluation of germplasm for the traits of interest, so associations between DNA fragments and traits can be detected without constructing a linkage map from planned mapping populations (Figure 1). The time and cost of GRC analysis are therefore significantly lower than those of linkage-based QTL development.
Table 1 (excerpt): One S and two F markers associated with PH; six S, three A and four F for DF; three S and one A for DM; one S, two A and three F for TN; four S, three A and one F for FLA; two S and one F for PL; one A and two F for SL; four S and five F for SP; two S, one A and seven F for FL; two S, one A and six F for GR; one S and three F for BY; three S and one F for GY; four S, one A and one F for HI; two F for GW.
How, then, is GRC marker-trait association identification conducted? First, optimal accessions are selected from germplasm and the phenotypic values of the traits of interest are evaluated for each accession; each accession is then amplified with suitable molecular markers. Next, marker-trait associations are identified by GRC analysis, and finally the identification efficiency of the trait-associated markers is tested (see below). In this process, marker-trait associations may be screened using the MRA approach: each quantitative trait is treated as the dependent variable and the marker genotypes (scored as 1 for presence and 0 for absence of a band) as independent variables. The analysis is based on the model
$$Y = a + \sum_{j} b_j m_j + d + e$$
which relates the variation in the dependent variable (Y, the accession means for a quantitative trait) to a linear function of the set of independent variables m_j, representing the markers. Here a is the intercept; the b_j terms are the partial regression coefficients that specify the empirical relationships between Y and m_j; d represents the accession residual left after regression; and e is the random error of Y, which includes environmental variation (Virk et al., 1996; Kar et al., 2008). F-tests with P-values of 0.045 and 0.099 were used as the thresholds to enter and to remove independent variables from the regression equation, respectively (Afifi and Clark, 1984; Roy and Bargmann, 1957). R² denotes the square of r, the correlation coefficient. Selected markers should be further tested with linear curve fitting, using linear models to confirm the significance of the β-statistic for each band identified by MRA. β is the standardized regression coefficient, $\beta = B S_x / S_y$, where B is the regression coefficient (slope) and S_x and S_y are the standard deviations of the independent (x) and dependent (y) variables (Afifi and Clark, 1984; Kar et al., 2008; Ruan et al., 2009). This method provides maximum likelihood estimates of the relationships between individual quantitative traits and the various markers, and has been used successfully in several of the studies shown in Table 1.
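The MRA step can be sketched in plain Python. This is a minimal illustration, not the software used in the cited studies: it fits Y = a + Σ b_j m_j by ordinary least squares on a 0/1 marker matrix, reports R², and converts each slope to the standardized β = B·S_x/S_y. The function names (`fit_mra`, `standardized_betas`) and the toy data are hypothetical, and the stepwise entry/removal rule (P = 0.045 to enter, P = 0.099 to remove) is not implemented here.

```python
from statistics import stdev


def fit_mra(X, y):
    """Fit Y = a + sum_j b_j * m_j by ordinary least squares.

    X: list of rows, one per accession, each a list of 0/1 marker scores.
    y: accession means for one quantitative trait.
    Returns ([a, b_1, ..., b_k], R^2).
    """
    n = len(y)
    A = [[1.0] + [float(v) for v in row] for row in X]  # column of 1s for intercept a
    p = len(A[0])
    # Augmented normal equations [X'X | X'y]
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular design: monomorphic or collinear markers")
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [M[r][k] - f * M[col][k] for k in range(p + 1)]
    coef = [M[r][p] / M[r][r] for r in range(p)]
    fitted = [sum(coef[k] * A[i][k] for k in range(p)) for i in range(n)]
    ybar = sum(y) / n
    ss_tot = sum((v - ybar) ** 2 for v in y)
    ss_res = sum((y[i] - fitted[i]) ** 2 for i in range(n))
    return coef, 1.0 - ss_res / ss_tot


def standardized_betas(X, y, coef):
    """beta_j = B_j * S_x / S_y for each marker column j (coef[0] is the intercept)."""
    s_y = stdev(y)
    return [coef[j + 1] * stdev([row[j] for row in X]) / s_y
            for j in range(len(X[0]))]


# Toy data: 6 accessions x 2 markers (hypothetical)
X = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 0]]
y = [10.0, 13.0, 11.0, 14.0, 13.0, 10.0]
coef, r2 = fit_mra(X, y)
print(coef, r2, standardized_betas(X, y, coef))
```

With a single marker, β reduces to Pearson's r and R² to r², which is a convenient sanity check on any implementation.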

FACTORS THAT AFFECT GRC MARKER-TRAIT ASSOCIATION IDENTIFICATION
The success of MAS/QTL based on linkage maps is determined by the availability of mapping populations and by tight linkage between markers and QTLs. By analogy, the validity of GRC marker-trait association identification depends largely on the selection of optimal germplasm and molecular markers and on testing the identification efficiency of the trait-associated markers. First, we should know how to select optimal germplasm as sampling accessions for GRC association analysis. Based on the MAS schemes reviewed by Kearsey and Farquhar (1998), Asins (2002), Collard et al. (2005), Mohler and Singrun (2005), Holland (2007), Collard and Mackill (2008) and Hospital (2009), and on the present analysis, ideal accessions for GRC marker-trait association identification should have (i) clear phenotypic differences in the traits of interest, (ii) wide genetic diversity and (iii) an optimal genetic background. If there are clear phenotypic differences among individuals of one genotype, or among genotypes of one species, growing in the same or different regions, these samples are suitable for GRC analysis, although attention should be paid to intraspecific genetic diversity. This sampling strategy was used in several GRC association studies (Virk et al., 1996; Roy et al., 2006; Vijayan et al., 2006; Shalini et al., 2007; Wang, 2007; Achleitner et al., 2008). Alternatively, some studies used accessions from different species of the same genus for GRC analysis (Kar et al., 2008; Ruan et al., 2009). In this case, the genetic background should be considered carefully, because the frequency of specific alleles across multiple germplasm sources may be too low for their effects to be detected (Maureira-Butler et al., 2007). Secondly, we should know how to select optimal molecular markers for GRC association analysis.
Traditionally, morphological markers have general drawbacks that reduce their usefulness. These include late expression of the marker during the development of the organism, dominance, deleterious effects, pleiotropy, confounding effects of genes that are unrelated to the gene or trait of interest but also affect the morphological marker (epistasis), rare polymorphism and strong environmental influence. To avoid these problems, molecular markers with various qualities have been developed (Teixeira da Silva et al., 2005). They are highly polymorphic, show simple (often codominant) inheritance and occur abundantly throughout the genome; they are easy and fast to detect, have minimal pleiotropic effects, and their detection does not depend on the developmental stage of the organism. Which markers, then, are optimal for MAS? Important properties of ideal molecular markers associated with traits of interest include (i) easy recognition of all possible marker phenotypes (homozygotes and heterozygotes) for all alleles, as with sequence-related amplified polymorphism (SRAP) and SSR markers, (ii) measurable differences in expression between trait types and/or alleles of the gene of interest early in the development of the organism, (iii) no effect on the trait of interest that depends on which allele is present at the marker locus, (iv) low or no interaction among markers, allowing many to be used simultaneously in a segregating population and (v) abundance and high polymorphism.
However, most types of molecular markers (for example RFLP, RAPD, ISSR, AFLP and SRAP), even though many are now PCR-based, are still too impractical for large-scale MAS schemes because of assay complexity that prevents appropriate automation, insufficient robustness or an inadequate level of detected polymorphism (Koebner and Summers, 2003). The most widely used markers in major cereals are SSR markers (Collard and Mackill, 2008), which are highly reliable (that is, reproducible), codominant, relatively simple and cheap to use and generally highly polymorphic; however, developing SSR markers requires a substantial investment of time and money, and SSR markers are still available for very few plants, especially orphan crops and long-juvenile woody species. Because of their high polymorphic information content, sequence-tagged site (STS) and sequence characterized amplified region (SCAR) markers, which are derived from the DNA sequences of specific markers (for example RAPD, RFLP, AFLP, ISSR and SRAP bands), together with single nucleotide polymorphism (SNP) markers, are presently the most appropriate marker classes for MAS (Sanchez et al., 2000; Sharp et al., 2001; Mohler and Singrun, 2005; Collard and Mackill, 2008). For orphan crops and woody species that lack the genetic information needed to develop STS and SNP markers, SCAR markers remain a promising option for future MAS.
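The polymorphism criteria above, and the earlier caution that rare alleles may be too infrequent for their effects to be detected, can be checked before any regression is run. The sketch below is illustrative only: it computes PIC for a dominant marker as 2f(1 − f), where f is the band frequency (a commonly used simplification for dominant markers), gene diversity 1 − Σp_i² for a codominant marker such as an SSR (Botstein's full PIC formula subtracts a further term), and drops bands whose frequency falls outside an assumed window of [0.1, 0.9]. The threshold, function names and data are hypothetical, not values from the cited studies.

```python
from collections import Counter


def pic_dominant(scores):
    """PIC of a dominant marker from 0/1 band scores: 2 f (1 - f)."""
    f = sum(scores) / len(scores)
    return 2 * f * (1 - f)


def gene_diversity(alleles):
    """1 - sum(p_i^2) over allele frequencies of a codominant marker (e.g. SSR)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())


def informative_markers(X, min_freq=0.1):
    """Indices of marker columns whose band frequency lies in [min_freq, 1 - min_freq].

    min_freq = 0.1 is an assumed cut-off: bands that are nearly fixed or nearly
    absent among accessions carry too little variation to test against a trait.
    """
    n = len(X)
    keep = []
    for j in range(len(X[0])):
        f = sum(row[j] for row in X) / n
        if min_freq <= f <= 1 - min_freq:
            keep.append(j)
    return keep


# Hypothetical 0/1 scores: 5 accessions x 3 bands
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]
print(informative_markers(X))                  # [1]: only column 1 is polymorphic
print(pic_dominant([row[1] for row in X]))     # band frequency f = 0.6 here
print(gene_diversity(["150", "150", "162", "162"]))
```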
Finally, testing the identification efficiency of trait-associated markers is a key step for MAS. The identification percentages of two SCAR markers converted from ISSR BFLI-3 and RAPD BFL-16, which are associated with birch fiber length, were 76% for short fiber (Xia et al., 2008) and over 92% for long fiber (Wang et al., 2008), respectively. Of four SSR markers associated with birch fiber length, PB15 M3 identified as many as 75% of long-fiber individuals (Wang, 2007).
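The identification percentages above can be read as the share of individuals in a target phenotype class that carry the trait-associated marker. That reading is an assumption about how the cited birch studies scored efficiency. The sketch below, with hypothetical data and a hypothetical function name, also reports how often the marker appears in the contrasting class, which helps judge whether a high percentage reflects a diagnostic marker or merely a common band.

```python
def identification_percentage(marker_present, phenotype, target):
    """Percent of individuals in phenotype class `target` that carry the marker (1)."""
    in_class = [m for m, p in zip(marker_present, phenotype) if p == target]
    if not in_class:
        raise ValueError(f"no individuals of class {target!r}")
    return 100.0 * sum(in_class) / len(in_class)


# Hypothetical scores for 8 trees: marker band (1/0) and fiber-length class
marker = [1, 1, 1, 0, 0, 1, 0, 0]
fiber = ["long", "long", "long", "long", "short", "short", "short", "short"]

hit = identification_percentage(marker, fiber, "long")         # marker in long-fiber trees
false_hit = identification_percentage(marker, fiber, "short")  # marker in short-fiber trees
print(f"long: {hit:.0f}%  short: {false_hit:.0f}%")            # long: 75%  short: 25%
```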

FUTURE PROSPECTS
Although the extent of MAS use will depend on available resources and may be delayed in less-developed countries, greater adoption of MAS in the future is inevitable (Collard and Mackill, 2008), especially for orphan crops and long-juvenile woody plants. The cost of genotyping (that is, of molecular marker assays) is falling while the cost of phenotyping is rising, particularly in developed countries, which makes MAS increasingly attractive as the technology continues to develop. MAS will then increasingly be applied to improve the efficiency and effectiveness of selecting genotypes with traits that are difficult and expensive to phenotype, to pyramid disease resistance genes in single genotypes, and to choose parental lines carefully in crossing programs, allowing a controlled combination of alleles targeted for selection (Mohler and Singrun, 2005).
Tight linkage between markers and genes/QTLs is critical to the success of MAS. Genes/QTLs located more than 2 cM from the closest marker are hardly suitable either for MAS or for the identification/cloning of functional genes (Mohler and Singrun, 2005); indeed, with loose linkage, recombination can occur between the marker and the gene/QTL (Collard and Mackill, 2008). In addition, Virk et al. (1996) argued that, since the success of MAS programs depends exclusively on the extent of genetic linkage between markers and the relevant loci such as QTL, there is no reason why the same principles cannot be applied to natural populations or to genetic resources in general, provided that similar associations are observed between marker loci and the various allelic forms of QTLs and that these associations are in fact based on genetic linkage. Therefore, GRC molecular marker-trait association identification offers several opportunities for improving quantitative traits in orphan crops and long-juvenile woody plants, including marker-assisted evaluation of breeding material or germplasm, early selection for desirable traits and effective selection of putative parents for producing populations to map QTLs for a particular trait or to clone particular functional genes.
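The 2 cM criterion can be made concrete with Haldane's mapping function, r = ½(1 − e^{−2d}) with d in Morgans, which converts map distance into a recombination fraction. The sketch assumes no crossover interference (Haldane's assumption) and treats the chance that a marker and its linked allele stay together through t independent meioses as (1 − r)^t; both are simplifications chosen for illustration, and the function names are hypothetical.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction from map distance in cM (Haldane: no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def retained_linkage(d_cm, meioses=1):
    """Probability that no recombination separates marker and QTL over `meioses`."""
    return (1.0 - haldane_r(d_cm)) ** meioses


for d in (1, 2, 10, 20):
    print(f"{d:>2} cM: r = {haldane_r(d):.4f}, "
          f"kept after 5 meioses = {retained_linkage(d, 5):.3f}")
```

At 2 cM, r is about 0.02, so roughly one marker-based call in fifty would misclassify the linked allele after a single meiosis; at 20 cM the error rises to about one in six.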
First of all, new crops and woody plants that can grow in areas affected by mounting water scarcity, environmental degradation, increasing pollution and the inevitable emergence of new pathogen and pest biotypes are increasingly being used, because crop production must rise to meet the needs of a growing global population while the rate of increase in crop yields is declining (Collard and Mackill, 2008). However, linkage-based identification of markers associated with genes/QTLs often takes too long for these highly heterozygous orphan crops and woody plants. For example, mulberry generally requires at least 9 to 10 years to express all its traits fully (Vijayan et al., 2006). Moreover, selecting individual genotypes for mapping traits in bi-parental populations samples only a small portion of the variation present in the original germplasm and may reduce the odds of finding regions affecting the traits of interest (Maureira-Butler et al., 2007). Hence, for orphan crops and long-juvenile woody plants with strong tolerance to harsh environments, most of which lack genetic information such as linkage maps and QTL, marker-assisted evaluation of breeding material or germplasm through GRC association studies should allow MAS to be applied more widely in their breeding programs, especially for screening germplasm with desirable traits.
Secondly, GRC-identified markers allow early selection for desirable traits that are expressed late in plant development, such as fruit and flower features or adult characters that appear after a juvenile period, so that breeders need not wait for plants to mature fully before deciding which to propagate. For long-juvenile and long-lived heterozygous woody plants, progeny selection is always difficult because the expression of most agronomic traits varies with developmental stage (Vijayan et al., 2006). Hence, GRC molecular marker-trait association identification could be of great use to breeders in identifying promising seedlings/hybrids of these plants at an early stage. For example, sea buckthorn is highly heterozygous and has a juvenile period of 3 to 5 years. Dried-shrink disease (DSD) is a destructive disease that kills this species and halts commercial production, yet selecting DSD-resistant materials in conventional breeding programs is always difficult. This is because DSD infects only plants at least three years old, and in general 3 to 5 years are required for sea buckthorn plants to express DSD symptoms fully (Ruan et al., 2008). No QTL linkage map is available for sea buckthorn, and constructing mapping populations (even F1 crosses) is time-consuming and labor-intensive. Hence, the ISSR markers associated with DSD resistance identified by GRC analysis (Ruan et al., 2009) could be used to eliminate unpromising seedlings/hybrids at an early stage: seedlings carrying markers associated with a high disease index should be discarded, preventing the severe environmental and commercial losses that occur when DSD kills sea buckthorn plantations.
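The early-screening rule described for sea buckthorn amounts to a simple filter on seedling marker profiles. In the sketch below, the marker names and seedling profiles are placeholders, not the ISSR markers reported by Ruan et al. (2009), and discarding any seedling that carries at least one susceptibility-associated band is an assumed decision rule; a breeder might instead weight markers by their regression coefficients.

```python
def screen_seedlings(profiles, susceptibility_markers):
    """Split seedlings into (retained, discarded) by susceptibility-marker presence.

    profiles: dict mapping seedling ID -> set of scored band names.
    susceptibility_markers: set of bands associated with a high disease index.
    """
    retained, discarded = [], []
    for seedling, bands in profiles.items():
        if bands & susceptibility_markers:  # carries at least one risk band
            discarded.append(seedling)
        else:
            retained.append(seedling)
    return retained, discarded


# Hypothetical profiles; "M1" and "M4" stand in for DSD-susceptibility bands
profiles = {"S01": {"M1", "M2"}, "S02": {"M2", "M3"}, "S03": set(), "S04": {"M4"}}
keep, drop = screen_seedlings(profiles, {"M1", "M4"})
print("retain:", keep, "discard:", drop)
```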
Finally, if genetic linkage is the main cause of the associations, information on molecular markers and quantitative traits could be used to select putative parents more efficiently, both for producing populations to map QTL for a particular trait and for cloning particular functional genes. For example, a mapping population for QTL of DSD resistance in sea buckthorn could be constructed by selecting as parents accessions whose genomes carry the markers associated with DSD resistance and with susceptibility, respectively.
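Choosing contrasting parents from marker profiles can be sketched in the same way. The scoring rule below (number of resistance-associated bands carried minus number of susceptibility-associated bands carried) and all names are hypothetical illustrations; phenotypic records and genetic distance between candidates could also guide the choice.

```python
def choose_parents(profiles, resistance, susceptibility):
    """Return (resistant_parent, susceptible_parent) from accession marker profiles.

    Score = resistance bands carried - susceptibility bands carried.
    """
    def score(acc):
        bands = profiles[acc]
        return len(bands & resistance) - len(bands & susceptibility)

    ranked = sorted(profiles, key=score)  # lowest score first
    return ranked[-1], ranked[0]


# Hypothetical accessions and band names
profiles = {
    "A": {"R1", "R2"},
    "B": {"S1", "S2"},
    "C": {"R1", "S1"},
}
print(choose_parents(profiles, {"R1", "R2"}, {"S1", "S2"}))  # ('A', 'B')
```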
In conclusion, the apparent advantages of GRC marker-trait association identification are that (i) it allows the detection of QTL that vary across a wide spectrum of biodiversity rather than just between two planned parental genotypes; (ii) QTL for any quantitative trait can be studied in the same investigation; and (iii) it requires fewer inputs of time, labor and financial resources than linkage-based QTL identification. GRC marker-trait association identification will play an important role in plant MAS/QTL breeding programs, especially in heterozygous orphan crops and long-juvenile woody plants for which no other genetic information, such as linkage maps and QTL, is available. This review does not, however, seek to minimize the successes of molecular genetics based on linkage-map QTL identification, through which many major genes have been identified and successfully used in MAS to improve quantitative traits in major crops (Hansen et al., 2008; Hospital, 2009; Jannink et al., 2009).

Table 1. Molecular marker-trait associations identified by combining germplasm with regression analysis in plants.