Molecular profiling of an interspecific rice population derived from a cross between WAB 56-104 (Oryza sativa) and CG 14 (Oryza glaberrima)

International Crops Research Institute for the Semi-Arid Tropics, P. O. Box 39063, Nairobi, Kenya.
Africa Rice Center (WARDA), 01 BP 2031, Cotonou, Benin.
IRD/CIAT (Institut de Recherche pour le Développement/International Center for Tropical Agriculture), A.A. 6713, Cali, Colombia.
Forum for Agricultural Research in Africa, PMB CT 173 Cantonments, Accra, Ghana.
Plant Breeding, 162 Emerson Hall, Cornell University, Ithaca, New York 14853-1901.
*Corresponding author. E-mail: srm4@cornell.edu. Tel: (607) 255 0420. Fax: (607) 255 6683.

INTRODUCTION
Rice is the staple food of nearly half of the world's population and contributes over 20% of the total caloric intake of humans (Bhattacharjee et al., 2002). The genus Oryza consists of 22 wild and two cultivated species: Asian rice (O. sativa) and African rice (O. glaberrima). Long before Asian rice was introduced to Africa, local farmers in West Africa had domesticated O. glaberrima. However, O. glaberrima is increasingly being replaced by O. sativa varieties because most O. glaberrima cultivars have low yield potential, shatter heavily and are susceptible to lodging (Jones et al., 1997). Yet O. sativa varieties often cannot withstand the range of abiotic and biotic stresses found in Africa. O. glaberrima germplasm harbors a reservoir of genes that have allowed the species to survive and prosper in West Africa for more than 3500 years (Bidaux, 1978; Carpenter, 1978). These genes confer valuable traits, including (i) rapid and profuse vegetative growth coupled with droopy lower leaves (Jennings et al., 1979; Koffi, 1980), which contribute to weed competitiveness (Fofana et al., 1995); (ii) moderate to high levels of resistance to blast (Silue and Notteghem, 1991), rice yellow mottle virus (Attere and Fatokun, 1983; John et al., 1985), rice gall midge (Alam, 1988; Sauphanor, 1985) and nematodes (Reversat and Destombes, 1995); and (iii) reasonably good tolerance of acid soils, iron toxicity and drought (Sano et al., 1984; Jones et al., 1994). Weed competitiveness is particularly important for West African rice production. Weeding is usually done by hand, and labor shortages and competition with other activities often make effective weed removal nearly impossible.
Backcross breeding aims to introgress one or more genes from a donor parent into the genetic background of an elite variety. It offers a way to retain the favorable qualities of a good variety while replacing unfavorable alleles with alleles for desirable traits, such as resistance to diseases and pests, tolerance of environmental stresses or enhanced nutritional quality. Donor materials may be selected from either domesticated or non-domesticated germplasm sources. WARDA breeders targeted the numerous constraints that limit yields of upland rice (drought, insects and diseases, weeds and low inputs). They succeeded in generating fertile interspecific (O. sativa x O. glaberrima) lines by first crossing the two cultivated rice species and then backcrossing the F1 hybrids to their O. sativa parents. This approach has helped to improve the agronomic performance of O. sativa varieties by introgressing mono- or oligogenic traits from O. glaberrima. The resulting interspecific lines are inbred progeny collectively called NERICA, an acronym for New Rice for Africa.
In the early 1990s, WARDA developed over 300 interspecific BC2 inbred lines from a cross between an upland O. sativa tropical japonica variety, WAB 56-104, as the recurrent parent, and an O. glaberrima variety, CG 14, as the donor parent. Seventy of these interspecific lines were selected for detailed study using microsatellite markers. These included the seven named varieties (NERICA 1 to NERICA 7) released by WARDA and several other lines that have been advanced to the Participatory Varietal Selection (PVS) stage. The most popular NERICA varieties combine the best traits of both parents. Their high yield is presumably derived from the O. sativa parent, and their ability to thrive in harsh environments from the O. glaberrima parent (Jones et al., 1997; www.warda.org). The NERICA varieties offer real hope for improving the productivity, profitability and sustainability of rice farming in sub-Saharan Africa. Semagn et al. (2006) previously investigated genetic variation and patterns of relationship among NERICA 1 to 18. That study revealed (i) a wide range of genetic variation among all lines except NERICA 8 and 9, and (ii) a distinct separation of NERICA 1 to 7 from NERICA 8 to 18. However, it did not examine the proportion and distribution of O. glaberrima introgressions in the larger collection of 70 sibling lines, which is of great interest for future breeding efforts.
Semagn et al. 2015
The use of graphical genotypes based on molecular marker data to estimate the proportion of donor genome in a recurrent parent background was first explored by Young and Tanksley (1989). Since then, graphical genotyping has been used for a variety of purposes:
(a) to determine whether key areas of the genome are critical to certain varieties;
(b) to identify specific regions of the genome associated with desirable traits (e.g., McCouch et al., 1997; Hayano-Saito et al., 1998; Foolad et al., 2001);
(c) to develop highly informative genotyping sets (e.g., Macaulay et al., 2001; Coburn et al., 2005);
(d) to trace the inheritance of specific genomic regions through a pedigree or in a set of related lines.
In the present study, 70 introgression lines were analyzed to determine three things:
(a) the extent and map position of introgressions from O. glaberrima;
(b) the extent of heterozygous loci and non-parental alleles in the interspecific inbred lines;
(c) the genetic relationships among the lines, with particular interest in the potential breeding value of the lines that have not been released as varieties.

Plant materials
Seventy interspecific lines (Table 1) were developed by crossing the O. sativa tropical japonica variety WAB 56-104, as the recurrent parent, with the O. glaberrima variety CG 14, as the donor parent. WAB 56-104 is an upland variety bred at WARDA with several desirable agronomic traits, such as high yield, short growth duration and a plant height adapted to upland conditions. CG 14 is a variety from the Casamance region (Senegal). Its yield potential is low, primarily because of grain shattering and susceptibility to lodging (Jones et al., 1997). However, CG 14 has several useful traits, including long panicles and high weed competitiveness resulting from early vigor and high tiller number (Fofana et al., 1995; Dingkuhn et al., 1997). Twenty-six of the lines were developed as doubled haploids (DH) derived from BC2F1 plants by anther culture (Guiderdoni et al., 1992). The rapid genetic fixation achieved in the DH lines is expected to retain blocks of genes that would otherwise have been lost through conventional inbreeding and artificial selection (Jones et al., 1997). The remaining 44 lines were BC2F8 lines developed by pedigree selection (Jones et al., 1997). All 70 lines had gone through two generations of backcrossing. This was followed either by repeated selfing to the BC2F8 generation or by doubling of the haploid chromosome number, and all lines behaved as inbred introgression lines.

Genotyping
DNA was extracted from 4-week-old seedlings using the cetyltrimethylammonium bromide (CTAB) protocol described by Saghai-Maroof et al. (1984) and dissolved in double-distilled water. To estimate DNA concentration, samples were run on a 1% agarose gel containing 0.5 µg/ml ethidium bromide, and their fluorescence was compared with that of a 1 kilobase pair (kb) DNA ladder (Life Technologies). Polymerase chain reactions (PCR) were performed in a total volume of 20 µl containing 20 ng DNA and 1X PCR buffer (50 mM KCl, 1.5 mM MgCl2, 0.001% gelatin, 50% glycerol and 10 mM Tris-HCl, pH 8.3). Each reaction also contained 0.4 µM each of the forward and reverse primers, 200 µM of each dNTP (Boehringer Mannheim) and 1 unit of Taq DNA polymerase (Perkin-Elmer). Amplifications were carried out in MJ PTC 100/96 thermal cyclers using the following program (Temnykh et al., 2001): 5 min at 94°C, then 35 cycles of 1 min at 94°C, 1 min at 55°C and 2 min at 72°C, and a final extension of 5 min at 72°C. Amplified products were separated on 5% denaturing polyacrylamide gels and visualized by silver staining (Panaud et al., 1996; Promega). One hundred sixty-four microsatellite (simple sequence repeat, SSR) primer pairs were selected based on their positions on an indica x japonica genetic map (Temnykh et al., 2001) and used to screen for polymorphism in the present study.
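Preparing a master mix for the reaction composition above is a direct application of C1V1 = C2V2. The sketch below is a minimal illustration only. It assumes stock concentrations that the protocol does not give: 10X buffer, 10 µM primers, 10 mM of each dNTP, 5 U/µl Taq and template DNA at 10 ng/µl. It also assumes a 10% pipetting overage. Substitute the values printed on your own reagents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    stock: float  # stock concentration (ASSUMED value; same units as `final`)
    final: float  # final concentration in the 20 µl reaction (from the protocol)


# Final concentrations follow the protocol; stock concentrations are hypothetical.
COMPONENTS = [
    Component("10X buffer", stock=10.0, final=1.0),          # X
    Component("forward primer", stock=10.0, final=0.4),      # µM
    Component("reverse primer", stock=10.0, final=0.4),      # µM
    Component("dNTPs (each)", stock=10_000.0, final=200.0),  # µM
    Component("Taq polymerase", stock=5.0, final=0.05),      # U/µl (1 U per 20 µl)
    Component("DNA", stock=10.0, final=1.0),                 # ng/µl (20 ng per 20 µl)
]


def per_reaction_volumes(components, reaction_ul=20.0):
    """Volume (ul) of each stock per reaction from C1*V1 = C2*V2; water fills the rest."""
    volumes = {c.name: c.final * reaction_ul / c.stock for c in components}
    used = sum(volumes.values())
    if used > reaction_ul:
        raise ValueError(f"stocks need {used:.2f} ul, more than the {reaction_ul} ul reaction")
    volumes["water"] = reaction_ul - used
    return volumes


def master_mix(components, n_reactions, reaction_ul=20.0, overage=0.10, exclude=("DNA",)):
    """Scale per-reaction volumes for n reactions plus a pipetting overage.

    Template DNA is excluded by default because it is added to each well separately.
    """
    scale = n_reactions * (1.0 + overage)
    per_rxn = per_reaction_volumes(components, reaction_ul)
    return {name: vol * scale for name, vol in per_rxn.items() if name not in exclude}
```

For example, `master_mix(COMPONENTS, 72)` would give volumes for 72 reactions, such as the 70 lines plus the two parents. The template DNA is left out of the mix and pipetted into each well, so each well receives 18 µl of mix plus 2 µl of DNA.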

Data scoring and statistical analyses
Microsatellite markers were scored as follows:
'1' or 'A': donor parent (CG 14) alleles;
'2' or 'B': recurrent parent (WAB 56-104) alleles;
'3' or 'H': heterozygous loci carrying alleles from both parents;
'4' or 'U': non-parental alleles (alleles present in the introgression lines but not found in either parent);
'5' or 'M': missing data.
Three statistical analyses were performed. First, the proportions of donor and recurrent parent genome were calculated using the software package GGT (Graphical GenoTypes; http://www.dpw.wau.nl/pv/pub/ggt; Van Berloo, 1999). This approach takes the distances between markers into account. A chromosomal segment flanked by two donor-type markers (D'D") is considered to contain 100% donor DNA across the region. A segment flanked by two recurrent-type markers (R'R") is considered to contain 100% recurrent parent DNA. A segment flanked by one donor-type and one recurrent-type marker (DR) is considered to be recombinant, with donor DNA extending approximately halfway across the interval and recurrent parent DNA across the other half (Young and Tanksley, 1989). One-tailed Student's t-tests were used to compare mean introgression between two pairs of groups: (a) the 7 NERICAs and the other 63 sister lines, and (b) the doubled haploid lines and the lines developed by pedigree selection. Second, simple matching coefficients (the number of matches divided by the total number of markers) were calculated as a measure of genetic similarity between lines. These coefficients were used to generate a dendrogram with the complete-link method of SAHN clustering. This coefficient was chosen because the microsatellite data are qualitative and multi-state. Third, principal component analysis (PCA) was used to investigate the overall variation and patterns of relationship among the lines. For PCA, the data were used to derive correlation matrices, from which the principal components (PCs) were extracted and projected in two dimensions (EIGEN, PROJ and MXPLOT programs). Both cluster and principal component analyses were performed using NTSYS-pc for Windows, version 2.0 (Exeter Software; Rohlf, 1998).
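The half-interval rule described above can be sketched in a few lines. This is not GGT's own code, only a minimal illustration under stated assumptions. Each line is given, per chromosome, as (position in cM, score) pairs sorted by position. Intervals flanked by H, U or M scores are split half-and-half in the same way, which is an assumption. Map length beyond the outermost markers is ignored, which is also an assumption.

```python
from collections import defaultdict


def genome_proportions(chromosomes):
    """Fraction of the marker-covered map length attributed to each score class.

    `chromosomes` maps a chromosome name to a list of (position_cM, code) tuples
    sorted by position, where code is one of "A", "B", "H", "U", "M".
    Each interval between adjacent markers is split in half, and each half is
    assigned to the class of its flanking marker. This reproduces the rule of
    Young and Tanksley (1989): A-A -> 100% donor, B-B -> 100% recurrent,
    A-B -> 50% each. Chromosome ends beyond the outermost markers are ignored.
    """
    length_by_code = defaultdict(float)
    total = 0.0
    for name, markers in chromosomes.items():
        for (pos1, code1), (pos2, code2) in zip(markers, markers[1:]):
            interval = pos2 - pos1
            if interval < 0:
                raise ValueError(f"markers on {name} are not sorted by position")
            length_by_code[code1] += interval / 2
            length_by_code[code2] += interval / 2
            total += interval
    if total == 0:
        raise ValueError("need at least two markers on at least one chromosome")
    return {code: length / total for code, length in length_by_code.items()}
```

Applied to a hypothetical line with chr1 = [(0, "B"), (10, "B"), (20, "A"), (30, "A")] and chr2 = [(0, "B"), (40, "B")], the function assigns 15 of 70 cM to the donor (A) and 55 cM to the recurrent parent (B).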

Polymorphism and parental genome coverage
An initial polymorphism survey was conducted using DNA from the two parents. Of the 164 SSR primer pairs screened, 130 (79.3%) were polymorphic between the two parents. The number of polymorphic markers per chromosome varied from 8 on chromosomes 7 and 12 to 15 on chromosome 1 (Figure 1), with an overall average of 10.8 polymorphic markers per chromosome. The 70 lines were then genotyped with the 130 polymorphic SSR markers. Introgressions were detected in all lines and on all chromosomes. O. sativa alleles were detected at all 130 marker loci in one or more lines, but only 57 markers (43.8%) showed introgressions from O. glaberrima. The data from the 130 markers were then used to estimate the proportion of each parental genome in the 70 lines. O. glaberrima DNA represented 0.9 to 12.1% of the genome (Table 1; Figure 2), and O. sativa DNA represented 79.0 to 94.4%. On average, O. glaberrima alleles accounted for 6.3% of the genome (108 of 1,725 centiMorgans, cM) and O. sativa alleles for 87.4% (1,507.9 of 1,725 cM). Heterozygosity was observed at 19 marker loci in a total of 28 lines (40%). In these lines, heterozygosity ranged from 0.3 to 3.4% per line, and the average over all 70 lines was only 0.4% per line.
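Counts such as the number of loci showing O. glaberrima alleles, or the number of lines carrying heterozygous loci, come directly from the scored marker matrix. This is a minimal sketch. It assumes each line is stored as a string of the letter codes (A, B, H, U, M) in a fixed marker order, and the function names are hypothetical. The percentages here are per marker, whereas the genome proportions reported above are weighted by map distance in cM.

```python
from collections import Counter

CODES = "ABHUM"  # donor, recurrent, heterozygous, non-parental, missing


def line_summary(scores):
    """Percentage of markers in each score class for one line (marker-based, not cM)."""
    if not scores:
        raise ValueError("empty genotype")
    unexpected = set(scores) - set(CODES)
    if unexpected:
        raise ValueError(f"unexpected codes: {sorted(unexpected)}")
    counts = Counter(scores)
    return {code: 100.0 * counts[code] / len(scores) for code in CODES}


def loci_and_lines_with(genotypes, code):
    """Return (marker indices where any line has `code`, sorted names of lines carrying it)."""
    loci = {i for scores in genotypes.values() for i, c in enumerate(scores) if c == code}
    lines = sorted(name for name, scores in genotypes.items() if code in scores)
    return loci, lines
```

Applied to the full 70 x 130 matrix, `loci_and_lines_with(genotypes, "A")` would give the set of markers showing O. glaberrima introgression. The "H" and "U" codes give the heterozygous and non-parental summaries in the same way.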
Non-parental alleles were detected at 20 SSR loci (15.4%) in a total of 58 lines (82.9%). In these 58 lines, non-parental alleles accounted for 0.4 to 5.5% of loci per line, with an average of 2.7%. We compared the genomic composition of the 44 lines developed through pedigree selection with that of the 26 doubled haploids. O. glaberrima introgressions were significantly more abundant (p < 0.05) in the pedigree lines (6.7%) than in the doubled haploids (5.5%), whereas the proportion of recurrent parent (O. sativa) genome was identical (87.4%). The difference in non-parental alleles was even more pronounced (p < 0.001): the proportion of loci carrying non-parental alleles was twice as high in the pedigree lines (2.7%) as in the doubled haploids (1.3%). The mean proportion of loci showing O. glaberrima introgression in the seven NERICAs (NERICA 1 to NERICA 7) was 8.2%, significantly (p < 0.05) higher than in the 63 sister lines (6.0%). However, the proportions of non-parental alleles and of recurrent parent genome in the seven NERICAs did not differ from those of their sister lines.
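The one-tailed comparisons above, such as the 7 NERICAs against the 63 sister lines, correspond to a pooled-variance Student's t-test. The sketch below uses only the Python standard library. It computes the t tail probability through the regularized incomplete beta function, using the Numerical Recipes continued-fraction evaluation. The function names are hypothetical, and in practice a statistics package would normally be used instead.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T > t) for Student's t distribution with df degrees of freedom."""
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


def one_tailed_t_test(x, y):
    """Pooled-variance Student's t-test of H1: mean(x) > mean(y); returns (t, df, p)."""
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(x) - mean(y)) / sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, df, t_upper_tail(t, df)
```

The test would be called as `one_tailed_t_test(nerica_values, sister_values)`, with per-line introgression percentages taken from Table 1. It returns the t statistic, the degrees of freedom and the one-tailed p-value. The degrees of freedom are 7 + 63 - 2 = 68 for the NERICA comparison, and also 44 + 26 - 2 = 68 for the pedigree versus doubled haploid comparison.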
The distribution of introgressions varied among the 12 rice chromosomes. Chromosome 3 had the lowest proportion of SSR loci carrying O. glaberrima alleles (25%), and chromosome 12 the highest (87.5%). Map distances (cM) between markers were then used to estimate the extent of O. glaberrima introgression on each chromosome. By this measure, chromosome 3 had the smallest amount of introgressed DNA (2.5%) and chromosome 6 the largest (21.5%), and the average proportion of introgressed DNA per chromosome was 7.5% (Figure 1). The proportion of recurrent parent (O. sativa) genome varied from 64.3% on chromosome 6 to 94.7% on chromosome 11. Eight of the 12 chromosomes (all except chromosomes 3, 5, 9 and 12) contained heterozygous loci, and the average heterozygosity on these 8 chromosomes varied from 0.2 to 1.5%. Non-parental alleles were also observed on 8 of the 12 chromosomes (all except chromosomes 5, 7, 10 and 11). The highest proportion of non-parental alleles (5.2%) was observed on chromosome 6, where the average across all 70 lines was 1.4% of loci. Across chromosomes, the proportion of non-parental alleles was strongly negatively correlated (r = -0.72) with the proportion of recurrent parent (WAB 56-104) genome.
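The correlation reported above is computed over the 12 chromosomes, with each chromosome contributing one pair of values: its proportion of non-parental alleles and its proportion of recurrent parent genome. A minimal Pearson correlation sketch follows. The helper `r_to_t` is an added, standard test of whether r differs from zero (df = n - 2). It is not part of the analysis reported here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic with df = n - 2 for testing H0: rho = 0 (hypothetical helper)."""
    if abs(r) >= 1:
        raise ValueError("|r| must be below 1")
    return r * sqrt((n - 2) / (1 - r * r))
```

With per-chromosome values in hand, `pearson_r(nonparental_by_chrom, recurrent_by_chrom)` would give the coefficient, and `r_to_t(r, 12)` a t statistic with 10 degrees of freedom.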

Genetic relationships among lines
Cluster analysis using simple matching coefficients derived from the SSR markers produced two major groups, with six subgroups, in the dendrogram (Figure 3). All seven released NERICA varieties belong to one major group (group-1), with NERICA 6 being the most genetically divergent. Principal component analysis also revealed two major groups. A plot of PC1 (12% of the variation) and PC2 (7%) separated the two groups in the same way as the dendrogram (Figure 4). The dendrogram and the PCA differed in two respects: (i) NERICA 3 was intermediate between the two groups in the PCA, whereas NERICA 6 was the outlier in the dendrogram, and (ii) the six subgroups seen in the dendrogram were not evident in the PCA.
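The clustering step (simple matching similarity followed by complete-link SAHN clustering) can be illustrated without NTSYS-pc. This is a hypothetical, naive sketch. It converts similarity S to distance as 1 - S. It skips loci that are missing in either line, which is an assumption, since the text defines the coefficient over all markers. It stops merging when a requested number of groups remains, which corresponds to cutting the dendrogram at that level. Tie-breaking may differ from NTSYS-pc, so group membership at tied distances need not match Figure 3.

```python
from itertools import combinations


def simple_matching_distance(x, y, missing="M"):
    """1 - (matching loci / compared loci); loci missing in either line are skipped (assumption)."""
    pairs = [(a, b) for a, b in zip(x, y) if a != missing and b != missing]
    if len(x) != len(y) or not pairs:
        raise ValueError("genotypes must have equal length and share scored loci")
    return 1.0 - sum(a == b for a, b in pairs) / len(pairs)


def complete_linkage(genotypes, n_groups):
    """Agglomerative (SAHN) clustering with complete linkage, stopped at n_groups clusters.

    Returns the remaining clusters (frozensets of line names) and the merge heights.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    names = list(genotypes)
    dist = {
        frozenset((a, b)): simple_matching_distance(genotypes[a], genotypes[b])
        for a, b in combinations(names, 2)
    }
    clusters = [frozenset([n]) for n in names]
    heights = []

    def linkage(c1, c2):
        # complete link: distance between the two most dissimilar members
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > n_groups:
        # closest pair of clusters; ties are broken by list position
        d, i, j = min(
            (linkage(c1, c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        heights.append(d)
    return clusters, heights
```

For the 70 lines scored at 130 markers, `complete_linkage(genotypes, 2)` would return two groups analogous to the two major branches of the dendrogram. The naive search scales cubically with the number of lines, which is still fast at this size.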

DISCUSSION
Hundreds of microsatellite markers have been incorporated into rice genetic maps constructed from both intraspecific and interspecific populations (McCouch et al., 1988; Causse et al., 1994; Chen et al., 1997; Cho et al., 2000; Lorieux et al., 2000; Temnykh et al., 2001). Lorieux et al. (2000) constructed an interspecific O. sativa subsp. indica x O. glaberrima genetic linkage map. They reported very good colinearity between the O. sativa x O. glaberrima and indica x japonica maps, and about the same map length for the intraspecific and interspecific crosses. For the present study, a total of 164 SSR primer pairs were initially selected according to their position on the genetic map of the indica x japonica population (Temnykh et al., 2001). The 130 microsatellite markers used in the present study were well distributed along the 12 chromosomes (Figure 2). However, a few intervals were sparsely covered, primarily because no polymorphic markers were available within them.
Table 1. Pedigree and donor genome coverage (introgression) of the 70 lines developed using the O. sativa variety WAB 56-104 as recurrent parent and the O. glaberrima variety CG 14 as donor parent. The 26 lines in boldface are doubled haploids derived from BC2F1 plants; the remaining lines are BC2F8 lines developed by repeated selfing and pedigree selection.

Recurrent backcrossing programs are planned on the assumption that, after t generations of backcrossing, the expected proportion of recurrent parent genome recovered is 1 - (1/2)^(t+1) (Babu et al., 2004). Graphical representations of molecular data are very useful for determining the proportion and location of introgressions along the chromosomes (van Berloo, 1999). Molecular markers help breeders determine the inheritance and parentage of specific genomic regions. They also allow breeders to monitor the introgression of chromosome segments linked to desirable traits in breeding lines.
In the present study, the average proportion of recurrent parent DNA in the interspecific lines (87.4%) was consistent with that expected for a random set of BC2 lines (87.5%). This suggests that there was no significant selection against the O. glaberrima genome. However, the proportion of introgressed O. glaberrima DNA (6.3%) was half of that expected in a random set of BC2 lines (12.5%) (Ribaut and Hoisington, 1998; Babu et al., 2004). Non-parental alleles (2.2%), heterozygous loci (0.4%) and missing data (3.7%) explain this discrepancy. Together they account for 6.3% of the genome, almost exactly the shortfall in O. glaberrima DNA.
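The expectation and the genome accounting in the paragraph above can be checked directly. This sketch uses exact fractions for the backcross expectation 1 - (1/2)^(t+1). It then takes the genome percentages reported for the 70 lines and sums the non-parental, heterozygous and missing scores as "unassigned" genome, following the reasoning in the text.

```python
from fractions import Fraction


def expected_recurrent_fraction(t):
    """Expected recurrent-parent genome after t backcrosses: 1 - (1/2)**(t + 1)."""
    if t < 0:
        raise ValueError("t must be >= 0")
    return 1 - Fraction(1, 2) ** (t + 1)


# Genome composition of the 70 lines as reported in the text (percent).
OBSERVED = {
    "recurrent (O. sativa)": 87.4,
    "donor (O. glaberrima)": 6.3,
    "non-parental alleles": 2.2,
    "heterozygous loci": 0.4,
    "missing data": 3.7,
}
UNASSIGNED = ("non-parental alleles", "heterozygous loci", "missing data")

expected_recurrent = float(expected_recurrent_fraction(2)) * 100  # BC2: 87.5%
expected_donor = 100 - expected_recurrent                         # BC2: 12.5%
donor_shortfall = expected_donor - OBSERVED["donor (O. glaberrima)"]
unassigned = sum(OBSERVED[k] for k in UNASSIGNED)
```

The shortfall in donor DNA (12.5 - 6.3 = 6.2%) is matched, within rounding, by the unassigned categories (2.2 + 0.4 + 3.7 = 6.3%).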
A total of 58 of the 70 interspecific lines (82.9%) contained at least one non-parental SSR allele. Non-parental alleles may arise from genetic mixtures in the parental materials used in crossing, undocumented outcrossing during generation advance or spontaneous mutations at SSR loci. The highest frequency of non-parental alleles was observed on chromosome 6 (Figure 1). This region lies near a known sporogametophytic sterility gene in the vicinity of the waxy starch synthase gene (Sano, 1990; Lorieux et al., 2000; Heuer and Miezan, 2003). Sterility associated with this locus could affect the production of viable pollen within a line and would tend to promote outcrossing. As a result, gametes that did not contain O. glaberrima introgressions in this region of chromosome 6 would show preferential survival in the population. The ability to trace chromosome regions from parents to offspring through multiple generations provides valuable information for breeders. NERICA 1 to NERICA 7 were selected for variety release from among the 70 lines developed from the same parents. This study therefore compared the average introgression in these seven varieties (Figure 2) with that in their sister lines, and found that the released varieties contained significantly more O. glaberrima DNA. The seven varieties were chosen on the basis of field evaluation of phenotypic traits and participatory varietal selection. This result therefore suggests that specific O. glaberrima introgressions may be associated with superior field performance. This hypothesis, however, remains to be tested using QTL and association analyses to identify which segments of the O. glaberrima genome are associated with superior agronomic performance in upland rice production systems.
The seven NERICA varieties represented only one of the two major groups revealed by both the cluster and principal component analyses (Figures 3 and 4). This suggests that the other lines may provide opportunities for selection and further varietal development. Selection and varietal development within group-2 will depend heavily on the availability of reliable morpho-agronomic data from multi-location trials. To our knowledge, this study is the first attempt to characterize the introgression of chromosomal segments among a diverse set of interspecific lines derived from the two cultivated rice species, and it provides valuable information for breeders.

Figure 1. Pie charts for the 12 rice chromosomes depicting the proportion of genome introgression in an interspecific rice population derived from CG 14 (donor) and WAB 56-104 (recurrent) parents. The pie charts were plotted from the graphical genotyping outputs; the numbers in the center of the pies correspond to the number of SSR markers used in the study.

Figure 2. Graphical genotypes of 13 interspecific lines based on 130 microsatellite markers. Vertical bars represent the 12 rice chromosomes, with the chromosome number given on the left side of each bar; each chromosome is segmented by horizontal lines at the marker positions. Numbers at the top of the vertical bars refer to the 13 lines: N1 to N7 refer to NERICA 1 to 7 (e.g., N1: NERICA 1; N7: NERICA 7). Lines 30, 69 and 70 had the fewest introgressions, while lines 4, 48 and 58 had the most. Refer to Table 1 for the pedigree of each line. Legend: A, CG 14; B, WAB 56-104; H, heterozygote; M, missing data; U, non-parental alleles.

Figure 4. Score plot of the first two principal components from principal component analysis of the 70 interspecific lines genotyped with 130 microsatellite markers. Numbers in the plot correspond to the names shown in Table 1.
Figure 3. Dendrogram of the 70 interspecific lines based on simple matching coefficients derived from 130 microsatellite markers. The lines were separated into two major groups and six subgroups. Numbers correspond to the column "Name" in Table 1. All lines within group-2, except the five lines indicated by arrows, contained fewer introgressions than the 6.3% average for the population.