Genetic diversity in South African maize (Zea mays L.) genotypes as determined with microsatellite markers

One thousand and forty three (1043) maize genotypes including white and yellow maize inbred lines as well as hybrids from the public germplasm collection were characterized with 80 microsatellite markers distributed throughout the genome. A total of 1874 alleles were amplified and used in the genetic diversity analysis. Principal coordinate analysis confirmed the geographical distribution of the breeding lines. Cluster analysis using Rogers distance measures placed the breeding lines in several clusters and corresponded well with known pedigrees. Lines with mixed origin were classified in separate clusters and duplicate entries in the collection were identified. These mixed lines could not be placed in known heterotic groups, but could rather be used to identify new groups to be used in the breeding program. The genetic distances determined in the study can be applied to plan a more focused breeding program.


INTRODUCTION
Maize is one of the most widely grown crops worldwide and an important staple food in the African diet. It is used as human food and animal feed, and provides an important source of income and employment for a large proportion of the population. Maize breeding is monopolized by large multi-national private companies, with little attention given to the more specific needs of local markets. The public sector breeding program is essential in breeding and providing varieties for this niche market, as well as in providing inbred lines to local private companies. Due to ever-decreasing research funds it is, however, essential to scale down the number of testcrosses and the size of current breeding programs.
Several heterotic groups are currently identified and used in the local breeding program. The locally [...] is being used extensively in crosses. The I heterotic group is considered to be the most advantageous for yield adaptation in South Africa. In hybrid combination with the I-group, the US Corn Belt lines related to the Lancaster line Mo17 seem to be the preferred choice. Smaller heterotic groups include the K64 group.
*Corresponding author. E-mail: charlotte.mienie@nwu.ac.za. Tel: +27182992319.
Estimates of genetic diversity among inbred lines have traditionally been based on morphological data such as endosperm type, on pedigree records and on the amount of heterosis expressed by the hybrid. This approach has several limitations, as morphological characters do not always reliably portray genetic relationships due to environmental interactions.
Testcross designs with numerous testers are extremely expensive and labour intensive, and require large-scale field evaluations. Molecular analyses can lower the costs involved in plant breeding. They also provide a means of determining the purity of hybrids produced in the seed industry and can save seed companies and farmers money by ensuring the quality of the seed.
Knowledge of the relationships among lines would also help identify a set of inbreds that have maximal diversity for the analysis of the effects of genetic background (Liu et al., 2003). The ideal marker system is highly polymorphic, co-dominant, accurate, reproducible, high-throughput and low cost (both in terms of capital investment and cost per assay). For decades, simple sequence repeats (SSRs), or microsatellites, have been the genetic markers of choice because they are economical to score, have high allelic diversity and are usually selectively neutral (Smith et al., 1997). Large numbers of single nucleotide polymorphisms (SNPs) are already available and have been used extensively in genotyping maize. However, it is still not cost effective in South Africa to genotype large numbers of inbreds for the purpose of association analysis, and it is thus necessary to identify a set of lines that captures the maximum number of alleles or haplotypes and to determine population structure. Hamblin et al. (2007) compared the use of SSR and SNP markers in maize.
The different mutational properties of these two classes of markers resulted in differences in heterozygosities and allele frequencies, but SSRs performed better in clustering germplasm into populations. SSRs also provided more resolution in measuring genetic distance based on allele sharing. Their results suggested that large numbers of SNP loci would be required to replace highly polymorphic SSRs in studies of diversity and relatedness (Hamblin et al., 2007). The relationship between the mid-parent heterosis of single-cross hybrids and the genetic distance of their parental inbreds, determined with molecular markers, has been investigated both in theory (Charcosset and Essioux, 1994) and in numerous experiments with maize and other crops (Brummer, 1999). Xia et al. (2004) concluded from their study of CIMMYT sub-tropical maize inbreds that SSR-based genetic distances, calculated with a modified Rogers distance measure, in combination with field evaluations, provided a solid basis for the detection of heterotic groups. The objectives of our research were to: 1) fingerprint local maize genotypes according to the most appropriate and cost-effective procedure for determining genetic distances of breeding lines and 2) compile a database of local maize breeding material according to DNA data.

Plant material
The pedigrees of the hybrids studied in this paper are commercial intellectual property; detailed pedigree information therefore cannot be disclosed. Seeds from breeding lines and hybrids were obtained from plant breeders at ARC-GCI in Potchefstroom and Cedara, South Africa, and planted in small pots for germination. Young leaves from approximately 20 plants of each line were harvested and two bulks were made up from equal amounts of leaf material from ten plants each. We assumed that the bulk samples would capture all heterozygosity present in the genotypes. Leaf material was freeze dried and ground to a fine powder.

SSR genotyping
DNA was extracted from each bulk sample using a modified CTAB extraction technique (Saghai-Maroof et al., 1984) and diluted to 50 ng/µl. Microsatellite marker genotyping involved the use of 96 fluorescently labelled SSR markers selected from published markers (www.agron.missouri.edu/ssr.html) to be evenly distributed throughout the maize genome (Figure 1).
The SSR primers were labelled with four different dyes [6FAM (blue), HEX (green), NED (yellow) and PET (red)] and combined in 12 multiplex groups (Table 1), each containing seven to ten primer pairs chosen according to colour and size so that products labelled with the same colour did not overlap. PCR amplifications were carried out on a Techne 384-well thermal cycler in a final reaction volume of 5 to 10 µl, containing 50 ng genomic DNA, 1X multiplex PCR master mix (Qiagen, Valencia, USA) containing HotStarTaq DNA polymerase, multiplex PCR buffer, 2 mM MgCl2, 250 mM of each dNTP (dATP, dCTP, dGTP, dTTP), and 0.05 µM of each multiplex fluorescent primer mix. The PCR products were diluted 1:20 and 1 µl was added to HiDi-formamide (Life Technologies) containing 0.12 µl GeneScan-LIZ500 as size standard and electrophoresed on an ABI Prism 3130xl sequencer. Fragments were analyzed with GeneMapper 4.0 software (Applied Biosystems) and allele sizes were verified. A list of microsatellite loci and their chromosomal locations is given in Table 1.
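The requirement that primer pairs sharing a dye must not overlap in fragment size within a multiplex group can be checked mechanically. The following is a minimal sketch of such a check, not the procedure used in the study; the function name, marker names and size ranges are hypothetical and serve only as illustration.

```python
from itertools import combinations


def find_dye_conflicts(panel):
    """Return marker pairs in one multiplex group that share a dye and overlap in size.

    panel: list of (marker_name, dye, min_size_bp, max_size_bp) tuples.
    """
    conflicts = []
    for first, second in combinations(panel, 2):
        name_a, dye_a, low_a, high_a = first
        name_b, dye_b, low_b, high_b = second
        # Two closed size ranges overlap when each one starts before the other ends.
        ranges_overlap = low_a <= high_b and low_b <= high_a
        if dye_a == dye_b and ranges_overlap:
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical multiplex group: names and expected size ranges are invented.
example_panel = [
    ("marker_A", "6FAM", 100, 140),
    ("marker_B", "6FAM", 160, 210),
    ("marker_C", "HEX", 110, 150),
    ("marker_D", "HEX", 145, 190),
]

print(find_dye_conflicts(example_panel))
```

In this invented group only marker_C and marker_D are flagged, because both carry HEX and their expected size ranges share the 145 to 150 bp interval; markers with different dyes may overlap freely.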

Data analysis
Genetic distances of the breeding lines and hybrids tested were calculated with PowerMarker ver. 3.25 (Liu and Muse, 2005) using the Rogers (1972) distance. The polymorphic information content (PIC) of each marker was determined with the same software. Average linkage [unweighted pair group method using arithmetic averages (UPGMA)] clustering was calculated based on Rogers distance (RD) estimates between pairs of inbred lines, for the yellow lines, the white lines and the hybrids separately. To evaluate the robustness of the UPGMA dendrogram, the cophenetic correlation was calculated (Sneath and Sokal, 1973) with NTSYS ver. 2.21 software (Rohlf, 2009). Principal coordinates analysis (PCoA) was carried out on the calculated distances with NTSYS ver. 2.21 software (Rohlf, 2009): the data were transformed with the dcenter module and eigenvectors were calculated.
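For readers unfamiliar with the two statistics, the sketch below computes the Rogers (1972) distance between two entries and the PIC of a single marker from allele frequencies. It uses the standard textbook formulas (PIC in the form given by Botstein et al., 1980); that PowerMarker handles missing data and bulked samples in exactly this way is an assumption, and the loci and frequencies shown are invented.

```python
from math import sqrt


def rogers_distance(freqs_a, freqs_b):
    """Rogers (1972) distance between two entries, averaged over shared loci.

    freqs_a, freqs_b: dict mapping locus -> dict mapping allele -> frequency.
    Loci scored in only one of the two entries are ignored (an assumption).
    """
    shared_loci = [locus for locus in freqs_a if locus in freqs_b]
    if not shared_loci:
        raise ValueError("no loci scored in both entries")
    total = 0.0
    for locus in shared_loci:
        alleles = set(freqs_a[locus]) | set(freqs_b[locus])
        squared_diff = sum(
            (freqs_a[locus].get(allele, 0.0) - freqs_b[locus].get(allele, 0.0)) ** 2
            for allele in alleles
        )
        total += sqrt(0.5 * squared_diff)
    return total / len(shared_loci)


def pic(allele_freqs):
    """Polymorphic information content of one marker from its allele frequencies."""
    p = list(allele_freqs)
    sum_squares = sum(x * x for x in p)
    cross_term = sum(
        2.0 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - sum_squares - cross_term


# Invented example: two entries scored at two loci (allele sizes in bp).
line_1 = {"locus_1": {150: 1.0}, "locus_2": {200: 0.5, 204: 0.5}}
line_2 = {"locus_1": {154: 1.0}, "locus_2": {200: 1.0}}

print(rogers_distance(line_1, line_2))  # 0.75
print(pic([0.5, 0.5]))                  # 0.375
```

In the example the two entries share no allele at the first locus (per-locus distance 1.0) and differ partially at the second (0.5), giving a mean of 0.75. A marker with two equally frequent alleles has a PIC of 0.375.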

RESULTS
A total of 96 SSR primer sets were used to fingerprint the breeding lines and hybrid samples. SSR markers with more than 20% missing values were discarded, and data from 80 markers were used in the final analysis. The PIC values of the markers varied between 0.2 and 0.9. The average number of alleles per locus was 12.7 for the 1043 genotypes tested, with a total of 1874 alleles amplified (Table 1). The number of alleles amplified varied between 6 and 36 per locus for the total population screened. When the separate populations were considered, the mean number of alleles per locus was highest for the yellow maize breeding lines, which also had the largest number of entries in the screening program (Table 2).
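The marker filtering step described above can be sketched as follows. This is an illustrative outline only: the data layout (one list of allele calls per marker, with `None` for a missing call), the function names and the toy data are assumptions, and only the 20% threshold comes from the text.

```python
def filter_markers(genotypes, max_missing=0.20):
    """Keep markers whose fraction of missing calls does not exceed max_missing.

    genotypes: dict mapping marker -> list of allele calls, one per sample,
    with None marking a missing call.
    """
    kept = {}
    for marker, calls in genotypes.items():
        missing_fraction = sum(call is None for call in calls) / len(calls)
        # Markers with MORE than the threshold of missing values are discarded.
        if missing_fraction <= max_missing:
            kept[marker] = calls
    return kept


def alleles_per_locus(genotypes):
    """Count the distinct non-missing alleles observed at each marker."""
    return {
        marker: len({call for call in calls if call is not None})
        for marker, calls in genotypes.items()
    }


# Toy data: five samples, allele sizes in bp; values are invented.
toy_genotypes = {
    "marker_A": [150, 154, 150, None, 158],   # 20% missing, retained
    "marker_B": [200, None, None, 204, 200],  # 40% missing, discarded
}

retained = filter_markers(toy_genotypes)
print(sorted(retained))             # ['marker_A']
print(alleles_per_locus(retained))  # {'marker_A': 3}
```

A marker with exactly 20% missing calls is retained here, following the wording "more than 20%".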
In the PCoA of the RD estimates of the breeding lines, the white and yellow groups were clearly separated into two clusters, with relatively few overlapping genotypes (Figure 2). With regard to the two localities where the breeding lines originated, the white lines could be separated into three clearly distinguishable groups, with the lines from Potchefstroom grouping in two separate clusters.
The yellow breeding lines were much more heterogeneous between the two localities, with only a few lines falling into definite clusters. The lines originating from CIMMYT, Mexico, clustered together with the Cedara material for both white and yellow lines. These lines were used in the Cedara breeding program over a long period of time, so this result was expected.
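The PCoA underlying these plots starts from a distance matrix, double-centres it and extracts eigenvectors. The sketch below illustrates the idea on a toy matrix using Gower's double-centring and a simple power iteration for the first axis only. It is not the NTSYS implementation; in particular, squaring the distances before centring and the use of power iteration are assumptions made for illustration, and a full analysis would use a complete eigendecomposition.

```python
from math import sqrt


def double_center(dist):
    """Gower double-centring of a symmetric distance matrix.

    Returns B with B[i][j] = -0.5 * d[i][j]**2, centred so that every row
    and column sums to zero.
    """
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def leading_axis(b, iterations=500):
    """First principal coordinate by power iteration.

    Assumes the eigenvalue of largest magnitude is positive, which holds
    for Euclidean distances; returns (eigenvalue, coordinates).
    """
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # arbitrary starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, [x * sqrt(eigenvalue) for x in v]


# Toy distances among three entries that lie on a line at positions 0, 1 and 3.
toy_dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]

eigenvalue, coords = leading_axis(double_center(toy_dist))
print(eigenvalue, coords)
```

Because the toy distances are exactly one-dimensional, the first axis alone reproduces them: the differences between the returned coordinates equal the original distances. The sign of an axis is arbitrary.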
The white lines were compared to some anchor lines with known heterotic grouping (Figures 3 and 4). The lines with known pedigrees related to K64 clustered together (r = 0.91), with only one line showing a larger RD from the rest of the group. A large number of white lines clustered with the Lancaster line Mo17. The rest of the groups could be separated into distinguishable clusters in the UPGMA-derived dendrogram (Figure 4). PCoA and cluster analysis of the yellow breeding lines revealed a small group of lines clustering with Mo17 (Figures 5 and 6) (r = 0.72), with the rest of the lines being of mixed origin. The cluster analysis (Figure 6) identified several related groups of lines, indicating close genetic relationships. These results show the extent to which these lines were interbred in the breeding program over several decades. One hundred and ninety three (193) hybrids were included in the study to explore the diversity achieved, and the genotypes were compared to locally cultivated maize cultivars produced by private companies. Five distinct groups of ARC hybrids could be identified. No overlap was seen between the ARC hybrids and those from the private companies tested (Figure 7). Cluster analysis (not shown) revealed several identical hybrids among the entries (r = 0.88).
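The UPGMA clustering and the cophenetic correlation used to judge how well a dendrogram represents the original distances can be illustrated with a short sketch. The code below is a plain implementation for small matrices, not the NTSYS routine, and the distance values are invented. Whether the r values quoted above are cophenetic correlations is not stated explicitly in the text, so that reading is an assumption.

```python
from math import sqrt


def upgma_cophenetic(dist):
    """Run UPGMA on a symmetric distance matrix and return the cophenetic matrix.

    The cophenetic distance between two entries is the average-linkage
    distance at which their clusters are first merged.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), height = min(d.items(), key=lambda item: item[1])
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        # Size-weighted average keeps the distance equal to the mean over all member pairs.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = merged
        next_id += 1
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    n = len(dist)
    x = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    y = [coph[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


# Invented distances among four entries forming two tight pairs.
toy_dist = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]

coph = upgma_cophenetic(toy_dist)
print(round(cophenetic_correlation(toy_dist, coph), 3))
```

In this toy case the two pairs merge at 0.2 and 0.3 and join each other at 0.65, the mean of the four between-pair distances. A cophenetic correlation close to 1 indicates that the dendrogram distorts the original distances only slightly.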

DISCUSSION
Maize breeding faces the never-ending challenge of developing new inbred lines and cultivars with improved yield, as well as tolerance to a diverse spectrum of biotic and abiotic stress conditions, such as diseases, pests, low soil fertility and drought. New cultivars must also conform to changing industry requirements, such as grain quality for milling and ethanol production. Climatic changes such as global warming pose new challenges to breeders in developing cultivars that will perform well under changed conditions. Although seed companies produce their own hybrids, they remain dependent on foundation-seed suppliers for inbred lines and sometimes even for locally adapted hybrids.
The ARC-GCI has a large germplasm collection of inbred lines, containing both white and yellow lines, at its disposal. These lines needed to be characterized at the molecular level to determine close relationships or genetic distances, in order to narrow down the possibilities for utilizing this material in testcrosses. The characterization also served the purpose of identifying duplicate lines, as well as genetic drift within well-known breeding lines.
During the past decade, the technology used in fingerprinting field crops developed from restriction fragment length polymorphism (RFLP) to amplified fragment length polymorphism (AFLP), SSR and SNP markers. The RFLP technique was laborious and time consuming, and only single data points could be analyzed per reaction. The AFLP technique was less time consuming and labour intensive and rendered up to 100 data points per reaction, but it was difficult to reproduce results over time and between different laboratories. The SSR technique is a simple polymerase chain reaction (PCR) technique, which lends itself to high sample throughput when multiplex reactions are used with an automatic genetic analyzer such as the ABI 3130xl. The technique is also easy to reproduce among different laboratories.
Although the SNP technique can analyze thousands of data points per run, the choice of technique depends on the application for which the analysis is needed. The SSR technique can be utilized to give low coverage of the whole maize genome, giving an overall indication of the genetic material present in a specific population. Knowledge of the relationships among lines can help identify a set of specific inbreds that have maximal diversity for the analysis of the effects of genetic background (Liu et al., 2003), to be used further in experimentation and association analysis of specific traits.
A similar African study was conducted by Legesse et al. (2007) to determine the diversity present in an African breeding population in Ethiopia and Zimbabwe. They found that they could distinguish between their inbred lines with as few as 27 SSR loci, which placed their lines in groups corresponding to adaptation to different altitudes and consistent with known pedigree information. In our local line collection, the breeding program has used lines from different countries and sources, and the end result is a population totally different from the original due to interbreeding between different heterotic groups. A new grouping of the breeding lines therefore became of vital importance for improvement of the crop.
SSR markers were utilized in this study to genotype the germplasm collection of the ARC-GCI from two different localities and were used at a density of eight markers per chromosome. The 80 markers utilized amplified a total of 1874 alleles, giving an average of 23.4 alleles per locus, which is higher than observed by Van Inghelandt et al. (2010). Results were used to determine the genetic relationships among the inbred lines and hybrids. Principal coordinates analysis clearly separated white and yellow lines, as expected from the pedigree data that were available from the breeding programs. These groups were further analysed separately. In the analysis of white breeding lines based on Rogers distance estimates, the lines were clearly separated into two different groups by PCoA. A large group of lines grouped together with Mo17, a Lancaster line, in accordance with pedigree data. Not all the white lines could be placed in specific heterotic groups, and some were of mixed origin. These lines were developed through crosses between the different groups in the breeding program and it is thus not possible to group them with currently known heterotic groups. The cluster analysis gave a more detailed grouping of lines (Figure 4): the K-group was derived from K64 crosses. K64 is a Kansas inbred known for fairly good drought tolerance.
The CB group, derived from inbreeding US Corn Belt hybrids, clustered with Mo17, a Lancaster line. A third group was identified as a mixed group containing combinations of Corn Belt material with Sahara or I-group lines, as well as lines with maize streak virus resistance. The M37W group was derived from the Australian inbred 21A. Lines originating from CIMMYT also clustered in this group as a sub-group.
The I137TN group is the major South African group known for its superior combining ability for grain yield, especially in combination with US Corn Belt material. The group originated from the locally developed I137TN inbred. I137TN was selfed out of a yellow endosperm local variety cross between the two varieties Teko Yellow x.

Figure 1. Maize genomic map indicating SSR markers used in the study (based on IBM2 2008 distances).

Figure 2. Principal coordinates analysis of white and yellow breeding lines from the two localities.

Figure 3. Principal coordinates analysis of white breeding lines compared with anchor lines of known heterotic groups.

Figure 4. Associations among white inbred lines revealed by average linkage (UPGMA) cluster analysis based on Rogers distance.

Figure 5. Principal coordinates analysis of yellow breeding lines compared with anchor lines of known heterotic groups.

Figure 6. Associations among yellow inbred lines revealed by average linkage (UPGMA) cluster analysis based on Rogers distance.

Figure 7. Principal coordinates analysis of hybrids tested.

Table 2. Average and range of number of alleles per locus for 1043 maize genotypes.