The genetic diversity and population structure of common bean (Phaseolus vulgaris L.) germplasm in Uganda

Knowledge of the genetic variability of common bean (Phaseolus vulgaris L.) germplasm is important for implementing measures for its utilization and conservation. The objective of this study was to characterize common bean germplasm in Uganda with polymorphic molecular markers for use in hybridization and variety development. Genomic DNA was extracted from pot-grown plants at the first trifoliate leaf stage using the modified cetyltrimethylammonium bromide (CTAB) method. The gene pool membership (Andean vs. Mesoamerican) of each accession was established with the phaseolin marker. Simple sequence repeat (SSR) alleles were separated by capillary electrophoresis, which provided further information on the organization of genetic diversity. Andean and Mesoamerican genotypes were present at similar frequencies (51 and 49%, respectively). All SSR markers tested were polymorphic, with a mean polymorphism information content (PIC) of 0.8. Model-based cluster analysis of SSR diversity in the STRUCTURE software identified three subpopulations (K3.1, K3.2 and K3.3), genetically differentiated with moderate Wright's fixation index (FST) values of 0.14, 0.12 and 0.09, respectively, and many cases of admixture. The STRUCTURE result was confirmed by principal coordinate analysis (PCoA), which also clustered the beans into three groups. Most Andean genotypes were included in K3.1, and Mesoamerican genotypes belonged to the K3.2 and K3.3 subgroups. This study sets the stage for further analyses of agronomic traits such as yield and resistance to biotic and abiotic stresses, and highlights the need for germplasm conservation.


INTRODUCTION
The domesticated Phaseolus vulgaris L. (2n = 2x = 22) consists of two major gene pools, one descending from wild beans ranging from northern Mexico to Colombia (Mesoamerican gene pool) and the other from wild beans distributed from southern Peru to northwestern Argentina (Andean gene pool) (Freyre et al., 1996). The common bean is widely distributed across many regions, including Mesoamerica, South America, Europe and Africa. It presumably reached Uganda in the 18th century via the East African coast (Gepts and Bliss, 1988). Currently, the gene pools of the domesticated species are organized into four Mesoamerican and three Andean eco-geographical races on morphological, agronomic and ecological grounds (Singh et al., 1991a; Beebe et al., 2001). In the Andean gene pool the races are Nueva Granada, Peru and Chile, while in the Mesoamerican gene pool they are Durango, Guatemala, Jalisco and Mesoamerica (Blair et al., 2006; Diaz and Blair, 2006). The Andean and Mesoamerican gene pools can be distinguished with the phaseolin marker (Kami and Gepts, 1994; Burle, 2008).
Microsatellite markers (SSRs) have also been used in common bean to construct genetic reference maps (Yu et al., 2000; Blair et al., 2003) and to evaluate intraspecific diversity (Gaitán-Solís et al., 2002). These markers were also employed to study the genetic structure of Andean and Mesoamerican races of common bean (Blair et al., 2006; Diaz and Blair, 2006; Kwak and Gepts, 2009) and the population structure of 192 landraces from Ethiopia and Kenya (Asfaw et al., 2009). The use of SSRs to characterize common bean germplasm in this study is justified by their high informativeness, co-dominance and wide distribution in the genome.
The National Crops Resources Research Institute (NaCRRI) at Namulonge holds 320 morphologically distinct accessions of common bean, but little information has been documented on their genetic diversity and genetic potential. This hinders the utilization of these materials as parental sources in the different breeding programs and slows the design of appropriate conservation strategies. The continued adoption of elite bean varieties by farmers has replaced some landraces that are no longer regularly planted in farmers' fields (Sekabembe, 2010), leading to genetic erosion. This erosion reduces allele and genotype frequencies in the breeders' gene pool, narrowing variation and hence restricting the adapted genetic diversity available for future breeding. For example, a previous survey in south-western Uganda, a region known for large-seeded bean production (CIAT, 2005), found that 40% of farmers had stopped growing medium- and large-seeded landraces in favor of newly introduced small-seeded varieties that are tolerant to root rot disease.
Pests and diseases also cause the loss of bean cultivars in farmers' fields (Mukankusi, 2008). Genetic diversity is necessary for the rapid genetic improvement of crop species (Trethowan and Kazi, 2008), and diversity studies are a major step towards enhancing the genetic potential of bean germplasm. Thus, the objective of this study was to genotype common bean in Uganda with microsatellite markers and determine its level of population structure, for use in present and future bean breeding schemes and conservation.

Plant material
The study included 100 accessions (Appendix 1) representative of the phenotypic diversity of common bean in Uganda. The selection of the bean sample relied specifically on two traits: (i) plant type/growth habit and (ii) 100-seed weight of each accession. These traits have been mapped as part of the crop's domestication syndrome (Koinange et al., 1996). The sample included the bean varieties released in Uganda and those frequently used in breeding. The place of collection/origin was also considered in order to include representatives from the different agroecological zones of Uganda. Three seeds per accession were planted in plastic pots and grown to the first trifoliolate leaf stage in the greenhouse facilities of the Department of Plant Sciences, University of California Davis (UC Davis), U.S.A.

DNA extraction, gene pool identification and genotyping
Genomic DNA was extracted from each bean accession following the procedures described in Doyle and Doyle (1990). The diluted DNA samples (30 to 40 ng/μl) were subjected to polymerase chain reaction (PCR) amplification. The phaseolin marker and 22 fluorescently labeled microsatellite markers (labeled according to Schuelke, 2000) were used in succession to determine gene pools and genotype the accessions, as follows. For gene pool identification, PCRs were set up to specifically amplify the region surrounding the 15-bp tandem direct repeat of the phaseolin gene family, as in Kami et al. (1995) and Burle et al. (2010). The PCR was conducted in an MJ thermocycler in a total reaction volume of 25.2 μl containing 18 μl double-distilled water, 2.5 μl 10X ThermoPol reaction buffer, 2.5 μl 250 μM dNTPs, 0.2 μl (1 U) Taq DNA polymerase, 1 μl 10 μM phaseolin primer and 1 μl 50 ng/μl DNA. The PCR products, mixed with loading dye, were loaded on a 10% polyacrylamide gel in a vertical electrophoresis system with 0.5X TBE buffer and run at 130 V for 2 h. Gene pool control genotypes (BAT93 for Mesoamerican and Jalo EEP558 for Andean) were always loaded alongside the samples. Gels were stained with 10 μl of 10 mg/ml ethidium bromide solution and photographed for future reference. The genotypes were assigned to either the Andean or the Mesoamerican gene pool based on the similarity of their band patterns to those of the gene pool controls (Plate 1).
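The phaseolin PCR recipe above adds up to the stated 25.2 μl. As a hedged illustration only, the short Python sketch below checks that total and scales the shared components into a master mix for a batch of reactions. The component keys, the helper names and the 10% pipetting excess are assumptions for illustration; they are not part of the published protocol.

```python
# Per-reaction phaseolin PCR recipe, transcribed from the text (volumes in ul).
REACTION_UL = {
    "water": 18.0,        # double-distilled water
    "buffer_10x": 2.5,    # 10X ThermoPol reaction buffer
    "dntps": 2.5,         # 250 uM dNTPs
    "taq": 0.2,           # 1 U Taq DNA polymerase
    "primer": 1.0,        # 10 uM phaseolin primer
    "template_dna": 1.0,  # 50 ng/ul genomic DNA
}


def reaction_volume(recipe):
    """Total volume of a single reaction, rounded to 0.01 ul."""
    return round(sum(recipe.values()), 2)


def master_mix(recipe, n_reactions, excess=0.10, per_well=("template_dna",)):
    """Scale the shared components for n_reactions plus a pipetting excess.

    Components listed in per_well (here the template DNA) are pipetted into
    each well separately, so they are left out of the master mix.
    The 10% excess is an illustrative assumption, not a value from the study.
    """
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2)
            for name, vol in recipe.items() if name not in per_well}


if __name__ == "__main__":
    print(reaction_volume(REACTION_UL))  # 25.2
    # Hypothetical batch: 100 accessions plus the two gene pool controls.
    for name, vol in master_mix(REACTION_UL, n_reactions=102).items():
        print(f"{name}: {vol} ul")
```

Keeping the template DNA out of the mix reflects the usual practice of adding each accession's DNA to its own well; the excess factor only guards against pipetting loss.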
Prior to allele visualization by automated capillary electrophoresis, eight PCR products were randomly picked for each marker and run on 2% agarose gels to check for amplification.

Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution 4.0 International License.

Capillary electrophoresis
For capillary electrophoresis, PCR products of markers labeled with different dyes were co-loaded (multiplexed) in panels with non-overlapping expected allele sizes (http://phaseolusgenes.bioinformatics.ucdavis.edu/). Each multiplex panel contained two to three markers whose expected sizes differed by at least 30 bp, to prevent stutter and background peaks from the different dyes interfering during allele binning. A 10 μl mix consisting of 1 μl of each diluted PCR product, 5.7 μl Hi-Di formamide and 0.3 μl GeneScan-500 LIZ size standard (Applied Biosystems Inc., USA) was prepared in optical 96-well MicroAmp plates (Applied Biosystems Inc., USA) for electrophoresis. Fragments were separated on an ABI PRISM 3730 genetic analyzer (Applied Biosystems, USA) at the Veterinary Genetics Laboratory, UC Davis.
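The panel rule described above (two to three markers per panel, expected sizes at least 30 bp apart) can be sketched as a greedy first-fit assignment. This is a minimal illustration under stated assumptions, not the procedure the authors used: the helper names are hypothetical, dye assignment is ignored, and the marker names and size ranges in the example are invented for illustration.

```python
from typing import List, NamedTuple


class Marker(NamedTuple):
    name: str
    min_bp: int  # smallest expected allele size
    max_bp: int  # largest expected allele size


def gap_bp(a: Marker, b: Marker) -> int:
    """Distance in bp between two expected size ranges (negative if they overlap)."""
    return max(a.min_bp, b.min_bp) - min(a.max_bp, b.max_bp)


def build_panels(markers: List[Marker], min_gap: int = 30,
                 max_per_panel: int = 3) -> List[List[Marker]]:
    """Greedy first-fit grouping of markers into multiplex panels.

    A marker joins the first panel that still has room and whose members'
    size ranges all lie at least min_gap bp away from its own; otherwise it
    starts a new panel.
    """
    panels: List[List[Marker]] = []
    # Place markers with the widest size ranges first; they are hardest to fit.
    for m in sorted(markers, key=lambda mk: mk.max_bp - mk.min_bp, reverse=True):
        for panel in panels:
            if len(panel) < max_per_panel and all(gap_bp(m, o) >= min_gap for o in panel):
                panel.append(m)
                break
        else:
            panels.append([m])
    return panels


if __name__ == "__main__":
    # Hypothetical markers and size ranges, for illustration only.
    demo = [Marker("M1", 100, 140), Marker("M2", 180, 220), Marker("M3", 260, 300),
            Marker("M4", 110, 150), Marker("M5", 190, 230)]
    for i, panel in enumerate(build_panels(demo), start=1):
        print(f"panel {i}:", [m.name for m in panel])
```

First-fit is simple rather than optimal; with 22 markers it may need more panels than a hand-curated design, but it guarantees the 30 bp spacing within every panel.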

Allele calling
Allele fragments were analyzed in Strand 2.2.30 software (Mellissa et al., 2000) for peak detection and fragment size matching to the reference data (Figure 1). Allele sizes were calculated automatically relative to the internal lane size standard GeneScan-500 LIZ, which covers 35 to 500 base pairs. The binned allele sizes were exported to Microsoft Excel 2007 for subsequent analyses.
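Sizing against an internal standard and binning can be illustrated with a simplified sketch. Strand's own sizing algorithm is not described here, so the sketch assumes plain linear interpolation between the two flanking size-standard peaks and a fixed 1 bp tolerance for grouping sizes into bins. The function names and the ladder scan positions are hypothetical; the 164/182 bp sizes echo the BMd37 example in Figure 1.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def size_peak(scan: float, ladder: Sequence[Tuple[float, float]]) -> float:
    """Size a peak (bp) by linear interpolation between the two flanking
    internal-standard peaks.

    ladder: (scan_position, size_bp) pairs for the size standard, sorted by scan.
    """
    scans = [s for s, _ in ladder]
    if not scans[0] <= scan <= scans[-1]:
        raise ValueError("peak lies outside the size-standard range")
    i = max(1, bisect_left(scans, scan))  # index of the right-hand flanking peak
    (s0, b0), (s1, b1) = ladder[i - 1], ladder[i]
    return b0 + (b1 - b0) * (scan - s0) / (s1 - s0)


def bin_alleles(sizes: List[float], tolerance: float = 1.0) -> List[int]:
    """Group sized fragments into allele bins and return each bin's rounded mean.

    A new bin starts whenever the next sorted size is more than `tolerance`
    bp from the previous one.
    """
    bins: List[List[float]] = []
    for s in sorted(sizes):
        if bins and s - bins[-1][-1] <= tolerance:
            bins[-1].append(s)
        else:
            bins.append([s])
    return [round(sum(b) / len(b)) for b in bins]


if __name__ == "__main__":
    # Hypothetical scan positions for three standard fragments.
    ladder = [(1000, 100.0), (1500, 150.0), (2000, 200.0)]
    print(size_peak(1640, ladder), size_peak(1820, ladder))  # 164.0 182.0
    print(bin_alleles([163.8, 164.3, 181.9, 182.2, 164.1]))   # [164, 182]
```

Because bins are chained from neighbor to neighbor, slowly drifting sizes can merge into one bin; for dinucleotide SSRs a tolerance below 1 bp, or bins anchored on the repeat length, would be safer.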

Data analysis
The following genetic parameters were calculated in PowerMarker version 3.25 (Liu and Muse, 2005): number of alleles and allele frequency, gene diversity, heterozygosity and polymorphism information content (PIC). Population structure among the bean accessions was assessed by analyzing the SSR allele sizes with the model-based program STRUCTURE version 2.3.3 (Pritchard, 2000). A principal coordinate analysis (PCoA) was performed in GenAlEx v6.4 (Peakall and Smouse, 2006), and an unweighted neighbor-joining (NJ) tree was built in the Darwin program from chord distances between accessions computed in PowerMarker, to display genetic relationships among accessions and to test the results of STRUCTURE.
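For a single locus with allele frequencies p_i, gene diversity is 1 − Σp_i², and the standard PIC formula subtracts Σ_{i<j} 2p_i²p_j² from it. The Python sketch below computes these parameters, plus allele number, major allele frequency and observed heterozygosity, from diploid genotype calls. It is a minimal illustration, not PowerMarker's implementation: it uses the uncorrected gene diversity (PowerMarker may apply a sample-size correction, so values can differ slightly), and the function name and genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Diversity statistics for one SSR locus.

    genotypes: list of (allele_a, allele_b) fragment sizes, one per accession;
    None marks a missing genotype and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    gene_diversity = 1 - sum_sq  # expected heterozygosity, no small-sample correction
    # PIC = gene diversity minus the sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = gene_diversity - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return {
        "n_alleles": len(counts),
        "major_allele_freq": max(freqs),
        "gene_diversity": gene_diversity,
        "heterozygosity": sum(a != b for a, b in called) / len(called),  # observed
        "pic": pic,
    }


if __name__ == "__main__":
    # Hypothetical genotypes at one locus (allele sizes in bp); None = missing.
    locus = [(164, 182), (164, 164), (182, 182), (164, 200), None]
    for name, value in locus_stats(locus).items():
        print(f"{name}: {value:.4f}")
```

Averaging these per-locus values over all loci gives summary means of the kind reported for the 22 SSRs.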

Defining the population structure of common bean in Uganda
The STRUCTURE program (Pritchard and Stephens, 2000) is a widely used clustering program for detecting population genetic structure given a predefined number of populations, K, each characterized by a set of allele frequencies at each locus. Ten independent runs were performed for each value of K from 2 to 6, each with a burn-in of 5,000 iterations followed by 50,000 iterations for analysis. The most likely number of populations (K) was determined with the method of Evanno et al. (2005) in STRUCTURE Harvester (Earl and vonHoldt, 2012; http://taylor0.biology.ucla.edu/struct_harvest/), which was also used to visualize the output. Microsoft Excel was used to convert the estimated membership coefficients and generate the bar plot, in which, following Rosenberg et al. (2002), each accession is represented by a line segment of fixed length partitioned into K colored components corresponding to its membership coefficients.
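The Evanno et al. (2005) criterion used above picks the K with the largest ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the log probability of the data from replicate runs. The sketch below computes the second-order difference on the mean L(K) across runs and divides by the standard deviation of the runs at that K. It is a hedged illustration: the function name and the ln P(D) values are hypothetical, not this study's results.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K of Evanno et al. (2005) from replicate STRUCTURE runs.

    lnp_by_k: dict mapping each K (consecutive integers) to the list of
    ln P(D) values from its replicate runs (at least two runs per K).
    Delta K is defined only for K values with a neighbour on both sides.
    """
    ks = sorted(lnp_by_k)
    m = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k_prev, k, k_next in zip(ks, ks[1:], ks[2:]):
        second_diff = abs(m[k_next] - 2 * m[k] + m[k_prev])  # |L''(K)|
        delta[k] = second_diff / stdev(lnp_by_k[k])          # scaled by sd of L(K)
    return delta


if __name__ == "__main__":
    # Hypothetical ln P(D) values (three runs per K), not the study's data.
    runs = {
        2: [-8000, -8010, -7990],
        3: [-7650, -7660, -7640],
        4: [-7700, -7720, -7680],
        5: [-7750, -7790, -7710],
    }
    dk = evanno_delta_k(runs)
    print(dk)
    print(max(dk, key=dk.get))  # 3
```

ΔK cannot be computed for the smallest and largest K tested, which is why runs from K = 2 to 6 can only evaluate K = 3 to 5. Identical runs at some K (zero standard deviation) would make ΔK undefined.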

K = 3 analysis
The STRUCTURE runs gave the highest mean log probability of the data at K = 3 (L(K) = -7658.3), which led to the assignment of the bean accessions to three subpopulations: K3.1, K3.2 and K3.3. Accessions with a membership coefficient of 0.5 (50% ancestry) or above in a subpopulation were assigned to that group at K = 3. The membership coefficients estimated by STRUCTURE were exported to MS Excel (2007) and displayed as a bar plot of the membership of each accession in the three subpopulations.
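The 0.5 membership rule above can be expressed as a short function. This is a minimal sketch: the function name is hypothetical, and the accession names and membership coefficients in the example are invented for illustration. Accessions that reach 0.5 in no subpopulation are labeled "admixed" here, which is one reasonable reading of the rule rather than a label taken from the study.

```python
def assign_clusters(q_matrix, labels=("K3.1", "K3.2", "K3.3"), threshold=0.5):
    """Assign accessions to subpopulations from STRUCTURE membership coefficients.

    q_matrix: dict mapping accession name to its membership coefficients
    (one per cluster, summing to about 1). An accession joins the cluster
    holding at least `threshold` of its ancestry; if no cluster reaches the
    threshold it is reported as "admixed".
    """
    groups = {}
    for accession, q in q_matrix.items():
        best = max(range(len(q)), key=lambda i: q[i])  # cluster with most ancestry
        groups[accession] = labels[best] if q[best] >= threshold else "admixed"
    return groups


if __name__ == "__main__":
    # Hypothetical membership coefficients, for illustration only.
    q = {
        "acc_A": (0.26, 0.74, 0.00),
        "acc_B": (0.40, 0.35, 0.25),
        "acc_C": (0.90, 0.05, 0.05),
    }
    print(assign_clusters(q))
    # {'acc_A': 'K3.2', 'acc_B': 'admixed', 'acc_C': 'K3.1'}
```

With a threshold of 0.5, at most one cluster can strictly exceed it, so the assignment is unambiguous except for an exact 0.5/0.5 split, which this sketch gives to the first cluster.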

Allelic diversity of the common bean germplasm
The key parameters used to describe genetic diversity among beans from Uganda are presented in Table 2. There was a high level of polymorphism, with a mean of 19 alleles per locus and a range of 7 to 45 alleles in the germplasm. The frequency of the major allele ranged from 0.1 to 0.51, with a mean of 0.31. In total, the 22 markers detected 423 alleles, ranging from 114 to 267 bp in size, and PIC values ranged from 0.64 for BM159 to 0.95 for BMd37, with a mean of 0.80. The overall mean heterozygosity was 0.45, with the highest value (0.96) for marker BM197.

Genetic diversity of Andean and Mesoamerican gene pools in Uganda
The common bean germplasm in Uganda comprised 51% Andean and 49% Mesoamerican accessions based on the phaseolin analysis (Appendix 1). The mean number of alleles per locus was higher in the Mesoamerican group (16 alleles) than in the Andean group (13 alleles). The mean major allele frequency was 0.45 in the Mesoamerican gene pool and 0.33 in the Andean gene pool (Table 3). Loci BMd37 and BM183 had the highest and lowest allele numbers, respectively, among Ugandan common beans. Mean gene diversity was higher in the Mesoamerican (0.79) than in the Andean group (0.67). Mean heterozygosity was comparable between the Andean and Mesoamerican groups (0.46 and 0.44, respectively). Polymorphism was highest in the Mesoamerican gene pool, with a mean PIC value of 0.78 compared with 0.66 in the Andean. The most polymorphic locus in both gene pools was BMd37, with PIC values of 0.94 among Andean and 0.96 among Mesoamerican genotypes. Locus BM189 also showed high polymorphism in the Andean materials. Markers BM183 and BM159 had the highest major allele frequencies (0.86 and 0.73, respectively) in both gene pools, with the Mesoamerican group showing more genetic diversity.

Population structure in common bean germplasm in Uganda
The STRUCTURE analysis revealed population membership, structure and admixture at K = 2 and K = 3 (Figure 3). The lowest mean log probability of the data, L(K) = -8382.0, was recorded for K = 6 and the highest, L(K) = -7658.3, for K = 3. The Evanno test showed a clear maximum of Delta K at K = 3 in the plots of L(K) and Delta K (Figure 2), confirming the likely assignment of the bean germplasm to three subgroups. Mean genetic diversity statistics (Table 4) for the three subgroups formed at K = 3 were calculated in PowerMarker v3.25, as in Hunt et al. (2011). STRUCTURE also estimated the genetic differentiation (Wright's fixation index, FST; Wright, 1978) among the bean subpopulations. The three subpopulations showed moderate differentiation, with FST values ranging from 0.05 to 0.15.
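The idea behind the fixation index can be illustrated with the classical single-locus estimator FST ≈ (H_T − H_S)/H_T, computed here as Nei's G_ST, the multi-allele analogue of Wright's FST. This is a conceptual sketch only: STRUCTURE estimates its per-cluster F values within its own model, so this function (a hypothetical helper) will not reproduce the values reported here, and the allele frequencies in the tests are invented.

```python
def gst_locus(subpop_freqs, weights=None):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    subpop_freqs: list of {allele: frequency} dicts, one per subpopulation.
    weights: relative subpopulation sizes; equal weights if omitted.
    """
    n = len(subpop_freqs)
    w = [1 / n] * n if weights is None else [x / sum(weights) for x in weights]
    # H_S: mean within-subpopulation gene diversity.
    h_s = sum(wi * (1 - sum(p * p for p in f.values()))
              for wi, f in zip(w, subpop_freqs))
    # H_T: gene diversity of the pooled (weighted mean) allele frequencies.
    alleles = set().union(*subpop_freqs)
    pooled = {a: sum(wi * f.get(a, 0.0) for wi, f in zip(w, subpop_freqs))
              for a in alleles}
    h_t = 1 - sum(p * p for p in pooled.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical allele frequencies in two subpopulations at one locus.
    print(gst_locus([{164: 0.8, 182: 0.2}, {164: 0.4, 182: 0.6}]))
```

Values near 0 indicate that subpopulations share allele frequencies, while values near 1 indicate fixation of different alleles, which is why figures of roughly 0.05 to 0.15 are read as moderate differentiation.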

DISCUSSION
The objective of the study was to characterize common beans in Uganda with the phaseolin marker to determine their frequencies and membership in the Andean and Mesoamerican gene pools. A subsequent analysis with fluorescently labeled SSR markers assessed their genetic diversity and structure. The two gene pools of domesticated common bean are present in Uganda at similar frequencies. Previous reports show striking differences in the proportions of Andean and Mesoamerican beans in Africa (Bellucci et al., [...] (Mkandawire et al., 2004). In Kenya, Andean types dominate, while in Ethiopia Mesoamerican types are more frequent (Asfaw et al., 2009).
The Great Lakes region of Central Africa (e.g., Rwanda and the Democratic Republic of Congo) is dominated by the Mesoamerican gene pool (Blair et al., 2010). This study tested the suitability of the markers for multiplexing, their informativeness and their efficiency in revealing the levels of genetic diversity and structure of the different genetic groups. Genome-wide data on genetic variability were obtained quickly and accurately with nine panels of multiplexed SSR markers that are well distributed throughout the genome and were scored semi-automatically. Ramachandran et al. (2003) [...] Gomez et al. (2004) found 5.7 alleles per locus in 108 small-seeded individual beans from nine different sites in Nicaragua. The marked differences in allele numbers between this study and other studies of common bean can be attributed to differences in the number of polymorphic markers used, sample sizes, collection sites and geographical coverage. The common bean samples in this study included beans with different pedigrees, such as CIAT breeding lines currently used as sources of disease resistance, improved varieties such as NABE4, NABE 12C, NABE 13, NABE 14, K20 and K132, and, predominantly, landraces collected from farmers' fields. The high number of alleles detected in this study reflects the 22 SSRs, which were chosen deliberately for their high PIC values and high numbers of alleles.
Population structure refers to the subdivision of a population into smaller groups, which produces deviations from Hardy-Weinberg proportions (Falush et al., 2003a) and can arise through inbreeding, selection or migration. The modeled population structure of the beans in this collection is shown for K = 3 and K = 2 in Figure 3. Given the existence of two major gene pools in common bean, a K = 2 structure could reasonably be predicted (Gepts and Bliss, 1985; Singh et al., 1991a). The K = 2 level consisted of subpopulations K2. [...] Namayanja et al., 2006), and G2333 (Anthracnose; Nkalubo et al., 2009). Group K3.3 (green) consisted predominantly of Andean landraces (94%). The third, smaller subgroup, K3.2 (maroon), consisted of 67% breeding lines of Mesoamerican origin, such as VAX3, VAX4 and BAT93. Some accessions showed shared membership among K3.1, K3.2 and K3.3, reflecting the effect of hybridization or gene flow. For example, accessions U33, MCM5001, MEXICO54, U10-5, R5410, G2333, U625, U318 and U3-5 in K3.1 were hybrids. In K3.3, accessions U268, U382, NABE12C, U273, U371, U10-7 and U124 also had contributions from all three subgroups identified at K = 3. Other accessions derived their ancestry from two subgroups, for example K3.2 and K3.3 in accessions U616, U6-3 and BAT93 (the Mesoamerican gene pool control genotype), as shown in Figure 3. To infer the gene pool of the different genotypes, subpopulation members were related to the known gene pool controls JaloEEP558-andeanCTRL (Andean) and BAT93-mesoCTRL (Mesoamerican). The ancestry of the Mesoamerican control genotype (BAT93-mesoCTRL) was shared between K3.1 and K3.2 (26% green and 74% maroon). Burle et al. (2010) used the same accession as a control for the Mesoamerican gene pool and reported it as a breeding genotype derived from a four-way cross whose parents were all of Mesoamerican origin, which explains its shared membership between two subpopulations at K = 3.
The sharing of ancestry between genotypes is generally explained by recombination in parts of the genome following inter- and intra-gene pool crosses in breeding or natural hybridization. The PCoA (Figure 4) and NJ tree (Figure 5) also show extensive gene pool membership switching, with the SSR groupings not fully corresponding to the phaseolin gene pool assignments. Membership switching among presumed subpopulations of common bean was also observed in previous studies with allozymes and RAPD markers (Freyre et al., 1996) for the presumed ancestral groups of the Andean and Mesoamerican gene pools. According to Kwak and Gepts (2009), the lack of phaseolin polymorphism within domesticated gene pools prevents the detection of subtler genetic differentiation between closely related accessions at this single, albeit complex, locus. The same reason, together with admixture, can explain the lack of concordance between the phaseolin and model-based groupings of the bean germplasm in this study. The long cohabitation of the two cultivated gene pools possibly led to the introgression of alleles between cultivars, creating hybrids with shared phenotypic traits. In similar studies, hybrids were identified by their intermediate position between gene pools in NJ trees (Bellucci et al., 2014). The high levels of admixture within the subpopulations in the STRUCTURE, PCoA and NJ tree analyses indicate that the common bean germplasm in Uganda has considerable variation for use in breeding. Model-based analyses of population subdivision can be performed separately in the Andean and Mesoamerican gene pools, in addition to analysing the entire sample, to detect membership switching among accessions, since genetic differentiation is markedly reduced when the gene pools are analyzed separately (Kwak and Gepts, 2009).
The STRUCTURE analysis of Burle et al. (2010) found five groups in common bean landraces from Brazil. They used 67 microsatellite markers spread over the 11 linkage groups of the crop's genome and found Mesoamerican accessions to be four times more frequent than Andean ones. In other crops, for example African rice (Oryza glaberrima), Khady et al. (2011) identified three distinct populations in 74 rice varieties collected from Benin through population structure analysis. The higher diversity of the Mesoamerican gene pool (both maroon and green components in the STRUCTURE bar plot) compared with the Andean gene pool in this study is probably due to farmers' preference for small-seeded beans. In practice, some farmers' preference for Mesoamerican bean types (Blair et al., 2010) results in more small-seeded than large-seeded beans being planted. CIAT (2005) reported farmers' preference for smaller-seeded Mesoamerican genotypes to manage root rot disease in south-western Uganda, the leading common bean producing area in the country. These bean materials eventually find their way to other parts of the country through various routes and activities.
The PCoA plot showed that the Mesoamerican group was the most diverse and included many presumed hybrids. This observation suggests that much of the variation arose from gene introgression in breeding material and outcrossing in landraces in farmers' fields. In other studies, Blair et al. (2010) used PCoA to reveal diversity within and between gene pools in a larger collection of 365 common bean genotypes from Central Africa, and observed introgression between gene pools in 32 intermediate genotypes. Maciel et al. (2003), however, found no clear distinction between domesticated common bean samples from Brazil using AFLP markers and suggested admixture as the possible cause. The common bean germplasm in Uganda has a moderate level of population structure, with FST values ranging from 0.09 to 0.14 for the three clusters generated by the STRUCTURE analysis. The K3.3 group was the most differentiated (FST = 0.14), followed by K3.1 and K3.2 (FST = 0.12 and 0.09, respectively).
In other studies, Asfaw et al. (2009) found East African bean landraces and cultivars of Andean origin to be more differentiated (FST = 0.331) than those of the Mesoamerican gene pool (FST = 0.04), with a mean FST of 0.27 among the pairs of populations analyzed. They related genetic divergence in East African bean landraces to original differences in the germplasm introduced from the primary centre of origin, combined with spontaneous outcrossing in farmers' fields and further farmer selection for adaptation and production uses. Further subdivision of the two gene pools into eco-geographic races (Singh et al., 1991a) was not carried out in this study and is recommended for future work to facilitate the use of the germplasm in breeding.

Conclusions and recommendations
The fluorescently labeled SSR markers revealed genetic

Plate 1.
Figure 1. An electropherogram from Strand 2.2.30 software showing allele binning in sample 17:24 (H3). In this case, the most polymorphic marker in the study (BMd37) had two alleles (peaks), a heterozygous genotype of 164/182 bp, sized against the LIZ (orange) standard in the lower panel.

Figure 2. Plots of L(K) and Delta K against the number of subpopulations (K), generated according to Evanno et al. (2005), showing three subpopulations (K = 3) as the most likely structure of Ugandan common bean.

Figure 3. Hierarchical organization of genetic relatedness of 102 common bean accessions based on 22 SSR markers, from the STRUCTURE analysis described in Data analysis: K = 2 (above) and K = 3 (below).

Figure 4. Principal coordinates analysis (PCoA) of accessions based on the presence and absence of microsatellite alleles. The three subpopulations are represented by diamond symbols whose colours correspond to the three subgroups in STRUCTURE (Figure 3): K3.1 (blue), K3.2 (green) and K3.3 (maroon). The two black diamonds are the controls, genotypes JaloEEP558 and BAT93, for the Andean and Mesoamerican gene pools, respectively.

Figure 5. Unweighted neighbor-joining tree generated in the Darwin program from the chord distances between common bean accessions in Uganda, computed from microsatellite data in PowerMarker. Each branch is color-coded according to membership in the K = 3 groups identified by STRUCTURE (same colors as in Figure 3). The letters in brackets after accession names show their gene pool, (A) Andean or (M) Mesoamerican, as determined with the phaseolin marker (see methods and Plate 1). All accessions in the blue branch are Mesoamerican, 95% of the green branch is Andean, and 67% of the maroon branch consists of Mesoamerican accessions.

Table 1.
Linkage group of microsatellite markers (SSRs), dye used for fluorescent labeling, SSR allele sizes and targeted sequence repeat for genotyping common beans in Uganda.

Table 2.
Genetic diversity, observed heterozygosity (He) and number of alleles detected in 100 common bean genotypes.

Table 3.
Diversity parameters in Andean (A) and Mesoamerican (M) common beans in Uganda revealed with 22 polymorphic microsatellite markers.
Appendix 1. Gene pool and morphology of 100 common bean accessions from Uganda studied at 22 fluorescently labeled microsatellite loci. Accession name superscripts indicate breeding status: I = improved, L = landrace, C = CIAT line. Gene pool: A = Andean and M = Mesoamerican, determined as described in the methods, with the corresponding 100-seed weights. Gene pool controls are accessions No. 52 (Andean) and 102 (Mesoamerican). Plant types: I = determinate growth habit, II = indeterminate growth habit, III = indeterminate prostrate growth with some climbing ability and IV = indeterminate growth habit with strong climbing ability. Place of collection: region with district name in parentheses (C = Central, W = Western, WN = West Nile, E = Eastern and N = Northern) and CIAT (regional headquarters in Uganda). Seed coat colour: "mixed" denotes accessions that segregated for seeds of multiple colours.