Microsatellites revealed genetic diversity and population structure in Colombian avocado (Persea americana Mill.) germplasm collection and its natural populations

Avocado (Persea americana Mill.) is one of the most consumed fruits around the world. The species is differentiated into three botanical races: Mexican, Guatemalan and West Indian. A genetic characterization of a Colombian avocado germplasm collection (N = 105) preserved since 2006 and 92 avocado “criollo” trees sampled in Antioquia was made with 12 microsatellites. Colombian avocado exhibits higher genetic diversity (13.9 alleles/locus, Ho = 0.79-0.38, Na = 12.3-2.4, Ne = 5.3-1.8, I = 1.87-0.68) than avocados from other germplasm collections, as criollo trees are the product of free crossing (multiple hybridizations). Population structure was found within Antioquian criollo avocados (FST = 0.13, p < 0.0001) and in the germplasm collection (FST = 0.068, p < 0.0001). Divergence in Antioquia resulted from differences in elevation and climatic conditions. The Southwest, East and Altiplano-norte Antioquian sub-regions are genetically close and produce criollo avocados at high elevations, where grafting with Hass is likely to be successful. In Colombia, avocado genetic differences were enhanced by the Andean Chain of Mountains. STRUCTURE HARVESTER assignment revealed that germplasm avocados were distributed in K = 2 clusters. The first cluster was composed of samples collected from the south of Colombia (Valle del Cauca, Nariño), the second of samples from the north (Antioquia, Bolivar, Magdalena), and avocados from the rest of the departments were distributed in both. The results obtained are relevant for avocado certification because knowledge of genotypic and phenotypic variation is crucial for crop management and for grafting with avocado cultivars such as Hass, which is economically important for Colombian avocado exportation.

resources conservation and utilization of plant material (Odong et al., 2011). Conservation and use of plant genetic resources include material that is both reproductively and vegetatively propagated. This material consists of: (a) varieties that are currently used as new plant cultivars, (b) obsolete cultivars, (c) races and cultivars traditionally used by agriculturists, (d) wild relatives of plant species and (e) genetic lineages that include elite materials and improved cultivars of species as well as aneuploid plants (Frankel et al., 1995; Karp et al., 1997; Corona-Jácome et al., 2016; Guzmán et al., 2017). Estimation of the genetic structure of heterogeneous germplasms is essential for core collection sampling, as partitioning before sampling ensures that genetic, phenotypic and ecological spectra are entirely represented (Odong et al., 2011).
Avocado (Persea americana Mill.) (2n = 2x = 24) has great genetic variability, so far with limited leveraging, since cross fertilization between trees occurs freely in nature through open pollination enhanced by insects (Sánchez, 1999; Barrientos-Priego and López, 2007). Genetic characterization of avocado germplasm, as well as knowledge of its genetic diversity, allows researchers to pursue agronomic improvement through the development of new varieties and cultivars that are resistant to diseases and soil salinity, and of cultivars that are more profitable in terms of fruit production, fruit quality and precocity of fruit maturation (Guzmán et al., 2017; Sánchez, 1999). This species is a subtropical evergreen tree native to Central America and Mexico (Chanderbali et al., 2008; Chen et al., 2009; Ashworth and Clegg, 2003; Corona-Jácome et al., 2016; Guzmán et al., 2017). It belongs to the Lauraceae family and the Magnoliid clade of early-divergent angiosperms and is differentiated into three horticultural races (Chanderbali et al., 2008; Chen et al., 2009; Ashworth and Clegg, 2003; Gross-German and Viruel, 2013; Corona-Jácome et al., 2016) that are adapted to different climatic conditions and recognized as: Mexican [P. americana var. drymifolia (Schltdl. & Cham.) S.F. Blake], Guatemalan [P. americana var. guatemalensis L. Wms.], and West Indian [P. americana var. americana Mill.]. These races originated from highland Mexico, highland Guatemala, and lowland Mexico (Tierras bajas), respectively (Corona-Jácome et al., 2016) and are distinguishable on the basis of morphological, physiological and horticultural traits (Bergh, 1992; Alcaraz and Hormaza, 2007; Gross-German and Viruel, 2013). The Mexican and Guatemalan races are adapted to cooler climates while the West Indian race is adapted to tropical conditions (Gross-German and Viruel, 2013). In Colombia, most commercial avocados are the product of interracial hybrids developed from seedlings that are commonly recognized as "criollo" (Bernal and Bernal et al., 2014; Cañas et al., 2015). In this country, avocado production is mainly based on grafting selected criollo trees with commercial cultivars, particularly the variety Hass (Cañas et al., 2015; Rodríguez et al., 2009). However, criollo avocados have not been certified yet, and Colombian avocado exportation requires genetic and phenotypic characterizations as well as avocado fruit production free of pathogens (such as Phytophthora cinnamomi) and pests (such as the thrips Selenothrips rubrocinctus Giard and Frankliniella gardeniae Moulton, and species of the genera Leptothrips and Karnyothrips (Echeverri-Florez et al., 2004)). A previous molecular characterization of "criollo" trees and a few cultivars including Hass, Fuerte and Papelillo from the Department of Antioquia (Northeast Colombia) was made using 38 amplified fragment length polymorphism (AFLP) markers (Cañas et al., 2015). That study found genetic similarities between criollo avocado trees and the variety Hass (AFLP fingerprints shared between Hass and criollo trees) and population structure (Fst = 0.137, p < 0.01) between Antioquian municipalities (collection sites) according to differences in elevation and climate conditions. In avocado, as in other fruits, morphological traits have been traditionally used to identify genotypes (Guzmán et al., 2017; Alcaraz and Hormaza, 2007; IPGRI, 1995). Nevertheless, evaluation of these traits is laborious and inexact due to the influence of environmental effects, subjectivity and a limited number of discriminatory traits (Alcaraz and Hormaza, 2007). For this reason, molecular markers such as microsatellites have been used to clarify the genetic similarities between avocado races and cultivars (Sharon et al., 1997; Alcaraz and Hormaza, 2007; Borrone et al., 2007; Gross-German and Viruel, 2013; Guzmán et al., 2017). These markers, also known as simple sequence repeats (SSRs), are stretches of DNA that consist of tandem repeats of 1 to 6 bp. They are located throughout nuclear, mitochondrial, and chloroplast genomes and represent suitable molecular markers for genetic studies as they are highly reproducible and co-dominant (Freeland, 2005). Several microsatellite loci have been developed in avocado and have been used for linkage map construction (Sharon et al., 1997; Borrone et al., 2007), molecular germplasm characterizations (Alcaraz and Hormaza, 2007; Guzmán et al., 2017) and diversity analysis (Alcaraz and Hormaza, 2007; Borrone et al., 2007; Gross-German and Viruel, 2013; Corona-Jácome et al., 2016; Guzmán et al., 2017).
No studies have been carried out on the molecular characterization of an avocado germplasm from Colombia (N = 105, ex situ trees from diverse departments) preserved since 2006 together with criollo avocado trees sampled from 2014 to 2015 (N = 92) in the Department of Antioquia (Northeast Colombia). The aim of this study was therefore to employ 12 microsatellites previously standardized in the species (Alcaraz and Hormaza, 2007) to analyze its genetic diversity and population genetics, and to determine whether Colombian avocado is partitioned due to environmental effects such as elevation, as previously found in Antioquia (Cañas et al., 2015). Elevation plays an important role in avocado, as the Mexican and Guatemalan races are usually found in highlands and cool environments while the West Indian race is found in lowlands and tropical conditions (Gross-German and Viruel, 2013). The results obtained in this study are relevant as they represent the first step towards Colombian avocado certification and towards using criollo trees as rootstock for grafting with the variety Hass, which is a product of a cross between Mexican × Guatemalan races (Chen et al., 2009) and one of the most desired avocados in the world (Chanderbali et al., 2008).

Plant
Fresh leaves were collected from 197 trees: 92 criollo trees sampled in the Department of Antioquia (Northeast Colombia) and 105 accessions from the germplasm collection.

Genomic DNA extraction and amplification
Total genomic DNA was isolated from avocado leaves with a cetyl trimethyl ammonium bromide (CTAB) extraction method previously standardized in Colombia (Cañas et al., 2015). Twelve microsatellites were selected from those used by Alcaraz and Hormaza (2007) based on their high polymorphism. Three multiplex PCR amplifications were performed in a 10 µl volume containing 16 mM (NH4)2SO4, 67 mM Tris-HCl pH 8.8, 0.01% Tween 20, 2 mM MgCl2, 0.1 mM each dNTP, 0.4 µM of each primer, 25 ng genomic DNA and GoTaq® DNA polymerase (Promega, WI, USA). Forward primers were labeled with WellRed fluorescent dyes on the 5' end (Proligo, France). Reactions were carried out in an I-cycler (Bio-Rad Laboratories, Hercules, CA, USA) thermal cycler using the following temperature profile for multiplex PCR 2 and 3: an initial step of 1 min at 94°C, 35 cycles of 30 s at 94°C, 30 s at 50°C and 1 min at 72°C, and a final step of 5 min at 72°C. For multiplex PCR 1, the same temperature profile was used except that the annealing temperature was 55°C. The PCR products were analyzed by capillary electrophoresis in an ABI PRISM® 3130 Genetic Analyzer (Applied Biosystems, CA, USA). Each reaction was repeated twice to minimize run-to-run variation.
After amplification, each putative genotype was identified with the program Peak Scanner Software v1.0 (Applied Biosystems, CA, USA). Weinberg equilibrium and linkage disequilibrium per population. Bonferroni corrections were carried out after obtaining the HWE results and linkage disequilibrium estimations. These corrections were made to adjust the significance levels (α) according to the comparisons made amongst sub-regions or departments in the HW analysis and between paired loci for the linkage disequilibrium (Sokal and Rohlf, 1995). GENALEX 6.501 was also used to perform an analysis of molecular variance (AMOVA) to determine whether avocados collected from different sampling sites (sub-regions) of Antioquia and/or different departments of Colombia (germplasm) were genetically different. This program was also used to calculate Nei's pairwise genetic distances between sampled sub-regions of Antioquia and between the departments of Colombia that are part of the germplasm bank. These distances were then used to build neighbor joining (NJ) dendrograms with the program MEGA 4.0 (Kumar et al., 2008). No bootstrap was estimated for the NJ trees, as the input data consisted solely of Nei's pairwise distances calculated between sampled sites.
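The pairwise distances mentioned above can be illustrated with a minimal sketch of Nei's (1972) standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)). It assumes this is the variant of Nei's distance that was computed, which the text does not state. The function name, the input layout (one allele-to-frequency dictionary per locus) and the allele frequencies are hypothetical and are not taken from the study.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus, mapping an allele
    (here its size in bp) to its frequency in that population.
    """
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        alleles = set(locus_x) | set(locus_y)
        # Sums over loci; the number of loci cancels in the identity ratio.
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    identity = jxy / math.sqrt(jx * jy)  # Nei's genetic identity
    return -math.log(identity)

# Hypothetical allele frequencies at two loci for three populations.
pop_a = [{150: 0.5, 152: 0.5}, {200: 1.0}]
pop_b = [{150: 0.5, 152: 0.5}, {200: 1.0}]
pop_c = [{150: 1.0}, {200: 0.5, 204: 0.5}]

print("A vs B:", nei_standard_distance(pop_a, pop_b))
print("A vs C:", nei_standard_distance(pop_a, pop_c))
```

A matrix of such pairwise values between sampling sites is the kind of input a neighbor joining program takes when no individual-level resampling is available.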

Gene diversity and population genetics analyses
The model-based clustering program STRUCTURE 2.3.4 (Pritchard et al., 2000) was used to assess the most probable cluster membership of all avocado samples maintained in the germplasm bank (departments of Colombia) together with the samples taken from Antioquia. The program was run for 15,000 Markov chain Monte Carlo (MCMC) steps after a burn-in period of 150,000 iterations for K = 1 to 15. Each K was evaluated with ten independent runs, using (a) the no-admixture model and (b) the independent allele frequency model, following the procedure given by Chen et al. (2009). The ad hoc statistic delta K (ΔK) (Evanno et al., 2005) was used to estimate the most likely number of populations (K) based on the rate of change in the log probability of the data [Ln Pr(X|K)]. The number of genetic units, K = 2, was estimated with STRUCTURE HARVESTER web v.0.6.92 (Earl and vonHoldt, 2012).
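The ΔK statistic described above can be sketched in a few lines. This is a minimal illustration, assuming ΔK is computed from the mean Ln P(X|K) across replicate runs as |L(K+1) - 2L(K) + L(K-1)| divided by the standard deviation of L(K); the function name and the log-likelihood values are hypothetical and are not the values obtained in this study.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to the list of Ln P(X|K) values obtained
    from the replicate runs at that K.
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # Delta K is undefined at the smallest and largest K
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = (mean(lnp_by_k[k + 1])
                       - 2 * mean(lnp_by_k[k])
                       + mean(lnp_by_k[k - 1]))
        delta[k] = abs(second_diff) / sd
    return delta

# Hypothetical replicate log-likelihoods for K = 1..4 (two runs each).
runs = {
    1: [-100.0, -102.0],
    2: [-50.0, -52.0],
    3: [-48.0, -50.0],
    4: [-47.0, -51.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "most likely K:", best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at K = 1, which is one reason the raw Ln P(X|K) curve is usually inspected alongside it.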

Microsatellite polymorphisms and genetic diversity
A total of 147 fragments were detected with 12 microsatellites in avocados collected in the six sub-regions of Antioquia (Northeast Colombia). The number of alleles per locus ranged from 9 (AVT386, AVT372, AVD102) to 18 (AVD013). The mean allele number was 12.3 alleles per locus. The total number of alleles was high for the Antioquian Southwest (375 alleles) and West (244 alleles) sub-regions, suggesting that both included the most diverse Antioquian avocados. In contrast, Altiplano-norte only presented 13 alleles. The germplasm collection (12 Colombian departments) generated 167 amplified fragments; the number of amplified alleles per locus ranged from 10 (AVD102) to 21 (AVD006), with an average of 13.9 alleles per locus. The total number of amplified alleles per population was N = 969 for Antioquia. This Colombian department presented the highest number of alleles; however, it was also the most sampled department of Colombia. Valle del Cauca also had a high number of amplified alleles (N = 217), whereas Cesar Department only had 20 amplified alleles. The number of different alleles (Na) ranged from 12.3 for Antioquia, the highest value, to 2.3 for Caldas, the lowest value. The Shannon-Weaver index ranged from 0.7 for Valle del Cauca to 1.9 for Antioquia. Observed (Ho) and expected (He) heterozygosities ranged from 0.4 (Cauca Department) to 0.70 (Risaralda Department) (Table 3). HWE was found in 30% of the six Antioquian sub-regions after Bonferroni corrections and in 50% of the 13 departments (given that Antioquia was also included in the analysis) (Table 4). Analysis of linkage disequilibrium showed that microsatellites were in linkage equilibrium after Bonferroni corrections (data not shown).
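For readers unfamiliar with the diversity estimators reported above, the following minimal sketch shows how Na, Ne, I, Ho and He can be obtained for a single locus from diploid genotypes. It assumes the usual definitions (Ne = 1/Σp², I = -Σp ln p, He = 1 - Σp², without small-sample correction); the function name and the genotypes are hypothetical and are not data from this study.

```python
import math
from collections import Counter

def locus_diversity(genotypes):
    """Diversity estimators for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid tree.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)  # expected homozygosity
    return {
        "Na": len(freqs),                              # number of different alleles
        "Ne": 1.0 / homozygosity,                      # effective number of alleles
        "I": -sum(p * math.log(p) for p in freqs),     # Shannon information index
        "Ho": sum(1 for a, b in genotypes if a != b) / len(genotypes),
        "He": 1.0 - homozygosity,                      # expected heterozygosity
    }

# Hypothetical genotypes (allele sizes in bp) for four trees at one locus.
sample = [(150, 152), (150, 150), (152, 156), (150, 156)]
stats = locus_diversity(sample)
for name, value in stats.items():
    print(name, round(value, 3))
```

Averaging each estimator over the 12 loci would give per-population values of the kind summarized in Table 3.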

Population genetics results
The AMOVA between the six Antioquian sub-regions produced FST = 0.084 (p < 0.0001), and the AMOVA between the 13 Colombian departments (germplasm collection together with samples from Antioquia) generated FST = 0.068 (p < 0.0001). Both values suggest population differentiation between sub-regions and between departments. NJ dendrograms obtained with Nei's genetic distances between sub-regions (Figure 3) and between departments (Figure 4) clustered populations that are geographically close. In Antioquia, clustering occurred between sites located at similar elevations and under similar climatic conditions (Table 1). The NJ dendrogram obtained for Antioquia separated Urabá and Altiplano-norte from the rest of the sub-regions. Elevations ranged from 35 to 337 masl in Urabá, from 804 to 2040 masl in the West sub-region, from 873 to 1057 masl in the Northwest sub-region, from 833 to 2094 masl in the Southwest sub-region, from 876 to 2277 masl in the East sub-region and from 2353 to 2493 masl in Altiplano-norte. Genetic differentiation between departments might also be influenced by the three Andean mountain chains, as clustering occurred between sampling sites found in the same mountain chain. This suggests that these mountains are a barrier to gene flow in this species, as avocado producers share seedlings between neighboring orchards (Figure 2). The only sampling site from Antioquia that was genetically apart from the rest was Altiplano-norte. This genetic differentiation might be due to its small sample size.
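As a rough illustration of what an FST value measures, the sketch below computes the heterozygosity-based fixation index FST = (HT - HS) / HT for one locus. This is a simplified stand-in and not the AMOVA procedure used in this study, so its values would not be expected to match the estimates above exactly. The function names and the allele frequencies are hypothetical, and sub-populations are weighted equally.

```python
def expected_het(freqs):
    """Expected heterozygosity, 1 - sum(p^2), from an allele -> frequency dict."""
    return 1.0 - sum(p * p for p in freqs.values())

def fst_single_locus(subpop_freqs):
    """Heterozygosity-based FST for one locus.

    subpop_freqs: list of allele -> frequency dicts, one per sub-population.
    """
    n = len(subpop_freqs)
    # HS: mean expected heterozygosity within sub-populations.
    hs = sum(expected_het(f) for f in subpop_freqs) / n
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    alleles = set().union(*subpop_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in subpop_freqs) / n for a in alleles}
    ht = expected_het(mean_freqs)
    return (ht - hs) / ht if ht > 0 else 0.0

# Hypothetical allele frequencies in two sub-regions.
highland = {"A": 0.8, "B": 0.2}
lowland = {"A": 0.4, "B": 0.6}
print(fst_single_locus([highland, lowland]))
```

Values near 0 indicate that sub-populations share similar allele frequencies, while values approaching 1 indicate that they are fixed for different alleles.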
The program STRUCTURE HARVESTER assigned K = 2 for the 12 departments of Colombia that compose the germplasm bank together with the samples collected in Antioquia, according to the following estimations: Mean Ln (P|D) = -7267.10, Stdev LnP (K) = 0.1870, Ln' (K) = 410.44, and ΔK = 773.13 (Figure 5). These results suggest that Colombian criollo avocado is sub-divided into two populations: most avocado samples taken from the south of Colombia, including the departments of Valle del Cauca and Nariño, were found in cluster 1, and samples taken from the north of Colombia, including the departments of Antioquia, Bolivar and Magdalena, were found in cluster 2. The rest of the departments were evenly distributed in both clusters (Table 5). STRUCTURE HARVESTER also subdivided criollo avocados from Colombia in a clustering pattern similar to the NJ dendrogram obtained with Nei's distances, as this dendrogram separated samples according to their geographic proximity and their distribution in the Andean Chain of Mountains.

DISCUSSION
P. americana constitutes one of the most consumed fruits around the world (Guzmán et al., 2017; Gross-German and Viruel, 2013; Chen et al., 2009; Chanderbali et al., 2008). This study represents the first molecular analysis made with microsatellites in Colombian criollo trees collected in nature together with cultivars obtained from an ex situ germplasm preserved since 2006. In Colombia, few studies have focused on analyzing this species at the molecular level. In avocado, open pollination increases the genetic variability of the species (Sánchez, 1999; Barrientos-Priego and López, 2007), as in the case of criollo cultivars in Colombia. Criollo cultivars are distributed in a wide range of thermic zones and are greatly desired in this country for domestic consumption. Also, given their rusticity, adaptability and compatibility with several cultivars, particularly the variety Hass, they have been used as rootstock material (Cañas et al., 2015). Ninety-two avocado criollo trees were sampled from six Antioquian sub-regions. However, collection sizes were unevenly distributed as avocado production differs per municipality. These cultivars generated 147 microsatellite alleles, ranging from 9 to 18 alleles per locus with a mean of 12.25 alleles per microsatellite, whereas (Sánchez, 1999; Barrientos-Priego and López, 2007). The number of samples used by Guzmán et al. (2017) was 318 avocados, whereas in this study 197 avocados were analyzed. Although the number of samples used here is lower than in Guzmán et al. (2017), the number of alleles found in Colombian avocado is still very high. The Shannon-Weaver diversity index was between 1.87 and 1.16 in the germplasm collection, meaning that the genetic variability of avocado from Colombia is fairly homogeneous amongst departments. Nevertheless, the highest estimation was obtained in Antioquia and the lowest in Cauca. These results could be explained by the fact that the majority of avocado samples came from Antioquia. Additionally, a comparison between observed (Ho) and expected (He) heterozygosities showed that municipalities in Antioquia presented Ho from 0.52 (Altiplano-norte) to 0.59 (West) and He from 0.30 (Altiplano-norte) to 0.76 (East). In Colombia, Ho and He ranged from 0.4 (Cauca department) to 0.70 (Risaralda department). Differences in heterozygosity estimations between departments could also be due to differences in sample sizes. These Ho and He values are relatively similar to those of the studies made by Ashworth and Clegg (2003), Alcaraz and Hormaza (2007), Borrone et al. (2007), Gross-German and Viruel (2013) and Guzmán et al. (2017). Ashworth and Clegg (2003) found Ho = 0.61 with 25 microsatellites standardized in the three botanical races and the species Persea steyermarkii. They also included the cultivars Fuerte and Bacon. Borrone et al. (2007) estimated Ho from 0.17 to 0.73 and included the three botanical races and P. schiedeana. Gross-German and Viruel (2013) estimated an average Ho of 0.67 from 42 accessions from Spain that were analyzed with 47 microsatellites, and in the work of Schnell et al. (2003) Ho values ranged from 0.73 to 0.94; they analyzed a larger sample size than the present work, as they studied 224 accessions from a germplasm bank held in Miami. Guzmán et al. (2017) estimated Ho = 0.59 and He = 0.75 from 318 accessions from Mexico in which the three botanical avocado races were included.
HW analysis showed that 67% of Antioquian sub-regions were in HWE, including samples taken from East, Southeast, Urabá and West. Additionally, 53% of the departments of Colombia were also in HWE, including samples collected in the departments of Antioquia, Bolívar, Nariño, Quindío, Santander, Tolima and Valle del Cauca. HWE is usually found when the studied markers are neutral and widely distributed in a species' genome (Freeland, 2005). Most avocado sampling sites from Colombia that were in HW disequilibrium had low sample sizes. Additionally, no linkage disequilibrium was found between the pairs of loci studied here. In contrast, in Guzmán et al. (2017) HWE and linkage equilibrium were not found in the three botanical races. They observed high inter-locus linkage disequilibrium for two races, drymifolia (Mexican race) and guatemalensis (Guatemalan race), and suggested that this result was due to the domestication bottleneck experienced by each botanical race, as in Mexico human intervention intentionally imposed selection to propagate certain lineages. In Colombia, by contrast, most avocados that are produced and consumed are criollo, and this type of crop is mostly composed of recombinant avocados (hybrids), as open pollination occurs freely in Colombian orchards.
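The logic of an HWE test followed by a Bonferroni correction can be sketched as follows. This is a deliberately simplified illustration for a biallelic locus (chi-square goodness of fit with one degree of freedom); microsatellites are multi-allelic, and the text does not state which test statistic was used, so this is not the procedure applied in this study. The function names, genotype counts and number of tests are hypothetical.

```python
import math

def hwe_chi_square_biallelic(n_aa, n_ab, n_bb):
    """Chi-square test of HWE for one biallelic locus (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests

# Hypothetical genotype counts: an excess of homozygotes in 100 trees.
chi2, p_value = hwe_chi_square_biallelic(40, 20, 40)
alpha_adj = bonferroni_alpha(0.05, 12)  # e.g. 12 tests, one per locus
print(chi2, p_value, "reject HWE:", p_value < alpha_adj)
```

With small samples the expected genotype counts become small and the chi-square approximation is unreliable, which is consistent with the caution expressed above about sites with low sample sizes.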
The AMOVA results for the six Antioquian sub-regions and for the Colombian departments that compose the germplasm collection were all significant, suggesting population structure in Colombian avocado. These results corroborate a previous study made with 111 criollo trees from Antioquia and 38 AFLP markers, which found genetic differentiation due to diverse agroecological conditions (elevation and climate) between sampled zones (Cañas et al., 2015). The NJ dendrograms obtained here showed clustering between sampled sites that were geographically close, both in the six Antioquian sub-regions and in the 13 Colombian departments that compose the avocado germplasm (including Antioquian samples). Genetic structure within Antioquia might have been due to differences in elevation and climate conditions between sites. Avocado samples collected in Southwest, East and Altiplano-norte were mainly found at high elevations, meaning they are likely to be genetically similar to the variety Hass, which is a product of a cross between Mexican × Guatemalan races. These two races are usually found in highlands (Gross-German and Viruel, 2013). Guzmán et al. (2017) demonstrated that Mexican avocado is genetically split into two groups due to differences in ecological conditions, suggesting that not only human intervention but also adaptation is relevant for the genetics and production of this crop. The avocado genetic structure found in Colombia might also be enhanced by the three Andean mountain chains, as the first NJ cluster was composed of avocados collected in the north of Colombia, including the departments of Antioquia, Magdalena, Cesar and Santander. The second cluster was composed of samples collected in Cauca, Risaralda, Huila, Tolima and Valle del Cauca. These departments are found between the Central and the West Andean mountains. The third cluster was composed of samples from Caldas, Nariño and Quindío, which are found between the Central and East Andean mountains. On the other hand, avocados found in Altiplano-norte were genetically distant from those of the other sub-regions of Antioquia.
STRUCTURE HARVESTER produced K = 2 for the 12 departments analyzed here together with the avocado samples collected in Antioquia (N = 13). This outcome suggests that Colombian avocado is split into two main clusters: the first is mainly represented by avocados from the south of Colombia, in the Valle del Cauca and Nariño departments, and the second by departments found in the north of Colombia, including Antioquia, Bolivar and Magdalena. The rest of the departments (Caldas, Cesar, Huila, Nariño, Quindío, Risaralda, Santander and Tolima) were found in both clusters. Similar results were obtained with the NJ dendrogram of the germplasm collection, as criollo avocados were grouped according to their geographic proximity and their distribution in the Andean Chain of Mountains. These mountains are a barrier to seedling movement, as farmers share plants between neighboring orchards. These results differ from the study by Gross-German and Viruel (2013), who also used STRUCTURE on 42 Spanish avocado accessions genotyped with 47 microsatellites and found K = 3. However, they analyzed avocados from the three botanical races and did not analyze different thermic zones, whereas in this work only criollo trees were studied and both elevation and the location of sampling sites relative to the Andean mountains were considered; these criollo trees represent hybrids between the three races, as they are the product of multiple hybridizations between them enhanced by insect pollinators (Sánchez, 1999; Barrientos-Priego and López, 2007). Nevertheless, the results obtained here agree with the outcome found by Guzmán et al. (2017), who studied the three botanical races in 318 avocado accessions and estimated K = 2, corresponding to two ecological regions of Mexico. The first cluster coincided with humid semi-warm to humid semi-cold conditions and the second cluster with humid semi-warm to hot semi-dry conditions.

Conclusion
This study represents the first step taken in Colombia to analyze, with twelve microsatellites, the genetic diversity and population genetics of an avocado germplasm (N = 105) together with samples obtained from criollo trees collected in Antioquia department (Northeast) (N = 92). The microsatellites used here were very useful, as they were easily standardized, highly polymorphic among cultivars and established different genetic groups according to geographic origin and elevation differences. Although new techniques based on next generation sequencing have recently been used to characterize crops, microsatellites are still very useful: in sequence data each nucleotide position can take only four states, whereas microsatellites can present many alleles per locus, meaning their polymorphism is high. Population genetic differentiation was found in Antioquia and in the germplasm avocado accessions, suggesting the importance of climatic conditions and elevation when selecting genotypes for grafting purposes. In particular, criollo avocados found in highlands are likely to be easily grafted with the Hass cultivar, as this avocado is a cross between Mexican × Guatemalan races that are mainly adapted to highlands (Gross-German and Viruel, 2013). Guzmán et al. (2017) analyzed 318 avocados from the three botanical races from Mexico and found that ecological conditions are important for genetic differentiation between these races, as they might have evolved in two regions with different climatic conditions. The population differentiation found here between Colombian departments was also influenced by the Andean Chain of Mountains, suggesting that it is a barrier to gene flow, as avocado farmers share plant material between neighbors. Further studies are necessary to evaluate the results of grafting the Hass variety with Colombian criollo trees, including gene expression analysis, resistance to pathogens such as P. cinnamomi, resistance to soil salinity and soil drainage effects, amongst other characteristics that are important for the improvement of avocado production in Colombia, as a certified crop should be characterized at genotypic and phenotypic levels and be free of pathogens and pests.

Figure 1. Department of Antioquia with all sample sites (municipalities) where leaves were collected.

Figure 2. Departments of Colombia where avocado samples were taken for the germplasm bank.

Table 1. Avocado criollo trees sampled in six sub-regions of Antioquia during 2014 and 2015.

Table 3. Genetic diversity estimators obtained from criollo avocados (a) collected in Antioquia department (six sub-regions) and (b) maintained at the genebank collection in AGROSAVIA.

Table 4. Hardy-Weinberg equilibrium results obtained for (a) six sub-regions of Antioquia and (b) the avocado genebank maintained in AGROSAVIA.

Table 5. Assignment probability obtained with STRUCTURE for the Colombian avocado genebank (N = 105).