Genetic diversity of Dacryodes edulis provenances used in controlled breeding trials

1 Tree Diversity, Domestication and Delivery, World Agroforestry Centre (ICRAF), P. O. Box 16317, Yaounde, Cameroon. 2 Department of Biochemistry and Biotechnology, Kenyatta University, P. O. BOX 43844 00100, Nairobi Kenya. 3 Tree Diversity, Domestication and Delivery, World Agroforestry Centre (ICRAF), P. O. Box 30677, 00100, Nairobi, Kenya. 4 Tree Diversity, Domestication and Delivery, World Agroforestry Centre (ICRAF), P. O. Box 210, Freetown, Sierra Leone.


INTRODUCTION
The progressive deterioration of the environment in the tropics has often been associated with declining rainfall and over-exploitation of phytogenetic resources by local communities in developing countries (Megevand et al., 2013). Moreover, the degradation of these natural ecosystems through subsistence agriculture, demographic pressure, urbanization, infrastructure and mining developments, and unsustainable harvesting of both timber and non-timber forest products from trees such as Dacryodes edulis (G.Don) H.J.Lam contributes to accelerated deforestation (Alemagi and Kozak, 2010; Phelps et al., 2013). The immediate consequences are reduced availability of wild fruits, medicinal plants and other plant and animal products. This exposes the most vulnerable segments of these communities, the aged, the poor, women and children, to malnutrition and reduced income, as their livelihoods traditionally depend in part on forest products (Cribb, 2010).
D. edulis is an edible fruit tree indigenous to the Gulf of Guinea and central Africa regions. However, due to human activities, its current geographical distribution extends beyond its area of origin to as far as tropical Asia (Kengue, 2002). It belongs to the Burseraceae family, a tropical tree family known for fragrant resins, including frankincense, myrrh, and copal (Onana, 2008). The species contributes to rural incomes, supplements the local diet and is used in traditional and modern therapies (Anegbeh et al., 2005; Schreckenberg et al., 2006).
In the last few years, agroforestry has been more fully recognized as a land-use practice that can contribute to both biodiversity conservation and livelihood development (Pye-Smith, 2010; Leakey, 2012), and this has led to greater interest in understanding the genetic variation of trees within these systems (Muchugi et al., 2006). This is especially so as the role of intraspecific diversity in underpinning wider ecosystem functions has become evident. This interest needs to be supported by increased genetic research on farm populations of agroforestry tree species such as D. edulis. For most species, how genetic diversity is distributed in geographical space is unclear. Furthermore, how the distribution of this variation depends on the breeding system of a species (Assogbadjo et al., 2011), and on processes such as climate change, floral evolution, forest management and cultivation, is largely unknown (Dawson et al., 2009). This lack of understanding is a disincentive to crop improvement programs and the conservation of genetic resources, because it makes it difficult or impossible to choose among the management options available for a particular species, only some of which will prove sustainable (Atangana, 2010). Therefore, understanding the molecular basis of genetic diversity in plants is crucial for the efficient conservation, management and utilization of plant genetic resources (Mondini et al., 2009). It has been demonstrated that DNA markers are the most promising techniques for differentiating among genotypes at the species and subspecies levels (Kumar et al., 2009). Furthermore, DNA-based markers have become the methods of choice in genetic diversity studies, as they analyse variation at the DNA level. This excludes all environmental influences and time specificity, since analysis can be performed at any growth stage using any plant part and requires only small amounts of material (Rao, 2004; Khanam et al., 2012).
In this study, therefore, genetic diversity analysis was done using simple sequence repeat (SSR) DNA-based markers. This approach was chosen because it has already been used successfully to determine the genetic diversity of many tropical species, including Allanblackia spp. (Russell et al., 2009; Atangana et al., 2010), Citrus spp. (El-Mouei et al., 2011) and Carica papaya (Madarbokus and Sanmukhiya, 2012). Furthermore, comparative studies in plants have shown that SSR markers are more valuable than other DNA-based markers and provide an effective means of discriminating between genotypes (Powell et al., 1996; Li et al., 2001). However, the limited knowledge regarding the genetic diversity of the D. edulis provenances used in controlled breeding trials at ICRAF has been a disincentive to the breeding program. Thus, the objective of this study was to assess the genetic diversity of three D. edulis provenances used in controlled breeding trials at ICRAF-Cameroon using the SSR marker technique, in order to provide crucial background genetic diversity information for the efficient management and utilization of the collections in the breeding programme.

Plant material
Ninety-one genotypes were used in the study: 40 collected from Boumnyebel, 13 from Limbe and 38 from Kekem. Accessions were divided into three populations according to their geographical origin: Boumnyebel, Limbe and Kekem. The sites in Cameroon where the provenances were collected fall into distinct agro-ecological zones, namely: (i) humid forest with monomodal rainfall; (ii) humid forest with bimodal rainfall; and (iii) western highlands savannah.
After incubation, 550 μl of chloroform:isoamyl alcohol (24:1) was added to each tube and the solution mixed by inversion. After mixing, the tubes were centrifuged at 14000 rpm for 15 min at room temperature to spin down cell debris. The upper aqueous phase (containing the DNA) was then transferred to a clean 1.5 ml Eppendorf tube. To each tube, 0.7 volume of ice-cold isopropanol was added to precipitate the DNA. The solution was mixed by inversion, followed by overnight incubation at -20°C. The precipitated DNA was then pelleted by spinning the tubes at 14000 rpm for 10 min at 4°C. The precipitated DNA stuck to the bottom of the Eppendorf tube and was visible as a clear solid precipitate. The upper phase was carefully discarded, leaving the DNA precipitate at the bottom of the tube. To wash the DNA, 500 μl of ice-cold 70% ethanol was added to each tube and mixed gently by inversion. After mixing, the tubes were centrifuged at 14000 rpm for 10 min to pellet the DNA at the bottom of the tube. The supernatant was removed and the DNA washed again with ice-cold 70% ethanol. The supernatant was discarded and the DNA pellet allowed to air dry for approximately 45 min. The DNA was re-suspended in 100 μl of ddH2O. To remove RNA, 4 μl of RNase Cocktail (RNase A (500 U/ml) and RNase T1 (20,000 U/ml)) was added and the solution incubated at 37°C for 30 min. The isolated DNA was stored at 4°C until analysis.

DNA quality and concentration analysis
The quality and quantity of the DNA samples were determined by comparison to a DNA standard marker (Lambda DNA, Promega USA). For each sample, 2 μl of DNA was mixed with 1 μl of SYBR® green fluorescent dye and 2 μl of loading buffer (xylene cyanol, sucrose and bromophenol blue). Each mixture was then loaded into a well of a 0.8% agarose gel. Electrophoresis was carried out for 1 h at 80 V and the band intensities of the DNA samples were compared with that of the DNA standard marker. In addition, the concentration and purity of the DNA were determined with a spectrophotometer from the ratio of absorbance at 260 nm (A260) to absorbance at 280 nm (A280). Double-stranded (ds) DNA was quantified by using 1 μl of each extract and scanning its absorbance over a 200 to 300 nm range with the Nucleic Acid settings applied on the instrument. Based on this reading, all extracts were standardised to a concentration of 20 ng/μl by appropriate dilution.

SSR analysis
A total of 6 polymorphic SSR markers published for D. edulis by Benoit et al. (2011) were used to screen 91 D. edulis DNA samples (Table 2). PCR amplification reactions were performed with MyTaq DNA polymerase (Bioline). The PCR reactions were set up as follows: 4 μl of 5x MyTaq Reaction Buffer Colorless comprising 5 mM dNTPs and 15 mM MgCl2, 2 μl of 10 μM forward primer labelled with 6-FAM, VIC, NED or PET (Applied Biosystems), 2 μl of 10 μM unlabelled reverse primer, 0.4 μl of MyTaq DNA polymerase, 3 μl of DNA diluted to 20 ng/μl, and 8.6 μl of ddH2O to bring the total volume to 20 μl. The amplification was carried out using a GeneAmp® PCR System 9700 thermal cycler (Applied Biosystems, USA). For each SSR amplification, an initial denaturation at 94°C for 3 min was followed by 35 cycles of denaturation at 94°C for 20 s, annealing at 57°C for 1 min and extension at 72°C for 1 min, with a final extension at 72°C for 10 min. The PCR products were then stored at 4°C awaiting analysis.
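The reaction set-up above can be checked arithmetically. The following is a minimal sketch, not part of the original protocol: it confirms that the listed volumes sum to 20 μl, derives the final primer concentration and template mass per reaction, and scales the shared reagents into a master mix. The function names and the 10% pipetting overage are illustrative assumptions.

```python
# Per-reaction volumes (in microlitres) as listed in the text.
REACTION_UL = {
    "5x MyTaq Reaction Buffer": 4.0,
    "forward primer (10 uM, labelled)": 2.0,
    "reverse primer (10 uM)": 2.0,
    "MyTaq DNA polymerase": 0.4,
    "template DNA (20 ng/ul)": 3.0,
    "ddH2O": 8.6,
}


def total_volume(mix):
    """Sum of all component volumes in one reaction (ul)."""
    return round(sum(mix.values()), 2)


def final_concentration(stock_conc, volume_ul, total_ul):
    """Concentration after dilution into the reaction (same unit as stock)."""
    return stock_conc * volume_ul / total_ul


def master_mix(mix, n_reactions, overage=0.10,
               exclude=("template DNA (20 ng/ul)",)):
    """Scale shared reagents for n reactions.

    The 10% overage is an assumed allowance for pipetting loss; template
    DNA is excluded because it differs between samples.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in mix.items() if name not in exclude}


if __name__ == "__main__":
    print("total volume (ul):", total_volume(REACTION_UL))
    print("final primer conc (uM):", final_concentration(10.0, 2.0, 20.0))
    print("template per reaction (ng):", 20.0 * 3.0)
    for name, vol in master_mix(REACTION_UL, 91).items():
        print(f"{name}: {vol} ul")
```

Under these assumptions each primer ends up at 1 μM and each reaction receives 60 ng of template.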

Electrophoresis and PCR product visualization
The PCR products were separated on 2% agarose gels using 1X TBE buffer (44.5 mM boric acid, 1 mM EDTA and 44.5 mM Tris base). The DNA fragments were stained with 1 µL of SYBR® green fluorescent dye and mixed with 2 µL of loading buffer (xylene cyanol, sucrose and bromophenol blue). Each well was loaded with 5 µL of the mixture (PCR products, SYBR® green fluorescent dye and loading buffer). In the first well of each gel, 1 µL of 100 bp DNA Ladder (Invitrogen) was loaded. Electrophoresis was carried out for 1.5 h at 80 V. Subsequently, the DNA bands were visualized using a UV transilluminator and photographed. The sizes of the amplified DNA fragments were determined by comparing their migration distance with that of the bands of the 100 base pair (bp) DNA ladder.

Scoring of DNA fragments
DNA fragments were pooled together with an internal size standard (GeneScan™ 500 LIZ® from Applied Biosystems, USA) based on dye label. The PCR products were co-loaded based on dye label to reduce the cost of genotyping. An ABI 3730xl Genetic Analyzer (Applied Biosystems, USA) was then used to carry out capillary electrophoresis. The Genetic Analyzer employs a fluorescence-based capillary detection system that uses a polymer as the separation matrix (Kitavi et al., 2014). This facilitates accurate sizing of the alleles. Based on the relative migration of the internal size standard, data generated by the ABI 3730xl Genetic Analyzer were then analyzed using GeneMapper 3.5 software (Applied Biosystems, Foster City, California, USA) and the allele sizes scored in base pairs (bp) based on predefined analysis protocols. Only alleles with a relative fluorescent unit value of > 500 were scored.

Gene diversity, heterozygosity, and polymorphism
Gene diversity, heterozygosity, and polymorphism information content (PIC) for all accessions were calculated using PowerMarker Version 3.25 (Liu and Muse, 2005). Gene diversity was defined as the probability that two randomly selected alleles from the accessions are different. PIC was used to deduce the polymorphic informativeness of each SSR primer as described by Weir (1996). PowerMarker Version 3.25 was also used to generate the major allele frequency, which is the frequency of the most common allele at a locus (Lu et al., 2011).
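The statistics defined above can be illustrated with a short sketch. It uses the standard textbook definitions (gene diversity as 1 minus the sum of squared allele frequencies, and PIC in the form given by Botstein et al., 1980) without any sample-size correction, so values may differ slightly from those reported by PowerMarker. The function names and the example genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes.

    `genotypes` is a list of (allele_a, allele_b) tuples with allele sizes
    in bp; genotypes containing None (missing data) are skipped.
    """
    alleles = [a for g in genotypes if None not in g for a in g]
    counts = Counter(alleles)
    total = len(alleles)
    return {allele: n / total for allele, n in counts.items()}


def major_allele_frequency(freqs):
    """Frequency of the most common allele."""
    return max(freqs.values())


def gene_diversity(freqs):
    """Probability that two randomly drawn alleles differ (uncorrected)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """Polymorphism information content (Botstein et al., 1980 form)."""
    p = list(freqs.values())
    cross = sum(2.0 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1.0 - sum(x * x for x in p) - cross


def observed_heterozygosity(genotypes):
    """Proportion of scored individuals that are heterozygous."""
    scored = [g for g in genotypes if None not in g]
    return sum(a != b for a, b in scored) / len(scored)


if __name__ == "__main__":
    # Hypothetical genotypes at a single SSR locus (allele sizes in bp).
    locus = [(150, 150), (150, 154), (154, 158), (150, 158)]
    freqs = allele_frequencies(locus)
    print("major allele frequency:", major_allele_frequency(freqs))
    print("gene diversity:", gene_diversity(freqs))
    print("PIC:", pic(freqs))
    print("observed heterozygosity:", observed_heterozygosity(locus))
```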

Analysis of molecular variance
The analysis of molecular variance (AMOVA) was done using GenAlEx software version 6.5 (Peakall and Smouse, 2012) with the aim of assessing the structure of genetic diversity among D. edulis provenances. AMOVA partitions genetic variation among populations, among individuals within populations, and within individuals. The variance components were used to calculate F-statistics, which are useful for analysing levels of gene flow, where FST reflects genetic differentiation, FIT is the inbreeding coefficient within individuals relative to the total and FIS is the inbreeding coefficient within individuals relative to the subpopulation (Wright, 1978; Kiambi et al., 2005; Kahiu et al., 2013). F-statistics values were tested for significance by 99 permutations in all analyses (Peakall and Smouse, 2012).

Cluster analysis
Cluster analysis is the partitioning of a set of objects into groups so that objects within a group are similar and objects in different groups are dissimilar. It is efficient in grouping objects with similar characters (Hodgkin et al., 1995). Genetic relationships among the provenances were displayed by classification using a dissimilarity matrix (dissimilarity = 1 - similarity). The dendrogram was constructed with the weighted neighbor joining method on the basis of allelic data from 6 SSR markers across the D. edulis accessions using DARwin software Version 5.0 (Perrier et al., 2003; Perrier and Jacquemoud-Collet, 2006). Average linkage clustering between provenances was analyzed. Clusters were defined based on their unique characters.
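As an illustration of the relation dissimilarity = 1 - similarity, the sketch below computes a pairwise dissimilarity matrix from diploid SSR genotypes using a simple allele-matching index (the proportion of alleles shared per locus, averaged over loci). This index is an assumption chosen for illustration; the exact index applied in DARwin is not stated in the text, and the tree-building step itself is not reproduced here. The sample genotypes are hypothetical.

```python
from collections import Counter


def shared_alleles(genotype_a, genotype_b):
    """Number of alleles two genotypes share at one locus (multiset overlap)."""
    return sum((Counter(genotype_a) & Counter(genotype_b)).values())


def dissimilarity(ind_a, ind_b, ploidy=2):
    """1 - similarity, with similarity the mean proportion of shared alleles.

    Each individual is a list of per-locus genotypes; a locus scored as None
    in either individual is ignored.
    """
    scored = [(a, b) for a, b in zip(ind_a, ind_b)
              if a is not None and b is not None]
    if not scored:
        raise ValueError("no loci scored in both individuals")
    similarity = sum(shared_alleles(a, b) / ploidy
                     for a, b in scored) / len(scored)
    return 1.0 - similarity


def dissimilarity_matrix(individuals, ploidy=2):
    """Symmetric matrix of pairwise dissimilarities."""
    n = len(individuals)
    return [[dissimilarity(individuals[i], individuals[j], ploidy)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical genotypes at two loci (allele sizes in bp).
    samples = [
        [(150, 154), (200, 200)],
        [(150, 150), (200, 204)],
        [(158, 158), (210, 212)],
    ]
    for row in dissimilarity_matrix(samples):
        print(row)
```

A matrix of this kind is the input that a neighbor joining or average linkage routine would then turn into a dendrogram.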

Principal coordinate analysis
The genetic relationships among the genotypes were also analyzed using PCoA. PCoA was based on standardized covariance of genetic distances calculated for codominant markers. Genetic distances were computed using GenAlEx software version 6.5 (Peakall and Smouse, 2012). Genetic distance matrices generated from the SSR data sets were then subjected to Principal Coordinate Analysis by employing GenAlEx software version 6.5 (Peakall and Smouse, 2012).
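For readers unfamiliar with the method, the following is a minimal sketch of classical principal coordinate analysis on a distance matrix: the squared distances are double-centred and the leading eigenvectors, scaled by the square root of their eigenvalues, give the coordinates. It is written in pure Python with power iteration for transparency and is not the GenAlEx implementation, whose standardized-covariance option may give different scaling. All function names are illustrative; in practice a linear algebra library would normally be used for the eigendecomposition.

```python
import math


def double_center(dist):
    """Gower-centre the matrix of -0.5 * d_ij^2 (classical PCoA step)."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def mat_vec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def leading_eigenpair(m, iterations=1000):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # non-constant start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # matrix is numerically zero
            return 0.0, v
        v = [x / norm for x in w]
    value = sum(x * y for x, y in zip(v, mat_vec(m, v)))
    return value, v


def pcoa(dist, n_axes=2):
    """Return a list of (coordinates, percent_variation) for each axis.

    Axes are extracted one at a time with deflation; extraction stops at the
    first non-positive eigenvalue. Percentages are relative to the trace of
    the centred matrix, which assumes negative eigenvalues are negligible.
    """
    b = double_center(dist)
    n = len(b)
    total = sum(b[i][i] for i in range(n))
    axes = []
    for _ in range(n_axes):
        value, vec = leading_eigenpair(b)
        if value <= 1e-12:
            break
        coords = [math.sqrt(value) * x for x in vec]
        axes.append((coords, 100.0 * value / total))
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return axes


if __name__ == "__main__":
    # Toy distances between three points lying on a line at 0, 1 and 3.
    distances = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
    for coords, pct in pcoa(distances):
        print([round(c, 3) for c in coords], round(pct, 2))
```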

DNA quality and quantity
The quality and quantity of extracted genomic DNA are vital for successful SSR analysis. A 0.8% agarose gel was used for quality control of genomic DNA. Bands were visualized using a UV transilluminator and photographed. The quality and quantity of DNA extracted from samples were determined by comparison to a 50 ng/µl DNA standard marker (Lambda DNA, Promega USA). The extracted DNA produced strong bands of high molecular weight with little or no smear, and thus the samples were of the quality and quantity required for PCR amplification (Figure 2).

DNA concentration and purity
To ensure that a sufficient amount of DNA template is used in the analysis, it is important to determine the concentration and purity of the DNA samples. However, the presence of residual RNA leads to overestimation of the concentration and purity of DNA, hence correct RNA elimination is important. The 260/280 ratio is a good estimate of extract purity, and therefore the concentration and purity of the DNA samples were determined using the NanoDrop 3300 spectrophotometer (Applied Biosystems) from the ratio of absorbance at 260 nm (A260) to absorbance at 280 nm (A280) (Figure 3). No extract displayed a 260/280 ratio > 2.0, which suggested that only minimal levels of residual RNA were present (Table 3, Appendix 1). The DNA concentrations ranged from 941.3 to 6738.8 ng/μL, which was sufficient for dilution. Upon quantification, all samples were normalized to a final concentration of 20 ng/µl by adding variable volumes of double distilled water to a final volume of 100 μl.
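The normalization step follows the usual dilution relation C1 x V1 = C2 x V2. The sketch below, offered as an illustration rather than the authors' worksheet, computes the stock and water volumes needed to reach 20 ng/µl in 100 µl for the lowest and highest concentrations reported above. The function name is hypothetical.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=20.0,
                     final_volume_ul=100.0):
    """Return (stock_ul, water_ul) so that C1 * V1 = C2 * V2 holds."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Lowest and highest stock concentrations reported in the text (ng/ul).
    for stock in (941.3, 6738.8):
        stock_ul, water_ul = dilution_volumes(stock)
        print(f"{stock} ng/ul: {stock_ul:.3f} ul DNA + {water_ul:.3f} ul water")
```

For such concentrated stocks the required volumes are small (about 2.1 µl and 0.3 µl respectively), so an intermediate dilution step may be easier to pipette accurately.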

PCR product visualization
The success of PCR amplification was checked by electrophoresis on a 2% agarose gel. Bands were visualized using a UV transilluminator and photographed. Successful amplification was determined by comparing the migration distance of the amplified fragments with that of the bands of the 100 base pair (bp) DNA ladder. Successful amplification was noted for all six primer pairs used in the analysis. Strong bands were seen on the gel images and these bands were within the expected size ranges relative to the 100 bp DNA ladder, indicating that suitable product yields were obtained. Moreover, primer dimers were not observed; amplification was therefore efficient and the amplification products were suitable for genotyping. Figure 4 shows an agarose gel electrophoresis image of the PCR amplification products obtained with primer LB12.

Scoring of amplified DNA fragments
GeneMapper 3.5 software was used to display SSR profiles generated on the capillary sequencer and to score allele sizes of all 91 D. edulis PCR amplification products for the 6 SSR primers used in this study. SSR profiles had good peaks and high heterozygote peak height ratios, and were free of artefacts (Figure 5). The peaks were sized and the alleles called based on predefined analysis protocols.

Polymorphism, heterozygosity, and gene diversity
The number of alleles at a given SSR locus (allelic richness) is the simplest measure of genetic diversity (Saavedra et al., 2013). The six SSR markers used in this study produced a total of 61 alleles, with an average of 11.1667 alleles per marker. Marker CE09 detected the highest number of alleles (17), while CB09, LD06, LB12 and CC01 had 14, 12, 11 and 6 alleles, respectively. Marker CG11 detected the lowest number of alleles (6) (Table 4). The average number of alleles per marker detected in this study is comparable to that reported by Benoit et al. (2011) for D. edulis, where the number of alleles per locus ranged from 3 to 15, with an average of 7.3.
The polymorphism information content (PIC) was defined by Botstein et al. (1980) as a measure of the discrimination power and informativeness of markers. The PIC value is therefore a measure of polymorphism among genotypes for a marker locus used in genetic diversity analysis, since it reflects allelic diversity and frequency among the genotypes. Microsatellites, which provide these polymorphic regions of the genome, allow estimation of such parameters as average observed and expected heterozygosities. All 6 primers were polymorphic across the 91 D. edulis samples. PIC values varied greatly among the tested SSR loci, ranging from 0.2652 for CG11 to 0.8737 for CE09, with a mean of 0.5632 (Table 4). The highest PIC value (0.8737) was obtained for CE09, followed by CB09 (0.7525). The results of Todou et al. (2013) are in agreement with this finding, since they reported that the most polymorphic loci were CB09 and CE09. Furthermore, according to Ganapathy et al. (2012), markers with PIC values exceeding 0.5 are very efficient in discriminating genotypes and extremely useful in detecting the polymorphism rate at a particular locus. In this study four markers (CC01, LD06, CB09 and CE09) had PIC values that exceeded 0.5 and were therefore the most useful in detecting the rate of polymorphism.
The observed heterozygosity (Ho) values ranged from 0.0220 to 0.5385 for the loci CG11 and LD06, respectively, with an average of 0.3132. Meanwhile, gene diversity (He, also known as expected heterozygosity) was 0.5851 on average and ranged from 0.2763 (CG11) to 0.8846 (CE09). The estimates of gene diversity are comparable with those of Benoit et al. (2011), who reported that polymorphism was widely variable among loci and that expected heterozygosities ranged from 0.06 to 0.84 with a mean value of 0.49. Consequently, an average level of genetic diversity was observed at the population level. Generally, these findings illustrate that although the 6 D. edulis SSR markers used in this study have a high discriminating power, they are under selective pressure. This is because, typically and/or ideally, only the best individuals are brought into domestication programmes by farmers, so domestication is generally considered to reduce the genetic diversity of the species being domesticated (Miller and Gross, 2011; Graefe et al., 2013). Moreover, the expected heterozygosity across the 6 loci was higher than the observed heterozygosity. The large difference between the observed heterozygosity and gene diversity across the microsatellite markers was therefore an indication of inbreeding among the accessions.
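The link between a heterozygote deficit and inbreeding can be made explicit with the standard fixation index F = 1 - Ho/He. The sketch below applies it to the mean values and to the CG11 values quoted above; it is an illustrative calculation, not one reported by the authors, and the function name is hypothetical.

```python
def fixation_index(observed_het, expected_het):
    """Heterozygote deficit F = 1 - Ho / He (positive values suggest inbreeding)."""
    if expected_het <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - observed_het / expected_het


if __name__ == "__main__":
    # Mean Ho and He across the six loci, as given in the text.
    print("mean over loci:", round(fixation_index(0.3132, 0.5851), 4))
    # Locus CG11, for which both Ho and He are given in the text.
    print("CG11:", round(fixation_index(0.0220, 0.2763), 4))
```

With the mean values this gives F of roughly 0.46, close to the FIS of 0.463 reported in the analysis of molecular variance.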

Analysis of molecular variance
Analysis of molecular variance (AMOVA) was chosen for partitioning the total molecular variation present at the inter- and intra-population levels. A low inter-population variation was found (2%). Furthermore, the AMOVA results (Table 5) showed that most of the molecular variance (98%) was distributed within populations, with 45% of the total partitioned among individuals and 53% within individuals. The results of Todou et al. (2013) are also in agreement with this finding, since they reported that only 3% of the total detected genetic diversity was inter-population.
The above variance components were used to calculate F-statistics. F-statistics values are useful for the analysis of the magnitude of genetic differentiation, where FST provides a measure of the genetic differentiation between subpopulations, FIT is the inbreeding coefficient within individuals relative to the total and FIS is the inbreeding coefficient within individuals relative to the subpopulation (Wright, 1978, cited in Kiambi et al., 2005). The observed overall inbreeding coefficient FIT (0.473) revealed that the D. edulis populations studied were highly inbred overall. This inbreeding can partially be attributed to inbreeding within individual accessions, FIS (0.463). A low level of genetic differentiation was also detected, as shown by the low fixation index (FST = 0.018) (Table 6). This FST value was slightly lower than that observed in a genetic diversity study of different populations of D. edulis from Cameroon and Gabon (Todou et al., 2013), which reported FST = 0.03 for the two populations on the basis of five different SSR loci. Wright (1978), cited by Kiambi et al. (2005), suggested that an FST between 0 and 0.05 indicates little differentiation, 0.05 to 0.15 moderate differentiation and 0.15 to 0.25 large differentiation, while an FST exceeding 0.25 suggests extremely large differentiation. Hence this study revealed low levels of genetic differentiation among the populations of D. edulis accessions studied. Additionally, these findings indicate that most of the genetic diversity detected in this study is intra-population. Both observations can be attributed to high levels of gene flow, because populations that exchange genes at a high rate display lower FST values than populations with lower gene flow. This result is broadly in line with that pointed out by Flynn (2009).
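To show how F-statistics follow from the AMOVA partition, the sketch below derives FST, FIS and FIT from the three variance shares (among populations, among individuals, within individuals) and classifies FST using the Wright (1978) bands quoted above. It uses the rounded percentages given in the text (2%, 45% and 53%), so the results only approximate the values reported from the unrounded variance components. The function names are hypothetical.

```python
def f_statistics(among_pops, among_inds, within_inds):
    """F-statistics from AMOVA variance components (or percentage shares).

    FST = Vap / Vtotal
    FIS = Vai / (Vai + Vwi)
    FIT = (Vap + Vai) / Vtotal
    """
    total = among_pops + among_inds + within_inds
    fst = among_pops / total
    fis = among_inds / (among_inds + within_inds)
    fit = (among_pops + among_inds) / total
    return fst, fis, fit


def classify_fst(fst):
    """Differentiation bands attributed to Wright (1978) in the text."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst <= 0.25:
        return "large"
    return "extremely large"


if __name__ == "__main__":
    fst, fis, fit = f_statistics(2.0, 45.0, 53.0)
    print(f"FST={fst:.3f} FIS={fis:.3f} FIT={fit:.3f}")
    print("differentiation:", classify_fst(fst))
    # The three statistics satisfy (1 - FIT) = (1 - FIS) * (1 - FST).
    print(abs((1 - fit) - (1 - fis) * (1 - fst)) < 1e-12)
```

With these rounded inputs FST is about 0.02, FIS about 0.46 and FIT 0.47, placing FST in the band of little differentiation.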

Cluster analysis
Traditionally, plant breeders have always evaluated plant origins and pedigrees as an extra indication of the characteristics that were acquired by a new genotype. With the advent of molecular marker techniques, new tools that have the capacity to reveal similarities in the genome of related plant species or cultivars directly became available. Knowledge of genetic similarity between genotypes is useful in any breeding program because it facilitates efficient sampling and utilization of germplasm resources. The breeder can use genetic similarity information to make informed decisions regarding the choice of genotypes to cross for the development of populations or to facilitate the identification of diverse parents to cross in hybrid combinations in order to maximize the expression of heterosis (Hemeida et al., 1998).
Cluster analysis separated the accessions based on genetic variability. Accessions with similar genetic traits were grouped together irrespective of the collection region. Genetic relationships among the populations were displayed by classification using a dissimilarity matrix (dissimilarity = 1 - similarity) and a dendrogram was constructed (Figure 6). The 91 D. edulis accessions were divided into three major clusters, I, II and III. Major Cluster I had two sub-clusters, A and B: (i) sub-cluster A had two groups, 1 and 2, and comprised a total of thirty-five accessions, with group 1 containing the larger number of accessions; (ii) sub-cluster B comprised two groups, 1 and 2, consisting of only fourteen accessions. Major Cluster II had two sub-clusters, A and B, comprising a total of thirty-two accessions. Major Cluster III also had two sub-clusters, A and B; however, it had the smallest number of accessions among the three major clusters. D. edulis genotypes from Boumnyebel and Kekem were placed in all three major clusters. On the other hand, none of the accessions from Limbe was placed in major Cluster III. From the neighbor joining tree, it was clear that there was a close genetic relationship between the D. edulis populations analyzed in this study. The results of cluster analysis also showed that there was little relationship between genetic divergence and geographical origin.

Principal coordinate analysis
Principal coordinate analysis (PCoA) was also used to visualize the genetic association of genotypes, which is primarily explained by the first two principal coordinates. This multivariate approach was chosen to complement the cluster analysis, since cluster analysis is more sensitive to closely related individuals, while PCoA is more informative regarding distances among major groups (Ghosh et al., 2014). The first and second coordinates extracted 20.78% and 8.80% of the total molecular variation, respectively (Table 7). PCoA also showed no clear separation of the populations and was thus consistent with the cluster analysis. The genotypes were scattered without forming distinct and meaningful groups based on either population or geographical origin (Figure 7). The two-dimensional PCoA plot of D. edulis genotypes constructed from the 6 SSR marker data is similar in many respects to the AFC (factorial analysis of correspondence) plot constructed by Todou et al. (2013) from 5 SSR marker data.
In general, very little differentiation was observed among the populations with respect to the F-statistics values. In addition, AMOVA revealed that most of the genetic diversity detected in this study was intra-population, while cluster analysis and principal coordinate analysis revealed close genetic relationships among the populations. These results point to the contribution of gene flow to the low genetic differentiation among populations. In low-input farming systems, gene flow can be considered a function of pollen flow and seed exchange, which, in different ways, are influenced by natural and human selection pressures. D. edulis is a dioecious species (Kengue, 2002), but some individuals carry both female and male flowers and others bear hermaphrodite flowers, which favors self-pollination. D. edulis fruits and seedlings are also usually exchanged between relatives, neighbors, communities and/or villages, facilitating seed dispersal (Takoutsing et al., 2013). In addition, trade moves commercial fruits (seeds) over long distances in search of markets, and farmers select fruits of good quality in order to create markets for their crops (Degrande et al., 2012). This contributes to a significant deficit in genetic diversity and genetic differentiation among populations.
Nonetheless, it is important to note that the markers used in this study were not linked to any traits of interest, and this could be the next step in D. edulis genomics. To identify genotypes that harbor important traits, such as a shorter juvenile phase before first fruit production, a less competitive rooting system when intercropped, and fruits of the desired quality, further diversity studies are needed on these genotypes using markers that are linked to specific traits. This will allow scientists to determine whether multiple or different mechanisms control these traits and which genotypes carry them, in order to transfer the traits to the best yielding and most popular varieties. Hence, markers linked to these traits will allow the pyramiding of these traits into a select few D. edulis genotypes. The planned sequencing of the D. edulis genome by the African Orphan Crops Consortium (AOCC) will be a major step towards achieving this objective.

Conclusion
This study provided a detailed analysis and quantification of genetic diversity in three D. edulis provenances from Cameroon. Moreover, the study established that the high level of gene flow observed could help to maintain the low genetic differentiation and close genetic relationships among D. edulis populations. This can be attributed to inbreeding and the exchange of seeds between farmers through relatives and/or markets. Hence, broadening the genetic base of the studied D. edulis genotypes may be achieved through introgression of new alleles. Furthermore, the study highlighted the need for isolation and characterization of more DNA markers in D. edulis and their use in advanced studies such as gene discovery, trait-linked marker-assisted selection and gene mapping.