Genetic structure of cassava populations (Manihot esculenta Crantz) from Angola assessed through ISSR markers

1 Federal University of Bahia Recôncavo (UFRB), Rua Rui Barbosa 710, CEP: 44380-000, Cruz das Almas, BA, Brazil. 2 Higher Education Polytechnical Institute of Kwanza Sul, Rua 12 de Novembro, Centro, Cuanza Sul, Angola. 3 Federal University of Bahia Recôncavo (UFRB), Rua Rui Barbosa 710, CEP: 44380-000, Cruz das Almas, BA, Brazil. 4 Brazilian Agricultural Research Corporation (EMBRAPA Mandioca e Fruticultura), Rua Embrapa S/N CP 007, CEP: 44380-000, Cruz das Almas, BA, Brazil. 5 Agronomic Investigation Institute (IIA), Malanje, Angola.

mitigate starvation among the poorest populations in Brazil, mainly in family farms in the vast Northeastern semiarid region (Afonso et al., 2014).
Cassava can become the main raw material for a series of products manufactured from its shoots and roots, which can increase the demand for this species and help agricultural transformation and economic growth in developing countries (FAO, 2018). Young cassava leaves are commonly consumed in Africa; the dish is called quizaca in Angola and matapa in Mozambique, where it is the most popular dish. In Brazil, cassava leaves are consumed in a dish called maniçoba (Afonso et al., 2014), whereas the roots are boiled and fried, or eaten in the form of flour and crumbs, in fillings and pirão (Oliveira, 2011).
The genetic structure of a species can be defined as the distribution of genetic variability between and within populations (Brown, 1978; Loveless and Hamrick, 1984). According to Hamrick (1982), genetic structure refers to the heterogeneous distribution of alleles and genotypes in space and time. This structure results from the action of evolutionary processes (migration, mutation, selection and genetic drift) that act on species and populations. Cruz et al. (2011) highlight that knowledge about the genetic structure of a population enables predicting, or even clarifying, the ecological and genetic phenomena acting on it.
Studies on the genetic diversity of populations show that markers can be used for different purposes, such as evaluating the potential of the available genetic resources, generating linkage maps and detecting quantitative trait loci (QTLs) associated with agronomic traits (Benko-Iseppon et al., 2003), as well as detecting polymorphism between genotypes in order to help plant breeding programs (Ferreira and Grattapaglia, 2008).
Among the many existing genetic markers, microsatellite and allozyme markers present Mendelian inheritance and codominant expression. This feature allows identifying the heterozygous and homozygous genotypes of individuals, which is the main information used to estimate different parameters of genetic interest (Ferreira and Grattapaglia, 1998; Grattapaglia, 2001; Souza, 2001). However, the most important difference between the two categories of markers refers to their levels of polymorphism. According to Estoup et al. (1998), microsatellite markers are more polymorphic than isoenzymes in natural populations, showing more alleles and overall heterozygosity higher than 0.5.
Knowing the polymorphic information content (PIC) is extremely useful to determine the number and size of families adopted for QTL mapping. PIC is the probability that one parent is heterozygous at a locus and that the other parent has a different genotype. Families can be chosen based on parental heterozygosity in order to improve marker information by selecting the most informative markers (Zhu et al., 2001). Given the scarcity of studies on the genetic structure of cassava populations in Angola, the aim of the present study was to estimate the genetic diversity of cassava populations in Angola through ISSR molecular markers, in order to identify their genetic variability and their potential to be used in the diversification of cassava cultivation in the country.

MATERIALS AND METHODS
Leaf samples of 40 genotypes from three Manihot populations were collected in the Experimental Station of Malanje Food Company, located at latitude 8° 49' South and longitude 13° 13' East (IGCA, 2016), at an altitude of 368 m and with a total area of 8960 m². According to the INAMET classification (2004), the climate in the region is humid subtropical, with mean annual temperature of 26°C, thermal amplitude of 14°C, relative humidity between 80 and 85%, and mean annual rainfall between 1000 and 1200 mm.
The populations taken into account were the genotypes collected in each of three provinces (States): Cuanza Norte (Ndalatando), Malanje and Uíge (Table 1 and Figure 1). The collected samples were wrapped in aluminium foil, labeled and sent to the Embrapa Cassava and Fruits Molecular Biology Laboratory in Cruz das Almas, Bahia State, Brazil, where they were stored in a freezer at -20°C until DNA extraction.
DNA was extracted through the CTAB method by Doyle and Doyle (1987), with modifications. The extraction buffer comprised 0.1 M Tris at pH 8.0; 1.7 M NaCl (from a 5 M stock); 20 mM EDTA (from a 0.5 M stock); 2.4% CTAB; 2.0% (w/v) PVP-40 and 0.4% (v/v) β-mercaptoethanol, pre-heated at 65°C in a water bath. The DNA stock was stored in a freezer at -20°C after extraction. Total DNA concentration was estimated in 1% agarose gel stained in Blue Juice. Next, DNA was diluted and stored in gelatin until its use.
Polymerase chain reaction (PCR) was conducted with 32 Embrapa ISSR primers. The PCR mix (Promega kit) was composed of 2.5 mM MgCl2 (25 mM stock); 200 µM dNTP (2.5 mM stock); 1.0X GoTaq buffer (5X stock); primer (0.4 µM); 2.0 U Taq (1 U/0.2 µL); DNA (15 ng) and Milli-Q water to a final volume of 15 µL per sample. Reactions were run in an Applied Biosystems automatic thermocycler with 96-well blocks. Samples were initially denatured at 94°C for 3 min, followed by 39 amplification cycles of 94°C for 40 s and 72°C for 1 min, and by a final extension at 72°C for 5 min; samples were then cooled to 4°C.
PCR products were subjected to electrophoresis in a horizontal tank, in 3% (w/v) agarose gel stained with ethidium bromide (1.5 ng/µL), at 100 V for 3.5 h; a 1 kb molecular weight marker was used (Ladder Invitrogen Norgen). After electrophoresis, the gels were removed from the tank and photographed under ultraviolet light to reveal the DNA fragments stained in Blue Juice. Primers presenting the largest number of fragments and good resolution were selected; results of primers that showed low-intensity or poorly defined bands were discarded. A binary matrix was built based on the presence (1) and absence (0) of bands at each locus.
Initially, the potential of the markers to estimate the genetic variability of genotypes was examined through band counting. The total number of bands (TNB), number of polymorphic bands (NPB) and percentage of polymorphic bands (PPB) were obtained. Three indices were used to determine the most informative ISSR primers: polymorphic information content (PIC), marker index (MI) and resolution power (RP).
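The band-counting step can be illustrated with a minimal Python sketch. It assumes the binary matrix of one primer is held as a list of rows (one per genotype) of 0/1 scores; the function name `band_stats` and the toy data are hypothetical and not taken from the study.

```python
def band_stats(matrix):
    """Return TNB, NPB and PPB for one primer.

    matrix: list of rows (one per genotype), each a list of 0/1 band scores.
    """
    n_genotypes = len(matrix)
    n_bands = len(matrix[0])
    polymorphic = 0
    for j in range(n_bands):
        present = sum(row[j] for row in matrix)
        # A band is polymorphic when it is present in some genotypes
        # and absent in others.
        if 0 < present < n_genotypes:
            polymorphic += 1
    return {
        "TNB": n_bands,
        "NPB": polymorphic,
        "PPB": 100.0 * polymorphic / n_bands,
    }


# Toy data (hypothetical): 3 genotypes scored for 4 bands.
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
stats = band_stats(toy)
print(stats)
```

In the toy matrix, the first and last bands are present in every genotype, so only the two middle bands count as polymorphic.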

Polymorphic information content (PIC)
PIC values for each ISSR locus were calculated through:
$$PIC_i = 2 f_i (1 - f_i)$$
where $PIC_i$ is the polymorphic information content of marker $i$, $f_i$ is the frequency of the amplified band (band presence) and $1 - f_i$ is the frequency of the null allele (band absence) (Roldan-Ruiz et al., 2000).
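For dominant markers, the PIC attributed to Roldan-Ruiz et al. (2000) is commonly written as PIC_i = 2 f_i (1 - f_i). The sketch below applies that expression to each column of a binary matrix. Averaging the per-locus values to obtain one PIC per primer is an assumption made here for illustration; the function names and toy data are hypothetical.

```python
def pic_per_locus(matrix):
    """PIC_i = 2 * f_i * (1 - f_i) for each band (column) of a 0/1 matrix."""
    n_genotypes = len(matrix)
    pics = []
    for j in range(len(matrix[0])):
        f = sum(row[j] for row in matrix) / n_genotypes  # band frequency
        pics.append(2 * f * (1 - f))
    return pics


def primer_pic(matrix):
    """Mean PIC over the loci of one primer (averaging is an assumption)."""
    pics = pic_per_locus(matrix)
    return sum(pics) / len(pics)


# Toy data (hypothetical): 4 genotypes scored for 3 bands.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
locus_values = pic_per_locus(toy)
print(locus_values, primer_pic(toy))
```

A band present in half of the genotypes gives the maximum value of 0.5, whereas a monomorphic band gives 0.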

Marker index (MI)
MI was calculated based on Varshney et al. (2009):
$$MI = PIC \times EMR$$
The effective multiplex ratio (EMR) was calculated as follows:
$$EMR = n \times \beta$$
where $n$ = total number of amplified fragments / total number of assessed fragments, and $\beta$ = number of polymorphic fragments / total number of amplified fragments.
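The marker index is usually obtained as MI = PIC × EMR, with EMR = n × β. A minimal sketch of this arithmetic follows; it takes n as a value supplied by the caller, since its exact definition is not fully recoverable from the text, and all names and numbers are hypothetical.

```python
def effective_multiplex_ratio(n, n_polymorphic, n_amplified):
    """EMR = n * beta, where beta is the fraction of polymorphic fragments."""
    beta = n_polymorphic / n_amplified
    return n * beta


def marker_index(pic, n, n_polymorphic, n_amplified):
    """MI = PIC * EMR."""
    return pic * effective_multiplex_ratio(n, n_polymorphic, n_amplified)


# Hypothetical primer: PIC = 0.40, n = 10, 9 polymorphic out of 10 fragments.
mi = marker_index(pic=0.40, n=10, n_polymorphic=9, n_amplified=10)
print(round(mi, 2))
```

Because MI scales with the number of fragments, it has no fixed upper bound and is best compared between primers of the same study.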

Resolution power (RP)
RP was calculated according to Prevost and Wilkinson (1999):
$$RP = \sum I_b$$
where $I_b$ represents the informativeness of each polymorphic band, expressed on a 0-1 scale through the following formula:
$$I_b = 1 - (2 \times |0.5 - p|)$$
where $p$ is the proportion of genotypes presenting the band.
To select the 10 best ISSR primers for genetic studies focused on cassava varieties, the 10 primers presenting the highest values for each of the calculated indices (PIC, MI and RP) were first identified. Primers were then selected by using the following criteria: (1) coincident primers among the best ones for the three indices (PIC, MI and RP); (2) coincident primers among the best ones for two indices (MI and RP); (3) primers presenting the highest RP values; (4) primers presenting the highest MI values.
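The resolution power of Prevost and Wilkinson (1999) is usually written as RP = Σ Ib, with Ib = 1 - (2 × |0.5 - p|). The sketch below computes it from a binary matrix; the function names and toy data are hypothetical.

```python
def band_informativeness(p):
    """Ib = 1 - 2 * |0.5 - p|, where p is the proportion of genotypes with the band."""
    return 1 - 2 * abs(0.5 - p)


def resolution_power(matrix):
    """RP = sum of Ib over all bands (columns) of a 0/1 matrix."""
    n_genotypes = len(matrix)
    total = 0.0
    for j in range(len(matrix[0])):
        p = sum(row[j] for row in matrix) / n_genotypes
        total += band_informativeness(p)
    return total


# Toy data (hypothetical): 4 genotypes scored for 3 bands.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
rp = resolution_power(toy)
print(rp)
```

Since RP is a sum over bands, primers that amplify more bands tend to reach higher values, while a monomorphic band contributes nothing.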
Genetic diversity parameters such as the number of observed alleles (Na), number of effective alleles (Ne), Nei's genetic diversity (H) and Shannon index (I) were calculated in the POPGENE software, version 1.32 (Yeh et al., 1997). An analysis of molecular variance (AMOVA) was performed in order to check the genetic relation between populations and between accessions within each population in the Manihot collection. Its significance test was conducted with 1000 permutations in the ARLEQUIN software, version 3.1 (Excoffier and Lischer, 2005).
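The text does not state how the diversity parameters are derived from dominant band data. One common convention, assumed here purely for illustration, treats each band as a two-allele locus in Hardy-Weinberg equilibrium, so that the null-allele frequency is the square root of the frequency of band absence. The function name below is hypothetical.

```python
import math


def locus_diversity(band_freq):
    """Per-locus Na, Ne, H and I for a dominant marker.

    band_freq: fraction of individuals showing the band.
    Assumes two alleles per locus and Hardy-Weinberg equilibrium.
    """
    q = math.sqrt(1.0 - band_freq)  # null (band-absent) allele frequency
    p = 1.0 - q                     # band-present allele frequency
    homozygosity = p * p + q * q
    na = 2 if 0.0 < band_freq < 1.0 else 1            # observed alleles
    ne = 1.0 / homozygosity                           # effective alleles
    h = 1.0 - homozygosity                            # Nei's gene diversity
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0)
    return na, ne, h, shannon


# A band seen in 75% of individuals implies q = 0.5 under these assumptions.
na, ne, h, shannon = locus_diversity(0.75)
print(na, ne, h, round(shannon, 4))
```

Population-level values would then be averages over loci, which is why Na falls between 1 and 2 when some loci are monomorphic.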
The Bayesian clustering analysis was conducted in the STRUCTURE software, v. 2.3.4 (Pritchard et al., 2000), in order to find the number of genetic groups. The tested K number ranged from 1 to 10. The ΔK method by Evanno et al. (2005) was used to find the best K, as implemented in Structure Harvester (Earl and vonHoldt, 2011).
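The ΔK statistic of Evanno et al. (2005) divides the absolute second-order change in the mean log-likelihood, |L(K+1) - 2L(K) + L(K-1)|, by the standard deviation of L(K) across replicate runs. The sketch below is a simplified illustration, not the Structure Harvester code; using the sample standard deviation is an assumption, and the log-likelihood values are invented.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from replicate log-likelihoods.

    lnp_by_k maps each tested K to a list of Ln P(D) values, one per run.
    Delta K is undefined for the smallest and largest K tested, and the
    function fails if all runs of a given K have identical likelihoods.
    """
    avg = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in avg or k + 1 not in avg:
            continue  # needs both neighbours
        second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        delta[k] = second_diff / stdev(lnp_by_k[k])
    return delta


# Invented values: two replicate runs for each K from 1 to 4.
runs = {
    1: [-1000, -1002],
    2: [-900, -904],
    3: [-890, -894],
    4: [-888, -892],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

With these invented numbers the likelihood improves sharply from K=1 to K=2 and then levels off, so ΔK peaks at K=2.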

RESULTS
Thirty-two primers were tested and 18 of them were selected for the genetic diversity study (Table 2), since they were the most informative ones (high polymorphism indices and resolution adequate for the analyses). The selected primers generated 116 bands in total, ranging from 4 to 10 bands per primer, with an average of 6.4 bands per primer (93.24%); this number indicated high genetic variability (Table 2). Primers 11 and 27 were the most informative ones: they recorded 100 and 90% polymorphism, respectively, and the largest number of bands (10). Primers 42 and 46 showed the lowest number of bands (four each) (Table 2).
The estimated values of genetic diversity indices in M. esculenta populations are shown in Table 3. The percentage of polymorphic loci (P) was high, reaching 48.15% in the Uíge population, 61.11% in Cuanza Norte and 92.59% in Malanje. The number of alleles (Na) in the assessed populations ranged from 1.48 to 1.92 and the effective number of alleles (Ne) ranged from 1.37 to 1.59. The Shannon index (I) ranged from 0.29 to 0.50 (Table 3). The distribution of genetic variability between and within populations was calculated based on Nei (1973) (Table 4). The genetic divergence (GST) was 0.1690, which showed that 16.90% of the genetic variability is observed between populations and 83.10% within populations. The total genetic diversity (HT) calculated through Nei (1973) reached the estimated mean 0.3113. The observed gene flow value (Nm) in this study reached 2.4585 (Table 4).
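The reported gene flow value is consistent with the reported genetic divergence. Assuming the widely used relation Nm = 0.5 (1 - GST) / GST (an assumption here, since the text does not state the estimator), the divergence of 0.1690 gives about 2.46 migrants per generation. The function name is hypothetical.

```python
def gene_flow_from_gst(gst):
    """Nm = 0.5 * (1 - GST) / GST (assumed estimator)."""
    return 0.5 * (1.0 - gst) / gst


gst = 0.1690                      # divergence reported in the text
within_share = 1.0 - gst          # share of diversity within populations
nm = gene_flow_from_gst(gst)
print(round(within_share, 4), round(nm, 4))
```

The small difference from the reported 2.4585 in the last decimal place is compatible with rounding of the published divergence value.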
The analysis of molecular variance (AMOVA) applied to the three populations confirmed that most of the genetic variation is within populations (89.59%), whereas 10.41% of the genetic variation was between populations (Table 5). There was significant genetic differentiation (P < 0.001) between populations. The inter-population genetic divergence value (FST) observed in the current study (Table 5) was 0.1041. FST values between 0.05 and 0.15 suggest moderate genetic differentiation between populations (Wright, 1978). The principal coordinates analysis (PCoA) was calculated based on coefficients of dissimilarity; its result is graphically presented in Figure 2. The two axes accounted for 51% of the total variation: the first axis (PCoA1) for 28.54% and the second one (PCoA2) for 22.46%.
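In a two-level analysis, the fixation index equals the among-population share of the total variance, so the reported percentages lead directly to the reported value. The sketch below shows that arithmetic together with the qualitative ranges commonly attributed to Wright (1978); only the 0.05-0.15 "moderate" range is stated in the text, the other labels are added here as an assumption, and the function names are hypothetical.

```python
def fst_from_variance_percentages(among_pct, within_pct):
    """FST as the among-population fraction of total variance."""
    return among_pct / (among_pct + within_pct)


def wright_category(fst):
    """Qualitative ranges commonly attributed to Wright (1978)."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "great"
    return "very great"


fst = fst_from_variance_percentages(10.41, 89.59)  # percentages from the text
print(round(fst, 4), wright_category(fst))
```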
Based on Figure 4, when K=1, genotypes from the Cuanza Norte, Malanje and Uíge regions were assigned to three major ancestral groups: green, blue and red. Materials from Ndalatando County belonged to the green group; the blue group was formed by genotypes from Kalandula and Quizenga counties and from Cambondo colony, whereas Quitoque and Kiongua colonies provided the genotypes in the red group. The green group was formed by eight genotypes, the blue group by 12 genotypes and the red group by five genotypes. When K=2, the population from the Malanje region basically remained in the blue group, which is composed of eight genotypes. Although the marker is dominant, there is allelic richness when K=3, and alleles are shared between populations. This group was composed of genotypes from the Malanje and Uíge regions and showed two ancestral groups: blue and red. Materials from Cambondo do kuige, Comuna do Lombe and Santa Maria colonies belonged to the blue group, whereas genotype Mpelo from Quitoque colony belonged to the red group. Cambondo do kuige colony and Lombe commune were each represented by two genotypes, Gueti (39) and Kapumba (40), and Muringa (36) and Kinzela (37), respectively, whereas Santa Maria and Quitoque colonies were each represented by one genotype, Ngana Yuculu (35) and Mpelo (38), respectively.

DISCUSSION
The descriptive analysis between M. esculenta populations was based on the amplification of ISSR markers. The selected primers generated 116 bands and this number
The polymorphic information content represents the likelihood of finding the marker in its two states: absence and presence (Roldan-Ruiz et al., 2000). According to Xie et al. (2010), the PIC value ranges from 0 to 0.25 in slightly informative markers and from 0.25 to 0.5 in moderately informative markers, and is above 0.5 in highly informative markers. This value helps ranking the primers based on polymorphism-detection efficiency, which worked as a parameter to select the primers (Costa et al., 2015). Thus, in the present study, the efficiency of the primers in indicating polymorphism between genotypes can be considered moderate.
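The PIC classes attributed to Xie et al. (2010) can be expressed as a small lookup. In the sketch below, the handling of values that fall exactly on a boundary is an assumption, the function name is hypothetical, and the example values are PIC means quoted from other studies in this discussion.

```python
def classify_pic(pic):
    """Classify a PIC value: below 0.25, 0.25 to 0.5, or above 0.5."""
    if pic < 0.25:
        return "slightly informative"
    if pic <= 0.5:
        return "moderately informative"
    return "highly informative"


# Mean PIC values quoted in the discussion from other studies.
for value in (0.21, 0.48, 0.571):
    print(value, classify_pic(value))
```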
Results similar to those recorded in the current study were observed by Gonçalves et al. (2017), who found a moderate PIC value (0.48) in sweet cassava populations by using ISSR markers. Kawuki et al. (2009) observed PIC values between 0.358 and 0.759, and mean value 0.571, by using ISSR markers in cassava germplasm from Africa, Asia and America. In a study about native Prunus armeniaca L. (Rosaceae) populations in Northeastern China, Li et al. (2013) found PIC values ranging from 0.15 to 0.27, and mean value 0.21, by using ISSR markers; the primers were considered slightly informative. Costa et al. (2015) assessed natural mangaba populations (Hancornia speciosa Gomes) by using ISSR markers and found PIC values ranging from 0.26 to 0.44; the primers were considered moderately informative.
The marker index (MI) is calculated by multiplying the polymorphic information content (PIC) by the effective multiplex ratio. Primers 26 and 27 recorded the highest MI values. Although primer 27 recorded the highest MI value, it was not among the primers selected on the basis of PIC values. According to Tatikonda et al. (2009), there is no ideal specific MI value; thus, results recorded for this index can be analyzed by comparing the values of each primer used in the study, and the best primers will always be the ones presenting the highest values.
Resolution power (RP) is a parameter that indicates the discriminating power of the marker (Tatikonda et al., 2009). As with MI, there is no maximum RP value, since it is a sum of values. Primers presenting the best RP values also showed the largest number of fragments. This happens because the formula used to calculate this index takes into account the sum of Ib values. Thus, primers capable of amplifying a large number of bands tend to present higher RP values than the ones amplifying a smaller number of bands. Primer 26 stood out for its high PIC, MI and RP values, which indicates that this marker is quite informative for cassava. The highest MI and RP values were observed in primers 26 and 27, which were the most informative ones. Thus, all accessions can be discriminated with a smaller number of primers by adopting the most informative ones: 26 and 27. Reducing the number of primers reduces not only the time required for the analysis, but also its cost (Varshney et al., 2007). According to Smith and Pham (1996), genetic diversity estimates based on dominant markers are sometimes lower than estimates made for co-dominant markers. However, Ge and Sun (2001) state that the general pattern is that dominant markers detect higher genetic diversity levels than co-dominant markers.
The number of observed alleles found in the present study ranged from 1.48 to 1.92. Agre et al. (2017) observed Na values from 0.23 to 1.0, and mean value 0.66, by using ISSR markers in cassava grown in the Republic of Benin. The Shannon index (I) was considered intermediate; according to Giustina et al. (2014), Shannon indices (I) vary from 0 to 1, wherein 0 means null gene diversity and 1 is the maximum gene diversity. Silva et al. (2016) assessed the genetic diversity estimated through microsatellite markers in commercial cupuaçu crops and observed lower Nei diversity (0.11) and Shannon index (I) (0.17) values in comparison to those recorded in the current study. Nei genetic diversity index (H) values ranged from 0.20 to 0.24. Pádua (2011) assessed an Eremanthus erythropappus population by using ISSR markers and observed genetic diversity indices ranging from 0.26 to 0.38.
Genetic variability results showed that most of the genetic variation was observed within populations (83.10%); this result agrees with Hamrick et al. (1991), who stated that there is more diversity within populations than between populations due to factors affecting the geographic distribution. Genetic diversity, or heterozygosity, is the most used measure to estimate genetic variation since it is less sensitive to variations in sample size. Therefore, it becomes the most important parameter when compared to other parameters such as the percentage of polymorphic loci and the mean number of alleles per locus, besides being easy to interpret genetically (Brown and Weir, 1983). The total genetic diversity (HT) found through the Nei index (1973) reached the estimated mean 0.3113, and this outcome points towards high heterozygosity in Manihot populations. These values were considered high and they indicated that the variability between and within populations would contribute 31.13 and 68.87%, respectively. This value was close to that found by Teixeira et al. (2012), who worked with populations of Campomanesia species by using ISSR markers. These authors observed mean value 0.365, which confirmed that the populations presented genetic variability.
According to Loveless and Hamrick (1984), if gene flow is restricted, populations would show high divergence. The value observed in the present study (Nm = 2.4585) showed limited gene flow among the three assessed populations. This value was not enough to counteract the effects of genetic drift: it should be higher than four migrants per generation (Nm > 4) for the homogenizing effect of gene flow to outweigh genetic drift (Slatkin and Barton, 1989; Hartl and Clark, 1997). Wright (1931) states that gene flow values lower than 1 indicate genetic isolation. According to Wright (1951), gene flow values higher than 1 are enough to stop random allele losses within populations (drift effects). Estimates based on data from dominant molecular markers in the three cassava populations, calculated through the analysis of molecular variance (AMOVA), showed that 89.59% of the genetic variation, on average, was within populations, whereas 10.41% was distributed between populations. These data are lower than the ones reported by Brandão et al. (2011), but they agree with the numbers recorded by Fernandes (2008), who assessed Caryocar brasiliense populations.
The GST value showed greater variation within populations than between populations, and this outcome corroborated the results of the analysis of molecular variance (AMOVA). Geographic distribution and evolutionary history play a relevant role in the distribution of genetic variation between and within populations (Hamrick et al., 1992). However, this variation may depend on the presence or absence of certain alleles in geographic regions. The expectation is that the wider the geographic distribution, the greater the diversity of the species (Bozza, 2009).
The principal coordinates analysis (PCoA) was calculated based on coefficients of dissimilarity and is graphically presented in Figure 2. The coordinates were calculated for the two first axes with negative eigenvalues. The two axes represented 51% of the total variation; the first axis (PCoA1) represented 28.54% and the second one (PCoA2), 22.46%. Individuals grouped by geographic location indicated that, although they originated from different locations, they belonged to the three populations, except for the first coordinate, which was represented by individuals belonging to the Malanje and Cuanza Norte populations, and for the second coordinate, with individuals belonging to the Malanje, Uíge and Cuanza Norte populations. A result similar to that recorded in the present study was found by Turyagyenda et al. (2012), who observed PCoA1 and PCoA2 values of 23.56 and 20.06%, respectively, and 43.62% of total variation, by using SSR markers in cassava grown in Uganda.
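The text does not name the dissimilarity coefficient that fed the PCoA. As one possibility, the sketch below computes the Jaccard dissimilarity between 0/1 band profiles, a coefficient often used with dominant markers because shared absences are ignored. The choice of coefficient, the function names and the toy profiles are all assumptions made for illustration.

```python
def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity between two 0/1 band profiles of equal length."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # convention chosen here: two empty profiles are identical
    return 1.0 - shared / either


def dissimilarity_matrix(profiles):
    """Symmetric matrix of pairwise Jaccard dissimilarities."""
    n = len(profiles)
    return [
        [jaccard_dissimilarity(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy profiles (hypothetical): 3 genotypes scored for 4 bands.
toy_profiles = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
dist = dissimilarity_matrix(toy_profiles)
for row in dist:
    print([round(d, 3) for d in row])
```

A matrix of this kind is the usual input to a PCoA, where the share of variation of each axis is its eigenvalue divided by the sum of the eigenvalues retained.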
The principal coordinates analysis depicted the relations between different geographic areas, besides highlighting that the largest number of individuals corresponded to the Malanje population. This analysis showed wide dispersion of individuals and evidenced a wide distribution of genetic variability. These results also showed that most of the assessed individuals shared an important part of their genetic information. Some individuals in the three populations lay far from each other, which indicates that they are the ones that contributed most to the genetic variability.
The genetic structure in M. esculenta was characterized through Bayesian analysis, which separates the total number of individuals into groups (clusters) by assigning them to a number K of populations, under the assumption that these populations are in Hardy-Weinberg equilibrium (Figure 3). The ΔK of each K value was calculated (Evanno et al., 2005), which allowed an easy identification of the K value most likely to represent the number of groups in the data matrix. According to Vigouroux et al. (2008), if a certain variety presents more than an arbitrary limit (80%) of its genome in a group, then it belongs to that group.
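The 80% membership rule attributed to Vigouroux et al. (2008) can be sketched as follows. The membership coefficients and accession labels are invented for illustration, and the function name is hypothetical; treating individuals below the threshold as admixed is a common reading, assumed here.

```python
def assign_cluster(q_values, threshold=0.80):
    """Return the index of the cluster holding more than `threshold` of the
    genome, or None when no cluster reaches it (treated here as admixed)."""
    best = max(range(len(q_values)), key=lambda k: q_values[k])
    if q_values[best] > threshold:
        return best
    return None


# Invented membership coefficients for K = 3 (each row sums to 1).
memberships = {
    "accession_A": [0.92, 0.05, 0.03],
    "accession_B": [0.50, 0.30, 0.20],
    "accession_C": [0.10, 0.05, 0.85],
}
assignments = {name: assign_cluster(q) for name, q in memberships.items()}
print(assignments)
```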
Knowledge about the genetic structure of populations is essential for the efficient use of genetic resources, as well as for better understanding their evolutionary history (Vencovsky et al., 2007; Clement et al., 2010). Based on Figure 4, when K=1, genotypes from Cuanza Norte, Malanje and Uíge were assigned to three major ancestral groups: green, blue and red. When K=2, the population from Malanje basically remained in the blue group, which was formed by eight genotypes. There was allelic richness when K=3, although the marker was dominant, and alleles were shared between populations. This group was composed of genotypes from Malanje and Uíge and showed two ancestral groups: blue and red.
These results suggested genetically defined groups that corresponded to the pre-defined regional groups. They indicate that farmers select cassava varieties for different purposes, and they also highlight material exchange or gene flow between locations, mainly in Malanje, which was the most heterogeneous community within the group. Muhlen et al. (2012) used microsatellite markers and showed groupings that separated sweet from bitter cassava. Their results were similar to those recorded by Emperaire et al. (2003) and Elias et al. (2004), who detected geographic structuring in cassava varieties. This analysis allowed predicting the structure of the populations and the ancestry of the individuals; thus, it can contribute to the development of new varieties. The greatest concentration of genetic variability in Malanje and Cuanza Norte was likely related to the exchange and introduction of material in the crops. The observed variability can also be explained by the adaptation of the species to different environments. There was low gene flow (Table 4) and a wide distribution of genetic variability within populations (Table 5), which confirmed the genetic structuring between the assessed populations. According to Bozza (2009), the wider the geographic distribution, the greater the diversity of the species.

CONCLUSION
Molecular characterization evidenced genetic diversity within each assessed population. The structure division into three main groups showed sharing of genetic information and the consequent insertion of individuals belonging to the Malanje population in the Cuanza Norte group. The PCoA analyses pointed to an evolutionary connection between these areas. The highest concentration of genetic variability in the Malanje and Cuanza Norte regions was possibly related to the exchange and introduction of material in the crops.

Figure 1.Geographic map showing the collection location of the 40 cassava accessions.

Figure 3. ΔK value of the possible grouping of 40 Angolan cassava varieties deriving from 10 STRUCTURE analysis simulations.

Figure 4. Representation of the number of K groups for 40 cassava individuals in the three assessed populations based on ISSR molecular data calculated in the STRUCTURE software (three groups, K=3).

Table 1. Number of accessions (N), province (State), county and their respective geographic coordinates.

Table 2. Description of 32 ISSR primers used in the present study and their respective parameters in Angolan cassava.

Table 3. Summary of genetic diversity parameters found by evaluating three Manihot populations in Angola, based on 18 ISSR primers.
Na, Number of observed alleles; Ne, number of effective alleles; H, Nei genetic diversity; I, Shannon diversity index; ± SD, standard deviation.

Table 5. Analysis of molecular variance (AMOVA) applied to the Manihot esculenta populations assessed based on 18 ISSR markers.
DF, Degrees of freedom; SS, sum of squares; CV, variance components; TV, total variance; P, probability of obtaining by chance a variance component higher than the observed value. The probabilities were calculated through 1023 random permutations. FST = 0.10409.

Figure 2. Scatter plot of the three assessed cassava populations.