Diversity and genetic structure in natural populations of Croton linearifolius (Euphorbiaceae) based on molecular markers

Croton linearifolius Müll. Arg., a species endemic to Brazil, has proven insecticidal activity. Despite its importance as a natural resource, studies of this species are incipient, as are the strategies applied to its management and conservation. The genetic diversity and structure of 61 individuals of C. linearifolius collected in the National Forest Contendas do Sincorá (NFCS) were estimated. Estimates were based on the amplification profiles of nine pairwise combinations of resistance gene analog (RGA) primers and eight inter simple sequence repeat (ISSR) primers. A total of 134 markers (81.3% polymorphic) were generated. Bayesian analysis indicated a structuring into two groups as the most likely. Analysis of molecular variance (AMOVA) showed that 64% (p rand <0.01) of the variation occurs within the regions, and a significant amount (36%) (p rand <0.01) was attributed to variation between regions, indicating genetic structure between them. The AMOVA results were corroborated by the Principal Coordinate Analysis (PCoA), indicating an association between the distribution of variability and the geographical distribution of the collection regions. The probable pollination and dispersal mechanisms would explain, at least in part, the genetic structuring observed for C. linearifolius.


INTRODUCTION
The genus Croton L. (Euphorbiaceae), with about 1,200 species, is the second largest genus in the Euphorbiaceae family, and most of its species are distributed in the tropical regions of the world (Govaerts et al., 2000). Brazil is considered one of the main centers of diversity of the genus Croton, with at least 350 species distributed in different regions of the country (Berry et al., 2005). The highest concentration of species of this genus occurs in the Southeast and Northeast regions, with approximately 160 and 120 species, respectively (Cordeiro et al., 2015).
This study highlights the species Croton linearifolius, popularly known as "velame pimenta", with proven insecticidal potential against Aedes aegypti and Cochliomyia macellaria (Silva et al., 2010), which corroborates its traditional use in the semi-arid region of Brazil (Silva et al., 2010). This species also has a diverse chemical composition, being rich in alkaloids, steroids and flavonoids, among other potentially promising compounds (Silva et al., 2010).
C. linearifolius is endemic to Brazil, with occurrence records in the states of Minas Gerais, Piauí and Tocantins, and greater representation in Bahia (Cordeiro et al., 2015) (Figure 1). It has a shrubby habit and can reach up to 2 m in height. The plants of this species are monoecious, with lanceolate, discolorous leaves and bisexual inflorescences. The fruits are dehiscent, and the seeds are brown and measure about 4.0 mm long and 2.5 mm wide (Lima and Pirani, 2008).
The exploitation of native species by the population is generally based on exploratory extraction, which can lead to a reduction of biodiversity (Mondini et al., 2009). Among the different components of biodiversity, genetic diversity is the basis for the evolutionary potential of a species and determines its chances of survival, reproduction and adaptation to possible environmental changes (Fleishman et al., 2001). Therefore, knowledge of the diversity and genetic structure of species with potential as a natural resource is a prerequisite for developing strategies for sustainable management and conservation of genetic resources. In this context, molecular genetic markers are an important tool for population studies and have been used in the study of a wide range of organisms (Ibrahim et al., 2010).
Resistance gene analog (RGA) and inter simple sequence repeat (ISSR) markers do not require prior information on the genome of the species under evaluation and enable rapid, low-cost characterization. This makes them especially useful for species that have not yet been studied and whose commercial potential remains little explored.
Despite the ecological importance and economic potential of Croton species, population genetic studies are limited to a few species (approximately 1%), which hampers the development of management strategies that may help in the conservation and sustainable use of this biodiversity. Specifically for C. linearifolius, molecular genetic studies are limited to the works of Scaldaferri et al. (2013, 2014) and Silva et al. (2018).
In this study, the genetic diversity and structure of natural populations of C. linearifolius present in the National Forest Contendas do Sincorá (NFCS) were characterized and discussed, based on the application of RGA and ISSR molecular markers.

Characterization of the biological material
The study was conducted using DNA samples from 61 individuals of C. linearifolius collected in three regions, treated here as populations, within the National Forest Contendas do Sincorá (NFCS), in the municipality of Contendas do Sincorá, Bahia, Brazil (Figure 2 and Supplemental Material 1). The voucher specimens (exsiccatae) were deposited in the herbarium of the Universidade Estadual de Feira de Santana, under registration HUEFS 146620. The DNA samples had previously been extracted from fresh leaves using the cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle, 1990) with modifications previously tested for Croton by Scaldaferri et al. (2013). The samples were deposited in the genomic DNA bank of the Laboratory of Applied Molecular Genetics (LAMG), at the Universidade Estadual do Sudoeste da Bahia (UESB), Itapetinga, Bahia, Brazil.
The integrity of the DNA samples was evaluated by electrophoresis on 1% (m/v) agarose gel (2 h at 90 V). Samples were loaded with loading dyes (bromophenol blue and xylene cyanol) and GelRed (Invitrogen Co., Carlsbad, CA, USA), used according to the manufacturer's specifications, and were visualized under ultraviolet (UV) illumination in a Kodak photodocumentation system. To estimate the DNA concentration (ng/μL), the lambda molecular weight marker (Invitrogen Lambda DNA) was adopted as the standard.
The following amplification programs were adopted. For RGA primers: 5 min at 95°C; followed by 34 cycles (30 s at 95°C, 1 min at 37°C, 1 min and 20 s at 72°C); and 10 min at 72°C. For ISSR primers: 5 min at 95°C; followed by 34 cycles (50 s at 94°C, 50 s at 48°C, 1 min at 72°C); and 5 min at 72°C. All amplifications were performed at the LAMG of the UESB.
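The two cycling programs can be written down as plain data, which makes the step times easy to check. The sketch below is only an illustration: the dictionary layout, the function name `hold_time_seconds` and the step labels (denaturation, annealing, extension) are assumptions introduced here, and the computed value is the sum of programmed hold times only, excluding temperature ramping.

```python
# Cycling programs transcribed from the text; durations are in seconds.
# Step labels and the data layout are illustrative assumptions.
PROGRAMS = {
    "RGA": {
        "initial": 5 * 60,                      # 5 min at 95 C
        "cycles": 34,
        "cycle_steps": [
            ("denaturation, 95 C", 30),
            ("annealing, 37 C", 60),
            ("extension, 72 C", 80),            # 1 min 20 s
        ],
        "final": 10 * 60,                       # 10 min at 72 C
    },
    "ISSR": {
        "initial": 5 * 60,                      # 5 min at 95 C
        "cycles": 34,
        "cycle_steps": [
            ("denaturation, 94 C", 50),
            ("annealing, 48 C", 50),
            ("extension, 72 C", 60),
        ],
        "final": 5 * 60,                        # 5 min at 72 C
    },
}


def hold_time_seconds(program):
    """Sum the programmed hold times of one cycling program."""
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return program["initial"] + program["cycles"] * per_cycle + program["final"]


for name, program in PROGRAMS.items():
    minutes = hold_time_seconds(program) / 60
    print(f"{name}: {minutes:.1f} min of programmed hold time")
```

Under these figures the RGA program holds for longer per cycle than the ISSR program, mainly because of its longer extension step.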
Aliquots (5 μL) of the amplification products were separated by electrophoresis on 2% (m/v) agarose gel in 1x TBE buffer for approximately 2 h at 110 V. The result of the electrophoretic run was visualized with a Kodak photodocumentation system under ultraviolet light.

Combination   Primer   Sequence
...           ...      AGT TTA TAA TTY EAT TGC T
14            RGA2R    CAC ACG GTT TAA AAT TCT CA
              RGA1F    AGT TTA TAA TTY EAT TGC T
15            RGA7R    CCG AAG CAT AAG TTG GTG
              RGA1F    AGT TTA TAA TTY EAT TGC T
16            RGA4R    TAC ATC ATG TGT TAC CTC T
              RGA4F    TGT TAC TGC TTT GTT TGG TA
17            RGA5R    TCA ATC ATT TCT TTG CAC AA
              RGA5F    TGC TAG AAA AGT CTA TGA AG

Data analysis and genetic diversity estimation
The band patterns obtained, with the gels scored by two evaluators, were used to construct a binary data table in which zero (0) was assigned to the absence of a band and one (1) to its presence. Based on this table, the percentage of polymorphism was calculated. Only markers with less than 20% missing data (that is, data for which the presence or absence of a band could not be determined with certainty) were retained.
Analyses of population structure were performed with the Bayesian method using the STRUCTURE software, version 2.3.4 (Pritchard et al., 2000). Considering that the present study was conducted using natural populations, we used an admixture model with independent allele frequencies in each population. The burn-in period and the number of replications were set to 100,000 and 1,000,000, respectively, for each run. The number of groups (K) was systematically varied from 1 to 10, and 20 simulations were performed for each K. The ΔK ad hoc method described by Evanno et al. (2005) and implemented in the online tool Structure Harvester (Earl and vonHoldt, 2012) was used to estimate the most likely K in each set. After estimating the most likely K, the greedy algorithm implemented in CLUMPP v.1.1.1 (Jakobsson and Rosenberg, 2007) was used with a random input order and 1000 permutations to align the runs. Based on the posterior probability of membership (q) of a given accession in a given group, relative to the total number of groups (K), individuals with q > 0.60 were classified as members of that cluster, whereas accessions whose membership values were q ≤ 0.60 for every cluster were classified as admixed.
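To make the last two steps concrete, the following is a minimal sketch of a ΔK calculation and of the q > 0.60 assignment rule. It is not the code used by Structure Harvester or CLUMPP. It assumes ΔK is computed as the absolute second-order difference of the mean log-likelihoods divided by the standard deviation of L(K) across replicate runs, which is one common reading of Evanno et al. (2005). The function names and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [ln P(D) of each replicate run]}.

    Assumed form: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    Delta K is undefined for the smallest and largest K tested.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result


def classify(q_values, threshold=0.60):
    """Assign an individual to the cluster with q > threshold, else 'admixed'."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    if q_values[best] > threshold:
        return f"cluster {best + 1}"
    return "admixed"


# Hypothetical log-likelihoods for K = 1..4, three replicate runs each.
toy_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4200.0, -4201.0, -4199.0],
    3: [-4150.0, -4160.0, -4140.0],
    4: [-4140.0, -4120.0, -4160.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(dk, "most likely K:", best_k)
print(classify([0.78, 0.22]), classify([0.55, 0.45]))
```

In this toy data the likelihood rises sharply from K = 1 to K = 2 and then levels off, so ΔK peaks at K = 2. An individual with q = 0.55 in its best cluster falls at or below the threshold and is reported as admixed.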
Genetic similarity was estimated with the Jaccard coefficient, using the GENES program (Cruz, 2006). Genotypes were grouped by the Neighbor Joining method in the DARwin program (Perrier and Jacquemoud-Collet, 2009). Principal coordinate analysis (PCoA) and analysis of molecular variance (AMOVA), with 999 permutations, were performed using the GenAlEx v.6.5 program (Peakall and Smouse, 2012) in order to assess the genetic variation between and within the collection regions.
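The scoring steps used in this section (missing-data filter, percentage of polymorphism and Jaccard similarity) can be sketched on a small binary table. This is an illustration only, not the GENES implementation. The function names, the use of `None` for an undetermined band, the handling of missing values in pairwise comparisons and the toy matrix are all assumptions.

```python
def keep_markers(matrix, max_missing=0.20):
    """Keep marker columns whose fraction of missing scores is below max_missing.

    matrix: list of rows (individuals); each score is 1, 0 or None (missing).
    """
    n_individuals = len(matrix)
    kept = [
        j for j in range(len(matrix[0]))
        if sum(row[j] is None for row in matrix) / n_individuals < max_missing
    ]
    return [[row[j] for j in kept] for row in matrix]


def percent_polymorphic(matrix):
    """Percentage of markers showing both presence and absence among individuals."""
    columns = list(zip(*matrix))
    polymorphic = sum(
        1 for col in columns if len({v for v in col if v is not None}) > 1
    )
    return 100.0 * polymorphic / len(columns)


def jaccard(ind_a, ind_b):
    """Jaccard similarity: shared bands / bands present in at least one individual.

    Shared absences are ignored; markers missing in either individual are skipped
    (an assumption). Returns 0.0 when neither individual has any band.
    """
    pairs = [(x, y) for x, y in zip(ind_a, ind_b) if x is not None and y is not None]
    shared = sum(1 for x, y in pairs if x == 1 and y == 1)
    either = sum(1 for x, y in pairs if x == 1 or y == 1)
    return shared / either if either else 0.0


# Hypothetical scores: 5 individuals x 4 markers.
toy = [
    [1, 0, 1, None],
    [1, 1, 1, None],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
]
filtered = keep_markers(toy)            # the fourth marker has 40% missing data
print(len(filtered[0]), "markers kept")
print(f"{percent_polymorphic(filtered):.1f}% polymorphic")
print(f"Jaccard(ind1, ind2) = {jaccard(filtered[0], filtered[1]):.3f}")
```

In the toy table the first marker is monomorphic, so two of the three retained markers are polymorphic.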

RESULTS AND DISCUSSION
A total of 134 markers were generated after the amplification reactions in 61 individuals of C. linearifolius.
After screening the data, nine RGA primer combinations and eight ISSR primers were retained in the analyses. The selected RGA primer combinations produced a total of 69 markers (82.6% polymorphic), with an average of 7.6 markers per primer pair. Combinations 18 (RGA6R + RGA6F) and 19 (RGA1R + RGA4F) generated the lowest (three) and the highest (13) number of markers, respectively (Table 3). The amplification reactions with the ISSR primers produced a total of 65 markers (80% polymorphic), with an average of 8.1 markers per primer. The primers DiCA3'RG and TriAAC3'RC produced the lowest (6) and highest (10) number of markers, respectively (Table 4).
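The marker totals reported here can be cross-checked with a few lines of arithmetic. The sketch below assumes that the numbers of polymorphic markers, which are not stated in the text, can be recovered by rounding the reported percentages. Variable names are illustrative.

```python
# Counts and percentages as reported in the text.
rga_total, issr_total = 69, 65
rga_primer_pairs, issr_primers = 9, 8

# Assumption: polymorphic counts recovered by rounding the reported percentages.
rga_polymorphic = round(rga_total * 0.826)
issr_polymorphic = round(issr_total * 0.80)

total_markers = rga_total + issr_total
overall_percent = 100 * (rga_polymorphic + issr_polymorphic) / total_markers

print("total markers:", total_markers)
print("overall polymorphism (%):", round(overall_percent, 1))
print("mean markers per RGA pair:", round(rga_total / rga_primer_pairs, 2))
print("mean markers per ISSR primer:", round(issr_total / issr_primers, 2))
```

Under this assumption the two marker systems together give the 134 markers and the overall polymorphism of about 81.3% stated in the abstract.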
Although there are no studies with RGA markers for the genus Croton, levels of polymorphism similar to those observed with these markers in C. linearifolius have been reported for other plant species, such as sugarcane (86.5%) (Jayashree et al., 2010) and rice (96%) (Ren et al., 2013). These results reinforce the potential of RGA markers for assessing genetic polymorphism. The polymorphism obtained with ISSR primers in C. linearifolius was similar to that reported for other species of the genus Croton, such as Croton tetradenius (94.8%) (Almeida-Pereira et al., 2017) and Croton heliotropiifolius (94%) (Rocha et al., 2016).
Bayesian analysis (based on Delta K values) indicated a main structuring into two gene pools (Delta K score = 1400) and a possible sub-structuring into three or four gene pools (Delta K ~400 and Delta K ~200, respectively) as the most probable compositions for the 61 individuals of C. linearifolius (Figure 3). The histogram showing the structuring into two gene pools (Figure 4A) reveals a predominance of the gene pool represented in blue in regions 1 (q = 0.78) and 2 (points 2, 3, 4 and 5) (mean of q = 0.73), whereas in region 3 (points 6, 7, 8, 9 and 10) the gene pool represented in green predominates (mean of q = 0.89). Under a sub-structuring into three gene pools (Figure 4B), there are no significant changes in the composition of regions 1 (q = 0.82) and 3 (mean of q = 0.89), but the composition of region 2 changes: a distinct gene pool, represented in red, becomes predominant there (mean of q = 0.62).
The structure observed with Bayesian analysis was corroborated by the results of the analysis of molecular variance (AMOVA), which showed that 64% (p rand <0.01) of the variation occurs within the regions, and a significant amount (36%) (p rand <0.01) was attributed to differences between regions, which indicates genetic structuring between them. The pairwise AMOVA matrix among the regions (Table 5) indicates a greater genetic distance between regions 1 and 3 (45.8%) and a lower one between regions 2 and 3 (29.7%). A similar result was observed for the species Croton antisyphiliticus, which according to Oliveira et al., [...]
Figure 4. Principal structure in two gene pools (Delta K score = 1400) (A) and sub-structuring in three gene pools (Delta K score ~400) (B). The colors used in the histograms represent the most probable ancestry of the group from which the individuals were derived.
The genetic structure among plant populations is influenced, among other factors, by the effects caused by the dispersion of pollen and seeds, which are directly related to the connectivity among the populations, determining the gene flow rates (Zanella et al., 2012). Pollinators and seed dispersers that have effective dispersion over long distances decrease the likelihood of differentiation between populations, while dispersion at restricted distances has the opposite effect, promoting population genetic structuring (Loveless and Hamrick, 1984).
Although there are no records in the literature of studies on the reproductive biology of C. linearifolius, studies carried out with species of the same genus show a predominance of pollination by wind and/or by different insects [Croton suberosus (Domínguez and Bullock, 1989), Croton floribundus and Croton priscus (Passos, 1995), Croton sarcopetalus (Freitas et al., 2001), and Croton urucurana (Pires et al., 2004)]. Considering that insect-pollinated species tend to have lower gene flow rates and consequently greater genetic differentiation, since the distances traveled are relatively small (Loveless and Hamrick, 1984), genetic differentiation is expected to occur between populations of species of the genus Croton.
Primary autochoric dispersal through fruit dehiscence, followed by secondary zoochoric dispersal through associations with several ant species, is a common strategy among species of the genus Croton. The elaiosomes attract the ants that act as secondary dispersers, thus reducing predation and competition between the seedlings (Webster, 1994; Passos and Ferreira, 1996; Leal et al., 2007; Lobo et al., 2011). Autochoric species tend to exhibit spatially limited seed dispersal, with maximum distances ranging from 3.4 (Passos and Ferreira, 1996) to 8.0 m (Narbona, 2005). In addition, ants carry the seeds only up to 2.5 m away (Passos and Ferreira, 1996), which can also result in greater genetic differentiation between populations. Therefore, the pollination and dispersal strategies observed for the genus, and probable for C. linearifolius, may be related to the low gene flow inferred between regions 1 and 3 from the STRUCTURE and AMOVA results, as well as to the weaker structure between each of these regions and region 2, which lies between them. A similar structure, with three gene pools and greater differentiation between the extreme collection points, was observed by Rocha et al. (2016) for C. heliotropiifolius from estimates made with ISSR and RAPD markers.
In addition to the factors mentioned above, C. linearifolius is a monoecious species, characterized by inflorescences with staminate (male) flowers in the terminal part and pistillate (female) flowers in the lower part (Lima and Pirani, 2008). This arrangement of male and female flowers on the same inflorescence may facilitate self-fertilization, which has already been demonstrated in C. floribundus and C. priscus (Passos, 1995), C. suberosus (Domínguez and Bullock, 1989) and C. sarcopetalus (Freitas et al., 2001), species whose flowers are morphologically similar to those of C. linearifolius. Self-fertilization, in turn, contributes to increased inbreeding.
The results of the PCoA (Figure 5) corroborate the hypothesis of sub-structuring into three gene pools obtained by Bayesian analysis (Figure 4B), corresponding to the geographical distribution of the three collection regions. Figure 5B shows the groupings of individuals delimited by rectangles whose colors correspond to those used to represent the gene pools in the histogram. Several individuals collected in region 2 presented high q values (q > 0.60) for the gene pool characteristic of this region, represented in red in the histogram (Figure 4B and Supplemental Material 1). These results agree with the collection regions and are consistent with the dispersion of individuals from that region on the PCoA plot (Figure 5B).
The PCoA plot (Figure 5B) also shows individuals collected in region 2 that group with the gene pool characteristic of the individuals of region 3 (Supplemental Material 1). This was corroborated by the pairwise AMOVA matrix among regions (Table 5), which indicated genetic structuring between individuals of regions 2 and 3 (approximately 30%). The observed result may reflect the connectivity between the regions, with region 2 acting as an intermediate because of its geographical location between regions 1 and 3.
Despite the agreement observed among the Bayesian analysis, PCoA and AMOVA results, there was no correspondence with the projection of the distance data estimated by the Jaccard coefficient (by either the Neighbor Joining or the UPGMA grouping method) (data not presented). Considering that the choice of coefficients and clustering methods can directly influence the results of the projections (both in scatter plots and in dendrograms) (Scaldaferri et al., 2014; Cerqueira-Silva et al., 2009), this lack of correspondence probably reflects the influence of the statistical procedures intrinsic to the distance estimation and/or the clustering method.

CONCLUSION
The RGA and ISSR molecular markers were efficient in detecting polymorphism in C. linearifolius. The populations of C. linearifolius are structured in at least two gene pools and sub-structured in three and four gene pools, with a correspondence between the genetic structuring and the geographical distribution of the populations. The probable mechanisms of insect pollination and primary autochoric dispersal with secondary dispersal by ants would explain the genetic structuring observed for C. linearifolius, since these strategies would not be sufficient to promote gene flow among the collection regions. The results obtained contribute to the molecular genetic knowledge of C. linearifolius and may support strategies for the management and conservation of the species.