Molecular genetic diversity in a core of cocoa (Theobroma cacao L.) clones with potential for selection of disease resistance, plant height and fruit production

This study aimed to assess the genetic variability in a group of 11 clones of Theobroma cacao L. from different geographical regions, based on microsatellite markers, in order to characterize germplasm for breeding. The amplification products obtained from these materials with 15 simple sequence repeat (SSR) markers were separated on an ABI 377 sequencer. The coded genotype data were analyzed by means of Rogers' genetic distances, which were used in the principal coordinates analysis. The high heterozygosity observed in this group of clones (Ho = 0.7276) and the genetic distances between pairs of clones (average 0.75) show that there is high diversity among these clones. In cluster analysis, the cluster of Trinitario clones and hybrids was separated from the others according to their genealogy, while Forasteros were classified into different groups. The variability among these clones makes them important materials for parent selection in order to obtain hybrid progenies.


INTRODUCTION
Theobroma cacao L. is relatively new in domestication (Dias, 2001) and its morphological features vary. The genus Theobroma is typical of neotropical regions, with a natural distribution that covers the tropical rainforest in the western hemisphere, between latitudes 18° N and 15° S, spreading from southern Mexico to the Amazon Forest (Cuatrecasas, 1964). However, this distribution may have been greatly influenced by pre-Columbian civilizations, which were responsible for the domestication of cacao and its distribution throughout Latin America over 2000 years of cultivation. According to Venturieri (1993), there are 22 species of the genus Theobroma, which are restricted to tropical America. Nine of them are found in the Brazilian Amazon and are fit for consumption, and only five are suitable for chocolate production. Among them, the species T. cacao and Theobroma grandiflorum S. stand out in the chocolate industry. The probable center of genetic diversity of cocoa, according to Cheesman (1944), covers the region of Napo, Putumayo and Caqueta, in the upper Amazon.
*Corresponding author. E-mail: ronanxc@uesc.br. Tel: +55-73-3680-5443. Fax: +55-73-3680-5226.
Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
According to the characteristics of fruits and seeds and the geographic distribution, cocoa can be classified into three major racial groups: (i) the Forasteros, with flat, purple seeds, high hardiness and high yield potential (the Forasteros of the upper Amazon are vigorous, early maturing and resistant to certain pathogens, while those found in the lower Amazon, called Amelonados, are known for the uniformity of their fruits); (ii) the Criollo, which were originally cultivated by indigenous peoples of Mesoamerica (Mexico, Guatemala, Belize, Honduras and El Salvador) and have large seeds with rounded cotyledons that are white or light violet when wet; and (iii) the Trinitario, natural hybrids between Forasteros and Criollo; although they are included in the group of Forasteros, they form a group that shares intermediate characteristics (Dias, 2001; Bartley, 2005; Almeida and Valle, 2007).
Several studies on the genetic diversity of cacao have sought to characterize and define genetic groups (Santos et al., 2005; Sounigo et al., 2005; Faleiro et al., 2004a, b; Dias, 2001; Risterucci et al., 2000; Lanaud et al., 1999; Lerceteau, 1997a, b; Russel et al., 1993). Some studies assumed that it is possible to differentiate the Criollo from the Forasteros by isoenzymes (Ronning and Schnell, 1994) or by means of DNA markers (Laurent et al., 1993a, b). Attempts to assign cocoa genotypes to the three racial groups failed when morphological descriptors (Engels, 1983) and isoenzymes (Lanaud et al., 1987) were used, which prevented the separation into races as initially proposed. The use of molecular markers made it possible to show that the upper Amazon Forasteros, lower Amazon Forasteros, Criollo and Trinitario differ genetically. Studies of genetic diversity in T. cacao have been used mainly for the demarcation of geographical regions (Figueira et al., 1994), the discrimination of subgroups, the determination of the ancestry of Criollo and modern Criollo within the Criollo group (Motamayor et al., 2002) and the delimitation of Trinitario, Criollo and lower Amazon Forastero material (Motamayor et al., 2003).
The international germplasm collections are being characterized for their genetic diversity based on microsatellite markers. When SSR markers were used to analyze hundreds of genotypes from commercial plantations and compare them with 246 genotypes from five cocoa collections (germplasm banks and groups of parents of African cocoa clones), great genetic differentiation among populations was found, except between the West African genebank and the materials cultivated in countries of that region (Aikpokpodion et al., 2009). Among 612 accessions from six groups of old cocoa genotypes, only 316 were kept for diversity analysis, since these markers identified the remaining accessions as duplicates or misclassified. This diversity analysis revealed a strong structure of genetic diversity, consistent with the hydrographic division of the main rivers of the Peruvian Amazon (Zhang et al., 2009). Microsatellites have therefore been instrumental in the management of germplasm banks and in understanding the structure of cocoa populations.
Plantings in large commercial cocoa-growing regions have been characterized for their genetic diversity and for the possible involvement of different genetic groups in their composition. In Madagascar, SSR markers were used in the diversity analysis of 27 cocoa clones and nine Trinitario and Criollo genotypes. This investigation revealed the distribution of Trinitario along a continuum between the Criollo and the Forasteros of the lower Amazon, except for clones with high introgression of Amazonian alleles. This suggests that the likely parents of the Trinitario include Amazonian individuals whose genetic diversity is low (Motamayor et al., 2002). Additionally, the hybrids with at least one upper Amazon parent showed marked diversity. The low genetic diversity found in the relationship between Trinitario and lower Amazon Amelonados results from the few lower Amazon genotypes involved in the genealogy of the Trinitario (Motamayor et al., 2003). Cocoa clones resistant to witches' broom grown in southern Bahia, Brazil, are closely related genetically because of the predominance of crosses of Scavina 6 with about a dozen other clones (Faleiro et al., 2001). In the analysis of 30 clones grown in this region and four international clones of different geographic origins, the genetic distances computed from microsatellite markers ranged from 0.13 to 0.71, revealing high genetic diversity among these materials (Faleiro et al., 2004a). In the analysis of 36 clones currently grown in southern Bahia, clones with potential tolerance to drought stress were found to have genetic diversity comparable to that of international groups of clones such as TSH 1188 and CCN 51 (Bertolde et al., 2010). An examination of 332 genotypes from farms in West Africa based on 12 SSR loci revealed genetic variability and the presence of alleles associated with Amelonado materials (potentially associated with high organoleptic qualities of cocoa).
However, these materials have rarely been used in crop production in West Africa, highlighting the need to include local selections in germplasm banks for conservation purposes (Aikpokpodion et al., 2009). Large-scale studies of the genetic diversity of cacao have indicated more comprehensive ways to define cacao genetic groups and to eliminate misidentified accessions from germplasm banks (Motamayor et al., 2008). In several other regions where cocoa is grown, there are also studies on the genetic diversity of local materials. These studies make it possible to better understand the genetic basis and potential use of these materials in breeding (Lima et al., 2013; Santos et al., 2015; Thondaiman et al., 2013). All these studies demonstrate that microsatellites are useful to characterize the genetic materials included in cocoa breeding programs around the world.
Sources for the clones listed in Table 1: ICGD, 2007; Dias, 2001; Pires, 2003. 2 - Dias, 2001; Crouzillat et al., 2000; ICGD, 2007; Pires, 2003. 3 - Bartley, 1994; Dias, 2001; Sounigo et al., 2005; ICGD, 2007; Pires, 2003. 4 - Sounigo et al., 2005; Risterucci et al., 2000; ICGD, 2007; Dias, 2001; Pires, 2003. 5 - Sounigo et al., 2005; Risterucci et al., 2000; ICGD, 2007; Dias, 2001; Pires, 2003. 6 - Neto et al., 2005; ICGD, 2007; Dias, 2001. 7 - Dias, 2001; Sounigo et al., 2005; Bartley, 1994; ICGD, 2007. 8 - Bartley, 1994; Dias, 2001; ICGD, 2007. 9 - Faleiro et al., 2004a; Bartley, 1994; ICGD, 2007; Dias, 2001. 10 - Lerceteau, 1997a; ICGD, 2007; Dias, 2001. 11 - ICGD, 2007; Dias, 2001; Pires, 2003.
In the cocoa region of Bahia, Brazil, several local institutions carry out research on the cocoa tree and work with the groups of clones that best match the focus of their breeding programs. Germplasm characterization involves local and international clones for use in these programs. In this context, this work was carried out to assess the genetic variability and the clusters obtained with molecular data from 11 clones from different geographical regions, in order to support cocoa breeding programs.

MATERIALS AND METHODS
The 11 cocoa clones used in this study come from different geographical regions and are part of a Brazilian breeding program (Table 1). These clones were selected a priori by the MCCS based on agronomic traits, in order to hybridize them and subsequently identify progenies with combining ability for plant size, quality of derived products and routinely required traits such as yield and disease resistance. Among the clones considered promising are some from the upper Amazon region that were used as parents of commercial hybrids in crosses with the lower Amazon Forastero (Amelonado), aiming at selection for combining ability for resistance to black pod (Tahi et al., 2006) and for production components, such as weight per plant and seeds per fruit (Dias, 2001).
DNA was extracted from 300 mg of fresh leaves of healthy plants, collected from each clone at an intermediate stage of maturity, using a previously modified method (Doyle and Doyle, 1990; Bertolde et al., 2010). After extraction, all DNA samples were purified with the Prep-A-Gene Matrix kit (Bio-Rad) according to the manufacturer's recommendations. Amplifications were performed in a total volume of 13 µL of reaction solution at the concentrations previously described (Bertolde et al., 2010), in a thermal cycler programmed for 35 cycles of the following program: 94°C for 2 min for DNA denaturation, 46°C or 51°C (depending on the primer) for 1 min for primer annealing, and 72°C for 1 min for extension of the DNA molecule by Taq DNA polymerase. After these 35 cycles of amplification, a final step at 72°C for 7 min was performed.
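As a quick check of the thermal profile described above, the following sketch adds up the programmed hold times. It takes the step durations exactly as written in the text and ignores ramp times between temperatures, which is an assumption; the variable and function names are illustrative only.

```python
# Thermal profile as described in the text (hold times in minutes).
# Ramp times between temperatures are ignored (assumption).
CYCLES = 35
STEPS_MIN = {
    "denaturation_94C": 2,
    "annealing_46C_or_51C": 1,
    "extension_72C": 1,
}
FINAL_EXTENSION_MIN = 7


def total_hold_time_min(cycles=CYCLES, steps=STEPS_MIN, final=FINAL_EXTENSION_MIN):
    """Total programmed hold time: all cycles plus the final extension step."""
    return cycles * sum(steps.values()) + final


print(total_hold_time_min())  # 147
```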
The amplification products were separated on an ABI Prism 377 (Applied Biosystems) automated sequencer and genotyped according to Bertolde et al. (2010). The genotype data generated by the programs GeneScan and Genotyper (Applied Biosystems) were coded into numerical matrices. The level of heterozygosity was calculated as the ratio between the number of heterozygous loci and the total number of loci analyzed. The matrix of modified Rogers' genetic distances (Goodman and Stuber, 1983) was used in the graphic dispersion analysis of the data by principal coordinates (Gower, 1996), using the program NTSYSpc version 2.0 (Rohlf, 1997). For each dendrogram, the cophenetic correlation was calculated between the genetic distance matrix and the matrix of cophenetic values.
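The two quantities defined above can be illustrated with a short sketch. It assumes that each clone's genotype is stored as a dictionary mapping a locus name to a pair of allele sizes, and it uses the commonly cited form of the modified Rogers' distance, the square root of the summed squared allele-frequency differences divided by twice the number of loci. The locus names and allele sizes are hypothetical, and the software used by the authors may have differed in detail.

```python
from collections import Counter
from math import sqrt


def observed_heterozygosity(genotype):
    """Ratio of heterozygous loci to the total number of loci analyzed."""
    heterozygous = sum(1 for a, b in genotype.values() if a != b)
    return heterozygous / len(genotype)


def allele_frequencies(allele_pair):
    """Allele frequencies within one diploid individual (0.5 or 1.0)."""
    counts = Counter(allele_pair)
    return {allele: n / 2 for allele, n in counts.items()}


def modified_rogers_distance(genotype_1, genotype_2):
    """Modified Rogers' distance over the loci shared by two genotypes."""
    loci = sorted(set(genotype_1) & set(genotype_2))
    total = 0.0
    for locus in loci:
        p = allele_frequencies(genotype_1[locus])
        q = allele_frequencies(genotype_2[locus])
        for allele in set(p) | set(q):
            total += (p.get(allele, 0.0) - q.get(allele, 0.0)) ** 2
    return sqrt(total / (2 * len(loci)))


# Hypothetical genotypes at two SSR loci (allele sizes in base pairs).
clone_a = {"locus_1": (150, 154), "locus_2": (200, 200)}
clone_b = {"locus_1": (150, 150), "locus_2": (204, 208)}

print(observed_heterozygosity(clone_a))                       # 0.5
print(round(modified_rogers_distance(clone_a, clone_b), 4))   # 0.7071
```

With this formulation the distance is 0 for identical genotypes and 1 for two clones that share no allele at any locus.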

RESULTS AND DISCUSSION
The 15 pairs of primers generated a total of 83 alleles, and the group of analyzed clones showed an observed heterozygosity (Ho) of 0.7276 (Table 2). The number of alleles varied according to the primer: the most informative primer, Y16977 (Ho = 0.87500), generated nine alleles and the least informative primer, AJ271827 (Ho = 0.5413), generated three alleles. The value of Ho in this group of clones was higher than that observed in studies with upper Amazon genotypes and clones grown in Africa, for which Ho = 0.5790 was reported (Aikpokpodion et al., 2009). The average number of alleles per primer (Na = 5.53) detected in these genotypes is lower than the Na = 13.8 obtained in 35 commercial clones used in plantations in southern Bahia and resulting from crosses or from different places (Bertolde et al., 2010) and the Na = 8.8 detected in 60 clones sampled in different international collections of cocoa. However, in the subgroup of 13 upper Amazon Forasteros, Motilal et al. (2009) found Na = 7.5, a value similar to that found in the present study. In the present study, eight of the 11 clones analyzed were upper Amazon clones. This number of alleles per locus is slightly higher than that found in other studies with natural populations (Na = 4.45) from the Brazilian Amazon (Sereno et al., 2006). Based on the 15 microsatellite markers, high genetic variation among clones was detected (Table 3). The genetic distances ranged from 0.55900 (between genotypes COCA 3370/5 and AMAZ 15) to 0.89190 (between genotypes ICS 95 and SCA 24), averaging 0.75 over all combinations. The pairs of genotypes with the greatest distances included ICS 95 x H56. The other distances in the crosses had values ranging from 0.60 to 0.79. One of the smallest distances was observed between clone TSH 1188 and ICS 95 and average distances with IMC 67 24 and SCA 24.
This result partially agrees with the genealogy of the clone, because TSH 1188 is the result of crosses involving SCA 6, ICS 1 and IMC 67. The smaller distance between SCA 24 and IMC 47 follows the trend expected for these two clones, since they were collected in the Amazon Forest of Peru (Pound, 1938) and are also classified as Upper Amazon or Wild. In Table 3, underlined numbers indicate the genetic distances between Amazon clones (1-8) and the hybrid (9-10) or Trinitario (11) clones. Distances between all Trinitario clones ranged from 0.60 to 0.66 and the distances between all Forasteros ranged from 0.56 to 0.88 (Table 3). The greater variability found among Forastero clones than among Trinitario clones is expected, because the sample contains more Forastero than Trinitario clones. Additionally, the genealogy of the Trinitario sampled here contributes to these differences. The following average genetic distances were observed: 0.62 among Trinitario clones; 0.75 between Trinitario and Forastero clones; and 0.74 among Forastero clones. Greater variability within the Forasteros than within the Trinitarios was also observed in other genotypes based on molecular markers (Lerceteau et al., 1997a, b). The largest genetic distances were found both between clones belonging to the same racial group and between the two racial groups. Thus, parent selection both between and within racial groups may be considered when selecting for the economically important traits present in this group of germplasm.
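The within-group and between-group averages reported above can be obtained from a pairwise distance table along the following lines. This is a minimal sketch: the clone labels, group assignments and distance values are hypothetical placeholders, not the values of Table 3, and the function name is illustrative.

```python
from itertools import combinations
from statistics import mean


def group_mean_distances(distances, groups):
    """Average pairwise distance within and between groups.

    distances: {frozenset({clone_a, clone_b}): distance}
    groups:    {clone: group label}
    Returns {(group_1, group_2): mean distance}, with labels sorted.
    """
    buckets = {}
    for a, b in combinations(sorted(groups), 2):
        key = tuple(sorted((groups[a], groups[b])))
        buckets.setdefault(key, []).append(distances[frozenset((a, b))])
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical example: two Forastero (F) and two Trinitario (T) clones.
groups = {"F1": "Forastero", "F2": "Forastero", "T1": "Trinitario", "T2": "Trinitario"}
distances = {
    frozenset(("F1", "F2")): 0.70,
    frozenset(("T1", "T2")): 0.60,
    frozenset(("F1", "T1")): 0.80,
    frozenset(("F1", "T2")): 0.70,
    frozenset(("F2", "T1")): 0.75,
    frozenset(("F2", "T2")): 0.75,
}

for pair, value in sorted(group_mean_distances(distances, groups).items()):
    print(pair, round(value, 2))
```

Comparing the within-group means with the between-group mean shows directly whether the two groups are more distant from each other than clones are within each group.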
In cluster analysis, the hybrid clones (TSH 1188 and CCN 51) and a Trinitario clone (ICS 95) stayed together, while the remaining clones were distributed into different planes (Figure 1). This grouping pattern shows that the different materials do not form homogeneous groups according to the molecular diversity of racial groups, since the apparent homogeneity of the three genotypes (the Trinitario and the Trinitario hybrids) can be explained by genealogy and not by racial group. The Forastero LCT EEN 37 A remained alone, but in an intermediate position between the hybrids and a group formed by three upper Amazon clones (AMAZ 15, LCT EEN 162/1010 and COCA 3370/5). Indeed, molecular markers alone do not provide a secure classification of racial groups, as observed before by Lerceteau et al. (1997a) and Figueira et al. (1994). They are, however, useful to show the genetic diversity of the germplasm included in breeding.
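The cophenetic correlation mentioned in the methods measures how faithfully a dendrogram preserves the original pairwise distances. The sketch below is illustrative only: it assumes average-linkage (UPGMA) clustering, which the text does not specify, and uses a small hypothetical distance matrix rather than the values of Table 3.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(matrix):
    """Cophenetic distance matrix from average-linkage (UPGMA) clustering."""
    n = len(matrix)
    clusters = [(i,) for i in range(n)]  # each cluster is a tuple of leaf indices
    cophenetic = [[0.0] * n for _ in range(n)]

    def average_distance(c1, c2):
        # Mean of the original distances between all members of c1 and c2.
        return sum(matrix[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        height = average_distance(c1, c2)
        # Leaves joined at this merge get the merge height as cophenetic distance.
        for i in c1:
            for j in c2:
                cophenetic[i][j] = cophenetic[j][i] = height
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return cophenetic


def cophenetic_correlation(matrix):
    """Pearson correlation between original and cophenetic distances."""
    cophenetic = upgma_cophenetic(matrix)
    pairs = list(combinations(range(len(matrix)), 2))
    x = [matrix[i][j] for i, j in pairs]
    y = [cophenetic[i][j] for i, j in pairs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return covariance / spread


# Hypothetical distances among three clones (not taken from Table 3).
example = [
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.9],
    [0.5, 0.9, 0.0],
]
print(round(cophenetic_correlation(example), 3))
```

Values close to 1 indicate that the dendrogram distorts the distance matrix very little; lower values suggest that the tree should be interpreted with caution.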
In this study, the formation of groups is also not clearly seen when the classification into wild and domesticated clones is considered. Wild clones (AMAZ 15, LCT EEN 162/1010, LCT EEN 37 A, SCA 24, IMC 47, COCA 3370/5 and H 56) did not differ from the cultivated ones (TSH 1188, ICS 95, CCN 51 and GU 261). According to Figueira et al. (1994), the wild group is composed of upper Amazon Forastero, while all others are domesticated, since they have undergone considerable human manipulation.
In this work, different clones from Ecuador were grouped with two clones from Trinidad (ICS 95 and TSH 1188) and one Peruvian clone (H 56). This result can be attributed to the structure of the populations grown in Ecuador, which descend from parents known as "refractario"; these in turn derive from hybridization events between genotypes from three possible regions and were then selected for resistance to witches' broom (Bartley, 2001). Thus, the genetic structure of the Ecuadorian population can be attributed to (i) crosses between genotypes from Trinidad, Venezuela and Colombia, which were introduced into Ecuador in the late nineteenth century, (ii) Nacional cocoa and (iii) Forastero genotypes from part of the Ecuadorian Amazon or from Peru, at the middle Napo River (Bartley, 2001). In the grouping obtained in this study, AMAZ 15 was placed with the other Ecuadorian clones. Two clones from Peru (SCA 24 and IMC 47) and a lower Amazon clone from French Guiana (GU 261) are in the second group. According to Risterucci et al. (2000) and Sounigo et al. (2005), genotypes bearing the acronym AMAZ can be classified as material from either Peru or Ecuador. Sounigo et al. (2005) classified the AMAZ series as Peruvian genotypes, because they grouped with the other genotypes from the same region of collection (SCA and IMC).
In conclusion, the materials are divergent enough to allow the selection of parents for crosses involving these genotypes. In addition, for the germplasm in this work, the clusters obtained at the molecular level do not agree with geographic regions, level of domestication or racial groups.

Conflicts of Interests
The authors have not declared any conflict of interests.