Genetic characterization of Cape gooseberry (Physalis peruviana L.) accessions in selected counties in Kenya using SSR markers

Cape gooseberry (Physalis peruviana L.) is a neglected, high-potential crop, and knowledge of the genetic diversity of the genotypes domesticated in Kenya is limited. To understand the genetic diversity and structure within and between Cape gooseberry germplasm, 70 accessions from six selected counties were analyzed using 15 pairs of highly polymorphic SSR primers. A total of 61 polymorphic SSR alleles were identified, with a mean polymorphic information content (PIC) of 0.43. Analysis of Molecular Variance (AMOVA) revealed that 92.8% of the total genetic variation was within accessions whereas variation among accessions accounted for 7.2% of the total genetic variation. Genetic diversity parameters among the 70 accessions revealed that Cape gooseberry was more diverse than previously recorded. Based on the SSR data, the 70 accessions were classified into five main phylogenetic groups, which corresponded to the county of origin through factorial analysis, principal component analysis (PCA), and phylogenetic analysis. Seven core SSR primer pairs, namely SSR1, SSR2, SSR10, SSR11, SSR123, SSR138, and SSR146, were found to have wide applicability in genotype identification of Cape gooseberry, and they are therefore recommended for genetic characterization of germplasm collected from other counties not covered by the present study. This study demonstrated the existence of considerable genetic diversity in Cape gooseberry accessions growing in selected counties in Kenya, and the results can therefore be used as a basis for future breeding programs in the development of hybrids with desirable traits. This wider genetic diversity is vital for posterity as it will help cope with unpredictable climatic changes and human needs.


INTRODUCTION
Physalis peruviana L. is a species from the family Solanaceae and genus Physalis, commonly known as Cape gooseberry, ground cherry, or winter cherry. It contains high amounts of vitamins (A, B, and C) and micronutrients (iron, phosphorous, and calcium), and has anti-inflammatory, antioxidant, and anti-hepatotoxic activities. The vitamin C content of the fruit is reported to be the highest among all fruits and plants, hence its description as a "super fruit"; Cape gooseberry contains up to twenty times as much vitamin C as oranges (Villacorta and Shaw, 2013). Owing to such concentrated levels of nutrients in this fruit, Villacorta and Shaw (2013) posit that it is useful for medicinal purposes in restoring vitality and boosting immunity by fortifying the liver, supporting cardiovascular activity, strengthening lungs, and enhancing fertility and food absorption.
*Corresponding author. E-mail: pauline.wanjiru65@gmail.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
Investigation of genetic diversity in both wild and domesticated species is essential. Assessment of available genetic diversity is a preliminary stage in genetic improvement in crop plants (Bhandari et al., 2017). Wild populations of different crop species are known to be a potential source of useful genes and traits which could be introduced into the domesticated gene pool (Campisano et al., 2015). There are various ways in which diversity analysis can be carried out including cytological, morphological, biochemical, and molecular approaches. With the advent of genomic tools, assessment of genetic diversity at the molecular level has proven to be more useful as compared to that at the phenotypic level because the latter entails analysis of data on morphological traits, which are generally influenced by environments (Myers et al., 2000). Different molecular marker systems have been proved to be valuable tools in assessing the genetic diversity of plants between and within species regardless of environmental interferences on the phenotype (Demir et al., 2010). Research has shown that different markers reveal different classes of variation (Virk et al., 2000). Simple Sequence Repeat (SSR) markers have become the marker of choice for many researchers as they offer many advantages including technical simplicity, feasibility of automation, even distribution throughout the genome, higher frequency of polymorphism, rapidity, requirement of little and not necessarily high-quality DNA, and no requirement of prior information of any DNA sequence (Mason, 2015). Molecular marker analysis work would be of great help for analyzing genetic diversity, and exploiting genetic resources for identification, isolation, conservation, and utilization (Ravi et al., 2010).
The availability of enough food to meet 95% of the world's requirements depends on only a few widely and intensively cultivated crop species. These have been developed by extensive selection from the large available agro-biodiversity pool (Ochatt and Jain, 2007). There is a great need to expand the exploitation of plant genetic diversity in order to broaden the crop diversity available for food supply, feed the ever-growing human population, and avoid dependence on a few food crops. Wild relatives and neglected crops could become an excellent source of useful genes.
The average yields of Cape gooseberry are still below the maximum potential, mainly due to fruit cracking, small fruits, and premature fruit drop (Ali and Singh, 2016). Also, the poor quality of fruits in some cultivars in terms of the levels of total soluble solids (TSS) and total titratable acidity (TTA) makes them unattractive for large-scale agriculture (Herrera et al., 2011). No improved cultivars have been developed yet, although Leiva-Brondo et al. (2001) reported using a simple breeding strategy employed in other Solanaceae crops to develop hybrids with superior yield characteristics by exploiting heterosis. Genetic diversity is important in this context as it serves as the reservoir of many novel traits related to yield and quality as well as tolerance to abiotic and biotic stresses. A thorough understanding of the diversity in the Cape gooseberry genome is necessary before implementation of any breeding program aimed at breaking these yield and quality barriers. The objective of this study was to determine the genetic diversity of Cape gooseberry genotypes in Kenya using SSR markers for use in present and future breeding schemes and conservation programs. This information will contribute to understanding the genetic relationship between and within different genotypes and provide basic information on parental selection for Cape gooseberry breeding material.

Experimental materials
Seventy dry leaf samples were collected from Cape gooseberry accessions in six selected counties of Kenya (Figure 1).

DNA extraction and quantification
Total nucleic acid was extracted from dry young leaf tissue that had been stored in silica gel for one month, using a modified CTAB protocol (Porebski et al., 1997). The modifications included an initial wash stage using 2-mercaptoethanol to remove aromatic compounds and phytates. Centrifugation time for the initial stages was increased to 10 min to ensure that cell debris and proteins were well separated and to minimize contamination. Final centrifugation time was reduced to 3 min while centrifugation speed was increased to 14,000 rpm to avoid pelleting carry-over impurities. Precipitation time was increased from the recommended 2 h to 18 h to improve DNA precipitation and recovery. A Nanodrop spectrophotometer (Applied Biosystems) and agarose gel electrophoresis were used to determine the purity and concentration of DNA in the samples. For Nanodrop spectrophotometry, DNA concentration was read from the absorbance of the sample at 260 nm (1 OD at A260 = 50 µg/ml of double-stranded DNA). The purity of the DNA sample was determined from the A260/A280 ratio (1.6–1.8 for pure DNA). For agarose gel quantification, the samples were resolved in a 0.8% agarose gel (0.8 g agarose in 100 ml sodium borate buffer) containing 3 µl ethidium bromide staining dye, at 100 V and 400 mA for 30 min. The DNA was visualized on a UV transilluminator (Applied Biosystems).
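The spectrophotometric calculation described above can be sketched as follows. This is a minimal illustration, assuming the standard conversion of one A260 unit to about 50 ng/µl (equivalently 50 µg/ml) of double-stranded DNA and the 1.6–1.8 purity window stated in the text; the function names and the absorbance readings are hypothetical and are not taken from the study.

```python
# Sketch: DNA concentration and purity from absorbance readings.
# The readings below are hypothetical examples, not data from the study.

NG_PER_UL_PER_A260 = 50.0  # 1 A260 unit ~ 50 ng/µl (50 µg/ml) of dsDNA


def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration (ng/µl) from absorbance at 260 nm."""
    return a260 * NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a purity indicator."""
    return a260 / a280


def is_within_purity_window(ratio, low=1.6, high=1.8):
    """Check the ratio against the 1.6-1.8 window quoted in the text."""
    return low <= ratio <= high


if __name__ == "__main__":
    a260, a280 = 0.90, 0.52  # hypothetical readings for one sample
    ratio = purity_ratio(a260, a280)
    print("Concentration (ng/µl):", dsdna_concentration_ng_per_ul(a260))
    print("A260/A280:", round(ratio, 2))
    print("Within purity window:", is_within_purity_window(ratio))
```

A ratio well below the window usually points to protein or phenolic carry-over, which is the kind of contamination the modified wash and centrifugation steps aim to limit.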

Selection and genotyping of SSR markers
A set of 15 SSR markers selected from earlier published reports (Table 1) was used to determine the diversity of the Cape gooseberry collections. The SSR markers were selected based on high polymorphic information content (PIC) values (>0.4) in earlier published studies, the maximum number of alleles detected, genome coverage, and distribution on linkage groups. The primers for the SSR markers were synthesized on contract by Inqaba Biotech in South Africa, and genotyping of the SSR markers was carried out at the Marker Assisted Breeding Laboratory at the Kenya Agricultural and Livestock Research Organization (KALRO) Njoro Center.
Polymerase chain reaction (PCR) was carried out in a 2720 96-well universal gradient thermocycler (Applied Biosystems) in a 20 µl final volume containing 10–20 ng DNA template, 5.0 pmol each of forward and reverse primers, 1x PCR buffer, 2.5 mM of each dNTP, 1.5 mM MgCl2, and 0.5 U of Taq DNA polymerase (Biolabs). The amplification conditions were: initial denaturation at 95°C for 5 min; 35 cycles of 95°C for 30 s, the specific annealing temperature for each SSR for 1 min, and extension at 72°C for 2 min; and a final extension at 72°C for 10 min. The PCR amplicons were run in a 1.8% agarose gel containing ethidium bromide staining dye at 80 V and 400 mA for 1 h and visualized on a UV transilluminator (Applied Biosystems).
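As a quick check on the cycling profile, the programmed hold time can be added up as sketched below. The sketch assumes that the 72°C extension for 2 min is part of each of the 35 cycles (the usual reading of such a profile) and ignores ramping between temperatures, so the real run is somewhat longer; the variable and function names are illustrative only.

```python
# Sketch: total programmed hold time for the PCR profile described above.
# Assumes the 2 min extension belongs to each cycle; ramp times are ignored.

INITIAL_DENATURATION_S = 5 * 60   # 95°C for 5 min
CYCLES = 35
DENATURATION_S = 30               # 95°C for 30 s
ANNEALING_S = 60                  # marker-specific temperature for 1 min
EXTENSION_S = 2 * 60              # 72°C for 2 min
FINAL_EXTENSION_S = 10 * 60       # 72°C for 10 min


def total_hold_time_s():
    """Sum the programmed hold times over the whole run, in seconds."""
    per_cycle = DENATURATION_S + ANNEALING_S + EXTENSION_S
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


if __name__ == "__main__":
    seconds = total_hold_time_s()
    print("Total hold time:", seconds, "s =", seconds / 60, "min")
```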

Data analysis
SSR marker alleles were scored manually from the gel images using a simple numerical scoring method: presence of the expected band was scored as 1 and absence as 0. Genetic diversity was assessed from the number of alleles, observed heterozygosity (HO), expected heterozygosity (HE), Shannon's diversity index (I), gene flow (Nm), and the gene differentiation coefficient (FST), all computed in POPGENE Version 1.32 (Yeh et al., 2000). A chi-squared test was used to assess Hardy-Weinberg equilibrium (HWE), while gene diversity and polymorphic information content (PIC) were computed in PowerMarker 3.25 (Liu and Muse, 2005).
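To make the per-locus statistics concrete, the sketch below computes gene diversity (expected heterozygosity), PIC, and Shannon's index from a set of allele frequencies using their commonly used definitions: HE = 1 − Σp², PIC = 1 − Σp² − ΣΣ 2p_i²p_j² (summed over allele pairs i < j), and I = −Σ p ln p. It is an assumption that these match the exact estimators and any sample-size corrections applied by POPGENE and PowerMarker, and the allele counts are hypothetical.

```python
import math

# Sketch: per-locus diversity statistics from allele frequencies.
# Textbook definitions; software-specific corrections are not reproduced.


def allele_frequencies(counts):
    """Convert a dict of allele -> count into a list of frequencies."""
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    squares = [p * p for p in freqs]
    cross = sum(
        2.0 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1.0 - sum(squares) - cross


def shannon_index(freqs):
    """Shannon's diversity index: -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


if __name__ == "__main__":
    # Hypothetical allele counts at a single SSR locus.
    counts = {"allele_150bp": 30, "allele_160bp": 20, "allele_170bp": 10}
    freqs = allele_frequencies(counts)
    print("Gene diversity:", round(gene_diversity(freqs), 3))
    print("PIC:", round(pic(freqs), 3))
    print("Shannon index:", round(shannon_index(freqs), 3))
```

Because PIC subtracts an extra term from gene diversity, it is never larger than gene diversity for the same locus, which agrees with the pattern reported for SSR1 (gene diversity 0.82, PIC 0.77).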

DNA extraction and PCR
All the samples yielded DNA of sufficient quality and quantity for genotyping (Plate 1). Genotyping of the markers amplified the expected regions, giving single or multiple bands (Plate 2a). Minimal contamination of samples by proteins and phenolic compounds was observed in this study. This study evaluated the genetic diversity and population structure of 70 accessions of Cape gooseberry that originated from six selected counties of Kenya (Nakuru, Laikipia, Nyandarua, Kericho, Murang'a, and Kiambu) using 15 SSR primer pairs. The 15 primer pairs amplified a total of 61 polymorphic SSR alleles (Plate 2b). Every primer pair amplified a varying number of SSR alleles, ranging in size from 100 to 300 bp, from all accessions tested, regardless of the county of origin.

Marker polymorphism and genetic diversity of Cape gooseberry accessions
The number of major alleles per primer ranged between 2 (SSR54, SSR15, SSR72, SSR77, SSR112, and SSR126) and 11 (SSR2). A total of 61 alleles were detected by the 15 primers with a mean of 4.07 (Table 2) between 1.10 (SSR112) and 2.00 (SSR1) with a mean of 1.28 (Table 2). The lowest gene diversity was observed for SSR112 (0.08) while the highest was 0.82 for SSR1. All the markers used in the study were polymorphic. SSR1 was the most polymorphic marker (PIC = 0.77) and SSR112 the least polymorphic (PIC = 0.079), while the mean PIC was 0.432 (Table 2). Seven SSR primer pairs (SSR1, SSR2, SSR10, SSR11, SSR123, SSR138, and SSR146) produced more than five alleles among the 70 Cape gooseberry accessions. These seven SSR primer pairs should be given priority in evaluating Cape gooseberry because they were more informative in discriminating between the accessions and had PIC values above 0.4. Most of the SSR loci detected in this study contained dinucleotide (two nucleotide units) and hendecanucleotide (eleven nucleotide units) repeats characteristic of markers located along untranslated regions (UTR). This is in line with the hypothesis of Morgante et al. (2002), which posits that in most plants the SSR loci are found along the UTRs, and it may account for the markers with low PIC because SSRs found in untranslated regions have been reported to be less polymorphic than genomic markers (Ellis and Burke, 2007). The number of polymorphic SSR alleles detected in this study (61) was higher than the 6 alleles reported by Chacón et al. (2016) and the 30 polymorphic alleles reported by Simbaqueba et al. (2011). These differences may be due to the different accessions used in previous studies or the stringency of scoring. The differences could also be due to the relatively narrow genetic base of the commercial Cape gooseberry varieties used in the previous studies.

Genetic distances between gooseberry accessions within counties
Plant breeding applications such as germplasm collections, selection of parental materials, identification of quantitative trait loci, linkage, and association mapping are dependent on previous genetic diversity information (Rao and Hodgkin, 2002; Zhu et al., 2008; Rauf et al., 2010). In this study, the mean fixation index (FST) ranged between 0.003 and 0.58 for Nakuru and Laikipia, respectively. Cape gooseberry accessions in Nakuru and Kericho counties exhibited insignificant genetic differentiation (FST < 0.05), indicating that the accessions in the two counties are interbreeding freely. Cape gooseberry accessions in Nyandarua, Kiambu, Murang'a, and Laikipia showed significant genetic differentiation (FST > 0.05). This great genetic differentiation is a sign of geographic isolation and a high inbreeding rate (Table 3). Overall, the genetic differentiation reported in the present study is very high (FST = 0.7333). This may be attributed to the fact that Kenyan Cape gooseberry cultivars have no history of domestication. Lagos et al. (2008) reported that P. peruviana is more than 53% outcrossing and that its domestication from the wild did not involve a long process as compared to fruit-bearing relatives such as tomato (Labate et al., 2009; Sim et al., 2012). It is therefore probable that natural selection is still an important factor in retaining heterogeneous populations with broad genetic adaptability and variability (Rauf et al., 2010).
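The link between the fixation index and gene flow can be illustrated with the sketch below. It uses the common heterozygosity-based definition FST = (HT − HS) / HT and the island-model approximation Nm = 0.25 (1 − FST) / FST; whether POPGENE applies exactly these expressions to the present data is an assumption, and the heterozygosity values in the example are hypothetical. The 0.05 cut-off mirrors the threshold used in the text.

```python
import math

# Sketch: fixation index and gene flow under standard approximations.
# Heterozygosity inputs in the example are hypothetical.


def fst_from_heterozygosity(h_total, h_subpop_mean):
    """FST = (HT - HS) / HT."""
    return (h_total - h_subpop_mean) / h_total


def gene_flow_nm(fst):
    """Island-model estimate of gene flow: Nm = 0.25 * (1 - FST) / FST."""
    if fst == 0:
        return math.inf  # no differentiation implies unrestricted gene flow
    return 0.25 * (1.0 - fst) / fst


def is_differentiated(fst, threshold=0.05):
    """Apply the 0.05 cut-off used in the text."""
    return fst > threshold


if __name__ == "__main__":
    print("FST (hypothetical HT=0.5, HS=0.4):",
          round(fst_from_heterozygosity(0.5, 0.4), 3))
    # County-level and overall values quoted in the text.
    for label, fst in [("Nakuru", 0.003), ("Laikipia", 0.58), ("Overall", 0.7333)]:
        print(label, "differentiated:", is_differentiated(fst),
              "Nm:", round(gene_flow_nm(fst), 3))
```

Under this approximation a high FST corresponds to an Nm well below one migrant per generation, which is consistent with the interpretation of geographic isolation given above.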

Genetic distances of Cape gooseberry accessions among the six selected counties
In this study, allele-frequency divergence of Cape gooseberry between counties was significantly lower than the diversity within counties (Table 4). Allele-frequency divergence among counties was the highest between Laikipia and Nakuru (0.0067) and the lowest between Kericho and Nakuru (0.00). Overall, very small allele divergence was observed in Cape gooseberry across counties. This shows that the gooseberry accessions from the selected counties share most of the alleles evaluated, which is an indicator of low geographic differentiation among these accessions. The population showed a slight deviation from Hardy-Weinberg equilibrium (HWE = 0.7). This may be due to the significant effects of natural selection resulting from the limited domestication of Cape gooseberry.
Partitioning of the genetic variation of the Cape gooseberry accessions was done using Analysis of Molecular Variance (AMOVA) to determine whether variation observed between the accessions was due to genetic makeup or microclimatic factors. A total of 92.80% of the variation was found among the accessions while a total of 7.20% of the variation was revealed within the accessions (Table 5). This is further evidence that the Cape gooseberry accessions in Kenya have a broad genetic diversity.

Factorial analysis
Factorial analysis was performed to analyze the genetic relationship and population structure of the accessions within and between the counties. A dissimilarity matrix calculated from the raw SSR "1" and "0" matrix was used for factorial analysis in DARwin 6.0.21 software. The first five axes accounted for 76.81% of all the variance observed in the test samples, with 41.2, 52.71, and 1.85% explained by PC axes 1, 2, and 3, respectively. The highest variance was observed in Nakuru county (31.90), indicating that Cape gooseberry accessions in this county have high genetic diversity, and the lowest variance was recorded in Laikipia (-0.21), showing comparatively low genetic diversity in the accessions in this county (Table 6). The high percentage of variation explained by the first three components in the factorial analysis shows that the differentiation of most of the individuals was well captured. However, given the outcrossing nature of the species, it is worth considering a larger number of high-resolution markers and platforms such as SNPs and genotyping by sequencing (GBS) (Zhu et al., 2008).
Factorial analysis grouped the accessions into five distinct clusters depending on the county of origin. Accessions from Kiambu and Murang'a counties, however, clustered together, showing that they were genetically more similar (Figure 2). Individuals were distinct within clusters, with little overlap between individuals, indicating that the collections are genetically diverse. The collections show a high level of diversity within and across the clusters (Figure 2).
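The first step of the factorial analysis, building a pairwise dissimilarity matrix from the 1/0 band scores, can be sketched as below. The text does not state which dissimilarity index was selected in the software, so the Jaccard index used here is an assumption chosen for illustration, and the accession names and band profiles are hypothetical.

```python
# Sketch: pairwise dissimilarity from binary (1/0) SSR band scores.
# The Jaccard index is an assumed choice; profiles below are hypothetical.


def jaccard_dissimilarity(profile_a, profile_b):
    """1 - (bands shared by both) / (bands present in at least one)."""
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # neither profile has any band: treat as identical
    return 1.0 - shared / either


def dissimilarity_matrix(profiles):
    """Return a nested dict: matrix[a][b] = dissimilarity between a and b."""
    names = list(profiles)
    return {
        a: {b: jaccard_dissimilarity(profiles[a], profiles[b]) for b in names}
        for a in names
    }


if __name__ == "__main__":
    profiles = {
        "acc1": [1, 0, 1, 1, 0],
        "acc2": [1, 0, 1, 1, 0],  # duplicate of acc1
        "acc3": [0, 1, 0, 0, 1],  # shares no band with acc1
        "acc4": [1, 1, 1, 0, 0],
    }
    matrix = dissimilarity_matrix(profiles)
    for name, row in matrix.items():
        print(name, [round(row[other], 2) for other in profiles])
```

Factorial analysis then projects this matrix onto a small number of axes so that accessions with low pairwise dissimilarity fall close together.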

Phylogenetic analysis
A distance tree was constructed in DARwin 6 using the UPGMA method. The robustness of the nodes of the phylogenetic tree was assessed from 1000 bootstrap replicates. In this study, the minimum dissimilarity value for the phylogenetic tree was 0.027 while the maximum value was 1. This high dissimilarity value is further evidence of the high genetic diversity found in the Kenyan gooseberry accessions. Tree length was 0 for duplicates. The clustering analysis was able to detect a geographical distribution pattern. This observation may be due to the lack of frequent gene flow through the exchange of seeds among the counties by humans, as is often the case in heavily domesticated species. This finding disagrees with that of Garzón-Martínez et al. (2015), who failed to deduce any geographical clustering of Colombian P. peruviana varieties using SSR and InDel markers. This may be because the study by Garzón-Martínez et al. (2015) used well-defined domestic commercial gooseberry accessions. Both the factorial and phylogenetic population analyses show that the whole Cape gooseberry population comprises two different genetic populations; the PCA approach likewise identified two different populations within P. peruviana.
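The UPGMA procedure itself is simple enough to sketch in a few lines. The version below is a plain-Python illustration of the algorithm (repeatedly merge the two closest clusters and average distances weighted by cluster size); it is not the software implementation used in the study, it omits the bootstrap step, and the labels and distances are hypothetical.

```python
# Sketch: UPGMA clustering on a symmetric dissimilarity matrix.
# Illustrative only; bootstrap support is not computed.


def upgma(labels, matrix):
    """Return a list of merges as (sorted member labels, node height)."""
    n = len(labels)
    clusters = {i: [labels[i]] for i in range(n)}  # cluster id -> members
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Closest pair of clusters; ties are broken by cluster id.
        (a, b), d_ab = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        size_a, size_b = len(clusters[a]), len(clusters[b])
        new_dist = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted average distance to the merged cluster.
            new_dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop distances involving the two merged clusters, add the new ones.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append((sorted(members), d_ab / 2.0))  # node height = half distance
        next_id += 1
    return merges


# Hypothetical example: A and B are duplicates (dissimilarity 0).
labels = ["A", "B", "C", "D"]
matrix = [
    [0.0, 0.0, 0.6, 1.0],
    [0.0, 0.0, 0.6, 1.0],
    [0.6, 0.6, 0.0, 0.8],
    [1.0, 1.0, 0.8, 0.0],
]
merges = upgma(labels, matrix)
for members, height in merges:
    print(members, round(height, 3))
```

In this toy example the duplicates join at height 0, which corresponds to the zero tree length for duplicates mentioned in the text.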

CONCLUSION AND RECOMMENDATION
This is the first study in Kenya to use SSR markers to genetically characterize Cape gooseberry. The study established that Cape gooseberry in the six target counties of Kenya has a broad genetic diversity. Based on the SSR data, the 70 accessions were classified into two main phylogenetic groups and six sub-clusters which corresponded to the county of origin through factorial analysis, principal component analysis (PCA), and phylogenetic analyses. The study also established that seven SSR primer pairs with higher polymorphism, namely SSR1, SSR2, SSR10, SSR11, SSR123, SSR138, and SSR146, have wide applicability in genotype identification and characterization of the population structure of Cape gooseberry. The information generated by this study contributes to understanding diversity and population structure and enhances the management of Cape gooseberry genetic resources in Kenya.
This study enhances understanding of levels of genetic variations among Kenyan Cape gooseberry germplasm and informs the need to introduce commercial Cape gooseberry varieties as sources of genetic variation for breeding and hybridization purposes. The findings of the study also inform on the need to use more advanced molecular platforms such as genome-wide sequencing to establish more diversity in wild and cultivated Cape gooseberry in Kenya.