Simple sequence repeats (SSR) and interspersed sequence repeats (ISSR) markers for genetic diversity analysis among selected genotypes of Gossypium arboreum race ‘bengalense’

Genetic diversity among 65 selected genotypes of Gossypium arboreum race bengalense was explored using 62 simple sequence repeats (SSR) and 73 interspersed sequence repeats (ISSR) markers. The SSR primers produced a total of 170 alleles (all polymorphic), while the ISSRs yielded 281 bands, of which 94.3% were polymorphic. The utility of the markers was evaluated by calculating parameters such as polymorphic information content (PIC), marker index (MI) and discriminative ability (D), on the basis of which 21 SSR and 53 ISSR primers were found to be very efficient for genetic diversity analysis. ISSR outperformed SSR in discriminative ability, as it yielded a higher number of banding patterns (ISSR 658, SSR 175), a greater number of polymorphic bands per assay (ISSR 3.63, SSR 2.7) and higher D values (ISSR 0.862, SSR 0.442). Values of the Shannon index I (SSR 0.740, ISSR 0.421) and expected heterozygosity He (SSR 0.433, ISSR 0.262) indicated that SSRs are more suitable for characterizing the species in terms of abundance and evenness of alleles. A slight difference was observed between the MI values of SSR (1.20) and ISSR (1.38), showing an edge for ISSR in detecting overall polymorphism among the given genotypes. Phylogenetic analysis was carried out with the SSR, ISSR and combined marker datasets. The highest cophenetic correlation coefficient was obtained for ISSR (r = 0.94), followed by the combined dataset (r = 0.91) and SSR markers (r = 0.87).


INTRODUCTION
Cotton (Gossypium spp.) is one of the principal cash crops, providing most of the world's natural textile fiber.
The genus Gossypium (family Malvaceae) comprises nearly 45 diploid and 5 allotetraploid species. Spinnable fibers are obtained from only four species: two allotetraploids or new world cottons (Gossypium hirsutum and Gossypium barbadense) and two diploids or Asiatic/old world cottons (Gossypium herbaceum and Gossypium arboreum).
*Corresponding author. E-mail: Psiwach29@gmail.com. Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
India is the original home of domestication, diversification and development of Asiatic cultivated cottons. From 1500 BC to 1700 AD, India was recognized as the cradle of the cotton industry. The Indian monopoly in cotton muslins was broken up by the industrial revolution in England, and new world cotton largely replaced the Asiatic cotton (Mohan et al., 2006). The major cause of this change was the unsuitability of diploid cotton fibers for mechanized spinning because of their short length (<23 mm), high coarseness (>5.0 micronaire) and poor strength (<20 g/tex at 3.2 mm gauge) (Kulkarni et al., 2009). At present, tetraploid cotton (predominantly G. hirsutum) occupies the major fraction (>90%) of world cotton cultivation due to its suitability for mechanized harvesting and spinning. However, in marginal and drought-prone environments of Asia, diploid cottons are still widely cultivated. This is because of certain inherent traits that the tetraploids lack, such as drought and salinity tolerance (Tahir et al., 2011); resistance to several pests, including bollworms (Dhawan et al., 1991), aphids and leafhoppers (Nibouche et al., 2008); and resistance to rust, fungal (Wheeler et al., 1999) and viral (Akhtar et al., 2010) diseases.
Of the two diploid cultivated species, G. arboreum is more popular due to its suitability to a wider range of environments, and its better fiber and plant features (Mohan et al., 2006). Dispersal and domestication of G. arboreum germplasm in different directions from its origin resulted in six races: indicum, burmanicum, cernuum, sinense, bengalense and soudanense. India is the only country where all six races are cultivated, the major share of which is contributed by 'bengalense' (cultivated commonly across central and North India).
G. arboreum germplasm constitutes an indispensable gene pool for modern cotton improvement programs. However, due to continuous selective breeding during the last few decades, the germplasm is constrained by a narrow genetic base. Knowledge of genetic variation among G. arboreum germplasm is essential for future developments. Equally essential are efficient tools that enable the detection of higher levels of genetic diversity (Ulloa et al., 2007). During the last two decades, various molecular markers have been extensively used for genetic diversity studies across species. G. arboreum germplasm has been explored with markers such as randomly amplified polymorphic DNA (RAPD) (Deosarkar et al., 2010), interspersed sequence repeats (ISSR) (Bardak and Bolek, 2012) and simple sequence repeats (SSRs) (Noormohammadi et al., 2013a), and all studies report low polymorphism. Considering the edge of SSR and ISSR markers in cultivar fingerprinting and diversity studies, the present study was planned to evaluate the utility of these two methods for assessing genetic diversity as well as for phylogenetic analysis among elite genotypes of G. arboreum race 'bengalense'.

MATERIALS AND METHODS
Plant materials and DNA extraction
Seeds from 65 elite genotypes belonging to race 'bengalense' of G. arboreum (Table 1) were procured from the Central Institute of Cotton Research (CICR), Regional Station, Sirsa, Haryana, India. The cotton plants were cultivated in two rows of 6 m length with 30 cm interplant distance in the experimental field of CICR, Sirsa, in a completely randomized design (CRD) with three replications. Fresh, young leaves of randomly selected single plants of each genotype were subjected to total genomic DNA extraction using the cetyltrimethylammonium bromide (CTAB) method (Saghai et al., 1984) with certain modifications. The quality and quantity of the extracted DNA were examined by agarose gel (0.8%) electrophoresis and ultraviolet (UV) spectrophotometry, respectively.

SSR amplification
One hundred microsatellite primer pairs were obtained from the following series: Brookhaven National Laboratory (BNL); MGHES (M, Mississippi; GH, G. hirsutum; E, EST; S, SSR); CIR (CIRAD); JESPR (named after the Principal Investigators); Nanjing Agricultural University (NAU); and MUSS (M, Microsatellite; U, last name of the Principal Investigator; SS, Simple Sequences). Out of the 100 primer pairs, only 62 gave polymorphic and reproducible banding patterns and hence were selected for the present study (Table 2). The sequence information of these SSRs is available at http://www.cottonmarker.org.
Polymerase chain reaction (PCR) amplification was performed in a volume of 20 µl containing 2 µl of DNA (50 ng/µl), 0.5 µM of each primer (Sigma-Aldrich), 200 µM of dNTPs (Sigma-Aldrich), 0.5 U Taq polymerase (Sigma-Aldrich) and 1X PCR buffer (Sigma-Aldrich). Thirty-five (35) cycles, each consisting of 1 min denaturation at 95°C, 2 min at the annealing temperature (optimized separately for each primer pair, generally Tm-5°C) and 1 min polymerization at 72°C, were performed in a thermocycler (Bio-Rad, USA). The PCR products were separated by electrophoresis in a horizontal gel system at 100 V for 4 h in a 4% MetaPhor agarose gel. A 100 bp ladder (Thermo Scientific) was used for size determination of the amplified products. Polymorphism was visualized by staining the gel with ethidium bromide, and the gel was photographed with a gel documentation system (Bio-Rad, USA).

ISSR amplification
One hundred ISSR primers were used for initial screening, out of which 73 primers gave informative banding patterns with good reproducibility. The selected 73 primers were 15-20-mers, of which 54.7% carried a di-nucleotide repeat motif, 31.5% a tri-nucleotide repeat motif, 8.21% a tetra-nucleotide repeat motif and 5.47% a penta-nucleotide repeat motif (Table S1). These were either unanchored or anchored at the 5' or 3' end by one to three partially degenerate selective nucleotides.
PCR amplification was performed in a volume of 20 µl containing 2 µl of DNA (50 ng/µl), 0.4 µM of each primer (Sigma-Aldrich), 200 µM of dNTPs (Sigma-Aldrich), 0.5 U Taq polymerase (Sigma-Aldrich) and 1X PCR buffer (Sigma-Aldrich). The sequence information of the 73 ISSR primers, along with their annealing temperatures, is given in Supplementary Table 1. After a pre-denaturation step of 5 min at 95°C, amplification reactions were cycled forty times at 95°C for 1 min, at the annealing temperature (optimized separately for each primer, generally Tm-5°C) for 2 min, and at 72°C for 1 min for polymerization in a thermocycler (Bio-Rad, USA). The PCR products were visualized by running on a 2% agarose gel, followed by staining with ethidium bromide. Finally, the gel was photographed as above.

Evaluating efficiency of different primers within each marker systems for diversity analysis
Within each marker system, the efficiency of each assay unit (that is, primer) was studied by: a) the number of scorable bands (NSB); b) the number of polymorphic bands (NPB); c) polymorphism information content (PIC); d) marker index (MI); e) the number of patterns (Tp); and f) discrimination power (D). These were calculated as follows. The number of scorable bands (NSB) represents the average number of DNA fragments amplified/detected per genotype using a marker system. Of these, some loci (fragments or bands) may be polymorphic (NPB). PIC for SSR markers was calculated according to Anderson et al. (1993). For ISSR markers, the PIC of a band (PICi) was calculated as follows: $\mathrm{PIC}_i = 1 - \sum_{j} f_{ij}^{2}$, where $f_{ij}$ is the frequency of the j-th pattern of the i-th band (note that a dominant marker has two patterns for a band, present and absent). The PIC of each ISSR primer was then calculated as: $\mathrm{PIC} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{PIC}_i$, where n is the NPB for that primer.
The utility of a given marker system is a balance between the level of polymorphism detected and the extent to which an assay can identify multiple polymorphisms. Marker index is the product of PIC and effective multiplex ratio (EMR) (Powell et al., 1996). EMR is estimated as $\mathrm{EMR} = \mathrm{NSB} \times \beta$, where $\beta$ is the fraction of polymorphic markers, estimated from the polymorphic loci ($n_p$) and non-polymorphic loci ($n_{np}$) as $\beta = n_p / (n_p + n_{np})$. Tp and D were calculated according to Tessier et al. (1999).
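The per-primer calculations for a dominant marker can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the band-level PIC is one minus the sum of squared pattern frequencies (present/absent), treats NSB as the number of bands scored for the primer, and uses hypothetical names (`band_pic`, `primer_summary`) and toy data.

```python
from typing import Dict, List, Sequence


def band_pic(band: Sequence[int]) -> float:
    """PIC of one dominant band: 1 - (f_present^2 + f_absent^2)."""
    f_present = sum(band) / len(band)
    f_absent = 1.0 - f_present
    return 1.0 - (f_present ** 2 + f_absent ** 2)


def primer_summary(bands: List[Sequence[int]]) -> Dict[str, float]:
    """Summarise one primer.

    bands: one 0/1 sequence per scored band; each sequence has one
    entry per genotype (1 = band present, 0 = band absent).
    """
    nsb = len(bands)
    # A band is polymorphic if it is present in some genotypes but not all.
    polymorphic = [b for b in bands if 0 < sum(b) < len(b)]
    npb = len(polymorphic)
    # Primer PIC: mean band PIC over the polymorphic bands only.
    pic = sum(band_pic(b) for b in polymorphic) / npb if npb else 0.0
    beta = npb / nsb if nsb else 0.0   # fraction of polymorphic bands
    emr = nsb * beta                   # effective multiplex ratio
    return {"NSB": nsb, "NPB": npb, "PIC": pic, "EMR": emr, "MI": pic * emr}


# Toy example: three bands scored on four genotypes (made-up data).
toy_bands = [[1, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 0]]
print(primer_summary(toy_bands))
```

With this definition EMR reduces to the number of polymorphic bands, so MI rewards primers that combine many polymorphic bands with evenly split presence/absence frequencies.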

Comparison of two marker systems for diversity analysis
To compare the discriminating capacity of the SSR and ISSR markers, the following statistical calculations were performed manually according to Belaj et al. (2003): a) the number of assay units (U); b) the number of polymorphic bands (np); c) the number of monomorphic bands (nnp); d) the average number of polymorphic bands/assay unit (np/U); e) the number of loci (L); f) the number of loci/assay unit (nu); g) the number of banding patterns (Tp); h) the average number of patterns/assay unit (I); i) the average confusion probability (C); j) the average discriminating power (D); and k) the average limit of discriminating power (DL).
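For a single assay unit, the confusion probability and discriminating power can be sketched as below. The formulas are the commonly cited form attributed to Tessier et al. (1999), stated here as an assumption: C = Σ p_i (N p_i − 1)/(N − 1), D = 1 − C and DL = 1 − Σ p_i², where p_i is the frequency of the i-th banding pattern among N genotypes. The function name and toy patterns are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def discriminating_power(patterns: Sequence[Hashable]) -> Dict[str, float]:
    """Tp, C, D and DL for one primer.

    patterns: the banding pattern observed in each genotype, e.g. a
    string such as "101" or a tuple of 0/1 band scores.
    """
    n = len(patterns)
    if n < 2:
        raise ValueError("at least two genotypes are required")
    counts = Counter(patterns)
    # Probability that two genotypes drawn at random (without
    # replacement) share a pattern; with p_i = k/n, N*p_i - 1 = k - 1.
    c = sum((k / n) * (k - 1) / (n - 1) for k in counts.values())
    # Limit of D as the number of genotypes tends to infinity.
    dl = 1.0 - sum((k / n) ** 2 for k in counts.values())
    return {"Tp": len(counts), "C": c, "D": 1.0 - c, "DL": dl}


# Toy example: four genotypes, three distinct patterns (made-up data).
print(discriminating_power(["10", "10", "01", "11"]))
```

Averaging C, D and DL over all primers of a marker system would give system-level values comparable in kind to those listed above.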
Several other genetic diversity parameters, viz. effective number of alleles (Ne), Shannon's index (I) and expected heterozygosity (He), were determined using GenAlEx 6.5.
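The three indices have simple textbook definitions at a single locus, sketched below from allele frequencies: Ne = 1/Σp², I = −Σ p ln p and He = 1 − Σp². This is an illustration only; it does not reproduce GenAlEx, whose treatment of dominant binary data involves additional assumptions not shown here, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def diversity_indices(allele_counts: Sequence[float]) -> Dict[str, float]:
    """Ne, Shannon's I and He for one locus.

    allele_counts: counts (or frequencies) of each allele at the locus;
    they are normalised to sum to 1 before use.
    """
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive value")
    p = [c / total for c in allele_counts if c > 0]
    homozygosity = sum(x * x for x in p)
    return {
        "Ne": 1.0 / homozygosity,                 # effective number of alleles
        "I": -sum(x * math.log(x) for x in p),    # Shannon's information index
        "He": 1.0 - homozygosity,                 # expected heterozygosity
    }


# Toy example: two equally frequent alleles versus a skewed locus.
print(diversity_indices([50, 50]))
print(diversity_indices([90, 10]))
```

All three indices increase with both the number of alleles and the evenness of their frequencies, which is why multi-allelic loci tend to score higher than bi-allelic ones.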

Cluster analysis
For this analysis, each amplified band was scored in binary code, based on the presence (1) or absence (0) of the band. The NTSYS-pc ver 2.2 statistical package (Rohlf, 2000) was used to analyze the resulting binary matrices. Three data sets were utilized, viz. SSR, ISSR and the combined dataset of SSR and ISSR. The binary qualitative data matrices were used to construct similarity matrices based on Jaccard similarity coefficients (Jaccard, 1908). The similarity matrices were then used to construct dendrograms using the unweighted pair group method with arithmetic average (UPGMA). To compare the SSR and ISSR based dendrograms, cophenetic matrices were derived from the dendrograms using the COPH (cophenetic values) program, and the goodness-of-fit of the clustering to the two data matrices was calculated by comparing the original similarity matrices with the cophenetic value matrices using the Mantel matrix correspondence test (Mantel, 1967) in the MXCOMP program. Similarly, a dendrogram was also constructed for the combined dataset of SSR and ISSR markers.
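The clustering workflow described above can be sketched in plain Python: Jaccard similarities from 0/1 profiles, UPGMA (average-linkage) merging, and the cophenetic correlation as the Pearson correlation between original and cophenetic similarities. This is a small illustrative stand-in for the NTSYS-pc programs, not their implementation; the permutation-based significance of the Mantel test is omitted, and all function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity of two 0/1 profiles: shared bands / bands in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def similarity_matrix(profiles):
    """Pairwise similarities keyed by (i, j) with i < j."""
    return {(i, j): jaccard(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}


def upgma_cophenetic(sim, n):
    """Run UPGMA on similarities and return cophenetic similarities.

    The cophenetic value of a pair is the average similarity at which
    the two items are first merged into the same cluster.
    """
    def avg(c1, c2):
        return sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    coph = {}
    while len(clusters) > 1:
        # Merge the two clusters with the highest average similarity.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg(clusters[ab[0]], clusters[ab[1]]))
        level = avg(clusters[a], clusters[b])
        for i in clusters[a]:
            for j in clusters[b]:
                coph[min(i, j), max(i, j)] = level
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return coph


def cophenetic_correlation(sim, coph):
    """Pearson correlation between original and cophenetic similarities."""
    keys = sorted(sim)
    xs, ys = [sim[k] for k in keys], [coph[k] for k in keys]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy example: four genotypes scored for four bands (made-up data).
profiles = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
sim = similarity_matrix(profiles)
coph = upgma_cophenetic(sim, len(profiles))
print(round(cophenetic_correlation(sim, coph), 3))
```

A correlation close to 1 indicates that the dendrogram distorts the original similarity matrix very little.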

RESULTS
Marker index (MI), considered to be an overall measure of the efficiency to detect polymorphism, was obtained in the range of 0.06-4.85 (average 1.20). The 21 informative primers, designated so on the basis of high PIC, also exhibited a high marker index value (more than 1.5). Primer NAU-3008 yielded the highest MI value (4.85), as expected, because it had the highest PIC and EMR. The discriminating power (D) of a primer depends on the number of fragments it generates as well as on the frequency of the banding patterns. In the present study, the maximum value of discrimination power (D) observed was 0.927 (NAU-3008) while the lowest was 0.44 (NAU-3675, BNL 1434, BNL 1694, NAU 923 and MUSS 439), with an overall mean value of 0.442 for the 62 SSR loci. The 21 above-mentioned informative primer pairs also exhibited high discrimination power (values of D more than 0.6), and thus these 21 primers were categorized as highly informative and discriminative primers (Table 3).
On the basis of higher values of MI (more than 1), D (more than 0.8) and PIC (more than 0.3), 53 primer pairs were identified as very efficient for the present genetic diversity analysis (Table 4). Further, in addition to these 53 primer pairs, 7 more primer pairs, viz. ISSR-15, 16, 59, 80, 87, 96 and 103, exhibited higher values of D (more than 0.8), though MI values were considerably low for some.

Comparison of marker systems
Performance of the two marker systems was compared based on two main aspects: the discriminating capacity (that is, the efficiency of discrimination) between any two genotypes taken at random from the studied genotypes; and the overall efficiency in detecting polymorphisms across all the studied genotypes.
Overall, SSR markers were more polymorphic (100% polymorphic bands) than ISSR (94.3% polymorphic bands); however, the number of polymorphic bands per assay unit was higher in ISSR (3.63) than in SSR (2.7). SSR markers are locus specific, so only one locus was analyzed per assay, giving 62 loci overall. ISSR primer pairs produced 281 bands, with each band considered as one locus, resulting in an average of 3.84 loci per assay unit. ISSR produced a higher number of banding patterns (658) than SSR (175), and so the average number of banding patterns per assay unit was also higher for ISSR (9.01) than for SSR (2.8).
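The per-assay averages quoted above follow from the reported totals. As a rough arithmetic check, taking the number of polymorphic ISSR bands to be about 94.3% of 281 (an inference from the reported percentage, not a count stated in the text):

$$
\begin{aligned}
\text{ISSR polymorphic bands per assay} &\approx \frac{0.943 \times 281}{73} \approx \frac{265}{73} \approx 3.63,\\
\text{ISSR loci per assay} &= \frac{281}{73} \approx 3.85,\\
\text{ISSR patterns per assay} &= \frac{658}{73} \approx 9.01,\\
\text{SSR polymorphic bands per assay} &= \frac{170}{62} \approx 2.74,\\
\text{SSR patterns per assay} &= \frac{175}{62} \approx 2.82.
\end{aligned}
$$

Small differences from the quoted figures reflect rounding.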
The number of effective alleles (Ne) in all 65 genotypes examined was higher in the SSR assay (2.112) than in the ISSR assay (1.397), while the average discriminating capacity (D) was distinctly higher for ISSR (0.862) than for SSR (0.442) (Figure 1). The average limits of discriminating power (DL) for both marker systems were found to be very close to the actual values of discriminating power (D).
A higher value of the Shannon index (I) was obtained for SSR (0.740), with ISSR yielding a comparatively low value (0.421) (Figure 1). The average expected heterozygosity (He) values calculated for SSR and ISSR were 0.433 and 0.264, respectively. The average PIC was found to be the same for both SSR and ISSR markers (0.38) in the studied genotypes, while the average MI was slightly higher for ISSR than for SSR markers.

Cluster analysis
Dendrograms obtained using the UPGMA method based on the SSR, ISSR and SSR + ISSR data sets (Figure 2) clearly distinguished all the genotypes of the race 'bengalense' of G. arboreum. Genetic similarity coefficients were obtained in the range of 0.62-0.82 for SSR markers, 0.56-0.86 for ISSR markers and 0.59-0.80 for the combined data of SSR and ISSR markers.
Five main clusters were formed in all three dendrograms.Each cluster consists of a different number of genotypes with different genetic similarity coefficients.
In the dendrograms based on SSR, ISSR and SSR + ISSR, the first cluster consisted of 11, 14 and 9 genotypes, respectively, in which CISA-6-187 was found to be more distant than the other genotypes in all three dendrograms. The second cluster consisted of 23, 21 and 21 genotypes, respectively, showing almost similar groupings of genotypes but with some differences in the similarity coefficients between genotypes. For example, with SSR markers, DLSA-17 and CISA-6-256 exhibited a maximum similarity coefficient of 0.82, while with ISSR the maximum value (0.86) was for CISA-6 and CISA-8. For the combined dataset, a maximum similarity coefficient (0.785) within cluster 2 was obtained for CISA-614 and RG-541. The third cluster of the SSR-based dendrogram consisted of 23 genotypes, while in the dendrograms based on ISSR and combined data, the third cluster consisted of nine genotypes each. Similar observations were made for cluster four, which consisted of 4, 16 and 19 genotypes in the SSR, ISSR and SSR + ISSR based dendrograms, respectively; CISA-7 and CISA-294 were found to be closer to each other than to the rest of the genotypes in the ISSR and SSR + ISSR dendrograms, but in the SSR dendrogram these two genotypes were present in cluster 3. Cluster five consisted of a similar number of genotypes, that is, 4, 5 and 7, in the three dendrograms. In this cluster, LD-1010 was found to be more distant than the rest of the genotypes in all three dendrograms.
Cophenetic correlation coefficients for individual techniques based on genetic similarity value matrices were obtained using the Mantel matrix correspondence test.High correlation coefficient values were obtained for ISSR markers (r = 0.94), for combined data set (SSR + ISSR) marker (r = 0.91) and for SSR markers (r = 0.87).All three dendrograms showed almost similar groupings with some differences in the genetic similarity

DISCUSSION
During the past few decades, molecular markers have been commonly used for assessing genetic diversity, which is the basis for the genetic improvement of any given species. The choice of the right molecular marker depends on the specific application, the presumed level of polymorphism, the presence of sufficient technical facilities, time constraints and financial limitations (Kumar et al., 2009). Sometimes the combined use of two or more markers for the study of genetic diversity has been found to be better than the respective individual markers (Anna Serra et al., 2007). In the past, a variety of molecular markers such as RAPD, ISSR and SSR have been used for estimating the genetic diversity in G. arboreum (Dongre et al., 2011; Bardak and Bolek, 2012; Noormohammadi et al., 2013a). SSR are locus specific, co-dominant markers, and are considered ideal for fingerprinting, while ISSR are multi-locus, dominant markers, and have been found very efficient for diversity analysis. Therefore, the present study documents the comparative utility of these marker types for genetic diversity studies in accessions belonging to G. arboreum race bengalense.

Marker polymorphism
Both SSR and ISSR markers revealed a similar level of polymorphism, as shown by the same average PIC value (0.38) obtained for each. The average PIC value for SSR markers obtained during the present study was lower than that obtained by Kantartzi et al. (2009) (average PIC 0.42) while genotyping various G. arboreum genotypes with SSR markers, though their highest PIC (0.75) was lower than that obtained during the present study (0.809). ISSRs are dominant markers, and therefore a maximum PIC value of 0.50 can be expected for a given ISSR locus. During the present investigation, this threshold was reached for one marker, ISSR-82, while values close to the threshold (0.45 to 0.47) were obtained for other markers, including ISSR-1 and ISSR-62. A PIC range of 0.00 to 0.5 with an average of 0.321 was also obtained previously in another study using ISSR markers for some tetraploid cottons (Noormohammadi et al., 2013b).
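The ceiling of 0.50 for a dominant marker can be seen from a short derivation, assuming the band-level PIC is defined as one minus the sum of squared pattern frequencies. If a band is present in a fraction $f$ of the genotypes and absent in the remaining $1 - f$, then

$$
\mathrm{PIC}_i = 1 - \left[f^{2} + (1 - f)^{2}\right] = 2f(1 - f),
$$

which is maximal when $f = 0.5$, giving $\mathrm{PIC}_i = 2 \times 0.5 \times 0.5 = 0.5$. A band present in, say, 36% of the genotypes would give $2 \times 0.36 \times 0.64 \approx 0.46$, so values of 0.45 to 0.47 correspond to bands occurring in roughly one-third to two-thirds of the genotypes.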
In addition to PIC, certain other parameters such as MI and D have been documented as very useful for evaluating the efficiency of molecular markers (Belaj et al., 2003; Myskow et al., 2010). The utility of any given marker is found in a balance between the level of polymorphism it can detect and its capacity to identify multiple polymorphisms (Powell et al., 1996). The MI is considered to be an overall measure of the efficiency of a marker to detect polymorphism, and is related to the EMR value. Discriminating power is considered a good estimator of the efficiency of a primer or locus. It depends not only on the number of patterns generated, but also on their relative frequency (Tessier et al., 1999). On the basis of these factors, a core set of 21 SSR primers (Table 3) was identified as highly informative markers with high PIC, very good discriminative power and MI. Likewise, 54 ISSR primer pairs could be identified on the basis of higher MI values. Multi-locus marker systems like ISSR are expected to produce higher EMR and MI than single-locus SSRs (Belaj et al., 2003). Markers with higher EMR and MI values are better for analysis of both interspecific and intraspecific genetic diversity (Singh et al., 2014). Several studies report such identification of a core set of highly polymorphic and discriminative markers to be very helpful for varietal identification and genetic diversity assessment (Masi et al., 2003; Jain et al., 2004; Kantartzi et al., 2009).

Comparative utility of marker system
The selection of a particular type of molecular marker is important and depends critically on the intended use (Gupta et al., 2002). The discriminative abilities of both marker systems were compared using selected parameters that have been used earlier for such purposes in some studies (Mukherjee et al., 2013). The presence of rare bands/alleles can produce a low frequency of patterns and result in lower D values. ISSR markers exhibited a considerably higher number of banding patterns, more polymorphic bands per assay and higher discriminative power than SSR during the present investigation. A similar edge of ISSR over SSR in terms of discriminative capability for a given set of genotypes has also been observed in certain other studies (Singh et al., 2014).
SSR markers are locus specific, multi-allelic and codominant in nature. These have been found to detect higher levels of polymorphism and so, generally, are the markers of choice in plant genetics and breeding (Kantartzi et al., 2009). ISSR are bi-allelic (hence supposed to be less informative) and are locus unspecific, but are more randomly distributed throughout the genome than SSR (Kumar et al., 2009). This abundance of ISSR sometimes compensates for their bi-allelic nature and may make them very informative for a given germplasm (Vijayan, 2005). Further, the low development and running cost makes ISSR more suitable than SSR (Vijayan, 2005).
During the present study, SSR markers outperformed ISSR in terms of the Ne, I and He parameters. Ne represents the number of equally frequent alleles it would take to achieve a given level of gene diversity. The Shannon index (I) is a diversity index that is used to characterize species diversity and is an indicator of both the abundance and evenness of the species present. The high heterozygosity in the case of SSR markers is due to their co-dominant, multi-allelic nature, which permits the detection of a high number of alleles per locus, whereas ISSR markers are bi-allelic in nature (Belaj et al., 2003).
During the present study, the average PIC value for SSR (0.38) was on the lower side, as SSR, being codominant, can yield PIC values in the range of 0 to 1.0. On the other hand, the same average PIC (0.38) is relatively high for ISSR, since the range for dominant markers is 0 to 0.5. Further, ISSR also showed better utility in detecting multiple polymorphisms, as revealed by high MI and high EMR (Table 4).

Figure S1. a) SSR profile with primer BNL2960 and b) ISSR profile with primer ISSR 40 of the selected 65 genotypes (numbers are as per Table 1).

Table 2. Different SSR and ISSR primers used for the present study.

Table 3. Description of 21 selected SSR markers for all the studied genotypes of G. arboreum.
NSB, number of scorable bands; NPB, number of polymorphic bands; PIC, polymorphic information content; MI, marker index; Tp, number of banding patterns; D, discriminative ability.

Table 4. Description of 53 ISSR markers for all the studied genotypes of G. arboreum.
NSB, number of scorable bands; NPB, number of polymorphic bands; PIC, polymorphic information content; MI, marker index; Tp, number of banding patterns; D, discriminative ability.