Characterization of newly developed expressed sequence tag (EST)-derived simple sequence repeat (SSR) markers and their application in diversity analysis of eggplant

A total of 101,270 eggplant expressed sequence tag (EST) sequences in public databases were searched for simple sequence repeats (SSRs), and 405 potential SSR loci were identified in 388 sequences. Trinucleotide repeats were the most abundant (34.07%, 138), followed by dinucleotide (19.51%, 79) and hexanucleotide (15.8%, 64) repeats. Among the dinucleotide repeats, AG/CT was the most common (55.69%), followed by AT/AT (31.64%) and AC/GT (12.66%). In total, 288 primer pairs were developed from these sequences. A random set of 100 EST-SSR primer pairs was tested in 12 eggplant accessions, and 88 successfully amplified the expected PCR products. Thirty-two markers revealed 83 polymorphic alleles among the 42 cultivated accessions, with 2 to 6 alleles per locus (mean 2.6). Polymorphism information content (PIC) values among the 42 cultivated accessions ranged from 0.045 to 0.701 (mean 0.289). The markers showed low transferability to other Solanaceous species. The 32 polymorphic SSRs were used to evaluate genetic diversity. These SSRs will be valuable markers for future genetic studies, such as genetic diversity estimation, linkage mapping, association mapping and molecular breeding.

Eggplant (Solanum melongena L.), a member of the Solanaceae, is an important vegetable in many countries. It is a good source of minerals and vitamins, as well as polyphenols with potent antioxidant activity (Nisha et al., 2009; Sudheesh et al., 1999). Despite its widespread cultivation and economic importance, molecular genetic studies of eggplant lag behind those of tomato, potato and pepper, especially in the construction of high-density linkage maps (Nunome et al., 2009). Several linkage maps have been reported in eggplant (Barchi et al., 2010; Cao et al., 2006; Doganlar et al., 2002a, b; Nunome et al., 2001, 2003, 2009; Sunseri et al., 2003; Wu et al., 2009); however, the mapping populations used were limited. To date, all interspecific F2 populations have been developed from crosses between Solanum linnaeanum and S. melongena (Doganlar et al., 2002a, b; Sunseri et al., 2003; Wu et al., 2009), and the other maps were all constructed from intraspecific populations, owing to barriers to interspecific hybridization. SSR markers have been developed and used for mapping in eggplant (Nunome et al., 2009); however, the level of intraspecific DNA marker polymorphism is rather limited, and many more markers are needed to construct a high-density linkage map. Because eggplant genetic research has lagged behind that of other crops, few eggplant sequences have been available in public databases. Fukuoka et al. (2009) submitted a large number of EST sequences to the public databases, comprising 98,086 ESTs totaling 50,438,137 bp, which enriched the eggplant databases and provided ample material for SSR mining and marker development.
In this study, we report the characterization of novel eggplant EST-SSR markers developed from public databases and their application in genetic diversity evaluation.

Plant materials and DNA extraction
Forty-eight morphologically different eggplant accessions were selected for this study (Table 1). Accessions 1 to 6 were wild relatives, and accessions 7 to 48 were cultivated. The accessions in sample panel 1 were used to test PCR amplification and detect polymorphisms. SSRs that were polymorphic among the six cultivated accessions in panel 1 were subsequently tested against panels 1 and 2 for data acquisition and analysis. Panel 3, comprising three tomato accessions, two pepper accessions and one potato accession, was used to assess the transferability of the eggplant EST-SSRs (Table 1). DNA was extracted following an improved procedure (Paterson et al., 1993), except that DIECA was omitted from the extraction buffer.

Eggplant EST data retrieval and SSR detection
A total of 101,270 eggplant EST sequences were retrieved from NCBI and the Sol Genomics Network (SGN): 98,089 ESTs from the NCBI website and 3,181 ESTs from SGN (http://solgenomics.net). Poly(A) and poly(T) tracts were removed with the EST-trimmer software (http://pgrc.ipk-gatersleben.de/misa/), applying the criterion that no 50-bp window contained a run of five A's or five T's. The ESTs were then assembled using the CAP3 assembly program (Huang and Madan, 1999). Microsatellites were identified and localized with the MISA software (http://pgrc.ipk-gatersleben.de/misa/) (Thiel et al., 2003; Zhang et al., 2002), using the following minimum numbers of repeats: 20 for mononucleotide, 10 for dinucleotide, 7 for trinucleotide, 5 for tetranucleotide, 4 for penta- and hexanucleotide, and 3 for heptanucleotide motifs. Primer pairs were designed from the flanking sequences using PRIMER3 (Rozen and Skaletsky, 2000) in batch mode via the p3_in.pl and p3_out.pl Perl5 scripts of the MISA package. The target amplicon size was set to 100 to 400 bp. Melting temperatures from 55 to 59°C were tested, and the optimal temperature was 57°C. Redundant primer pairs were identified with BLAST (Altschul et al., 1990) using the main parameters "-p blastn -m 8 -F F -e 0.32". The output was parsed with a Perl script that extracted ungapped matches of primer pairs to the same sequence; primer pairs located in the same sequence and flanking the same SSR site were considered redundant.
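The MISA search criteria above amount to a minimum repeat count for each motif length. A minimal Python sketch of that rule follows, assuming perfect (uninterrupted) repeats only; it is not MISA itself, it ignores MISA's handling of compound and interrupted SSRs, and the names `find_ssrs` and `is_primitive` are hypothetical. Motifs that are themselves repeats of a shorter unit (for example `ATAT`) are skipped so that an (AT)n run is reported once, as a dinucleotide.

```python
import re

# Minimum repeat numbers per motif length, taken from the search criteria above.
MIN_REPEATS = {1: 20, 2: 10, 3: 7, 4: 5, 5: 4, 6: 4, 7: 3}


def is_primitive(motif: str) -> bool:
    """True unless the motif is a repeat of a shorter unit (e.g. 'ATAT' or 'AA')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs meeting MIN_REPEATS."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # Group 2 captures one copy of the motif; \2{n,} requires further tandem copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)


# Hypothetical EST fragment carrying (AG)12 and (AAG)8.
est = "GC" + "AG" * 12 + "TTCC" + "AAG" * 8 + "C"
print(find_ssrs(est))  # [(2, 'AG', 12), (30, 'AAG', 8)]
```

A (AG)12 run passes the dinucleotide threshold of 10 repeats and (AAG)8 passes the trinucleotide threshold of 7, while the longer-period matches of the same runs (AGAG, AAGAAG) are discarded as non-primitive.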

PCR amplification and polyacrylamide gel analyses
PCR was carried out in a 10 µl reaction mixture containing 20 ng template DNA, 0.1 µM each of forward and reverse primer, 2.5 mM MgCl2, 0.2 mM dNTPs, 1× Taq buffer and 1 U Taq DNA polymerase (Shanghai Promega). Amplification was performed in a 96-well thermocycler (Eppendorf AG 6321) with the following program: initial denaturation at 95°C for 2 min; 30 cycles of 94°C for 45 s, 55°C for 45 s and 72°C for 60 s; and a final extension at 72°C for 7 min, after which products were held at 4°C. The PCR products were separated on 5% polyacrylamide gels and visualized by silver staining (Zhang et al., 2002). Motifs with a frequency of <0.5% are not listed in Table 2.
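As a quick check on the cycling program, it can be written down as data and its programmed hold time summed. This is a minimal sketch; the `Step` and `Stage` classes are hypothetical, and the total is a lower bound on run time because ramping between temperatures and the final 4°C storage hold are not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in °C
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    name: str
    steps: tuple[Step, ...]
    cycles: int = 1


# Cycling program as described above.
PROGRAM = (
    Stage("initial denaturation", (Step(95, 120),)),
    Stage("amplification", (Step(94, 45), Step(55, 45), Step(72, 60)), cycles=30),
    Stage("final extension", (Step(72, 420),)),
)


def hold_minutes(program: tuple[Stage, ...]) -> float:
    """Total programmed hold time; ramp times and the final 4 °C storage hold are ignored."""
    return sum(st.cycles * sum(s.seconds for s in st.steps) for st in program) / 60


print(hold_minutes(PROGRAM))  # 84.0
```

The 30 amplification cycles account for 75 of the 84 minutes (30 × 150 s).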

Data acquisition and statistical analysis
The amplified EST-SSR bands were scored as present (1) or absent (0), generating a binary data matrix. The polymorphism information content (PIC) of each EST-SSR marker was calculated according to Anderson et al. (1993) across the 42 cultivated genotypes. Cluster analysis was performed on Jaccard's similarity coefficients using the unweighted pair-group method with arithmetic averages (UPGMA), and a dendrogram was constructed using NTSYS-pc version 2.10t (Rohlf, 2000).
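Anderson et al. (1993) define the PIC of marker i as PIC_i = 1 − Σ_j p_ij², where p_ij is the frequency of the jth allele at that marker. A minimal Python sketch follows, assuming allele frequencies are estimated as each allele's share of all scored allele calls at the locus, with missing data removed beforehand; the function name `pic` and the allele sizes in the example are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def pic(allele_calls: Iterable[Hashable]) -> float:
    """PIC = 1 - sum(p_j ** 2) over the alleles observed at one locus.

    allele_calls holds one entry per scored allele (e.g. band size in bp);
    allele frequencies p_j are estimated as each allele's share of all calls.
    """
    counts = Counter(allele_calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles scored at this locus")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical locus scored in six accessions (allele sizes in bp).
print(round(pic([180, 180, 180, 186, 186, 192]), 3))  # 0.611
```

With allele frequencies 1/2, 1/3 and 1/6, PIC = 1 − (1/4 + 1/9 + 1/36) = 11/18 ≈ 0.611; a monomorphic locus gives 0, and a locus with one rare allele in an otherwise uniform panel gives a value close to 0.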

Development of EST-SSR markers and their polymorphisms
All 388 SSR-containing sequences were submitted to PRIMER3 for primer design, and only 289 EST-SSR primer pairs (74.48%) were obtained. Among the 289 primer pairs, only one was redundant, leaving 288 non-redundant primer pairs, all of which differ from previously reported eggplant SSR primers (Nunome et al., 2003, 2009; Stagel et al., 2008; Tumbilen et al., 2011). From the 288 primer pairs, a random set of 100 was selected for PCR optimization, characterization and amplification in the 48 selected eggplant accessions. Of these, 88 primer pairs (88%) successfully amplified products of the expected size (Figure 1), while the remaining 12 (12%) failed to amplify or amplified only weakly. Among the 88 working primer pairs, 9 were monomorphic among the 12 accessions of panel 1, whereas the remaining 79 revealed 323 polymorphic alleles. Among the wild relatives, 64 of the 79 markers revealed 155 polymorphic alleles; among the cultivated accessions, 32 of the 79 markers revealed 83 polymorphic alleles. Details of these 32 EST-SSR primer pairs are listed in Table 3. Of the 32, two (EES019 and EES033) failed to amplify any product in the wild relatives but were polymorphic among the cultivated accessions, and another four (EES020, EES035, EES076 and EES084) were monomorphic among the wild relatives but polymorphic among the cultivated accessions. In total, 83 alleles were amplified from the 42 cultivated accessions, with 2 to 6 alleles per locus (mean 2.6). PIC values across the 42 cultivated accessions ranged from 0.045 to 0.701 (mean 0.289); EES038 had the highest PIC, and EES033 and EES067 the lowest. The correlation coefficient between PIC and SSR length was 0.03. Overall, 32% of the tested EST-SSRs (32 of 100) were polymorphic among the cultivated accessions.
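The text reports a correlation of 0.03 between PIC and SSR length without naming the coefficient. Assuming it is Pearson's product-moment r, it can be computed as below; this is a minimal sketch, the function name `pearson_r` is hypothetical, and the per-marker values are invented for illustration only.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-marker values: SSR length (bp) and PIC among cultivated accessions.
ssr_length = [20, 21, 24, 18, 30]
pic_values = [0.31, 0.12, 0.45, 0.20, 0.28]
print(round(pearson_r(ssr_length, pic_values), 2))
```

A value near 0, such as the reported 0.03, means that SSR length explains essentially none of the variation in PIC among the 32 markers.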

Transferability of EST-SSRs in Solanaceae
The transferability of the developed EST-SSR markers was evaluated in three Solanaceous crops: tomato, pepper and potato. Of the 100 selected primer pairs, 31 (31%) successfully amplified PCR products in at least one of the three species: 27 (27%) in tomato, 27 (27%) in potato and 24 (24%) in pepper. Of the 27 primer pairs that produced amplicons in tomato, 3 (EES043, EES080 and EES081) were polymorphic among the three tomato accessions. Overall, transferability to tomato, potato and pepper was low.
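Because one primer pair can amplify in several species, the per-species counts (27, 27 and 24) sum to far more than the 31 primer pairs that amplified in at least one species. The sketch below makes that distinction explicit; the function `transfer_summary` and the primer IDs `P1` to `P4` are hypothetical, and the scoring is invented for illustration.

```python
from collections import Counter


def transfer_summary(amplified: dict[str, set[str]], species: list[str]):
    """Count primers amplifying in each species and in at least one of them.

    amplified maps a primer ID to the set of species in which it gave a product.
    """
    targets = set(species)
    per_species = Counter()
    for hits in amplified.values():
        per_species.update(hits & targets)  # one count per species hit by this primer
    in_any = sum(1 for hits in amplified.values() if hits & targets)
    return {sp: per_species[sp] for sp in species}, in_any


# Hypothetical scoring of four primer pairs.
amplified = {
    "P1": {"tomato", "potato"},
    "P2": {"pepper"},
    "P3": set(),
    "P4": {"tomato", "potato", "pepper"},
}
counts, in_any = transfer_summary(amplified, ["tomato", "potato", "pepper"])
print(counts, in_any)  # {'tomato': 2, 'potato': 2, 'pepper': 2} 3
```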

Diversity analysis
A dendrogram based on the similarity coefficients of the 48 accessions was constructed (Figure 2). Similarity coefficients in the dendrogram ranged from 0.21 to 0.91. The dendrogram showed a clear separation between the cultivated species and the wild relatives. Hong Qie, Congo Qie, Guangshang Qie and Jiló, all of which belong to Solanum aethiopicum, were grouped into cluster I. Cluster II contained two wild species, Suanjie Qie (Solanum sisymbriifolium) and Huangguo Qie (Solanum xanthocarpum). Cluster III consisted of the cultivated species (S. melongena). Subcluster III-1 contained only Bai Qie, a prickly plant that bears small, green, striped, round fruit, suggesting that Bai Qie is closely related to the wild accessions. Subcluster III-2 contained the other cultivated accessions, with a mean similarity of 0.62. The cluster most distant from the cultivated group comprised the S. aethiopicum accessions (mean similarity 0.25), whereas the cluster closest to it contained S. sisymbriifolium and S. xanthocarpum (mean similarity 0.30).
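The clustering behind this dendrogram combines Jaccard's similarity on the binary band matrix with UPGMA, in which the similarity between two clusters is the mean of all pairwise similarities between their members. The authors used NTSYS-pc; the pure-Python sketch below is not that implementation, only an illustration of the same two steps, and the functions `jaccard` and `upgma` and the four band profiles are hypothetical.

```python
from itertools import combinations


def jaccard(a: list[int], b: list[int]) -> float:
    """Jaccard similarity of two 0/1 band profiles: shared bands / bands present in either."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        raise ValueError("both profiles have no bands; Jaccard is undefined")
    return shared / either


def upgma(profiles: list[list[int]]):
    """Merge accessions by UPGMA on Jaccard similarity.

    Returns (cluster_a, cluster_b, average_similarity) per merge, in merge order;
    clusters are tuples of accession indices.
    """
    n = len(profiles)
    sim = [[1.0 if i == j else jaccard(profiles[i], profiles[j]) for j in range(n)]
           for i in range(n)]
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # UPGMA: cluster similarity = mean of all pairwise member similarities.
            avg = sum(sim[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or avg > best[0]:
                best = (avg, a, b)
        avg, a, b = best
        merges.append((clusters[a], clusters[b], avg))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [clusters[a] + clusters[b]]
    return merges


# Hypothetical band profiles (1 = band present) for four accessions.
profiles = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1],
]
for a, b, s in upgma(profiles):
    print(a, b, round(s, 2))
```

Here accessions 0 and 1 join first (similarity 0.75), then 2 and 3 (about 0.67), and the two groups join at 0.2, the mean of the four cross-group similarities; the join levels are what a dendrogram scale such as 0.21 to 0.91 reports.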

Characterization of EST-SSRs
The finding that trinucleotide, dinucleotide and hexanucleotide repeats made up the majority of EST-SSRs agrees with previous observations of SSR repeat units in barley, maize, rice, sorghum, wheat and grape (Huang et al., 2010; Kantety et al., 2002). Trinucleotide motifs represented the most common class in expressed sequences (Stagel et al., 2008). The dominance of trinucleotide and hexanucleotide SSRs has been attributed to selection against frameshift mutations in coding regions: gain or loss of repeat units that are multiples of three nucleotides (one codon) preserves the reading frame (Huang et al., 2010). Fukuoka et al. (2010) likewise reported that AG/CT was the most common dinucleotide repeat, followed by AT/AT and AC/GT, and the same pattern has been observed in pepper. The predominance of AG/CT also agrees with results in grape (Huang et al., 2010) and cotton (Lu et al., 2010). In eggplant, Stagel et al. (2008) also reported that trinucleotides were the most common repeat type, with AAG/CTT the most frequent motif. In summary, many researchers have obtained similar results in Solanaceous crops (Fukuoka et al., 2010; Nunome et al., 2003, 2009; Stagel et al., 2008). However, the results were not identical, perhaps because the search criteria and the sizes of the sequence datasets differed.

Figure 2. A dendrogram constructed based on Jaccard's similarity coefficient and UPGMA clustering. The samples are labeled with the codes listed …
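The repeat classes compared in this discussion (AG/CT, AT/AT, AC/GT, AAG/CTT) group each motif with its rotations and with the rotations of its reverse complement, since GA, AG, TC and CT all describe the same repeat read from different phases or strands. The sketch below implements that common convention; it is not necessarily how MISA or the authors tallied motifs, and the names `revcomp` and `motif_class` and the example motifs are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_class(motif: str) -> str:
    """Label a repeat motif by its class, e.g. 'GA' -> 'AG/CT'.

    A class groups all rotations of the motif and of its reverse complement;
    the alphabetically first member names the class.
    """
    motif = motif.upper()
    variants = set()
    for s in (motif, revcomp(motif)):
        variants.update(s[i:] + s[:i] for i in range(len(s)))
    first = min(variants)
    return f"{first}/{revcomp(first)}"


# Hypothetical motifs as they might be reported by an SSR search.
print(Counter(motif_class(m) for m in ["GA", "CT", "AG", "TA", "TG", "CTT"]))
```

Tallying by class rather than by raw motif is what makes percentages such as 55.69% for AG/CT comparable across studies that report repeats on different strands.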

Development of EST-SSR markers and their polymorphisms
The correlation coefficient between PIC and SSR length (0.03) indicated no clear relationship between SSR length and informativeness. Similar observations have been reported in pepper and some other species (Nagy et al., 2007), whereas other studies found a correlation between repeat length and informativeness (Frary et al., 2005; Stagel et al., 2008; Tumbilen et al., 2011). Across 25 studies in a variety of plant species, an average of 17.7% of loci that produced PCR products were monomorphic (Squirrell et al., 2003). Low levels of DNA polymorphism have been observed for most SSR markers in cultivated eggplant. Nunome et al. (2003, 2009) reported that 56.7% of genomic SSRs and 30.3% of EST-SSRs were polymorphic, and Stagel et al. (2008) found that only 28.2% of SSRs were informative among cultivated eggplant. Our results likewise showed that only 32 of the 100 tested EST-SSRs were informative among the cultivated eggplant accessions, in agreement with these studies. This may reflect intensive breeding and the narrow genetic background of cultivated eggplant (Nunome et al., 2003).

Transferability of EST-SSRs in Solanaceae
In this study, transferability to tomato, potato and pepper was low. In Solanaceous plants, transferability between potato and tomato and from tomato to eggplant has been confirmed (Nunome et al., 2003; Stagel et al., 2008), and only a few tomato SSRs (15 of 600) have been reported to be applicable to eggplant (Li et al., 2010), in agreement with this study. The transferability frequency here was low compared with that in cucumber (Hu et al., 2010) and in previous studies in Solanaceae (Nunome et al., 2003, 2009; Stagel et al., 2008), all of which reached 50%. This may be because of the distant genetic relationships between eggplant and the other Solanaceous species and the larger number of ESTs used than in earlier studies. Although transferability was low, the transferable markers can be exploited as anchor markers for comparative mapping in Solanaceae.

Diversity analysis
The genetic relationships among the 48 eggplant genotypes, as revealed by SSR-based genetic similarity, were in good agreement with prior taxonomic classifications based on AFLP markers (Furini and Wunder, 2004), SRAP markers (Li et al., 2010) and SSR markers (Stagel et al., 2008). The finding that the S. aethiopicum accessions were not the closest to the cultivated accessions agrees with Stagel et al. (2008) but not with Furini and Wunder (2004) or Tumbilen et al. (2011). Tumbilen et al. (2011) reported that genetic diversity analysis based on molecular data is highly dependent on the number and type of markers chosen and on the plant accessions tested. The interpretation of genetic relationships also depends on the point of view of the scientist performing the analysis (breeder vs. taxonomist vs. molecular geneticist), so such results should be interpreted with caution.

Figure 1. Genotypes of the 30 accessions at the EES066 locus. M, marker. Genotypes and their order are shown in Table 1.

Table 2. Number and percentage of major EST-SSR repeat motifs in eggplant.