Development of microsatellite markers for use in breeding catfish, Rhamdia sp.

1 Laboratório de Engenharia Genética Animal – LEGA/Instituto de Biologia/Programa de Pós Graduação em Zootecnia. Universidade Federal de Pelotas; Pelotas, RS. Brasil. 2 Programa Pós-graduação em Genética e Biologia Molecular – Universidade Federal do Rio Grande do Sul. LEGA/ Universidade Federal de Pelotas; Pelotas, RS. Brasil. 3 Laboratório de Engenharia Genética Animal – LEGA/Instituto de Biologia. Universidade Federal de Pelotas; Pelotas, RS. Brasil. 4 Universidade Federal do Rio Grande do Sul. Departamento de Zootecnia, Universidade Federal do Rio Grande do Sul; Porto Alegre, RS, Brasil.


INTRODUCTION
Brazilian aquaculture has expanded in recent years on the basis of exotic species, including tilapia. Recently, however, interest has grown in incorporating native Brazilian fish species into this production system. Examples include tambaqui (Colossoma macropomum) and cachara (Pseudoplatystoma corruscans), for which strains are being developed under the Aquabrasil project. These species, however, are not used in the pisciculture of Rio Grande do Sul (RS), because of either legal or environmental restrictions. Pisciculture in RS is based on exotic species, including carp (Ctenopharyngodon idella, Cyprinus carpio, Hypophthalmichthys molitrix and Aristichthys nobilis) and tilapia (Oreochromis niloticus) (Brazil, 2012). The catfish (Rhamdia sp.) is the most representative native species in RS, with a production of more than 2,000,000 fingerlings (Brazil, 2012). Researchers consider the catfish the most promising native species for intensive production in the state because it adapts easily to different environments, climates and artificial diets, is simple to handle and has good commercial acceptance (Baldisserotto, 2004; Pouey et al., 2011).
Although cytogenetic studies have been carried out for catfish (Huergo and Zaniboni-Filho, 2006; Silva et al., 2007; 2011), the genetics of the Rhamdia sp. broodstock in the South and Southeast regions of Brazil is currently unknown. Knowledge of genetic variability and of patterns of population structure is a prerequisite for developing strategies for future genetic improvement programs. However, studies of genetic variability in catfish require the development of molecular markers. Among the currently available markers, microsatellites (Simple Sequence Repeats, SSR) have been used successfully in studies of population structure, species conservation and management of genetic resources (An et al., 2012).
Microsatellites are codominant and highly polymorphic, which makes it possible to study genetic differences between closely related populations (Na-Nakorn et al., 2010); they are therefore considered a valuable tool for population genetics. Developing microsatellite markers formerly required a large technical effort, with lengthy and costly procedures. These procedures include the construction of libraries enriched for SSR loci, cloning, hybridization to detect positive clones, plasmid isolation and Sanger sequencing (Castoe et al., 2012). However, advances in DNA sequencing technology, currently known as Next-Generation Sequencing (NGS), have provided more efficient and cost-effective methods for developing molecular markers for species without available data (Buschiazzo and Gemmell, 2006).
Studies indicate that this new technology will replace the conventional protocols for microsatellite isolation (Abdelkrim et al., 2009), and a growing number of reports use NGS to develop microsatellite markers for non-model species (Saarinen and Austin, 2010; Yu et al., 2011). The objective of this study was to develop microsatellite markers for catfish through NGS, in order to better understand the genetics of this species.

Animals and DNA extraction
Blood samples were collected from catfish at the Chasqueiro Pisciculture Station, which belongs to the Federal University of Pelotas and is located between 32°02'15'' and 32°11'07'' south latitude and 52°57'46'' and 53°11'18'' west longitude, in the municipality of Arroio Grande, RS, Brazil. DNA was extracted with the Blood Genomic DNA Miniprep Kit according to the manufacturer's instructions (Axygen Bioscience, USA). Extraction quality was checked on a 1% agarose gel stained with GelGreen (Biotium, USA) and visualized on a white light transilluminator (Clare Chemical, USA). Total DNA concentration was measured with a NanoDrop 2000c spectrophotometer (Thermo Scientific, USA).

Preparation of genomic library
A single shotgun paired-end library was prepared from catfish genomic DNA according to the standard protocol of the Illumina Nextera DNA Library Kit with dual indexing (Illumina, USA […] (Rozen and Skaletsky, 2000) for primer design. The GC content, the proportion of uncalled bases ("N"), the level of duplicated sequences and the quality of the sequences were calculated with the FastQC v0.10.0 program.
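Screening reads for di- to hexanucleotide repeats can be illustrated with a short regular-expression sketch. This is not the software used in the study: the function name `find_ssrs` and the minimum repeat numbers are assumptions chosen only for illustration.

```python
import re

# Hypothetical minimum number of tandem repeats per motif length
# (assumed values, not the thresholds used in the study).
DEFAULT_MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit,
            # e.g. "TCTC" reported as a tetranucleotide or "AA" as a dinucleotide.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

# Toy read: a (TC)8 dinucleotide repeat and an (ATGG)4 tetranucleotide repeat.
read = "GG" + "TC" * 8 + "AAGC" + "ATGG" * 4 + "TT"
print(find_ssrs(read))  # [(2, 'TC', 8), (22, 'ATGG', 4)]
```

A locus only counts as potentially amplifiable if suitable primers can also be placed in the flanking sequence, which this sketch does not check.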

Primer design
The following criteria were used for primer design: 1) GC content higher than 30%; 2) melting temperatures of 58 to 65°C, with a maximum difference of 2°C between the primers of a pair; 3) G or C as the last two nucleotides at the 3' end; and 4) a maximum poly-N of 4 nucleotides. When all criteria were met, a single primer pair was chosen per locus: the one with the highest score assigned by Primer3 and the largest amplified region around the repeat. For each locus, one of the primers carried the M13 sequence (5' TGT AAA ACG ACG GCC AGT 3'). Adding this sequence allows indirect identification of allele sizes and facilitates subsequent genotyping (Brownstein et al., 1996).
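The four criteria above can be expressed as a small filter. The sketch below is only an illustration: the function names are hypothetical, the melting temperatures are taken as inputs (for example, values reported by Primer3) rather than computed, and criterion 3 is read as "both of the last two 3' bases are G or C".

```python
M13_TAIL = "TGTAAAACGACGGCCAGT"  # M13 sequence quoted in the text

def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)

def longest_run(primer):
    """Length of the longest run of one repeated nucleotide (poly-N)."""
    longest = current = 1
    for prev, base in zip(primer, primer[1:]):
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
    return longest

def passes_criteria(fwd, rev, tm_fwd, tm_rev):
    """Apply the four design criteria to a primer pair.

    tm_fwd and tm_rev are melting temperatures in degrees C supplied by the
    caller (assumption: obtained from the primer design software).
    """
    for primer, tm in ((fwd.upper(), tm_fwd), (rev.upper(), tm_rev)):
        if gc_percent(primer) <= 30:                 # 1) GC content > 30%
            return False
        if not 58 <= tm <= 65:                       # 2) Tm within 58 to 65 C
            return False
        if any(b not in "GC" for b in primer[-2:]):  # 3) G/C at the 3' end
            return False
        if longest_run(primer) > 4:                  # 4) poly-N of at most 4
            return False
    return abs(tm_fwd - tm_rev) <= 2                 # 2) Tm difference <= 2 C

def add_m13_tail(primer):
    """Prepend the M13 sequence to the 5' end of a primer."""
    return M13_TAIL + primer.upper()

# Hypothetical primer pair, used only to exercise the checks.
fwd, rev = "ACGTTGCATGCAGTCAAGGC", "TGCATCAGGTTCAGATCGCC"
print(passes_criteria(fwd, rev, 60.1, 61.0))
print(add_m13_tail(fwd))
```

Ranking the surviving pairs by Primer3 score and amplicon size, as described in the text, would be a separate step.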

PCR and SSR amplifications
From the potentially amplifiable loci (PALs) obtained, a group of 12 loci was selected and amplified, and the resulting microsatellite fragments were then sequenced by the Sanger method. Amplifications were performed in a total volume of 25 μl containing 7.5 pmol of each primer, approximately 30 to 50 ng of template DNA, 0.2 mM dNTP, 1 unit of DreamTaq polymerase (Fermentas), 1.5 mM MgCl2 and 1× PCR buffer. The annealing temperature of each locus was tested by gradient PCR in an Eppendorf Mastercycler Gradient thermocycler (Eppendorf, Germany). PCR products were verified by electrophoresis on a 1% agarose gel and visualized by staining with GelGreen (Biotium, USA). The PCR products were sequenced using the MegaBace sequencing kit (Amersham Biosciences, Uppsala, Sweden) in a MegaBace 1000 capillary sequencer (Amersham Biosciences, Uppsala, Sweden). Sequences were analyzed using FinchTV 1.4.1 software (Geospiza, Inc., USA).
To confirm the polymorphism of the microsatellite loci, four tetranucleotide loci (Rq68040, Rq137981, Rq164109 and Rq51373) and one dinucleotide locus (Rq91253) were chosen and tested in six catfish broodstock populations located in the cities of São João do Polêsine, Cruzeiro do Sul, Passo do Sobrado, Três de Maio, Seberi and Mato Leitão in the state of Rio Grande do Sul (RS). A total of 172 animals were genotyped. The samples were run on a 10% polyacrylamide gel for 3 h at 100 V/cm. DNA bands were stained with silver nitrate (Qu et al., 2005), and individual genotypes were defined according to the banding patterns. The number of alleles per locus, observed heterozygosity (Ho) and inbreeding coefficient (Fis) were analyzed using GENEPOP version 4.0 software (Rousset, 2008).

*Corresponding author. E-mail: nunes.mdnunes@gmail.com

RESULTS AND DISCUSSION
New sequencing technologies now make it possible to generate large amounts of genomic sequence for species that are not genomic models. For catfish, a promising native species, the development of microsatellite markers made large-scale data available, assisting the advancement of research on the species. With the HiSeq sequencer (Illumina), five million paired-end reads were obtained; all per-base quality scores were above Q28, and the mean read quality was excellent (Q38). A total of 6,331 potentially amplifiable microsatellite loci (PALs) were found among the catfish reads, of which 4,755 were dinucleotide, 728 trinucleotide, 729 tetranucleotide, 117 pentanucleotide and 2 hexanucleotide (Figure 1). In comparison with other genomes, the number of microsatellite loci in catfish is close to that found in humans (5,264 microsatellites) (Dib et al., 1996) and very different from that of zebrafish, Danio rerio (116,915 microsatellites) (Rouchka, 2010), which illustrates how much microsatellite counts vary among species. For Henicorhynchus siamensis, a freshwater teleost of great economic importance in the Mekong River Basin (Southeast Asia), 65,954 sequences were obtained with the Roche 454 GS-FLX platform, of which 1,837 contained SSRs (Iranawati et al., 2012). Although this species is also a teleost, the number of sequences obtained was smaller than in catfish; the divergence may be due to the genome size of each species or to differences between the platforms used and their specific characteristics, such as genome coverage, number of sequences and read length. Using the same platform for Megalobrama pellegrini, a fish native to China, Wang et al. (2012) obtained 257,497 reads with 49,811 PALs. Compared with catfish, this is almost 20 times fewer reads and almost eight times more PALs, differences that can be attributed to the distinctions between the two platforms.
The GC content of the reads matched the theoretical distribution expected for catfish (41%), a positive result given that the actual GC content of the genome is unknown. A similar value has been found for Parus major (40.7%) (Santure et al., 2011). For most animals, the GC percentage lies between 35 and 45% of the genome (Meglécz et al., 2012). The GC dinucleotide motif is rare in all genomes studied (Tóth et al., 2000). The mean read length for catfish was 100 bases, as expected. No uncalled bases ("N") were observed at any read position, although they normally occur toward the end of reads. The level of duplicate sequences was around 9.56%, indicating that the partial library had good coverage.
Of the 6,331 PALs obtained, the most common motifs in catfish were dinucleotides (75.1%), as also observed in Schizothorax biddulphi (77.08%) (Luo et al., 2012) and H. siamensis (74.41%) (Iranawati et al., 2012). In H. siamensis, 9.53% of the sequences were trinucleotide and 16.06% tetranucleotide, and penta- and hexanucleotide repeats were not detected (Iranawati et al., 2012). In S. biddulphi, as in catfish, penta- and hexanucleotide repeats occurred at low frequency (0.65 and 0.22%, respectively) (Luo et al., 2012).
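The share of each motif class follows directly from the PAL counts reported for catfish. The short sketch below recomputes those percentages; only the counts come from the text, while the function and variable names are illustrative.

```python
# PAL counts per motif class, as reported for catfish in the text.
PAL_COUNTS = {
    "dinucleotide": 4755,
    "trinucleotide": 728,
    "tetranucleotide": 729,
    "pentanucleotide": 117,
    "hexanucleotide": 2,
}

def motif_shares(counts):
    """Return the percentage of loci that falls in each motif class."""
    total = sum(counts.values())
    return {motif: 100.0 * n / total for motif, n in counts.items()}

shares = motif_shares(PAL_COUNTS)
print(sum(PAL_COUNTS.values()))  # 6331, the total number of PALs
for motif, pct in shares.items():
    print(f"{motif}: {pct:.2f}%")
```

The dinucleotide share rounds to the 75.1% quoted above, and the five classes add up to the 6,331 PALs.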
In contrast, of the total PALs obtained for the fish Raja pulchra (312,236), 18% were dinucleotide repeats and 0.11% trinucleotide repeats (Kang et al., 2012). Although the authors did not report results for tetra-, penta- and hexanucleotides, it can be seen that dinucleotide repeats were not the most frequent type. The most frequent dinucleotide motifs in catfish were TC and AC (Figure 2), similar to the results obtained for Fugu rubripes (Edwards et al., 1998), Ictalurus punctatus (Somridhivej et al., 2008), Etheostoma okaloosae (Saarinen and Austin, 2010), R. pulchra (Kang et al., 2012) and Schizothorax biddulphi (Luo et al., 2012), but different from those for Argopecten irradians (TA) (Zhan et al., 2005), C. carpio (AC/TG) (Wang et al., 2007), and Crassostrea virginica (Wang and Guo, 2007) and Perca flavescens (Zhan et al., 2009) (AG/TC). In the same context, according to Meglécz et al. (2012), the most common dinucleotide motifs in Chordata are, depending on the species, AC and TC, which agrees with the results obtained for some fish, birds and plants. Among trinucleotides, the most frequent SSR motif in catfish was ATT (Figure 3). According to Calabuig et al. (2012), the ATT motif was likewise the most frequent in a bird (Coscoroba coscoroba), whereas other fish differ: R. pulchra (AAT) (Kang et al., 2012), C. carpio (AAT/ATC) (Wang et al., 2007) and Coreoperca whiteheadi (CCT/GGA) (Tian et al., 2012). According to Meglécz et al. (2012), in a study of over 130 species of eukaryotes, AAT was the most common trinucleotide motif. The catfish, although also a chordate, did not show the same tendency, possibly because several factors influence the microsatellite composition of each species, such as mutation mechanisms, microsatellite type (allele length, repeat unit length, composition), genomic context and natural selection (Buschiazzo and Gemmell, 2006).
For tetranucleotides, the most frequent SSR motif found was ATGG (Figure 4 […] 2012). For Chordata in general, the most common tetranucleotide motif is AGAT, and in plants AAAT; the different results across studies and groups of species suggest the vast variation that can be found among microsatellite repeat loci. The numbers of pentanucleotide and hexanucleotide repeats found were relatively low (Figure 1), with some variation. Among pentanucleotides, the most common motif was ATAGG, with 17 repetitions (Figure 5), and among hexanucleotides only two different motifs were obtained, ATGTGT and ATTAGG (Figure 6). Data from more than 130 species of Chordata also show a low number of penta- and hexanucleotide motifs, which makes it difficult to estimate their proportions well (Meglécz et al., 2012). According to these authors, no pattern appeared in the relative frequencies of motifs, either for plants or for Chordata; although these motifs were characterized as GC-rich, there is no other relationship between them.
Two main mechanisms have been proposed to explain the formation of microsatellites (Buschiazzo and Gemmell, 2006): spontaneous formation from unique sequences by substitution or insertion (Dieringer and Schlötterer, 2003), which creates a proto-microsatellite that is then elongated into a microsatellite, or the propagation of proto- or complete microsatellites by transposable elements (Wilder and Hollocher, 2001). The formation of a proto-microsatellite is less likely for long motifs than for short ones (Meglécz et al., 2012), which would explain why dinucleotide motifs are the most frequent in most taxa and why pentanucleotide and hexanucleotide motifs are rare. Beyond the variation in microsatellite frequency and repeat type among taxa, this specificity can be explained in part by the interaction of evolutionary mechanisms through differential selection in regions of the genome and in different species. This suggests that microsatellite motifs can be specific to and characteristic of a species and its presumed genomic evolution (Meglécz et al., 2012).
Based on the 6,331 PALs obtained for catfish with the HiSeq platform (Illumina), a group of twelve loci (10 tetranucleotide and 2 dinucleotide) was selected using criteria similar to those of Castoe et al. (2012), and specific primers were designed to obtain fragments for subsequent sequencing. Of the 12 loci, five tetranucleotide (Figure 7) and one dinucleotide (Figure 8) amplified successfully in the initial evaluation of the primers. The remaining primers did not generate the desired amplification products under the PCR conditions tested. The primer sequences, locus names, repeat motifs, annealing temperatures and PCR product sizes are summarized in Table 1.
For further analysis, the PCR (Polymerase Chain Reaction) fragments obtained from the developed microsatellite loci were sequenced by the Sanger method, which reveals the complete sequence of the microsatellites identified by next-generation sequencing on the HiSeq platform (Illumina). Five tetranucleotide loci and one dinucleotide locus were tested; fragments purified with the Axygen Miniprep PCR Clean-up kit (United States) were sequenced in triplicate. The amplicons were sequenced with the MegaBace sequencing kit (Amersham Biosciences, Uppsala, Sweden) in a MegaBace 1000 capillary sequencer (Amersham Biosciences, Uppsala, Sweden). The catfish microsatellite loci were named for subsequent submission to GenBank: Rq158947, Rq164109, Rq137981, Rq68040, Rq51373 and Rq91253. As shown in Figures 9 and 10, four of the five tetranucleotide loci and the dinucleotide locus chosen for analysis showed repeat sequences with the motifs obtained through the HiSeq platform (Illumina). However, the locus Rq51373 gave a different result from the other sequenced tetranucleotide loci. Its nucleotide sequence did not show any microsatellite; it is possible, however, that the sequencing read did not cover the region containing the SSR repeat (Figure 11). Further analysis of this locus is needed to verify more accurately whether the microsatellite exists.

Table 1. Microsatellite primers designed for Rhamdia sp. (catfish) with the Primer3 software (version 2.0.0) (Rozen and Skaletsky, 2000).
Primer; Sequence of primer (5'-3')

To evaluate the polymorphism of the microsatellite loci, four tetranucleotide loci (Rq68040, Rq137981, Rq164109 and Rq51373) and one dinucleotide locus were chosen and tested in six catfish broodstock populations located in different regions of the state of Rio Grande do Sul. A total of 172 individuals were analyzed. The samples were genotyped on a 10% polyacrylamide gel and stained with silver nitrate (Qu et al., 2005) (Figure 12). The mean number of alleles per locus was 6.14. Positive values of the inbreeding coefficient suggest a deficit of heterozygotes at all loci analyzed in the populations (Table 2). At locus Rq164109, however, two populations had negative Fis values, indicating no inbreeding in those populations. It can therefore be inferred that the populations showed genetic variation and that the developed microsatellite markers are polymorphic (Figure 12). Although loci Rq51373 and Rq158947 both showed variation within the populations analyzed, their banding patterns were not satisfactory. As a result, locus Rq51373 could not be genotyped in some samples, which resulted in an observed heterozygosity (Ho) of zero and an Fis of 1 (Table 2).
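The relationship between observed heterozygosity and Fis can be made concrete with a minimal sketch. It uses the textbook definition Fis = 1 - Ho/He with an unbiased expected heterozygosity; GENEPOP applies its own estimators, so its values need not match this calculation exactly. The genotypes and function names below are hypothetical.

```python
from collections import Counter

def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles (Ho)."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)

def expected_heterozygosity(genotypes):
    """Unbiased expected heterozygosity (He) from allele frequencies."""
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    homozygosity = sum((c / (2 * n)) ** 2 for c in counts.values())
    return (2 * n / (2 * n - 1)) * (1 - homozygosity)

def fis(genotypes):
    """Inbreeding coefficient, Fis = 1 - Ho / He (NaN for a monomorphic locus)."""
    he = expected_heterozygosity(genotypes)
    if he == 0:
        return float("nan")
    return 1 - observed_heterozygosity(genotypes) / he

# Hypothetical allele sizes (bp) at one locus in one population.
all_homozygous = [(150, 150), (150, 150), (154, 154), (154, 154)]
all_heterozygous = [(150, 154)] * 4

print(fis(all_homozygous))    # 1.0: two alleles present, but Ho = 0
print(fis(all_heterozygous))  # negative: excess of heterozygotes
```

The first case mirrors the pattern reported for locus Rq51373 in Table 2: when no heterozygote is scored, Ho is zero and Fis equals 1 regardless of the allele frequencies.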
Analyzing genetic variability in two catfish broodstocks in Santa Catarina, Virmond et al. (2013) observed a high level of polymorphism at three microsatellite loci, with 63 alleles genotyped in a total of 71 individuals. The inbreeding coefficient was negative in both populations, indicating no inbreeding, and the populations showed significant genic and genotypic differentiation. The differences between the values found by Virmond et al. (2013) and those found in the RS populations are mainly due to the fact that the Santa Catarina broodstocks were assembled from different regions of that state, with the aim of bringing together the greatest possible genetic variability to form families. In RS, according to EMATER (Empresa de Assistência Técnica e Extensão Rural) and the producers themselves, breeders are exchanged between the properties that produce catfish fingerlings, which probably causes consanguinity among the populations and reduces genetic variability.

Conclusions
The data generated in this study show that the sequencing strategy based on a shotgun paired-end library on the HiSeq platform (Illumina) is effective for catfish (Rhamdia sp.), generating 6,331 potentially amplifiable microsatellite loci. The microsatellite markers developed show variation within populations; next-generation sequencing is therefore a fast and inexpensive way to develop microsatellite markers for non-model species such as catfish, providing large-scale data and enabling the advancement of research on the species.

Figure 1. Potentially amplifiable microsatellite loci in catfish (Rhamdia sp.) according to motif size. Results are based on five million paired-end Illumina reads (100 to 101 bp).

Table 2. Genetic characterization of the five microsatellite loci developed for Rhamdia sp. (catfish) in six populations.