The cichlid 16S gene as a phylogenetic marker: Limits of its resolution for analyzing global relationships

The phylogenetic utility of the 16S gene in cichlids is assessed. Eighty-six (86) partial sequences belonging to 37 genera of cichlids from GenBank were analyzed. The alignment had 463 base pairs, with 337 conserved sites and 126 variable sites. Base compositional bias is similar to that found in higher organisms, with adenine having the highest average of 30.3%, followed by cytosine, guanine and thymine with average values of 26.1, 21.9 and 21.7%, respectively. The most suitable evolutionary model is the K2+G+I model, as it had the lowest Bayesian Information Criterion score. There were four major indels: at base pair position 328, unique to Heterotilapia buttikoferi; at position 369, unique to Grammatotria lemairii; at position 396, shared by Tilapia sparrmanii, T. guinasana and T. zillii; and at position 373, found in all tested species except Oreochromis mossambicus. The tilapiine genera form the basal group in cichlids. The 16S gene separates the genus Tilapia without any ambiguity, but there were phylogenetic overlaps between Sarotherodon and Oreochromis. Finer molecular and statistical methods may be needed to distinguish Sarotherodon from Oreochromis. The genetic diversity of cichlids is generally very low, owing to a common ancestry with little genetic differentiation. The grouping of the Oreochromis and Sarotherodon genera in the same clade is likely connected with the preservation of ancestral genetic signatures that the group retained as it evolved.


INTRODUCTION
Freshwater fishes of the family Cichlidae live throughout Africa, the Neotropics, Madagascar and India. This distribution suggests an ancestral Gondwana-wide range dating back about 130 million years (Ma), an age for the group that holds in light of the available fossil evidence (Lundberg, 1991). Morphological characters have been the basis for assessing the phylogeny of the group. On this basis, Kaufman and Liem (1982) and Stiassny (1987) suggested the monophyly of cichlids. Cichlids have long served as a model for studies of evolution and diversity. They are a wide array of fishes that have been studied for their adaptive radiation and distribution in water bodies around the world. At least 1,700 species have been described scientifically (Fishbase, 2012), making them one of the largest vertebrate families. New species are continually being discovered because of the unrestricted admixing of cichlids in the water bodies where they are found. Speciation is rife within the group, and classifications based on morphology and on molecular techniques sometimes conflict. Considerable variation also exists in reproduction, ranging from open brooding to mouth brooding, including ovophile and larvophile mouth brooding. These variations have evolutionary implications, especially as they concern the availability of food and favourable breeding conditions.
Farias et al. (1998, 1991) noted that the use of the 16S gene in resolving phylogeny is not new, although the results so far have usually been inconclusive. In a similar study, fragments of the mitochondrial 16S rRNA gene for 34 South American genera were sequenced. They identified Neotropical cichlids as a monophyletic group, suggested that Heterochromis and Retroculus are the most basal taxa of the African and Neotropical cichlid clades, respectively, and obtained a scheme of relationships among Neotropical genera. Nagl et al. (2001) and Klett and Meyer (2002) were the first to analyse mitochondrial DNA of more than 30 tilapiine taxa. While the first study focused on Oreochromis, the latter included a pan-African sample of 39 tilapiine and 19 non-tilapiine, mostly East African, species.
Brown (1985) and Boore (1991) noted that the vertebrate mitogenome is usually a circular molecule containing 13 protein-coding genes, two rRNA genes (rRNAs), 22 tRNA genes (tRNAs) and a putative control region. Its simple structure, constant gene content, rapid evolutionary rate and maternal inheritance make mtDNA a suitable tool for studying population genetics (Li et al., 2012), biogeography (Xiao et al., 2001) and phylogenetics (Miya et al., 2003). It also provides genome-level and sequence-level information, such as gene rearrangements and evolutionary patterns. Lowe-McConnell (2009) noted that establishing relationships among taxa using molecular methods has been very frustrating because of the persistence of ancestral polymorphism within and between species.
The 16S gene is regarded as a good molecular clock, and its wide use in evolutionary, phylogenetic and taxonomic studies is established. The abundance of suitable primers and the large volume of partial 16S sequences in many databases facilitate classification.
Information on the development and use of suitable biomarkers for population structure, phylogeny and phylogeographical studies is of utmost importance for defining species boundaries, establishing interrelationships between and within species, and properly identifying species, especially in very speciose groups like the cichlids. The large number of sequences now available for this gene allows detailed phylogenetic discrimination of cichlids based on the 16S gene. The objective of this study is to test the phylogenetic utility of this gene more fully by estimating relationships within the speciose group collectively called cichlids, which has been the basis of several studies aimed at understanding the principles and processes of evolution.

Taxon sampling and DNA methods
A comprehensive taxonomic sampling of cichlids, with 86 sequences from 37 genera, was used to examine the phylogenetic importance and utility of the 16S gene sequence. The fish taxa included in this study are listed in Table 1. Selection was based on the availability of 16S rRNA data, geographical location and a 95% sequence similarity. Sequences that were exactly the same were excluded to avoid duplication; therefore, 86 unique sequences were used for this analysis.
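The exclusion of identical sequences described above can be sketched as follows. This is a minimal illustration only, not the procedure used in the study (which relied on DAMBE); the function name `drop_duplicate_sequences` and the toy records are hypothetical, and the comparison assumes that sequences count as identical when their bases match after gaps are removed and case is ignored.

```python
def drop_duplicate_sequences(records):
    """Keep the first record for each distinct sequence.

    records: list of (name, sequence) tuples.
    Assumption: gap characters and letter case do not distinguish sequences.
    """
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper().replace("-", "")
        if key in seen:
            continue  # an identical sequence was already kept
        seen.add(key)
        unique.append((name, seq))
    return unique


# Toy records (hypothetical, not data from this study).
toy_records = [
    ("taxon_A", "ACGTACGT"),
    ("taxon_B", "acgtacgt"),   # same bases as taxon_A, so it is dropped
    ("taxon_C", "ACGTACGA"),
]
print([name for name, _ in drop_duplicate_sequences(toy_records)])
```

Keeping the first occurrence preserves the input order, so the retained label depends on how the records are sorted beforehand.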

Molecular phylogenetic analysis
DAMBE version 5.0 (Xia, 2013) was used for an initial check of similarities among the sequences. Sequences that were found to be identical were removed from the analysis. Eighty-six (86) sequences were aligned using the Clustal W multiple sequence alignment (MSA) program. The web platform Phylogeny.fr (www.phylogeny.fr) (Dereeper et al., 2008) was used to determine the phylogenetic relationships within and between the species, using the advanced mode option with multiple alignment in MUSCLE, alignment curation in Gblocks and phylogenetic tree construction by maximum likelihood. The optimized phylogenetic tree was used as the consensus, and phylogenetic inferences were drawn from it. Statistical indices frequently used in phylogenetic studies were assessed, and nucleotide composition and frequency were determined. Genetic distances were calculated by Kimura's two-parameter method (Kimura, 1980), and phylogenetic reconstruction using the neighbor-joining method (Saitou and Nei, 1987) was performed in MEGA software, version 6.0 (Kumar et al., 1993), with the pairwise deletion option for gaps. The bootstrap method of Felsenstein (1985) was used to test the reliability of the tree topology, with 500 bootstrap replications. The nucleotide substitution model was maximum composite likelihood, including transitions and transversions, with a uniform rate. Gaps and missing data were treated as complete deletions.
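For readers unfamiliar with Kimura's two-parameter distance, a minimal sketch is given below. It is not the MEGA implementation; the function name `k2p_distance` and the toy sequences are hypothetical. It assumes two aligned sequences of equal length and applies pairwise deletion, skipping any position where either sequence has a gap or ambiguous character. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

NUCLEOTIDES = set("ACGT")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Pairwise deletion: skip sites with a gap or ambiguous base.
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared      # proportion of transitions (P)
    q = transversions / compared    # proportion of transversions (Q)
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


# Toy example (hypothetical sequences): one transition in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Separating transitions from transversions matters for a marker such as this one, where the two classes of change occur at different rates.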

RESULTS
A total of at most 463 base pairs were left after trimming the edges of the alignment. Since the 16S gene is not a protein-coding gene, the gaps due to insertions or deletions were considered in the analysis. Of the aligned sites, 337 (72.79%) were conserved and 126 (27.21%) were variable. Of the variable sites, 78 were parsimony informative and 48 were singletons. A total of 102 sites are CpG sites, which are among the important sites for assessing gene polymorphism and DNA methylation. There is a large conserved section of the alignment from base pairs 19 to 82, a total of sixty-three bases. This conserved section is common to all cichlids used in this analysis, which makes it well suited to the design of cichlid-specific primers for population studies using the 16S gene. The phylogenetic tree is shown in Figure 1. The base compositional bias, that is, the unequal proportion of the four bases (G, A, T and C) that is common in DNA sequences, was assessed. As is typical, the purine adenine has the highest average occurrence of 30.3%, followed by cytosine with 26.1% and guanine and thymine with 21.9 and 21.7%, respectively. This pattern is found in most genes of higher organisms. The best model of evolution for the 16S gene of the cichlids used in the analysis is the K2+G+I model, as it had the lowest Bayesian Information Criterion score. The transition/transversion ratio (R) was 4.36, indicating a bias towards transitions.
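The site categories reported above can be illustrated with a short sketch. It is not the procedure used to obtain the counts in this study. It assumes the commonly used definitions (a parsimony-informative site has at least two nucleotide states that each occur in two or more sequences; a singleton site is variable but not informative), ignores gaps and ambiguous characters within a column, and uses a hypothetical function name `classify_sites` with toy sequences.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def classify_sites(alignment):
    """Count conserved, variable, parsimony-informative and singleton columns.

    alignment: list of aligned sequences of equal length.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    totals = Counter()
    for column in zip(*alignment):
        # Assumption: gaps and ambiguous characters are ignored per column.
        bases = Counter(c for c in (ch.upper() for ch in column)
                        if c in NUCLEOTIDES)
        if not bases:
            continue  # column holds no unambiguous bases
        if len(bases) == 1:
            totals["conserved"] += 1
            continue
        totals["variable"] += 1
        # Informative: at least two states, each seen in two or more sequences.
        repeated_states = sum(1 for n in bases.values() if n >= 2)
        if repeated_states >= 2:
            totals["parsimony_informative"] += 1
        else:
            totals["singleton"] += 1
    return dict(totals)


# Toy alignment (hypothetical, not data from this study).
toy_alignment = ["ACGTA", "ACGTA", "ATGCA", "ATGCC"]
counts = classify_sites(toy_alignment)
print(counts)
print(f"conserved: {100 * counts['conserved'] / len(toy_alignment[0]):.2f}%")
```

By construction, the informative and singleton counts sum to the variable count, as the figures of 78 and 48 sum to 126 in the text.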
There are four (4) major indels in the aligned sequences. The indel at position 328 is unique to Heterotilapia buttikoferi, while the indel at position 369 is unique to Grammatotria lemairii. The indel at position 373 is found in all tested species except Oreochromis mossambicus, and the indel at position 396 is shared by Tilapia sparrmanii, T. guinasana and T. zillii. The monophyly of the cichlid group is supported by the fact that the indel at position 373 is found in almost all the species, the exception being O. mossambicus.
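One simple way to check which taxa carry an indel at a given alignment column is sketched below. It assumes that an indel appears as a gap character ("-") in the aligned sequence and that positions are counted from 1, as in the text; the function name `taxa_with_gap` and the toy alignment are hypothetical and are not data from this study.

```python
def taxa_with_gap(alignment, position):
    """Return the taxa whose aligned sequence has a gap at a 1-based column.

    alignment: dict mapping taxon name to aligned sequence.
    """
    index = position - 1
    taxa = []
    for name, seq in alignment.items():
        if not 0 <= index < len(seq):
            raise ValueError(f"position {position} is outside {name}")
        if seq[index] == "-":
            taxa.append(name)
    return sorted(taxa)


# Toy alignment (hypothetical): a gap at column 4 shared by two taxa.
toy_alignment = {
    "taxon_A": "ACG-TA",
    "taxon_B": "ACGGTA",
    "taxon_C": "ACG-TA",
}
print(taxa_with_gap(toy_alignment, 4))
```

A gap returned for a single taxon corresponds to an indel unique to that taxon, while a gap returned for nearly all taxa corresponds to the pattern described for position 373.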

DISCUSSION
The resolution of the phylogeny of cichlids has always been challenging. Because of the speciose nature of the group, closed groups such as those found in the African Great Lakes and the riverine haplochromines are comparatively well understood (Seehausen, 2006; Kocher, 2004; Salzburger et al., 2005; Koblmuller et al., 2008), but large-scale phylogenetic classification is fraught with controversy. The tilapiines are described as the basal group of the cichlids and are arguably the precursors of the current cichlid radiation. The monophyletic origin of the genus Tilapia, noted by Thys van den Audenaerde (1968) and containing such species as Tilapia busumana, T. zillii, T. tholloni, T. rheophila, T. buttikoferi, T. sparrmanii, T. guinasana, T. bilineata and T. ruweti, is also supported by the findings of this study. Further, the finding of Klett and Meyer (2002) that two lineages were formed supports the basal position of Tilapia. The joint role that chance, contingency and historical determinism can play in determining the rate of adaptive radiation is reflected in the contributions of the three tilapiine genera, namely Tilapia, Oreochromis and Sarotherodon, to cichlid diversity.
The bootstrap consensus tree contains seventeen major clusters or clades. Clades 1, 2 and 3 consist of the tilapiine group only. Clade 4 is made up exclusively of Oreochromis species, namely Oreochromis niloticus, O. tanganicae, O. andersonii, O. mossambicus, O. variabilis, O. esculentus and two other variants. Clades 5, 6 and 7 are an admixture of Sarotherodon and Oreochromis species, which re-emphasizes the relatedness of the Sarotherodon and Oreochromis genera. Clusters 1, 2, 3 and 4 showed pure lineages of Tilapia, Oreochromis and Sarotherodon species. Dunz and Schliewen (2013), while examining the root of the East African cichlid radiation, also grouped the Sarotherodon and Oreochromis genera together but placed the tilapiines in a separate clade. They further classified the Tilapia group collectively as the Boreotilapini, whilst the Sarotherodon and Oreochromis grouping was classified as Oreochromini. This finding is similar to the phylogenetic grouping that was obtained using the 16S gene. Clade 6 is an admixture of Oreochromis and Sarotherodon, a finding also reported by Klett and Meyer (2002) using the mitochondrial ND2 marker. Clusters 1, 2 and 3 are the most stable because of their homogeneity and high bootstrap values, which are clear indications that the groupings are not due to chance. The low genetic diversity of the cichlid group is also evident, as indicated by the value of 0.005 substitutions per site obtained from the phylogenetic tree. The scheme of relationships between the tilapiines, Oreochromis and Sarotherodon from this study is highly resolved, with the 16S gene separating the tilapiines from the Oreochromis/Sarotherodon species.
Figure 1. The evolutionary history was inferred using the Neighbor-Joining method. The bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The evolutionary distances were computed using the Maximum Composite Likelihood method and are in the units of the number of base substitutions per site. All positions containing gaps and missing data were eliminated. There were a total of 450 positions in the final dataset. Evolutionary analyses were conducted in MEGA6.
Other major and unique groupings are found in clades 13, 14 and 15, where Tropheus duboisi and Tropheus moorii, Haplochromis burtoni, and Maylandia species are exclusively found. Though cichlids are highly speciose, occur in almost all water bodies and show great morphological variation, a large fraction of their total genetic variation is preserved and represented across the family. Lowe-McConnell (2009) notes that phylogenetic studies such as this deepen knowledge of which fish species are present, the ecology and behaviour of the individual species and, importantly, the limnological conditions governing their life cycles. The 16S gene is a good biomarker for separating the genus Tilapia from both Sarotherodon and Oreochromis without any ambiguity. It also depicts well the established evolutionary history of cichlids as documented by several other studies. There is, however, a need to develop finer statistical and molecular techniques to resolve the population differences between the Sarotherodon and Oreochromis species. One limitation of the study is the evolutionary process that is ongoing in all species, especially the cichlids, together with the continuous interbreeding within the group.

CONCLUSION
As molecular techniques and statistical classifiers develop, the ambiguities in the very speciose cichlids may be resolved. The initial difficulty in resolving species boundaries is caused by the ease with which interbreeding and intrabreeding occur within the group and by the continuous change and evolution of the group. The preservation of ancestral genetic relics, in the form of gene segments that may not be clearly expressed at the morphological level, is also a major constraint. The 16S gene is a good indicator of the evolutionary history of the cichlids at all scales and also a good molecular marker for separating the three major genera of cichlids. The development of species-specific primers is also clearly a possibility. The modification of general primers to account for major and minor variations and species differences is a proven tool for resolving ambiguities in higher organisms.

Table 1. Genera and their respective species (family Cichlidae) investigated in the study.