Identification of accelerated evolution in the metalloproteinase domain of snake venom metalloproteinase sequences (SVMPs) through comparative analysis

Computational protein sequence analysis is one of the most important tools for understanding the evolution of closely related protein sequences, including snake venom metalloproteinases (SVMPs), and it gives valuable information regarding genetic variation. The fundamental objective of the present study is to screen the variation distributed across the metalloproteinase domain regions of protein sequences among SVMPs of different snake species, which are involved in a range of pathological disorders such as arthritis, atherosclerosis, liver fibrosis, cardiovascular disease, cancer, and liver and neurodegenerative disorders. SVMPs are responsible for hemorrhage and may also interfere with the hemostatic system. A comparative characterization of the metalloproteinase sequences was carried out to analyze their multiple sequence alignment, phylogenetic tree, homology, and physico-chemical, secondary structural and functional properties. DNAMAN software was used for multiple sequence alignment, phylogenetic tree and homology, and the Expasy ProtParam server was used for amino acid composition and for physico-chemical and functional characterization of these SVMP sequences. The secondary structure of these SVMPs was predicted computationally. Based on the observed patterns of occurrence of atypical features, we hypothesize that the amino acids of the metalloproteinase domain region (66.63% identity) are highly changeable, whereas the signal peptide region (93.98% identity) is the least changeable, and the remaining three regions, the propeptide region (87.36% identity), the disintegrin domain region (78.63% identity) and the cysteine-rich domain region (75.70% identity), are moderately changeable. SVMPs might be undergoing accelerated evolution, which could be a key player in causing disease.
From the data, it can be suggested that the highly changed metalloproteinase domain regions of snake venom metalloproteinases might be responsible for functional variation in the expressed proteins, which in turn may lead to different disorders in humans after snake bite. The results of this study could be an effective tool for studying mutation and drug resistance mechanisms and for developing new drugs for different diseases.


INTRODUCTION
Metalloproteinases are ubiquitous enzymes found in nearly all organisms, from animals to plants. However, apart from being expressed at different sites in plants and animals to perform distinct physiological roles, metalloproteinases also occur in the toxin or venom of several venomous creatures (snakes, caterpillars, scorpions, etc.), where they cause agony, suffering and even death of the prey or victim. Among them, snake venom is a very rich source of metalloproteinases, which are termed snake venom metalloproteinases (SVMPs). Several diseases have been shown to be associated with metalloproteinases. For example, genetic polymorphisms in the matrix metalloproteinase genes MMP1, MMP9 and MMP12 have been shown to be important in the development of chronic obstructive pulmonary disease (COPD) (Wallace and Sandford, 2002). Metalloproteinases also play a role in the development of renal cysts (Obermüller et al., 2001), uterine cervical carcinoma (Libra et al., 2009), angiogenesis (Pepper, 2001) and various inflammatory diseases of the central nervous system such as bacterial meningitis.
SVMPs are most abundant in viper venom; however, they also occur in a few elapids (Birrell et al., 2007; Fry et al., 2003). They are synthesized as zymogens in the venom gland and contain a propeptide which is cleaved off during maturation. They have a common zinc-binding motif with the consensus sequence HEXXHXXGXXH (Bode et al., 1993). They are classified into different types (P-I to P-IV) on the basis of the other domains that are present in these complexes (Hite et al., 1994). These families of enzymes are responsible for haemorrhagic, local myonecrotic, antiplatelet, edema-inducing and other inflammatory effects. Recently, it has been shown that SVMPs are potential tools in the development of drugs for the prevention and treatment of several illnesses. These enzymes are extensively used in the treatment and prevention of thrombotic disorders, since they serve as defibrinogenating agents (Costa et al., 2010; Bjarnason and Fox, 1994). Animal models of septic shock have also delivered proof-of-concept that MMPs can be of therapeutic interest (Vanlaere and Libert, 2009).
The evolution and diversification of snake venom is a very interesting phenomenon. Snake venom glands are believed to have evolved by modification of the salivary glands, and various body proteins have been recruited into the venom gland and adapted to attack and damage various physiological systems of the prey (Reza et al., 2006). Therefore, studying the expressed venom proteins among and within a particular family enables us to understand the mode and direction of evolution of that gene family. Considerable variation is evident in SVMPs among species and even within the same species, which indicates accelerated evolution of this particular venom component. This study was therefore undertaken to perform a detailed bioinformatics analysis of the different domains of snake venom metalloproteinase sequences in order to understand their pattern of accelerated evolution.

Multiple sequence alignment
Twelve (12) SVMP sequences from different venomous snake species were aligned with DNAMAN software. In the resulting multiple sequence alignments, black indicates positions with a fully conserved residue; pink indicates that one of the high-scoring groups is fully conserved; cyan indicates that one of the moderate-scoring groups is fully conserved; and white indicates that one of the 'weaker'-scoring groups is fully conserved.
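The text does not state how DNAMAN derives the identity percentage of an alignment, so the following is only a minimal sketch of two common definitions: the share of fully conserved columns, and the mean pairwise identity. The function names and the toy alignment are hypothetical and are not taken from the study.

```python
from itertools import combinations

def column_identity(aligned):
    """Percent of alignment columns in which all sequences share one residue.
    Columns containing a gap in the first sequence never count as conserved."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved = sum(
        1 for col in zip(*aligned)
        if col[0] != "-" and all(c == col[0] for c in col)
    )
    return 100.0 * conserved / length

def mean_pairwise_identity(aligned):
    """Average percent identity over all sequence pairs.
    Columns where both sequences have a gap are ignored."""
    scores = []
    for a, b in combinations(aligned, 2):
        cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
        same = sum(1 for x, y in cols if x == y and x != "-")
        scores.append(100.0 * same / len(cols))
    return sum(scores) / len(scores)

# Hypothetical toy alignment (illustration only, not data from the study).
toy = ["MIQVLLVTIC", "MIEVLLVTIC", "MIQVLLVAIC"]
print(column_identity(toy))
print(mean_pairwise_identity(toy))
```

The two definitions generally give different numbers for the same alignment, so identity values are only comparable across domains when one definition is used throughout.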

Classification of SVMPs sequences
After multiple sequence alignment, the SVMP sequences were divided into five domains based on their domain organization: signal peptide, propeptide, metalloproteinase, disintegrin and cysteine-rich domains. The signal peptide (about 18 residues long), propeptide (about 176 residues long), metalloproteinase (about 205 residues long), disintegrin (about 95 residues long) and cysteine-rich (about 194 residues long) domains were aligned separately. Cys-switch sites (PKMCGV) and Zn2+-binding motifs (HEXXHXXGXXH) are marked with black boxes.
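The two motifs named above can be located in an ungapped sequence with regular expressions. This is a minimal sketch, not the procedure used in the study; the function name and the toy fragment are hypothetical.

```python
import re

# "X" in the consensus stands for any residue, so it becomes "." in the regex.
ZINC_MOTIF = re.compile(r"HE..H..G..H")   # HEXXHXXGXXH
CYS_SWITCH = re.compile(r"PKMCGV")

def find_motifs(sequence):
    """Return 1-based start positions of both motifs in an ungapped sequence."""
    return {
        "cys_switch": [m.start() + 1 for m in CYS_SWITCH.finditer(sequence)],
        "zinc_binding": [m.start() + 1 for m in ZINC_MOTIF.finditer(sequence)],
    }

# Hypothetical fragment for illustration, not a real SVMP sequence.
toy = "AAPKMCGVTAAAHELGHNLGMEHDAA"
print(find_motifs(toy))
```

Positions are reported relative to the ungapped sequence, so gap characters from an alignment would need to be removed before scanning.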

Phylogenetic tree and homology construction
The phylogenetic tree of the 12 SVMP sequences was constructed with the UPGMA method (Tamura et al., 2007). Each node was tested using the bootstrap approach with 1,000 replicates; the bootstrap analysis indicates strong support. Homology of the 12 SVMP sequences was determined using DNAMAN software.

*Corresponding author. E-mail: mahmudulgebru@gmail.com.
Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
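The UPGMA clustering step can be illustrated with a short sketch. It assumes a simple p-distance (proportion of differing sites) between aligned sequences, which is an assumption because the distance model used in the study is not stated; bootstrap resampling is omitted. The function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.
    Sites with a gap in either sequence are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(names, seqs):
    """Cluster aligned sequences by UPGMA; return the topology as nested tuples."""
    # Each cluster id maps to (subtree, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two closest clusters.
        pair = min(dist, key=dist.get)
        i, j = tuple(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances (the UPGMA rule).
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del dist[pair]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical toy data: A and B differ at one site, C differs at several.
tree = upgma(["A", "B", "C"], ["ACDEFGHIKL", "ACDEFGHIKV", "AYYEFGWWKV"])
print(tree)
```

A bootstrap test would repeat this clustering on alignments resampled column by column with replacement and count how often each grouping reappears.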

Analysis of physico-chemical properties
The SVMP sequences were used as input to compute the amino acid composition (%) (Islam et al., 2013), molecular weight, theoretical isoelectric point (pI), number of positively and negatively charged residues, extinction coefficient, instability index, aliphatic index and Grand Average of Hydropathy (GRAVY), using the Expasy ProtParam tool (http://web.expasy.org/protparam).
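Three of these parameters follow from simple published formulas and can be sketched directly. The code below is an illustration rather than the ProtParam implementation: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, Ala% + 2.9 × Val% + 3.9 × (Ile% + Leu%). The function names and test sequences are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def composition(seq):
    """Amino acid composition as a percentage of sequence length."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}

def gravy(seq):
    """Grand Average of Hydropathy: mean hydropathy value over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)

def aliphatic_index(seq):
    """Aliphatic index computed from the mole percent of Ala, Val, Ile and Leu."""
    x = composition(seq)
    return (x.get("A", 0.0)
            + 2.9 * x.get("V", 0.0)
            + 3.9 * (x.get("I", 0.0) + x.get("L", 0.0)))

# Hypothetical test sequence containing each standard residue once.
toy = "ACDEFGHIKLMNPQRSTVWY"
print(round(composition(toy)["C"], 2), round(gravy(toy), 2), round(aliphatic_index(toy), 2))
```

The cysteine percentages discussed in the results correspond to `composition(seq)["C"]` for each full-length sequence.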

Analysis of secondary structure
The SOPMA tool (Self-Optimized Prediction Method with Alignment) of the NPS@ (Network Protein Sequence Analysis) server was used to characterize the secondary structural features of the proteins, such as alpha helix, 3-10 helix, pi helix, beta bridge, extended strand, beta turn, bend region, random coil, ambiguous and other states (Geourjon and Deleage, 1995; Roly et al., 2014a; Islam et al., 2015).

Analysis of functional properties
The selected 12 SVMP sequences were analyzed with the Motif Scan tool (http://myhits.isb-sib.ch/cgibin/motif_scan) (Roly et al., 2014b). The input data were in FASTA format and were scanned against 'PROSITE Patterns', the selected protein profile.

RESULTS AND DISCUSSION
In the present investigation, the NCBI database was used as the source of SVMP sequences from different venomous snake species; accession numbers are given in Table 1. A total of 12 SVMP sequences were obtained after removing duplicates and partial sequences. The SVMP sequences were divided into five domains based on their domain organization: signal peptide, propeptide, metalloproteinase, disintegrin and cysteine-rich domains. Other researchers have reported the same result (Brust et al., 2013; Casewell, 2012; Ryan et al., 2003). The signal peptides of all the sequences are highly conserved and nearly identical. The signal peptide has 18 residues and shows 93.98% identity (Figure 1). However, the 13th residue is alanine in five sequences (A. c. laticinctus, D. acutus, B. jararaca, B. insularis, C. v. viridis) and valine in the remaining seven (A. p. leucostoma, G. halys, S. c. edwardsi, N. n. atra, P. flavoviridis, B. multicinctus, C. atrox). As the properties of these two amino acid residues are almost the same, we do not expect any change in signaling the secretion of the protein or in the removal of the signal peptide after secretion. The propeptide sequences are also highly conserved and nearly identical. They are about 176 residues long and show 87.36% identity (Figure 2). The Cys-switch site (PKMCGV) lies within the propeptide at residue 165 (Figure 2). The Cys-switch site is a short peptide of the prodomain that blocks the active site of the metalloproteinase; when this peptide is removed, the metalloproteinase becomes active. The metalloproteinase domains are 205 residues long and have 66.63% identity (Figure 3). The disintegrin domains are approximately 95 residues long and have 78.63% identity. The same sort of grouping seen in the metalloproteinase domain is also evident in the disintegrin domain. The cysteine-rich domains are 194 residues long and show 75.70% identity.
In this study we showed that the amino acids of the metalloproteinase domain region were the most changeable, owing to synonymous and non-synonymous mutation (Figure 3), and have very low identity, whereas the signal peptide region was the least changeable and has the highest percentage identity among the different SVMP sequences. The remaining three domains, the propeptide (Figure 2), disintegrin (Figure 4) and cysteine-rich (Figure 5) domains, were moderately changeable and showed moderate percentage identity.
The phylogenetic tree and homology indicate that the metalloproteinase domain has a very high distance relationship (Figure 8A and B) among the twelve SVMPs.
In the analysis of amino acid composition, the percentage of cysteine residues in the majority of the SVMP sequences lies in the range of 6.1-6.7%; the SVMP sequences of B. multicinctus, C. v. viridis and A. p. leucostoma show a significant increase, with values of 6.7, 6.6 and 6.5 percent, respectively (Table 2). The highest quantity of cysteine residues, in the B. multicinctus and C. v. viridis SVMP sequences, might be correlated with the presence of the cysteine switch motif and the role of these SVMPs in pathological conditions. These gelatinases have previously been associated with several disorders such as carcinomas, cardiovascular disorders and so on. The highly significant presence of cysteine suggests its role as a critical residue for SVMP activity, and thus these SVMPs may be investigated for a possible role in diseased conditions. Further analysis of the amino acid composition can help to identify amino acids present at remarkable levels and to correlate them with precise pathological conditions (Shckorbatov et al., 2008).
Table 2 (partial rows, values in %):
... 4.9 5.9 4.1 0.7 6.1 6.2
P. flavoviridis 7.5 3.9 6.5 6.4 6.4 4.9 5.6 6.4 2.8 5.2 6.9 6.4 2.5 2.9 4.6 4.9 5.4 0.5 4.4 6.0
B. multicinctus 6.0 6.0 5.9 5.2 6.7 3.4 6.0 6.7 2.3 5.5 7.3 8.3 1.6 2.4 4.7 5.7 5.0 0.5 4.6 6.0
Furthermore, the Motif Scan tool predicts the presence of a cysteine switch, a zinc protease motif and a disintegrin motif in the SVMP sequences, all of which have been discussed in the literature (Table 5). The cysteine switch regulates the activity of SVMPs via complex formation between the cysteine residue of the prodomain and the zinc atom of the catalytic domain (Van Wart et al., 1990). The Cys-switch site (PKMCGV) motif is present in the propeptide (Figure 2) and blocks the active site of the metalloproteinase domain; when this peptide is removed, the metalloproteinase becomes active. The primary sequence motif HExxH is present in the catalytic domain of zinc-dependent SVMP sequences. The two conserved histidine residues coordinate the zinc atom, and the glutamic acid residue is a member of the active site of the enzyme (Devault et al., 1988). The zinc-binding region signature has been characterized as (uncharged)-(uncharged)-H-E-(uncharged)-(uncharged)-H-(uncharged)-(hydrophobic) (Jongeneel et al., 1989). The zinc protease motif is present within the catalytic domain (metalloproteinase domain) of the SVMP sequences (Table ), playing a pivotal role in the collagen binding region of these enzymes.
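The class-based zinc-binding signature can be expressed as a regular expression once the residue classes are fixed. The sketch below assumes that "uncharged" means any residue other than D, E, K or R and that "hydrophobic" means A, V, L, I, M, F, W, Y or C; both class definitions are assumptions made for illustration and are not taken from the cited work. The function name and the toy fragment are hypothetical.

```python
import re

# Assumed residue classes (illustrative only, not from the cited signature paper).
UNCHARGED = "[^DEKR]"         # anything except Asp, Glu, Lys, Arg
HYDROPHOBIC = "[AVLIMFWYC]"

# (uncharged)-(uncharged)-H-E-(uncharged)-(uncharged)-H-(uncharged)-(hydrophobic)
SIGNATURE = re.compile(
    f"{UNCHARGED}{UNCHARGED}HE{UNCHARGED}{UNCHARGED}H{UNCHARGED}{HYDROPHOBIC}"
)

def signature_hits(sequence):
    """Return (1-based start, matched text) for each non-overlapping signature hit."""
    return [(m.start() + 1, m.group()) for m in SIGNATURE.finditer(sequence)]

# Hypothetical fragment for illustration, not a real SVMP sequence.
toy = "GTAAHELGHNLGMEH"
print(signature_hits(toy))
```

Changing the class definitions changes which sequences match, so any hit list should be reported together with the classes that produced it.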

Conclusion
Intensive characterization and comparative analysis of SVMP protein sequences with numerous bio-computational tools yielded new insights and perspectives that can be used to identify accelerated evolution in the SVMPs of different venomous snake species, which play a crucial role in pathological conditions. In this study, multiple sequence alignment, phylogenetic tree, homology, physico-chemical, secondary structural and functional analyses of SVMP protein sequences from different venomous snake species were carried out. The findings of this study may be used by researchers working on SVMPs in the context of any experimental system. From the identity comparison, we can say that the metalloproteinase domain is more diverse and under evolutionary pressure. The amino acid composition shows a considerably high percentage of cysteine residues in the SVMP sequences of B. multicinctus and C. v. viridis, which might be a key player in pathological conditions. Future experimental studies need to be carried out to validate this proposal. This study may be taken as a prototype for similar in silico investigations of other large protein families, where such comparative analysis might give direction and help to rationalize the conduct of experimentation; it will also be very helpful in developing new drugs.

Figure 1. Multiple sequence alignment of the signal peptide (93.98% identity) of SVMP sequences: black indicates positions with a fully conserved residue; pink indicates that one of the high-scoring groups is fully conserved; cyan indicates that one of the moderate-scoring groups is fully conserved; white indicates that one of the 'weaker'-scoring groups is fully conserved.

Figure 2. Multiple sequence alignment of the propeptide (87.36% identity) of SVMP sequences: black indicates positions with a fully conserved residue; pink indicates that one of the high-scoring groups is fully conserved; cyan indicates that one of the moderate-scoring groups is fully conserved; white indicates that one of the 'weaker'-scoring groups is fully conserved.

Figure 3. Multiple sequence alignment of the metalloproteinase domain (66.63% identity) of SVMP sequences: black indicates positions with a fully conserved residue; pink indicates that one of the high-scoring groups is fully conserved; cyan indicates that one of the moderate-scoring groups is fully conserved; white indicates that one of the 'weaker'-scoring groups is fully conserved.

Figure 4. Multiple sequence alignment of the disintegrin domain (78.63% identity) of SVMP sequences: black indicates positions with a fully conserved residue; pink indicates that one of the high-scoring groups is fully conserved; cyan indicates that one of the moderate-scoring groups is fully conserved; white indicates that one of the 'weaker'-scoring groups is fully conserved.

Figure 5. Multiple sequence alignment of the cysteine-rich domain (75.70% identity) of SVMP sequences: black indicates positions with a fully conserved residue; orange indicates that one of the high-scoring groups is fully conserved; blue indicates that one of the moderate-scoring groups is fully conserved; white indicates that one of the 'weaker'-scoring groups is fully conserved.

Figure 8
Figure 7. (A) Phylogenetic tree of the propeptide of 12 SVMP sequences from different venomous snake species, constructed with DNAMAN software using the bootstrap approach with 1,000 replicates; the bootstrap analysis indicates strong support. (B) Homology of the signal peptide of 12 SVMP sequences from different venomous snake species, constructed with DNAMAN software.

Figure 10
Figure 9. (A) Phylogenetic tree of the disintegrin domain of 12 SVMP sequences from different venomous snake species, constructed with DNAMAN software using the bootstrap approach with 1,000 replicates; the bootstrap analysis indicates strong support. (B) Homology of the signal peptide of 12 SVMP sequences from different venomous snake species, constructed with DNAMAN software.

Table 1. The names of twelve SVMP sequences of different venomous snake species with accession numbers and

Table 2. Amino acid compositions of twelve SVMP sequences of different venomous snake species (in %).

Table 3. Physico-chemical parameters of twelve SVMP sequences of different venomous snake species.

Table 4. Secondary structural features of twelve SVMP sequences of different venomous snake species (in %).