Molecular cloning and characterization of a new cDNA encoding a trypsin-like serine protease from the venom gland of Scolopendra subspinipes mutilans

untranslated region. The precursor nucleotide sequence of Ssmase was deduced to encode a prepropeptide of 19 residues and a mature protein of 240 residues. The 19-residue prepro-peptide of Ssmase is putatively composed of a 14-residue pre-peptide and a 5-residue propeptide (QGSSA). The mature protein of Ssmase contains the typical domain of a trypsin-like serine protease, in which His61, Asp108 and Ser208 are the principal residues of the catalytic center. The cysteine pairs Cys46-Cys62, Cys141-Cys214, Cys180-Cys195 and Cys204-Cys233 possibly form four disulfide bridges. Ssmase was found to have five potential N-glycosylation sites (N-Xaa-T/S). To the best of our knowledge, Ssmase is the first trypsin-like serine protease characterized from centipede venom, and it represents a new family of trypsin-like serine proteases with a four-disulfide-bridge motif.


INTRODUCTION
Centipedes are a major group of venomous arthropods that occur nearly all over the world. They prey on many other arthropods, earthworms, snails and other small animals, mainly killing them with their toxicognaths (Antoniazzi et al., 2009; Undheim and King, 2011). Envenomation by centipedes is characterized by an immediate, local burning pain that ranges in intensity from mild to excruciating and in some cases radiates to other parts of the victim's body (Bush et al., 2001; Acosta and Cazorla, 2004). Several centipede species can cause severe symptoms in humans, including myocardial ischemia and infarction, hemoglobinuria and hematuria, hemorrhage, and rhabdomyolysis (Gomes et al., 1982; Acosta and Cazorla, 2004; Medeiros et al., 2008). Centipede venom is composed of many active ingredients, including enzymes (acid/alkaline phosphatases, esterases, hyaluronidases, etc.), non-enzymatic proteins (cardiotoxins, myotoxins, etc.), non-peptidic active components (histamine and serotonin) and several neurotoxins, which have been reported to have many biochemical and physiological effects (Mohamed et al., 1983; Stankiewicz et al., 1999; Rates et al., 2007; Undheim and King, 2011; Liu et al., 2012; Yang et al., 2012). As for enzymatic activities, the venoms of both Otostigmus pradoi and Scolopendra viridicornis showed weak fibrinogenolytic activity (Malta et al., 2008), while the venom of Scolopendra subspinipes has anticoagulant activity. A phospholipase A2, Scol/Pla, was purified and cloned from the centipede Scolopendra viridis (Gonzalez-Morales et al., 2009). Although several proteases have been recorded from centipede venom, the venom of centipedes remains poorly characterized compared with those of other venomous animals (Rates et al., 2007; Qiu, 2012).

*Corresponding author. E-mail: miaolixia@whu.edu.cn. Tel: 86-27-68759795. Fax: 86-27-68759795.
Serine proteases, found in many organisms, are of broad interest because of their diverse physiological functions, affecting processes such as digestion, immune response, complement activation, cellular differentiation and hemostasis (Qiu, 2012). Trypsin-like serine proteases (tryptases) are members of the serine protease family and play important roles in physiological processes including wound healing, inflammatory reactions, blood clotting and regeneration (Jin et al., 2002; Dienes and Lidén, 2007; Yuan et al., 2012). Snake venom is rich in various types of enzymes such as metalloproteinases, serine proteinases, phospholipases and hyaluronidases (Jin et al., 2002; Cidade et al., 2006; Rojnuckarin et al., 2006), and snake venom serine proteases have been well characterized (Serrano and Maroun, 2005). A serine protease gene was cloned from centipede body tissue (You et al., 2004), but there have been few reports so far on serine proteases derived from centipede venom. In this study, a new cDNA encoding a serine protease, named Ssmase, was cloned and characterized from the venom gland of the centipede S. subspinipes mutilans. Sequence analysis showed that the encoded protease has a conserved C-terminal domain belonging to the trypsin-like serine protease superfamily. Ssmase shares the principal residues of the catalytic center (His61, Asp108 and Ser208) with most serine proteases. However, as the first serine protease from centipede venom, Ssmase displays unique features, e.g. no significant overall sequence similarity to other serine proteases.

Construction of cDNA library
S. subspinipes mutilans was collected from Xiangfan, Hubei province, China. The venom glands connected to the first pair of forcipules were stimulated with a 3 V alternating current. After 48 h, the venom glands were dissected and ground in liquid nitrogen. RNA was extracted using Trizol Reagent (Gibco), and mRNA was purified with the PolyA Tract mRNA kit according to the manufacturer's instructions. About 10 µg of mRNA was used to synthesize double-stranded cDNA. The venom gland cDNA library was constructed with the SMART™ cDNA Library Construction Kit according to the manufacturer's protocol, and the ligated products were transformed into Escherichia coli strain DH5α.

Screening of cDNA library
Polymerase chain reaction (PCR) was used to screen positive clones from the venom gland cDNA library, with the amplified clones serving as templates. The forward and reverse primers were 5'-TGT AAA ACG ACG GCC AGT-3' and 5'-CAG GAA ACA GCT ATG ACC-3', respectively. Positive clones were randomly selected for DNA sequencing.
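The two primers appear to match the widely used M13 forward (-21) and M13 reverse sequencing primers, consistent with their being universal to the library vector. As a rough check of their suitability for PCR screening, the sketch below computes their GC content and two textbook melting-temperature estimates, the Wallace rule and the basic GC formula. Both formulas are approximations chosen here for illustration; the text does not state how the annealing temperature was chosen.

```python
# Primer sequences as given in the text (5'->3').
FORWARD = "TGTAAAACGACGGCCAGT"
REVERSE = "CAGGAAACAGCTATGACC"


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def tm_wallace(seq: str) -> int:
    """Wallace rule, Tm = 2*(A+T) + 4*(G+C) in deg C (rough, best for short oligos)."""
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def tm_basic(seq: str) -> float:
    """Basic GC formula, Tm = 64.9 + 41*(G+C-16.4)/N in deg C."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(f"{name}: {len(primer)} nt, GC {gc_percent(primer):.1f}%, "
              f"Wallace Tm {tm_wallace(primer)} C, basic Tm {tm_basic(primer):.1f} C")
```

Both primers are 18-mers with 50% GC content, so the two estimates are identical for the pair (54 C by the Wallace rule, about 48 C by the basic formula); well-matched melting temperatures are what make a primer pair convenient for a single annealing step.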

Sequence analysis
The open reading frame (ORF) of the Ssmase precursor was identified using the NCBI ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The deduced amino acid sequence of Ssmase was searched against sequence databases with the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi). The signal peptide was predicted at http://www.cbs.dtu.dk/services/SignalP/. The protein structure and functional domains were analyzed using the online tools ExPASy (http://www.expasy.org/prosite/) and SMART (http://smart.embl-heidelberg.de/), respectively. The theoretical isoelectric point (pI) and molecular weight (Mw) of the protein were predicted using the ExPASy Proteomics Server (http://web.expasy.org/compute_pi/). Multiple sequence alignment of proteases from different organisms was performed with ClustalX 1.83, and gaps were inserted and adjusted manually to maximize homology.
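The ORF search step can be illustrated with a minimal sketch. It is a simplified stand-in, not the NCBI ORF Finder algorithm: it assumes the standard genetic code, accepts only ATG as a start codon, scans the forward strand only, and ignores ORFs that run off the end of the sequence without a stop codon. Coordinates are 1-based and include the stop codon, matching how the ORF is reported in this paper (nucleotides 106 to 885).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str, min_codons: int = 30):
    """Longest ATG-initiated ORF on the forward strand.

    Returns (start, end, frame) with 1-based inclusive coordinates
    (stop codon included), or None if no ORF of at least `min_codons`
    codons, counting the stop codon, is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; internal ATGs are skipped
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length // 3 >= min_codons and (
                    best is None or length > best[1] - best[0] + 1
                ):
                    best = (start + 1, i + 3, frame + 1)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    toy = "CCATGGCTGCTGCTTAAGG"  # hypothetical sequence, not the Ssmase cDNA
    print(longest_orf(toy, min_codons=1))  # (3, 17, 3)
```

Applied to the full 1029 bp Ssmase cDNA (not reproduced in the text), a tool of this kind should report a start of 106 and an end of 885 if it agrees with the result given in the Results section.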

RESULTS AND DISCUSSION
A cDNA library specific to the venom gland was constructed from mRNA of S. subspinipes mutilans venom glands. Over two hundred clones from the library were amplified, and their plasmids were used as templates for PCR screening. Because the PCR primers anneal to universal sites in the library vector, the insert size of each clone could be readily determined by PCR.
Sixty clones with different insert sizes were randomly selected from the venom gland cDNA library for sequencing and analysis (data not shown). A 1029 bp full-length cDNA sequence was characterized, as shown in Figure 1. Analysis with the online ORF Finder showed that the cDNA contains a 780 bp ORF, a 105 bp 5' untranslated region and a 144 bp 3' untranslated region. The translation initiation site was assigned to the methionine codon at nucleotides 106 to 108, and the termination codon (TGA) was found at nucleotides 883 to 885. The AT content of the ORF is 64%, markedly lower than those of the 5' untranslated region (77.2%) and the 3' untranslated region (81.6%), which may suggest that a specific secondary structure forms during its transcription. Although the NCBI BLAST search showed that the cDNA had no high homology with known genes, analysis of the deduced protein revealed sequence identity with trypsins, tryptases and other members of the trypsin family of serine proteases (hence the name Ssmase). The Ssmase cDNA was deduced to encode a precursor protein of 259 amino acid residues, with a calculated molecular mass of 28722.74 Da and a predicted isoelectric point of 4.77.

Venom is a key element in the predatory behavior of centipedes and is a complex mixture of various molecules. It has been proposed that centipede venom also contains digestive enzymes used to soften up the flesh of the prey, which is subsequently sucked up (Undheim and King, 2011). Numerous studies have reported on venom components and their functions in snakes, scorpions, spiders and other animals.
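The calculated molecular mass of the Ssmase precursor was obtained with the ExPASy Compute pI/Mw tool. To show what such a calculation involves, the sketch below sums average residue masses and adds one water molecule for the free termini. The mass table is a commonly used set of average residue masses and is an assumption here; ExPASy's own table may differ in the last decimal place. The sketch treats cysteines as reduced and ignores glycosylation, so it describes the unmodified polypeptide only.

```python
# Average residue masses in daltons (amino acid minus one water); a commonly
# used table, with the exact decimals treated as an assumption.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide.

    Cysteines are treated as reduced (no disulfide bonds), and no
    glycosylation or other modification is included.
    """
    protein = protein.upper()
    unknown = set(protein) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    print(f"{average_mass('GG'):.2f}")  # glycylglycine: 132.12
```

Because the full Ssmase amino acid sequence is shown only in Figure 1, the sketch is demonstrated on a dipeptide; each of the four predicted disulfide bonds in the folded protein would lower the mass by about 2 Da.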
However, only a few reports have focused on the venom of centipedes, which remain a neglected group of venomous animals (Rates et al., 2007; Undheim and King, 2011). Although some active peptides from centipede venom have been identified, most results were obtained using milked venom and gel electrophoresis (Peng et al., 2010; Malta et al., 2008; Antoniazzi et al., 2009). Two-dimensional chromatographic analysis of S. viridicornis nigra and Scolopendra angulata venoms revealed many proteins or peptides with molecular masses ranging from 1.3 to 22.6 kDa, and N-terminal sequencing of 13 and 11 of these molecules from S. viridicornis nigra and S. angulata, respectively, identified a total of 10 protein families (Rates et al., 2007). However, few full-length sequences of these venom proteins have been determined, possibly because the amount of milked venom is insufficient for sequencing and analysis. Constructing a venom gland cDNA library and screening it for venom protein/peptide genes therefore offers a straightforward route to complete sequences (Peng et al., 2010; Rates et al., 2007), and cDNA cloning is particularly well suited to surveying the active proteins/peptides in venom comprehensively. For example, the venom of S. subspinipes dehaani was systematically investigated by transcriptomic and proteomic analysis coupled with biological function assays, and the purified proteins/peptides showed different pharmacological properties, including platelet-aggregating, anticoagulant, phospholipase A2 and trypsin-inhibiting activities (Liu et al., 2012).
Serine proteases are common constituents of the venom proteomes and venom gland transcriptomes of viperid species (Francischetti et al., 2004; Kashima et al., 2004; Cidade et al., 2006; Jin et al., 2007; Vilca-Quispe et al., 2010), and abundant serine proteases have been isolated and characterized from snake venoms (Serrano and Maroun, 2005). However, no serine protease had previously been reported to be isolated or cloned from the venom gland of S. subspinipes mutilans. In the present paper, we obtained the complete cDNA sequence of a new tryptase, Ssmase, cloned and characterized from the venom gland of S. subspinipes mutilans for the first time.
The homology of Ssmase was searched in the non-redundant protein sequences (nr) and Swiss-Prot databases. The search revealed 40% similarity between Ssmase and Scolonase, which was purified and characterized from the tissue of the Korean centipede S. subspinipes mutilans (You et al., 2004). According to the online SignalP 4.0 server, the ORF of the Ssmase cDNA encodes 259 amino acid residues, comprising a signal peptide of 19 residues and a mature protein of 240 residues, indicating that Ssmase is a typical secretory protein. Residues 1 to 19 constitute the signal peptide, and residues 20 to 259 contain the typical domain of a tryptase. Based on the assignment proposed for batroxobin (Itoh et al., 1987), the 19-residue prepro-peptide of Ssmase is putatively composed of a 14-residue pre-peptide and a 5-residue propeptide, and two putative cleavage sites were found in the prepro-peptide region (Figure 2). The five-residue propeptide between the signal peptide and the mature enzyme is a putative activation peptide. Serine proteases from viperid and crotalid snake venoms share a highly conserved activation peptide sequence, QR/KSSDR (Itoh et al., 1987; Serrano and Maroun, 2005). Three of these residues (Q15, S17 and S18) are conserved in Ssmase relative to other snake venom serine proteases (SVSPs), suggesting that they may play an important role in the function of the activation peptide.
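The domain boundaries and the conservation argument above can be checked mechanically. The sketch splits a precursor at the two predicted cleavage points (after residues 14 and 19) and compares the propeptide with the SVSP consensus QR/KSSDR, written as one set of allowed residues per position. Because the text gives only the propeptide (QGSSA), the pre-peptide and mature chain are hypothetical placeholder residues. The comparison is ungapped and anchored at the first consensus position, which is an assumption about how the two sequences line up.

```python
# SVSP activation-peptide consensus QR/KSSDR, one set of allowed residues per position.
SVSP_ACTIVATION_CONSENSUS = [{"Q"}, {"R", "K"}, {"S"}, {"S"}, {"D"}, {"R"}]


def split_precursor(precursor: str, pre_end: int, pro_end: int):
    """Split at two 1-based cleavage points.

    Residues 1..pre_end form the pre-peptide, pre_end+1..pro_end the
    propeptide, and the remainder the mature chain.
    """
    return precursor[:pre_end], precursor[pre_end:pro_end], precursor[pro_end:]


def conserved_positions(propeptide: str, consensus, first_residue: int):
    """(residue number, residue) pairs where the propeptide fits the consensus."""
    return [
        (first_residue + i, aa)
        for i, (aa, allowed) in enumerate(zip(propeptide, consensus))
        if aa in allowed
    ]


# Hypothetical 259-residue precursor: 'X' placeholders around the real propeptide.
precursor = "X" * 14 + "QGSSA" + "X" * 240
pre, pro, mature = split_precursor(precursor, pre_end=14, pro_end=19)

if __name__ == "__main__":
    print(len(pre), pro, len(mature))  # 14 QGSSA 240
    print(conserved_positions(pro, SVSP_ACTIVATION_CONSENSUS, first_residue=15))
    # [(15, 'Q'), (17, 'S'), (18, 'S')]
```

The three matches reproduce the residues named in the text (Q15, S17 and S18), while G16 and A19 differ from the consensus R/K and D.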
No apparent match to any deposited nucleotide sequence was found in the current GenBank/EMBL databases, indicating that the cDNA clone encodes a new serine protease distinct from known enzymes. Serine proteases are among the best-characterized enzymes of living organisms. All tryptases share a substrate preference for a basic P1 residue, lysine (Lys) or arginine (Arg), mainly because of a negatively charged Asp189 at the bottom of the S1 pocket (Serrano and Maroun, 2005). SVSPs share with trypsin the conserved catalytic triad His57, Asp102 and Ser195 (chymotrypsinogen numbering) (Vitorino-Cardoso et al., 2006). Multiple alignment of Ssmase with other serine proteases revealed that Ssmase has low overall amino acid sequence homology with typical tryptases. However, the residues in their functional domains are almost identical, and the carboxy-terminal regions are remarkably conserved. As shown in Figures 1 and 3, Ssmase also contains the conserved His61, Asp108 and Ser208, which are the principal residues of the catalytic center. A number of sequence motifs surrounding evolutionary markers and active-site residues of the S1 family of clan SA serine proteinases have recently been identified (Krem and Di Cera, 2001; Vitorino-Cardoso et al., 2006). These motifs, 54TAAHC58 surrounding the catalytic His57, 102DIAL105 at Asp102 and 193GDSGGP198 around Ser195 (Vitorino-Cardoso et al., 2006), are also highly conserved in Ssmase, as shown in Figure 3. Taken together, these results indicate that Ssmase is a new member of the trypsin-like serine protease family from the venom of the centipede S. subspinipes mutilans.
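The motif comparison in the previous paragraph can be mimicked with a simple ungapped scan that reports, for each consensus motif, the best-matching window and how many positions differ, and then converts the window start into the residue number of the catalytic His, Asp or Ser. This is a simplified stand-in for the ClustalX multiple alignment used in the paper, not a reproduction of it; the test sequence is a made-up fragment, not Ssmase, and in a real protein a low mismatch count only flags a candidate site.

```python
# Consensus motifs from the text, each with the 0-based offset of the
# catalytic residue inside the motif.
MOTIFS = {
    "His": ("TAAHC", 3),
    "Asp": ("DIAL", 0),
    "Ser": ("GDSGGP", 2),
}


def best_window(sequence: str, motif: str):
    """Best ungapped match of `motif`: (1-based start, mismatch count).

    Ties go to the leftmost window; returns None if the motif is longer
    than the sequence.
    """
    best = None
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if best is None or mismatches < best[1]:
            best = (i + 1, mismatches)
    return best


def catalytic_positions(sequence: str):
    """Map each catalytic residue to (residue number, motif mismatches)."""
    result = {}
    for name, (motif, offset) in MOTIFS.items():
        hit = best_window(sequence.upper(), motif)
        if hit is not None:
            start, mismatches = hit
            result[name] = (start + offset, mismatches)
    return result


if __name__ == "__main__":
    toy = "MKLVSAAHCWQDIALLKGDSGGPVV"  # hypothetical fragment, not Ssmase
    print(catalytic_positions(toy))
    # {'His': (8, 1), 'Asp': (12, 0), 'Ser': (20, 0)}
```

The single mismatch for the His motif (SAAHC instead of TAAHC) illustrates why the motifs are described as highly conserved rather than invariant.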
Despite sharing similar structural features, venom serine proteases display a highly diverse pharmacological profile, including actions on proteins of the coagulation cascade such as thrombin-like activity on fibrinogen, activation of factor V, activation of protein C, fibrinogenolysis, activation of plasminogen and induction of platelet aggregation (Serrano and Maroun, 2005). Venom serine proteinases are commonly glycosylated, usually with Asn-linked carbohydrate moieties. The extent of N- or O-glycosylation appears to be significant but quite variable, and the functional significance of this variation is incompletely understood (Serrano and Maroun, 2005). In the case of Ssmase, five potential N-glycosylation sites (Asn-Xaa-Ser/Thr) were found, located at amino acid residues 40 to 42, 102 to 104, 160 to 161 and 188 to 190.
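Potential N-glycosylation sites of the Asn-Xaa-Ser/Thr type can be located with a regular expression. The sketch below assumes the usual rule that Xaa must not be proline, and uses a zero-width lookahead so that overlapping sequons (as in "NNST") are all reported. PROSITE's pattern PS00001 additionally excludes a proline immediately after the Ser/Thr; that refinement is omitted here. Matches are potential sites only, since sequence alone cannot show whether a site is actually glycosylated.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str):
    """1-based positions of the Asn in every potential N-glycosylation sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical test peptide: N6 is skipped because it is followed by Pro.
    print(sequon_positions("ANGSLNPTQNVTRNNSS"))  # [2, 10, 14, 15]
```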
Snake venom thrombin-like enzymes are serine endopeptidases that are structurally constrained by six highly conserved disulfide bridges, five of which are common to all S1 serine proteinases (Vilca-Quispe et al., 2010). In contrast, eight half-cysteine residues were found in mature Ssmase, suggesting the presence of four disulfide bonds in this protein (Figure 4). Recently, twenty-six neurotoxin-like peptides were identified from the venom of the centipede S. subspinipes mutilans L. Koch by peptidomics combined with transcriptome analysis. These neurotoxins each contain two to four intramolecular disulfide bridges, and in most cases their disulfide framework differs from those of neurotoxins from the venoms of spiders, scorpions, marine cone snails, sea anemones and snakes (Yang et al., 2012).
Online ExPASy analysis of the conserved domain of Ssmase indicated that the cysteine pairs Cys46-Cys62, Cys141-Cys214, Cys180-Cys195 and Cys204-Cys233 form four disulfide bonds, which are likely to be essential for maintaining the three-dimensional structure of Ssmase and its activity as a tryptase. Thus, Ssmase represents a new family of trypsin-like serine proteinases with a four-disulfide-bond motif from centipede venom, distinct from known serine proteinases with five or six disulfide bonds.
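A compact way to describe a disulfide framework is to classify every pair of bonds as sequential (one closes before the other opens), nested (one lies entirely inside the other) or crossing (they interleave along the chain). The sketch applies this classification to the four cysteine pairs listed above, using the residue numbers as given in the text. It describes connectivity only; whether these bonds actually form is inferred in the paper from sequence analysis, not shown experimentally.

```python
from collections import Counter
from itertools import combinations

# Cysteine pairs reported for Ssmase (residue numbers as given in the text).
SSMASE_DISULFIDES = [(46, 62), (141, 214), (180, 195), (204, 233)]


def all_cysteines_unique(pairs):
    """True if no cysteine is assigned to more than one disulfide bond."""
    residues = [r for pair in pairs for r in pair]
    return len(residues) == len(set(residues))


def relation(p, q):
    """Classify two disulfides along the chain: sequential, nested or crossing."""
    (a, b), (c, d) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    if b < c:
        return "sequential"  # first bond closes before the second opens
    if d < b:
        return "nested"      # second bond lies entirely inside the first
    return "crossing"        # bonds interleave: a < c < b < d


def framework(pairs):
    """Relation for every pair of bonds, plus a tally of relation types."""
    relations = {(p, q): relation(p, q) for p, q in combinations(pairs, 2)}
    return relations, Counter(relations.values())


if __name__ == "__main__":
    relations, tally = framework(SSMASE_DISULFIDES)
    for (p, q), kind in relations.items():
        print(p, q, kind)
    print(dict(tally))
```

For the listed pairs, Cys141-Cys214 encloses Cys180-Cys195 and crosses Cys204-Cys233, while Cys46-Cys62 is independent of the other three bonds.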
Most of the proteolytic activity in centipede venoms seems to originate from metalloproteases, whereas the gelatinolytic activity detected in the venoms of O. pradoi and Cryptops iheringi comes from non-metalloproteases, most likely serine proteases (Malta et al., 2008; Undheim and King, 2011). Ssmase has low amino acid sequence homology with tryptases, but it shares the functional-domain residues and conserved carboxy-terminal regions of serine proteases.
This finding suggests that novel serine proteinases remain to be discovered in the venom of the centipede S. subspinipes mutilans. To gain insight into the structure and activity of the novel enzyme, studies to obtain sufficient amounts of the protein using both yeast and E. coli expression systems are in progress.

Figure 1. The deduced amino acid sequence of Ssmase is shown below the nucleotide sequence. The 5' and 3' untranslated regions are in capital letters, and the ORF is in lowercase letters. The start codon (atg) and stop codon (tga) are shaded purple. Pre-peptide residues are underlined, and pro-peptide residues are shaded green. Boxed residues form the conserved catalytic triad of serine proteases. Cysteine residues are highlighted in red.

Figure 2. Cleavage sites in the Ssmase pre-peptide and pro-peptide predicted with the online SignalP 4.0 server. The cleavage site between the signal peptide and the pro-peptide was predicted between residues 14 and 15, and the cleavage site between the pro-peptide and the mature protein between residues 19 and 20.

Figure 4. Trypsin domain analysis of Ssmase using the online software ExPASy. Cysteine residues (C) are shown in purple, and the active-site residues (H61, D108 and S208) are shown in blue and marked with blue five-pointed stars. The four disulfide bridges of Ssmase are indicated by red lines.