ABSTRACT
Providence virus is the only member of the family Carmotetraviridae and carries a positive-sense single-stranded RNA genome that encodes three open reading frames. The smallest open reading frame encodes the structural proteins. The largest open reading frame encodes a large putative protein, p130. The second, overlapping open reading frame encodes two non-structural proteins: p40, a proposed accessory protein, and p104, the replicase, which contains the RdRp domain. To date, p130 and p40 have not been associated with any related open reading frames in the databases. The purpose of this study is to use computational tools to identify sequences within these non-structural proteins with potential roles in replication and evolution. Our results revealed that p130 has a putative arginine-rich sequence that lies in a disordered region, a feature also found in p27 of the umbravirus Groundnut rosette virus. Analysis of p40 revealed a sequence with a coiled-coil conformation and surface-exposed characteristics comparable to the interaction domain of the p33 accessory protein of the tombusvirus Tomato bushy stunt virus. The hypothetical two-transmembrane-helix topology of PrV p104 oriented the putative localization signal at the N-terminus, in the same way that the localization signal of Tomato bushy stunt virus p92 is oriented. This study concluded that Providence virus non-structural proteins are structurally related to tombusvirus and umbravirus accessory proteins and contain sequences with predicted functions in replication. Findings from this study have led us to propose a co-evolutionary event between an insect virus and a plant virus, resulting in a hybrid virus with the potential to infect and replicate in both plant and animal host systems.
Key words: Providence virus, non-structural proteins, p40, p130, sequence comparison, replication, evolution.
Providence virus (PrV) is the only member of the family Carmotetraviridae and carries carmo-like RNA-dependent RNA polymerase (RdRp) motifs conserved among members of the Tombusviridae and Umbraviridae (Walter et al., 2010). This positive-sense single-stranded RNA (+ssRNA) virus was initially discovered in a persistently infected Helicoverpa zea (H. zea) midgut cell line and is the only tetravirus able to replicate in tissue culture (Pringle et al., 2003; Jiwaji et al., 2016). Tetraviruses are insect viruses classified into three families, Alphatetraviridae, Permutotetraviridae and Carmotetraviridae, according to the nucleotide sequence of the viral replicase (Dorrington et al., 2011). Tetraviruses are non-enveloped +ssRNA viruses with a characteristic T = 4 capsid symmetry and a host range limited to the orders Lepidoptera and Chiroptera (Moore, 1991; Bawden et al., 1999; Pringle et al., 2003; Kemenesi et al., 2016). The monopartite genome of PrV differs from that of other tetraviruses in that it encodes three open reading frames (ORFs) instead of the typical two (Walter et al., 2010). The putative viral replicase ORF (p104) and the viral capsid precursor ORF (p81) are conserved among all tetraviruses (Walter et al., 2010). A type 1 read-through stop signal, UAGCAACUA, within the replicase ORF results in the production of the accessory protein, p40, and the full-length protein, p104, which carries the RdRp motifs required for the establishment of infection (Walter et al., 2010). The third and largest ORF, p130, overlaps the replicase gene and is unique to PrV. The protein contains a putative 2A-like processing site (PrV-2A1), which is functional in in vitro studies and whose activity is predicted to produce two translation products of 17 kDa and 113 kDa (Walter et al., 2010; Luke et al., 2008).
The translational control system for the expression of the PrV replicase resembles that typically observed in tombusviruses. For instance, expression of the Tomato bushy stunt virus (TBSV) genome produces the replicase (p92) and the accessory protein (p33) via ribosomal readthrough of an amber termination signal (Scholthof et al., 1995). Within p33 is a short p33:p33/p92 interaction domain that mediates protein-protein interactions between p33 molecules and between p33 and p92 (Panavas et al., 2005; Rajendran and Nagy, 2004, 2006). The RNA binding sequence, RPRRRP, present in p33 is important for binding genomic RNA (Rajendran and Nagy, 2003). The two transmembrane domains (TMD1 and TMD2) and the peroxisomal targeting signals are responsible for membrane anchorage and localization of p33 and p92 onto the surface of the peroxisomal membrane, the site of replication complex assembly and viral RNA synthesis (McCartney et al., 2005).
The Umbraviridae genome comprises four ORFs: ORF1 (accessory protein), ORF2 (RdRp), ORF3 (RNA chaperone protein) and ORF4 (movement protein) (Ryabov et al., 2011). The Groundnut rosette virus (GRV) ORF3 protein has three functions: RNA chaperone activity, protection of viral RNA against the plant's defensive RNA silencing systems, and mediation of long-distance movement through the phloem (Taliansky et al., 2003). Umbraviruses are unusual in that they lack a capsid gene and are therefore unable to produce conventional virus particles in infected plants (Taliansky and Robinson, 2003; Taliansky et al., 2003). Within the host, umbraviruses use ribonucleoproteins, formed from complexes of the ORF3 protein and genomic RNA, as alternatives to classic capsid proteins to shuttle viral RNA over long distances through the phloem and to establish systemic infection (Taliansky et al., 2003; Kim et al., 2007).
So far, studies of tetravirus replication have been limited to the subcellular localization of viral RNA and replication proteins. These studies have shown that the replicase of Helicoverpa armigera stunt virus (Alphatetraviridae) associates with membranes derived from endosomes, while the PrV replicase associates with membrane vesicles from the Golgi apparatus and secretory pathway (Walter, 2008; Short et al., 2010, 2013).
As a first step towards understanding the replication biology of PrV, this study seeks to identify sequences within PrV non-structural proteins with potential functions in replication using computational tools. To date, no identifiable ORFs or known peptide homologues have been reported for p130 and p40, and their evolutionary origins remain unknown. The aim of this study is to begin to functionally characterize PrV non-structural proteins with regard to their potential roles in replication and to shed light on their evolutionary origins.
Prediction of potentially functional sequences in PrV p130
The BLAST search engine at https://www.ncbi.nlm.nih.gov was employed to search for protein homologues of PrV p130. The putative RNA binding region was identified by visual inspection as a sequence that resembles the RNA binding arginine-rich motif of GRV ORF3 (Taliansky et al., 1996). The p130 sequence was submitted to the IUPred program (http://iupred.enzim.hu/) for identification of putative disordered or unstructured sequences. The long disorder prediction parameter was selected for this analysis. The putative loop structures were identified using the PredictProtein server at http://www.predictprotein.org/.
Sequence comparison of the putative interaction sequence in PrV p40
The search for conserved domains in PrV p40 was performed with the BLAST search engine at https://www.ncbi.nlm.nih.gov. The ProtScale program hosted by the Expert Protein Analysis System (ExPASy) (http://web.expasy.org/protscale) was used to identify putative surface-exposed and hydrophilic sequences in PrV p40. The Kyte and Doolittle and the Hopp and Woods scales were selected, with a window size of 5 for both methods. A potential coiled-coil sequence in p40 was identified using the PredictProtein server at http://www.predictprotein.org/. The sequence was submitted to the server and analyzed using amino acid window sizes of 14, 21 and 28.
Potential RNA binding sequence in PrV p104
The PrV arginine-rich sequence, RRRRYA, at amino acid positions 476 to 481, shares the amino acids R, Y and A with the tombusvirus p33 arginine/proline-rich motif. It was analyzed for surface-exposed and hydrophilic properties using the ProtScale program (http://web.expasy.org/protscale) with a window size of 5. The Kyte and Doolittle and the Hopp and Woods scales were selected for this analysis.
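The windowed scoring used here can be illustrated with a minimal sketch. It assumes a plain, unweighted moving average over a window of 5 and applies it to the isolated hexapeptide only, so the values are not expected to match a ProtScale run on the full p104 sequence, where flanking residues and the server's weighting options also contribute. The function name `window_profile` is hypothetical; the two scales are the published Kyte and Doolittle and Hopp and Woods values.

```python
# Kyte and Doolittle hydropathy values: more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hopp and Woods hydrophilicity values: more positive means more hydrophilic.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def window_profile(seq, scale, window=5):
    """Return (centre_position, mean_score) for every full window.

    Positions are 1-based and refer to the residue at the window centre.
    The mean is unweighted (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        mean = sum(scale[aa] for aa in chunk) / window
        profile.append((start + half + 1, mean))
    return profile


motif = "RRRRYA"  # p104 residues 476-481, as quoted in the text
kd_profile = window_profile(motif, KYTE_DOOLITTLE)
hw_profile = window_profile(motif, HOPP_WOODS)
print("Kyte-Doolittle:", kd_profile)
print("Hopp-Woods:", hw_profile)
```

On this hexapeptide both scales point in the same direction: strongly negative Kyte and Doolittle means and positive Hopp and Woods means, which is the pattern read as surface-exposed and hydrophilic.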
Prediction of transmembrane helix and topology of PrV p104
Putative transmembrane helices were predicted using TMpred (http://www.ch.embnet.org/software/TMPRED_form.html) and the ProtScale Kyte and Doolittle scale at http://web.expasy.org/protscale, with a window size of 19. At this window size, hydrophobic membrane-spanning sequences appear as strong signals above the threshold value of 1.5. The topology of PrV p104 was derived by comparison with the membrane topology of TBSV p92 (Scholthof et al., 1995), given that both viral proteins share evolutionary and functional similarities (Walter et al., 2010).
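As a rough illustration of the hydropathy criterion described above, the sketch below averages Kyte and Doolittle values over a window of 19 and reports stretches of window centres whose mean exceeds 1.5. It assumes an unweighted mean, uses a made-up toy sequence rather than p104, and the function name `hydrophobic_segments` is hypothetical. It does not reproduce TMpred, which relies on a different, matrix-based algorithm.

```python
# Kyte and Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.5):
    """Return (first_centre, last_centre) spans, 1-based and inclusive,
    for consecutive windows whose mean hydropathy exceeds the threshold."""
    half = window // 2
    segments = []
    run_start = None
    last_centre = None
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window
        centre = start + half + 1
        if mean > threshold:
            if run_start is None:
                run_start = centre
            last_centre = centre
        elif run_start is not None:
            segments.append((run_start, last_centre))
            run_start = None
    if run_start is not None:
        segments.append((run_start, last_centre))
    return segments


# Toy sequence (hypothetical): a 25-residue leucine stretch between acidic flanks.
toy = "D" * 20 + "L" * 25 + "D" * 20
print(hydrophobic_segments(toy))
```

The reported spans are window centres, so they are narrower than the hydrophobic stretch itself; a server that reports helix boundaries may extend them differently.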
Putative subcellular localization signals of PrV non-structural proteins
The PrV p104 and p130 sequences were queried using the SignalP 4.0 program at http://www.cbs.dtu.dk/services/SignalP/ to predict putative signal sequences with subcellular targeting functions.
Identification of the putative RNA binding sequence and disordered region in PrV p130
The BLAST search engine did not identify any protein homologues or conserved domains in the PrV p130 amino acid sequence. The evolutionary relatedness between PrV, tombusviruses and umbraviruses (Walter et al., 2010) prompted a search for amino acid sequences shared between PrV p130 and the non-structural proteins of tombusviruses and umbraviruses. An arginine-rich sequence, RRRRRPRDNLR, was found at amino acids 1047 to 1057 of PrV p130 (Figure 1). This sequence shares arginine residues with the arginine-rich sequence, RPRRRAGRSGGMDPR, at amino acid positions 108 to 122 of the protein encoded by GRV ORF3, p27 (GRV p27), which is an RNA binding protein (Taliansky et al., 2003).
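The motif in this study was found by visual inspection. For readers who want a reproducible starting point, the sketch below shows one way such a scan could be automated: it slides a short window along a sequence and flags windows that contain a minimum number of arginines. The window length of 6, the cutoff of 4 arginines and the function name `arginine_rich_windows` are illustrative assumptions, not parameters from the study. The two test strings are the motifs quoted above.

```python
def arginine_rich_windows(seq, window=6, min_arg=4):
    """Return 1-based start positions of windows holding at least
    min_arg arginine (R) residues. Cutoffs are illustrative only."""
    hits = []
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count("R") >= min_arg:
            hits.append(start + 1)
    return hits


PRV_P130_MOTIF = "RRRRRPRDNLR"     # p130 residues 1047-1057, as quoted in the text
GRV_P27_MOTIF = "RPRRRAGRSGGMDPR"  # GRV p27 residues 108-122, as quoted in the text

for name, motif in (("PrV p130", PRV_P130_MOTIF), ("GRV p27", GRV_P27_MOTIF)):
    print(name, "length:", len(motif),
          "arginines:", motif.count("R"),
          "arginine-rich window starts:", arginine_rich_windows(motif))
```

Applied to a full-length protein, the returned start positions would need to be merged into contiguous regions and checked by eye, since a simple residue count cannot distinguish a functional RNA binding motif from a chance cluster.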
The putative disordered or unstructured region within PrV p130 was detected with the IUPred program. The graphical output displayed a region spanning amino acids 517 - 1220 with scores above 0.5, the threshold above which residues are predicted to be disordered (Figure 2A). The arginine-rich sequence in the PrV p130 ORF lies within this disordered region (Figure 2A), a feature that might have an important biological function. For comparison, GRV p27, an RNA binding protein, was analysed with IUPred, and almost the entire protein sequence scored above the threshold value of 0.5 (Figure 2B). The arginine-rich sequence of GRV p27 (Kim et al., 2007) resides within the predicted disordered region (Figure 2B). Thus, in both proteins, the arginine-rich sequence is located within a predicted disordered region.
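The thresholding step can be sketched as follows, assuming the per-residue scores have already been obtained from the prediction server and loaded into a list. The helper names `disordered_regions` and `motif_in_region` and the short score list are hypothetical; only the 0.5 cutoff and the residue spans (517 - 1220 and 1047 - 1057) come from the text.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) spans of consecutive residues
    whose score exceeds the threshold and that are at least min_length long."""
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


def motif_in_region(motif_span, regions):
    """True if the motif span lies entirely inside one of the regions."""
    m_start, m_end = motif_span
    return any(r_start <= m_start and m_end <= r_end
               for r_start, r_end in regions)


# Made-up scores, only to show the run detection.
example_scores = [0.1, 0.6, 0.7, 0.4, 0.9, 0.9]
print(disordered_regions(example_scores))

# Spans reported in the text for PrV p130.
print(motif_in_region((1047, 1057), [(517, 1220)]))
```

The `min_length` argument is included because short runs above the cutoff are usually of less interest than long ones; the value to use is a judgment call and is not specified in the study.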

The secondary structure predicted by the PredictProtein program for amino acids 517 - 1220, the disordered region of p130, consisted mainly of loop structures (Supplementary Figure S1). Similarly, the PredictProtein results for GRV p27 showed that the protein sequence was entirely covered with loop structures (Supplementary Figure S2).
Prediction of a protein-protein interaction sequence in PrV p40
Since the BLAST search engine did not indicate any functional domains in the p40 amino acid sequence, the evolutionary relatedness between PrV and tombusviruses (Walter et al., 2010) prompted a search for similar sequences in p40 and the non-structural proteins of tombusviruses. Sequence comparisons were carried out to identify sequences in p40 with characteristics similar to those of the interaction domain (ID) of the tombusvirus TBSV. The TBSV p33 ID (Panavas et al., 2005; Rajendran and Nagy, 2004, 2006) was examined for surface-exposed characteristics using the Kyte and Doolittle scale, and the results were used to identify a sequence in PrV p40 with comparable characteristics. The TBSV p33 ID, located at C-terminal amino acids 241-282, displayed peaks whose strongest signal reached a negative score of -3.0, indicative of surface exposure (Figure 3A). Examination of PrV p40 using the same scale displayed peaks whose strongest signal reached a negative score of -2.5, indicative of surface exposure, in the region spanning amino acids 292-332 at the C-terminus of p40 (Figure 3B).

The PredictProtein program consolidates a range of methods and databases for predicting protein structural features. Its NCOILS method detects sequences with the potential to adopt coiled-coil conformations. The p40 sequence was submitted to the server, and the text output showed that amino acids 317 to 330 have a 30% probability of adopting a coiled-coil conformation. This putative coiled-coil sequence was detected at a window size of 14 but not at 21 or 28 (Supplementary Figure S3). The sequence lies in the C-terminal region of p40 and coincides with the surface-exposed region predicted by the ProtScale program.
Potentially functional RNA binding sequence in PrV p104
Within PrV p104, downstream of the read-through stop codon, is an arginine-rich sequence, RRRRYA, at amino acids 476 to 481. A related sequence, RPRRRPYA, is an RNA binding motif (RBR1) within TBSV p33 and has surface-exposed and hydrophilic properties (Rajendran and Nagy, 2003). A ProtScale analysis of the PrV p104 arginine-rich motif returned a strong negative peak with a score of -4.2 on the Kyte and Doolittle scale (Figure 4A), indicating a potentially surface-exposed sequence, and a score of 2.45 on the Hopp and Woods scale (Figure 4B), indicating a potentially hydrophilic sequence.
Putative transmembrane helix in PrV p104
Two methods, TMpred and the Kyte and Doolittle scale, were used to predict putative transmembrane (TM) helices in PrV p104. The TMpred output displayed four distinct peaks with scores above 500, the value considered significant for TM helix prediction. The peaks were located at amino acid positions 52 - 72 (TM1), 225 - 247 (TM2), 437 - 456 (TM3) and 647 - 661 (TM4), with scores of 1101, 1949, 1213 and 1638 respectively (Figure 5A). To confirm the TMpred interpretation, the Kyte and Doolittle scale with a window size of 19 was applied and revealed three strong peaks that occupied the same positions as those revealed by TMpred. These were 225 - 247 (TM2), 433 - 456 (TM3) and 647 - 661 (TM4), with scores of 1.989, 1.642 and 1.779 respectively (Figure 5B). The putative TM1 was not detected by the Kyte and Doolittle scale (Figure 5B), possibly because of differences between the algorithms used. Overall, the predicted transmembrane helices in p104 would potentially traverse the lipid membrane three or four times.
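The agreement between the two predictions can be checked with a small interval-overlap sketch. The spans below are the positions reported in this section; the helper names `overlaps` and `match_helices` are hypothetical, and treating any shared residue as agreement is an assumption of the sketch rather than a criterion stated in the study.

```python
# Helix spans reported by TMpred (Figure 5A), 1-based inclusive.
TMPRED_HELICES = {
    "TM1": (52, 72),
    "TM2": (225, 247),
    "TM3": (437, 456),
    "TM4": (647, 661),
}

# Peaks reported for the Kyte and Doolittle scale, window 19 (Figure 5B).
KD_PEAKS = [(225, 247), (433, 456), (647, 661)]


def overlaps(span_a, span_b):
    """True if two inclusive spans share at least one residue."""
    return span_a[0] <= span_b[1] and span_b[0] <= span_a[1]


def match_helices(reference, other_spans):
    """Map each reference helix name to the spans in other_spans that overlap it."""
    return {
        name: [span for span in other_spans if overlaps(ref_span, span)]
        for name, ref_span in reference.items()
    }


matches = match_helices(TMPRED_HELICES, KD_PEAKS)
unsupported = [name for name, hits in matches.items() if not hits]
print(matches)
print("Predicted by TMpred only:", unsupported)
```

With the reported spans, TM2, TM3 and TM4 are supported by both methods and only TM1 is predicted by TMpred alone, which is the basis for the three-or-four-pass reading.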


The author has not declared any conflict of interests.
REFERENCES
Bawden AL, Gordon KHJ, Hanzlik TN (1999). The specificity of Helicoverpa armigera stunt virus infectivity. Journal of Invertebrate Pathology 74(2):156-163.
Dorrington RA, Gorbalenya AE, Gordon KHJ, Lauber C, Ward VK (2011). Tetraviridae. In Virus Taxonomy: Classification and Nomenclature of Viruses: Ninth Report of the International Committee on Taxonomy of Viruses, Edited by King AMQ, Adams MJ, Carstens EB, Lefkowitz EJ. San Diego: Elsevier Academic Press pp. 1091-1102.
Herschlag D (1995). RNA chaperones and the RNA folding problem. Journal Biological Chemistry 270(36):20871-20874.
Jiwaji M, Short JR, Dorrington RA (2016). Expanding the host range of small insect RNA viruses: Providence virus (Carmotetraviridae) infects and replicates in a human tissue culture cell line. Journal General Virology 97(10):2763-2768.
Kim SH, Ryabov EV, Kalinina NO, Rakitina DV, Gillespie T, MacFarlane S, Haupt S, Brown JWS, Taliansky M (2007). Cajal bodies and the nucleolus are required for systemic infection of a plant virus. EMBO Journal 26(8):2169-2179.
Kemenesi G, Földes F, Zana B, Kurucz K, Estók P, Boldogh S, Görföl T, Bányai K, Oldal M, Jakab F (2016). Genetic Characterization of Providence Virus Isolated from Bat Guano in Hungary. Genome Announcements 4(3):e00403-16.
Luke GA, de Felipe P, Lukashev A, Kallioinen SE, Bruno EA, Ryan MD (2008). Occurrence, function and evolutionary origins of '2A-like' sequences in virus genomes. Journal General Virology 89(4):1036-1042.
Lupas AN, Gruber M (2005). The structure of alpha-helical coiled coils. Advances in Protein Chemistry 70:37-78.
McCartney AW, Greenwood JS, Fabian MR, White KA, Mullen RT (2005). Localization of the tomato bushy stunt virus replication protein p33 reveals a peroxisome-to-endoplasmic reticulum sorting pathway. Plant Cell 17:3513-3531.
Moore NF (1991). The Nudaurelia β family of insect viruses. In: Viruses of Invertebrates, pp. 277-285. Edited by Kurstak E. New York: Marcel Dekker.
Panavas T, Hawkins CM, Panaviene Z, Nagy PD (2005). The role of the p33:p33/p92 interaction domain in RNA replication and intracellular localization of p33 and p92 proteins of cucumber necrosis tombusvirus. Virology 338(1):81-95.
Pringle FM, Johnson KN, Goodman CL, McIntosh AH, Ball LA (2003). Providence virus: a new member of the Tetraviridae that infects cultured insect cells. Virology 306(2):359-370.
Rajendran KS, Nagy PD (2004). Interaction between the replicase proteins of tomato bushy stunt virus in vitro and in vivo. Virology 326(2):250-261.
Rajendran KS, Nagy PD (2006). Kinetics and functional studies on interaction between the replicase proteins of tomato bushy stunt virus: requirement of p33: p92 interaction for replicase assembly. Virology 345(1):270-279.
Rajendran KS, Nagy PD (2003). Characterization of the RNA binding domains in the replicase proteins of tomato bushy stunt virus. Journal Virology 77(17):9244-9258.
Ryabov EV, Taliansky ME, Robinson DJ, Waterhouse PM, Murant AF, de Zoeten GA, Falk BW, Vetten HJ, Gibbs MJ (2011). Umbraviridae. In: Virus Taxonomy: Ninth Report of the International Committee on Taxonomy of Viruses, pp. 1191-1195. Edited by King AMQ, Adams MJ, Carstens EB, Lefkowitz EJ. San Diego: Elsevier Academic Press.
Scholthof K-BG, Scholthof HB, Jackson AO (1995). The tomato bushy stunt virus replicase proteins are coordinately expressed and membrane associated. Virology 208(1):365-369.
Short JR, Dorrington RA (2012). Membrane targeting of an alpha-like tetravirus replicase is directed by a region within the RNA-dependent RNA polymerase domain. Journal General Virology 93:1076-1716.
Short JR, Knox C, Dorrington RA (2010). Subcellular localization and live-cell imaging of the Helicoverpa armigera stunt virus replicase in mammalian and Spodoptera frugiperda cells. Journal General Virology 91:1514-1523.
Short JR, Nakayinga R, Hughes GE, Walter CT, Dorrington RA (2013). Providence virus (family: Carmotetraviridae) replicates vRNA in association with the Golgi apparatus and secretory vesicles. Journal General Virology 94:1073-1078.
Taliansky ME, Robinson DJ (2003). Molecular biology of umbraviruses: phantom warriors. Journal General Virology 84:1951-1960.
Taliansky M, Roberts IM, Kalinina N, Ryabov EV, Raj SK, Robinson DJ, Oparka KJ (2003). An umbraviral protein, involved in long distance RNA movement, binds viral RNA and forms unique, protective ribonucleoprotein complexes. Journal Virology 77(5):3031-3040.
Taliansky ME, Robinson DJ, Murant AF (1996). Complete nucleotide sequence and organisation of the RNA genome of groundnut rosette umbravirus. Journal General Virology 77(9):2335-2345.
https://doi.org/10.1099/0022-1317-77-9-2335
Tompa P, Csermely P (2004). The role of structural disorder in the function of RNA and protein chaperones. FASEB Journal 18(11):1169-1175.
https://doi.org/10.1096/fj.04-1584rev
Walter CT, Pringle FM, Nakayinga R, de Felipe P, Ryan MD, Ball LA, Dorrington RA (2010). Genome organization and translation products of Providence virus: insight into a unique tetravirus. Journal General Virology 91(11):2826-2835.
https://doi.org/10.1099/vir.0.023796-0
Walter CT (2008). Establishment of experimental systems for studying the replication biology of providence virus. Rhodes University. PhD thesis.
Wang Y, Zhang X, Zhang H, Lu Y, Huang H, Dong X, Chen J, Dong J, Yang X, Hang H, Jiang T (2012). Coiled-coil networking shapes cell molecular machinery. Molecular Biology of the Cell 23(19):3911-3922.
https://doi.org/10.1091/mbc.e12-05-0396