Computational sequence analysis and in silico modeling of a stripe rust resistance protein encoded by wheat TaHSC70 gene

The TaHSC70 gene of Triticum sp. is a member of the heat shock protein family and plays a significant role in stress-related and defense responses induced by infection with the stripe rust fungus, acting through a jasmonic acid dependent signal transduction pathway. Hence, understanding the molecular structure and function of the protein encoded by this gene is of paramount importance for plant biologists working on stripe rust. The present study was aimed at sequence and in silico structural analysis of the Hsp70 protein encoded by this gene, using a comparative modeling approach. Validation of the overall fold and structure, errors over localized regions and stereochemical parameters was carried out using the PDBSum server. The structure was a monomer with 7 sheets, 1 β-α-β unit, 12 hairpins, 13 β-bulges, 29 strands, 21 helices, 16 helix-helix interactions, 44 β-turns and 1 γ-turn. Two major domains belonging to the Hsp70 family were detected, while neural network analysis predicted the protein to be highly phosphorylated at serine and threonine residues.


INTRODUCTION
Stress impacts plants negatively and hinders their normal activity. Stress-protective roles in plants are played by the Hsp70 family, whose members are induced in response to potentially detrimental stimuli (Efeoglu et al., 2009). TaHSC70 plays a decisive role in protecting plant cells against heat stress (Guo et al., 2014). In wheat, heat stress is one of the causes of pollen sterility, drying of stigmatic fluid, shrivelled seeds, pseudo-seed setting and empty endosperm pockets. The defence mechanisms of wheat for coping with these conditions consist of heat-responsive miRNAs, signalling molecules, transcription factors and stress-associated proteins such as heat shock proteins (HSPs) and antioxidant enzymes (Kumar and Rai, 2014). The TaHSC70 gene (70-kDa heat-shock cognate) is a constitutively expressed Hsp70 family member in wheat (Duan et al., 2011; Usman et al., 2014). Furthermore, it is involved in protein-protein interactions, assisting the folding of de novo synthesized polypeptides and the import/translocation of precursor proteins (Feng et al., 2013; Wang et al., 2014). Heat shock proteins (HSPs) exist in nearly all living organisms (Feng et al., 2013). The major Hsps synthesized in eukaryotes vary in molecular weight and belong to six structurally distinct classes: Hsp100, Hsp90, Hsp70, Hsp60 (or chaperonins), ~17-30 kDa small Hsps and ~8.5 kDa ubiquitin (Safdar et al., 2012). Hsp70 family chaperones are considered to be the most highly conserved heat shock proteins (Jego et al., 2013). In plants, many Hsp70 proteins have been identified in different species (Daugaard et al., 2007). The Arabidopsis genome contains at least 18 genes encoding members of the Hsp70 family, the rice genome contains 32 (Sarkar et al., 2012), and around 12 Hsp70 members have been found in the spinach genome (Guy and Li, 1998). The Hsp70 in wheat was reported by Duan et al. (2011) in an expression profile analysis of Arabidopsis and spinach. HSP70 has been observed to be increased in a thermotolerant wheat variety, so it is anticipated that HSP70 modulates the thermotolerance level of wheat (Triticum aestivum) pollen under heat stress (Kumar and Rai, 2014). This suggests that overexpression of Hsp70 genes correlates positively with the acquisition of thermotolerance. HSPs are expressed in response to environmental stress conditions such as heat, cold and drought, as well as to chemical and other stresses (Daugaard et al., 2007), and their expression results in enhanced tolerance to salt, water and high-temperature stress in plants (Alvim et al., 2001). However, the cellular mechanisms of Hsp70 function under stress conditions are not fully understood.
E-mail: zarrin.iiui@gmail.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
3D structure and conserved domain analysis can shed light on the function of a protein. The 3D structure of the wheat heat shock protein has not been modeled previously. Comparative modeling is based principally on alignment of the query protein (target) to a protein of known structure (template). The prediction method may entail fold assignment, target-template alignment and model building, followed by model evaluation (Marti-Renom et al., 2000). A comparative modeling approach has been utilized in this study to predict the three-dimensional structure of the given protein sequence (target) using bioinformatics tools. Functional analysis has also been attempted using a battery of computational tools and webservers.

MATERIALS AND METHODS
The 690 amino acid protein sequence encoded by the gene TaHSC70 with Accession ACT65562 was retrieved from the NCBI database.

Sequence analysis
Physicochemical properties of the protein were computed with the ProtParam tool (http://web.expasy.org/protparam/). The parameters computed by ProtParam included the molecular weight, theoretical pI, instability index, aliphatic index, and grand average of hydropathicity (GRAVY). Subcellular localization, which aids in understanding protein function, was predicted with CELLO.
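To make two of the indices listed above concrete, the following minimal Python sketch computes GRAVY, the aliphatic index and the charged-residue counts directly from a sequence. It is not ProtParam's own code. It assumes the Kyte-Doolittle hydropathy scale and the usual aliphatic index formula with coefficients 2.9 and 3.9; the function names and the short example peptide are hypothetical and are not taken from the TaHSC70 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids (assumed scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def pct(aa):
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def charged_counts(seq):
    """Return (Asp + Glu, Arg + Lys) residue counts."""
    seq = seq.upper()
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


if __name__ == "__main__":
    peptide = "MSKVIGIDLGTT"  # hypothetical example peptide
    print("GRAVY:", round(gravy(peptide), 3))
    print("Aliphatic index:", round(aliphatic_index(peptide), 2))
    print("(Asp+Glu, Arg+Lys):", charged_counts(peptide))
```

A negative GRAVY value indicates a predominantly hydrophilic sequence, while a higher aliphatic index reflects a larger relative volume occupied by aliphatic side chains.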

Structure analysis
A BLAST (Altschul et al., 1990) search was performed with the query sequence against the Protein Data Bank (Berman et al., 2000). Query and template protein sequences were aligned using the BioEdit program. Modeller (Fiser and Sali, 2003) was used to build a protein model using an automated approach to comparative protein structure modeling by satisfaction of spatial restraints (Sali and Blundell, 1993; Eswar et al., 2008). The structure was energy minimized in Swiss-PdbViewer (Guex and Peitsch, 1997) using the GROMOS96 force field and rendered in PyMOL (Delano, 2002). PDBSum analysis of the secondary structure was followed by PROCHECK (Laskowski et al., 1998) verification of the stereochemical quality of the model. A Ramachandran plot (Ramachandran et al., 1963; Morris et al., 1992) was generated and the quality of the structure was computed in terms of the percentage of residues in favourable regions, the percentage of non-proline and non-glycine residues, etc. The ERRAT webserver (Colovos and Yeates, 1993) was also used to assess the quality of the structure.
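Template selection in comparative modeling relies on the sequence identity between the aligned query and template. As a minimal sketch of how such a figure can be derived from a pairwise alignment, the Python function below counts identical residues over the columns where neither sequence has a gap. This denominator is an assumption: BLAST and other tools may divide by the full alignment length instead, so values can differ slightly. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == gap or t == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if q == t:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment, not taken from the TaHSC70 / template alignment.
    print(percent_identity("ACDE-F", "ACDQGF"))
```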

RESULTS AND DISCUSSION
The availability of a plethora of quality tools and webservers has enabled computational biologists to perform reliable analysis of protein sequence and structure. The present study was aimed at sequence analysis and homology modeling of the wheat Hsp70 protein to shed light on its function.

Sequence analysis
The ProtParam tool revealed the protein to be ~73.5 kDa with a theoretical pI of 5.01. The total number of negatively charged residues (Asp + Glu) was 99, while the total number of positively charged residues (Arg + Lys) was 82. The instability index was computed to be 29.0, classifying the protein as stable. The aliphatic index was found to be 86.33, while the grand average of hydropathicity (GRAVY) index was calculated as -0.272, indicating a hydrophilic and probably soluble protein. CELLO results showed that the wheat Hsp70 protein is localized in the chloroplast. This suggests that the chloroplast is the major site of function for wheat Hsp70 and that the protein may be associated with the thermostability of chloroplast membranes. This is in line with a study by Bhadula and colleagues demonstrating an association of 45 kDa Hsps with the heat stability of chloroplast membranes in a drought and heat resistant maize line (Bhadula et al., 2001).
Two major domains were detected in the sequence: HSPA9-Ssq1-like_NBD (residues 51-427) and PLN03184 (residues 21-688). HSPA9-Ssq1-like_NBD, the nucleotide-binding domain of HSPA9, belongs to the heat shock protein 70 (Hsp70) family of chaperones that contribute to protein folding and assembly and to the degradation of incompetent proteins. Typically, Hsp70s have a nucleotide-binding domain (NBD), which hosts the nucleotide, and a substrate-binding domain (SBD), which increases the rate of ATP hydrolysis. An NBD site (17 residues), a nucleotide exchange factor (NEF) co-chaperone interaction site (19 residues) for regulation of HSP70 and an SBD interface (11 residues), all located on the conserved domain HSPA9-Ssq1-like_NBD, were detected on the query protein.
Protein phosphorylation is a type of post-translational modification which can turn a protein on and off, thus modifying its function and activity. Phosphorylation generally occurs on serine, threonine, tyrosine and histidine residues in eukaryotic proteins. Artificial neural networks have been used extensively in biological sequence analysis (Wu, 1997; Blom et al., 1999), including phosphorylation analysis. Regions of the wheat Hsp70 sequence showed extensive predicted phosphorylation on serine and threonine residues (Table 1), while no phosphorylation of tyrosine residues was predicted. This result is in accordance with the finding of May and Soll (2000) that chloroplast-destined precursor proteins are phosphorylated on serine or threonine residues. This prediction can be further validated in the lab, and the protein can also be tested for glycosylation, which would deepen our insight into the post-translational modifications associated with wheat Hsp70.

Structure analysis
Homology modeling has gained popularity due to the increasing accuracy of predictions made with computational tools. For homology modelling, the template selected was the X-ray structure of the E. coli HSP70 protein (PDB ID: 2KHO) (Bertelsen et al., 2009), which has 55% identity with the query sequence and an E value of zero. The sequences were aligned to observe residue conservation (Figure 1). MODELLER was then used to generate the 3D structure. The predicted structure was a monomer with a molpdf score of 3662.76880, a DOPE score of -60344.10156 and a GA341 score of 1.00000. The protein consisted of 7 sheets, 1 beta-alpha-beta unit, 12 hairpins, 13 beta bulges, 29 strands, 21 helices, 16 helix-helix interactions, 44 beta turns and 1 gamma turn (Figure 2). The total number of bonds was 5216, while the number of atoms was 5157. The predicted structure was validated by submitting it to the ERRAT protein verification server. The overall quality factor obtained was 74.671. Comparison of the DOPE score profiles of the template and the model, obtained from the Modeller output, indicated no defects in the loop regions, so loop refinement was not required for the model (Figure 3). The model was further validated using Ramachandran plot calculations computed with the PROCHECK program. The Φ and Ψ distributions of the non-glycine, non-proline residues in the Ramachandran plot are summarized in Figure 4. Altogether, 99.2% of the residues were in favoured and allowed regions.
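The Φ and Ψ values plotted in a Ramachandran diagram are backbone dihedral angles, each defined by four consecutive atoms. The Python sketch below shows one common way to compute such a dihedral from Cartesian coordinates. It is an illustration only and not PROCHECK's implementation. The helper names and the test coordinates are hypothetical. For residue i, Φ would use the atoms C(i-1), N(i), CA(i), C(i) and Ψ the atoms N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    """Vector difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates chosen so the four points form a right-angle twist.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing this angle pair for every residue and checking which region of the Φ/Ψ plane each pair falls into is the basis of the favoured and allowed percentages reported by validation tools.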
The overall G-factor was computed as -0.1, which compares favourably with the typical value of -0.4. This is an initial attempt at modelling the structure of wheat Hsp70 and understanding its function. This work has practical significance, as it provides a foundation for studying not only the structure but also the post-translational modification of this protein. Post-translational modification analysis can be further expanded to obtain new insights into the underpinnings of conformational changes in not only the cellular environment but also the chaperone itself. Structure can be utilized for interaction study with

Conclusion
The wheat heat shock protein is one of the most important proteins providing natural resistance against the stress caused by the stripe rust fungus. In the present work, sequence analysis was conducted to shed light on the post-translational modification of this protein and its Hsp70 domains, and the 3D structure of the protein was modeled. This computational study can serve as a baseline source of information and can be further validated in the lab.
The validated protein model proposed in this study may be used further to dock with possible co-factors or relevant protein interactors to understand the potential mechanism of anti-stress and defense properties of this protein.

Figure 1. Aligned sequences of query and target protein visualized in BioEdit.

Table 1. Phosphorylation profile of analysed Hsp70 protein using neural network approach. Specific residue positions in the query protein are shown to be phosphorylated based on a significant score. *S* refers to phosphorylation on serine residue and *T* refers to phosphorylation on threonine residue.