In silico characterization of beta-galactosidase using computational tools

β-Galactosidase (EC 3.2.1.23) is an important enzyme, mainly used in the preparation of lactose-hydrolyzed milk suitable for people with lactose intolerance. It is essential to understand the structural and functional aspects of β-galactosidases produced by different sources. The present work uses bioinformatics tools to describe the physicochemical, functional and structural properties of β-galactosidase enzymes from Bacillus sp. selected from the NCBI database. The low grand average of hydropathy (GRAVY) value of AFY63015.1 indicates the possibility of better interaction with water. The instability index computed for YP_004205251.1, ZP_10511829.1, BAL72724.1, AFY63015.1 and NP_242888.1 indicates that these sequences are stable. For disulfide bridges, CYS_REC recognized 38 cysteine residues in the β-galactosidase sequences and predicted the most probable S-S bond patterns of pairs in YP_004981461 and AFY63015.1. The self-optimized prediction method (SOPM) was used to predict the secondary structure, and the results indicated that alpha helix is more dominant in sequences AFY63015.1 and YP_004981461.1. Overall, this work presents an in silico analysis of sequence, structural and functional information for β-galactosidases of Bacillus species.


INTRODUCTION
Lactase, also known as β-galactosidase (EC 3.2.1.23), is an enzyme that hydrolyzes lactose (the abundant disaccharide found in milk) into glucose and galactose, and it is of potential importance to the dairy industry (Voget et al., 1994; Domingues et al., 2005; Gouripur et al., 2013). β-Galactosidase has tremendous potential in research and application in fields such as food, pharmaceuticals, bioremediation, biosensors, and the diagnosis and treatment of disorders. It is used in the preparation of lactose-hydrolyzed milk suitable for lactose-intolerant people (Kaur et al., 2006; Patil et al., 2011).
Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
Sources of β-galactosidase include animals, bacteria, yeasts (intracellular enzyme), fungi and moulds (extracellular enzyme). Among these sources, bacteria are preferred because of their simplicity in fermentation, optimum activity and good stability (Sani et al., 1999). Some strains have been shown to have probiotic activity that enhances the digestion of lactose (Vinderola and Reinheimer, 2003).
Lactose, a disaccharide specifically known as 4-O-β-D-galactopyranosyl-D-glucose, is found exclusively in milk. The nutritional value of lactose is limited because a large portion of the world's inhabitants, approximately 50%, lacks this enzyme and cannot utilize lactose, and therefore develops lactose maldigestion or intolerance (Vasiljevic and Jelen, 2002). This, however, creates a potential market for the application of β-galactosidase. Food enzymes accounted for 37% of total enzyme sales in 2004, corresponding to 720 million dollars. This value increased to 863 million dollars by 2009, raising the demand for the discovery of new species that produce enzymes such as β-galactosidase with novel characteristics, which will be of great value to the enzyme industry for different applications (Cortes et al., 2005). β-Galactosidases differ in their physicochemical properties, structures, specific activities, thermostability and yields, thus providing a wide choice of potential uses. Therefore, the present investigation deals with the in silico analysis and characterization of β-galactosidase from Bacillus species.

Sequence retrieval
The β-galactosidase sequences were obtained through NCBI and retrieved from Swiss-Prot, a public domain protein database (Boeckmann et al., 2003). A total of seven sequences were retrieved from Swiss-Prot by random selection. The protein sequences of β-galactosidase from Bacillus sp. were retrieved in FASTA format and used for further analysis. The phylogenetic tree was constructed from the aligned sequences by the neighbor-joining (NJ) method using Kimura 2-parameter distances in the MEGA beta 5.1 software (Tamura et al., 2011). The phylogeny was tested by bootstrap analysis with 1000 bootstrap replications, as shown in Figure 1.
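The Kimura 2-parameter distance named above can be sketched in a few lines. The snippet below is a minimal illustration and not the MEGA implementation. It assumes two already aligned nucleotide sequences, because the model is defined in terms of transitions and transversions; the function name `k2p_distance` and the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q), where P and Q are the
    proportions of transitional and transversional differences.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        compared += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term_a = 1 - 2 * p - q
    term_b = 1 - 2 * q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_a) - 0.25 * math.log(term_b)


if __name__ == "__main__":
    # Hypothetical toy alignment with a single transition in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining routine would cluster; bootstrap support is then obtained by resampling alignment columns and repeating the procedure.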

Secondary structure prediction
The self-optimized prediction method (SOPM) was employed to calculate the secondary structural features of the selected protein sequences (Table 4). Transmembrane regions were identified with the SOSUI server (Classification and Secondary Structure Prediction of Membrane Proteins), which reports the transmembrane regions found for the β-galactosidase proteins (Hirokawa et al., 1998).

Functional characterization
The predicted transmembrane helices were visualized and analyzed using the self-optimized prediction method (SOPM) (Geourjon and Deleage, 1995). SOPM is designed to improve the success rate in predicting the secondary structure of proteins. The SOPM parameters were set to a window width of 17, a similarity threshold of 7 and 4 states. Computational methods were also applied to determine disulphide bonds. Because disulphide bonds are important in determining functional linkages, S-S bonds were analyzed from the primary protein sequence data with CYS_REC. Motifs in the sequences were scanned using Motif Search. The SOSUI server was used to predict the transmembrane tendency of the proteins. The Kyte and Doolittle mean hydrophobicity score was calculated, and the hydropathy plot was obtained using the Kyte and Doolittle method with a window size of 9 (Hirokawa et al., 1998).
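The Kyte and Doolittle hydropathy plot with a window size of 9 can be illustrated with a short sliding-window sketch. This is a minimal example under stated assumptions and not the code of the server used in the study: the function name `hydropathy_profile` and the test peptide are hypothetical, and each window is scored by its plain average.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every window along the sequence.

    The result has len(sequence) - window + 1 values; values above zero
    indicate hydrophobic stretches. Non-standard residues raise KeyError.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD_SCALE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call.
    profile = hydropathy_profile("MKTLLVAGIILLAVSDEKRQ", window=9)
    print(min(profile), max(profile))
```

Plotting the returned list against the window start position gives the kind of profile in which peaks above the zero baseline mark candidate hydrophobic regions.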

RESULTS AND DISCUSSION
In silico analysis of β-galactosidase sequences from Bacillus sp.
BLAST searches of the NCBI database using β-galactosidase sequences as queries showed that 7 other β-galactosidases share 75-100% identity. For the phylogenetic tree of the Bacillus β-galactosidase sequences, distances were computed according to the Kimura 2-parameter model and clustering was performed with the neighbor-joining method using the software package MEGA (Molecular Evolutionary Genetics Analysis) ver. 5.1. Bootstrap percentages (50%) based on 1000 replications are given at branch points (Bar 0.02) (Figure 1) (Tamura et al., 2011). The sequence differences among β-galactosidases suggest differences in their enzymatic properties and biological functions.
The 7 β-galactosidase sequences of Bacillus sp. retrieved from Swiss-Prot are listed in Table 1 (Bairoch and Apweiler, 2000). The primary structure was analysed, and the parameters computed using the Expasy ProtParam tool are tabulated (Tables 2 and 3). The results suggest that the β-galactosidase sequences are mostly hydrophobic, and their hydrophobic nature is due to a high content of non-polar residues (Sivakumar et al., 2007). The average molecular weight of β-galactosidase is 79709.0 Da (Table 3).
The isoelectric point (pI) is the pH at which the surface of the protein is covered with charge but the net charge of the protein is zero. At the pI, proteins are stable and compact. All β-galactosidase sequences had a computed pI < 7, indicating that β-galactosidase is acidic in nature. Amino acid composition determines the fundamental properties of an enzyme; similarly, the pI values of xylanase sequences were reported to be acidic in nature (Neelima et al., 2009). The computed pI will be useful for developing buffer systems for purification of the recombinant proteins by isoelectric focusing (Gasteiger et al., 2005).
Although Expasy's ProtParam computes the extinction coefficient (EC) for a range of wavelengths (276, 278, 279, 280 and 282 nm), 280 nm is favoured because β-galactosidase absorbs strongly at this wavelength, so interference from other substances can be minimised. The EC of β-galactosidase at 280 nm ranges from 150940 to 162525 M⁻¹ cm⁻¹ with respect to the concentration of Cys, Trp and Tyr (Table 3). The high EC value of YP_004981461.1 indicates a high concentration of Cys, Trp and Tyr. The computed EC values will help in the quantitative study of protein-protein and protein-ligand interactions (Gill and Von Hippel, 1989).
A protein whose instability index is smaller than 40 is predicted to be stable, and a value above 40 predicts that the protein is unstable. The instability index provided by Expasy's ProtParam thus classifies proteins as stable or unstable (Sivakumar, 2010). The instability index values of YP_004205251.1, ZP_10511829.1, BAL72724.1, AFY63015.1 and NP_242888.1 indicate that they are stable. On the other hand, ZP_10508490.1 and YP_004981461.1 are slightly unstable, as their instability index is > 40 (Table 3). This result is similar to an earlier report in which the instability index, which gives clues about the in vitro stability of a protein, classified all the considered sequences as stable, with values ranging from 13.57 to 37.23 (a value > 40 indicates an unstable protein) (Guruprasad et al., 1990).
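The dependence of the extinction coefficient on Cys, Trp and Tyr content can be made concrete with a small sketch. It assumes the residue-count formula documented for ProtParam at 280 nm in water (1490 per Tyr, 5500 per Trp and 125 per cystine, in M⁻¹ cm⁻¹); the function name `extinction_coefficient_280` and the example sequence are hypothetical and are not taken from the study.

```python
# Molar extinction contributions at 280 nm in water (M^-1 cm^-1),
# as documented for the ProtParam tool.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient_280(sequence, cystines_formed=True):
    """Estimate the molar extinction coefficient of a protein at 280 nm.

    If cystines_formed is True, every pair of Cys residues is assumed to
    form one cystine (disulphide); otherwise all Cys are treated as reduced
    and contribute nothing.
    """
    seq = sequence.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2 if cystines_formed else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


if __name__ == "__main__":
    # Hypothetical short peptide with one Trp, one Tyr and two Cys.
    print(extinction_coefficient_280("MWYCCK"))
    print(extinction_coefficient_280("MWYCCK", cystines_formed=False))
```

Because Trp contributes far more per residue than Tyr or cystine, differences in Trp count dominate the spread of values between sequences.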
The aliphatic index (AI), defined as the relative volume of a protein occupied by aliphatic side chains, is regarded as a positive factor for increased thermal stability. The aliphatic index of the β-galactosidase sequences (Table 3) ranged from 74.14 to 80.45. The very high aliphatic index of all β-galactosidase sequences indicates that these β-galactosidases may be stable over a wide temperature range. This result is similar to the aliphatic index of antifreeze proteins, which ranged from 57.89 to 125.23 among sequences of different varieties (Sivakumar et al., 2007). The GRAVY value of a peptide or protein is calculated as the sum of the hydropathy values of all the amino acids divided by the number of residues in the sequence. GRAVY indices of β-galactosidase ranged from -0.517 to -0.452. The low value of AFY63015.1 indicates the possibility of better interaction with water. By comparison, the GRAVY values of tyrosinases ranged from -0.660 to -0.191, and the very low GRAVY index of these tyrosinases implies better interaction with water (Sivakumar et al., 2007). The secondary structure indicates whether a given amino acid lies in a helix, strand or coil. The secondary structure features predicted using the self-optimized prediction method are presented in Table 4. The results reveal that alpha helix dominated among the secondary structure elements, followed by extended strand, beta turns and random coils, with extended strands outnumbering random coils. The prediction results show that alpha helix is more dominant in sequences AFY63015.1 and YP_004981461.1. Although cyclooxygenases have mixed secondary structure, that is α-helices, β-strands and coils, the α-helices are the dominant features in the protein structure.
The very high coil content of cyclooxygenases is due to their rich content of flexible glycine and hydrophobic proline residues. Proline has the special property of creating kinks in polypeptide chains and disrupting ordered secondary structure (Combet et al., 2000). A set of conserved amino acid residues located in a region that provides clues to function is termed a motif. Motif Search found that all protein IDs contained an EF-hand calcium-binding domain motif. On average, the predicted motif started at position 20 and ended at position 32. Motifs could be predicted for all protein sequences. Besides the physicochemical characterization, functional characterization of the β-galactosidase proteins was also performed, including transmembrane (TM) region identification and prediction of disulphide bonding pairs. The SOSUI server identifies transmembrane helices with their corresponding lengths and distinguishes between membrane and soluble proteins; it predicts transmembrane helices from amino acid sequences quickly and with high precision. The β-galactosidase proteins analysed with the SOSUI server were predicted to be soluble proteins (Hirokawa et al., 1998).
CYS_REC identifies the positions of cysteines and the total number of cysteines present, and computes the most probable S-S bond pattern of the pairs in a protein sequence. Possible disulphide bond pairings and patterns, with their probabilities, were predicted by CYS_REC from the primary sequence, and S-S bonds were identified. CYS_REC recognized 38 cysteine residues in the β-galactosidase sequences and predicted the most probable S-S bond patterns of cysteine pairs in YP_004981461 and AFY63015.1. The use of CYS_REC to predict the S-S bonding states of cysteines and their locations in proteins has also been reported previously (Sivakumar et al., 2007).
The presence of cysteine residues and disulphide bonds in the corresponding β-galactosidase proteins is shown in Table 5. The predicted transmembrane region was found to be rich in hydrophobic amino acids: many points lie above the zero baseline, and a clear peak in the plot indicates the plausible transmembrane region (Figures 2 to 8). The hydrophobicity score of BAL72724 β-galactosidase ranged from a minimum of -2.778 to a maximum of 2.778 (Table 6). Disulphide bridges play a significant role in determining the thermal stability of enzymes (Hirokawa et al., 1998).
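The GRAVY and aliphatic index definitions used in the physicochemical analysis above can be sketched as follows. This is a minimal illustration, not the ProtParam source: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index coefficients (2.9 for Val, 3.9 for Ile and Leu, applied to mole percentages). The function names and example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: sum of residue hydropathies / length.

    Negative values suggest a hydrophilic protein. Non-standard residues
    raise KeyError.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KD_SCALE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X(.) is the mole percentage (100 * count / length) of each residue.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the calls.
    peptide = "MKTLLVAGIILLAVSDEKRQ"
    print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```

Both quantities depend only on amino acid composition, which is why they can be computed for every retrieved sequence directly from its FASTA record.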

Conclusions
To obtain desirable results in an industrial application, it is necessary to influence the characteristic properties of an enzyme, which is a tedious task. The protein engineering techniques used to reach this goal require sound knowledge of the protein at both the sequence and the structure level. In the present study, 7 β-galactosidase sequences were selected to determine their physicochemical properties and to examine various levels of protein structure using in silico techniques. Primary structure analysis revealed that most of the β-galactosidases used in the current study were hydrophobic in nature, and three of them contain disulphide linkages. The secondary structure analysis confirmed that in most of the sequences, random coils dominated, followed by alpha helix, extended strand and beta turns. The presence of Cys residues in β-galactosidase indicates the presence of disulfide bridges, which was further confirmed using CYS_REC. This study provides insight into the physicochemical properties and functions of β-galactosidase, thus aiding in formulating its uses; studying β-galactosidase as an individual molecule has revealed various previously unknown properties. Thus, research and development of β-galactosidase finds vast applications in several industries.

Figure 1. Phylogenetic tree of β-galactosidases for Bacillus sequences. Distances were computed according to the Kimura 2-parameter model, and clustering with the neighbor-joining method was performed using the software package MEGA (Molecular Evolutionary Genetics Analysis) ver. 5.1. Bootstrap percentages (50%) based on 1000 replications are given at branch points. Bar 0.02.

Table 1. β-Galactosidase sequences retrieved from the Swiss-Prot database for Bacillus species.

Table 3. Parameters computed using Expasy's ProtParam tool for β-galactosidase of Bacillus species.

Table 4. Secondary structure elements of β-galactosidase sequences for Bacillus species.

Table 5. Patterns of cysteine-cysteine binding of β-galactosidase for Bacillus species.

Table 6. Hydrophobicity score and plot of β-galactosidase for Bacillus species.