Journal of
Bioinformatics and Sequence Analysis

  • Abbreviation: J. Bioinform. Seq. Anal.
  • Language: English
  • ISSN: 2141-2464
  • DOI: 10.5897/JBSA
  • Start Year: 2009
  • Published Articles: 46

Full Length Research Paper

In silico characterization of beta-galactosidase using computational tools

Gangadhar. C. Gouripur
  • Department of Studies and Research in Biotechnology and Microbiology, Karnatak University, Dharwad - 580 003, India.
Rohit. B. Kaliwal
  • Department of MCA, Visvesvaraya Technological University, P.G Centre Regional Office, Gulbarga - 585106, India.
Basappa. B. Kaliwal
  • Department of Studies in Biotechnology and Microbiology, Karnatak University, Dharwad, Karnataka, India.

  •  Received: 21 February 2015
  •  Accepted: 25 August 2015
  •  Published: 31 July 2016


β-galactosidase (EC. is an important enzyme, mainly used in the preparation of lactose-hydrolyzed milk suitable for people with lactose intolerance. It is essential to understand the structural and functional aspects of the various β-galactosidases produced from different sources. The present work uses bioinformatics tools to describe the physicochemical, functional and structural properties of β-galactosidase enzymes from Bacillus sp. selected from the gene bank of NCBI. The low grand average hydropathy (GRAVY) value of AFY63015.1 indicates the possibility of better interaction with water. The instability index was computed to characterise YP_004205251.1, ZP_10511829.1, BAL72724.1, AFY63015.1 and NP_242888.1, indicating that they are stable. For disulfide bridges, CYS_REC recognized the presence of 38 cysteine residues in the β-galactosidase sequences and predicted the most probable S-S bond patterns of pairs in YP_004981461 and 1AFY63015.1. The self-optimized prediction method (SOPM) was used to predict the secondary structure. The SOPM results indicated that the alpha helix is most dominant in sequences AFY63015.1 and YP_004981461.1. Overall, this work presents an in silico analysis of sequence, structural and functional information of β-galactosidase of Bacillus species.
Key words: Bacillus sp., β-galactosidase, in silico analysis, physico-chemical characterization, proteomics tools.


Lactase, also known as β-galactosidase (E.C is an enzyme that hydrolyzes lactose (an abundant disaccharide found in milk) into glucose and galactose and is of potential importance in the dairy industry (Voget et al., 1994; Domingues et al., 2005; Gouripur et al., 2013). β-Galactosidase has tremendous potential in research and application in various fields such as food, pharmaceuticals, bioremediation, biosensors, and the diagnosis and treatment of disorders. It is used in the preparation of lactose-hydrolyzed milk suitable for lactose-intolerant people (Kaur et al., 2006; Patil et al., 2011).

The commercial enzymes used for lactose hydrolysis are β-galactosidases obtained from various origins (Jurado et al., 2002). Sources of β-galactosidase include plants, animals, bacteria, yeasts (intracellular enzyme), fungi and moulds (extracellular enzyme). Among these sources, bacteria are preferred because of their simplicity in fermentation, with optimum activity and good stability (Sani et al., 1999). Some strains have been shown to have probiotic activity, enhancing the digestion of lactose (Vinderola and Reinheimer, 2003).

Lactose, a disaccharide specifically known as 4-O-β-D-galactopyranosyl-D-glucose, is found exclusively in milk. The nutritional value of lactose is limited because a large portion, approximately 50%, of the world's inhabitants lack this enzyme and cannot utilize lactose, therefore developing lactose maldigestion or intolerance (Vasiljevic and Jelen, 2002). This, however, creates a potential market for the application of β-galactosidase. The share of food enzymes was 37% of total enzyme sales, corresponding to 720 million dollars in the year 2004. This value increased to 863 million dollars by the year 2009, increasing the demand for the discovery of new species producing enzymes such as β-galactosidase with novel characteristics, which will be of great value to the enzyme industry for different applications (Cortes et al., 2005). β-Galactosidases differ in their physicochemical properties, structures, specific activities, thermostability and yields, thus providing a great deal of choice in their potential uses. Therefore, the present investigation deals with the in silico analysis and characterization of β-galactosidase from Bacillus species.


Sequence retrieval

The β-galactosidase sequences were obtained from NCBI and retrieved from Swiss-Prot, a public domain protein database (Boeckmann et al., 2003). A total of seven sequences were retrieved from Swiss-Prot by random selection. Protein sequences of β-galactosidase from Bacillus sp. were retrieved in FASTA format and used for further analysis. The phylogenetic tree was constructed from the aligned sequences by the neighbor-joining (NJ) method using Kimura 2-parameter distances in the MEGA beta 5.1 software (Tamura et al., 2011). The phylogeny was tested by bootstrap with 1000 replications, as shown in Figure 1.



Physico-chemical characterization

The physicochemical properties were calculated from the primary structure of β-galactosidase. Molecular weight, theoretical isoelectric point (pI), total number of positive and negative residues, extinction coefficient (Gill and Von Hippel, 1989), instability index (Guruprasad et al., 1990), aliphatic index (Ikai, 1980) and grand average hydropathy (GRAVY) (Kyte and Doolittle, 1982) were computed using the ExPASy ProtParam (Gasteiger et al., 2005) ( /tools/protparam.html) prediction server.
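Two of the parameters listed above can be illustrated with a minimal Python sketch. It is not the ProtParam implementation; it only assumes the published Kyte and Doolittle hydropathy scale and the aliphatic index formula AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names `gravy` and `aliphatic_index` are hypothetical, and the example input is a toy peptide rather than one of the sequences studied here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
# (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: sum of residue values / sequence length."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq, a=2.9, b=3.9):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy = "MKVLAAGIVE"  # hypothetical peptide, for illustration only
    print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
```

A negative GRAVY value points to a more hydrophilic sequence and a positive value to a more hydrophobic one, which is how the values in Table 3 are read.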


Secondary structure prediction

The self-optimized prediction method (SOPM) was employed to calculate the secondary structural features of the selected target protein sequences considered for this study (Table 4). The transmembrane regions of the proteins were identified with the SOSUI server (Classification and Secondary Structure Prediction of Membrane Proteins), which reports the transmembrane regions identified for the β-galactosidase proteins (Hirokawa et al., 1998).



Functional characterization

The predicted transmembrane helices were visualized and analyzed using the self-optimized prediction method (SOPM) (Geourjon and Deleage, 1995). SOPM is used to improve the success rate in the prediction of the secondary structure of proteins. SOPM was run with the parameters window width 17, similarity threshold 7 and number of states 4, and computational methods were applied for determining disulphide bonds. Disulphide bonds are important in determining the functional linkages, so S-S bonds were analyzed from the primary protein sequence data with the help of CYS_REC. Motifs in the considered sequences were scanned using Motif Search. The SOSUI server was used to predict the transmembrane tendency of the proteins. The Kyte and Doolittle mean hydrophobicity score was calculated, and the plot was obtained using the Kyte and Doolittle method with a window size of 9 (Hirokawa et al., 1998).
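The windowed hydropathy plot described above can be sketched as a simple sliding average. This is a minimal illustration assuming an unweighted mean over each window of 9 residues; the plotting server used in the study may apply different edge handling or weighting, and the function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    toy = "MKTLLVAGIVLAAGSDEKR"  # hypothetical peptide, not a study sequence
    profile = hydropathy_profile(toy, window=9)
    # Each value belongs to the window starting at that index; values above
    # zero mark stretches that are hydrophobic on average.
    print([round(v, 2) for v in profile])
```

Each entry can be plotted against the position of the window centre; stretches that stay above the zero baseline are the candidate hydrophobic regions.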



In silico analysis was carried out on β-galactosidase sequences from Bacillus sp. BLAST searches of the NCBI database using β-galactosidase sequences as queries showed that 7 other β-galactosidases share 75-100% identity with the Bacillus β-galactosidase sequence. For the phylogenetic tree of the β-galactosidases, distances were calculated according to the Kimura 2-parameter model and clustering was performed by neighbor-joining using the software package MEGA (Molecular Evolutionary Genetics Analysis) ver. 5.1. Bootstrap percentages (50%) based on 1000 replications are given at branch points (bar, 0.02) (Figure 1) (Tamura et al., 2011). The sequence differences among β-galactosidases suggest differences in their enzymatic properties and biological functions.

The 7 β-galactosidase sequences of Bacillus sp. retrieved from SWISS-PROT (Bairoch and Apweiler, 2000) are listed in Table 1. The primary structure was analysed, and the different parameters computed using the ExPASy ProtParam tool were tabulated (Tables 2 and 3). The results suggest that β-galactosidase sequences are mostly hydrophobic, and their hydrophobic nature is due to the presence of a high content of non-polar residues (Sivakumar et al., 2007). The average molecular weight of β-galactosidase is 79709.0 Dalton (Table 3). The isoelectric point (pI) is the pH at which the surface of the protein is covered with charge but the net charge of the protein is zero. At pI, proteins are stable and compact. All β-galactosidase sequences had a computed pI < 7, indicating that β-galactosidase is acidic in nature. Amino acid composition determines the fundamental properties of an enzyme; similarly, the pI values of all xylanase protein sequences were reported to be acidic in nature (Neelima et al., 2009). The computed isoelectric point (pI) will be useful for developing buffer systems for purification of the recombinant proteins by the isoelectric focusing method (Gasteiger et al., 2005).





Although ExPASy ProtParam computes the extinction coefficient (EC) for a range of wavelengths (276, 278, 279, 280 and 282 nm), 280 nm is favoured because β-galactosidase absorbs strongly at this wavelength, so interference from other substances can be minimised. The EC of β-galactosidase at 280 nm ranges from 150940 to 162525 M⁻¹ cm⁻¹ with respect to the concentration of Cys, Trp and Tyr (Table 3). The high EC value of YP_004981461.1 indicates the presence of a high concentration of Cys, Trp and Tyr. The computed EC values will help in the quantitative study of protein-protein and protein-ligand interactions (Gill and Von Hippel, 1989). A protein whose instability index is smaller than 40 is predicted to be stable, and a value above 40 predicts that the protein is unstable. The instability index available at ExPASy ProtParam classifies proteins as stable or unstable (Sivakumar, 2010). The instability indices of YP_004205251.1, ZP_10511829.1, BAL72724.1, AFY63015.1 and NP_242888.1 were below 40, which indicates that they are stable.

On the other hand, ZP_10508490.1 and YP_004981461.1 are slightly unstable as their instability index is >40 (Table 3). The instability index gives clues about the stability of a protein in vitro. This result is comparable to a report in which all the considered sequences were classified as stable, with values ranging from 13.57 to 37.23, as a value > 40 indicates an unstable protein (Guruprasad et al., 1990).
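The extinction coefficient and the stability cut-off discussed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's code: the molar values used (Tyr 1490, Trp 5500, cystine 125 M⁻¹ cm⁻¹ at 280 nm in water) are those given in the ProtParam documentation, cysteines are assumed to pair up into cystines when `cystines=True`, and the instability index itself is taken as an input because computing it requires the full dipeptide weight table of Guruprasad et al. (1990). The function names are hypothetical.

```python
def extinction_coefficient_280(seq, cystines=True):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cystines=True assumes every pair of Cys residues forms a cystine;
    cystines=False assumes all Cys residues are reduced.
    """
    seq = seq.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2 if cystines else 0
    return 1490 * n_tyr + 5500 * n_trp + 125 * n_cystine


def classify_stability(instability_index, threshold=40.0):
    """Label a protein from its instability index using the cut-off of 40."""
    return "stable" if instability_index < threshold else "unstable"


if __name__ == "__main__":
    toy = "MWYCKCAY"  # hypothetical peptide, for illustration only
    print(extinction_coefficient_280(toy))
    print(extinction_coefficient_280(toy, cystines=False))
    print(classify_stability(37.23))
```

The two extinction values bracket the oxidised and reduced states of the cysteines, which is why servers of this kind typically report both.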

The aliphatic index (AI), which is defined as the relative volume of a protein occupied by aliphatic side chains, is regarded as a positive factor for the increase of thermal stability of globular proteins. The aliphatic index for the β-galactosidase sequences (Table 3) ranged from 74.14 to 80.45. The very high aliphatic index of all β-galactosidase sequences indicates that these β-galactosidases may be stable over a wide temperature range. This result is similar to the aliphatic index of antifreeze proteins, which ranged from 57.89 to 125.23 among sequences of different varieties (Sivakumar et al., 2007). The GRAVY value for a peptide or protein was calculated as the sum of the hydropathy values of all the amino acids, divided by the number of residues in the sequence. GRAVY indices of β-galactosidase ranged from -0.517 to -0.452. The low GRAVY value of AFY63015.1 indicates the possibility of better interaction with water. By comparison, the GRAVY value of tyrosinases ranged from -0.660 to -0.191. The very low GRAVY index of tyrosinases infers that these tyrosinases could have a better interaction with water (Sivakumar et al., 2007).

The secondary structure indicates whether a given amino acid lies in a helix, strand or coil. The secondary structure features predicted using the self-optimized prediction method are presented in Table 4. The results reveal that the alpha helix dominated among the secondary structure elements, followed by extended strands, beta turns and random coils, while extended strands outnumbered random coils. The prediction shows that the alpha helix is most dominant in sequences AFY63015.1 and YP_004981461.1.



This result is similar to that for the cyclooxygenases, which have a rich alanine content. Although cyclooxygenases have a mixed secondary structure, that is α-helices, β-strands and coils, the α-helices are the dominant features in the protein structure. The very high coil structural content of cyclooxygenases is due to the rich content of the more flexible glycine and the hydrophobic proline amino acids. Proline has the special property of creating kinks in polypeptide chains and disrupting ordered secondary structure (Combet et al., 2000).

A set of conserved amino acid residues located in a region that provides clues to function is termed a motif. Motifs predicted using Motif Search showed that all protein IDs contained the EF-hand calcium-binding domain motif. On average, the predicted motif started at position 20 and ended at position 32. Motifs could be predicted for all protein sequences. Besides the physicochemical characterization, functional characterization of the β-galactosidase proteins was also performed, including transmembrane (TM) region identification and prediction of disulphide bonding pairs. The SOSUI server identifies transmembrane helices with their corresponding lengths and differentiates membrane proteins from soluble proteins; it predicts the transmembrane helices from amino acid sequences quickly and with high precision. All β-galactosidase proteins were predicted by the SOSUI server to be soluble proteins (Hirokawa et al., 1998).

CYS_REC identifies the positions of cysteines and the total number of cysteines present, and computes the most probable S-S bond pattern of the pairs in a protein sequence. Possible disulphide bond pairings and patterns, with their probabilities, were predicted by CYS_REC from the primary sequence, and S-S bonds were identified. CYS_REC recognized the presence of 38 cysteine residues in the β-galactosidase sequences and predicted the most probable S-S bond patterns of cysteine pairs in YP_004981461 and 1AFY63015.1. Similarly, the use of the CYS_REC tool to predict the S-S bonding states of cysteines and their locations in proteins has been reported (Sivakumar et al., 2007).
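The first part of this analysis, locating the cysteines and listing the pairs that could in principle form an S-S bond, can be sketched as follows. This is only an enumeration step written for illustration; it does not reproduce the scoring that CYS_REC uses to rank bond patterns, which is not described here. The function names and the example peptide are hypothetical.

```python
from itertools import combinations


def cysteine_positions(seq):
    """Return the 1-based positions of all cysteine residues in a sequence."""
    return [i for i, aa in enumerate(seq.upper(), start=1) if aa == "C"]


def candidate_pairs(seq):
    """List every pair of cysteine positions that could be considered
    for a disulphide bond (no probabilities are assigned)."""
    return list(combinations(cysteine_positions(seq), 2))


if __name__ == "__main__":
    toy = "MACKDCLLCA"  # hypothetical peptide, not a study sequence
    print(cysteine_positions(toy))
    print(candidate_pairs(toy))
```

A predictor would then score these candidate pairs; with an odd number of cysteines at least one residue must remain unpaired.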

The presence of cysteine residue disulphide bonds in the corresponding β-galactosidase proteins is shown in Table 5. The predicted transmembrane region was found to be rich in hydrophobic amino acids, with many points lying above the zero baseline, and a clear peak in the plot indicates the plausible transmembrane region (Figures 2 to 8). The hydrophobicity score of BAL72724 β-galactosidase ranged from a minimum of -2.778 to a maximum of 2.778 (Table 6). Disulphide bridges play a significant role in determining the thermal stability of enzymes (Hirokawa et al., 1998).











To obtain desirable results in an industrial application, the characteristic properties of an enzyme need to be modified, which is a tedious task. Protein engineering techniques used to achieve this goal require sound knowledge of the protein at both the sequence and structure level. In the present study, 7 β-galactosidase sequences were selected to determine the physicochemical properties and various protein structure levels using in silico techniques. Primary structure analysis revealed that most of the β-galactosidases employed in the current study were hydrophobic in nature, and three of them contain disulphide linkages. The secondary structure analysis confirmed that in most of the sequences, random coils dominated, followed by alpha helices, extended strands and beta turns. The presence of Cys residues in β-galactosidase indicates the presence of disulfide bridges, which was further confirmed using CYS_REC. This study provides insight into the physicochemical properties and functions of β-galactosidase, thus aiding in formulating its uses; the study of β-galactosidase as an individual molecule has revealed various previously unknown properties. Thus, research and development of β-galactosidase finds vast applications in several industries.


The authors have not declared any conflict of interests.


The authors are profusely thankful to the Department of Biotechnology (DBT), Ministry of Science and Technology, Government of India, New Delhi, for funding the Bioinformatics Infrastructure Facility Project (BT/BI/25/001/2006 VOL II dated 05-03-2012) and the Interdisciplinary Program for Life Science Project (BT/PR/4555/INF/22/126/2010 dated 30-09-2010), and to the P. G. Departments of Biotechnology and Microbiology, Karnatak University, Dharwad, for providing the facilities for pursuing the research work at the Department.


Bairoch A, Apweiler R (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28:45-48.


Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31:365-370.


Cortes G, Trujillo-Roldan MA, Ramirez OT, Galindo E (2005). Production of β-galactosidase by Kluyveromyces marxianus under oscillating dissolved oxygen tension. Process Biochem. 40:773-778.


Combet C, Blanchet C, Geourjon C, Deléage G (2000). NPS@: Network Protein Sequence Analysis. Trends Biochem. Sci. 25:147-150.


Domingues L, Lima N, Teixeira JA (2005). Aspergillus niger β-galactosidase production by yeast in a continuous high density reactor. Process Biochem. 40:1151-1154.


Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD (2005). Protein Identification and Analysis Tools on the ExPASy


Geourjon C, Deleage G (1995). SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput. Appl. Biosci. 11:681-684.


Gill SC, Von Hippel PH (1989). Extinction coefficient. Anal. Biochem. 182:319-328.


Guruprasad K, Reddy B, Pandit MW (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Prot Eng. 4:155-164.


Gouripur G, Kaliwal B (2013). Isolation and characterization of β-galactosidase producing Bacillus subtilis from milk. World J. Pharm. Res. 3:597-618.


Hirokawa T, Chieng SB, Mitaku S (1998). SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 14:378-379.


Ikai AJ (1980). Thermostability and aliphatic index of globular proteins. J. Biochem. 88:1895-1898.


Jurado E, Camacho F, Luzon G, Vicaria M (2002). A new kinetic model proposed for enzymatic hydrolysis of lactose by a β-galactosidase from Kluyveromyces fragilis. Enzyme Microbial. Technol. 31:300-309.


Kaur K, Mehmood S, Mehmood A (2006). Hypolactasia as a molecular basis of lactose intolerance. Ind J. Biochem. Biophys. 43:267-274.


Kyte J, Doolittle RF (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105-132.


Neelima A, Amit KB, Srilaxmi M, Upadhyayula M (2009). Comparative characterization of commercially important xylanase enzymes. Bioinformation 3(10):446-453.


Patil MM, Mallesha KV, Bawa AS (2011). Characterization of partially purified β-galactosidase from Bacillus Sp MTCC-864. Recent Res. Sci. Technol. 3:84-87.


Sani RK, Chakraborti S, Sobti RC, Patnaik PR, Banerjee UC (1999). Characterization and some reaction engineering aspects of thermostable extracellular β-galactosidase from a new Bacillus species. Folia Microbiol. 44:367-371.


Sivakumar K, Balaji S, Gangaradhakrishna N (2007). In silico characterization of antifreeze proteins using computational tools and servers. J. Chem. Sci. 119(5):571-579.


Sivakumar K (2010). Biocomputation and Biomedical Informatics: Case Studies and Applications, Edition: 1, Publisher: Medical Information Science Reference (an imprint of IGI Global), Editors: Athina Lazakidou University of Peloponnese, Greece. pp. 143-157.


Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011). MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28:2731-2739.


Vasiljevic T, Jelen P (2002). Lactose hydrolysis in milk as affected by neutralizers used for the preparation of crude β-galactosidase extracts from Lactobacillus bulgaricus 11842. Innov. Food Sci. Emerging Technol. 3:175-184.


Vinderola CG, Reinheimer JA (2003). Lactic acid starter and probiotic bacteria: a comparative "in vitro" study of probiotic characteristics and biological barrier resistance. Food Res. Intern. 36:895-904.


Voget CE, Flores MV, Faloci MM, Ertola RJ (1994). Effects of the ionic environment on the stability of Kluyveromyces lactis β-galactosidase. Lebensm. Wiss Technol. 27:324-330.