Does a protein coevolve with its multiple interacting partners? A case study

Protein-protein interactions play a fundamental role in diverse cellular activities. Although the coevolution of interacting protein pairs has been established by several groups, whether a protein with multiple interacting partners coevolves with all of them has not been studied so far. Here, the coevolution of proliferating cell nuclear antigen (PCNA) with its multiple interacting protein partners was studied. The 'mirror tree' method was used to detect signatures of coevolution in the interacting pairs. The results show that PCNA, which interacts with a large number of proteins, does not coevolve with each of its partners. Rather, the degree of coevolution varies over a wide range, and the variation is statistically significant. The nature of coevolution of these interacting pairs was further investigated separately in two lineages (archaea and eukarya). The results show that the coordinated evolution of some of the interacting pairs differs between the two lineages. Possible reasons for the variation (percentage of disorder region of the partner proteins, non-synonymous to synonymous substitution ratio, cascade interactions, etc.) are also discussed.


INTRODUCTION
Proteins rarely act alone. A large number of proteins interact with other proteins to carry out their respective biological functions (Pereira et al., 2006; Grigoriev, 2003), and several studies have focused on such protein-protein interactions, with emphasis on their structural and functional properties, including residue preferences at the interface, the combinatorial effect of interactions and the emergent properties of protein-protein interaction (PPI) networks (Bork et al., 2004; Argos, 1988; Janin et al., 1988; Jones and Thornton, 1996; Hoskins et al., 2006; Pal et al., 2006). In addition to the structural and functional perspective, numerous studies have attempted to identify trends in the evolution of such interacting protein partners (Altschuh et al., 1987; Moyle et al., 1994; Pazos et al., 1997; Goh et al., 2000). These studies have shown that, in systems containing two different interacting proteins, a change in one partner often directly influences the evolution of the other, frequently through a compensatory change that maintains the structural and functional integrity of the complex. Moreover, even where interactions among different domains of the same protein are known to be important for its biological function, these interacting domains have generally been observed to coevolve; that is, a heritable change in one interacting domain exerts a selective pressure for a corresponding change in the other interacting domain(s).
However, most of our knowledge of the nature of protein coevolution (the term 'coevolution' is used here to refer to the similarity of evolutionary histories, which can be quantified through the similarity of the corresponding phylogenetic trees of the proteins) is based on studies of systems containing paired interacting protein partners. As a large number of cellular proteins are known to interact with multiple partners (at least some of which may in turn interact with one or more further partners), it is important to investigate whether the trends observed for the coevolution of paired interacting partners remain valid for proteins involved in the complex protein interaction networks that are common in nature. To address the largely unexplored problem of protein evolution in the context of such complex interaction network architectures, we have used an evolutionary analysis of the proliferating cell nuclear antigen (PCNA) and its interacting partners to assess the extent of the structural and functional constraints that may be imposed on the evolution of a protein by its interactions with different partners.
*Corresponding author. E-mail: skbmbg@caluniv.ac.in. Tel: 91-33-23508386. Fax: 91-33-23519755.
PCNA is a member of the so-called DNA sliding clamp family and has a remarkable ability to interact with multiple proteins (Giovanni and Ulrich, 2003). Its partners interact with PCNA through different but specific sites. These sites are mainly the inter-domain connecting loop of the ring-like PCNA structure, the N-terminal region comprising the inner α-helices, and the C-terminal tail of PCNA (Jonsson and Hubscher, 1997; Warbrick, 2000). Although PCNA is known to interact with numerous partners, only ten were selected for the present study: Replication factor C3 (RFC3), DNA polymerase delta (Pold), DNA ligase 1 (Ligase 1), DNA topoisomerase 1 (Topo 1), DNA topoisomerase 2 (Topo 2), Flap endonuclease 1 (Fen 1), XPG endonuclease (XPG), WRN helicase (WRN), MLH 1 and Uracil-DNA glycosylase (Uracil). For these ten, comprehensive literature-based evidence of physical interaction with PCNA, as well as the corresponding protein sequences from various taxa, was available.
We observed that PCNA does not show similar correlated evolution with all ten of its interacting partners. Rather, the values of the correlation coefficients indicate varying degrees of correlated evolution of PCNA with its partners. This leads to the notion that a protein with multiple interacting protein partners may not coevolve with all of them. We further studied the correlated evolution separately in two lineages, eukarya and archaea. Significant differences between the two lineages were observed for some of the interacting partners. We also searched for possible underlying reasons for the different values of the correlation coefficients. Whereas the number of interacting partners sheds no light on the variation, the degree of disorder of the interacting partners provides some clues. Here, we explore whether any specific signature correlates the nature of coadaptation with the percentage of disorder region of the interacting partners. We further extended our search by measuring the nonsynonymous (dn) to synonymous (ds) ratio, to understand whether these values can provide any rationale for the observed variation in the degree of coevolutionary pressure.

Data collection
Protein sequences of PCNA and its ten interacting partners from nine eukaryotic and nine archaeal species were collected from the NCBI database (http://www.ncbi.nlm.nih.gov) and are listed in Supplementary Table S1. All the proteins were retrieved by protein name query in the NCBI database. Whenever a desired protein was not found by simple name query, a protein BLAST (BLASTP) search (Altschul et al., 1990) at NCBI followed by manual curation was performed to incorporate such sequences in our study (Supplementary Table S1, 1a and 1b). All these sequences form dataset 1. Dataset 2, a subset of dataset 1, includes only those proteins from dataset 1 that are properly annotated, that is, neither hypothetical nor putative (putative and other such entries are marked in Supplementary Table 1a and 1b).
When we studied the coevolution of PCNA with its ten interacting partners separately in the two lineages, a comparatively larger set of sequences was used. These are listed in Supplementary Table S2 and denoted as dataset 3. All the sequences of dataset 3 were properly annotated. Dataset 3 includes all the sequences present in dataset 2 and some additional sequences from NCBI and OrthoDB (http://cegg.unige.ch/orthodb2). While the protein sequences of four interacting proteins (MLH 1, Uracil, XPG and WRN) from eukaryal species were taken from the OrthoDB database of orthologous groups (http://cegg.unige.ch/orthodb2), the rest of the protein sequences of the interacting partners were collected from NCBI. In contrast to Table S1, not all the interacting proteins were drawn from the same set of species. However, the coevolution of any given interacting pair was studied using sequences taken from the same set of species.
To calculate the dn/ds ratio, we collected the respective DNA sequences of proteins (listed in dataset 3) from NCBI.

Calculation of correlation coefficient (r) as an indicator of coevolution
To measure the correlated evolution of interacting partners, we used the most widely used method, the "entire-sequence" approach of "mirror tree" comparison (Goh et al., 2000; Pazos and Valencia, 2001; Goh and Cohen, 2002; Ramani and Marcotte, 2003; Kim et al., 2004; Tan et al., 2004; Pazos et al., 2005; Sato et al., 2005; Mintseris and Weng, 2005; Pazos and Valencia, 2008; Pazos et al., 2008). In this method, the pairwise distance matrices derived from alignments of the entire amino acid sequences are compared, their correlation coefficient is calculated, and statistically significant correlations are used to infer correlated evolution.
Sequences of the two interacting proteins were taken from the same set of species. CLUSTALW (Higgins et al., 1994) was used to align the sequences. The distance matrices were calculated using PROTDIST of the PHYLIP package (Felsenstein, 2002) with the Jones-Taylor-Thornton matrix. The linear correlation coefficient of these two distance matrices was calculated using the expression (Press et al., 1992):
$$ r = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2}\,\sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2}} $$
where $n$ is the number of elements of the matrices, that is, $(N^2 - N)/2$; $N$ is the number of sequences in the multiple sequence alignments; $R_i$ are the elements of the first matrix (the distances among all the proteins in the first multiple sequence alignment); $S_i$ are the corresponding values for the second matrix; and $\bar{R}$ and $\bar{S}$ are the averages of $R_i$ and $S_i$, respectively. This r-value is an indicator of coevolution: the higher the (positive) r-value, the more coordinated the evolution.
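As an illustration of the calculation described above, the following Python sketch computes r from two symmetric distance matrices. It is not the in-house Perl script used in this study: the function names (`upper_triangle`, `pearson_r`, `mirror_tree_r`) and the toy matrices are hypothetical, and the two matrices are assumed to list the same species in the same order.

```python
import math


def upper_triangle(matrix):
    """Return the (N^2 - N)/2 pairwise distances above the diagonal."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i + 1, size)]


def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for constant distances")
    return sxy / math.sqrt(sxx * syy)


def mirror_tree_r(dist_a, dist_b):
    """Correlate two distance matrices built over the same ordered species."""
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same number of species")
    return pearson_r(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy example (made-up distances, three species): the second matrix is the
# first one scaled by two, so the two "trees" mirror each other.
protein_a = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
protein_b = [[0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0]]
print(mirror_tree_r(protein_a, protein_b))
```

Only the upper triangle is used because the matrices are symmetric with a zero diagonal, which matches the count n = (N² − N)/2 given in the text.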
A bootstrap analysis was used to estimate the statistical significance of the computed correlation coefficient values (r). For this, we generated 1000 sets, each containing n pairwise distances randomly drawn (with replacement) from the n pairwise distances in the original set, and calculated 1000 values of $r_{\mathrm{rand}}$. The Z score was calculated using the expression:
$$ z = \frac{r - \langle r_{\mathrm{rand}} \rangle}{\sigma} $$
where $\sigma$ is the standard deviation of $r_{\mathrm{rand}}$ and $\langle r_{\mathrm{rand}} \rangle$ is its mean (effectively zero for truly random data). The p-value is then obtained from
$$ p = \mathrm{erfc}\left( \frac{|z|}{\sqrt{2}} \right) $$
where erfc is the complementary error function.
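The significance test can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' script. It assumes that the random sets are drawn independently for the two matrices (which is what makes the mean of the random correlations close to zero) and that the p-value is the two-tailed normal probability erfc(|z|/√2). The function name `bootstrap_significance`, the seed and the toy data are hypothetical.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Linear correlation coefficient; raises ValueError if undefined."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for constant values")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_significance(r_vals, s_vals, n_sets=1000, seed=0):
    """Z score and p-value of the observed r against random resamples."""
    if len(r_vals) != len(s_vals):
        raise ValueError("both distance lists must have the same length")
    rng = random.Random(seed)
    n = len(r_vals)
    r_obs = pearson_r(r_vals, s_vals)
    r_rand = []
    while len(r_rand) < n_sets:
        # Assumption: each list is resampled independently, with replacement.
        sample_r = rng.choices(r_vals, k=n)
        sample_s = rng.choices(s_vals, k=n)
        try:
            r_rand.append(pearson_r(sample_r, sample_s))
        except ValueError:
            continue  # skip the rare degenerate resample with no variance
    mean_rand = statistics.mean(r_rand)
    sigma = statistics.stdev(r_rand)
    z = (r_obs - mean_rand) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {"r": r_obs, "mean_rand": mean_rand, "sigma": sigma, "z": z, "p": p}


# Toy data: 36 pairwise distances, i.e. N = 9 sequences (made-up values).
first = [0.1 * i for i in range(1, 37)]
second = [2.0 * x + 0.05 * ((i % 3) - 1) for i, x in enumerate(first)]
print(bootstrap_significance(first, second))
```

A fixed seed is used only so that repeated runs of the sketch give the same numbers.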
Further, we also used a two-tailed test to predict whether any two calculated r-values are statistically significantly different or not (Spiegel, 1972).
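The text does not spell out the form of this two-tailed test. A common choice for comparing two independent correlation coefficients is Fisher's z-transformation, and the sketch below assumes that this is the test meant. The function name `compare_correlations` and the sample sizes in the example are hypothetical; the sample size passed in would be the number of pairwise distances behind each r-value.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-tailed Fisher z-test for the difference between two r-values.

    r1, r2: correlation coefficients, each strictly between -1 and 1.
    n1, n2: number of paired observations behind each coefficient.
    Returns (z, p), where p is the two-tailed normal probability.
    """
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("r-values must lie strictly between -1 and 1")
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    # Fisher transformation: Z = atanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    std_err = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical example: two r-values, each computed from 50 distance pairs.
z_score, p_value = compare_correlations(0.9, 50, 0.1, 50)
print(z_score, p_value)
```

Under this assumption, |z| values above 1.65, 1.96 and 2.58 correspond to significance at the 0.10, 0.05 and 0.01 levels.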
An in-house Perl script was used to calculate the r-values and the corresponding p-values.

Phylogenetic tree building
For a given set of orthologous sequences, we first generated the multiple alignment using CLUSTALW (version 1.83), a progressive alignment method. To generate the bootstrapped tree, we produced multiple resampled copies of the alignment using SEQBOOT, and distance matrices were calculated using PROTDIST with the Jones-Taylor-Thornton matrix. The phylogenetic trees were constructed for the multiple data sets using NEIGHBOR, a neighbor-joining method. The final tree for each protein was generated using the CONSENSE program. We used PHYLIP package version 3.6.

Protein disorder calculations
Evidence is rapidly accumulating that many protein regions, and even entire proteins, lack stable tertiary and/or secondary structure in solution yet possess crucial biological functions. These naturally flexible protein regions are known by different names; in this article we refer to them as protein disorder regions. Protein disorder regions provide essential biological functions because their dynamic conformation allows proteins to interact with multiple targets (Dunker et al., 2002). Disordered regions have an amino acid composition distinct from that of ordered protein structures (Garner E, Cannon P, Genome Inform Ser Workshop Genome Inform, 1998). We used a well-established web server, POODLE-S (http://mbs.cbrc.jp/poodle/poodle-s.html) (Shimizu et al., 2007), to predict protein disorder regions, and from these predictions we calculated the percentage of disorder region for all ten PCNA interacting partners considered in our study, in two eukaryotic and three archaeal organisms.
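The percentage of disorder region can be derived from per-residue predictions as in the sketch below. The input format (one disorder probability per residue) and the 0.5 cut-off are assumptions made for illustration and are not taken from the POODLE-S output specification; `percent_disorder` is a hypothetical helper.

```python
def percent_disorder(residue_scores, threshold=0.5):
    """Percentage of residues predicted as disordered.

    residue_scores: one disorder probability per residue (assumed format).
    threshold: score at or above which a residue counts as disordered
               (assumed value).
    """
    if not residue_scores:
        raise ValueError("empty sequence")
    disordered = sum(1 for score in residue_scores if score >= threshold)
    return 100.0 * disordered / len(residue_scores)


# Made-up scores for a four-residue peptide: two residues reach the cut-off.
print(percent_disorder([0.1, 0.9, 0.6, 0.2]))  # prints 50.0
```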

Dn/Ds calculations
Estimation of nonsynonymous and synonymous substitution rates is widely used to understand the dynamics of molecular sequence evolution (Gillespie, 1991; Ohta, 1995). We used the yn00 program of the PAML 3.14 package for dn/ds calculation, following the estimation method of Yang and Nielsen (2000). We used the maximum likelihood method for pairwise sequence comparison. A nonsynonymous (dn) to synonymous (ds) ratio (ω) of <1, =1 or >1 indicates negative selection, neutral evolution or positive selection, respectively. Coevolving interacting partners tend to show ω < 1, reflecting selection pressure arising from evolutionary conservation. Partners showing ω > 1 are proteins that are not under evolutionary constraint and on which positive selection is acting.
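The interpretation of ω described above amounts to a simple three-way rule, sketched here in Python. The function name `selection_regime` and the tolerance used to decide when ω counts as equal to 1 are hypothetical choices; the dn and ds values themselves would come from a program such as yn00.

```python
def selection_regime(dn, ds, tolerance=1e-6):
    """Classify a dn/ds ratio (omega) as negative, neutral or positive.

    tolerance: how close omega must be to 1 to count as neutral
               (an arbitrary illustrative value).
    """
    if ds <= 0.0:
        raise ValueError("omega is undefined when ds is zero or negative")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "negative" if omega < 1.0 else "positive"


# Made-up rates: dn well below ds, as expected for a conserved protein.
print(selection_regime(0.02, 0.40))  # prints negative
```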

RESULTS AND DISCUSSION
We calculated the Pearson correlation coefficient (r) values of PCNA and each of its ten interacting protein partners. The accession numbers of those protein sequences (dataset 1) are given in Supplementary Table S1 (a and b). The r-values are listed in Table 1. We also calculated the statistical significance of these r-values.
All the r-values except those marked with # have p-values of less than 10⁻⁵. Seven of the ten interacting partners of PCNA, namely Ligase 1, Pold, Fen 1, Topo 1, Topo 2, MLH 1 and Uracil, had high correlation coefficient values (r > 0.6). On the other hand, WRN had a comparatively lower r-value, whereas the other two interacting partners, XPG endonuclease and RFC3, showed very low and negative correlations (almost no correlation), respectively, with PCNA.
To study how correlated evolution acts on the different interacting partners of PCNA, a well-established approach based on correlation coefficients computed from entire sequences was employed. As all the proteins included in our study are known interacting partners (Giovanni and Ulrich, 2003), we can exclude the possibility of false-positive results arising by chance. The well-established 'mirror tree' approach was used to study the pattern of evolution of PCNA with its interacting partners. It should be mentioned that the aim of this paper was not to find any new interacting partner, but to understand the evolutionary relationships of the known interacting partners with PCNA.
Interacting proteins are expected to coevolve (Atwell et al., 1997; Jespers et al., 1999; Moyle et al., 1994; Pazos et al., 1997). A high correlation coefficient (r) between two interacting proteins is an indicator of this correlated evolution of the partners (Goh et al., 2000). The interaction of PCNA with each of the ten proteins included in our study has been experimentally verified (Giovanni and Ulrich, 2003).
We therefore expected high positive r-values for each of the ten partners. However, we observed a wide range of r-values, from very low negative (nearly zero) to high positive values. The statistical significance of this wide variation in the calculated r-values is given in Supplementary Tables S3 to S8. This indicates that constraints of different orders act on PCNA and its different interacting partners. It should be mentioned that some of the sequences (taken from 18 different species) used in the above analysis (dataset 1) are hypothetical, putative, etc.; that is, there is no experimental evidence for their functional annotation. Therefore, as a next step, we calculated the r-values using only those sequences that are neither hypothetical nor putative (dataset 2).
In almost all cases, with two exceptions (Topo 1 and Ligase 1), the r-values showed a clear increase (Table 1). Most notably, the r-value of RFC3 becomes positive (0.202), as is evident from Table 1. However, it is still too low to conclude strong coordinated evolution of PCNA and RFC3. Interestingly, XPG still shows a very weak negative correlation (almost no correlation) with PCNA. Here too, we observed statistically significant differences among the r-values (Supplementary Table S4). We also observed that including hypothetical or putative orthologs yielded comparatively lower r-values. This is expected, because the hypothetical or putative orthologs have larger variations in their sequences.
It is evident from Table 1 that there is wide variation in the r-values and that the variation is statistically significant (Supplementary Tables S3 and S4). While seven interacting partners of PCNA exhibit different orders of constraint in maintaining their coevolution with PCNA, three partners, namely RFC3, WRN and XPG, do not show any coordinated evolution with PCNA. The results can be explained by the following argument. PCNA has several interacting partners. These partners may impose different evolutionary pressures depending on what is required for the structural and functional integrity of each interacting complex. Thus, the result supports our hypothesis that a protein with multiple interacting partners may not coevolve with all of them, and that the degree of evolutionary pressure (the constraint imposed on any change) may vary over a wide range.
How, then, does the coevolution of PCNA and its interacting partner proteins proceed along the two lineages (archaea and eukarya) considered separately? Phylogenetic analysis of all available archaeal PCNA homologues suggests that crenarchaeal homologues fall into two groups, while other archaea have a single PCNA (Toshie et al., 2000). Therefore, to keep the PCNA homologues in the archaeal set homogeneous, we excluded any crenarchaeal sequence that had been included in the combined set. WRN and Uracil homologues were not functionally annotated in most of the archaeal organisms considered earlier, and were hence not used in the independent archaeal study.
The r-values obtained using dataset 3 (Supplementary Table S2, 2a and 2b) are listed in Table 2. We observed differences in r-values between archaea and eukarya in most cases.
The statistical significances of the differences in r-values between archaea and eukarya lineages are given in Supplementary Table S5.
While the r-value obtained for eukaryal PCNA and polymerase delta is very high (0.897), the archaeal counterpart had a lower r-value (0.693). The difference between the r-values is statistically significant (p < 0.01). The lower r-value in archaea, and its significant difference from the eukaryal r-value, indicate that archaeal polymerase delta and PCNA evolved in a less coordinated manner than their eukaryal counterparts. Fen 1 also had a significantly higher r-value in eukarya than in archaea (p < 0.01). On the other hand, we obtained a very high r-value (0.889) for archaeal Topo 1, nearly double that of its eukaryal counterpart (0.464; p < 0.01). Furthermore, the r-value of archaeal RFC3 was also significantly higher (p < 0.01) than that of eukaryal RFC3. These results suggest that archaeal Topo 1 and RFC3 evolved with PCNA in a more coordinated manner than their eukaryal counterparts, and indicate that the evolution of interacting proteins may differ significantly between lineages. However, in some cases (for example, Ligase 1 and MLH 1), the computed r-values are high in both archaea and eukarya, as is evident from Table 1. In these cases, the differences in the r-values are statistically insignificant and hence indicate a negligible difference in coevolution between the two lineages. Furthermore, the r-value of XPG in the archaeal lineage is nearly double that in eukarya, and the difference is statistically significant (Supplementary Table S5). The above results show that the r-values of archaea and eukarya differ significantly for some of the partners, while for others the differences are not significant. Thus, we can infer that structural and functional constraints of different orders may operate in different lineages to shape the correlated evolution of interacting partners.
When we took RFC3 and PCNA sequences from both archaeal and eukaryal species and calculated the r-value for this combined set, we obtained a very low r-value (0.202). As mentioned previously, in the present analysis we calculated the r-values for archaea and eukarya independently. Interestingly, we obtained a high r-value (0.745) for RFC3 in the archaeal lineage, whereas in eukarya we still obtained a low correlation coefficient (0.235). The results indicate that archaeal PCNA evolved in a coordinated way with its interacting partner RFC3, whereas the eukaryal counterparts do not carry a signature of correlated evolution. These results again indicate that the coordinated evolution of interacting proteins may differ between lineages. We also observed that not all interacting partners coevolve.
It is also instructive to construct the phylogenetic trees of PCNA and its ten interacting partners to gain insight into their clustering. The bootstrapped phylogenetic trees are shown in Supplementary Figures 1 to 11. It is clear from Figure 1 that the eukaryal PCNAs form a single cluster. As already mentioned, seven of the ten interacting partners (DNA ligase 1, DNA polymerase delta, DNA topoisomerase 1, DNA topoisomerase 2, Flap endonuclease 1, MLH 1 and Uracil-DNA glycosylase) showed high positive r-values (r > 0.60). We observed a similar trend in the phylogenetic trees of these seven interacting partners, as is evident from Supplementary Figures 2 to 8.
The two interacting partners RFC3 and XPG, which showed low negative r-values (almost no correlation), provide no evidence of coevolution with PCNA. WRN, on the other hand, had a comparatively lower r-value, and its phylogenetic tree (Supplementary Figure 11) clearly supports this. The striking difference between the WRN and PCNA trees is that, in the WRN tree, the two archaeal species Methanococcus maripaludis and Methanosarcina acetivorans fall within the eukaryal lineage. In contrast, the phylogenetic trees of RFC3 and XPG do not differ markedly from the PCNA tree in the clustering pattern of their branches. However, there are differences in the arrangement within the eukaryal kingdom. For example, Oryza sativa and Arabidopsis thaliana did not cluster together in the RFC3 and XPG trees.
The above analysis shows that, contrary to the expected coevolution of a protein with all of its interacting partners, the interacting partners of PCNA do not always coevolve with it. It further indicates that the coordinated evolution of interacting partners differs between lineages. Seven of the ten interacting partners have significantly high positive r-values, indicating coevolution with PCNA, whereas the remaining three show no signature of coevolution. The 'entire sequence' approach used in our study is based on pairwise distance matrices calculated from alignments of the whole sequences. To understand the underlying reasons for the wide variation in r-values, the effect of cascading and multiple interactions on the interacting protein partners needs to be addressed. A protein with multiple interacting partners may experience different evolutionary pressures exerted by its different partners. Another probable reason for the different orders of evolutionary pressure is the following. Each of the interacting partners may itself interact with other cellular proteins. For example, protein A may have interacting partners A1, A2 and A3. In turn, protein A1 may have three partners: A, A11 and A12. Each of these three proteins (A, A11 and A12) would impose structural and functional constraints on A1. Thus, the coevolution of A and A1 does not reflect just a single pairwise interaction (A and A1); rather, it is a cascading effect of coordinated pressures that ultimately shapes the so-called coevolution. For example, WRN, an interacting partner of PCNA, has a large number of interacting partners, viz. P53, RAD52, RAD51, SUMO-1, Topo 2, RPA, etc.
Furthermore, the nature and magnitude of the pressure should depend on the functional importance of the complex and also on the number of interacting partners of each of A1, A2 and A3. The pathway in which the interacting proteins are involved may also be a determining factor. We estimated the number of interacting partners of each of the ten PCNA-interacting proteins in the eukaryal dataset using the STRING database (http://string.embl.de), applying a stringent cut-off score of 0.9 for including interacting partners. However, we did not observe any significant dependence of the correlation coefficient values of the PCNA interacting partners on their number of interacting partners (Figure 1). On the other hand, the interacting partners are involved in diverse kinds of biological pathways. The different pathways may impose different orders of sequence-structure-function constraint throughout evolution.
We therefore asked whether any specific signature correlates the nature of coadaptation with the percentage of disorder region of the interacting partners. We determined the percentage of disorder region of eukaryal and archaeal PCNA and of the interacting proteins; the values are listed in Table 3.
We found that all the PCNAs of the species listed in Table 3 had a very low percentage of disorder region (data not shown). The percentages of disorder region of the interacting proteins, on the other hand, varied. We further classified the values (disorder percentages and r-values) into three groups: higher (r ≥ 0.6, indicated as 1), lower (r < 0.3, indicated as -1) and not determining (0.3 ≤ r < 0.6, indicated as 0). Based on this classification, using the data of Table 3 and the r-values listed in Table 2, we derived Table 4. Table 4 shows some interesting observations, which are summarized in the derived Table 5. The predominant ones are as follows. Proteins with a higher percentage of disorder region, when interacting with PCNA (lower percentage of disorder region), give lower r-values (5 cases in eukarya). Proteins with a lower percentage of disorder region, interacting with PCNA (also lower disorder), give higher r-values (2 cases in eukarya and 6 in archaea). There are also exceptions, indicating that although coevolution and coadaptation may be related to the percentage of disorder region, this factor alone cannot explain the wide range of r-values.
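The three-way coding of r-values and the counting of disorder/r-value combinations can be sketched as follows. The r-value cut-offs (0.6 and 0.3) are those stated in the text; the disorder classes are passed in already coded as 1 or -1 because the text does not give the disorder cut-offs, and the function names and example values are hypothetical.

```python
from collections import Counter


def r_class(r):
    """Code an r-value as 1 (higher), 0 (not determining) or -1 (lower)."""
    if r >= 0.6:
        return 1
    if r < 0.3:
        return -1
    return 0


def tally(observations):
    """Count (disorder_class, r_class) combinations.

    observations: iterable of (disorder_class, r_value) pairs, where
    disorder_class is 1 for higher disorder and -1 for lower disorder.
    """
    return Counter((disorder, r_class(r)) for disorder, r in observations)


# Made-up observations, not values from Tables 2 to 4.
example = [(1, 0.20), (1, 0.10), (-1, 0.80), (-1, 0.45)]
for combination, count in sorted(tally(example).items()):
    print(combination, count)
```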
To understand the involvement of lineage-specific selection pressures, we calculated the dn/ds ratio in both the eukaryal and archaeal datasets for PCNA as well as for each of its interacting partners. The basic idea is that if the dn/ds ratio of a protein is >1, the protein is inferred to be under positive selection. PCNA interacting partners with dn/ds > 1 are not expected to coevolve. Similarly, within the sequence set of an interacting partner, sequences from organisms with dn/ds > 1 are not expected to coevolve either. In our study (Table 6; for details see Supplementary Table S9), we found dn/ds > 1 for Topo 2 in the archaeal set for a few organisms. The r-value of archaeal Topo 2 is comparatively lower than that of the other interacting partners, and positive selection may be one of the reasons for this. In the case of eukaryal Topo 2, although the r-value is quite low (0.405), we did not detect any positive selection pressure in that protein set. Topo 1 in the eukaryal dataset shows a low r-value, but dn/ds in this case was less than 1. RFC3 also shows a very low r-value in the eukaryal dataset, while the dn/ds ratio gave no indication of positive selection pressure. The archaeal Fen 1 dataset also showed a low correlation coefficient, but only negative selection pressure was detected. Thus, the dn/ds ratio alone does not provide sufficient explanation of the correlated evolution of the PCNA interacting protein set.
Moreover, the existing literature suggests that the interacting partners of PCNA are involved in various functional pathways: DNA polymerase delta, Replication factor C3, DNA ligase 1, DNA topoisomerase 1 and DNA topoisomerase 2 are involved in DNA replication and repair; MLH 1 in DNA mismatch repair; XPG endonuclease in nucleotide excision repair; WRN helicase in DNA double-strand break repair; and Uracil-DNA glycosylase in base excision repair (Giovanni and Ulrich, 2003). Thus, interacting partners involved in a number of different functional pathways experience different orders of pressure to maintain their structural and functional integrity. Finally, the evolutionary relationships of a protein with its multiple interacting partners depend on several factors that require future study.
In summary, the evolution of a protein with multiple interacting partners is governed by the structural and functional constraints imposed by its partners. The interacting partner proteins may exert different orders of control on the protein, which results in differences in their coevolutionary patterns. In addition, the present work shows that the nature of coevolution of the interacting proteins differs between the eukaryal and archaeal lineages. The possible structural and functional constraints and their possible influences have also been discussed. The percentages of ordered and disordered regions of the interacting proteins appear to be the most significant determinant of their coevolutionary pattern. However, no single constraint (such as the percentage of ordered and disordered regions) is sufficient; rather, a set of constraints, such as the cascading effects of the interactions of the interacting partners and their functional constraints, should also play important roles in shaping the coevolution of proteins with multiple interacting partners.

Figure 1. Relation between correlation coefficient values (r) and number of interacting partners.

Table 1. Correlation coefficient values of PCNA and its ten different interacting partners.

Table 2. Correlation coefficient values of PCNA and its ten different interacting partners.

Table 3. Disorder percentage of ten PCNA interacting partners in the Eukarya and Archaea lineages.

Table 4. Relationship between disorder and r-value. Higher disorder is taken as 1 and lower disorder as -1.

Table 5. Representation of disorder and r-value in Eukarya and Archaea.
#Sequences of this set of organisms were taken from the NCBI database. *Sequences of this set of organisms were taken from the OrthoDB database.

Supplementary Table S8. Z-values obtained from the two-tailed test used to determine whether any two calculated r-values are significantly different. The r-values are statistically significantly different at the 0.10, 0.05 and 0.01 levels if the |Z| values are greater than 1.65, 1.96 and 2.58, respectively. The results are given for the r-values obtained using the combined sequence sets with and without hypothetical sequences (dataset 1 and dataset 2).