Potential drug targets prediction for H1N1 influenza A based on protein-protein interaction networks

In this paper, several proteins were identified as potential drug targets for H1N1 influenza A. These proteins were found in three steps: i) determining the set of human proteins that interact with the proteins of the H1N1 virus, and assigning a weight to each of them; ii) mapping this set of human proteins onto human protein-protein interaction (PPI) networks; iii) combining the topological properties of the network with the protein weights to compute a score for each node. Finally, TRAF2 and MAPK9, two high-scoring nodes, were regarded as potential drug targets. The results were consistent across different databases and agreed to some degree with the KEGG pathways. However, further study is needed to determine whether these potential targets can become practical targets for drug design.


INTRODUCTION
In March and early April 2009, a new H1N1 virus emerged in Mexico and the United States. During the first few weeks of surveillance, the virus spread to 30 countries worldwide by human-to-human transmission, prompting the World Health Organization to raise its pandemic alert level (Smith et al., 2009; Roberts et al., 2012). Currently, two classes of U.S. Food and Drug Administration (FDA)-approved influenza antiviral drugs (M2 channel blockers and neuraminidase inhibitors) are available, but there is great concern about the emergence of viral resistance. Hence, the timely identification of suitable targets, that is, key elements related to the mechanism of influenza infection, is crucial (Basler, 2007; Vijayan et al., 2012). In addition, the identification method should be applicable to predicting targets for similar diseases in case of another influenza outbreak. Typically, a drug target is a key molecule involved in particular metabolic or signaling pathways that are specific to a disease condition or pathology, or to the infectivity or survival of a pathogen (Sams-Dodd, 2005; Knowles and Gromo, 2003; Cheng et al., 2007). Studies that identify novel drug targets able to activate or inhibit these responses can be broadly divided into those at the physiological, mechanistic or genetic level (Lindsay, 2005). The large number of databases at the genetic and proteomic levels makes it possible to predict novel targets in silico.
During an H1N1 viral infection, the proteins of the virus interfere with many host proteins, and some of these host proteins interact with each other or with other proteins simultaneously (Shapira et al., 2009; Zhirnov et al., 2002). Investigating individual proteins is not enough to elucidate the complex changes in the body's bioprocesses under disease conditions. Instead, mapping these proteins onto protein-protein interaction networks can highlight the complex relationships among them and help to identify the key nodes, which can be potential drug targets.
This study aimed to identify influenza-related proteins that could be potential drug targets. To achieve this goal, there are three main steps: i) identify the set of human proteins that interact with the proteins of the H1N1 virus, and assign a weight to each protein according to the number of its interactions with the viral proteins; ii) map this set of human proteins onto human protein-protein interaction networks to describe the relationships among them; iii) combine the topological properties of the network with the weights of the proteins to determine a score for each node; nodes with high scores can be regarded as potential drug targets. Several previous studies have identified potential drug targets through network approaches (Wu et al., 2008; Oti et al., 2006). However, most of these methods used known targets to predict new ones without considering viral and human proteins together. By including the proteins of both the influenza virus and the human host, our method provides a more complete perspective on the disease.

Extracting protein-protein interactions (PPIs) between human and H1N1 influenza virus
When an H1N1 influenza virus infects a person, its proteins affect the bioprocesses of the human body by interacting with human proteins. It is therefore necessary to identify which human proteins interact with the virus.
There are many accessible protein-protein interaction databases, such as BioGRID (Stark et al., 2011), IntAct (Kerrien et al., 2012), the Human Protein Reference Database (HPRD; Keshava et al., 2009), the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; von Mering et al., 2005) and GeneMANIA (Mostafavi et al., 2008). Among these databases, IntAct can easily be used to retrieve PPIs between two different species (that is, interactions in which the two interactors come from different species).
The PPIs between human and H1N1 influenza virus can be retrieved with the Molecular Interaction Query Language (MIQL) (Kerrien et al., 2012), which is based on Lucene syntax and can restrict the species of each interactor. To obtain the protein-protein interactions between H1N1 influenza virus and human, the query is 'taxidA:211044 AND taxidB:9606'. The database returned 60 records.
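As an illustration of this retrieval step, the sketch below builds the MIQL query string and then filters virus-human pairs from tab-delimited PSI-MI TAB (MITAB 2.5) text, in which columns 1 and 2 hold the interactor identifiers and columns 10 and 11 hold their taxonomy identifiers. It assumes the records have already been downloaded (the download itself is not shown); the helper names and the sample rows are hypothetical, and the parser accepts either column order as a precaution.

```python
# Sketch: pick out virus-human pairs from PSI-MI TAB (MITAB 2.5) text.
# Assumption: records were already downloaded; sample rows are hypothetical.

VIRUS_TAXID = "211044"  # H1N1 taxonomy id used in the MIQL query
HUMAN_TAXID = "9606"    # Homo sapiens

# MIQL query restricting interactor A to the virus and interactor B to human.
MIQL_QUERY = f"taxidA:{VIRUS_TAXID} AND taxidB:{HUMAN_TAXID}"


def taxid_of(field):
    """Return the numeric taxid in a MITAB field such as 'taxid:9606(human)'."""
    for entry in field.split("|"):
        if entry.startswith("taxid:"):
            return entry[len("taxid:"):].split("(")[0]
    return None


def virus_human_pairs(mitab_text):
    """Return (viral_id, human_id) pairs; either column order is accepted."""
    pairs = []
    for line in mitab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 15:
            continue  # not a MITAB 2.5 row
        id_a, id_b = cols[0], cols[1]                         # columns 1-2
        tax_a, tax_b = taxid_of(cols[9]), taxid_of(cols[10])  # columns 10-11
        if (tax_a, tax_b) == (VIRUS_TAXID, HUMAN_TAXID):
            pairs.append((id_a, id_b))
        elif (tax_a, tax_b) == (HUMAN_TAXID, VIRUS_TAXID):
            pairs.append((id_b, id_a))
    return pairs


def make_record(id_a, id_b, tax_a, tax_b):
    """Build a hypothetical 15-column MITAB 2.5 row; unused columns are '-'."""
    cols = ["-"] * 15
    cols[0], cols[1] = id_a, id_b
    cols[9], cols[10] = f"taxid:{tax_a}", f"taxid:{tax_b}"
    return "\t".join(cols)


sample = "\n".join([
    make_record("uniprotkb:VIRAL1", "uniprotkb:HUMAN1", "211044(H1N1)", "9606(human)"),
    make_record("uniprotkb:HUMAN2", "uniprotkb:VIRAL2", "9606(human)", "211044(H1N1)"),
    make_record("uniprotkb:HUMAN1", "uniprotkb:HUMAN3", "9606(human)", "9606(human)"),
])

print(MIQL_QUERY)
print(virus_human_pairs(sample))
```

The human-human row in the sample is dropped, so only the two cross-species interactions remain, each reported with the viral protein first.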

Assigning weights to proteins
Based on the assumption that human proteins targeted by more viral proteins are more likely to play key roles in the bioprocesses of the disease, the weight of each human protein is defined as

$$w(p) = \frac{c(p)}{N}$$

where $p$ stands for a protein, $c(p)$ is the number of interactions that protein $p$ has with viral proteins, and $N$ is the total number of interactions. Thus, a protein with more viral interaction partners is assigned a higher weight, and the weight represents the importance of the protein.
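The weighting rule w(p) = c(p)/N, where c(p) counts the interactions a human protein has with viral proteins and N is the total number of virus-human interactions, can be sketched as follows. The function name and the sample interactions are hypothetical; only the viral protein names (NP, PB1, M) come from the text.

```python
from collections import Counter


def protein_weights(pairs):
    """Weight each human protein p as w(p) = c(p) / N.

    pairs: list of (viral_protein, human_protein) interactions.
    c(p): number of those interactions that involve human protein p.
    N: total number of virus-human interactions (len(pairs)).
    """
    n_total = len(pairs)
    counts = Counter(human for _viral, human in pairs)
    return {protein: c / n_total for protein, c in counts.items()}


# Hypothetical interactions; viral protein names follow the text (NP, PB1, M).
pairs = [("NP", "HUMAN1"), ("PB1", "HUMAN1"), ("NP", "HUMAN2"), ("M", "HUMAN3")]
weights = protein_weights(pairs)
print(weights)  # {'HUMAN1': 0.5, 'HUMAN2': 0.25, 'HUMAN3': 0.25}
```

Because every interaction is counted once, the weights always sum to 1. With N = 60, a protein with two viral partners gets 2/60 ≈ 0.033 and one with a single partner gets 1/60 ≈ 0.017.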

Mapping proteins into the human PPI networks
To further explore the relationships among the human proteins, those that interact with the virus were mapped onto a human PPI network. STRING and GeneMANIA were used to extract sub-networks that span all 54 proteins and their interactions.
STRING is a database of known and predicted protein interactions, including both physical and functional associations derived from genomic context, high-throughput experiments, conserved coexpression and previous knowledge. It can also be used to retrieve the interactions among multiple proteins. GeneMANIA is a similar resource that can be used to find the interactions among a set of input proteins. In these networks, the nodes are proteins, the edges are interactions between nodes, and each edge is weighted by a confidence score.
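Either sub-network can be held as an undirected graph whose edges carry a confidence value. The hypothetical helper below (not part of STRING or GeneMANIA tooling) reads whitespace-separated rows of the form "protein1 protein2 score" into an adjacency dictionary restricted to the query proteins. It assumes that scores above 1 are on a 0 to 1000 integer scale, as in STRING's downloadable link files, and rescales them to 0 to 1; the protein names and scores in the example are illustrative, not database values.

```python
def load_network(edge_lines, proteins):
    """Build {protein: {neighbor: confidence}} restricted to `proteins`.

    edge_lines: iterable of 'protein1 protein2 score' strings.
    Assumption: scores > 1 are on a 0-1000 scale and are divided by 1000.
    """
    adjacency = {p: {} for p in proteins}
    for line in edge_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        a, b, raw = fields
        if a not in adjacency or b not in adjacency or a == b:
            continue  # keep only edges between distinct query proteins
        try:
            score = float(raw)
        except ValueError:
            continue  # e.g. a non-numeric header row
        confidence = score / 1000 if score > 1 else score
        # Undirected graph: store both directions, keeping the larger value
        # if the same pair appears more than once.
        adjacency[a][b] = max(confidence, adjacency[a].get(b, 0.0))
        adjacency[b][a] = adjacency[a][b]
    return adjacency


lines = [
    "protein1 protein2 combined_score",
    "HUMAN1 HUMAN2 900",
    "HUMAN2 HUMAN3 0.4",
    "HUMAN1 OTHER 700",  # OTHER is not a query protein, so it is ignored
]
net = load_network(lines, ["HUMAN1", "HUMAN2", "HUMAN3"])
print(net)
```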

Combining the network and the weight of each node to determine a score for each node
Since a node with many neighbors in the PPI network is more likely to play an important role in the bioprocesses of the human body, and a neighbor of an important node is probably also important, the importance of each node is likely related to how many neighbors it has and how important those neighbors are. Consequently, the score of each node is defined as

$$\mathrm{score}(p) = \sum_{i \in \mathrm{nei}(p)} w_i \cdot \mathrm{confidence}_i$$

where $p$ stands for a protein (a node), $w_i$ is the weight of protein $i$, $\mathrm{nei}(p)$ is the set of nodes directly connected to $p$ in the network, and $\mathrm{confidence}_i$, which ranges from 0 to 1, is the confidence score given by the PPI database for the interaction between $p$ and $i$. This score therefore considers at the same time how many neighbors a node has and how important and reliable those neighbors are. A node with a higher score is likely to be an important protein in the PPI network, as well as a protein that affects the interactions between human and virus. All proteins are then ranked by score.
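The node score, score(p) = Σ over neighbors i of p of w_i · confidence_i, and the subsequent ranking can be sketched as below. The adjacency format {protein: {neighbor: confidence}}, the function names and the three-node example are assumptions for illustration, not values from the paper.

```python
def node_scores(adjacency, weights):
    """score(p) = sum of w_i * confidence_i over the neighbors i of p.

    adjacency: {protein: {neighbor: confidence in [0, 1]}}
    weights:   {protein: w}; proteins without a weight contribute 0.
    """
    return {
        p: sum(weights.get(i, 0.0) * conf for i, conf in neighbors.items())
        for p, neighbors in adjacency.items()
    }


def rank(scores):
    """Return proteins sorted by decreasing score (ties broken by name)."""
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical three-node network and weights (not values from the paper).
adjacency = {
    "HUMAN1": {"HUMAN2": 0.9, "HUMAN3": 0.5},
    "HUMAN2": {"HUMAN1": 0.9},
    "HUMAN3": {"HUMAN1": 0.5},
}
weights = {"HUMAN1": 2 / 60, "HUMAN2": 1 / 60, "HUMAN3": 1 / 60}
scores = node_scores(adjacency, weights)
print(rank(scores))  # ['HUMAN2', 'HUMAN1', 'HUMAN3']
```

In this toy network HUMAN2 outranks HUMAN1 despite having fewer neighbors, because its single neighbor is both heavily weighted and linked with high confidence; the score rewards neighbor importance and reliability, not degree alone.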

Protein-protein interactions retrieved from IntAct
By querying IntAct, we obtained 60 interactions between proteins of the H1N1 influenza virus and human proteins (Figure 1). These 60 interactions involved 6 proteins from H1N1 influenza and 54 human proteins. Set A and set B denote the proteins of the two species, respectively (set A contains the 6 viral proteins, and set B the 54 human proteins).

Weights of proteins
Each protein in set B was weighted according to the number of its interactions with proteins in set A. Each protein in set B interacted with either one or two proteins in set A, so only two weight values occurred: 6 proteins had a weight of 0.033 (2/60), and the other 48 had a weight of 0.017 (1/60) (Table 1).

In Figure 1, each node represents a protein, and an edge between two nodes means that the two proteins interact. Among these genes, NP, PB1, NS, M, PB2 and PA are from H1N1 influenza, while the others are human.

Determining the score of each node in the network
Figure 2 shows the result of mapping the proteins onto the STRING PPI network. To determine which nodes are most likely to play key roles during viral invasion, we calculated the score of each node; the top five proteins are shown in Table 2. However, STRING returned fewer interactions than GeneMANIA, which draws on a very large set of functional association data. To check that the method gives consistent results with different databases, we also used GeneMANIA to calculate the scores of the proteins in set B. The resulting network is shown in Figure 3, and the scores of the top eight proteins are shown in Table 3. Comparing the two results, four of the top five proteins in Table 2 also appear among the top eight proteins in Table 3. This overlap is statistically significant (p = 0.00002), indicating that the results are consistent. Because TRAF2, MAPK9, UBE2I and CCDC33, especially TRAF2 and MAPK9, rank high in both results, these proteins are more likely to play key roles in the bioprocesses of H1N1 influenza A infection and may be potential drug targets for the disease.

All of the proteins on this map are from set B. In Figure 2, each edge carries a confidence score given by STRING; the node colors distinguish different kinds of proteins, but because this study used only the interactions between proteins, the colors have no impact on the analyses or calculations. In Figure 3, gray nodes represent proteins
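The text reports p = 0.00002 for the overlap between the two top-ranked lists but does not name the test or the background set. One common way to assess such an overlap is a one-sided hypergeometric test, sketched below under the assumption that the background is the 54 proteins of set B. The resulting value depends on both the test and the background, so this sketch illustrates the procedure rather than reproducing the reported number; the function name is hypothetical.

```python
from math import comb


def overlap_p_value(population, list1_size, list2_size, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    X counts how many of `list1_size` proteins drawn at random (without
    replacement) from `population` proteins fall into a fixed list of
    `list2_size` proteins.
    """
    total = comb(population, list1_size)
    tail = sum(
        comb(list2_size, k) * comb(population - list2_size, list1_size - k)
        for k in range(overlap, min(list1_size, list2_size) + 1)
    )
    return tail / total


# Assumed background: the 54 proteins of set B; 4 of the STRING top five
# appear in the GeneMANIA top eight.
p = overlap_p_value(population=54, list1_size=5, list2_size=8, overlap=4)
print(p)
```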

Analysis and validation of the results
According to the scores derived from the PPI networks, MAPK9, TRAF2, UBE2I and CCDC33 have reliably high scores (they rank high in both results). They may play key roles in the bioprocesses during viral infection and may be putative targets. Mapping them onto the KEGG pathways (Kanehisa et al., 2010) shows that MAPK9 is a factor in the Influenza A pathway, and that TRAF2 has many interactions with factors of this pathway (TNF, TNFRSF1A, TNFRSF10A and TNFRSF10B). The Influenza A pathway could therefore be perturbed by compounds that bind MAPK9 or TRAF2, so these two proteins identified by our method may be potential drug targets. Other proteins with low scores (EXOSC8, DVL3, BHLHE40, FXR2, GLYAT, etc.) are not related to the Influenza A pathway. This suggests that our method has the potential to predict novel putative targets. As for the other genes found by our method that are not on the influenza pathway, they may simply not be potential targets; however, they may also affect human bioprocesses in other ways, or the KEGG pathways may not be complete enough to include all related proteins.
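The validation logic above (a candidate is supported if it belongs to the pathway or interacts with pathway members) can be sketched as a simple set check. The function name and labels are hypothetical, and the gene sets below contain only the pathway factors and TRAF2 partners named in the text; a real check would load the full KEGG Influenza A gene set and a complete interaction list, so here the labels only reflect the data supplied.

```python
def classify_candidates(candidates, pathway_members, interactions):
    """Label each candidate protein by how it relates to a pathway gene set.

    pathway_members: set of genes annotated to the pathway.
    interactions:    {protein: set of interaction partners}.
    """
    labels = {}
    for protein in candidates:
        partners = interactions.get(protein, set())
        if protein in pathway_members:
            labels[protein] = "pathway member"
        elif partners & pathway_members:
            labels[protein] = "interacts with pathway members"
        else:
            labels[protein] = "no direct link"
    return labels


# Partial data taken only from the text: a few Influenza A pathway factors and
# the TRAF2 partners it names (not the complete KEGG gene set).
pathway_partial = {"MAPK9", "TNF", "TNFRSF1A", "TNFRSF10A", "TNFRSF10B"}
interactions = {"TRAF2": {"TNF", "TNFRSF1A", "TNFRSF10A", "TNFRSF10B"}}
labels = classify_candidates(["MAPK9", "TRAF2", "EXOSC8"], pathway_partial, interactions)
print(labels)
```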

Extrapolation to other proteins
The method described above uses only the proteins that interact directly with the virus, but many other proteins may interact with the virus indirectly.
Because GeneMANIA also identifies other proteins related to a set of input proteins, it can find indirect interactors that are not in set B (the white nodes in Figure 3). We calculated the scores of these proteins by summing the scores of their neighbors in set B. The top five proteins were PTPN11, DICER1, STRBP, ILF3 and CHRM1. Although they are not directly related to the viral proteins, they interact with many proteins of set B and may play key roles in the bioprocesses during infection.
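The extrapolation step, in which each protein outside set B is scored by summing the scores of its neighbors in set B, can be sketched as below. It follows the text's description as a plain sum; whether edge confidences were also applied at this step is not stated, so the sketch leaves them out. All names and numbers are hypothetical.

```python
def extrapolated_scores(adjacency, set_b_scores):
    """Score nodes outside set B by summing the scores of their set B neighbors.

    adjacency:    {protein: iterable of neighbor proteins}
    set_b_scores: {protein in set B: score from the network step}
    """
    return {
        p: sum(set_b_scores[n] for n in neighbors if n in set_b_scores)
        for p, neighbors in adjacency.items()
        if p not in set_b_scores
    }


# Hypothetical network: EXTRA1 and EXTRA2 are GeneMANIA-added (non-set-B) nodes.
adjacency = {
    "HUMAN1": ["HUMAN2", "EXTRA1"],
    "HUMAN2": ["HUMAN1", "EXTRA1"],
    "EXTRA1": ["HUMAN1", "HUMAN2", "EXTRA2"],
    "EXTRA2": ["EXTRA1"],
}
set_b_scores = {"HUMAN1": 0.03, "HUMAN2": 0.02}
extra = extrapolated_scores(adjacency, set_b_scores)
top = sorted(extra, key=extra.get, reverse=True)
print(top)  # ['EXTRA1', 'EXTRA2']
```

EXTRA2 scores zero because its only neighbor is another added node, which reflects the idea that indirect interactors matter in proportion to their links into set B.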
Further investigation is needed to clarify the disease mechanisms of H1N1 influenza, and more attention should be paid to the proteins (or corresponding genes) ranked high by the method. Because they interact with many proteins that may be targeted by the virus, they are likely to play roles in the bioprocesses of this disease. Moreover, it is also essential to understand other critical aspects of drug targets, such as underlying toxicity, clinical effects and adverse drug reactions (Imming et al., 2006; Hopkins and Groom, 2002). Consequently, more work is required to determine whether these potential drug targets can be applied in practical drug design (Noble et al., 2004; Lorna and Saad, 2006; Smits et al., 2005).

Figure 2. The PPI network derived from STRING.

Table 1. Weights of proteins in set B.

Table 2. Top five proteins derived from the STRING PPI network.

Table 3. Top eight proteins derived from the GeneMANIA PPI network.