Novel characteristics and polymorphisms of hemagglutinin subunit 1 of 2009-2011 A/H1N1 viruses in Zhejiang, China

Since the 2009 pandemic A/H1N1 influenza virus emerged in North America, two H1N1 peaks have been reported in Zhejiang, China. The first peak occurred in November 2009 and the second in January 2011. In this study, we collected and analyzed the HA1 sequences of the Zhejiang H1N1 viruses from 2009 and 2010-2011. The phylogenetic tree of HA1 suggested that the Zhejiang viruses were all derived from the 2009 pandemic viruses in North America. The consensus informational spectrum (CIS) of HA1 showed that the receptor binding preference of the Zhejiang viruses was the same as that of the North American viruses. However, many mutations in HA1 arose during local transmission, and some of them significantly increased or decreased the amplitude at the dominant frequency in the informational spectrum (IS), implying that they may influence receptor binding affinity. Structural analysis showed that four critical mutations, K219I, D222G, G225R and A134T, occurred in the receptor binding site, among which D222G may be essential for the emergence of a lethal strain.


INTRODUCTION
The majority of influenza epidemics are caused by influenza virus type A, whose genome consists of eight single-stranded RNA segments (HA, NA, NP, M, NS, PA, PB1 and PB2) (Poland et al., 2007). Influenza A viruses can be further classified into different subtypes (H1N1, H2N2, H3N2, etc.) based on the viral surface proteins hemagglutinin (HA) and neuraminidase (NA). HA is a spike-shaped homotrimer in which each monomer consists of two disulfide-linked subunits, HA1 and HA2 (Mineev et al., 2013). HA1, which is at the distal end of the spike, is responsible for binding the virus to its host cell receptors. HA2 forms the stem of the spike and mediates fusion between the viral envelope and the host cell membrane (Skehel and Wiley, 2000). During the evolution of influenza viruses, their genome segments have undergone continuous reassortment among different hosts, including birds, pigs and humans (Zimmer and Burke, 2009). Influenza pandemics occur when newly emerged viruses are effectively transmitted to a human population with no existing immunity.
In April 2009, a novel swine-origin influenza virus (pandemic 2009 A/H1N1) infected humans in North America and rapidly spread worldwide by human-to-human transmission. By April 2010, an estimated 61 million people had been infected with 2009 H1N1 and about 12 thousand deaths had occurred (Glatman-Freedman et al., 2012). Molecular phylogenetic analyses revealed that the 2009 H1N1 virus was derived from several viruses reassorting in swine (Garten et al., 2009; Smith et al., 2009). The informational spectrum method (ISM) showed that HA1 of the new virus has biochemical characteristics distinct from those of other H1N1 subtypes, which could reflect its receptor binding specificity (Veljkovic et al., 2009; Hu, 2010). The 3D structure of HA of the pandemic virus was also determined; it is similar to other reported HA structures but has a strict preference for human receptors (Xu et al., 2010; Yang et al., 2010).
In Zhejiang, China, the 2009 H1N1 virus was first reported in May 2009. The percentage of sentinel respiratory specimens testing positive for H1N1 rapidly rose to 100% in November 2009. The second peak came in January 2011, but it was much lower than the first one. To understand the characteristics and variations of the H1N1 viruses in Zhejiang during 2009-2011, we collected and analyzed their HA1 sequences in this study.
After collapsing identical sequences and excluding sequences with ambiguous nucleotides, we retained 27 and 16 HA sequences from 2009 and 2010-2011, respectively. The HA1 segments of the sequences were extracted with the NCBI Influenza Virus Sequence Annotation Tool (Bao et al., 2007). Based on a recent study, we also retrieved five representative HA1 sequences for each H1N1 subtype from NCBI, including American avian, Eurasian swine, American swine, human seasonal and 2009 American pandemic viruses.
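The filtering step described above could be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the function name `filter_sequences`, the (name, sequence) record layout, and the rule that any character outside A, C, G and T counts as ambiguous are all assumptions made for the example.

```python
def filter_sequences(records):
    """Drop ambiguous sequences and collapse exact duplicates.

    `records` is an iterable of (name, nucleotide_sequence) pairs
    (a hypothetical layout chosen for this sketch). The first
    occurrence of each distinct sequence is kept.
    """
    unambiguous = set("ACGT")
    seen = set()
    kept = []
    for name, seq in records:
        seq = seq.upper()
        # Assumption: any symbol other than A/C/G/T (e.g. N, R, Y) is ambiguous.
        if set(seq) - unambiguous:
            continue
        # Collapse identical sequences, keeping the first one seen.
        if seq in seen:
            continue
        seen.add(seq)
        kept.append((name, seq))
    return kept
```

Keeping the first occurrence of each duplicate is an arbitrary choice; the paper does not say which representative was retained.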

Phylogeny construction
All HA1 sequences were aligned at the codon level with ClustalX 2.0.10 (Larkin et al., 2007). The phylogenetic tree was constructed with the maximum likelihood (ML) method in MEGA 5.0.1 (Tamura et al., 2011). The substitution model was determined as GTR+I+Γ4 (the general time-reversible model with a proportion of invariant sites and a gamma distribution of among-site rate variation with four categories) by jModelTest 0.1.1 (Posada et al., 2008). The robustness of the tree topology was evaluated by bootstrap resampling with 100 replicates.

Informational spectrum method
The informational spectrum method (ISM), a digital signal processing approach, is widely used to identify the structural and functional characteristics of proteins (Veljkovic et al., 2008). First, the amino acid sequence is translated into a numeric sequence according to the electron-ion interaction potential (EIIP) of each residue, which represents a unique biophysical property of each amino acid (Godzik, 2003). Next, the numeric sequence is decomposed by discrete Fourier transform into a series of periodic functions. Finally, the informational spectrum (IS) of the sequence is computed as the energy density spectrum:

$$S(n) = X(n)\,X^{*}(n) = |X(n)|^{2}, \qquad n = 1, 2, \ldots, N/2$$

where N is the sequence length, X(n) is the discrete Fourier transform coefficient and S(n) is the amplitude at frequency n/N. The maximum frequency in the IS is 0.5. The common frequency components of K amino acid sequences can be determined from their consensus informational spectrum (CIS):

$$C(n) = \prod_{i=1}^{K} S_i(n)$$

where S_i(n) is the amplitude at frequency n/N for sequence i, and C(n) is the corresponding amplitude in the CIS. Generally speaking, the peak frequency in the IS and CIS represents the primary biochemical property of the proteins. The significance of a peak can be measured by its signal-to-noise ratio (S/N), which is defined as the ratio between its amplitude and the mean amplitude of the whole spectrum. We implemented the ISM algorithm in R.
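The steps of the ISM can be illustrated with a short sketch. The authors' R implementation is not shown in the paper, so the following is a Python illustration written under stated assumptions: the EIIP values are those commonly tabulated in the ISM literature (they are not listed in this paper and should be checked against the original source), the CIS is taken as the pointwise product of the individual spectra, shorter sequences are zero-padded to a common length, and all function names are hypothetical.

```python
import cmath
from math import prod

# EIIP values (in Rydberg) as commonly tabulated in the ISM literature.
# Assumption: not taken from this paper; verify before real use.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

def informational_spectrum(seq, n_points=None):
    """Return S(n) = |X(n)|^2 for n = 1 .. n_points // 2."""
    signal = [EIIP[aa] for aa in seq.upper()]
    n_points = n_points or len(signal)
    # Assumption: shorter sequences are zero-padded to n_points.
    signal += [0.0] * (n_points - len(signal))
    spectrum = []
    for n in range(1, n_points // 2 + 1):
        # Direct discrete Fourier transform coefficient X(n).
        coeff = sum(x * cmath.exp(-2j * cmath.pi * n * m / n_points)
                    for m, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def consensus_spectrum(seqs):
    """Pointwise product of the spectra of all sequences (assumed CIS form)."""
    n_points = max(len(s) for s in seqs)
    spectra = [informational_spectrum(s, n_points) for s in seqs]
    return [prod(column) for column in zip(*spectra)]

def peak_frequency_and_snr(spectrum, n_points):
    """Return (peak frequency n/N, amplitude / mean amplitude of the spectrum)."""
    idx = max(range(len(spectrum)), key=spectrum.__getitem__)
    mean_amplitude = sum(spectrum) / len(spectrum)
    return (idx + 1) / n_points, spectrum[idx] / mean_amplitude
```

A direct O(N²) transform is used here only to keep the sketch dependency-free; for HA1-length sequences an FFT routine would give the same spectrum faster.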

Structure analysis
The crystal structure of the HA protein from the A/California/04/2009 H1N1 virus was retrieved from the RCSB PDB website (PDB ID: 3LZG). An HA1 monomer was extracted from the whole protein. The positions of individual amino acids were marked and displayed in the 3D structure with Jmol 12.2.

Phylogeny of HA1
We constructed the phylogenetic tree for the representative HA1 sequences of H1N1, including those collected in Zhejiang during 2009-2011 (Figure 1). The phylogenetic tree is consistent with previous reports, suggesting that the HA1 segments of the 2009 pandemic H1N1 viruses originated from the North American swine lineage. The 2009-2011 Zhejiang viruses are all grouped into the 2009 pandemic cluster, indicating that this subtype has undergone local transmission since it was imported in 2009.

Characteristics of informational spectrum
It has been demonstrated that the informational spectra (IS) of HA1 from different H1N1 subtypes have different peak frequencies, which could characterize their receptor recognition preferences. Following these studies, we constructed the consensus informational spectrum (CIS) for HA1 of the 2009-2011 Zhejiang viruses, with other H1N1 subtypes as controls (Figure 2). The peaks of the 2009 pandemic viruses are at the frequency F(0.086), while the peaks of swine and human seasonal viruses are at F(0.285) and F(0.058), respectively. The CIS of the 2010-2011 Zhejiang viruses is quite similar to that of the 2009 viruses, suggesting that HA1 did not undergo a significant switch in receptor recognition during local transmission.

Effects of polymorphisms
According to the ISM concept (see methods), mutations in HA1 that alter the amplitude of the dominant peak at F(0.086) could influence the binding affinity of the 2009 pandemic viruses. We inspected the IS of each 2009-2011 Zhejiang virus individually and compared its amplitude at F(0.086) with that of the consensus HA1 sequence (Figure 3A). We found sequences that strongly increase the amplitude as well as sequences that strongly decrease it (Figure 3B). To identify the mutations that contribute most to these changes, we calculated the change in amplitude caused by each single amino acid mutation, taking the consensus HA1 sequence as the control (Figure 3B). The most common mutation in the 2009 Zhejiang viruses is S128P, which markedly increases the amplitude at F(0.086) (7.7%). Two successive mutations that increase the amplitude, A73S (5.5%) and D222G (3.7%), may play important roles in the emergence of the lethal strain A/Zhejiang-Yiwu/11/2009. During 2010-2011, more mutations that increase or decrease the amplitude occurred, and many of them were found together in a single sequence, such as H19 and L131. However, the peak frequency remained the same.
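The per-mutation comparison against the consensus sequence can be sketched as follows. This Python example is illustrative only and rests on several assumptions: the change is expressed as a relative percentage of the consensus amplitude (the paper does not give the exact formula), the EIIP values are those commonly tabulated in the ISM literature, the function names are hypothetical, and `index` is a 0-based position in the sequence string, since the mapping from residue numbers such as 128 in S128P to string positions is not specified in the text.

```python
import cmath

# EIIP values as commonly tabulated in the ISM literature
# (assumption: not listed in this paper; verify before real use).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

def amplitude_at(seq, freq):
    """Amplitude |X(n)|^2 at the spectral point n nearest to freq * N."""
    signal = [EIIP[aa] for aa in seq.upper()]
    n_total = len(signal)
    n = round(freq * n_total)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * n * m / n_total)
                for m, x in enumerate(signal))
    return abs(coeff) ** 2

def mutation_effect(consensus, index, new_residue, freq=0.086):
    """Percent change in amplitude at `freq` caused by one substitution.

    `index` is a 0-based position in `consensus` (hypothetical convention).
    """
    mutant = consensus[:index] + new_residue + consensus[index + 1:]
    base = amplitude_at(consensus, freq)
    return 100.0 * (amplitude_at(mutant, freq) - base) / base
```

Under this convention a positive value means the substitution raises the dominant peak and a negative value means it lowers it.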

Polymorphic sites in 3D structure
We mapped the positions of the critical polymorphic sites onto the 3D structure of HA1 of the 2009 pandemic virus (see methods). Most of the sites are located in disordered regions, which are easily exposed on the protein surface (Figure 4). In general, the receptor binding site (RBS) of HA is at the membrane-distal end and is composed of three elements: the 190-helix (residues 184-191), the 220-loop (residues 218-225) and the 130-loop (residues 131-135). In our results, four polymorphic sites, K219I, D222G, G225R and A134T, are located in the RBS domain (Figure 4). Notably, the mutation D222G is found in the lethal strain A/Zhejiang-Yiwu/11/2009. Two mutations, K219I and A134T, which have opposite effects on the amplitude at F(0.086), occurred together in the 2010-2011 strain H19 (Figure 3B). The most common polymorphic site, S128P, is also located close to the RBS domain.
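The assignment of mutations to RBS elements follows directly from the residue ranges given above, and can be checked with a small sketch. The Python below is illustrative; the names `RBS_ELEMENTS` and `rbs_element` are hypothetical, and the mutation labels are assumed to use the same residue numbering as the ranges quoted in the text.

```python
import re

# Residue ranges of the three RBS elements as stated in the text
# (inclusive bounds, hence the +1 on each range end).
RBS_ELEMENTS = {
    "190-helix": range(184, 192),
    "220-loop": range(218, 226),
    "130-loop": range(131, 136),
}

def rbs_element(mutation):
    """Return the RBS element containing a mutation such as 'D222G', or None."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    position = int(match.group(2))
    for name, residues in RBS_ELEMENTS.items():
        if position in residues:
            return name
    return None

for label in ["K219I", "D222G", "G225R", "A134T", "S128P"]:
    print(label, rbs_element(label))
```

With these ranges, S128P falls just outside the 130-loop, which matches the statement that it lies close to, but not within, the RBS.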

DISCUSSION
The HA1 subunit is important for receptor recognition and host infection by influenza viruses. The phylogenetic tree of HA1 showed that the recent H1N1 viruses circulating in Zhejiang were all derived from the 2009 pandemic viruses in North America (Figure 1). Although the viruses have undergone local transmission since they were imported, the CIS of HA1 showed that their biochemical properties and receptor preferences have not undergone significant switches (Figure 2). This may be because the local population is not large enough for mutations that alter receptor preference to arise.
Nonetheless, the many polymorphisms that arose in HA1 during local transmission may still modify receptor binding affinity (Figure 3). This suggests that a change in receptor preference may require a long period of accumulation of mutations with no apparent effect during transmission of the influenza virus. For example, the most common polymorphism, S128P, whose position is close to the RBS domain (Figure 4), can greatly increase the dominant peak in the IS. D222G, which lies in the RBS domain, together with A73S, may be critical for the emergence of the lethal strain A/Zhejiang-Yiwu/11/2009 (Figures 3 and 4). In the viruses collected in 2010-2011, more mutations occurred, and some of them, such as K219I and A134T, are located in the RBS domain (Figures 3 and 4). These mutations could increase or decrease the amplitude of the dominant peak in the IS, but the peak frequency has not shifted. This result may explain why there were far fewer infections during 2010-2011.

Figure 1. Maximum likelihood phylogenetic tree for representative HA1 nucleotide sequences. The 2009 Zhejiang viruses are colored in red and the 2010-2011 Zhejiang viruses are colored in green. The numbers near the nodes indicate bootstrap values.

Figure 2. CIS of HA1 sequences from Figure 1. Each H1N1 subtype group contains five HA1 sequences. The peak frequency of American swine, Eurasian swine and American avian viruses is F(0.285). The peak frequency of human seasonal viruses is F(0.058). The peak frequency of 2009 pandemic viruses is F(0.086).

Figure 3. (A) IS of HA1 sequences that can increase or decrease the amplitude at F(0.086). (B) Maximum likelihood phylogenetic tree for HA1 of Zhejiang viruses. The sequences that can significantly increase and decrease the amplitude at F(0.086) (>5%) are colored in red and green, respectively (Supplementary Table 1 for full results). Individual mutations that can significantly increase and decrease the amplitude (>3%) are marked on the branches in red and green, respectively (Supplementary Table 2 for full results).

Figure 4. Positions of critical mutations in the 3D structure of HA1.Four polymorphic sites, K219I, D222G, G225R and A134T are located in the receptor binding site (RBS) domain.

Table 1. Amplitude of HA1 sequences at the frequency F(0.086) in IS

Table 2. Amplitude at the frequency F(0.086) for each polymorphic site