Genome codon bias analysis of dengue virus type 1

Dengue viruses (DENV) are the most common mosquito-borne RNA viruses, with high variation and adaptability in tropical and subtropical regions. Exploring the codon usage bias of DENV can help in understanding their genetic variation and adaptation. In this study, the codon usage pattern of dengue virus type 1 (DENV-1) was analyzed using codonW, CUSP, and SPSS. Codon preference in DENV-1 is weak, with a mean effective number of codons (ENC) of 50.57, indicating that the DENV-1 genome has low codon bias. Of the 18 optimal codons of DENV-1, 13 end in A/U, most of them in A. DENV-1 therefore prefers A-ended codons, and ENC-plot and neutrality analyses revealed that its codon bias is influenced more by natural selection than by mutation pressure. Furthermore, comparison of codon usage bias between DENV-1 and its hosts showed that the codon usage pattern of DENV-1 is more similar to that of Homo sapiens than to those of Aedes aegypti or A. albopictus. Our findings contribute to the understanding of the evolution of DENV-1.


INTRODUCTION
Dengue virus (DENV) is a single-stranded, positive-sense RNA virus belonging to the genus Flavivirus. DENV is divided into four serotypes (DENV-1, 2, 3, and 4), which cause serious diseases in tropical and subtropical regions, such as dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS) (Halstead, 2007). Dengue is the most important mosquito-borne viral disease in clinical practice, with 96 million apparent infections each year and nearly four billion people at risk in 128 countries (Bhatt et al., 2013). Since the first outbreak of dengue fever in China in 1978, outbreaks have occurred every few years, and dengue has become a serious public health threat in southern China (Sun et al., 2014; Hu et al., 2017). However, the factors underlying the current spread of the virus and its variation and adaptation remain largely unknown (Bhatt et al., 2013).
Most amino acids are encoded by more than one synonymous codon, and synonymous codons are not used with equal frequency; this unequal usage is known as codon usage bias (Gustafsson et al., 2004). Variation in codon usage bias reflects shifts in the balance between mutation and natural selection (Morton, 2003). In addition, mutation pressure, natural selection, replication, and selective transcription can influence codon usage patterns (Butt et al.; Gu et al., 2004). Analysis of codon usage bias can also reveal important information about molecular evolution and the regulation of gene expression, and can inform vaccine design (Butt et al., 2014). In the present study, we analyzed the codon usage bias of DENV-1 and the factors that influence it. We hope this comprehensive analysis of codon usage bias in DENV-1 will help in understanding its evolution and provide data to support future vaccine research and surveillance of DENV-1.

MATERIALS AND METHODS
Sequence
The complete genome of DENV-1 (accession NC_001477.1) was retrieved from the GenBank database at the National Center for Biotechnology Information (NCBI).
After the anchor and precursor sequences were removed, the complete open reading frame (10178 bp) and 14 gene sequences were obtained, including those encoding the capsid protein, membrane glycoprotein precursor, membrane glycoprotein, envelope protein, NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5. These sequences were used for the codon usage analysis.

Relative synonymous codon usage (RSCU) analysis
The RSCU value of a codon is the ratio of its observed frequency to the frequency expected if all synonymous codons for that amino acid were used equally (Sharp and Li, 1987). An RSCU above 1 indicates that a codon is used more often than expected, and an RSCU below 1 that it is used less often. CodonW was used to calculate the RSCU values of the sequences, and the optimal codon for each amino acid was defined from these values.
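For readers who want to reproduce the calculation outside CodonW, the sketch below computes $\mathrm{RSCU}_{ij} = X_{ij} / \left(\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}\right)$, where $X_{ij}$ is the count of codon $j$ for amino acid $i$ and $n_i$ is the number of synonymous codons for that amino acid. It is a minimal illustration assuming the standard genetic code, not the CodonW implementation; the function names are hypothetical, and taking the codon with the highest RSCU as "optimal" is an assumption about the RSCU-based criterion.

```python
from collections import Counter

# Standard genetic code (RNA alphabet), built from the conventional UCAG ordering.
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AAS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_counts(seq):
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def rscu(seq):
    """RSCU = observed count / mean count of the codons in its synonymous family."""
    counts = codon_counts(seq)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        if aa in "*MW":  # stops and single-codon amino acids are excluded
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def optimal_codons(values):
    """Hypothetical criterion: the codon with the highest RSCU per amino acid."""
    best = {}
    for codon, v in values.items():
        aa = CODON_TABLE[codon]
        if aa not in best or v > values[best[aa]]:
            best[aa] = codon
    return best
```

For example, in the toy sequence GCAGCAGCCGCU, GCA occurs twice among four alanine codons, so its RSCU is 2.0 and it is selected as the optimal alanine codon.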

General analysis of genomic codon preference and its base composition
The G+C content at the first, second, and third codon positions (GC1, GC2, GC3) and the effective number of codons (ENC) of each gene in the DENV-1 genome were calculated with codonW and CUSP (EMBOSS), and the relationships among these parameters were analyzed with SPSS 22.0 (https://www.ibm.com/analytics/spss-statistics-software).
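The positional GC values can be illustrated with a short sketch. It assumes an in-frame coding sequence and the standard genetic code, skips stop codons, and follows the usual convention of computing GC3s only over codons of amino acids that have synonymous alternatives (excluding Met and Trp). The function name is hypothetical, and this is not the codonW or CUSP code.

```python
# Standard genetic code (RNA alphabet), built from the conventional UCAG ordering.
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AAS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def positional_gc(seq):
    """GC1, GC2, GC3, GC12 and GC3s (as fractions) of one in-frame coding sequence."""
    seq = seq.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Keep only valid sense codons (drops stop codons and ambiguous codons).
    sense = [c for c in codons if CODON_TABLE.get(c, "*") != "*"]
    gc1, gc2, gc3 = (sum(c[p] in "GC" for c in sense) / len(sense) for p in range(3))
    # GC3s: third-position GC over codons of amino acids with synonymous codons.
    synonymous = [c for c in sense if CODON_TABLE[c] not in "MW"]
    gc3s = sum(c[2] in "GC" for c in synonymous) / len(synonymous)
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2, "GC3s": gc3s}
```

GC12, the mean of GC1 and GC2, is returned as well because it is the quantity plotted against GC3 in a neutrality plot.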

ENC-plot analysis
The ENC-plot (ENC vs GC3s) is widely used to determine the effect of G+C compositional constraints on codon usage bias (Wright, 1990). CodonW was used to calculate GC3s and ENC, and the standard curve $\mathrm{ENC}_{\mathrm{exp}} = 2 + \mathrm{GC3s} + \frac{29}{\mathrm{GC3s}^2 + (1 - \mathrm{GC3s})^2}$ was added to the plot; this curve gives the ENC expected when codon usage is determined solely by base composition. Genes that fall on or near the curve have codon usage shaped mainly by mutation pressure, whereas genes that fall well below it are more likely to be influenced by natural selection (Morton, 2003).
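The Wright (1990) expected-ENC curve described above is straightforward to evaluate. The sketch below computes it, generates points for plotting, and, as an optional extra that was not part of the original analysis, the commonly used ratio $(\mathrm{ENC}_{\mathrm{exp}} - \mathrm{ENC}_{\mathrm{obs}})/\mathrm{ENC}_{\mathrm{exp}}$, which is positive for genes lying below the curve. The function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage depends only on GC3s (Wright, 1990).

    gc3s is a fraction between 0 and 1.
    """
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(enc_observed, gc3s):
    """Relative distance from the expected curve; positive means below the curve."""
    expected = expected_enc(gc3s)
    return (expected - enc_observed) / expected


# Points of the standard curve (GC3s from 0 to 1 in steps of 0.01) for plotting.
curve = [(s / 100, expected_enc(s / 100)) for s in range(101)]
```

The curve peaks at GC3s = 0.5, where it reaches 60.5, so a gene with balanced third-position composition but an observed ENC near 50 sits clearly below it.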

Neutrality-plot analysis
A neutrality plot of GC12 (the mean of GC1 and GC2) against GC3 was drawn to evaluate the balance between mutation and selection in shaping codon usage. The slope of the regression line of GC12 on GC3 indicates the effect of mutation pressure. If the two are significantly correlated and the slope is close to 1, the base composition at the first two codon positions does not differ significantly from that at the third position, and mutation is the main factor affecting codon usage. Conversely, a slope close to 0 indicates that the composition of the first two positions differs from that of the third, so natural selection is the main factor affecting codon usage (Sueoka, 1988; Zhao et al., 2016).
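The regression behind a neutrality plot can be sketched with ordinary least squares in plain Python. The inputs are assumed to be per-gene GC3 and GC12 fractions; the function name is hypothetical, and in the original analysis the regression was carried out in SPSS.

```python
def neutrality_regression(gc3_values, gc12_values):
    """Least-squares fit of GC12 on GC3; returns (slope, intercept, r_squared)."""
    n = len(gc3_values)
    mean_x = sum(gc3_values) / n
    mean_y = sum(gc12_values) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3_values)
    syy = sum((y - mean_y) ** 2 for y in gc12_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3_values, gc12_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

A slope near 1 with a high R² would point to mutation pressure, whereas a slope near 0 with a low R² points to selection acting differently on the first two positions than on the third.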

Comparison analysis
The RSCU values of DENV-1 were compared with those of its hosts, human (Homo sapiens) and mosquitoes (A. aegypti and A. albopictus). Codon usage data for the hosts were retrieved from the Codon Usage Database (http://www.kazusa.or.jp/codon). For each codon, if the RSCU values of DENV-1 and the host were both <0.6, both >1.6, or both between 0.6 and 1.6, the codon usage of the two was judged to be similar (Wong et al., 2010; Ma et al., 2015).
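The similarity rule amounts to a three-bin classification of RSCU values. In this sketch, values of exactly 0.6 or 1.6 are assigned to the middle bin, which is an assumption because the text only says "between 0.6 and 1.6"; the function names and example RSCU values are made up for illustration and are not data from DENV-1 or its hosts.

```python
def rscu_class(value):
    """Bin an RSCU value; 0.6 and 1.6 themselves fall in the middle bin (assumption)."""
    if value < 0.6:
        return "low"
    if value > 1.6:
        return "high"
    return "middle"


def similar_codons(virus_rscu, host_rscu):
    """Codons whose RSCU values fall in the same bin for the virus and the host."""
    return sorted(
        codon for codon, value in virus_rscu.items()
        if codon in host_rscu and rscu_class(value) == rscu_class(host_rscu[codon])
    )


# Made-up example values, not real DENV-1 or host data.
virus = {"GCA": 1.8, "GCC": 0.5, "GCG": 1.0}
host = {"GCA": 1.7, "GCC": 1.2, "GCG": 0.7}
print(similar_codons(virus, host))  # → ['GCA', 'GCG']
```

Applied to all 59 synonymous codons, the length of the returned list gives the number of similar codons for a given host.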

RESULTS
RSCU of each gene of DENV-1
The RSCU values of DENV-1 were calculated with codonW. Twenty-three codons had RSCU > 1: GCC, GCA, AGA, AGG, AAC, GAC, UGU, CAA, GAA, GGA, CAC, AUA, UUG, CUA, CUG, AAA, UUC, CCA, UCC, UCA, ACA, UAU, and GUG. Of these, 13 (56.5%) end in A or U, but only 2 end in U, indicating that U-ended codons are rarely preferred. The optimal codon for each amino acid is marked with an asterisk (*), and most of the optimal codons end in A (Table 1 and Figure 1).
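The ending-base counts for the 23 codons with RSCU > 1 can be checked directly from the list. This short script only re-counts the codons given in the text; it does not recompute any RSCU values.

```python
from collections import Counter

# The 23 codons with RSCU > 1 reported for DENV-1.
PREFERRED = ("GCC GCA AGA AGG AAC GAC UGU CAA GAA GGA CAC AUA "
             "UUG CUA CUG AAA UUC CCA UCC UCA ACA UAU GUG").split()

# Tally the third (last) base of each codon.
endings = Counter(codon[2] for codon in PREFERRED)
au_share = (endings["A"] + endings["U"]) / len(PREFERRED)
print(len(PREFERRED), dict(endings), f"{au_share:.1%}")
# → 23 {'C': 6, 'A': 11, 'G': 4, 'U': 2} 56.5%
```

The tally shows that 11 of the 13 A/U-ended codons end in A, which is the basis for describing DENV-1 as preferring A-ended codons.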

General analysis of genomic codon preference and its base composition
To determine whether codon bias exists in the DENV-1 genome, the effective number of codons (ENC) was measured. ENC is a simple and relatively direct measure of codon usage bias (Novembre, 2002); it ranges from 20, when a single codon is used for each amino acid, to 61, when all synonymous codons are used equally. The ENC values of the DENV-1 genes ranged from 43.33 to 57.21, with a mean of 50.57 (Table 2), indicating that codon preference in DENV-1 is weak, that is, synonymous codons are used relatively evenly. GC1, GC2, and GC3 of DENV-1 differ only slightly, and the GC content of the genome is slightly lower than the AU content (Table 2). This is broadly consistent with the conclusion drawn from the RSCU analysis.
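ENC itself can be sketched from Wright's (1990) definition: for each amino acid with $k$ synonymous codons observed $n$ times, the homozygosity is $F = (n\sum_i p_i^2 - 1)/(n - 1)$; F values are averaged within each degeneracy class of the standard code (nine 2-fold, one 3-fold, five 4-fold, and three 6-fold amino acids), and $\mathrm{ENC} = 2 + 9/\bar F_2 + 1/\bar F_3 + 5/\bar F_4 + 3/\bar F_6$. This is an illustrative sketch, not the codonW or CUSP implementation. The function name is hypothetical, the sketch assumes a reasonably long coding sequence (short sequences can leave F undefined), and it caps the result at 61, the maximum for the standard code.

```python
from collections import Counter

# Standard genetic code (RNA alphabet), built from the conventional UCAG ordering.
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AAS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def enc(seq):
    """Effective number of codons (Wright, 1990) for one in-frame coding sequence."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group sense codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    # Homozygosity F for every amino acid that has synonymous codons.
    f_values = {}
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f_values.setdefault(k, []).append((n * sum_p2 - 1) / (n - 1))

    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    # If isoleucine (the only 3-fold family) is absent, average the 2- and 4-fold F.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(mean_f.get(k, 0) <= 0 for k in CLASS_SIZES):
        raise ValueError("sequence too short to estimate ENC")

    value = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(value, 61.0)
```

As sanity checks, a sequence that always uses a single codon per amino acid gives the minimum ENC of 20, and one that uses every sense codon equally gives the capped maximum of 61.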

ENC-plot
The ENC values of the DENV-1 genes were plotted against their GC3s values (Figure 2). The ENC values of the genes lie between 43 and 51, indicating that codon preference does not differ markedly among genes. Genes whose ENC values fall below the standard curve indicate that natural selection plays an important role in driving codon usage bias (Fuglsang, 2008). Almost all genes fell below the curve, indicating that codon usage in DENV-1 is constrained by mutation but more strongly affected by natural selection.

Neutrality-plot
A neutrality plot (GC12 vs GC3) was used to estimate the relationship between GC12 and GC3. The regression coefficient was -0.13 with R² = 0.015, indicating that GC12 and GC3 are not correlated. This agrees with the correlation analysis of the codon usage parameters of the viral genome (Table 3) and indicates that natural selection has a greater influence than mutation on codon preference in DENV-1 (Kumar et al., 2016).

Comparison analysis
To determine whether the codon usage pattern of DENV-1 is influenced by its hosts, it was compared with those of its natural hosts, Homo sapiens, A. aegypti, and A. albopictus. Of the 59 synonymous codons, 46 were similar between DENV-1 and humans, whereas only 38 and 28 were similar between DENV-1 and A. aegypti and A. albopictus, respectively (Table 4). Thus, the codon usage pattern of DENV-1 is more similar to that of Homo sapiens.

DISCUSSION
In the present study, we found that DENV-1 has weak codon bias, with an average ENC value of 54.58. This indicates that the overall degree of codon usage bias in DENV-1 is low and that the bias differs little among genes, consistent with previous reports (Jenkins and Holmes, 2003; Yohan et al., 2018).
Analysis of the ENC-GC3s plot indicated that the DENV-1 genes are more affected by natural selection, which is consistent with the codon preference of Flaviviridae viruses (Yao et al., 2019). The neutrality analysis supported the ENC-plot results and further suggested that natural selection has a greater influence than mutation pressure on codon usage in DENV-1. Although we found no correlation between GC12 and GC3 in DENV-1, a related study reported such a correlation in all DENV serotypes and proposed that usage at all codon positions is related to the geographic origin of the strain (Lara-Ramírez et al., 2014); that study found differences in codon usage between DENV-1 strains from America and Asia. The A/U content of DENV-1 is higher than its G/C content, and the RSCU analysis indicates that DENV-1 prefers A/U-ended codons, especially A-ended codons (Roy et al., 2019). This is similar to findings on codon preference in Flaviviridae viruses (Yao et al., 2019) and in other RNA viruses such as Ebola virus (Cristina et al., 2015; Kustin and Stern, 2020).
The comparison analysis suggests that the codon usage pattern of DENV-1 is more similar to that of Homo sapiens than to those of A. aegypti and A. albopictus. DENVs are transmitted to humans by mosquitoes, and the difference in codon usage bias between DENV-1 and its hosts might be caused by the different defense mechanisms that different hosts use against DENV-1 infection (Sexton and Ebel, 2019). In addition, related studies indicate that there is little correlation between mosquito vector indices and human epidemics during DENV transmission (Bowman et al., 2014; Chadee, 2009).
Human genes have been reported to be biased toward AT-ending codons (Alvarez-Valin et al., 2002), and DENV-1 shows a similar codon usage pattern, which may be related to the mechanism by which DENV infects humans. Using the RSCU values of all codons in the DENV-1 genome as the screening criterion, we identified the optimal codon for each amino acid. Identifying optimal codons may aid the expression of viral proteins and the development of DENV vaccines, and provides a theoretical basis for understanding how the virus selects its hosts (Kames et al., 2020). Because natural selection has a greater influence on codon preference in DENV-1, codon usage bias of DENV may differ among regions. Future studies should therefore analyze strains from different regions, such as America, Asia, and Africa, to help limit the spread of DENV in each region.

Conclusion
In summary, the combined ENC-plot and neutrality analyses indicate that natural selection has a greater influence than mutation pressure on the codon usage bias of DENV-1. Previous work suggests that the geographic origin of dengue viruses strongly influences the formation of their codon usage patterns (Lara-Ramírez et al., 2014). In addition, the preference of DENV-1 for A-ended codons may be useful for future research on DENV.