Identification of recombinants of hepatitis B virus genotype C from Hong Kong, China

Hepatitis B virus (HBV) infection severely threatens human health. Frequent recombination of HBV within or between genotypes has been reported to favor viral evolution and adaptation, and recombination may cause more severe clinical symptoms. In this study, we collected genotype C HBV sequences from five Asian countries and detected possible recombination events by bioinformatic analysis. Two main subtypes, C1 and C2, were found within genotype C among the collected data. Subtype C1 is most prevalent in Cambodia, Bangladesh, and China, while C2 is prevalent in Japan, Indonesia, and a small part of China. Three recombination events were detected and verified in C1 HBV isolates from Hong Kong, China, as demonstrated by the recombinant KJ410515, whose recombinant region ranges from nt 2381 to 2861. It is important to filter out possible recombinants when using online GenBank data for phylogenetic analysis.


INTRODUCTION
Hepatitis B virus (HBV) infection severely threatens human health. One-third of the world's population has been infected with HBV, and 350 million people suffer from chronic HBV infection (Lee, 1997). Thirty percent of infected persons die from HBV-related liver disease. HBV belongs to the genus Orthohepadnavirus; the HBV genome is a partially double-stranded circular DNA molecule of 3.2 kb that encodes four overlapping open reading frames (ORFs): the preC/C, preS1/preS2/S, P, and X genes (Westover and Hughes, 2007; Zhang et al., 2010). Because the viral polymerase lacks proofreading activity during the reverse transcription step of genome replication, HBV genetic variability is high and leads to differences in nucleotide sequence (Arauz-Ruiz et al., 2002; Norder et al., 1994; Okamoto et al., 1988; Stuyver et al., 2000). Eight genotypes (A-H) have therefore been established based on an intergroup divergence of 7.5% in the entire nucleotide sequence (Arauz-Ruiz et al., 2002; Norder et al., 1994; Okamoto et al., 1988; Stuyver et al., 2000). Later, two further genotypes (I and J) were proposed for HBV strains found in Vietnam and Laos (Tran et al., 2008; Olinger et al., 2008).
Gene recombination is an important mechanism of virus evolution and challenges the design of vaccines and antiviral treatment strategies (Yang et al., 2006). Life-long persistent infection and the prevalence of different genotypes in the same area raise the probability that different genotypes co-infect one host, and thus the risk of viral recombination. In Northwestern China, recombinants between HBV genotypes have been reported frequently in the past decade (Wang et al., 2005; Zhou et al., 2011). Subgroup Ba of genotype B, which is a recombinant with genotype C, is found primarily in Southeast Asian countries and seems to be more pernicious than subgroup Bj (Huy et al., 2004). An HBV F3/A1 recombinant strain has been found in the Afro-Colombian population (Alvarado-Mora et al., 2012). Evidence of C/D recombination has been reported in Western China (Zhou et al., 2012), and a new subtype D9, a recombinant of genotypes C and D, has even been detected (Ghosh et al., 2013). However, little research has addressed recombination between different subtypes of the same genotype in Asia, so the goal of this study is to investigate recombination within HBV genotype C in five Asian countries (China, Cambodia, Indonesia, Japan, and Bangladesh).
In this study, we collected 131 complete genomes of HBV genotype C from five Asian countries and constructed their phylogenetic tree. Recombination events were then detected and confirmed. Here, we report the recombination events detected in our dataset.

Sequence download
All HBV complete genomes available before 6 November 2020 were downloaded from the NCBI (https://www.ncbi.nlm.nih.gov/) Nucleotide database. "HBV genotype C complete genome", "HBV C", and "China" were used as search terms, and the search results were filtered by sequence length (2800 to 3500 nt) and by genotype C. In total, 131 complete genome sequences were retrieved from the search records. After these sequences were aligned and analyzed, only 12 full-length sequences remained as suspected recombinant genomes. All complete genomes can be obtained by searching for their accession numbers in GenBank.
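The length filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used (the filtering was done through the NCBI search interface): the function names `parse_fasta` and `filter_by_length` are hypothetical, and the input is assumed to be FASTA text whose header starts with the accession number.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {accession: sequence}."""
    records: Dict[str, str] = {}
    accession = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first token of the header is the accession number.
            accession = line[1:].split()[0]
            records[accession] = ""
        elif accession is not None:
            records[accession] += line.upper()
    return records


def filter_by_length(records: Dict[str, str],
                     min_len: int = 2800,
                     max_len: int = 3500) -> Dict[str, str]:
    """Keep only sequences whose length lies in [min_len, max_len] nt."""
    return {acc: seq for acc, seq in records.items()
            if min_len <= len(seq) <= max_len}
```

The default bounds mirror the 2800 to 3500 nt window stated in the text; genotype filtering would still have to be done separately, for example from the record annotation.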

Sequence alignment and recombinants identification
The original datasets were aligned with MUSCLE as implemented in Molecular Evolutionary Genetics Analysis (MEGA) version X, and phylogenetic analyses were conducted in MEGA (Radjef et al., 2004). To find potential recombination signals in the isolates, recombination analysis was performed using the RDP4 program (Wang et al., 2015). Using seven algorithms (RDP, GENECONV, MaxChi, Chimaera, 3Seq, SiScan, and BootScan), 12 recombination events were detected among the 131 isolates (Table 1). Of the 12 potential recombinant isolates, three had remarkably high certainty, with a P-value <1 × 10⁻⁶ in at least three algorithms, and two had a high degree of certainty, with a recombinant score > 0.6 (Table 1) (Martin et al., 2010). In addition, one sequence had a fair probability of recombination, with a recombinant score between 0.4 and 0.6 (Wang et al., 2005). Similarity plot and bootscanning analyses were performed with Simplot software version 3.5.1. In the end, only three sequences were confirmed as recombinants. A phylogenetic tree was constructed with the neighbor-joining method in MEGA X to show the relationship between the relevant fragments of each recombinant isolate and its major and minor parents (Pérez et al., 2014).
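The three certainty tiers above can be written as a small decision rule. The sketch below is only an illustration under stated assumptions: `classify_event` and the tier labels are hypothetical names, the order in which the criteria are checked is our assumption (the text does not say which takes precedence), and the input is assumed to be the per-method P-values and recombinant score copied from RDP4 output.

```python
from typing import Dict, Optional


def classify_event(p_values: Dict[str, Optional[float]],
                   recombinant_score: float,
                   p_cutoff: float = 1e-6,
                   min_methods: int = 3) -> str:
    """Assign a certainty tier to one putative recombination event.

    p_values maps a method name (e.g. "RDP", "GENECONV") to its P-value,
    or None when that method did not detect the event.
    """
    # Count the methods whose P-value is below the cutoff.
    supported = sum(1 for p in p_values.values()
                    if p is not None and p < p_cutoff)
    if supported >= min_methods:
        return "very high certainty"
    if recombinant_score > 0.6:
        return "high certainty"
    if 0.4 <= recombinant_score <= 0.6:
        return "fair probability"
    return "unsupported"
```

For example, an event with P-values below 1 × 10⁻⁶ in RDP, MaxChi, and 3Seq falls into the first tier regardless of its recombinant score.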

RESULTS
A total of 187 original full-length sequences from five Asian countries (China, Cambodia, Indonesia, Japan, and Bangladesh) were retrieved, and 131 valid sequences were finally included after removing those without a subtype classification. After collating the data, we found that the C1 and C2 subtypes are more common than other subtypes. Within genotype C, subtype C1 is prevalent in Cambodia and most of China, whereas subtype C2 is prevalent in Japan and a small part of China.
Preliminary multiple sequence alignments were run in MEGA X, and recombination analysis of the full-length sequences was performed with the RDP4 program. Twelve potential recombination events were detected among the 131 isolates. Of the 12 potential recombinant sequences, three had a prominently high degree of certainty, with a P-value <1 × 10⁻⁶ in at least three algorithms, and two had a high degree of certainty, with a recombinant score > 0.6 (Table 1). One further sequence had a considerable probability of recombination, with a recombinant score between 0.4 and 0.6.
The potential recombinant isolate KJ410515 was analyzed using the Simplot 3.5.1 program with a 200-nt window moving in 20-nt steps. In the similarity plot, the horizontal axis represents the nucleotide position of the midpoint of the window from the 5' end of the query sequence (nt 2381 in HBV-KJ410515) (Figure 1), and the vertical axis represents the similarity between the analyzed sequence and the reference sequence. KJ410515 showed the highest similarity (more than 90%) with isolate KJ410505 from the beginning of the genome to about nucleotide position 2381. At nucleotide positions 2381-2861, however, its similarity was comparatively higher with strain FJ562308. The analysis therefore indicated that the recombination breakpoints of KJ410515 are probably located at nucleotide positions 2381 and 2861, and that the region between them is most closely related to FJ562308. Bootscanning analysis confirmed the recombination of strain KJ410515. The recombinant segment identified by Simplot is consistent with the RDP4 result. The recombinant regions of KJ410508 and KJ410512 were also detected (Table 1).
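The sliding-window idea behind the similarity plot can be sketched as follows. This is a simplified stand-in, not Simplot itself: it computes plain percent identity per window, whereas Simplot applies its own distance model, and the function name `window_similarity` is hypothetical. The query and reference are assumed to be rows of the same alignment, so they have equal length.

```python
from typing import List, Tuple


def window_similarity(query: str, reference: str,
                      window: int = 200, step: int = 20
                      ) -> List[Tuple[int, float]]:
    """Return (window midpoint, percent identity) along two aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    points: List[Tuple[int, float]] = []
    for start in range(0, len(query) - window + 1, step):
        compared = 0
        matches = 0
        for a, b in zip(query[start:start + window],
                        reference[start:start + window]):
            if a == "-" or b == "-":
                continue  # skip alignment gaps
            compared += 1
            if a == b:
                matches += 1
        if compared:
            points.append((start + window // 2, 100.0 * matches / compared))
    return points
```

Plotting the returned points for the query against each candidate parent gives curves that cross near the breakpoints, which is the pattern described above for KJ410515 between KJ410505 and FJ562308.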
To further analyze the recombination of KJ410515, a phylogenetic incongruence analysis was performed with its major and minor parents, using the neighbor-joining method in MEGA X with 1000 bootstrap replicates. The genome sequences were divided into two alignments, and an independent phylogenetic tree was constructed for each data set (Figure 2). The phylogenetic trees constructed with the corresponding fragments of KJ410505 and FJ562308 confirmed the chimeric pattern found in the genome of strain KJ410515. Most of the recombinant genome was derived from a KJ410505-like strain, and the middle part was derived from an FJ562308-like strain (Figure 2). Phylogenetic analysis of these sequences confirmed their relatedness, as they all have high bootstrap values (BV). As shown in Figure 2A, KJ410515 and its major parent KJ410505 clustered together for the fragments 0-2381 and 2861-3220, whereas for the fragment 2381-2861, KJ410515 and FJ562308 clustered together (Figure 2B). This shows a close relationship of KJ410515 with both KJ410505 and FJ562308, which supports the plausibility of recombination. Basic information on KJ410515 and its parents is given in Table 2. The collection dates and countries further support the plausibility of the recombination of KJ410515.
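The logic of dividing the alignment at the breakpoints and comparing each part separately can be illustrated with a simple distance check. This sketch is not a substitute for the neighbor-joining trees: it only reports which candidate parent has the smallest p-distance to the query in the flanking and inner regions. All function names are hypothetical, and coordinates are 0-based, half-open Python slices, which is our assumption rather than the numbering used in the figures.

```python
from typing import Dict, Tuple


def split_at_breakpoints(seq: str, start: int, end: int) -> Tuple[str, str]:
    """Return (flanking, inner) parts of one aligned sequence."""
    inner = seq[start:end]
    flanking = seq[:start] + seq[end:]
    return flanking, inner


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping alignment gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_by_region(query: str, candidates: Dict[str, str],
                      start: int, end: int) -> Dict[str, str]:
    """Name the closest candidate for the flanking and the inner region."""
    query_parts = split_at_breakpoints(query, start, end)
    result: Dict[str, str] = {}
    for idx, label in enumerate(("flanking", "inner")):
        result[label] = min(
            candidates,
            key=lambda name: p_distance(
                query_parts[idx],
                split_at_breakpoints(candidates[name], start, end)[idx]),
        )
    return result
```

A recombinant is expected to return different names for the two regions, mirroring the incongruent clustering seen in the two trees.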

DISCUSSION
HBV infection is endemic in many Asian countries, especially China, and HBV shows considerable genetic diversity. The long-term prevalence of HBV infection favors frequent genetic mutation and recombination within or between genotypes or subgenotypes. The rich HBV data available online encouraged us to investigate possible recombination events that were originally overlooked by the submitters. In this study, we collected 131 full-length HBV genotype C sequences from five Asian countries and, using the RDP4 recombination detection software, found three previously undescribed recombinants (KJ410515, KJ410508, and KJ410512) among isolates from Hong Kong, China. The recombination regions span or lie within the Core or P gene region, a common region for reported variation and recombination (Kay and Zoulim, 2007; Araujo, 2015). The different positions of the nonrecombinant and recombinant regions on the phylogenetic trees confirmed the recombination events, as shown for KJ410515. These recombination events may have functional roles: the recombination region of one recombinant strain, KJ410512, is highly similar to the preC-region B/C recombination of the Ba subtype. This type of recombinant strain remains widespread, and Ba subtype HBV can cause more serious clinical disease (Sugauchi et al., 2002). The significance of HBV recombination therefore deserves investigation.
The recombination region of KJ410515 is located in the P gene, which encodes the viral polymerase. The HBV Pol protein is a focus of both basic and translational research. The viral polymerase is the target of all current HBV drug therapies (except interferon), and it is the only region that is usually sequenced when treatment escape occurs (Rhee et al., 2010; Buhlig et al., 2020).
Genetic variation resulting from recombination can allow immune escape and treatment resistance. Recombination between different genotypes is prevalent, for example an intergenotypic B/I recombinant, B/C recombinants, and the new subtype C17 (Feng et al., 2020), but there are few studies on recombination between different subtypes.
Genetic recombination is an important and common mechanism of virus evolution, as a number of studies have shown. C/D inter-genotype recombinants of type I (breakpoints at nt 50 and 1450) and type II (breakpoints at nt 10 and 799) have been reported from the Qinghai-Tibet Plateau (Wang et al., 2005). Moreover, in 2011, the prevalence of C/D recombinants in chronic hepatitis B patients in Northwest China was reported to be high (Zhou et al., 2011). B/C intergenotype recombination has also been reported in Thailand, Vietnam, Indonesia, and South China (Luo et al., 2004). The emergence of recombinants may change the epidemiology of the virus. A few HBV/E strains have been identified as minority genotypes further east in Mozambique and Madagascar and further north in Tunisia (Kramvis et al., 2005). However, phylogenetic analysis showed that genotype E was dominant in Sudan (51%), followed by genotype D (41.5%). Genetic recombination probably produces new subtypes. An HBV/D-E recombinant that has been identified and has spread in Nigeria is considered a new subtype, D8 (Abdou Chekaraou et al., 2010). In short, genetic recombination, as an important route of virus evolution, may change the clinical characteristics and epidemiology of the virus, and identifying recombination is only the first step toward detecting the emergence and prevalence of new subtypes. Only when dangerous recombinants are found as early as possible can viral infection be better controlled. Therefore, the identification of recombination is of great significance.
Figure 2. Phylogenetic trees of HBV genotype C isolates. (A) Phylogenetic tree based on the fragments 0-2381 and 2861-3220 (major parent). (B) Phylogenetic tree based on the fragment 2381-2862 (minor parent). Trees were constructed with the MEGA software package (v5.0) using the neighbor-joining method and maximum likelihood with 1000 bootstrap replicates. The bootstrap value is indicated at each node. The symbols •, ▲ and ■ before the isolate name represent the recombinant isolate, major parent and minor parent, respectively.
In the present dataset, 12 potential recombinant sequences were detected among 131 isolates, indicating that recombinants are likely to be present among submitted sequences. It is therefore important to screen for recombination when using NCBI data to study the evolution of HBV. One limitation of our research is that the reliability of the detected recombination could be improved by applying more detection methods; in addition, the origin and evolution of the specific recombinants remain to be investigated.