Sequence heterogeneity of the envelope-like domain in the Egyptian cotton Gossypium barbadense

The current study aimed to investigate the evolution of env-like sequences in the Egyptian cotton Gossypium barbadense. DNA sequence determination and analysis revealed that env-like sequences are heterogeneous in G. barbadense. The observed sequence diversity, however, seems to preserve the coding information. Phylogenetic analysis demonstrated that plant env-like sequences group together, suggesting their monophyletic origin. Gossypium env-like sequences are, however, more closely related to elements present in other plant species. Our results suggest that env-like sequences in cotton have evolved under functional constraint and are likely to play a role in the life cycle of these elements.


INTRODUCTION
Retrotransposons have been found in the genomes of most eukaryotes (for review see Eickbush and Malik, 2002). Their integrated proviral forms consist of two long terminal repeats (LTRs) flanking an internal region which contains one to three open reading frames (ORFs) coding for the structural and enzymatic functions required for their replication cycle (Wilhelm and Wilhelm, 2001). Based on their reverse transcriptase (RT) domains, retrotransposons have been divided into two major groups: the Ty1/Copia and the Ty3/Gypsy families (Xiong and Eickbush, 1990). The two families differ in the order of the enzymatic domains in the pol gene. Moreover, the Ty3/Gypsy family is more closely related to vertebrate retroviruses. The viral envelope (env) gene of the retroviruses distinguishes them from retrotransposons. Structural and functional data converged when it was shown that the gypsy element of D. melanogaster was able to function as a retrovirus (Kim et al., 1994; Song et al., 1994). Recently, the International Committee on Taxonomy of Viruses (ICTV) has proposed to term the Ty1/Copia and the Ty3/Gypsy families Pseudoviridae and Metaviridae, respectively (Boeke et al., 2000). The Metaviridae are further classified according to the presence of the env gene (genus Errantivirus) or its absence (genus Metavirus) (Hull, 2001).
Phylogenetic analyses based on reverse transcriptase amino acid sequences strongly suggest that the env gene was transduced from a baculoviral source (Malik et al., 2000). In plants, a recent study has indicated that the gypsy-like retrotransposon Bagy-2 of barley defines a lineage of endogenous plant retroviruses (Vicient et al., 2001). In this regard, the fact that gypsy-like elements and env-like genes have previously been described in Gossypium (Abdel Ghany and Zaki, 2002; Zaki and Abdel Ghany, 2003) prompted us to search for the Bagy-2 env domain in the cotton genome. In addition, this study aims to investigate the evolution of env-like sequences in the Egyptian cotton G. barbadense.

MATERIALS AND METHODS

Plant materials and genomic DNA extraction
Total genomic DNA was extracted from young seedlings of the Gossypium barbadense cultivar S14 using the Qiagen DNeasy kit (Qiagen, Germany).

Isolation of Bagy-2 env domains in Gossypium
Total DNA was subjected to PCR with primers specific to the env domain of the Bagy-2 retrotransposon, (5'-TCAGTTGCAAGAAAGTCGCCG-3') and (5'-CCTCTATCAGTGTTTCGGGGC-3') (Vicient et al., 2001). DNA amplifications were carried out in an ABI GeneAmp PCR System 9700 cycler with an initial denaturing step at 95°C for 5 min, followed by 45 cycles (each cycle consisting of denaturing at 94°C for 30 s, annealing at 55°C for 30 s and extension at 72°C for 30 s) and a final extension step at 72°C for 10 min.
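As a quick sanity check on the primers and cycling conditions given above, the following minimal sketch computes primer length, GC content, a rough melting temperature and the total hold time of the program. It is only an illustration: the names PRIMER_1, PRIMER_2, gc_content, wallace_tm and program_seconds are hypothetical, the space printed inside the first primer is assumed to be a typesetting break, and the Wallace rule is a coarse approximation intended for short oligonucleotides, not the method used by the authors.

```python
# Primer sequences as printed in the text (Vicient et al., 2001). The space that
# appears inside the first sequence is assumed to be a typesetting break.
PRIMER_1 = "TCAGTTGCAAGAAAGTCGCCG"
PRIMER_2 = "CCTCTATCAGTGTTTCGGGGC"


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def program_seconds(initial=300, cycles=45, steps=(30, 30, 30), final=600):
    """Total hold time of the cycling program in seconds, ignoring ramping."""
    return initial + cycles * sum(steps) + final


if __name__ == "__main__":
    for name, primer in (("primer 1", PRIMER_1), ("primer 2", PRIMER_2)):
        gc_percent = round(100 * gc_content(primer), 1)
        print(name, len(primer), "nt", gc_percent, "% GC", wallace_tm(primer), "C")
    print(program_seconds() / 60, "min of hold time")
```

The default arguments of program_seconds mirror the program described in the text (5 min at 95°C, 45 cycles of three 30 s steps, 10 min final extension); real run times will be longer because temperature ramping is ignored.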

Cloning and sequencing of PCR-amplified fragments
PCR-amplified fragments of the expected size were excised from the agarose gel and purified using the Qiagen Gel Extraction kit (Qiagen, Germany). Purified DNA fragments were then cloned into the pCR 4-TOPO vector with the TOPO TA cloning kit (Invitrogen, USA) in the competent E. coli strain TOP10. Plasmid DNA was isolated using the QIA Spin miniprep kit (Qiagen, Germany) and sequenced in both directions using the BigDye Sequencing Kit and an ABI 377 DNA sequencer (ABI, USA).

RESULTS AND DISCUSSION
PCR amplification with primers specific for the env domain of the barley Bagy-2 retrotransposon (Vicient et al., 2001) was employed to search for env-like domains in G. barbadense. Expected amplicons were cloned into the pCR 4-TOPO vector. Two G. barbadense recombinant clones were randomly selected and further studied by DNA sequence analysis. These clones were designated GB and GB1, respectively. The GB and GB1 sequences have been deposited in the NCBI nucleotide sequence database, GenBank, with the accession numbers AY257162 and AY257163, respectively. A BLAST search confirmed the env nature of the cloned products. Furthermore, the amino acid sequences derived from GB and GB1 are compared to the Bagy-2 env domain (Vicient et al., 2001) in Figure 1, with amino acid similarities of 67% and 80%, respectively. The high amino acid similarities observed support the interpretation that GB and GB1 represent portions of the env gene of the Bagy-2 retrotransposon.
Figure 1. Alignment of the derived amino acid sequences of GB and GB1 with the Bagy-2 env domain.
1 QETRRDKQGLRLLPLVREALLELHMSASRLRWRSLLFIGTRLFLPLGIIVLFLVNGPAIWFQ
2 QEARRDTQGLRLLPMVREALLELHMSASRLRWRILLFIGTRSFLPLGLIVLFDVSGPAIWFQ
3 QEARRDKQGLRLLPMVREALLQLHMSVSRLRWRILLFIGTRSSLPPWLILLFLIRPPTIWFP

Comparative nucleotide and amino acid sequence analysis of GB and GB1 using the ClustalW program revealed identities of 75% and 72%, respectively (Figure 2). The level of nucleotide and amino acid identity observed for GB and GB1 is comparable to that reported for the Bagy-2 element, where 86% identity between the genomic copies was observed (Vicient et al., 2001). Although multiple gaps had to be introduced into the GB and GB1 nucleotide alignment to compensate for sequence length polymorphism and deletions, the coding information appears to be preserved, as evidenced by the overall high amino acid homology. A similar pattern of length variation, deletions and conservation of coding information was recently reported in the SIRE-1 elements of soybean (Laten et al., 2003).
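The notion of percent identity used in these comparisons can be illustrated with a minimal sketch applied to the three aligned sequences printed for Figure 1. The labels "1" to "3" follow the figure numbering, since the text does not state which row corresponds to GB, GB1 or Bagy-2, and the function name identity is hypothetical. The values obtained from these 62-residue excerpts need not match the percentages reported in the text, which were derived with ClustalW and may refer to similarity or to longer alignments.

```python
from itertools import combinations

# Sequences 1 to 3 as printed in the Figure 1 alignment (62 residues each, no gaps).
ALIGNED = {
    "1": "QETRRDKQGLRLLPLVREALLELHMSASRLRWRSLLFIGTRLFLPLGIIVLFLVNGPAIWFQ",
    "2": "QEARRDTQGLRLLPMVREALLELHMSASRLRWRILLFIGTRSFLPLGLIVLFDVSGPAIWFQ",
    "3": "QEARRDKQGLRLLPMVREALLQLHMSVSRLRWRILLFIGTRSSLPPWLILLFLIRPPTIWFP",
}


def identity(a, b):
    """Return (identical positions, compared positions) for two aligned sequences.

    Columns in which either sequence carries a gap character ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    same = sum(1 for x, y in columns if x == y)
    return same, len(columns)


if __name__ == "__main__":
    for p, q in combinations(sorted(ALIGNED), 2):
        same, total = identity(ALIGNED[p], ALIGNED[q])
        print(f"{p} vs {q}: {same}/{total} identical ({100 * same / total:.1f}%)")
```

Skipping gapped columns is one common convention; counting gaps as mismatches would give lower values for alignments that contain indels, such as the nucleotide alignment described above.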
We have previously identified env-like genes in Gossypium using oligonucleotides specific for the Drosophila gypsy env gene (Abdel Ghany and Zaki, 2002). The amino acid sequences of env-like elements in Gossypium were compared (Figure 3). Moreover, pairwise nucleotide comparisons revealed a wide range of sequence identity (7 to 81%) among Gossypium env-like elements. GM5 and GM6 are clearly closely related to each other, with nucleotide and amino acid sequence identities of 81% and 79%, respectively.

Abdel Ghany and Zaki 343
Relationships between Gossypium env-like genes and those of other organisms were assessed by constructing a neighbor-joining tree (Saitou and Nei, 1987), with accession numbers shown on the tree and the Drosophila gypsy element as the outgroup (Figure 4). The phylogenetic analysis revealed a high level of amino acid sequence diversity, as is evident from the branch lengths, which are proportional to the degree of divergence. In addition, plant env-like sequences group together, suggesting their monophyletic origin. Gossypium env-like sequences are, however, more closely related to elements present in other plant species. GM5 has the strongest affinity with the soybean element SIRE-1. On the other hand, the closest homologue of GB and GB1 is the barley Bagy-2 element.
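The neighbor-joining procedure referred to above can be sketched in a few lines. This is a minimal illustration of the algorithm of Saitou and Nei (1987) on a small hypothetical distance table, not a reconstruction of the analysis in Figure 4: the distance measure, software and bootstrap procedure used for the published tree are not specified here, and the names neighbor_joining, TOY_NAMES and TOY_DIST are assumptions made for the example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Run neighbor-joining on a distance table with at least two taxa.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (joins, last_branch): each join is (a, b, branch_a, branch_b),
    and last_branch is the length of the edge linking the final two nodes.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: sum of distances from each node to all other nodes.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}

        def q(pair):
            # Neighbor-joining criterion Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - r[a] - r[b]

        a, b = min(combinations(nodes, 2), key=q)
        dab = d[frozenset((a, b))]
        branch_a = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = dab - branch_a
        new = f"({a},{b})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab
                ) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [new]
        joins.append((a, b, branch_a, branch_b))
    return joins, d[frozenset(nodes)]


# Hypothetical five-taxon distance table, used only to exercise the function.
TOY_NAMES = ["a", "b", "c", "d", "e"]
TOY_DIST = {
    frozenset(pair): value
    for pair, value in {
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    }.items()
}

if __name__ == "__main__":
    joins, last = neighbor_joining(TOY_NAMES, TOY_DIST)
    for a, b, branch_a, branch_b in joins:
        print(f"join {a} ({branch_a}) with {b} ({branch_b})")
    print("final edge:", last)
```

In practice the input distances would be estimated from the amino acid alignment, and the resulting unrooted tree would be rooted with the chosen outgroup, which the text gives as the Drosophila gypsy element.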
In this study, we investigated the evolution of env-like sequences in the Egyptian cotton. Our analysis revealed that these sequences form a very heterogeneous collection. The observed sequence diversity, however, seems to maintain the ORFs and thus preserve the coding information. This suggests that the env-like sequences in cotton have evolved under functional constraint and are likely to play a role in the life cycle of these elements. This suggestion is supported by the presence of conserved ORFs coding for env-like sequences that can be identified across diverse plant taxa (Zaki, 2003). It is noteworthy that such functional constraint contrasts with what has been found in mammalian retroviral env genes, where adaptive selection results in high levels of variation to avoid the immune response (Coffin et al., 1997).
Phylogenetic analysis revealed that Gossypium env-like sequences are more closely related to elements present in other plant species. We previously reported that gypsy group retrotransposons are a standard component of the Gossypium genome (Zaki and Abdel Ghany, 2003). The detection of env-like sequences in the cultivated G. barbadense suggests that these sequences are probably very ancient and have been maintained in the genome because they lie in chromosomal regions where the recombination rate is low and selection is therefore less efficient. It should be noted that the chromosomal locations of gypsy group elements are yet to be determined in Gossypium. Alternatively, these sequences could be the result of a recent transposition burst of gypsy group elements in G. barbadense or of the acquisition of new elements by horizontal transfer. The observation that the distribution pattern of gypsy group retrotransposons is similar within the genus Gossypium, with G. barbadense possessing additional hybridisation bands (Zaki and Abdel Ghany, 2003), supports the recent transposition hypothesis. Similarly, a massive amplification process was observed in maize since its divergence from sorghum (SanMiguel and Bennetzen, 1998). Currently, it is difficult to envisage the horizontal transfer of gypsy group elements in G. barbadense, especially given that gypsy group retrotransposons have been detected in the genus Gossypium.

Figure 4. The neighbor-joining method (Saitou and Nei, 1987) was employed to construct the tree, with branch lengths proportional to the degree of divergence between the amino acid sequences. The numbers on the branches represent bootstrap values of 1,000 replicates. Names refer to the accession numbers of the nucleotide sequences that encode the corresponding envelope domain.
Further experimental data, such as copy number determination, chromosomal distribution and sequencing of large contiguous regions of the Gossypium genome, will add significantly to fundamental knowledge about the role of gypsy group retrotransposons in the shaping and evolution of the Gossypium genome.
The mammalian retroviral env gene is a highly divergent sequence, reflecting the highly diverse sequences of the receptor molecules with which env proteins interact for virus-cell interaction and entry (Coffin et al., 1997). Nevertheless, the structures elucidated for various retroviral env complexes show a high degree of structural conservation, possibly reflecting a common mechanism by which mammalian retroviruses trigger the fusion and entry process (Eickbush and Malik, 2002). In this regard, the functional role of the env gene of plant retroviruses in viral propagation within the plant host is still unknown, and cell walls rule out membrane fusion as a suitable invasive strategy (Zaki, 2003). The identification of a replication-competent plant retrovirus is imperative to determine its functional significance, to test the hypothesis that plant retroviruses are infectious and, finally, to elucidate the unique biology of plants that has helped to restrict the pathogenicity of retroviruses to the animal kingdom.

ACKNOWLEDGEMENT
This work was supported by a grant from the US-Egypt Science and Technology Foundation to E.A. Zaki.