Phylogenetic and molecular evolutionary analyses of gypsy group retrotransposon families in the Egyptian cotton Gossypium barbadense

Gypsy group retrotransposons in the Egyptian cotton, Gossypium barbadense, were examined by phylogenetic and molecular evolutionary analyses. DNA sequences of gypsy group retrotransposons from two G. barbadense cultivars were heterogeneous and represent two distinct families. Sequence variation between these families appears to preserve the coding information of the reverse transcriptase domain. The high ratio of synonymous to nonsynonymous changes indicates that the reverse transcriptase domain of these families is evolving under purifying selection. Our phylogenetic analysis revealed that the closest relatives of the cotton retroelements are gypsy group retrotransposons from other plants. Transcripts encoded by the cotton retroelements were detected by RNA slot-blot hybridization in young seedlings of the respective cultivars, suggesting that these elements are transcriptionally active. The wide distribution of gypsy group retrotransposons and the detection of their transcripts suggest that they play an active role in the Gossypium genome.


INTRODUCTION
Gossypium L. contains 50 species whose phylogenetic relationships have been explored using multiple molecular data sets (reviewed in Wendel and Cronn, 2003). The data indicate that shortly after its origin, Gossypium underwent rapid divergence, giving rise to the modern monophyletic lineages designated as the A through G and K genomes, which vary in chromosome size and infertility (Wendel, 1989). The five natural polyploids in the genus are believed to have originated from a single polyploidization event 1.5 million years ago (MYA) (Senchina et al., 2003). All are AD-genome allotetraploids that combine an A-genome donated by the maternal diploid parent at the time of polyploid formation with a D-genome from the pollen parent.
The genus Gossypium is a facile system for investigating the genomic organization and evolution of repetitive DNA sequences that have become newly united in a common nucleus (Zhao et al., 1995). Cloning and characterization of the major repetitive DNA in tetraploid (AD) cotton revealed that most dispersed repeat families are largely restricted to the A-genome diploid ancestor and are absent from the D-genome (Zhao et al., 1998). Some of these dispersed repeat families are, however, found at low levels on chromosomes derived from the D-genome ancestor, suggesting that the repeats have spread since the formation of polyploid cotton (Zhao et al., 1998). Transposition appears to be a likely mechanism for this spread, a suggestion supported by the sequence similarity of four of the dispersed repeats to retroelements from other taxa (Zhao et al., 1998). Retroelements constitute an important fraction of the DNA content of plant genomes (Kumar and Bennetzen, 1999), and their abundance, dispersion across the nuclear genome and insertional activity indicate that they play a major role in plant genome structure and evolution (Bennetzen, 2000).
As part of a long-term program to understand the organization and evolution of the cotton genome, we describe here phylogenetic and molecular evolutionary analyses of gypsy group retrotransposons in the Egyptian cotton Gossypium barbadense. This report complements our recent characterization of the distribution of gypsy and copia group retrotransposons in the Egyptian cotton (Abdel Ghany and Zaki, 2002, 2003).

MATERIALS AND METHODS
Plant materials, genomic DNA extraction and isolation of gypsy group retrotransposons in G. barbadense
Gypsy group retrotransposons (Table 1) were isolated from G. barbadense as previously described.

RNA slot-blot hybridization
PCR-amplified probes were labelled with [α-32P]dCTP by the random primer method (Feinberg and Vogelstein, 1983) and used for RNA slot-blot hybridization as described (Sambrook et al., 1989). Filters were hybridized overnight at 42°C in a solution containing 50% formamide, 5× SSC, 10× Denhardt's solution and 0.5% SDS. Filters were then washed at 50°C in 0.1× SSC containing 0.5% SDS for 1 h.
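The hybridization solution can be assembled from concentrated stocks using the dilution relation C1V1 = C2V2. The Python sketch below assumes commonly used stock strengths (undiluted formamide, 20× SSC, 50× Denhardt's solution and 10% SDS); these stocks, the `Component` type and the `stock_volumes` helper are illustrative assumptions, not details reported in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    stock: float  # stock concentration (x-fold or %), ASSUMED values below
    final: float  # desired final concentration, same units as stock


def stock_volumes(components, total_ml):
    """Volume (mL) of each stock needed, from C1*V1 = C2*V2,
    plus the water required to bring the mix to total_ml."""
    volumes = {}
    for c in components:
        if c.final > c.stock:
            raise ValueError(f"{c.name}: final {c.final} exceeds stock {c.stock}")
        volumes[c.name] = c.final * total_ml / c.stock
    water = total_ml - sum(volumes.values())
    if water < -1e-9:
        raise ValueError("stock volumes exceed the total volume")
    volumes["water"] = max(water, 0.0)
    return volumes


# Final concentrations from the Methods; stock strengths are assumptions.
HYB_MIX = [
    Component("formamide", 100.0, 50.0),  # % (v/v)
    Component("SSC", 20.0, 5.0),          # x
    Component("Denhardt's", 50.0, 10.0),  # x
    Component("SDS", 10.0, 0.5),          # % (w/v)
]

if __name__ == "__main__":
    for name, ml in stock_volumes(HYB_MIX, 20.0).items():
        print(f"{name:>12}: {ml:6.2f} mL")
```

With these assumed stocks, 20 mL of solution needs 10 mL formamide, 5 mL 20× SSC, 4 mL 50× Denhardt's and 1 mL 10% SDS, leaving no volume for water; a 20% SDS stock would free 0.5 mL.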

RESULTS AND DISCUSSION
PCR primers specific for conserved domains of the reverse transcriptase (RT) genes of gypsy group retrotransposons amplified the corresponding fragments from two G. barbadense cultivars, Giza 45 and Giza 84. These fragments were designated G45 and G84, respectively (Table 1). Using G45 and G84 as hybridization probes, gypsy group retrotransposons were detected in wild species of Gossypium, suggesting that they are a standard component of the Gossypium genome. Comparison of the deduced amino acid sequences of G45 and G84 using the ClustalW program revealed 51% amino acid identity (Figure 1), indicating sequence heterogeneity. Because the criterion for assignment to the same family was >90% amino acid identity in pairwise comparisons (Figure 1), this level of divergence indicates that G45 and G84 represent two distinct gypsy group retrotransposon families in G. barbadense. This criterion is consistent with previous studies that used a similar threshold to define retrotransposon families (Konieczny et al., 1991; Flavell et al., 1992; Vanderwiel et al., 1993). The numbers of synonymous and nonsynonymous substitutions, and their standard errors, were estimated by the method of Nei and Gojobori (1986).
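The family-assignment criterion used above (>90% amino acid identity in pairwise comparisons) can be expressed as a short computation on aligned sequences. The Python sketch below assumes the sequences are already aligned (for example, ClustalW output read into a dictionary) and computes identity over columns in which neither sequence has a gap. Programs differ in how they treat gaps and which length they divide by, so values may not match ClustalW scores exactly; the function names are illustrative.

```python
from itertools import combinations

GAP = "-"


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, counting only columns
    in which neither sequence has a gap (one of several conventions)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if GAP in (x, y):
            continue  # skip gapped columns
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def classify_pairs(alignment: dict, threshold: float = 90.0):
    """For every pair of aligned sequences, report percent identity and
    whether it passes the family criterion (strictly greater than threshold)."""
    rows = []
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        pid = percent_identity(s1, s2)
        rows.append((n1, n2, round(pid, 1), pid > threshold))
    return rows
```

With toy sequences, `classify_pairs({"x": "ACDEFGHIKL", "y": "ACDEFGHIKM"})` returns `[("x", "y", 90.0, False)]`: exactly 90% identity does not satisfy a strict >90% criterion.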
We next examined the evolutionary relationships of the retroelements identified in G. barbadense. The RT sequences of G45 and G84 were aligned with other plant gypsy group RT sequences (accession numbers are shown on the tree), with Ty3 as the outgroup (Figure 2). The neighbour-joining phylogram provided strong bootstrap support for a monophyletic origin of plant gypsy group retrotransposons, yet showed high diversity within all species. G45 is most closely related to sequences from Lotus japonicus genomic DNA (chromosome 4) and to an Ananas comosus gypsy group retrotransposon (Sato et al., 2001; Thomson et al., 1998), with 75% and 74% amino acid identity, respectively. G84 is most closely related to sequences from L. japonicus genomic DNA (chromosome 5) and to the Hordeum vulgare cereba and Oryza sativa osr31/rire7 gypsy group retrotransposons (Sato et al., 2001; Hudakova et al., 2001; McCarthy et al., 2002), with 75% amino acid identity. To determine whether G45 and G84 are transcribed in G. barbadense, we performed RNA slot-blot hybridization using 32P-labelled PCR-amplified probes (Figure 3). At both total RNA concentrations tested, G45- and G84-encoded transcripts were detected in Giza 45 and Giza 84, respectively, with similar hybridization intensities. To normalize for RNA loading and to exclude the possibility that differences in expression were due to sequence diversity between the G45 and G84 RT probes, genomic DNA from Giza 45 and Giza 84 was subjected to DNA slot-blot hybridization using the same 32P-labelled probes. A similar degree of hybridization was detected (Figure 3B), indicating that sequence diversity did not affect the results of the RNA slot-blot hybridization.
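As an illustration of how a neighbour-joining tree is assembled, the Python sketch below implements the standard agglomerative procedure on a precomputed distance matrix: at each step it joins the pair of nodes i, j that minimizes Q(i, j) = (n - 2)d(i, j) - r_i - r_j, where r_i is the sum of distances from node i to all active nodes, and replaces the pair with a new node. It is not the software used to build Figure 2; the distance matrix (for RT protein sequences, typically a corrected p-distance), the tie-breaking rule and the placement of the root at the midpoint of the last join are assumptions of this sketch.

```python
def neighbor_joining(names, dist):
    """Build a neighbour-joining tree from a symmetric distance matrix.

    names: taxon labels; dist: list of lists with dist[i][j] = distance(i, j).
    Returns a Newick string with branch lengths.
    """
    n_taxa = len(names)
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    # d[u][v]: distances between nodes; leaves are 0..n-1 and internal
    # nodes receive fresh integer ids as they are created.
    d = {i: {j: float(dist[i][j]) for j in range(n_taxa)} for i in range(n_taxa)}
    label = dict(enumerate(names))
    active = list(range(n_taxa))
    next_id = n_taxa
    while len(active) > 2:
        n = len(active)
        r = {i: sum(d[i][k] for k in active) for i in active}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j;
        # ties are broken by the lowest node ids.
        _, i, j = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for x, a in enumerate(active)
            for b in active[x + 1:]
        )
        # Branch lengths from i and j to their new parent u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u, next_id = next_id, next_id + 1
        label[u] = f"({label[i]}:{li:.4g},{label[j]}:{lj:.4g})"
        # Distance from the new node to every other remaining node.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    half = d[a][b] / 2  # arbitrary midpoint root, only for Newick output
    return f"({label[a]}:{half:.4g},{label[b]}:{half:.4g});"
```

On a distance matrix that fits a tree exactly (an additive matrix), this procedure recovers that tree's topology and branch lengths; negative branch lengths, which can arise with real data, are not corrected here.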
Gypsy group retrotransposons are present within all higher plant divisions as large, highly heterogeneous populations (Kumar and Bennetzen, 1999). Phylogenetic analyses have shown that these populations resolve into diverse families that span species boundaries, such that the closest homologue of one family is often from a different species (Eickbush and Malik, 2002). DNA sequences of gypsy group retrotransposons from two G. barbadense cultivars were heterogeneous and represent two distinct families. Sequence variation between these families appears to preserve the coding information of the RT domain. The high ratio of synonymous to nonsynonymous changes indicates that the RT domain of these families is evolving under purifying selection. This implies that RT sequences in cotton have evolved under functional constraint and are likely to play a role in the life cycle of these elements. Our results add to the growing number of reports of strong selection on RT sequences (Konieczny et al., 1991; Flavell et al., 1992; Voytas et al., 1992; Matsuoka and Tsunewaki, 1999; Friesen et al., 2001; Stuart-Rogers and Flavell, 2001). These examples, drawn from across the phylogenetic spectrum, illustrate that sequence conservation is a general property of retrotransposons. Plant gypsy group retrotransposons with envelope (env)-like genes have been reported (Zaki, 2003). Phylogenetic analysis of the RT domain of plant gypsy group retrotransposons indicated that they resolve into two lineages: one universally lacking env genes and the other containing them (Vicient et al., 2001). Our phylogenetic analysis revealed that the G84 RT sequence clusters with env-containing plant retrotransposons, suggesting that G84 represents an env-containing gypsy group retrotransposon family in G. barbadense. Notably, the env-like sequences GM5 and GM6 were previously reported in G. barbadense (Abdel Ghany and Zaki, 2002).
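The synonymous and nonsynonymous estimates behind this inference follow Nei and Gojobori (1986): count synonymous (S) and nonsynonymous (N) sites, count synonymous (Sd) and nonsynonymous (Nd) differences, form pS = Sd/S and pN = Nd/N, and correct each for multiple hits with the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3). Purifying selection is indicated when dN is smaller than dS, that is, when dN/dS < 1. The Python sketch below is a minimal illustration for two aligned, in-frame coding sequences. It assumes the standard genetic code, skips gapped and stop codons, counts changes to stop codons as nonsynonymous when computing sites, averages multi-step codon differences over all pathways that avoid stop codons, and omits standard errors; implementations differ on some of these choices, so results may differ slightly from published software.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon):
    """Synonymous and nonsynonymous sites of one codon (they sum to 3)."""
    syn = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:  # a stop mutant counts as nonsynonymous
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two sense codons,
    averaged over every order of single-base changes that avoids stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    sums, n_paths = [0.0, 0.0], 0
    for order in permutations(positions):
        current, s, n = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        else:  # runs only if the pathway was not discarded
            sums[0] += s
            sums[1] += n
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return sums[0] / n_paths, sums[1] / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction; undefined (infinite) for p >= 0.75."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned in frame and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguity codes, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        (s1, n1), (s2, n2) = codon_sites(c1), codon_sites(c2)
        S += (s1 + s2) / 2
        N += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0:
        raise ValueError("no comparable codons")
    dS, dN = jukes_cantor(Sd / S), jukes_cantor(Nd / N)
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "dS": dS, "dN": dN,
            "dN/dS": dN / dS if dS > 0 else math.nan}
```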
Currently, it is unknown whether G84, GM5 and GM6 represent the same retrotransposon family. Further experimental analysis is required to address this question.
Phylogenetic analysis of the RT domain allows the evolutionary relationships among gypsy group retrotransposons to be inferred (Malik et al., 2000; Eickbush and Malik, 2002). Our phylogenetic analysis revealed that G45 and G84 are more closely related to gypsy group RT sequences from other plants (A. comosus and H. vulgare, respectively) than to each other. These relationships suggest either an ancient origin of plant retrotransposons followed by vertical transmission, or horizontal transmission, in which these retrotransposons have jumped the species gap (Eickbush and Malik, 2002). The observation that the branch lengths separating plant retrotransposons are usually similar, indicating similar evolutionary distances, argues against the horizontal transmission hypothesis and supports the existence of a diverse group of retrotransposon families in the progenitor of plants. This suggestion is supported by the detection of gypsy group retrotransposons in all Gossypium species examined.
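Inferences of this kind rest on how well the tree topology is supported. Bootstrap support values, such as those reported for the neighbour-joining phylogram (Figure 2), are conventionally obtained by resampling alignment columns with replacement, re-inferring a tree from each pseudo-replicate, and recording how often a given clade reappears. The Python sketch below illustrates only the resampling and counting steps; `infer_clades` is a hypothetical, caller-supplied function standing in for tree inference (for example, a distance calculation followed by neighbour joining), and the number of replicates is left to the caller.

```python
import random


def bootstrap_replicates(alignment, n_reps, seed=None):
    """Yield pseudo-replicate alignments made by sampling alignment columns
    with replacement; each replicate keeps the original length."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    length = lengths.pop()
    rng = random.Random(seed)
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def clade_support(alignment, clade, n_reps, infer_clades, seed=None):
    """Fraction of replicates whose inferred tree contains `clade`.

    infer_clades (hypothetical) maps an alignment to the set of clades,
    each a frozenset of taxon names, present in the tree inferred from it.
    """
    target = frozenset(clade)
    hits = sum(target in infer_clades(rep)
               for rep in bootstrap_replicates(alignment, n_reps, seed))
    return hits / n_reps
```

Because columns, not individual residues, are resampled, each replicate preserves the alignment of every site while varying which sites contribute, which is what makes the recurrence of a clade a measure of how consistently the data support it.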
Plant retrotransposons are known to be transcriptionally silent in most plant tissues during development, suggesting that transcriptional control is a major mechanism regulating their retrotransposition (Kumar and Bennetzen, 1999). Their expression and transposition are, however, inducible by stresses such as protoplast isolation and tissue culture (Grandbastien et al., 1997). Retrotransposon transcripts have also been detected under ordinary growth conditions (Suoniemi et al., 1996; Pearce et al., 1997). In this regard, G45- and G84-encoded transcripts were detected by RNA slot-blot hybridization in young seedlings of Giza 45 and Giza 84, respectively, suggesting that G45 and G84 are transcriptionally active retrotransposons. However, the deduced amino acid sequences of G45 and G84 contain stop codons or insertions/deletions that cause frameshifts, suggesting that these clones represent defective retrotransposons. Nevertheless, because transcripts are intermediates in the retrotransposition process (Kumar and Bennetzen, 1999), the detection of G45- and G84-encoded transcripts suggests that subsets of these elements are competent for retrotransposition.
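A PCR fragment amplified from within an RT coding region carries no start or stop codon of its own, so an intact copy should be open in one reading frame along its whole length. A stop codon in the expected frame, or an open frame that switches partway along, is consistent with a defective copy carrying a nonsense mutation or a frameshifting insertion/deletion. The Python sketch below screens the three forward frames; it assumes the standard genetic code and that the orientation of the fragment is known (the reverse strand is not scanned), and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(nt, frame=0):
    """Translate one forward reading frame; codons with gaps or
    ambiguity codes become 'X'. Trailing partial codons are ignored."""
    nt = nt.upper()
    return "".join(CODE.get(nt[i:i + 3], "X") for i in range(frame, len(nt) - 2, 3))


def stop_positions(nt):
    """1-based codon positions of stop codons in each forward frame."""
    return {f: [i + 1 for i, aa in enumerate(translate(nt, f)) if aa == "*"]
            for f in range(3)}


def open_frames(nt):
    """Forward frames (0, 1, 2) with no stop codon along the whole fragment."""
    return [f for f, stops in stop_positions(nt).items() if not stops]
```

For example, `translate("ATGGCTTAAGGC")` gives `"MA*G"`, and `open_frames` reports frames 1 and 2 as open for that fragment; on real RT clones, the frame expected from the amino acid alignment is the one to inspect.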