Functional and comparative analysis of expressed sequences from Diuraphis noxia infested wheat obtained utilizing the conserved Nucleotide Binding Site

Russian wheat aphid (Diuraphis noxia, Mordvilko; RWA) is a major pest on wheat, barley and other triticale in South Africa. Infestation by the RWA results in altered protein expression patterns, which is manifested as differential expression of gene sequences. In the present study, Russian wheat aphid resistant (Tugela DN, Tugela*5/SA2199, Tugela*5/SA463, PI 137739, PI 262660, and PI 294994) and susceptible triticale (Tugela) were infested and cDNA was synthesized. A PCR-based approach was used to amplify the conserved nucleotide binding site region to obtain expressed sequence tags (ESTs) with homology to resistance gene analogs (RGAs). The approach proved highly feasible when the isolation of RGAs is the main objective, since 18% of all obtained ESTs showed significant hits with known RGAs when translated into their corresponding amino acid sequences and searched against the nonredundant GenBank protein database using the BLASTX algorithm.


INTRODUCTION
Russian wheat aphid (Diuraphis noxia, Mordvilko; RWA) is one of the most adaptable insects and is recognized as a pest of wheat, barley and other triticale (Bryce, 1994; Walters et al., 1980). Infestation can occur shortly after the emergence of the wheat plants, and the aphids are found on the newest growth and in the axils of the leaves, but damage is greatest when the crops start to ripen. This is due to the twisting and distortion of the heads and their resulting failure to emerge properly (Unger and Quisenbury, 1997). Further symptoms of RWA feeding on susceptible cultivars include longitudinal streaking and leaf rolling, which under severe infestation lead to a drastic reduction in effective leaf area (Walters et al., 1980). Infestation by the RWA also results in altered protein expression patterns, manifested as differential expression of total proteins and of specific pathogenesis-related proteins such as chitinases, β-1,3-glucanases and peroxidases (Bahlmann, 2002; Botha et al., 1998; Van der Westhuizen et al., 1998a,b, 2002; Van der Westhuizen and Botha, 1993; Van der Westhuizen and Pretorius, 1996). The use of RWA-resistant cultivars, however, may reduce the impact of this pest on wheat production and at the same time reduce environmental risks and control costs due to chemical spraying (Tolmay et al., 1999). The need for more RWA tolerant plants places emphasis on obtaining resistance candidate genes, as well as on understanding the underlying mechanisms of defense against the RWA.
*Corresponding Author; E-mail: ambothao@postino.up.ac.za, tel: +27 12 420 3945, fax: +27 12 420 3947
Disease resistance genes have been isolated and characterized at the molecular level in several plant species such as Arabidopsis, tobacco, tomato and wheat (Jones and Jones, 1997;Cannon et al., 2002). Resistance gene products specifically recognize and provide resistance towards a large number of pests and pathogens (Seah et al., 1998;Pan et al., 2000). These genes can be divided into four broad, structurally distinct classes. The first class of resistance genes belongs to the serine-threonine kinases (Martin et al., 1993;Ritter and Dangl, 1996). The protein kinases phosphorylate serine/threonine residues and thus control certain signaling networks during the resistance response. The second class of resistance genes encodes putative transmembrane receptors with extracellular leucine rich repeat (LRR) domains (Jones et al., 1994;Dixon et al., 1998). The third class encodes for a receptor-like kinase and combines qualities of both the previous classes. Both the LRR domain and the protein kinase regions are encoded in the same protein. The fourth class, which represents the majority of plant disease resistance genes cloned so far, is the nucleotide-binding site-leucine rich repeat (NBS-LRR) resistance genes. The NBS-LRR class of genes is abundant in plant species. In Arabidopsis, it has been estimated that at least 200 different NBS-LRR genes exist making up to 1% of the genome (Ellis et al., 2000;Sandhu and Gill, 2002).
The NBS-LRR genes contain three distinct domains: a variable N-terminus, a nucleotide-binding site and leucine rich repeats. Two types of N-termini are present in NBS-LRR genes. One kind contains a leucine zipper or coiled-coil sequence that is thought to facilitate protein-protein interactions. The coiled-coil motif has been found in the N-terminus in both dicotyledons and cereals (Pan et al., 2000; Cannon et al., 2002). The second kind of N-terminus has been described only in dicotyledons and is similar to the cytoplasmic signaling domains of the Drosophila Toll or the mammalian interleukin receptor-like (TIR) regions (Whitham et al., 1994; Cannon et al., 2002). The NBS regions are found in many ATP- and GTP-binding proteins that act as molecular switches (Jackson and Taylor, 1996). These genes regulate the activity of proteases that can initiate apoptotic cell death. Since defense mechanisms in plants include the hypersensitive response, which is very similar to apoptosis, the common occurrence of NBS domains in both plants and animals could be an indication of similar functioning (Cannon et al., 2002).
NBS-LRR homologues encode proteins that are structurally closely related. This suggests that they have a common function in signal transduction pathways, even though they confer resistance to a wide variety of pathogen types. The conservation between different NBS-LRR resistance genes enables the use of polymerase chain reaction (PCR)-based strategies in isolating and cloning other R gene family members or analogs using degenerate primers for these conserved regions. Strategies using degenerate primers have been successfully utilized in the cloning of other putative NBS-LRR resistance gene analogs (RGA) from potato (Solanum tuberosum L.) (Leister et al., 1996), soybean (Glycine max L. Merr.) (Yu et al., 1996) and citrus (Deng et al., 2000).
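The idea of a degenerate primer can be made concrete with a short sketch. The Python below expands a primer written in standard IUPAC ambiguity codes into every concrete sequence it represents and counts them. The example primer is hypothetical and chosen only for illustration; it is not one of the primers used in this study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


# Hypothetical primer for illustration only (not taken from the study).
EXAMPLE_PRIMER = "GGNGTNGGNAARAC"

if __name__ == "__main__":
    print(EXAMPLE_PRIMER, "represents", degeneracy(EXAMPLE_PRIMER), "sequences")
    print(expand("AAR"))
```

Higher degeneracy widens the range of family members a primer can anneal to, at the cost of a lower effective concentration of each individual sequence in the reaction.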
The identification and analysis of expressed sequence tags (ESTs) provide an effective tool to study thousands of genes expressed during plant development and their response to varying environmental conditions (Gyorgyey et al., 2000;White et al., 2000;Yamamoto and Sasaki, 1997) in complex genomes like wheat. The development of EST databases further provides a resource for transcript profiling experiments and studies of gene expression (Mekhedov et al., 2000;Schenk et al., 2000).
The aim of this study was to survey the expressed sequence tags obtained through PCR-based strategies utilizing the conserved nucleotide binding site motifs in an effort to increase the efficacy of isolating resistance gene candidates, from the complex hexaploid wheat genome.

Plant Material
The plant materials in the study were Aegilops tauschii, the near isogenic lines 'Tugela DN' (Tugela*5/SA1684, Dn1), Tugela Dn2 (Tugela*5/SA2199), Tugela Dn5 (Tugela*5/SA463) and Tugela (RWA susceptible), as well as RWA tolerant lines PI 137739 (SA1684, Dn1), PI 262660 (SA2199, Dn2) and PI 294994 (SA463, Dn5). The plants were grown in pots under greenhouse conditions with prevailing day and night cycles at the Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria. The temperature was maintained at 24°C, and the plants were watered daily. Half of the wheat seedlings were infested with RWA (10 aphids per plant) at the 3-4-leaf growth stage. The second and third leaves from uninfested and infested plants were removed after one week for analysis. The aphids were removed from the infested leaves under running water to prevent aphid derived nucleic acid contamination during the RNA isolation. The leaves were dried and used immediately for total RNA isolation.

Treatment of glassware, plastic ware and solutions
All glassware, plastic ware and solutions used, up to the second strand cDNA synthesis, were treated and then kept free of RNases. The glassware was treated overnight in 0.1% (v/v) diethyl pyrocarbonate (DEPC), autoclaved for 20 min at 121°C and baked at 200°C for 3-4 hours (Sambrook et al., 1989). The mortars and pestles were washed in 0.25 M HCl for 30 min prior to DEPC treatment, autoclaving and baking. All plastic ware and solutions, except those containing Tris (2-amino-2-(hydroxymethyl)-1,3-propanediol), were DEPC treated and autoclaved.

Total RNA Isolation and cDNA synthesis
Total cellular RNA was extracted using the acid guanidinium thiocyanate-phenol-chloroform extraction method described by Chomczynski and Sacchi (1987). The RNA samples were stored at -80°C for further use. The RNA concentration was determined on a Beckman DU®-64 spectrophotometer by reading the absorbance at 260 nm. The 260/280 ratio was determined to indicate the level of protein contamination (Sambrook et al., 1989). The integrity of the RNA was confirmed by analyzing both the infested and uninfested total RNA on a 2% (w/v) agarose gel (Sambrook et al., 1989). The molecular mass standard used was λ DNA digested with EcoRI and HindIII (Sambrook et al., 1989). Isolated RNA was electrophoresed at 100 V for 30 min and visualized under UV light with ethidium bromide (EtBr) staining.
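The spectrophotometric readings described above convert to a concentration and a purity estimate with simple arithmetic. The sketch below assumes the standard conversion of 40 µg/mL of RNA per A260 unit in a 1 cm cuvette; the dilution factor and the example readings are hypothetical and are not values from this study.

```python
# Standard conversion for single-stranded RNA in a 1 cm path length cuvette.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration(a260, dilution_factor=1.0):
    """Return the RNA concentration in ug/mL of the undiluted sample."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio; lower values suggest protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:100 before measurement.
    print(rna_concentration(0.25, dilution_factor=100), "ug/mL")
    print(purity_ratio(0.5, 0.25))
```

A ratio close to 2.0 is generally taken to indicate RNA that is largely free of protein.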

mRNA Isolation
The mRNA was purified from the total RNA using Oligo(dT) Cellulose affinity chromatography (GibcoBRL, Life Technologies). The synthesis of cDNA was carried out using either the Roche Molecular Biochemicals cDNA Synthesis System according to the manufacturer's specifications, or the RLM-RACE system (GeneRacer Kit, Invitrogen). Both the uninfested and the infested wheat mRNA were used as the substrate for the cDNA synthesis reaction. The ds cDNA was purified by the QIAquick Spin Purification Procedure (QIAGEN). The cDNA was eluted with water, its concentration was determined spectrophotometrically, and it was stored at -20°C.
When making use of the RLM-RACE system, the mRNA was dephosphorylated with calf intestinal phosphatase to remove the 5' phosphates and decapped with tobacco acid pyrophosphatase (TAP) to remove the 5' cap. The dephosphorylated, decapped mRNA was ligated to a GeneRacer™ RNA oligo using the GeneRacer Kit (Invitrogen). The ligated mRNA was reverse-transcribed using SUPERSCRIPT™ II RT (Invitrogen) and the GeneRacer™ Oligo dT Primer to create RACE-ready cDNA with known priming sites at the 5' and 3' ends. The 5' ends were amplified using a reverse degenerate nucleotide-binding site primer and the GeneRacer™ 5' Primer. The degenerate oligonucleotide primers were based on the amino acid sequences of two highly conserved motifs of the NBS in the tobacco N and Arabidopsis RPS2 genes (Yu et al., 1996). The 3' ends were amplified using a forward degenerate nucleotide-binding site primer and the GeneRacer™ 3' primer (GCTGTCAACGATACGCTACGTAACGGCATGACAGTG(T)18). The cycling parameters used for the GeneRacer™ reactions were five cycles of 94°C for 30 sec and 72°C for 1 min; five cycles of 94°C for 30 sec, 70°C for 30 sec and 72°C for 1 min; and twenty cycles of 94°C for 30 sec, 68°C for 30 sec and 72°C for 1 min.
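The cycling program above can be written down as a small data structure, which makes the total cycle number and the minimum run time easy to check. The sketch below encodes only the steps stated in the text. Any initial denaturation, final extension or ramp time is not given in the text and is therefore not included, so the computed time is a lower bound.

```python
# Each stage: (number of cycles, [(temperature in degrees C, seconds), ...]).
# Values are those stated in the text for the GeneRacer reactions.
PROGRAM = [
    (5, [(94, 30), (72, 60)]),
    (5, [(94, 30), (70, 30), (72, 60)]),
    (20, [(94, 30), (68, 30), (72, 60)]),
]


def total_cycles(program):
    """Return the total number of PCR cycles across all stages."""
    return sum(cycles for cycles, _ in program)


def hold_time_seconds(program):
    """Return the summed hold time in seconds, excluding ramping between steps."""
    return sum(
        cycles * sum(seconds for _, seconds in steps)
        for cycles, steps in program
    )


if __name__ == "__main__":
    print("cycles:", total_cycles(PROGRAM))
    print("hold time (min):", hold_time_seconds(PROGRAM) / 60)
```

The stepwise drop in annealing temperature from 72°C to 70°C to 68°C follows a touchdown pattern, which favors specific priming in the early cycles.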

Degenerate NBS-PCR
For the amplification of NBS sequences from the synthesized cDNA the following degenerate primers were used:

Cloning and Analysis of NBS-PCR Products
The PCR products were purified from an agarose gel slice using a Geneclean III Kit (Bio101). These fragments were cloned into the pGEM®-T Easy vector system (Promega). Ligation mixtures were used to transform competent E. coli (JM109) cells. Plasmid DNA was isolated from candidate clones and purified. Sense and antisense strands of the clones were sequenced by cycle sequencing using the dideoxy-DNA chain-termination method with the BigDye Terminator Cycle Sequencing Reaction kit (Perkin-Elmer) on the ABI-3100 Prism Automated sequencer (Perkin-Elmer).

Sequence identity and functional annotation
The sequence identities were obtained after BLAST searching and alignment to other published sequences in GenBank (Altschul et al., 1997). Functions were assigned to ESTs based on the results returned from searches using the BLASTX algorithm. Any ESTs that did not produce a BLASTX hit were considered to have an unknown function. Sequences that produced hits to proteins with E-values greater than 10⁻⁵ were also considered to have an unknown function. Sequences with hits to proteins with no discernible function were placed into the miscellaneous category. Sequences with hits to plant defense (pest and pathogen) were placed into the secondary metabolism category. The remaining sequences were placed into five broad functional categories: protein synthesis and modification, metabolism, regulatory, structural and genes of unknown function (miscellaneous).
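The assignment rules above amount to a short decision procedure. The sketch below is one possible reading of those rules, not the authors' actual pipeline. The hit representation (a dictionary with `evalue` and `function` keys) is a hypothetical structure introduced for illustration, and the sketch assumes a curator has already mapped each hit description to a functional label.

```python
# E-value cutoff stated in the text.
E_VALUE_CUTOFF = 1e-5


def assign_category(hit):
    """Assign a functional category to an EST from its best BLASTX hit.

    `hit` is None when BLASTX returned nothing; otherwise it is a dict with
    an 'evalue' and a 'function' label (None if no discernible function).
    Both keys are hypothetical names used only in this sketch.
    """
    if hit is None or hit["evalue"] > E_VALUE_CUTOFF:
        return "unknown"
    if hit.get("function") is None:
        return "miscellaneous"
    if hit["function"] == "plant defense":
        return "secondary metabolism"
    return hit["function"]


if __name__ == "__main__":
    examples = [
        None,
        {"evalue": 1e-3, "function": "metabolism"},
        {"evalue": 1e-20, "function": None},
        {"evalue": 1e-30, "function": "plant defense"},
        {"evalue": 1e-12, "function": "structural"},
    ]
    for hit in examples:
        print(hit, "->", assign_category(hit))
```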

RESULTS
We constructed cDNA libraries from Russian wheat aphid infested wheat leaves at the 3-4-leaf growth stage. The average titer of the cDNA libraries was approximately 2 × 10⁶ CFU, with an average cDNA insert size of approximately 1 kb. Following a single-pass, 5'-end sequencing approach, we obtained a total of 207 ESTs with sizes that ranged from 230 to 772 bases and an average size of 489 bp.
To assign function to the proteins encoded by nonredundant sequences, the DNA sequences were translated into their corresponding amino acid sequences and searched against the nonredundant GenBank protein database using the BLASTX algorithm. The maximum E-value accepted for a sequence match was set at 10⁻⁵. Following this approach we obtained a total of 194 ESTs with significant matches to sequences already present in GenBank (Table 1).
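BLASTX compares a nucleotide query with a protein database by translating the query in all six reading frames. The sketch below illustrates that translation step only, using the standard genetic code; it is not the BLASTX implementation and performs no database search. The example sequence is made up for illustration.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    for name, protein in six_frame("ATGGGTGTTGGTAAAACTTAA").items():
        print(name, protein)
```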
After the sequence identities were obtained from GenBank, functions were assigned based on the results returned after BLAST searching of the obtained ESTs (Figure 1). Of the annotated sequences, 25% were involved in protein synthesis and modification, such as translation factors, tRNA ligases, protein kinases and hydrolases; 25% were involved in structural functions, such as membrane-bound and cytoskeleton proteins; and 22% were involved in the general metabolic activities required for energy production. Only 3.5% of the obtained sequences represented hits with regulatory function. Of the obtained sequences, 6.5% failed to give a significant hit with any known protein function and thus represent the miscellaneous portion. The remaining 18% of the sequences were assigned to secondary metabolism, and most of these had significant hits to either specific resistance gene analogs or putative RGAs.
[Figure 1 legend] Secondary metabolism: pathogenesis-related proteins; Miscellaneous: proteins with no discernible function. Expressed sequence tags (ESTs) that did not produce a BLASTX hit, or that produced hits with E-values greater than 10⁻⁵, were considered to have an unknown function.
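As a rough consistency check, the reported proportions can be converted back to approximate EST counts. The sketch below assumes the percentages refer to all 207 ESTs. Because the percentages in the text are rounded, the resulting counts are estimates rather than reported values.

```python
# Total number of ESTs reported in the text.
TOTAL_ESTS = 207

# Category proportions (percent) as stated in the text.
PERCENT = {
    "protein synthesis and modification": 25.0,
    "structural": 25.0,
    "metabolism": 22.0,
    "secondary metabolism": 18.0,
    "miscellaneous": 6.5,
    "regulatory": 3.5,
}


def approximate_counts(total, percent):
    """Convert category percentages into rounded EST counts."""
    return {name: round(total * value / 100) for name, value in percent.items()}


if __name__ == "__main__":
    counts = approximate_counts(TOTAL_ESTS, PERCENT)
    print("percent total:", sum(PERCENT.values()))
    for name, count in counts.items():
        print(name, count)
    print("count total:", sum(counts.values()))
```

Under this assumption the secondary metabolism share corresponds to about 37 ESTs, and the miscellaneous share to about 13, which matches the difference between the 207 ESTs obtained and the 194 with significant hits.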

The obtained RGAs were grouped according to the main resistance gene classes (Table 2) and represent the major groups of resistance genes, which include the serine/threonine kinases (2), transmembrane receptors (2), leucine-rich repeats (2), nucleotide binding sites (10) and leucine zippers (2). No hits were obtained that fall within the toll/interleukin-1 grouping. A further 18 sequences gave significant hits with functions defined either as putative resistance proteins or as proteins with known linkages to pathogen resistance, but which do not fall within the assigned groupings.

DISCUSSION
The majority of plant disease resistance genes cloned so far contain nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains. This class of R genes belongs to a superfamily that is present in both dicotyledons and monocotyledons, as suggested by sequence comparisons made between these isolated genes (Bent et al., 1994; Lagudah et al., 1997; Meyers et al., 1998). The use of PCR-based approaches with degenerate oligonucleotide primers designed from the NBS region of cloned disease resistance genes has led to the cloning of resistance gene-like sequences in several plant species (Leister et al., 1998; Seah et al., 1998; Garcia-Mas et al., 2001). Co-segregation of some of these sequences with known disease resistance gene loci has been reported.
In the present study we tested the feasibility of using such a PCR-based approach. Degenerate oligonucleotide primers designed from conserved motifs in the NBS domain were used to clone several disease resistance gene homologues from wheat lines. Out of the 207 ESTs obtained, 37 gave hits with significant homology to plant defense (E-values < 10⁻⁵). A clear bias towards obtaining resistance gene analogs was found when compared to other similar but randomized studies (Kruger et al., 2002; White et al., 2000; Yamamoto and Sasaki, 2000). In a similar study, in which the expressed genes from Fusarium graminearum infected wheat spikes were analyzed, most of the obtained nonredundant ESTs were of a miscellaneous nature, followed by sequences related to general metabolism and of importance to cell structure (Kruger et al., 2002).
The NBS and LRR domains are conserved amongst several disease resistance genes, and this has led to the strategy of cloning additional resistance genes based on their homology to these conserved sequences. The procedure can be complicated by an excess of genes that contain the NBS region but are not related to resistance genes (Yu et al., 1996). This is also true for this study, as only 8% of the RGAs could be linked to specific resistance genes and 50% could be assigned to specific groupings, whereas the others contained only the specific conserved motif. In addition, many homologous resistance genes may be located throughout the genome of a plant species. The sequence homology among these genetically independent and functionally distinct disease resistance genes therefore makes it difficult to isolate, by hybridization, individual clones that correspond to a specific resistance gene. However, the approach proved useful in the present study, as the isolated clones will be used in a future gene expression study.