Molecular cloning of WRKY transcription factor sequences in wild emmer wheat (Triticum dicoccoides)

WRKY transcription factors are one of the largest families of transcriptional regulators in plants, and the WRKY gene family is involved in several diverse pathways. WRKY genes encode one or two highly conserved DNA-binding domains, called WRKY domains, whose coding region is interrupted by an intron. In this study, 20 sequence fragments matching WRKY genes/proteins were amplified from wild emmer wheat (Triticum dicoccoides) with a pair of degenerate primers (WRKY 1 FP + WRKY 2 RP); these fragments represented 7 unique WRKY loci. The T. dicoccoides WRKY (TdWRKY) putative loci ranged in size from 207 bp (TdWRKY-7) to 562 bp (TdWRKY-2). Size differences among the 7 TdWRKY putative loci were primarily due to variation in the length of the intron contained within the WRKY domain. These introns ranged in size from 97 bp (TdWRKY-7) to 449 bp (TdWRKY-2). Based on the grouping of the 7 TdWRKY domains with 38 AtWRKY domains and the position of the putative intron in each TdWRKY locus, 5 TdWRKY loci (TdWRKY-1, TdWRKY-3, TdWRKY-4, TdWRKY-5 and TdWRKY-7) belonged to WRKY group I, and the domains of the remaining 2 loci (TdWRKY-2 and TdWRKY-6) clustered into WRKY group IIb. This research provides useful information for studying the WRKY transcription factor family in the genus Triticum L.

The WRKY proteins, one of the largest families of transcriptional regulators in plants, have a well-conserved amino acid sequence, the WRKY domain, from which their name originated (Eulgem et al., 2000; Zhang and Wang, 2005; Rushton et al., 2010). Most of the WRKY domains contain a signature WRKYGQK motif followed by a distinctive zinc-finger-like motif.
All recognized WRKY proteins contain either one or two WRKY domains. They can be classified into three main groups (I, II, and III) on the basis of both the number of WRKY domains and the features of their zinc-finger-like motif (Eulgem et al., 2000). Group II is further subdivided into five subgroups (a to e) based on additional amino acid sequences present outside the WRKY domain (Eulgem et al., 2000; Rushton et al., 2010; Agarwal et al., 2011). A common feature of WRKY genes is that an intron interrupts the coding region of the C-terminal WRKY domain of group I genes and of the single WRKY domain of group II and III genes (Eulgem et al., 2000). The size and sequence of the intron vary from gene to gene, but its position is highly conserved: it lies after the codon encoding the arginine that is N-terminal to the zinc-finger-like motif, and it aids in identifying the group/subgroup to which each gene belongs (Eulgem et al., 2000; Borrone et al., 2004).
Since the first WRKY cDNA was identified in sweet potato (Ipomoea batatas) in 1994 (Ishiguro and Nakamura, 1994), many WRKY genes have been isolated from a range of plants, such as Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa L.), cotton (Gossypium spp.), wheat and tomato (Lycopersicon esculentum M.) (Ross et al., 2007; Dong et al., 2003; Xu et al., 2004; Wu et al., 2008; Hofmann et al., 2008). As of August 2012, more than 1131 WRKY genes were registered on the NCBI (National Center for Biotechnology Information) website, including 83 from Arabidopsis (Arabidopsis thaliana), 110 from rice (Oryza sativa), 67 from corn (Zea mays) and 13 from wheat. The WRKY proteins are involved in the regulation of various physiological programs, including seed dormancy, germination, development, senescence, and plant responses to both abiotic and biotic stresses (Guo et al., 2004; Ulker and Somssich, 2004; Xie et al., 2005; Rushton et al., 2010; Zhao et al., 2012). Although the gene sequences and functions of the wheat WRKY family are not well understood, substantial progress has been made on individual WRKY genes, such as TaWRKY1, associated with freezing tolerance, TaWRKY2, associated with drought and salt tolerance, and TaWRKY19, associated with drought, salt and freezing tolerance (Houde et al., 2006; Gregersen and Holm, 2007; Wu et al., 2008; Qin, 2009; Zhao et al., 2012).
Up to now, only one WRKY sequence from T. dicoccoides has been released, and it was obtained through the isolation of differentially expressed cDNA (Ergen and Budak, 2009). WRKY genes have been reported to be obtainable with degenerate primer pairs (Borrone et al., 2004). In this study, we report the isolation and analysis of WRKY gene fragments from T. dicoccoides.
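The domain-count part of the classification scheme described above can be illustrated with a short sketch. It only searches a protein sequence for the canonical WRKYGQK heptapeptide and reports a first-pass group from the number of hits; it does not inspect the zinc-finger-like motif, so it cannot separate group II from group III, and it will miss domains carrying a variant of the heptapeptide. The function names and the toy sequences are hypothetical and are not taken from the study.

```python
import re

# Signature heptapeptide of the WRKY domain, as named in the text.
WRKY_MOTIF = "WRKYGQK"


def find_wrky_motifs(protein):
    """Return the 0-based start position of every WRKYGQK occurrence."""
    return [m.start() for m in re.finditer(WRKY_MOTIF, protein.upper())]


def coarse_group(protein):
    """First-pass group assignment from the number of WRKY motifs alone."""
    n = len(find_wrky_motifs(protein))
    if n == 2:
        return "group I"
    if n == 1:
        # The zinc-finger-like motif is needed to go further.
        return "group II or III"
    return "no canonical WRKYGQK motif found"


# Hypothetical toy sequences (assumptions for illustration, not real proteins).
two_domains = "MAAS" + "WRKYGQK" + "TTTT" + "WRKYGQK" + "LLL"
one_domain = "MAAS" + "WRKYGQK" + "LLL"

print(find_wrky_motifs(two_domains), coarse_group(two_domains))
print(find_wrky_motifs(one_domain), coarse_group(one_domain))
```

A fuller classifier would also have to examine the zinc-finger-like motif and, for group II, the sequence outside the WRKY domain, as described by Eulgem et al. (2000).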

Plant material
The four accessions (TD88 and TD89 from population J'aba, TD98 from population Amirim and TD129 from population Dalliya) were provided by the cereal gene bank of the Institute of Evolution, University of Haifa.

DNA isolation
Seeds of all genotypes were germinated in darkness at 23°C for 1 week. Young leaves were harvested and ground to powder in liquid nitrogen. Genomic DNA was extracted by the CTAB method (Murray and Thompson, 1980) and diluted to 50 ng/μl for PCR.

PCR amplification
The degenerate primer pair WRKY 1 FP and WRKY 2 RP was published by Borrone et al. (2004). Detailed primer information is given in Table 1 and Figure 1. WRKY analysis was conducted according to previously established protocols with minor modifications (Borrone et al., 2004). Each 20 μl PCR reaction mixture consisted of 2.0 μl of 10 × PCR buffer (2 mM of MgCl2), 300 μmol dNTPs, 0.3 μmol primers, 100 ng genomic DNA, and 1 U Taq polymerase. The PCR program was: 94°C for 5 min, followed by 35 cycles of 30 s denaturation at 94°C, 45 s annealing at 55°C, and 30 s elongation at 72°C, with a final elongation step at 72°C for 5 min. The PCR products were separated on 1.2% agarose gels, and the targeted DNA fragments were recovered and cloned into the pGEM-T Easy vector (Promega). The ligation products were transformed into Escherichia coli (DH5α) cells, and the recombinant plasmids were screened and used as sequencing templates.

Bioinformatic analysis
Individual sequences and the consensus sequence of each gene fragment were putatively identified using BLASTN, BLASTX and TBLASTX searches (Altschul et al., 1997). The nucleotide sequences encoding the WRKY domain region, with or without the intron, and those of the putative open reading frames (ORFs) were aligned using DNAman 5.2.2 (http://www.lynnon.com) and CLUSTAL W 1.81 (Thompson et al., 1994). The phylogenetic tree was generated with the neighbour-joining (NJ) distance method (Saitou and Nei, 1987) and drawn and edited with the MEGA 3.1 program (Kumar et al., 2004). The distance used for the NJ grouping was 0.05, and bootstrap values were estimated from 500 replications.
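The bootstrap step mentioned above can be sketched as follows. This is only an illustration of the resampling idea, in which alignment columns are drawn with replacement to build each replicate; it is not the MEGA implementation, and tree building on each replicate is left out. The function name and the toy alignment are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


# Hypothetical toy alignment (not data from the study).
toy = ["WRKYGQKVVK", "WRKYGQKHVK", "WRKYGQKPIR"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(500)]
print(len(replicates), replicates[0])
```

In a full analysis, a tree would be inferred from every replicate, and the bootstrap value of a node would be the percentage of the 500 replicate trees that contain it.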

RESULTS
Five distinct bands ranging in size from 100 to 700 bp were observed in each of the 4 tested T. dicoccoides accessions using the degenerate primer pair WRKY 1 FP + WRKY 2 RP. Of the 54 randomly selected colonies that were sequenced, 20 (37%) were identified as WRKY genes/proteins by BLASTX and BLASTN or showed the features of WRKY genes/proteins. Most (15) of the 20 identified WRKY sequences ranged in size from 207 bp to 341 bp.
On the basis of multiple sequence alignment of the WRKY domain, the 20 sequenced clones represented fragments of 7 unique WRKY loci (Table 2, Figure 2). For each of the 7 loci, the translated amino acid sequences of all clones were completely identical. When the coding regions of the WRKY domains of multiple clones representing a single TdWRKY putative locus were compared, the nucleotide sequences of the overlapping segments were more than 96% identical. When the introns were included in the comparison, the overlapping segments of clones representing a single TdWRKY putative locus were more than 94% identical, except for TdWRKY-5, for which the identity was 73.33%. The 7 putative loci differed from one another in the nucleotide or amino acid sequence of the coding region of the WRKY domain. The highest nucleotide identity between two TdWRKY putative loci for the WRKY domain coding region was 85%, between TdWRKY-1 and TdWRKY-4. As the DNA-binding WRKY domain was expected to be the most conserved region of the gene, it was concluded that each identified TdWRKY putative locus represented an individual gene locus. Among the 7 TdWRKY putative loci, four (TdWRKY-1, TdWRKY-4, TdWRKY-5 and TdWRKY-6) were each represented by 2 clones, while the remaining three were represented by 5, 4 and 3 clones, respectively. The TdWRKY putative loci ranged in size from 207 bp (TdWRKY-7) to 562 bp (TdWRKY-2). Each of the 7 putative loci contained only the C-terminal WRKY domain.
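The identity figures reported above compare overlapping segments of aligned clones. A minimal sketch of such a comparison is given here, assuming identity is counted over aligned columns in which neither sequence has a gap; the exact convention used by the alignment software in the study is not stated, so this choice is an assumption. The two clone fragments are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical positions over aligned columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no overlapping ungapped positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments from two clones (one mismatch in 20 sites).
clone_a = "TGGAGGAAGTACGGGCAGAA"
clone_b = "TGGAGGAAATACGGGCAGAA"
print(percent_identity(clone_a, clone_b))
```

Applied to every pair of clones assigned to one putative locus, a function of this kind would yield the within-locus identity values, and applied between loci it would yield the between-locus values.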
The size differences among the 7 TdWRKY putative loci were primarily due to variation in the length of the intron contained within the WRKY domain. When the introns were included, it was not possible to align all 20 nucleotide sequences of the 7 TdWRKY putative loci, or of their WRKY domains, because the introns varied so widely in length and sequence. These introns ranged in size from 97 bp (TdWRKY-7) to 449 bp (TdWRKY-2). The introns within the domains were at positions identical to those described in Arabidopsis for each group and subgroup (Eulgem et al., 2000). Given the design of the degenerate primers, all the putative loci should belong to group I or group II, subgroups a, b or c. In two of the TdWRKY loci (-2, -7), the putative intron followed the conserved CX5C motif of the zinc finger, which identified them as group II, subgroup a or b, WRKY genes. In the other five TdWRKY loci, the putative intron was followed by the conserved CX4C motif and the whole zinc finger, which identified them as group I or group II, subgroup c, WRKY genes.
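The statement that intron length accounts for most of the size variation can be checked directly from the two extreme loci reported above. The short calculation below subtracts the intron length from the locus length for TdWRKY-7 and TdWRKY-2; the variable and function names are illustrative only, and the remaining five loci are omitted because their individual sizes are not given in the text.

```python
# (locus length, intron length) in bp, as reported in the text.
reported = {
    "TdWRKY-7": (207, 97),
    "TdWRKY-2": (562, 449),
}


def coding_length(locus_bp, intron_bp):
    """Length of the fragment that remains once the intron is removed."""
    if intron_bp > locus_bp:
        raise ValueError("intron cannot be longer than the locus")
    return locus_bp - intron_bp


coding = {name: coding_length(*sizes) for name, sizes in reported.items()}
locus_span = reported["TdWRKY-2"][0] - reported["TdWRKY-7"][0]
intron_span = reported["TdWRKY-2"][1] - reported["TdWRKY-7"][1]

print(coding)                   # coding portion of each fragment
print(locus_span, intron_span)  # difference in locus size vs. intron size
```

The coding portions of the shortest and longest loci differ by only 3 bp (110 bp and 113 bp), whereas the introns differ by 352 bp of the 355 bp difference in locus length.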
Group assignments were made by aligning the translated amino acid sequences of the 7 TdWRKY domains with 38 AtWRKY domains from group I and group II, subgroups a, b and c, obtained from Eulgem et al. (2000) (Figure 3). The WRKY domains separated into distinct clusters representing each WRKY group and subgroup. The tree generated was consistent with trees from previous analyses (Eulgem et al., 2000). The association of individual TdWRKY domains with one another received high bootstrap support (>60%). For example, TdWRKY-7, TdWRKY-1 and TdWRKY-4 clustered together.

DISCUSSION
WRKY genes have been identified by three methods: sequence analysis of entire genomes, such as those of A. thaliana (The Arabidopsis Genome Initiative, 2000) and O. sativa (Goff et al., 2002; Yu et al., 2002), or of EST databases, such as that of T. aestivum (Qin, 2009; Niu et al., 2012); isolation of differentially expressed cDNA (Alexandrova and Conger, 2002; Hara et al., 2000; Hinderhofer and Zentgraf, 2001; Huang and Duman, 2002); and isolation with degenerate PCR primers (Chen and Chen, 2000; Trognitz et al., 2002; Borrone et al., 2004; Mauro-Herrera et al., 2006). PCR cloning is an effective, cheap, easy and fast way to obtain WRKY genes from plants with large, unsequenced genomes.
In this research, 20 nucleotide sequences containing a WRKY domain were successfully isolated from T. dicoccoides using one degenerate PCR primer pair, WRKY 1 FP + WRKY 2 RP. Until now, the NCBI database has held only one WRKY nucleotide sequence from T. dicoccoides, which was obtained through the isolation of differentially expressed cDNA (Ergen and Budak, 2009). The size and sequence variability of the intron, the position of the intron within the WRKY domain and the conserved amino acid motifs within the WRKY domain were used to classify the cloned PCR products into 7 groups. In the nucleotide and amino acid alignments of the most conserved portion of the TdWRKY loci, the DNA-binding WRKY domain, the TdWRKY loci were distinct from one another.
According to one-by-one BLAST comparisons of the 7 putative TdWRKY loci against other WRKY sequences in the NCBI database, 7 WRKY sequences (TaWRKY1, BnWRKY6-1, AtWRKY44, BnWRKY2, BnWRKY33-1, BnWRKY72 and HvWRKY6) showed the highest similarity to the TdWRKY sequences (from TdWRKY-1 to TdWRKY-7) (Table 2). The BnWRKY genes respond to fungal pathogens and hormone treatments (Yang et al., 2009), so the TdWRKY transcription factors may be involved in similar responses. Further research is needed to obtain the full-length sequences of the 7 putative loci from T. dicoccoides and to analyze their functions. In addition, the progress already made on individual WRKY genes and their functions in wheat will provide useful information for studying TdWRKY genes and their functions.
Given the design of the primers, the WRKY loci obtained should belong to group I or group II, subgroups a, b or c. All four types of WRKY putative loci were obtained from T. cacao (Borrone et al., 2004), three quarters of which belonged to group I and IIb. In our T. dicoccoides study, two types, group I and group IIb, were obtained. We therefore presume that this primer pair (WRKY 1 FP + WRKY 2 RP) preferentially amplifies group I (especially the C-terminal WRKY domain) and group IIb sequences from the T. cacao and T. dicoccoides genomes. In previous research, no group IIb members were found among the 61 and 43 WRKY genes assigned from the T. aestivum cDNA database (Qin, 2009) and the wheat EST database (Niu et al., 2012), respectively, and none of the 15 genes encoding WRKY transcription factors in wheat belonged to group IIb (Wu et al., 2008). In contrast, 2 of the 7 TdWRKY putative loci in our research belonged to group IIb. This primer pair may therefore provide an effective method for isolating group IIb genes from T. aestivum. However, because the forward degenerate primer WRKY 1 FP has two potential binding sites (Figure 1), the primer pair could in theory amplify sequences containing both the N-terminal and the C-terminal WRKY domains. In T. dicoccoides, only the C-terminal WRKY domains of group I were obtained, and in T. cacao only 6 of 324 sequenced clones contained the N-terminal domain (Borrone et al., 2004). We can therefore infer that this primer pair (WRKY 1 FP + WRKY 2 RP) preferentially amplifies the C-terminal WRKY domain. The reverse primer WRKY 2 RP determined the specificity of the WRKY 1 FP + WRKY 2 RP combination for group I and group II, subgroups a-c, WRKY genes. WRKY domains of group II, subgroups d and e, and group III WRKY proteins differ in their amino acid sequence at the C-terminal portion of the zinc-finger motif (Eulgem et al., 2000). It was therefore expected that WRKY genes belonging to these groups would not be amplified. None were identified in T. cacao (Borrone et al., 2004) or in T. dicoccoides. This suggests that degenerate primers could be designed to specifically amplify WRKY genes from each group or subgroup.
N/S/G/A/D)(H/Q) Group I, C-terminal WRKY domain and Group II, subgroups a-c
a FP, forward primer; RP, reverse primer.
b The nucleotide sequence is given in the 5'-3' direction using the standard IUB code, where M = A or C; R = A or G; Y = C or T; B = T, C, or G; N = A, T, C, or G; I = inosine. Amino acids are given in the standard one-letter code.
c Deg, overall degeneracy of the primer omitting inosine.
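The "overall degeneracy omitting inosine" defined in note c can be computed by multiplying the number of bases that each IUB symbol stands for and skipping every inosine. The sketch below does this for a made-up primer. The primer string is hypothetical and is not the WRKY 1 FP or WRKY 2 RP sequence, which is not reproduced in this text; the symbols W, S, K, H, D and V are standard IUB codes added for completeness and do not appear in note b.

```python
# Bases represented by each IUB symbol. M, R, Y, B and N are listed in
# note b; W, S, K, H, D and V are the other standard ambiguity codes.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "Y": "CT", "B": "CGT", "N": "ACGT",
    "W": "AT", "S": "CG", "K": "GT", "H": "ACT", "D": "AGT", "V": "ACG",
}


def degeneracy(primer):
    """Number of distinct sequences a degenerate primer encodes.

    Inosine (I) is skipped, following the convention in note c.
    An unknown symbol raises KeyError.
    """
    total = 1
    for base in primer.upper():
        if base == "I":
            continue
        total *= len(IUB[base])
    return total


# Hypothetical primer for illustration only (not a primer from the study).
hypothetical_primer = "TGGMGIAARTAYGGNCA"
print(degeneracy(hypothetical_primer))
```

In this made-up primer, M, R and Y each contribute a factor of 2 and N a factor of 4, while the inosine contributes nothing to the product.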

Figure 1. A scheme depicting the organization of group I and group II, subgroups a-c, WRKY proteins and the expected results from PCR with WRKY 1 FP + WRKY 2 RP, based only upon the potential binding sites for the primers. Arrows indicate the approximate locations of potential binding sites for the degenerate primers: 1, WRKY 1 FP; 2, WRKY 2 RP. The WRKY domain is indicated by the black box, and the intron interrupting the WRKY domain is indicated by diagonal lines. The locations of other introns are not indicated. The detailed information is from Borrone et al. (2004).

Figure 2. Alignment of the deduced amino acid sequences of the WRKY gene sequences from T. dicoccoides. The serial numbers on the left denote the deduced amino acid sequences from the following WRKY sequences. TdWRKY-1: 1 and 15; 7, 8, 18 and 19; 9, and 14; 11, 12 and 20.

Figure 3. Phylograms depicting the relationship of TdWRKY domains with Arabidopsis thaliana (AtWRKY) domains. AtWRKY domains are indicated by number only, that is, 1 = AtWRKY1. Bootstrap support is given as a percentage of 500 datasets at each node, and groups/subgroups are designated as previously described for AtWRKY sequences (Eulgem et al., 2000). The complete tree and a list of AtWRKY accession numbers used are provided in the ESM.
a Clones indicate the number of clones sequenced from the degenerate PCR found to represent each locus.
b Lengths do not include the degenerate primer binding sites.
c Ta, Triticum aestivum; Bn, Brassica napus; At, Arabidopsis thaliana; Hv, Hordeum vulgare.