In silico characterization and evolutionary study of the alcohol dehydrogenase gene from Phoenix dactylifera L. cv. Deglet Nour

The aim of our study was to isolate the alcohol dehydrogenase (ADH) mRNA from Phoenix dactylifera and to examine the molecular evolutionary history of this nuclear gene together with other ADH genes from palms and other plant species. The DnADH gene was isolated in silico with Blast2GO from a cDNA library of date palm cv. Deglet Nour. Candidate mRNA and protein sequences for the ADH gene of cv. Khalas were predicted in silico from a whole-genome shotgun sequence (ACYX02009373.1) using the FGENESH gene prediction program. Nucleotide polymorphism was examined with DnaSP v5 in four palm ADH mRNA sequences across the entire 1.098 kb length of the ADH mRNA. A primary conclusion of the present study is that nucleotide diversity for ADH between palm species is very low. To assess selective pressure, we calculated the ratio of non-synonymous to synonymous substitutions (dN/dS). We conclude that palm ADH genes appear to be under very different selective constraints. Phylogenetic analyses using PHYLIP and Notung 2.8 suggest that the ADH genes of some plant species resulted from relatively ancient duplication events. In this study, we present for the first time a molecular characterization of the ADH protein of P. dactylifera L. cv. Deglet Nour and a phylogenetic analysis of plant ADHs.


INTRODUCTION
Sequencing the date palm genome and its cDNAs or expressed sequence tags (ESTs) with next-generation sequencing provides a rapid method for gene discovery and can be used to identify transcripts associated with specific biological processes (Al-Mssallem et al., 2013).
The alcohol dehydrogenase (ADH) genes encode a glycolytic enzyme and have been characterized at the molecular level in a wide range of flowering plants (Clegg et al., 1997; Miyashita et al., 2001), including the California fan palm (Washingtonia robusta) (Morton et al., 1996) and oil palm
*Corresponding author. E-mail: imenbmc@yahoo.fr. Tel: 0021674676616. Fax: 0021674274437.
Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
Abbreviations: EST, expressed sequence tags; ADH, alcohol dehydrogenase; CDD, Conserved Domains Database; ML, maximum likelihood; SNP, single nucleotide polymorphisms.
Recently, Small and Wendel (Small et al., 2000) suggested that some ADH gene duplications may have predated the origin of each of the flowering plant families. However, the details of the gene duplications and deletions experienced by the ADH genes of most groups of the angiosperms remain unclear. Additional studies are needed to understand the evolutionary history of the ADH genes in various plant groups. A number of studies have been conducted on the evolutionary dynamics of plant gene families, including the gene families coding for the R and MADS-box regulatory proteins (Purugganan et al., 1995), the small heat-shock proteins (Waters et al., 1995), chalcone synthase (Durbin et al., 1995), and the chlorophyll a/b binding proteins (Demmin et al., 1989). Most of these gene families consist of numerous loci and have a great deal of variation in copy number between species. The evolutionary picture emerging for these gene families is one of dynamic fluctuations of copy number through multiple gene duplication/deletion events.
The glycolytic proteins in plants are encoded by small multigene families, which provide an interesting contrast to the high-copy-number gene families studied to date. Isozyme surveys covering an array of dicot and monocot species have revealed that most glycolytic enzymes have two forms in all species (Gottlieb et al., 1982), probably reflecting a small and stable number of loci. The apparent stability of these gene families raises important questions about their evolutionary dynamics. One issue is whether any given gene family emerged once by duplication and then differentiated, as suggested by Gottlieb (Gottlieb et al., 1982). An alternative view posits a continuous, albeit slow, flux of gene duplication and loss that leads to an approximate dynamic equilibrium in copy number. The narrow range of gene family size for glycolytic enzymes suggests that additional constraints may act to determine copy number for this important class of genes.
An analysis of animal and plant ADH genes indicated that the grass ADH1 and ADH2 genes diverged after the divergence of monocots and dicots (Yokoyama et al., 1993). This result provides evidence that the gene family did not emerge from a single duplication event early in angiosperm evolution. Additionally, the isolation of a recent duplication product in barley (Trick et al., 1988), as well as the duplication of ADH1 in other species, suggests that the gene family undergoes some fluctuation in copy number. Here, we report the isolation of ADH mRNA from Phoenix dactylifera L. cv. Deglet Nour and the prediction of ADH from P. dactylifera L. cv. Khalas. To study ADH gene evolution in recalcitrant vs. orthodox palm species, we compared the ADH mRNAs and proteins of Khalas, Deglet Nour, E. guineensis and W. robusta. We also investigated the molecular evolutionary history of palm ADH genes together with those of other species to better understand the evolutionary dynamics of nuclear gene families.

cDNA library normalization and in silico isolation of DnADH
Fresh leaf tissue from P. dactylifera L. cv. Deglet Nour was processed and flash frozen in liquid nitrogen. Tissues were immediately sent to Bio S&T Inc. (Montreal, QC, Canada), where RNA extraction, cDNA synthesis and normalization were performed. Briefly, RNA was extracted using a modified TRIzol method (Invitrogen, USA). cDNA was synthesized from 18 μg of total RNA by a modified SMART™ cDNA synthesis method and then normalized by a modified normalization method in which full-length cDNA was synthesized with two sets of primers for driver and tester cDNA. Single-stranded cDNA was used for hybridization instead of double-stranded cDNA: an excess of sense-stranded cDNA was hybridized with antisense-stranded cDNA. After hybridization, duplex DNA was removed by hydroxyapatite chromatography. Normalized tester cDNA was re-amplified by PCR and purified with the tester-specific primer L4N, whereas driver cDNA could not be amplified with the L4N primer. The re-amplified cDNA was size-fractionated on a 1% agarose gel. cDNA fragments larger than 0.5 kb were purified by electroelution and, after their concentrations were determined, precipitated and stored in 80% EtOH at −80°C. The normalized cDNA library was then prepared for sequencing: approximately 8 μg of purified cDNA was sheared into small fragments with a Covaris E210 Acoustic Focusing Instrument and sequenced on three-quarters of a 454 plate on a 454 GS-FLX Titanium platform (Roche). To identify the DnADH cDNA, assembled contigs were analysed with Blast2GO 2.8 (Conesa and Götz, 2008), which provides Gene Ontology, BLAST and domain/InterPro annotation. The predicted DnADH protein was evaluated by identifying domains in the NCBI Conserved Domains Database (CDD), Phytozome (June 2013 release; http://www.phytozome.net/) and HMMER v3.0 (Eddy, 2009).

mRNA and protein prediction of KhADH
Candidate mRNA and protein sequences for the ADH gene of P. dactylifera L. cv. Khalas (KhADH) were identified in silico from a whole-genome shotgun sequence (accession number ACYX02009373.1) using the FGENESH gene prediction program. Our annotation approach combined basic gene prediction tools with BLASTN and BLASTP searches to improve the accuracy of the predicted gene model. Sequence motifs related to function were identified with PFAM (Henikoff et al., 1999) and PROSITE (Altschul et al., 1998).
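FGENESH predicts exon-intron structure with a hidden Markov model and cannot be reproduced in a few lines. As a much simpler, hypothetical stand-in, the Python sketch below assumes an already spliced mRNA (no introns), scans the forward strand for the longest ATG-initiated open reading frame and translates it with the standard genetic code. The function names and the toy sequence are illustrative only; this is not the pipeline used in the study.

```python
# Hypothetical longest-ORF sketch: a simple stand-in for gene prediction, NOT FGENESH
# (which models exons, introns and splice sites). Assumes a spliced mRNA, forward strand only.

BASES = "TCAG"
# Standard genetic code: codons enumerated in TCAG order at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons become 'X', a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def longest_orf(mrna: str) -> tuple[int, str]:
    """Return (0-based start, protein) of the longest ATG...stop ORF on the forward strand."""
    seq = mrna.upper().replace("U", "T")
    best_start, best_protein = -1, ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open a candidate ORF at the first ATG in this frame
            elif CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])  # the stop codon itself is not translated
                if len(protein) > len(best_protein):
                    best_start, best_protein = start, protein
                start = None  # look for the next ATG downstream
    return best_start, best_protein


if __name__ == "__main__":
    toy_mrna = "CCATGGCTTGTCATTAAGG"  # hypothetical toy sequence, not date palm data
    print(longest_orf(toy_mrna))  # (2, 'MACH')
```

A real annotation would also scan the reverse strand and check the translation against BLASTP hits, as described above.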

Sequence analysis
The ADH sequences used in this study were obtained from the GenBank/EMBL/DDBJ databases (Table 1). The four palm ADH protein sequences (Table 1) were aligned with ClustalX (Thompson et al., 1997) and displayed with ESPript (Robert and Gouet, 2014). The number of segregating sites, the nucleotide diversity π (the average number of nucleotide differences per site between two sequences; Nei et al., 1987) and θ, an estimate of 4Neμ where Ne is the effective population size and μ is the mutation rate per nucleotide (Watterson et al., 1975), were computed in DnaSP (version 5.10.00). Tajima's D test and Fu and Li's D test were also performed in DnaSP to test for departures from neutrality (Librado and Rozas, 2009). Divergence distances between the palm mRNA sequences were estimated under the Kimura two-parameter model (Kimura, 1980) implemented in PHYLIP (Felsenstein, 2000), with a transition/transversion ratio of 2.0. To obtain information about functional constraints on palm ADH sequences, we estimated the number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous site (dN) with the PAML yn00 program using default parameters (Yang, 1979) and the Yang and Nielsen (2000) method. A distance matrix based on the aligned amino acid sequences was constructed with the Jones-Taylor-Thornton (JTT) model in the PROTDIST program of PHYLIP.
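The Kimura two-parameter distance separates transitions (A↔G, C↔T) from transversions. With P and Q the proportions of transition and transversion differences, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The Python sketch below is a minimal illustration that applies Kimura's closed-form estimator and skips gapped or ambiguous columns; PHYLIP may handle gaps and the user-supplied transition/transversion ratio differently, so its values need not match exactly.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned DNA sequences.

    Columns with a gap or ambiguous base in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Hypothetical toy alignment: one transition in ten comparable sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```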
The phylogenetic relationships among the 38 ADH proteins (Table 2) from different species were analyzed with the maximum-likelihood (ML) method, using the PROTML program of PHYLIP version 3.6 (Felsenstein, 2000) and the JTT model of amino acid substitution. All indels were treated as missing data. We performed ten random-sequence-addition searches (J option) with global branch swapping (G option) to find the ML tree with the best log-likelihood. In addition, we performed bootstrap analysis with 100 replicates. To infer the evolutionary events affecting the ADH genes, an analysis was performed with Notung 2.8 (Chen et al., 2000): the ML tree with the highest log-likelihood was used as the gene tree, and both gene duplications and losses were considered when reconciling the gene tree with the species tree. Evidence of recombination was sought with RDP4 (version 4.16) (Martin et al., 2010).
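Bootstrap analysis resamples alignment columns with replacement, rebuilds a tree from each pseudo-replicate, and reports how often each grouping recurs; in PHYLIP this is usually done with SEQBOOT, the tree-building program and CONSENSE. Because building ML trees is out of scope here, the Python sketch below replaces the tree with a deliberately crude, hypothetical proxy (the pair of sequences with the smallest p-distance in each replicate). It only illustrates the resampling logic.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudo-replicate: sample alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    sites = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not sites:
        return float("inf")
    return sum(a != b for a, b in sites) / len(sites)


def closest_pair(alignment: dict[str, str]) -> tuple[str, str]:
    """Hypothetical stand-in for a tree: the pair of sequences with the smallest p-distance."""
    names = sorted(alignment)
    candidates = [
        (p_distance(alignment[x], alignment[y]), (x, y))
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    return min(candidates)[1]  # ties broken alphabetically


def closest_pair_support(alignment: dict[str, str], replicates: int = 100, seed: int = 1) -> dict:
    """Fraction of bootstrap replicates in which each pair is the closest one."""
    rng = random.Random(seed)
    counts = Counter(
        closest_pair(bootstrap_alignment(alignment, rng)) for _ in range(replicates)
    )
    return {pair: n / replicates for pair, n in counts.items()}


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGT", "C": "TGCATGCA"}  # hypothetical toy alignment
    print(closest_pair_support(toy))  # {('A', 'B'): 1.0}
```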

Sequence analysis of isolated DnADH and predicted KhADH proteins
DnADH (the ADH gene of P. dactylifera L. cv. Deglet Nour) and KhADH (the ADH gene of P. dactylifera L. cv. Khalas) encode proteins of 380 residues, with predicted molecular weights of 41.14 and 41.18 kDa and isoelectric points of 6.16 and 6.59, respectively. The alignment of DnADH with other palm ADH proteins shows a large number of conserved domains (Figure 1) that are typical of this sub-family (Chase, 1999). Amino acid identity between DnADH and the other palm ADHs is very high, ranging from 78 to 91%, and the genetic distances among the four proteins are very low (Table 3). Many highly conserved amino acids implicated in zinc binding are present in DnADH: Cys48, His70 and Cys178 (Figure 1), and four Cys residues at positions 100, 103, 106 and 114 (Figure 1) (Eklund et al., 1976; Yokoyama and Harry, 1993). The Asp at position 237 of DnADH has been implicated in the preference for NAD as cofactor in the dehydrogenase reaction (Eklund et al., 1976; Fan et al., 1991).
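Identity values such as the 78 to 91% range above depend on how alignment gaps are counted. The Python sketch below assumes one common convention (identity over columns where neither sequence has a gap) and also shows how one might check that residues such as the zinc-binding Cys/His are present at given 1-based positions of an aligned, gapped sequence. The toy fragments are hypothetical; the real DnADH sequence is not reproduced here.

```python
def percent_identity(s1: str, s2: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def residue_at(aligned: str, position: int) -> str:
    """Residue at a 1-based position of the ungapped sequence."""
    return aligned.replace("-", "")[position - 1]


def check_sites(aligned: str, expected: dict[int, str]) -> dict[int, bool]:
    """For each 1-based position, report whether the expected residue is present."""
    return {pos: residue_at(aligned, pos).upper() == aa for pos, aa in expected.items()}


if __name__ == "__main__":
    # Hypothetical toy fragments, not real palm ADH sequences.
    print(percent_identity("MAC-H", "MSCKH"))             # 75.0
    print(check_sites("MA-CH", {1: "M", 3: "C", 4: "H"}))  # {1: True, 3: True, 4: True}
```

With a real aligned DnADH sequence, the same call with `{48: "C", 70: "H", 178: "C", 237: "D"}` would test the positions named above.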

Divergence of the palm ADH loci
Pairwise distances based on the Kimura two-parameter model for the mRNA sequences of the four palm ADH loci are given in Table 4. Two points are apparent from this table. First, the KhADH and EgADH mRNAs are the most similar. Second, the palm ADH genes are moderately diverged, which must reflect a duplication event.

Sequence diversity of ADH gene between palm species
Nucleotide polymorphism was examined in four palm ADH mRNA sequences across the entire length of the palm ADH mRNA. The examination yielded 48 single nucleotide polymorphisms (SNPs) and 2 insertions/deletions (indels) in this region (1.098 kb). Nucleotide diversity π of the entire mRNA sequence was 0.15523 and θ was 0.10594. Several statistical tests were used to test the hypothesis that the ADH sequences have been evolving in accordance with the expectations of neutral theory. Tajima's D (-0.75403) and Fu and Li's D (-0.34314) compare different estimates of θ (4Neμ) and π; they assume that the four ADH sequences have a […] et al., 1995; Wayne and Simonsen, 1998). None of these tests returned significant P values. This is not surprising, given the small number of variable positions and the relatively low statistical power of these tests (Wayne and Simonsen, 1998). A primary conclusion of the present study is that nucleotide diversity for ADH between palm species is very low.
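To show how π, Watterson's θ and Tajima's D relate, the Python sketch below computes them from an alignment with the standard formulas of Nei, Watterson and Tajima (1989). It assumes that columns with gaps or ambiguous bases are simply dropped and that a site with more than two alleles counts once; DnaSP offers other options, so its results may differ. The variance term of Tajima's D is zero for fewer than four sequences, so the sketch requires n ≥ 4.

```python
import math
from itertools import combinations


def diversity_stats(seqs: list[str]) -> dict[str, float]:
    """Segregating sites, nucleotide diversity (pi), Watterson's theta and Tajima's D.

    Assumptions of this sketch: columns with a gap or ambiguous base in any sequence
    are dropped, and a site with more than two alleles counts as one segregating site.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences (its variance is 0 below that)")
    columns = [col for col in zip(*(s.upper() for s in seqs)) if all(b in "ACGT" for b in col)]
    n_sites = len(columns)
    if n_sites == 0:
        raise ValueError("no usable alignment columns")
    seg = sum(len(set(col)) > 1 for col in columns)

    # k: average number of pairwise differences between sequences.
    pairs = list(combinations(range(n), 2))
    k = sum(sum(col[i] != col[j] for col in columns) for i, j in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    tajima_d = float("nan")
    if seg > 0:
        tajima_d = (k - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))
    return {
        "sites": n_sites,
        "segregating_sites": seg,
        "pi": k / n_sites,              # per-site nucleotide diversity
        "theta_w": seg / a1 / n_sites,  # per-site Watterson estimator
        "tajima_d": tajima_d,
    }


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical toy alignment
    print(diversity_stats(toy))
```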

The estimates reported here are lower than previously reported values not only for plant ADH sequences (Cummings and Clegg, 1998; Liu et al., 1998) but also for other plant nuclear genes, such as C1 in maize (Hanson et al., 1996), ChiA in Arabidopsis (Kawabe et al., 1997), ChsA in Ipomoea (Huttley et al., 1997) and Pgi in Dioscorea (Terachi and Miyashita, 1997). Tests for gene conversion among the four palm mRNA sequences using the MaxChi method in RDP v4.16 detected two recombination events: one between DnADH and WrADH (ADH from Washingtonia robusta), with KhADH as the recombinant (P = 2.97 × 10⁻², length 801 nt); and one between DnADH and WrADH, with EgADH (ADH from E. guineensis) as the recombinant (P = 1.45 × 10⁻², length 793 nt).
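MaxChi (Maynard Smith's maximum chi-square method) looks for a point along the alignment where the pattern of matches and mismatches between two sequences changes sharply. The Python sketch below is a simplified, single-breakpoint version under stated assumptions: it uses only sites that are variable in the whole alignment, tries every breakpoint, and returns the one maximizing the 2×2 chi-square of (left/right) × (match/mismatch). RDP4 adds windowing and multiple-comparison-corrected P values, which this sketch does not attempt; it reports no P value.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi(alignment: list[str], i: int, j: int) -> tuple[int, float]:
    """Simplified MaxChi scan for sequences i and j of an alignment.

    Only gap-free columns that are variable in the whole alignment are used.
    Returns (alignment column where the right-hand segment starts, max chi-square).
    """
    variable = [
        pos for pos, col in enumerate(zip(*alignment))
        if "-" not in col and len(set(col)) > 1
    ]
    if len(variable) < 2:
        raise ValueError("need at least two variable sites")
    match = [alignment[i][p] == alignment[j][p] for p in variable]
    total_match = sum(match)
    best_pos, best_chi = -1, -1.0
    left_match = 0
    for k in range(1, len(variable)):      # breakpoint between variable sites k-1 and k
        left_match += match[k - 1]
        a, b = left_match, k - left_match  # matches / mismatches left of the breakpoint
        c = total_match - left_match       # matches right of the breakpoint
        d = (len(variable) - k) - c        # mismatches right of the breakpoint
        chi = chi_square_2x2(a, b, c, d)
        if chi > best_chi:
            best_pos, best_chi = variable[k], chi
    return best_pos, best_chi


if __name__ == "__main__":
    # Hypothetical toy alignment: sequence 1 matches sequence 0 only in its first half.
    toy = ["AAAAAAAA", "AAAATTTT", "TTTTAAAA"]
    print(max_chi(toy, 0, 1))  # (4, 8.0)
```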

Selection pressure
To assess selective pressure, we calculated the ratio of non-synonymous to synonymous substitution rates (dN/dS) among the palm ADH mRNAs. Estimates of dN and dS for the entire coding region of the four palm mRNAs are given in Table 5. Comparisons among these plant ADH genes show dN/dS < 0.3 (Table 5). For all genes, dS exceeded dN in both comparisons, as expected for genes under purifying selection (Nei, 1987). All comparisons with P values ≤ 0.001 remain significant after correction for multiple tests. The same results were found in other studies of ADH evolution (Yokoyama et al., 1990). Comparing the dN/dS ratios of the palm ADH genes, we found that the EgADH-KhADH ratio has the lowest value. We therefore conclude that palm ADH genes appear to be under very different selective constraints. This result is consistent with reports that dN/dS ratios are lower for duplicated genes than for unique genes (Davis et al., 2004; Jordan et al., 2004).
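To make dN and dS concrete, the Python sketch below implements the simpler Nei-Gojobori (1986) counting method with a Jukes-Cantor correction. It is not the Yang and Nielsen (2000) method run in PAML yn00, so its values will not reproduce Table 5. Synonymous sites are counted per codon, differences are averaged over all mutational pathways that avoid stop codons, and codons containing gaps, ambiguous bases or stops are skipped (assumptions of this sketch). A dN/dS ratio well below 1 is the signature of purifying selection discussed above.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order at each position.
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Nei-Gojobori synonymous sites of one codon (changes to a stop count as nonsynonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1.0 / 3.0
    return sites


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over pathways that avoid stop codons."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for p in order:
            nxt = current[:p] + c2[p] + current[p + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon: discard it
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"every pathway from {c1} to {c2} passes through a stop codon")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion of differences too high for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nei_gojobori(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and a multiple of 3 long")
    syn_sites = non_sites = syn_diff = non_diff = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip codons with gaps/ambiguous bases and stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        syn_sites += s
        non_sites += 3.0 - s
        sd, nd = codon_differences(c1, c2)
        syn_diff += sd
        non_diff += nd
    if syn_sites == 0 or non_sites == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(non_diff / non_sites), jukes_cantor(syn_diff / syn_sites)


if __name__ == "__main__":
    # Hypothetical toy pair differing by one synonymous change (CTT -> CTC, both Leu).
    dn, ds = nei_gojobori("CTTAAAGGG", "CTCAAAGGG")
    print(dn, ds)
```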

Phylogenetic analyses
The palm family emerged ~80 million years ago and as such represents one of the lineages that radiated early in monocot evolution (Wilson et al., 1990; Duvall et al., 1993). A comparative analysis of these four palm ADH sequences therefore presents an ideal opportunity to investigate the dynamics of angiosperm gene family evolution and, in particular, to expand our understanding of the evolution of the ADH gene family. A previous analysis of ADH provided evidence against ADH genes in grasses having emerged from a single duplication event early in the evolutionary history of the angiosperms (Yokoyama and Harry, 1993). We conducted phylogenetic analyses of the ADH genes using the sequence from Aspergillus niger as outgroup. To determine the phylogenetic position of the palm ADH genes isolated and predicted in this study, we subjected their sequences to ML analysis using a data set that included previously published ADH gene family sequences from various phylogenetic groups (Clegg et al., 1997; Small et al., 2000). The resulting ADH gene tree consisted roughly of three monophyletic groups, denoted Clade I, Clade II and Clade III (Figure 2). Clade I contains only ADH genes from Papilionoideae species, while Clade II contains ADH genes from rosid species, including the Brassicaceae. The palm ADH proteins grouped with eudicot species in Clade III and not with the commelinid (monocot) cluster (Figure 2).
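Duplication-loss reconciliation tools such as Notung build on the least-common-ancestor (LCA) mapping: each gene-tree node is mapped to the smallest species-tree clade containing all species below it, and a node is inferred to be a duplication when it maps to the same species node as one of its children. The Python sketch below counts duplications this way on trees written as nested tuples. It ignores losses, co-duplications and Notung's handling of weakly supported branches, and the species and gene names in the example are hypothetical.

```python
def leaf_set(tree) -> frozenset:
    """Leaves of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset({tree})
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(species_tree) -> list[frozenset]:
    """Leaf sets (clusters) of every node of the species tree."""
    result = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            result.extend(clusters(child))
    return result


def lca(species: frozenset, species_clusters: list[frozenset]) -> frozenset:
    """LCA of a species set = smallest species-tree cluster containing all of them."""
    return min((c for c in species_clusters if species <= c), key=len)


def find_duplications(gene_tree, gene_to_species: dict[str, str], species_tree) -> list:
    """Gene-tree nodes inferred as duplications under LCA reconciliation (losses ignored)."""
    species_clusters = clusters(species_tree)
    duplications = []

    def visit(node) -> frozenset:
        if isinstance(node, str):
            return lca(frozenset({gene_to_species[node]}), species_clusters)
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps), species_clusters)
        if here in child_maps:  # maps to the same species node as a child
            duplications.append(node)
        return here

    visit(gene_tree)
    return duplications


if __name__ == "__main__":
    # Hypothetical species tree, gene tree and gene-to-species map (illustrative names).
    species_tree = (("Ath", "Gma"), "Pda")
    gene_to_species = {"Ath1": "Ath", "Ath2": "Ath", "Gma1": "Gma", "Gma2": "Gma", "Pda1": "Pda"}
    gene_tree = ((("Ath1", "Gma1"), ("Ath2", "Gma2")), "Pda1")
    print(len(find_duplications(gene_tree, gene_to_species, species_tree)))  # 1
```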
Notung 2.8 analysis of the ADH gene sequences suggested that the first ADH gene divergence event (marked by a circle in Figure 3) separated the monocot and dicot species from the Papilionoideae species. With the exception of WrADH, the palm ADH proteins diverged within the monocot species, before the other monocot species (Figure 3) (Strommer, 2011). Our analysis revealed that, in palms, WrADH diverged after the other palm species had diverged. This study reveals the complicated evolution of the ADH gene family during the course of plant diversification.
In our study, the phylogenetic tree resulting from the Notung 2.8 analysis showed that some ADH genes in flowering plants evolved in a complex manner that included several duplication events (Figure 3). Duplication events in ADH genes have also been detected in other plant groups at various evolutionary levels. For example, we detected duplication events in the Arabidopsis thaliana ADH copies (AtADH2 and AtADH3) and the Triticum aestivum ADH copies (TaADH2 and TaADH3) (Figure 3). Sang et al. (1997) showed that diploid species of Paeonia (Paeoniaceae) had two or three ADH sequences and that repeated duplication or deletion events occurred after the diversification of this genus. Small and Wendel analyzed ADH genes in Gossypium (Malvaceae) in great detail and found that these ADH sequences had experienced duplication events both before and after the diversification of Gossypium. Duplicated genes arise frequently in eukaryotic genomes through local events that generate tandem duplications, large-scale events that duplicate chromosomal regions or entire chromosomes, or genome-wide events that result in complete genome duplication (Dujon et al., 2004). Indeed, the existence of multigene families is evidence of the repeated gene duplication that has occurred over the history of life. One example of a comprehensive analysis of gene duplication events in plants is the study of the MADS-box gene family. This gene family, which plays a central role in the morphogenesis of plant reproductive organs such as ovules and flowers, experienced duplication events before the origin of angiosperms (Theissen et al., 2000). Moreover, some specific functions were gained through duplication events that took place after the diversification of flowering plants (Theissen et al., 2000). Gene duplication has thus long been recognized as an important mechanism for the creation of new gene functions (Wagner, 1998; Wagner, 2001). It is likely that each of the palm ADH genes identified in the present study has been subjected to different selective pressures over a long period. To determine whether this resulted in new functions, functional analyses of the palm ADH genes in each clade will have to be performed in the future.

Conclusion
The date palm ADH genes identified and analysed in the present study would have been subjected to different selective pressures over a long period. This is the first report showing that the ADH loci of palm species belong to the same clade. Phylogenetic analyses suggest that these genes resulted from relatively ancient divergence and duplication events.

Figure 1. Amino acid sequence alignment of palm ADH sequences using the ClustalX and ESPript programs. Conserved residues are shaded in red. The arrows indicate the conserved amino acids in short-chain ADHs.

Figure 2. Phylogenetic tree of ADH gene sequences obtained by the maximum-likelihood method. The log-likelihood of the best ML tree is -3981.05. Numbers below the branches are bootstrap values (shown when ≥50%). The ADH genes from plants roughly fall into two clades that we denoted as Clade I and Clade II.

Figure 3. Reconciled tree for the plant ADH family. The ML tree of ADH proteins (Figure 2) was reconciled using Notung 2.8 with a species tree compiled from a phylogeny of model organisms. The reconciled tree involves 18 gene duplications (D) and 3 gene co-duplications (cD). The solid red and pink boxes indicate gene duplications that were inferred on the basis of mismatches between the gene tree and the species tree.

Table 1. Accession numbers of the mRNA and protein sequences of the palm species used in this study.

Table 3. Percent amino acid identity between full-length palm ADH proteins, and protein distances computed with the Jones-Taylor-Thornton method of the PHYLIP program.

Table 4. Divergences between palm ADH mRNAs based on Kimura's two-parameter model.

Table 5. Ratio of non-synonymous to synonymous substitution rates (dN/dS) among palm ADH mRNAs.