Whole genome homology-based identification of candidate genes for drought tolerance in sesame (Sesamum indicum L.)

1 Centre d’Etudes Régional pour l’Amélioration de l’Adaptation à la Sécheresse (CERAAS), BP 3320 Route de Khombole, Thiès, Sénégal. 2 Laboratoire d’Ecologie Appliquée, Faculté des Sciences Agronomiques, Université d’Abomey-Calavi, 01 BP 526, Calavi, Bénin. 3 Laboratoire Campus de Biotechnologies Végétales, Département de Biologie Végétale, Faculté des Sciences et Techniques, Université Cheikh Anta Diop, BP 5005 Dakar-Fann, Dakar, Sénégal.


INTRODUCTION
Sesame (Sesamum indicum L., 2n = 2x = 26) is one of the most commonly grown oilseed crops, with a seed production of more than 4.8 million tons worldwide in 2013 (FAOSTAT, 2013), and has been suggested as the most ancient oil crop (Nayar and Mehra, 1970). Its seeds are an important source of high-quality oil and contain natural antioxidants such as sesamin and sesamol (Zhang et al., 2013). Sesame is a good source of vitamins (pantothenic acid and vitamin E) and minerals such as calcium (1.450 mg/100 g) and phosphorus (570 mg/100 g) for human consumption, and the seed cake is also a nutritious livestock feed (Balasubramaniyan and Palaniappan, 2001).
It is mainly grown in tropical and subtropical regions of Asia, Africa and South America, on marginal lands or under very difficult conditions with drought, high temperatures, high solar radiation and high evaporative demand, which is why sesame is regarded as a drought tolerant plant (Langham, 2007; Witcombe et al., 2007). Despite this tolerance, drought is one of the most important environmental factors limiting sesame production: it affects the number of capsules per plant, grain yield, and oil yield and quality, depending on the genotype and the drought intensity (Betram et al., 2003; Hassanzadeh et al., 2009; Bahrami et al., 2012). Drought will be a serious threat in the coming decades, as the Intergovernmental Panel on Climate Change (IPCC) has concluded that elevated greenhouse gas concentrations are likely to lead to a general drying of the subtropics by the end of this century, creating widespread drought stress in agriculture (IPCC, 2007). Therefore, improving drought tolerance in sesame genotypes is one of the major objectives of sesame breeding programs, and it can be achieved by integrating new approaches (Pathak et al., 2014).

In past years, many investigations have been carried out to enhance our understanding of the genetic basis of drought tolerance using genomics, transcriptomics and transgenesis approaches in the model plant Arabidopsis thaliana (Shinozaki et al., 2003; Jang et al., 2004, 2007; Ramirez et al., 2009; Lata et al., 2011; Harshavardhan et al., 2014). These studies showed that the main genes involved in drought tolerance are transcription factors (TFs). In Arabidopsis, about 1,500 TFs are considered to be involved in stress responses, including drought (Riechmann et al., 2000). So far, many drought associated genes have been identified and characterized in detail, including TFs belonging to the basic leucine zipper (bZIP), AP2/EREBP, ABA-binding factor (ABF), MYC, MYB, NAM, ATAF1-2, NAC, CCAAT-binding and zinc-finger families (Abe et al., 1997; Bartels and Sunkar, 2005; Sakuma et al., 2006; Nakashima et al., 2014; Mondini et al., 2012, 2015).
The availability of genome sequences for a number of plant species, combined with comparative genomics analysis, can improve our understanding of fundamental aspects of plant biology, including the identification and analysis of genes involved in adaptive traits of crops (Foucher et al., 2003). In fact, plant genomes share extensive similarities known as synteny, even between distantly related species (Guyot et al., 2012). Through comparative analysis against the Arabidopsis genome, many functional genomic regions and candidate genes have been identified in Brassica, such as flowering time FLC genes (Schranz et al., 2002), clubroot resistance genes (Suwabe et al., 2006), the aliphatic glucosinolate biosynthetic pathway (Bisht et al., 2009) and genes for male fertility (Ashutosh et al., 2012). Similar strategies have been used to predict stress-responsive TFs in soybean, maize, sorghum, barley and wheat based on Arabidopsis and rice genome analyses (Mochida et al., 2009; Tran and Mochida, 2010b). In addition, similar analyses have been performed in tomato (Solanum lycopersicum) and potato (Solanum tuberosum), two economically important and naturally drought sensitive crops (Li et al., 2013; Obidiegwu et al., 2015) belonging to the Asteridae subclass, which includes sesame, leading to the identification of drought tolerance genes in these crops (Reiter and Vanzin, 2001; Vasquez-Robinet et al., 2008; Evers et al., 2010; Anithakumari et al., 2011, 2012; Solankey et al., 2014). Thus, it is well documented that a synteny approach in closely related species is suitable for identifying orthologous genes (Rubin, 2001).
The identification of drought related candidate genes in sesame will provide useful information for its improvement. To the best of our knowledge, no data have been reported regarding drought tolerance genes identified in sesame. Based on the sesame genome sequence recently released by Wang et al. (2014), a set of candidate genes across the whole genome of sesame was identified in this study through a homology search with known drought associated genes from three related species, viz. tomato, potato and Arabidopsis, and these genes were analyzed with a view to further functional and validation experiments.

Comparative genomics and gene expression assay
A local database of the retrieved sequences was built in order to search for similar sequences in the sesame genome (Wang et al., 2014) using the BLASTn and tBLASTn algorithms (Altschul et al., 1990) for DNA and protein sequences, respectively, with an E-value cut-off of 1e-30. A threshold of 70% identity was considered significant (Roy et al., 2011). After removing redundant genes, the candidate drought related genes across the whole genome of sesame were analyzed, including their identification, classification into functional groups, sequence analysis and chromosomal location.
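The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the BLAST results were exported in the standard 12-column tabular format (`-outfmt 6`), and the function name `filter_hits`, the sample IDs and the sample values are hypothetical. Collapsing hits by subject ID is only one possible reading of "removing redundant genes"; the text does not state the rule that was used.

```python
import csv
import io

EVALUE_CUTOFF = 1e-30   # E-value cut-off stated in the text
MIN_IDENTITY = 70.0     # percent identity threshold stated in the text

# Column order of BLAST's standard tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle):
    """Keep hits that pass both thresholds, one hit per subject ID.

    Assumption: redundancy is resolved by keeping the highest-scoring
    hit for each subject; the paper does not specify this.
    """
    best = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        if float(row["evalue"]) > EVALUE_CUTOFF:
            continue  # not significant enough
        if float(row["pident"]) < MIN_IDENTITY:
            continue  # below the identity threshold
        kept = best.get(row["sseqid"])
        if kept is None or float(row["bitscore"]) > float(kept["bitscore"]):
            best[row["sseqid"]] = row
    return best


# Hypothetical hits: IDs and values are invented for illustration only.
SAMPLE = (
    "queryA\tsesame_1\t85.0\t300\t45\t0\t1\t300\t10\t309\t2e-80\t290\n"
    "queryB\tsesame_1\t78.0\t280\t60\t1\t1\t280\t12\t291\t1e-50\t200\n"
    "queryC\tsesame_2\t65.0\t250\t80\t2\t1\t250\t5\t254\t1e-40\t150\n"
    "queryD\tsesame_3\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-10\t80\n"
)

if __name__ == "__main__":
    for subject, hit in filter_hits(io.StringIO(SAMPLE)).items():
        print(subject, hit["qseqid"], hit["pident"], hit["evalue"])
```

In the sample, the hit on `sesame_2` fails the identity threshold and the hit on `sesame_3` fails the E-value cut-off, so only `sesame_1` is retained, represented by its higher-scoring query.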
The sesame genes with unknown functions were submitted to the AutoFACT program (Koski et al., 2005) and annotated according to the data available in the largest functional annotation databanks (KEGG, PFAM, SMART). The homologous genes found in the sesame genome were mapped onto the 16 Linkage Groups (LGs) according to their physical positions using MapChart 2.3 (Voorrips, 2002). The comparative orthologous relationships of the candidate drought associated genes between sesame and Arabidopsis were illustrated using the Circos program (Krzywinski et al., 2009). To identify AP2/ERF genes across the whole genome, the Hidden Markov Model (HMM) profile of the AP2/ERF domain (PF00847), obtained from the Pfam v28.0 database (http://Pfam.sanger.ac.uk/) (Finn et al., 2014), was searched against the sesame proteome using Unipro UGENE (Okonechnikov et al., 2012).
Furthermore, a drought stress experiment was carried out to assess the expression of 6 AP2/ERF genes retrieved. For that, two contrasting sesame accessions, LC164 (drought tolerant) and hb168 (drought sensitive), previously studied by Boureima et al. (2012), were sown in pots (25 cm diameter and 30 cm depth) filled with a mixture of soil, sand and compost (5:2:2, v/v/v). The seedlings were grown and watered normally for 21 days before drought stress was applied by withholding water for 5 days. At this stage, all plants were transferred under a plastic rain shelter. Total RNA of drought-stressed sesame seedlings was extracted from leaves using Trizol Reagent (Invitrogen, USA) according to the manufacturer's protocol and digested with DNase I (MBI, USA) to remove genomic DNA contamination. One microgram of RNA was reverse transcribed using the Reverse Transcription System (Promega). Semi-quantitative reverse-transcription PCR (RT-PCR) was carried out on the synthesized cDNA with gene-specific primers (Table 1) using the following protocol: a 4-min incubation at 95°C for complete denaturation, followed by 30 cycles of 95°C for 30 s, 55°C for 30 s and 72°C for 30 s, and a final step at 72°C for 5 min. RT-PCR products were run on a 2.0% (w/v) agarose gel stained with ethidium bromide (5 µg/ml), and the expression of each gene was qualitatively evaluated after imaging under UV light (Kodak EDAS 290).
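As a quick check on the amplification programme, the steps can be written down as data and summed. This is only a sketch: the variable and function names are illustrative, the step labels follow the usual denaturation/annealing/extension reading of the three temperatures, and ramp times between temperatures are ignored.

```python
# Amplification programme as stated in the text
# (temperature in degrees Celsius, duration in seconds).
INITIAL_DENATURATION = (95, 4 * 60)
CYCLE_STEPS = [
    (95, 30),  # denaturation
    (55, 30),  # primer annealing
    (72, 30),  # extension
]
N_CYCLES = 30
FINAL_STEP = (72, 5 * 60)


def programmed_seconds():
    """Total programmed hold time, excluding ramping between steps."""
    per_cycle = sum(duration for _, duration in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_STEP[1]


if __name__ == "__main__":
    total = programmed_seconds()
    print(f"{total} s ({total / 60:.0f} min)")
```

Under these assumptions the programme holds for 240 + 30 × 90 + 300 = 3,240 s, that is, 54 min.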

RESULTS AND DISCUSSION
The identification of drought tolerance candidate genes with high potential for breeding drought tolerant crops remains a challenge (Krannich et al., 2015). For species with largely unexplored genomes such as sesame, comparative genomics is a promising tool to gain information by exploiting the conservation between closely related plant species (Akpinar et al.). Hence, comparative genomics will broaden the ability to transfer information from model plants to other species that are fundamental to food production or as a source of alternative energy (Ma et al., 2012).

From a total of 2,495 sequences downloaded from the Arabidopsis, tomato and potato genomes, 75 candidate genes were found in the whole genome of sesame with high identity (Figure 1). Among these candidate genes, 42, 22 and 11 genes were homologous to Arabidopsis, potato and tomato genes, respectively. Potato, tomato and sesame belong to the Asteridae subclass, which includes nearly 60,000 species. Lee et al. (2005) reported that genes involved in adaptive processes tend to be highly conserved. Therefore, interspecies sequence comparison is a powerful tool to extract functional or evolutionary information from the genomes of organisms (Chiba et al., 2008). After functional annotation and classification, only 2 candidate genes with unknown functions remained (Table 2). The whole set of candidate genes could be classified into 2 main categories according to earlier reports on osmotic stress responsive genes in Arabidopsis (Seki et al., 2002, 2003): (a) genes which protect the plant against the effects of drought and (b) signal transduction genes and TFs.

Drought tolerance is a quantitative trait that exhibits complex genetic control (Mc William, 1989). Drought greatly affects the plant at both the micro level (that is, membrane structure) and the macro level (that is, the physiology of the whole plant), with results that reflect the variety of responses involved in the acquisition of tolerance (Soares et al., 2012). The complexity of this trait explains why progress in crop improvement in drought-prone areas has been slow (Cattivelli et al., 2008).

The candidate genes were mapped onto the 16 Linkage Groups (LGs) of the sesame genome (Wang et al., 2014) (Figure 2). All LGs were represented, and the distribution of these genes was globally uneven. Some gene clusters existed on LGs 1, 3, 5, 6, 7, 8, 10 and 14. Recent work by Wei et al. (2015) on the MADS gene family in sesame also found similar clustering patterns of some MADS genes along 14 LGs. In fact, there is evidence that functionally related genes tend to cluster more commonly than expected by chance (Boutanaev et al., 2002; Cohen et al., 2002). Our results suggest that these clustering regions of the genome might be highly active in drought tolerance in sesame. The maximum number of genes (10; 13.16%) was localized on LG 3, whereas LGs 13 and 16 had the lowest number of genes (1; 1.3%).
To trace orthologous relationships of the candidate genes associated with drought, the physically mapped candidate genes of sesame were compared with those of Arabidopsis, since most of the genes in Arabidopsis have been functionally characterized (Figure 3). According to Wang et al. (2014), sesame and Arabidopsis share more than 2,200 homologous genes. Forty-three orthologous gene pairs were detected between Arabidopsis and sesame, including 43 sesame candidate genes and 32 Arabidopsis drought associated genes. Most of the candidate genes in sesame showed a syntenic bias towards Arabidopsis chromosomes 1, 2 and 4. The comparative mapping information offers a useful starting point for understanding the evolution of genes among different species.
Many drought stress associated genes encode TFs that in turn control various other genes involved in diverse physiological and molecular responses to drought stress. TFs are therefore good candidates for genetic engineering to improve crop tolerance to drought because of their role as master regulators of clusters of genes (Rabara et al., 2014). AP2/ERF transcription factors are reported to be involved in drought stress in many plants (Licausi et al., 2010; Lata et al., 2014; Rabara et al., 2014). Whole genome scanning for AP2/ERF genes yielded 132 putative AP2/ERF genes. Given the importance of this gene superfamily in abiotic stress tolerance in plants, many AP2/ERF genes were expected to be found among the set of candidate genes identified in this study. However, only five candidate AP2/ERF genes (3.8%) were retrieved as probably associated with drought tolerance in sesame (Figure 4), suggesting that these AP2/ERF genes should be targeted for drought research in sesame. According to the classification of Sakuma et al. (2002), the AP2/ERF genes found in this study were classified into the ERF (2), AP2 (2) and DREB (1) subfamilies.
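The proportion quoted above follows directly from the counts given in the text. A minimal sketch of the arithmetic, with illustrative variable names:

```python
# Counts taken from the text.
TOTAL_AP2_ERF = 132                     # putative AP2/ERF genes in the genome
CANDIDATES_BY_SUBFAMILY = {"ERF": 2, "AP2": 2, "DREB": 1}

n_candidates = sum(CANDIDATES_BY_SUBFAMILY.values())
share_percent = 100 * n_candidates / TOTAL_AP2_ERF

if __name__ == "__main__":
    print(f"{n_candidates} of {TOTAL_AP2_ERF} = {share_percent:.1f}%")
```

The subfamily counts add up to the five candidate genes, and 5/132 rounds to 3.8%.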
Although the power of similarity-based gene discovery at a genome scale has been demonstrated in many studies and partially reviewed by Windsor et al. (2006), the importance of functional characterization cannot be ignored. Expression profiling of these AP2/ERF genes under drought stress in two contrasting sesame lines was therefore performed by RT-PCR. Gene expression patterns are usually closely correlated with gene functions (Peng et al., 2015). One primer pair (SSAp5), designed for the gene LOC105157874, did not amplify in either of the two accessions, probably because the primer was inadequate. Three of the four remaining genes were expressed more highly in the drought tolerant material than in the sensitive one under drought stress (Figure 5). The expression of LOC105160523 was markedly higher in the drought tolerant material than in the sensitive one, whereas LOC105162917 showed a similar expression pattern under both water regimes. A BLASTp search against the Arabidopsis genome showed that the most highly expressed gene, LOC105160523, is the ortholog of CBF4 (AT5G51990), described as a regulator of drought adaptation in Arabidopsis (Haake et al., 2002), suggesting that this gene plays a pivotal role in drought tolerance in sesame. Since only a few sesame accessions were used in this study, we propose that these genes be studied in more depth on a large sample of contrasting materials to uncover their biological roles in drought tolerance in sesame. Our results agree with those of Kamvysselis (2003), who reported that comparative genomics analysis can reveal biological findings that could not have been discovered by traditional genetic methods, regardless of the time or effort spent.

Sesame is an oil crop that contributes to the daily oil and protein requirements of almost half of the world's population (Wei et al., 2015). One of the major constraints on its production is drought, as it is mainly grown in semi-arid areas. The functions of most sesame genes are still uncharacterized. Hence, the identification and functional analysis of valuable genes in the sesame genome is necessary for its improvement. Since reports on drought associated genes in sesame are lacking, this study provides, through a comparative genomics approach, a set of candidate genes spanning the whole genome and covering different functional groups for drought research in sesame. Further thorough functional experiments, including transgenic studies, could rely on these gene resources to validate their functions and decipher the mechanisms of drought tolerance in sesame.

Figure 1. Phylogenetic relationships of the species studied and number of sequences downloaded.

Figure 3. Syntenic relationships of drought associated genes between the Arabidopsis and sesame genomes. Chr1~Chr5 represent pseudo-chromosomes of the Arabidopsis genome and are drawn as gray bars. LG01~LG16 represent linkage groups of the sesame genome and are drawn as green bars. Colored lines stand for the relationships of orthologous gene pairs between the two species.

Figure 4. AP2/ERF transcription factor genes found within the candidate genes compared to whole genome AP2/ERF genes.

Table 1. Primers used for the RT-PCR. Columns: Locus; Primer code; Forward sequence (5'-3'); Reverse sequence (5'-3').

Table 2. List and functions of orthologous genes retrieved from sesame genome.

Figure 2. Distribution of the candidate genes on the sesame linkage groups. LG1~LG16 represent linkage groups of the sesame genome. Locus names in green, blue and red indicate orthologous genes of Arabidopsis, tomato and potato, respectively.