Characterization and comparison of key genes involved with flowering time regulation from Arabidopsis thaliana, Oryza sativa and Zea mays

1 State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing 400715, China. 2 Engineering Research Center of South Upland Agriculture of Ministry of Education, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, P. R. China. 3 Centre for Integrative Legume Research and School of Agriculture and Food Sciences, The University of Queensland, Brisbane 4072, Australia. 4 Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana 141001, India. 5 Key Laboratory of Crop Physiology, Ecology and Genetic Breeding, Ministry of Education, Jiangxi Agricultural University, Nanchang 330045, China.


INTRODUCTION
The onset of flowering is one of the most important and complex stages in the life cycle of angiosperms, marking the transition from the vegetative to the reproductive phase. Flowering time affects sexual reproduction, seed and fruit development, and consequently plant productivity. For example, the flowering locus CA (FCA) gene not only accelerates flowering but also affects root development (Macknight et al., 2002). Similarly, days to heading on chromosome 8 (DTH8) regulates flowering time in the photoperiod pathway as well as plant height and yield potential (Wei et al., 2010). Appropriate flowering time not only facilitates niche farming but also helps to realize the genetic potential of crops. Increasing numbers of flowering genes have now been cloned, especially in Arabidopsis thaliana and rice. Furthermore, sketch maps of the regulatory network of key flowering genes are now available in A. thaliana (Wellmer and Riechmann, 2010; Jung and Muller, 2009) and rice (Komiya et al., 2009). However, the common characteristics, divergence and evolutionary differences between key flowering candidate genes remain unknown.
In A. thaliana, four pathways simultaneously control flowering: the photoperiod pathway, the vernalization pathway, the autonomous pathway, and the gibberellin pathway (Jung and Muller, 2009; He and Amasino, 2005; Boss et al., 2004). These four pathways interact to regulate flowering, with some genes acting as bridges. For example, the autonomous and vernalization pathways are integrated by the FLC gene (Kim et al., 2009; Amasino, 2010). The SOC1 (Lee et al., 2000) and FT (Halliday et al., 2003) genes are also regarded as integrators between the temperature- and light-signaling pathways, with only scattered studies on other pathways (Komiya et al., 2009). There is now significant interest in the core genes or integrator genes in these pathways because the loss or alteration of these key genes can greatly influence flowering time.
The few comparisons of homology between flowering genes have shown that these genes are highly conserved across plant species. For example, the Hd3a gene in rice is highly homologous to the FT gene in A. thaliana (Kojima et al., 2002). Hd1 is also closely related to the A. thaliana flowering time gene CO (Yano et al., 2000). These results suggested that the GI-CO-FT pathway in A. thaliana is conserved in rice (OsGI-Hd1-Hd3a). Taylor et al. (2010) also found that the flowering genes AcGI and AcFKF1 in onion are homologous to genes involved in photoperiod regulation in A. thaliana. These results suggest that extensive homology of flowering genes exists across species, but the degree of conservation of key genes across diverse species is still unknown.
In the present study, a comprehensive comparative analysis of key flowering genes was performed in the three typical plant species A. thaliana, Oryza sativa, and Zea mays. The objective of this comparison was to reveal the extent of sequence conservation, parallel nucleotide divergence, and functionality of key flowering genes between these species. The results are expected to aid in molecular investigations into trait variation, evolutionary patterns and regulatory mechanisms of flowering genes in other less-investigated plant species.

MATERIALS AND METHODS

Mining sequences of key flowering genes
Two approaches were followed to collect information on the flowering genes from A. thaliana, O. sativa, and Z. mays. First, an initial batch of flowering genes for each species was retrieved from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/) using the keyword "flowering" and the organism name. Three basic screening criteria were used: (1) selection of full-length cDNAs of flowering genes rather than truncated genes; (2) removal of duplicated genes derived from different databases in the NCBI website to retain only one set of genes; and (3) involvement of selected genes in flowering time regulation rather than in floral organogenesis. Second, to mine more putative flowering genes, the first batch of flowering time genes (full-length cDNA sequences) from each of the three species was queried against the remaining two species using BLASTN with E < 10^-30 and > 75% identity (alignment length * 100 / query length). Both the hits and the first batch of original flowering genes were used for further investigation. BioEdit (Hall, 1999) was used to analyze the length of flowering gene sequences.
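The two BLASTN cut-offs described above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `BlastHit` record, its field names, and the example hits are hypothetical, and the percentage is computed exactly as defined in the text (alignment length * 100 / query length).

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLAST hit; the field names are hypothetical, not a BLAST output format."""
    query: str
    subject: str
    query_length: int
    alignment_length: int
    evalue: float


def coverage_percent(hit: BlastHit) -> float:
    # The percentage defined in the text: alignment length * 100 / query length.
    return hit.alignment_length * 100 / hit.query_length


def passes(hit: BlastHit, max_evalue: float = 1e-30, min_percent: float = 75.0) -> bool:
    # Both cut-offs are applied as strict inequalities, as stated in the text.
    return hit.evalue < max_evalue and coverage_percent(hit) > min_percent


# Made-up hits for illustration only.
hits = [
    BlastHit("query_A", "subject_1", 1000, 900, 1e-50),  # 90%, strong E-value: kept
    BlastHit("query_A", "subject_2", 1000, 700, 1e-60),  # 70%: rejected
    BlastHit("query_B", "subject_3", 800, 780, 1e-10),   # weak E-value: rejected
]
kept = [h.subject for h in hits if passes(h)]
print(kept)
```

The same function can be reused for the protein-level comparisons by passing `max_evalue=1e-20`.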

Comparative homologies of flowering genes
To compare the homology of flowering genes from A. thaliana, O. sativa, and Z. mays, flowering genes from each species were used as BLASTP (http://www.ncbi.nlm.nih.gov/) queries against the flowering genes of the remaining two species. If two genes from different species showed more than 75% identity (alignment length * 100 / query length) with E < 10^-20, they were considered homologous genes. Cytoscape (http://www.cytoscape.org/) (Angelovici et al., 2009), an open source platform for complex network analysis and visualization, was used to produce an intuitive graphical representation of the homology of the flowering genes.
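The homology network can be thought of as an undirected graph in which genes are nodes and each homologous pair is an edge. The sketch below is an illustration in plain Python rather than the Cytoscape workflow used in the study. The gene labels are placeholders (the FT/Hd3a and CO/Hd1 pairings come from the Introduction; `Zm_gene1` is hypothetical), and the degree cut-off used to flag highly connected genes is a parameter chosen here for the toy data.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected adjacency map from (gene_a, gene_b) homologous pairs."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-hits
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def summarize(adjacency, key_degree):
    """Return (node count, edge count, genes with at least key_degree edges)."""
    n_nodes = len(adjacency)
    # Every edge is stored twice, once per endpoint.
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    key_genes = sorted(g for g, nb in adjacency.items() if len(nb) >= key_degree)
    return n_nodes, n_edges, key_genes


# Toy input for illustration only.
pairs = [
    ("AtFT", "OsHd3a"),
    ("AtFT", "Zm_gene1"),
    ("OsHd3a", "Zm_gene1"),
    ("AtCO", "OsHd1"),
]
network = build_network(pairs)
n_nodes, n_edges, key_genes = summarize(network, key_degree=2)
print(n_nodes, n_edges, key_genes)
```

Counting each duplicate pair only once (via sets) keeps the edge count independent of which species was used as the query.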

Nucleotide polymorphism analysis of key flowering genes
Key flowering genes are generally located in the central node positions of regulatory networks. TBLASTN (http://www.ncbi.nlm.nih.gov/) analysis was conducted against gene databases of species other than A. thaliana, O. sativa and Z. mays to acquire homologues from other species at 75% identity (alignment length * 100 / query length) and E < 10^-20. Each flowering gene and its homologues were used to perform DNA sequence polymorphism analysis (DnaSP, http://www.ub.edu/dnasp/, version 5.10.01) (Rozas and Rozas, 1999). The divergence of homologues was characterized by nucleotide variation patterns, including the π value (a parameter used to describe nucleotide diversity and insertion/deletion (InDel) diversity per site) (Nei, 1987; Tajima, 1983), θ, estimated from the total number of mutations (Watterson, 1975), Tajima's D value, which is used to determine whether sequence divergence is consistent with the neutral evolution model (Tajima, 1989), and InDel information (Jander et al., 2002). All parameters were left at their default values, in accordance with the standard procedures for DnaSP.
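For readers unfamiliar with these statistics, the following sketch computes π, Watterson's θ and Tajima's D from a small alignment using the standard formulas of Tajima (1989). It is not a reimplementation of DnaSP: it assumes that alignment columns containing gaps are skipped, it uses the number of segregating sites rather than the total number of mutations, and the four example sequences are made up.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return S, pi, per-site pi, Watterson's theta and Tajima's D for aligned sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # Assumption of this sketch: columns containing a gap are excluded.
    columns = [col for col in zip(*seqs) if "-" not in col]

    seg_sites = sum(1 for col in columns if len(set(col)) > 1)
    n_pairs = n * (n - 1) / 2
    pairwise_diffs = sum(
        sum(x != y for x, y in combinations(col, 2)) for col in columns
    )
    pi = pairwise_diffs / n_pairs  # mean number of pairwise differences

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1  # Watterson's estimator

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)

    tajima_d = (pi - theta_w) / sqrt(variance) if seg_sites > 0 else None
    return {
        "S": seg_sites,
        "pi": pi,
        "pi_per_site": pi / len(columns) if columns else None,
        "theta_w": theta_w,
        "tajima_d": tajima_d,
    }


# Made-up alignment for illustration only.
alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGAACGT"]
stats = diversity_stats(alignment)
print(stats)
```

A negative D means an excess of rare variants relative to the neutral expectation, while a positive D means an excess of intermediate-frequency variants.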

Detection of simple repeat sequences in flowering genes
To judge whether flowering genes harbored abundant simple repeat sequences and to analyze the possible relationship between variations of flowering genes and simple sequence repeats (SSRs), SSR Locator (Da Maia et al., 2008) was used to detect SSR loci in flowering genes of the three species with the parameters set at default values. The minimum numbers of repeats used to identify the presence of SSR loci were 20, 10, 7, 4, 5, 5, 3, 3, 3, and 3 for monomers, dimers, trimers, tetramers, pentamers, hexamers, heptamers, octamers, nonamers, and decamers, respectively. A space of less than 100 and more than 5 between imperfect SSRs was permitted. The number and frequency of SSRs in all known genes of the three species were also analyzed for comparison.
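The repeat thresholds listed above can be illustrated with a small regular-expression scan for perfect SSRs. This is a simplified sketch, not the SSR Locator algorithm: it handles only perfect repeats (the spacing rule for imperfect SSRs is not modelled), it skips motifs that are themselves repeats of a shorter unit so that a run of A is not also reported as an AA dimer, and the test sequence is invented.

```python
import re

# Minimum repeat counts per motif length, taken from the text.
MIN_REPEATS = {1: 20, 2: 10, 3: 7, 4: 4, 5: 5, 6: 5, 7: 3, 8: 3, 9: 3, 10: 3}


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (for example 'ATAT')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Invented sequence: an (AT)10 dimer, an (ACG)7 trimer and an (A)20 monomer run.
example = "GGC" + "AT" * 10 + "CCT" + "ACG" * 7 + "TTG" + "A" * 20 + "C"
ssrs = find_ssrs(example)
print(ssrs)
```

Dividing the number of loci found by the number of genes scanned gives the per-gene SSR frequency used for comparisons between species.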

Functional annotations of flowering genes
To determine whether flowering genes from each species showed similar spatiotemporal expression, functional annotations were performed with Blast2GO (http://www.blast2go.org/) (Conesa and Gotz, 2008; Conesa et al., 2005) to categorize flowering time genes by cellular component, biological process, and molecular function. Blast2GO is a popular tool to annotate, visualize and analyze DNA sequences. Blast2GO analysis comprises five steps: (1) blasting: a set of DNA sequences is blasted against the NCBI database; (2) mapping: the blast results are matched with GO terms (cellular component, biological process, and molecular function) using annotation files provided by the GO Consortium; (3) annotation: certain annotation rules are used to annotate sequences; (4) enrichment analysis (optional): statistical analysis of different annotation results; and (5) visualization: display of annotation and statistics results by the GO DAG (GO directed acyclic graph). Figure 1 shows the schematic diagram of Blast2GO annotation for the flowering time genes.

RESULTS

Mining flowering genes of A. thaliana, O. sativa, and Z. mays
The entire set of flowering genes from each species was obtained using two approaches. Initially, 40 genes in A. thaliana, 101 genes in O. sativa, and 84 genes in Z. mays were collected by searching with the keyword "flowering" and the species name in the NCBI website. Subsequently, 1 gene in A. thaliana, 9 genes in O. sativa, and 15 genes in Z. mays were added to the first batch of flowering genes of the three species by BLASTN analysis. Forty-one flowering genes were thus collected in A. thaliana, 110 in O. sativa, and 99 in Z. mays. Average lengths of cDNA sequences were 2206.0, 1896.3, and 1454.2 bp in A. thaliana, O. sativa, and Z. mays, respectively. The average protein lengths were 586.5, 448.5, and 337.3 amino acids (aa) in A. thaliana, O. sativa, and Z. mays, respectively (Table 1).

Comparative homologies of flowering time regulatory genes
Homologous relationships between the three species are presented in Figure 2. The network comprised a total of 130 nodes and 627 edges, comprising 27 (62.8%), 87 (81.3%), and 16 (16%) nodes in A. thaliana, O. sativa, and Z. mays, respectively. Twenty-two (16.9%) of these node genes (marked in yellow) were inferred to be critical flowering genes due to the high degree of homology in these genes between the three species. These genes are likely to become principal targets of research in species for which a whole genome sequence is not available.
The proportion of homologous flowering genes between any two of the three species is depicted in Figure 3. A. thaliana flowering genes had 77.5% homologous identity with O. sativa and 80.0% homologous identity with Z. mays (average 78.8%). O. sativa flowering genes shared 50% identity with A. thaliana and 60% with Z. mays (average 55%). Z. mays flowering genes shared 35.7% identity with A. thaliana and 64% identity with O. sativa (average 49.9%). Thus, the average homology between the flowering networks of the three species was 61.2%, suggesting that the components of the regulatory networks of flowering genes were highly conserved during species evolution.
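The averages quoted above follow directly from the six pairwise percentages, as the short calculation below shows. The dictionary layout and variable names are chosen here for illustration; only the percentages come from the text.

```python
# Pairwise proportions (%) of homologous flowering genes, as reported in the text.
# Outer key: species whose genes are the reference; inner key: comparison species.
pairwise = {
    "A. thaliana": {"O. sativa": 77.5, "Z. mays": 80.0},
    "O. sativa": {"A. thaliana": 50.0, "Z. mays": 60.0},
    "Z. mays": {"A. thaliana": 35.7, "O. sativa": 64.0},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Average for each species, then the average of those three averages.
per_species = {species: mean(targets.values()) for species, targets in pairwise.items()}
overall = mean(per_species.values())

for species, value in per_species.items():
    print(f"{species}: {value:.2f}%")
print(f"overall: {overall:.1f}%")
```

The asymmetry between the two directions of each comparison arises because the denominator is the gene count of the reference species, which differs between species.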

Nucleotide polymorphism analysis of the key genes
To find out whether the key genes had homologous sequences in species other than the three target species, BLASTP analysis with default parameters was performed in NCBI. Each key gene shared sequence homology with about 15 to 67 different species, with an average of 45 species (Figure 4). For example, homologues of SOC1 (A. thaliana) were found in at least 51 species. We also analyzed the number of homologues that each key gene had within each species. On average, only one homologue of a key gene was found in 59.3% of species, two homologues in 22.6% of species and three or more homologues in 18.1% of species.
Each of the 22 key genes and their corresponding homologues were analyzed for divergence and evolution by DnaSP. The InDel information for nucleotide diversity and the Tajima's D information for the neutral evolution model are presented in Tables 2 and 3, respectively. The diversity index of these key flowering genes ranged from 0.15675 to 0.52519 with an average of 0.27983, exhibiting a high level of variation. The average InDel length event was 5.594 (range = 5.594 ± 5.450). The average InDel length was 7.323 (range = 7.323 ± 10.421). The statistics of the Tajima's D test (Table 3) showed that the key flowering genes were under negative selection pressure, although this effect was not significant for all key genes.

Functional annotations
To assess whether the flowering genes could have similar expression across different species, Blast2GO analysis was performed to annotate the flowering genes. [...] The intracellular localization subclass also constituted 22.8, 17.3 and 20.0% of A. thaliana, O. sativa, and Z. mays flowering genes, respectively. Apart from these three larger subclasses, other subclasses, such as the macromolecular complex, membrane, and cytoskeleton, showed clear differences among the three species. For example, 23.2 and 9.8% of the flowering genes of A. thaliana and O. sativa, respectively, were expressed in the macromolecular complex, whereas 2.9 and 8.3% of A. thaliana and O. sativa genes functioned in the membrane subclass.
For the annotation of biological processes, flowering genes were divided into nine subclasses (Figure 6b). The largest subclass was metabolic process, containing 33.2, 57.9 and 80.0% of the flowering genes of A. thaliana, O. sativa, and Z. mays, respectively. The next largest subclass was biological processes, occupying 17.2, 15.0 and 16.0% of A. thaliana, O. sativa, and Z. mays genes, respectively. In A. thaliana and O. sativa, 24.9 and 6.5% of genes, respectively, fell into the cell division subclass, whereas 13.0 and 6.5% of the flowering genes of A. thaliana and O. sativa, respectively, were involved in the defense subclass. The remaining subclasses were inconsistent. Overall, most flowering genes participated in the metabolic process subclass.
In the annotation of molecular function, the flowering genes were assigned to six subclasses (Figure 6c). The subclass with the highest proportion of genes was binding, which covered 35.0, 54.9 and 69.7% of the flowering genes of A. thaliana, O. sativa, and Z. mays, respectively. Second, catalytic activity occupied 40.8 and 13.7% of A. thaliana and O. sativa genes, respectively. The third largest subclass was molecular transducer activity, which constituted 11.0% of A. thaliana flowering genes and 8.4% of O. sativa genes. The remaining minor subclasses mainly included transcription regulation and electron carrier activities. The results indicated that the main molecular function of the flowering genes is binding.

DISCUSSION
The dissection and comparison of the flowering genes of different plants is important for understanding the molecular mechanisms behind the regulatory network of flowering. In the present study, we characterized and compared the flowering genes of A. thaliana, O. sativa, and Z. mays. Nucleotide polymorphism analysis of key genes revealed universal sequence conservation during evolution, though some small InDels were also detected. Furthermore, many of these flowering genes harbored simple repeat sequences, and these repeats perhaps drove the variation and divergence of flowering genes. These results will help identify key flowering genes, facilitating their efficient cloning and an in-depth understanding of the molecular mechanism of flowering.
Approximately 61.2% homology was detected between key flowering genes in the three species. It has been previously observed that 80.6% of the genes of A. thaliana were homologous to those of O. sativa (the number of homologous genes / the number of genes of A. thaliana), whereas only 49.4% of the genes of O. sativa were homologous to those of A. thaliana (the number of homologous genes / the number of genes of O. sativa) (Yu et al., 2002). A large proportion of rice genes have no recognizable homologs, possibly as a result of a gradient in the GC content of rice coding sequences (Yu et al., 2002). In our study, we observed that A. thaliana genes had 77.5% identity with O. sativa, whereas only 50% of the flowering genes of O. sativa were homologous to those of A. thaliana. The proportion observed in our study for A. thaliana (77.5%) is close to the previously reported genome-wide figure (80.6%), supporting our approach for the comparative analysis of flowering genes between the three species.
Twenty-two (16.9%) of the genes were considered to be highly conserved key genes in the regulatory network of flowering genes (marked in yellow in Figure 2). These key genes also had counterparts in many other species, with homologues in an average of 45 species per gene. This result indicates that these genes are relatively conserved and that the rapid spread of the ancestral genes across more species may be related to gene functionality and species adaptation to environments (Blanc and Wolfe, 2004). Previous studies in O. sativa (Tsuji et al., 2011) and B. vulgaris (Abou-Elwafa et al., 2011) have indicated that a certain degree of gene differentiation helped facilitate the evolution of unique genes that were more suitable for specific ecological niches. Tajima's D values indicated that some of these genes were subject to negative selection, although the values were not significant for all genes (Baudry et al., 2004; Tajima, 1989; Carlson et al., 2005).
Gene annotation studies suggested that most flowering genes across different species displayed high functional similarity. There were obvious biases of flowering genes in cellular component, biological process and molecular function classifications: the organelle was the main cellular component in which the flowering genes of A. thaliana and Z. mays operated, whereas the largest proportion of O. sativa genes operated in the cellular component. Metabolic processes were the main biological process of the three species. In addition, binding was the main molecular function for most of the flowering genes of O. sativa and Z. mays. For A. thaliana, the main molecular function appeared to be catalytic activity. These results were consistent with previous research on the floral regulation pathway, which suggests that photoreceptors (Simpson and Dean, 2002), specifically phytochromes, cryptochromes and phototropin, and circadian clock regulators (Dunlap, 1999) function in the organelles for many steps in the photoperiod pathway. The autonomous flowering pathway controls flowering time by transcriptional modification of chromatin and RNA (Simpson, 2004; He et al., 2003). DNA and histone methylation are involved in the vernalization pathway (Jung and Muller, 2009), and flower development is also regulated by the activity of gibberellin in the gibberellin pathway (Davidson et al., 2005; He and Amasino, 2005). All of these mechanisms are associated with metabolic processes and binding functions.
Gene polymorphism includes both length polymorphism and nucleotide-composition polymorphism. Gene divergence directly results in the diversification of biological processes to adapt to the surrounding biological environment. Sixty-nine SSR loci were detected in 250 flowering genes, with an average of 0.276 SSRs per gene, which suggested that flowering genes are SSR enriched. SSRs are among the most mutable tracts in plant genomes (Tautz, 1989) and are not only found frequently within genes, including within protein-coding regions, untranslated regions, and introns, but are also common in transcribed spacer regions (Morgante et al., 2002). Some SSR mutations can result in the alteration of gene function (Li et al., 2004; Kashi and King, 2006). A high ratio of trinucleotide-motif SSRs in flowering time genes in A. thaliana and O. sativa suggested that mutations that increase or decrease the number of repeats, and so change only a few amino acids, might be tolerated, as these mutations do not thoroughly change the composition of proteins through frameshifts. However, in Z. mays, the mononucleotide-motif SSR A was the predominant SSR motif type (68.0%). The results suggest that SSR enrichment in flowering genes has distinct characteristics and may be related to mutation of flowering genes. Flowering is a complex trait controlled by multiple genes and is generally influenced by the environment. The variability of SSRs in flowering genes may create allelic variation in species, allowing swift response to changing external environments (Kashi et al., 1997). Thus, SSRs of flowering genes may be an internal force driving the evolution and variation of these genes. Such abundant SSRs of flowering genes can also be used to design SSR primers as functional markers for use in gene mapping and marker-assisted selection in plant breeding.
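The argument about trinucleotide repeats rests on simple arithmetic: within a coding region, gaining or losing whole repeat units preserves the reading frame only when the net length change is a multiple of three. The sketch below illustrates this, together with the SSR density quoted above. The function name is hypothetical, and the model ignores where in the gene the repeat lies.

```python
def frame_effect(motif_length, repeat_change):
    """Classify a change in repeat number inside a coding region.

    motif_length: length of the SSR motif in nucleotides
    repeat_change: number of repeat units gained (+) or lost (-)
    """
    length_change = motif_length * abs(repeat_change)
    return "in-frame" if length_change % 3 == 0 else "frameshift"


# SSR density reported in the text: 69 loci in 250 flowering genes.
ssr_per_gene = 69 / 250

print(f"SSRs per gene: {ssr_per_gene:.3f}")
print("trinucleotide, +2 repeats:", frame_effect(3, 2))
print("mononucleotide, -1 repeat:", frame_effect(1, -1))
print("dinucleotide, +1 repeat:", frame_effect(2, 1))
```

Any change in a trinucleotide repeat adds or removes whole codons, whereas mono- and dinucleotide repeats stay in frame only when the change happens to total a multiple of three nucleotides.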
In summary, we collected the flowering genes of A. thaliana, O. sativa, and Z. mays and compared the commonalities and divergence points between them with respect to structure, nucleotide variation pattern and trait differentiation. We also posit a possible reason for the differences between the flowering genes, and describe the functional annotation of the flowering genes in detail. Overall, these flowering genes and their regulatory networks showed high conservation, and the variation consisted mostly of mutations in small fragments. Therefore, for a species without an available genome sequence, it is possible to select and clone some key flowering genes by way of homology with established genome sequences, to achieve a rapid understanding of the mechanism of flowering genes.

Figure 1. Schematic representation of Blast2GO annotation. GO annotations are generated through a five-step process: blast, mapping, annotation, enrichment analysis (optional) and visualization.


Figure 2. The similarity relationship analysis results for the regulatory networks of A. thaliana, O. sativa, and Z. mays. The edges (blue lines) indicate the relationship between two homologous node genes. A total of 22 (16.9%) node genes are marked in yellow and were inferred to be critical flowering genes (edges ≥ 14). The pink nodes indicate the common flowering genes between these three species (edges < 14).

Figure 3. The pair-wise proportions of homologous flowering-related genes between A. thaliana, O. sativa, and Z. mays. The x-axis represents the three different species, and the y-axis denotes the proportion of homologous flowering genes between each two species.

Figure 4. The number of species, other than A. thaliana, O. sativa, and Z. mays, with identified sequences homologous to the key flowering genes. The x-axis represents the proteins related to flowering genes in the three species, and the y-axis denotes the number of other species which have homologues to key flowering genes of these three species.

Figure 5. The percentage of flowering-related genes with SSR loci. The x-axis represents the three different species, and the y-axis denotes the percentage of genes with SSR loci.

Figure 6.

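Table 1. The basic statistics of flowering-related genes of A. thaliana, O. sativa, and Z. mays.
a The number of flowering genes from NCBI found by the species name and "flowering". b The number of flowering genes obtained by BLASTN analysis.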

Table 2. DNA polymorphism in 22 groups of key flowering related genes.
a The key flowering genes were used to perform TBLASTN analysis to acquire homologous genes from other species. The hit and key flowering genes were used together to perform DNA polymorphism analysis. * p < 0.05. Not significant, p > 0.10.

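Table 3. Tajima's D information for neutral versus selective bias in DNA sequence evolution. a
a The key flowering genes were used to perform TBLASTN analysis to acquire homologous genes from other species. The hit and key flowering genes were used together to perform Tajima's D test. Not significant, p > 0.10.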