A clue for generating a new leucine-rich repeat gene in maize

Leucine-rich repeat (LRR) proteins play important roles in cell adhesion and signaling, neuronal development and, in plants, disease resistance responses and pathogen recognition. The origin and evolution of LRR genes have therefore been studied extensively. However, there has been no evidence of how new LRR genes are generated. In this study, the genomic and amino acid sequences of LRR genes and proteins were aligned using the MaizeGDB and NCBI databases. The results showed that parts of the GRMZM5G851515 and GRMZM2G167560 sequences were identical to parts of GRMZM2G343449, suggesting that GRMZM2G343449 was generated from GRMZM5G851515, GRMZM2G167560 and other genes. Evolutionary analysis also supported the view that GRMZM2G343449 and its homologous genes are newborn. Moreover, they were found only in angiosperm species closely related to human life, suggesting that they might have been retained through artificial selection. GRMZM2G343449 and its homologous genes in Poales formed an independent evolutionary branch, which might be related to environmental adaptability, because crops encounter more diseases and insect pests during growth and development than other species do. The results indicate that GRMZM2G343449 and its homologous genes might be associated with stress resistance. This study provides key information for finding newly generated LRR genes and might offer a clue for searching for newborn genes in maize and other species in the current era of big data.

Plant LRR proteins (PLRRs) are classified into different types according to the other conserved domains they contain (Bella et al., 2008). PLRRs contain tandem arrays of two or more LRRs and form the continuously expanding LRR superfamily (Buchanan and Gay, 1996). In maize, for example, six PLRR types have been described: LRR-transmembrane (T)-protein kinase (PK) (LRR-T-PK), LRR-PK, NB-LRR, LRR-T, LRR, and ATPase (A)-LRR (A-LRR) (Li et al., 2016). More than 200 PLRR genes are distributed across all maize chromosomes (Li et al., 2016). The LRR domain is an essential part of PLRRs. It is a widespread structural motif and one of six prolific types of protein repeat (Andrade et al., 2001). In some PLRRs, solvent-exposed amino acid residues of the LRR are involved in pathogen recognition (Bergelson et al., 2001). Studying the evolutionary dynamics of PLRRs is therefore important for understanding plant disease resistance.
The LRR occurs in Eubacteria, Archaebacteria, Protista, Fungi, Plantae and Animalia (Yue et al., 2012). The number of LRRs in Eubacteria, Archaebacteria and Fungi is far lower than in Plantae and Animalia (Yue et al., 2012). This indicates that the evolutionary trend of the LRR parallels that of the organisms carrying it, and that evolutionary analysis of the LRR is relevant across all kingdoms.
Previous studies showed that the LRR and NBS domains existed before the split of prokaryotes and eukaryotes, and that fusion of the LRR and NBS domains was observed only in land plant lineages (Yue et al., 2012). However, there is currently no evidence explaining how a new PLRR is generated. In this study, we used three maize proteins to explain the generation of a new PLRR gene, although the evidence is incomplete. We also analyzed the evolution of the newly generated PLRR. This study aims to provide a clue for further research on the generation of new genes in the current era of big data.

Data sampling
For each species, protein entries in the NCBI database (http://www.ncbi.nlm.nih.gov/guide/) matching the LRR domain of GRMZM2G343449 were identified as LRR-encoding proteins using blastp with an E-value cut-off of 10⁻⁴ (Eddy, 1998; Yue et al., 2012). The amino acid sequences of all LRR-encoding proteins in the data set were aligned, and a phylogenetic tree was constructed using MEGA6 with 1,000 bootstrap replicates (Tamura et al., 2013).
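
The E-value filtering step can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the blastp hits were exported in BLAST+ tabular format (`-outfmt 6`, E-value in the eleventh column), and the function name `lrr_hits` and the example subject IDs are hypothetical placeholders.

```python
from typing import Iterable

EVALUE_CUTOFF = 1e-4  # cut-off stated in the text

def lrr_hits(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> list[str]:
    """Unique subject IDs (first-seen order) with E-value <= cutoff.

    Assumes BLAST+ tabular rows (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    kept: dict[str, None] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= cutoff:
            kept.setdefault(subject, None)  # keep one entry per subject
    return list(kept)

# Hypothetical rows; the subject IDs are placeholders, not real accessions.
example = [
    "GRMZM2G343449\tXP_0001\t88.5\t300\t30\t2\t1\t300\t5\t304\t3e-150\t520",
    "GRMZM2G343449\tXP_0001\t40.2\t19\t11\t0\t310\t328\t10\t28\t2e-05\t48.1",
    "GRMZM2G343449\tXP_0002\t30.1\t120\t80\t5\t10\t130\t40\t160\t0.02\t35.0",
]
print(lrr_hits(example))  # ['XP_0001']
```

A subject with several high-scoring segments is reported once, and `XP_0002` is dropped because its E-value (0.02) is above the cut-off. Whether the boundary itself is inclusive is an assumption of this sketch.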

A newborn PLRR identified from DNA and protein sequences
Among the duplicated LRR genes reported by Li et al. (2016), only GRMZM2G343449 on chromosome 4 and GRMZM5G851515 on chromosome 2 had identical N-terminal amino acid sequences. The identical region consisted of 295 amino acids (Figure 1), corresponding to 89.9% of GRMZM2G343449 and 74.1% of GRMZM5G851515 (295/328 and 295/398, respectively). GRMZM2G343449, with 328 amino acids, is located in bin 4.06, whereas GRMZM5G851515, with 398 amino acids, is located in bin 2.04 (Table 1). Both genes have four introns and belong to the LRR type (Figure 2A and 2B). Because identical amino acid sequences do not imply identical DNA sequences (synonymous codons encode the same amino acid), the nucleotide sequences of GRMZM2G343449 and GRMZM5G851515 were subsequently aligned. The two genes shared 1,298 bp of identical sequence at the 3' terminus, apart from six SNPs. Most of the differences between the two sequences lay in the first exon (Figure 3), suggesting that the first exon of GRMZM2G343449 or GRMZM5G851515 might have been derived from another protein-coding gene or from other unknown chromosomal fragments.
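
The identity percentages above follow directly from the length of the identical region and of each protein, and the SNP count rests on a column-by-column comparison of aligned sequences. The sketch below reproduces that arithmetic and shows one way to list mismatched positions; the helper names are hypothetical, and the short DNA fragment is invented, not maize sequence.

```python
IDENTICAL_AA = 295  # length of the identical region (Figure 1)
LENGTHS = {"GRMZM2G343449": 328, "GRMZM5G851515": 398}  # protein lengths

def percent_identical(identical: int, total: int) -> float:
    """Share of a protein covered by the identical region, in percent (1 d.p.)."""
    return round(100 * identical / total, 1)

def mismatch_positions(a: str, b: str) -> list[int]:
    """1-based positions where two aligned, equal-length sequences differ.

    Gap characters are compared like any other symbol, so a gap opposite a
    base also counts as a mismatch in this simple sketch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

for gene, length in LENGTHS.items():
    print(gene, percent_identical(IDENTICAL_AA, length))
# GRMZM2G343449 89.9
# GRMZM5G851515 74.1

# Invented toy fragment: two differing sites, analogous to counting SNPs.
print(mismatch_positions("ATGCCGTA", "ATGACGTT"))  # [4, 8]
```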

Searching for the origins of a new PLRR
To determine which gene was the origin, we searched the DNA sequences of the two genes against the NCBI and MaizeGDB databases with BLAST, using default parameters. Given the MaizeGDB genome of about 13 Gb composed of four bases (A, T, C and G), a power calculation shows that a specific stretch of 17 or more contiguous nucleotides is expected to occur by chance less than once (4^17 ≈ 1.7 × 10^10 exceeds 1.3 × 10^10, whereas 4^16 ≈ 4.3 × 10^9 does not), so such a stretch is likely to be unique to one gene. As a stricter criterion, we regarded a stretch of 170 contiguous identical base pairs as evidence that two maize genes share sequence of common origin. Apart from GRMZM2G343449, no gene had more than 170 bp of DNA identical to GRMZM5G851515. However, GRMZM2G167560 (NCBI accession number: EU972243) shared 200 bp of identical DNA with positions 851–1050 of GRMZM2G343449 (Figure 4), suggesting that GRMZM2G343449 might consist of sequence from GRMZM2G167560, parts of GRMZM5G851515 and other DNA sequences of unknown origin (Figure 5). This implies that GRMZM2G167560 and GRMZM5G851515 existed before GRMZM2G343449 was generated, and provides a clue for analyzing how new LRR or other genes are created in the era of big data.
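
The length argument and the 170 bp screen can be sketched as follows. The calculation assumes, as the simple power argument does, that bases are uniform and independent on a single strand; real genomes are repetitive and compositionally biased, so the result is only a rough guide. `longest_shared_run` and `shares_long_run` are hypothetical helpers using the classic dynamic-programming longest-common-substring method, which is practical for gene-length sequences but not for genome-wide searches, where BLAST is the appropriate tool.

```python
GENOME_BP = 13e9     # "about 13 Gb", the figure used in the text
THRESHOLD_BP = 170   # stricter cut-off used in the text

def min_unique_length(genome_size_bp: float, alphabet: int = 4) -> int:
    """Smallest n with alphabet**n > genome_size_bp. Under a uniform,
    independent-base model on one strand, a given n-mer is then expected
    to occur by chance less than once in the genome."""
    n = 1
    while alphabet ** n <= genome_size_bp:
        n += 1
    return n

def longest_shared_run(a: str, b: str) -> tuple[int, int, int]:
    """Longest contiguous identical stretch shared by a and b, found by the
    dynamic-programming longest-common-substring method.
    Returns (length, start_in_a, start_in_b) with 0-based starts."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous base of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best

def shares_long_run(a: str, b: str, threshold: int = THRESHOLD_BP) -> bool:
    """True if a and b share at least `threshold` contiguous identical bases."""
    return longest_shared_run(a, b)[0] >= threshold

print(min_unique_length(GENOME_BP))  # 17
# Invented toy sequences, not maize DNA:
print(longest_shared_run("GGATCCATG", "TTATCCAGG"))  # (5, 2, 2)
```

The dynamic-programming table only keeps two rows, so memory grows with the length of `b` rather than with the product of both lengths.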

Evolutionary analysis of the newly generated PLRR
Using an E-value cut-off of 10⁻⁴, 23 homologs of GRMZM2G343449 were found in 22 angiosperm species other than maize, whereas none were found in gymnosperms, bryophytes, fungi or animals (Figure 6). This indicates that GRMZM2G343449 and its homologs are newborn and angiosperm-specific. The 23 species are closely related to human life, and most of them are edible. Although monocots were not completely separated from dicots in the clustering results, the Poales species clustered together and were distinct from other orders (Figure 6). This indicates that GRMZM2G343449 and its homologous genes in Poales evolved relatively independently.
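
Figure 6 was built with MEGA6, and the tree-building method is not specified here. As a much simpler stand-in, the sketch below computes pairwise p-distances (the proportion of differing aligned sites) and reports each sequence's nearest neighbour, which is enough to show how Poales sequences can group apart from a dicot. The aligned sequences and labels are invented toy data, not the real homologs, and bootstrap resampling is omitted.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in sites) / len(sites)

def nearest_neighbours(aln: dict[str, str]) -> dict[str, str]:
    """Map each sequence label to the label of its closest other sequence."""
    dist: dict[tuple[str, str], float] = {}
    for p, q in combinations(aln, 2):
        dist[p, q] = dist[q, p] = p_distance(aln[p], aln[q])
    return {
        p: min((q for q in aln if q != p), key=lambda q: dist[p, q])
        for p in aln
    }

# Invented toy alignment (equal-length, already aligned); NOT real homologs.
toy = {
    "maize":   "MKLLSLLALVAG",
    "sorghum": "MKLLSLLALVSG",
    "rice":    "MKLISLLALISG",
    "soybean": "MRVISFLGIAEN",
}
print(nearest_neighbours(toy))
# {'maize': 'sorghum', 'sorghum': 'maize', 'rice': 'sorghum', 'soybean': 'rice'}
```

In this toy case the three grass sequences are each other's closest relatives, while the dicot's nearest neighbour is still far away (8 of 12 sites differ). A real analysis would build a full tree and assess support with bootstrap replicates, as described in the data sampling section.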

DISCUSSION
The LRR domain existed before the split of prokaryotes and eukaryotes (Yue et al., 2012). However, the LRR protein GRMZM2G343449 might have arisen relatively recently, after GRMZM2G167560 and GRMZM5G851515. Transposons might have contributed to its origin, because they often contribute to the generation of new genes. Maize contains a large number of transposons, such as Helitrons (Li and Dooner, 2009), Ac/Ds (Lazarow et al., 2013) and Mutator (Walbot et al., 1988), any of which might have contributed to the generation of GRMZM2G343449.
In plants, the LRR domains of several R proteins are the major determinants of recognition specificity for Avr factors (Jones and Jones, 1997; Ellis et al., 2000; Leister and Katagiri, 2000). Amino acids in the LRR might also influence interactions with host factors (Banerjee et al., 2001). Adaptive divergence among LRR proteins has been investigated in tomato (Parniske et al., 1997), rice (Wang et al., 1998) and Arabidopsis (Botella et al., 1998; McDowell et al., 1998; Noel et al., 1999). The LRR region often evolves at unusually fast rates (Bergelson et al., 2001). However, direct evidence of how LRR proteins evolve has been lacking. In this study, the generation of GRMZM2G343449 provides a clue to the evolution of LRR proteins and to how new LRR or other genes arise, based on current big data.
Adaptive divergence and allelic polymorphism are two types of evolutionary dynamics, and adaptive variants coexist with other alleles (Bergelson et al., 2001). R proteins possessing LRR domains are usually associated with pathogen recognition (Bergelson et al., 2001). The generation of GRMZM2G343449 might have resulted from the interaction between maize and pathogens, suggesting that GRMZM2G343449 might be associated with pathogen resistance in maize.

Figure 6. The phylogenetic tree of GRMZM2G343449 and its homologous genes. The NCBI ID of
GRMZM2G343449 and its homologous genes were found in only 23 angiosperm species; that is, they are not widespread in angiosperms. These species are closely related to human life. This suggests that GRMZM2G343449 and its homologous genes might not only be newborn but might also have been retained through artificial selection. The homologs of GRMZM2G343449 in Poales formed an independent evolutionary branch, which might be related to environmental adaptability, because crops are exposed to more diseases and insect pests during production than other species are. This further suggests that GRMZM2G343449 and its homologous genes might be associated with stress resistance.

Conclusion
The current study revealed that GRMZM2G343449 might have been generated from GRMZM5G851515, GRMZM2G167560 and other genes. Evolutionary analysis also supported the view that GRMZM2G343449 and its homologous genes might be newborn and retained through artificial selection. The finding that GRMZM2G343449 and its homologous genes in Poales form an independent evolutionary branch suggests that they might be associated with stress resistance. This study provides a clue for searching for newborn genes in maize and other species.

Figure 1. The amino acid sequence alignment of GRMZM2G343449 and GRMZM5G851515. Amino acids on a black background are identical between the two proteins.

Figure 2. The structures of the genes and proteins of GRMZM2G343449 and GRMZM5G851515. (A) Gene structures of GRMZM2G343449 and GRMZM5G851515. (B) Protein structures of GRMZM2G343449 and GRMZM5G851515.

Figure 3. The genomic alignment of GRMZM2G343449 and GRMZM5G851515. Black bases indicate bases identical between the two genes.

Figure 4. The genomic alignment of GRMZM2G343449 and GRMZM2G167560 at the 5' terminus. Black bases indicate bases identical between the two genes.

Figure 5. Schematic diagram of the generation of GRMZM2G343449. "?" indicates unknown genes or chromosomal fragments.

Table 1. Details of GRMZM2G343449_T01 and GRMZM5G851515_T01 on the chromosomes.