Functional genomics in forage and turf: present status and future prospects

The recent advances in plant genomics have greatly influenced basic research in many agriculturally important crops. The availability of genome information from the model grass species rice and the model legume species M. truncatula, together with functional genomics activities in other crop species, will accelerate genomics studies of forage and turf. Brachypodium distachyon was recently proposed as a new model plant for forage and turf grass genomics studies. The combination of bioinformatics and genomics will enhance our understanding of the molecular functions of forage and turf species. This review focuses on recent advances and applications of functional genomics, including large-scale EST projects, global gene expression analyses, proteomics, and metabolic profiling, as well as the impact of functional genomics on the improvement of forage and turf crops.


INTRODUCTION
Over the past decade, major advances have been made in the genomics of model plants in the areas of structural, comparative, and functional studies. The growing array of sequence-based tools is helping to reveal the organization, evolution, syntenic relationships, and gene functions of plant genomes. Genomics efforts are now turning to a number of agriculturally important organisms to enhance our understanding of agriculture-related biology. A draft sequence of the rice genome has recently been released, and the target date for the final version is 2004 (Goff et al., 2002; Yu et al., 2002). Sequencing of the genome of the model legume Medicago truncatula is also in progress (Bell et al., 2001).
The Poaceae family includes all major cereal crops and forage grasses (Keller and Feuillet, 2000). Approximately 10,000 species in this family are classified into 650 to 785 genera (Watson and Dallwitz, 1992). However, only 40 species are currently used for forage and turf purposes (Moser and Hoveland, 1996). Owing to their adaptability, grasses occupy twice as much land area worldwide as grain crops (Jauhar, 1993). The cultivated grasses used as forage or turf provide tremendous benefits to humans, including livestock and wildlife sustenance, soil conservation, and environmental protection, and therefore have high economic value compared with other crop species. Because livestock productivity depends largely on forage utilization, the cash value of forage species far exceeds that of any other crop in the United States of America (Barnes and Baylor, 1995). Turf grass constitutes the second-largest seed market in America, surpassed only by hybrid seed corn (Lee, 1996). Among all the grasses in Poaceae, fescues, ryegrasses, bentgrasses, bluegrasses, bromegrasses, orchardgrass, bermudagrass, and Panicum spp. are the most intensively used forage and turf species.
Legumes are the other plant family used as forage. With more than 650 genera and 18,000 species, legumes are the third-largest family of higher plants and are second only to grasses in agricultural importance (Doyle, 2001). They form symbiotic relationships with nitrogen-fixing soil bacteria known as rhizobia (Schultze and Kondorosi, 1998), and together they fix enormous quantities of atmospheric nitrogen. Legumes are a major source of organic fertilizer and provide the largest single source of vegetable protein in human diets and in livestock feed (Young et al., 2003).
Traditionally, plant genetic research has studied gene function "one gene at a time." This type of research has led to breakthroughs in biotechnology, with major advances in medicine and other fields. However, genome size in the Poaceae varies widely (100 to 100,000 Mbp), and the genomes contain large amounts of repeated sequence (Draper et al., 2001). Most likely, all aspects of biology in these species are determined by networks of several or many interacting genes. Functional genomics is a general approach toward understanding how the genes of an organism work together, including assigning functions to previously uncharacterized genes. It generally combines systematic analyses of gene expression, protein synthesis and degradation, and changes in plant metabolism. Some of these techniques have already become routine tools for analyzing gene function, whereas others are still in their infancy. Nevertheless, comprehensive genomics studies will provide large amounts of functional information on genes that regulate traits of agricultural importance, such as yield, environmental stress tolerance, and disease resistance.
Considering the genetic complexity of forage and turf species, relatively little investment has been directed toward improving these economically important crops. Functional genomics of forage and turf lags far behind that of other major crop species. Establishing colinearity (synteny) among grass genomes may aid the molecular analysis and manipulation of grass genomes (Wang et al., 2001). Unfortunately, many of the fundamental tools required for forage and turf to benefit fully from the genomics revolution do not exist or are incomplete. Filling these critical gaps in forage and turf genomics will allow geneticists and breeders to take advantage of existing genetic variation, exploit advances in the genomics of other grasses, and make genomics information useful for improving forage and turf grass species.
In this review, we focus on recent studies in forage and turf functional genomics. The importance and limitations of interdisciplinary approaches to functional genomics in forage and turf, including ESTs, expression profiling, proteomics, metabolomics, and bioinformatics, will be discussed.

EXPRESSED SEQUENCE TAGS (ESTs)
The putative function of an unknown gene may be predicted by comparing its sequence with those of genes of known function. In addition, the chromosomal location of a gene may allow speculation about its function, by comparing the functions and chromosomal locations of genes with similar sequences (Holtorf et al., 2002).
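To make annotation by similarity concrete, the sketch below transfers the annotation of the most similar known gene to a query sequence. It is a toy stand-in for a real alignment search such as BLAST: similarity is scored as the Jaccard overlap of shared k-mers, and the reference sequences, function labels, k = 3, and the 0.3 cut-off are all hypothetical choices made for illustration.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_function(query: str,
                     annotated: dict[str, tuple[str, str]],
                     k: int = 3,
                     min_score: float = 0.3) -> tuple[str, float] | None:
    """Transfer the function of the most similar annotated gene.

    `annotated` maps gene_id -> (sequence, function). Returns
    (function, score) for the best hit, or None if no hit reaches min_score.
    """
    query_kmers = kmers(query, k)
    best: tuple[str, float] | None = None
    for seq, function in annotated.values():
        score = jaccard(query_kmers, kmers(seq, k))
        if best is None or score > best[1]:
            best = (function, score)
    if best is None or best[1] < min_score:
        return None
    return best


# Hypothetical reference set (sequences and labels are made up).
reference = {
    "geneA": ("MKTAYIAKQRQISFVKSHFSRQ", "putative kinase"),
    "geneB": ("MSDNELLKQWVEKHGGTLSAE", "putative dehydrin"),
}
print(predict_function("MKTAYIAKQRQISFVKSHF", reference))
```

Returning None below the cut-off mirrors the usual practice of leaving a gene unannotated rather than transferring a function from a weak hit.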
To gain insight into the transcribed portions of plant genomes, numerous large-scale EST sequencing projects have been launched for grass species. This approach, which randomly selects and sequences a large set of cDNA clones, builds a collection of sequence fragments from expressed genes. Because these sequences are highly conserved among grasses, EST data from different grass genomes are continually being compared and refined. Within the Poaceae family, rice (Oryza sativa) is used as the model plant because of its small genome (400 Mb) (Arumuganathan and Earle, 1991). For the same reason, fewer than 260,800 ESTs have been generated for rice. In contrast, approximately 500,000 ESTs had been generated by September 2003 for wheat (Triticum aestivum), another Poaceae species, because of its large genome (16,000 Mb) (Arumuganathan and Earle, 1991). Wheat has the largest number of released ESTs of any plant species in the GenBank dbEST database and ranks behind only human, mouse, and rat overall. Compared with model crop species, forage and turf species have far fewer ESTs. Table 1 lists the number of public EST entries from species commonly used as forage and turf, summarized from GenBank dbEST (Boguski et al., 1993). Only nine of these species have ESTs in the dbEST database, and, except for M. truncatula, in fairly small numbers (for example, only 30 for Poa secunda, big bluegrass). M. truncatula, a nitrogen-fixing legume, is closely related phylogenetically to many agronomically important legume crops such as pea, lentil, and alfalfa. M. truncatula has a small diploid genome (450 Mbp) and has been developed into a model for legume genetics and genomics.
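Grouping ESTs into a non-redundant set of transcripts can be illustrated with simplified single-linkage clustering. The sketch below rests on stated assumptions: two ESTs are grouped whenever they share an exact k-mer (k = 12 by default), k-mers are compared in canonical form so reads from either strand can match, and no overlap assembly or error tolerance is attempted. Production EST pipelines use alignment-based clustering and assembly; this only shows the grouping logic, and the example reads are made up.

```python
from __future__ import annotations

from collections import defaultdict

# Complement table for reverse-complementing DNA k-mers.
COMP = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Return each k-mer of seq as the smaller of itself and its reverse
    complement, so ESTs sequenced from opposite strands can still match."""
    seq = seq.upper()
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        result.add(min(kmer, kmer.translate(COMP)[::-1]))
    return result


def cluster_ests(ests: dict[str, str], k: int = 12) -> list[set[str]]:
    """Single-linkage clustering: ESTs sharing any canonical k-mer are merged."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner: dict[str, str] = {}
    for name, seq in ests.items():
        for kmer in canonical_kmers(seq, k):
            if kmer in first_owner:
                parent[find(name)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = name

    groups: dict[str, set[str]] = defaultdict(set)
    for name in ests:
        groups[find(name)].add(name)
    return sorted(groups.values(), key=sorted)


# Hypothetical reads; est3 is from the opposite strand of a region overlapping est2.
ests = {
    "est1": "ATGGCTAGCTAGGATCCATGCAAGT",
    "est2": "GGATCCATGCAAGTTTGACCA",
    "est3": "GTAACCTGGTCAAACTT",
    "est4": "CCCCCCCCCCCCCCCC",
}
print(cluster_ests(ests, k=8))
```

Single linkage is chosen for simplicity; it can chain unrelated reads through shared repeats, which is one reason real pipelines add alignment checks.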
Although tall fescue (Festuca arundinacea Schreb.) is of high agricultural importance, no EST data for it are available in the public domain. In 2000, a large-scale tall fescue EST project was initiated at the Samuel Roberts Noble Foundation. Its goal is to catalogue the majority of the expressed genes of tall fescue, including tissue-specific, developmental stage-specific, and stress-specific genes. This EST database will serve as a starting point for gene expression and regulation studies in tall fescue as well as in the related Festuca-Lolium species. So far, approximately 40,000 ESTs have been generated and will be submitted to the GenBank dbEST database soon.
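One common use of such a catalogue is "digital" expression analysis: once ESTs are assigned to unigenes, counting how many ESTs of each unigene come from each cDNA library hints at tissue- or stress-specific expression. The sketch below is a hypothetical illustration, not the Noble Foundation's pipeline; the unigene IDs, library names, and the minimum of three ESTs are assumptions.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def library_specific(assignments: Iterable[tuple[str, str]],
                     min_count: int = 3) -> dict[str, str]:
    """Flag unigenes whose ESTs all come from a single cDNA library.

    `assignments` holds one (unigene_id, library_name) pair per EST.
    Returns unigene_id -> library for unigenes seen in only one library
    with at least `min_count` ESTs (fewer ESTs is weak evidence).
    """
    counts: dict[str, Counter[str]] = defaultdict(Counter)
    for unigene, library in assignments:
        counts[unigene][library] += 1
    specific = {}
    for unigene, per_library in counts.items():
        if len(per_library) == 1:
            (library, n), = per_library.items()
            if n >= min_count:
                specific[unigene] = library
    return specific


# Hypothetical EST-to-unigene assignments from three libraries.
example = ([("U1", "heat_stress")] * 4
           + [("U2", "heat_stress"), ("U2", "root"), ("U2", "leaf")]
           + [("U3", "root")] * 2)
print(library_specific(example))
```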

MOLECULAR MAPPING AND REVERSE GENETICS
Genetic maps are used extensively in studies of genome organization and in the dissection of complex traits for plant breeding. Detailed genetic maps based on molecular markers have been developed for a number of forage species, including tall fescue (Xu et al., 1995), alfalfa (Kaló et al., 2000), ryegrass (Jones et al., 2002), and meadow fescue (Alm et al., 2003). Locating genes on genetic maps makes it possible to transfer genetic information among grass species by comparative genetics, particularly from the complete genome sequence of rice.
Comparative genetics is especially important for forage and turf species, for which little genomic information is available. Detailed genetic maps of these species allow their chromosomal regions to be aligned with those of model grasses such as rice, so that regions containing QTL (quantitative trait loci) and genes of interest can be identified rapidly. The first comparative mapping study between forage grasses and cereals compared Lolium with the Triticeae, oat, and rice (Jones et al., 2002). It found that the genetic maps of perennial ryegrass and the Triticeae cereals are highly conserved in terms of orthology and colinearity. In 2002, a molecular genetic linkage map of M. truncatula was established, which will be a useful tool for comparative legume genomics and the isolation of agronomically important genes (Thoquet et al., 2002).
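Colinearity between two maps can be quantified by ordering orthologous markers along one map and finding the longest chain whose positions also increase along the other map. The sketch below does this with a longest-increasing-subsequence search. The marker names and positions are hypothetical, and the method only detects blocks in the same orientation; an inverted segment would need a second pass on reversed positions.

```python
from __future__ import annotations

from bisect import bisect_left


def colinear_chain(markers: list[tuple[str, float, float]]) -> list[str]:
    """Longest chain of orthologous markers in the same order on two maps.

    Each marker is (name, position_on_map_a, position_on_map_b). Markers are
    sorted by map A position; the longest strictly increasing run of map B
    positions (a longest increasing subsequence) is returned as marker names.
    """
    ordered = sorted(markers, key=lambda m: m[1])
    tails: list[float] = []     # tails[j]: smallest B position ending a chain of length j+1
    tail_idx: list[int] = []    # index into `ordered` of that chain end
    prev = [-1] * len(ordered)  # back-pointers for reconstructing the chain
    for i, (_, _, pos_b) in enumerate(ordered):
        j = bisect_left(tails, pos_b)
        if j == len(tails):
            tails.append(pos_b)
            tail_idx.append(i)
        else:
            tails[j] = pos_b
            tail_idx[j] = i
        prev[i] = tail_idx[j - 1] if j > 0 else -1
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(ordered[i][0])
        i = prev[i]
    return chain[::-1]


# Hypothetical ryegrass linkage-group positions (cM) vs. rice positions (Mb).
markers = [("m1", 0.0, 1.0), ("m2", 5.0, 2.0), ("m3", 9.0, 0.5),
           ("m4", 12.0, 3.0), ("m5", 20.0, 4.0)]
print(colinear_chain(markers))
```

The binary-search formulation runs in O(n log n), which matters little for one linkage group but helps when many chromosome pairs are scanned.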
Once genes are identified, the next step is to determine their function, typically by characterizing a mutant that lacks the gene of interest. To study gene function in higher plants, reverse genetics has been applied through gene knockout approaches. A large number of T-DNA and transposon insertion lines have already been generated for rice (Hirochika, 2001). Other methods used to produce mutants through silencing-related phenomena include co-suppression (Wesley et al., 2001), virus-induced gene silencing (Lacomme et al., 2003), double-stranded RNA-mediated interference (Waterhouse and Helliwell, 2003), and even targeted mRNA degradation by recruiting the endogenous and ubiquitous RNase P (Pulukkunat et al., 2003). All these approaches are likely to be applied to investigate plant gene function in a high-throughput manner.
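Once insertion lines exist, a routine reverse-genetics query is which lines carry an insertion inside a gene of interest. A minimal sketch, assuming insertion sites have already been placed on the same chromosome coordinates as the gene (for example, from flanking-sequence tags), uses binary search over sorted positions; the line identifiers and coordinates are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right


def lines_hitting_gene(insertions: list[tuple[int, str]],
                       gene_start: int, gene_end: int) -> list[str]:
    """Return IDs of lines whose insertion site lies in [gene_start, gene_end].

    `insertions` holds (position, line_id) pairs on one chromosome.
    """
    ordered = sorted(insertions)
    positions = [pos for pos, _ in ordered]
    lo = bisect_left(positions, gene_start)  # first site >= gene_start
    hi = bisect_right(positions, gene_end)   # first site > gene_end
    return [line_id for _, line_id in ordered[lo:hi]]


# Hypothetical insertion lines and gene coordinates (bp).
insertions = [(1500, "line_07"), (200, "line_01"), (3000, "line_09"), (1000, "line_03")]
print(lines_hitting_gene(insertions, 900, 1600))
```

For many queries against the same collection, the sorting would be done once up front rather than on every call.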
Several studies have used genetic approaches to investigate gene functions in forage and turf species, for example, studying nitrogen remobilization during leaf senescence with stay-green mutants of Festuca and Lolium (Thomas et al., 2002), using mutants of the ryegrass pollen allergen Lol p 5 with reduced IgE-binding capacity (Swoboda et al., 2002), and characterizing the role of chitinase in suppressing summer patch disease in Kentucky bluegrass with mutant strain C5 of Stenotrophomonas maltophilia (Kobayashi et al., 2002). Although these studies are far from high-throughput, they have provided basic knowledge on the functional aspects of forage and turf genes.

GENE EXPRESSION ANALYSES ON MICROARRAYS
The systematic, unbiased, and large-scale acquisition of data using array-based technology provides a global overview of biological mechanisms. Microarray expression studies produce massive quantities of data, which provide key insights into gene function and interactions within and across metabolic pathways (Young, 2000). At present, both cDNA microarrays and oligonucleotide microarrays are used to monitor gene expression in forage and turf functional genomics. For example, differential gene expression was recently quantified by hybridizing a complex mRNA-derived probe onto an array of PCR products representing 91 putative nodule-specific consensus sequences of M. truncatula, in order to identify nodule-specific transcripts (Fedorova et al., 2002). Suzuki et al. (2002) conducted a preliminary gene expression analysis of saponin biosynthesis in M. truncatula. They found induction of transcripts encoding three early enzymes of the triterpene pathway in a cell culture system, which provides a new tool for discovering saponin pathway genes by DNA array-based approaches. The flexible nature of cDNA microarray fabrication and hybridization methods allows the technology to be applied to non-model organisms such as forage and turf species. A 16K optimized oligonucleotide array set for M. truncatula has been developed by Qiagen Operon in collaboration with the Noble Foundation and is commercially available to the scientific community.
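For two-colour cDNA arrays, differential expression is commonly summarized as the log2 ratio of the two channel intensities. The sketch below centres the log ratios on their median (a simple global normalization) and reports spots with at least a two-fold change. The spot IDs, intensities, and the |log2| >= 1 threshold are illustrative assumptions; real analyses add replicate arrays and statistical testing.

```python
from __future__ import annotations

import math
from statistics import median


def differential_spots(spots: dict[str, tuple[float, float]],
                       min_abs_log2: float = 1.0) -> dict[str, float]:
    """Median-centred log2(red/green) ratios for spots passing a fold-change cut.

    `spots` maps spot_id -> (red, green) background-subtracted intensities.
    Spots with a non-positive intensity in either channel are skipped.
    """
    ratios = {sid: math.log2(red / green)
              for sid, (red, green) in spots.items() if red > 0 and green > 0}
    centre = median(ratios.values())  # global normalization
    return {sid: m - centre for sid, m in ratios.items()
            if abs(m - centre) >= min_abs_log2}


# Hypothetical spot intensities (red = treated, green = control).
spots = {"a": (200.0, 100.0), "b": (100.0, 100.0), "c": (120.0, 120.0),
         "d": (50.0, 200.0), "e": (80.0, 80.0), "f": (0.0, 90.0)}
print(differential_spots(spots))
```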
However, comprehensive arrays may not be available for most forage and turf species anytime soon because of their large genomes and the small amount of sequence information available. Instead, a cDNA library enriched for genes involved in a particular biological process can be used to make a dedicated array (Yang et al., 1999). Techniques designed to enrich for targeted clones and to preselect differentially expressed genes include suppression subtractive hybridization (SSH) (Diatchenko et al., 1996) and representational difference analysis (RDA) (Welford et al., 1998). These strategies allow maximal focus on the tissue of interest and identify genes that are generally not in public databases. Recently, SSH, expression profiling, and EST sequencing were combined to identify 12 plant genes and six fungal genes expressed in the arbuscular mycorrhizal symbiosis between M. truncatula and Glomus mosseae (Brechenmacher et al., 2003). A similar study used SSH to clone a number of new arbuscular mycorrhiza-regulated genes in M. truncatula (Wulf et al., 2003). Our group has recently used SSH to study the molecular basis of heat tolerance and acclimation in tall fescue. Three subtractions were conducted between samples of heat-tolerant and heat-sensitive genotypes collected after 12 h of exposure to 39, 42, and 44 °C, and a total of 2,495 ESTs were generated. The unique gene transcripts will be arrayed to monitor gene expression profiles and to identify heat tolerance-related genes (unpublished data).
Although gene expression arrays are in common use, it has been recognized that the true effect of genes is determined not simply by mRNA levels but also by the amount and modification of the expressed proteins within a particular cellular context. Therefore, it is important to couple transcriptome data with functional knowledge derived from protein and metabolite analyses.

PROTEOMICS AND METABOLOMICS
Proteomics is the study of the proteins expressed at a given time or under certain environmental conditions (Wilkins et al., 1997). According to Holtorf et al. (2002), proteomics can be divided into three main areas: large-scale identification of proteins, identification of protein responses to biological variation (e.g., hormones, environmental changes, and/or genetic mutations), and studies of protein-protein interactions. A proteome reference map for M. truncatula root proteins (http://semele.anu.edu.au/2d/2d.html) was established using two-dimensional gel electrophoresis combined with peptide mass fingerprinting, to aid the dissection of nodulation and root developmental pathways by proteome analysis (Mathesius et al., 2001). Proteomic approaches have also been used to investigate seed development in M. truncatula at specific stages of seed filling corresponding to the acquisition of germination capacity and protein deposition (Gallardo et al., 2003), and to identify symbiosis-related proteins in M. truncatula by two-dimensional electrophoresis and mass spectrometry (Bestel-Corre et al., 2002).
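The matching step of peptide mass fingerprinting can be sketched as follows: digest a candidate protein in silico with trypsin, compute the singly protonated monoisotopic mass of each peptide, and count how many observed masses fall within a tolerance. Assumptions: no missed cleavages or modifications, the 0.05 Da tolerance is arbitrary, and the protein sequence is made up. Dedicated search engines use probabilistic scoring against whole databases rather than a raw match count.

```python
from __future__ import annotations

import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free termini)
PROTON = 1.00728   # singly protonated [M+H]+ ion


def tryptic_peptides(protein: str) -> list[str]:
    """In-silico trypsin digest: cut after K or R unless followed by P.
    No missed cleavages are considered."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_mass(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(protein: str, observed: list[float], tol: float = 0.05) -> int:
    """Number of observed masses matching any theoretical tryptic peptide."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(protein)]
    return sum(1 for m in observed if any(abs(m - t) <= tol for t in theoretical))


protein = "MAKPRGLEKW"  # made-up sequence
print(tryptic_peptides(protein))
print(pmf_score(protein, [mh_mass("GLEK") + 0.01, 999.0]))
```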
Arabidopsis is believed to contain more than 5,000 different low-molecular-weight compounds, most of which are products of secondary metabolism (Roessner et al., 2001). Metabolites are the end products of cellular regulatory processes, and their levels can be regarded as the ultimate response of biological systems to genetic or environmental changes. Analyzing a large subset of defined or undefined metabolites is attractive because a biochemical phenotype can be detected directly by coupling chemical analysis with genetic analysis (Sumner et al., 2003). Accordingly, the term metabolomics describes recent high-throughput approaches in metabolic genomics that aim to identify gene function by analyzing the full complement of metabolites in an organism (Fiehn, 2002).
Unlike other functional genomics approaches, the unbiased, simultaneous identification and quantification of plant metabolomes has been largely neglected because of technical difficulties. Until recently, most analyses were restricted to profiling selected classes of compounds or to fingerprinting metabolic changes without sufficient analytical resolution to determine the levels and identities of individual metabolites (Fiehn, 2002). Such targeted profiling has been used to detect specific metabolites as markers in endophyte-infected forage grasses. For example, lolicines A (1a) and B (2a), late-eluting lolitrem-like compounds, were identified by mass spectrometry and one- and two-dimensional NMR spectroscopy in extracts of perennial ryegrass (Lolium perenne) seed infected with the endophytic fungus Neotyphodium lolii (Munday-Finch et al., 1998). Despite high metabolic variability even among genetically identical plants, promising technologies have been developed, including gas chromatography coupled with mass spectrometry (GC/MS) and high-performance liquid chromatography (HPLC) (Bailey et al., 2000), GC and HPLC combined with time-of-flight (TOF) mass spectrometers (Glassbrook and Ryals, 2001), and nuclear magnetic resonance (NMR) spectroscopy-based techniques (Bligny and Douce, 2001). Metabolite profiling has been used to identify triterpene saponin glycosides in alfalfa and M. truncatula with an optimized reversed-phase HPLC method using on-line photodiode array detection and electrospray ionization mass spectrometry (HPLC/PDA/ESI/MS) (Huhman and Sumner, 2002).
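Annotating peaks in GC/MS or LC/MS profiles usually starts by matching each detected feature to reference compounds by mass and retention time. The sketch below shows that step with hypothetical library entries and tolerances (10 ppm in m/z, 0.2 min in retention time). It does not reproduce the HPLC/PDA/ESI/MS method of Huhman and Sumner (2002).

```python
from __future__ import annotations


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def annotate_features(features: list[tuple[float, float]],
                      library: dict[str, tuple[float, float]],
                      ppm_tol: float = 10.0,
                      rt_tol: float = 0.2):
    """Match (m/z, retention time in min) features to library compounds.

    A feature matches a compound when both the m/z error (ppm) and the
    retention-time difference are within tolerance. Returns a list of
    (feature, sorted list of matching compound names).
    """
    annotated = []
    for mz, rt in features:
        hits = sorted(name for name, (ref_mz, ref_rt) in library.items()
                      if abs(ppm_error(mz, ref_mz)) <= ppm_tol
                      and abs(rt - ref_rt) <= rt_tol)
        annotated.append(((mz, rt), hits))
    return annotated


# Hypothetical reference library and detected features.
library = {"compound_A": (500.0, 12.0), "compound_B": (500.0, 20.0)}
features = [(500.004, 12.1), (500.004, 15.0), (600.0, 12.0)]
for feature, hits in annotate_features(features, library):
    print(feature, hits or "unassigned")
```

Requiring agreement in both dimensions is what separates isomers such as compound_A and compound_B, which share an m/z but elute at different times.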

Brachypodium distachyon: A NEW MODEL SYSTEM FOR GRASS FUNCTIONAL GENOMICS
The use of model eukaryotic organisms to aid studies of species of significant commercial or biological interest has increased tremendously in the past two decades. Studies on the model plant species Arabidopsis thaliana and rice have boosted research on the biology of many important agricultural species. M. truncatula, a close relative of the important forage legume alfalfa, has been chosen as a model species for legume genomic studies because of its small diploid genome, short generation time, self-fertility, and high transformation efficiency (Cook, 1999).
Rice (Oryza sativa) has a compact genome (Bennett et al., 2000). As a staple food crop with a long history of intensive breeding, rice has been promoted as a model for cereal genomics (Goff, 1999). However, rice is phylogenetically distant from the Pooideae subfamily, which includes most forage and turf grasses (Catalan and Olmstead, 2000). Rice does not exhibit all the traits relevant to forage grasses (e.g., resistance to specific types of pathogens, over-wintering and freezing tolerance, vernalization, perenniality, grazing tolerance, sward behavior, and post-harvest biochemistry of silage). In addition, as a relatively large plant with a long life cycle and demanding growth requirements, rice may not be well suited to high-throughput functional genomics (Goff, 1999).
Brachypodium distachyon, a new model for grass functional genomics, was described by Draper et al. (2001). Various molecular phylogenetic analyses have demonstrated that the genus Brachypodium diverged from the ancestral stock of the Pooideae immediately prior to the radiation of the modern "core pooids" (Triticeae, Bromeae, Poeae, and Aveneae), which include the majority of important temperate cereals and forage grasses (Shi et al., 1993; Catalan and Olmstead, 2000). Draper et al. (2001) also reported that diploid ecotypes of B. distachyon (2n = 10) have five easily distinguishable chromosomes that display high levels of chiasma formation at meiosis. The B. distachyon nuclear genome was indistinguishable in size from that of Arabidopsis, making it the simplest genome described in grasses to date. Furthermore, B. distachyon is a self-fertile, inbreeding annual with a life cycle of less than four months. These features, coupled with its small size (approximately 20 cm at maturity), lack of seed-head shatter, and undemanding growth requirements, make it amenable to high-throughput genetics and mutant screens (Draper et al., 2001).

Zhang and Rouf Mian 525

B. distachyon displays many traits relevant to forage grass improvement (Draper et al., 2001). For example, selected B. distachyon ecotypes were resistant to all tested cereal-adapted forms of Blumeria graminis and to cereal brown rusts (Puccinia recondita). In contrast, different ecotypes displayed either resistance or disease symptoms following challenge with the rice blast pathogen (Magnaporthe grisea) and wheat/barley yellow stripe rusts (Puccinia striiformis). Despite its small stature, B. distachyon has large seeds that should be suitable for studies of grain filling (Khan and Stace, 1999). Other traits of potential interest include freezing tolerance, perenniality, tolerance of repeated injury (mowing and trampling), meristem dormancy mechanisms, post-harvest biochemistry of silage and hay, mycorrhizae, and sward ecology. All of these are important traits of forage and turf species that are not exhibited by, or are difficult to study in, rice, but they are possible targets for functional genomics in Brachypodium. Therefore, B. distachyon was proposed as a model complementary to rice for functional genomics studies of temperate cereal and forage grass traits that are not well exhibited by rice (Draper et al., 2001).

CONCLUSION AND PERSPECTIVE
Functional genomics studies in forage and turf have increased significantly over the past few years, and many studies in these crops now use high-throughput genomics technologies. A good summary of such activities was presented at the Third International Symposium on Molecular Breeding of Forage and Turf (MBFT), held in May 2003 in Dallas, Texas, U.S.A. Many of the studies presented at this symposium used genomic approaches, for example, microarray studies of bermudagrass resistance to fungal infection, heat tolerance in tall fescue, and aluminum tolerance in M. truncatula; genome-wide gene expression profiling and promoter discovery in ryegrass and clover; the symbiosis of tall fescue and ryegrass with the endophytes N. coenophialum and N. lolii; lignin modification in M. truncatula; and genetic diversity in indigenous Australian grasses and legumes.
Functional genomics includes three phases: building resources; implementing analytical methods that integrate parallelism and increased throughput; and standardizing and integrating diverse datasets (Hilson et al., 2003). For most forage and turf species, functional genomics is still in the first phase of building genetic and genomic resources. Because building resources is time-consuming and expensive, inter-institutional cooperation to establish optimized protocols and standardized applications will enhance the success of studies on forage and turf.
With great advances in high-throughput genomics technologies, biological experiments now generate huge amounts of data, ranging from gene sequences to expression profiles and protein structures. Functional genomics therefore requires computational tools to convert these data into usable knowledge. Many database systems have been established from genomics and genetics studies of various plant species (Perrin and Wigge, 2002). Gramene (http://www.gramene.org) is a relational database and Web site that serves as a comparative mapping database for the grasses and a community resource for rice (Ware et al., 2002). It combines sequence databases of cereals, ESTs, genetic maps, map relations, a database of rice mutants (genes and alleles), molecular markers, proteins, and publications. Gramene can serve as a foundation for comparative genomics of agriculturally important members of the grass family, including forage and turf grasses. The M. truncatula genomics database (http://www.noble.org/medicago) was set up by the Samuel Roberts Noble Foundation to integrate the biological information generated by functional genomics studies of this model legume. This database provides updated information on M. truncatula genomics research and links to other Web databases (e.g., the M. truncatula Gene Index at http://www.tigr.org/tdb/mtgi).
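As a rough illustration of why relational databases suit comparative mapping, the sketch below stores marker positions from several maps in one SQLite table and joins a forage grass linkage group to rice positions through shared markers. The schema, species labels, and marker names are hypothetical and are not Gramene's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per marker placement on any map.
SCHEMA = """
CREATE TABLE marker (marker_id TEXT PRIMARY KEY);
CREATE TABLE map_position (
    marker_id TEXT NOT NULL REFERENCES marker(marker_id),
    species   TEXT NOT NULL,
    chrom     TEXT NOT NULL,
    position  REAL NOT NULL  -- cM on genetic maps, bp on sequence maps
);
"""


def rice_positions_for(conn: sqlite3.Connection, species: str, chrom: str):
    """List markers on one chromosome/linkage group of `species`, in map
    order, together with the rice chromosome and position of the same marker."""
    sql = """
        SELECT a.marker_id, a.position, r.chrom, r.position
        FROM map_position AS a
        JOIN map_position AS r
          ON r.marker_id = a.marker_id AND r.species = 'rice'
        WHERE a.species = ? AND a.chrom = ?
        ORDER BY a.position
    """
    return conn.execute(sql, (species, chrom)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO marker VALUES (?)", [("mk1",), ("mk2",), ("mk3",)])
conn.executemany(
    "INSERT INTO map_position VALUES (?, ?, ?, ?)",
    [("mk1", "ryegrass", "LG1", 10.0), ("mk2", "ryegrass", "LG1", 4.0),
     ("mk3", "ryegrass", "LG2", 1.0),
     ("mk1", "rice", "1", 2.0e6), ("mk2", "rice", "1", 1.0e6)],
)
rows = rice_positions_for(conn, "ryegrass", "LG1")
print(rows)
```

Because markers without a rice placement are dropped by the inner join, the same query also shows which parts of a linkage group still lack anchors to the rice sequence.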

Table 1. Number of ESTs from forage and turf species in the GenBank dbEST database (as of September 5, 2003).