Genetic diversity, taxonomy and legumins implications of seed storage protein profiling in Fabaceae

Proteomic evidence can be pivotal to the discovery of new plant proteins and plant relationships because of the diversity of form it can reveal. Seed storage protein profiles of 20 Fabaceae species (4 grain legumes and 16 non-pulses) from 16 genera and 10 tribes were analysed by sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) to estimate protein content diversity and possible genetic relatedness. A similarity of 28.3% and a proteomic polymorphism of 71.7% were scored for the species. The high variability expressed by the lot reflects the genetic diversity within the Fabaceae. A dendrogram based on the proteomic data clustered the species into four groups. Aside from two species of the tribe Ingeae, Albizia lebbeck and Albizia zygia, and those of the tribe Caesalpinieae, the species clustered with several non-traditional cohorts, resulting in a rearrangement that showed little semblance to the phylogenetic relationships based on traditional morphological taxonomic delimitation. The similarity in profiles can serve as a preliminary screen for proteins of importance, whether for nutritional or industrial use, for improvement of existing crops, or for developing entirely new plants as crops. The protein mix and the resulting relationships based on seed storage proteins prompt a review of existing taxonomic, agricultural and research perspectives for the Fabaceae.


INTRODUCTION
Legumes vary from annual and perennial herbs to shrubs, trees, vines/lianas, and even a few aquatics, and in size from some of the smallest plants of deserts and arctic or alpine regions to the tallest of rain forest trees. They constitute a conspicuous, and often dominant, component of most of the vegetation types distributed throughout temperate and tropical regions of the world (Rundel, 1989). Legumes are particularly diverse in tropical forests and temperate shrublands with a seasonally dry or arid climate. This preference for semiarid to arid habitats is related to a nitrogen-demanding metabolism (Sprent and McKey, 1994; Sprent, 2001).
Taxonomically, the family Fabaceae is traditionally divided into three subfamilies, the Caesalpinioideae, Mimosoideae and Papilionoideae, a division based mainly on floral characteristics, with 39 tribes and some 670 genera recognized (Polhill and Raven, 1981; Polhill, 1994). However, a more recent tribal and generic re-evaluation of the classification of Fabaceae, resulting from more than 10 years of intensive molecular phylogenetic studies, recognizes 36 tribes, 727 genera and 19,327 species (Lewis et al., 2005).
Seed proteins are physiologically stable and easy to handle, and they operate at the level of the gene product, where the environment has very little influence (Javaid et al., 2004; Iqbal et al., 2005). These proteins are an expressed form of the genome and can be used as biomarkers; where properly traced between specimens, they can lead to the identification of proteins with prospective industrial, medicinal or nutritional applications.
Exploring the genetic potential of plants is becoming a frontline study because the information it provides on the depleted gene pool of cultivated plants, and on the genetic erosion that has accompanied human development, is integral to making informed decisions about plant protection, conservation and improvement programs.
The objective of the present study was to evaluate the genetic diversity, taxonomical relationships, and possible exposition of important proteins among 20 Fabaceae species, using SDS-PAGE analysis of seed storage proteins.

Samples and protein extraction
Seed samples of 20 species of the plant family Fabaceae were obtained from field survey collections in eight states in southern Nigeria (8.5°N to 6.45°N; 3.38°E to 7.5°E) and from a seed lot from the Botanischer Garten, Botanischer Museum, Berlin, Germany (Table 1). Whole seed protein extraction was carried out with the Norgen All-in-One purification kit® (Norgen Biotek Corporation). The flow-through protein fractions were stored at -20°C until separation analysis. The gel was fixed with 500 mL of USP-grade 95% (v/v) ethanol in water and stained in 0.1% (w/v) Coomassie blue R350, 20% (v/v) methanol and 10% (v/v) acetic acid; the resulting banding patterns were then captured with a digital transilluminator.

Analysis
The monomorphic and polymorphic protein bands were scored for each sample, with staining intensity recorded as low, medium or high as an index of the subunit constituents of the protein bands. A similarity matrix based on Jaccard's similarity coefficient was generated from protein bands scored as 0 (absent) or 1 (present), followed by a distance matrix, and analysed using SPSS 15.0 for Windows. A hierarchical cluster was generated from the similarity matrix and compared with a similarity cluster generated from morphological data according to Polhill (1994).
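The scoring step described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study; it only shows how Jaccard's coefficient (shared bands divided by bands present in either profile) and the corresponding distance (one minus the coefficient) could be computed from 0/1 band scores. The species labels and band vectors are hypothetical and are not data from this work.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard coefficient for two 0/1 band vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("band vectors must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Convention assumed here: two profiles with no bands count as identical.
    return shared / either if either else 1.0


def similarity_matrix(profiles):
    """Pairwise Jaccard similarities, keyed by (name, name) in both orders."""
    names = list(profiles)
    sim = {(n, n): 1.0 for n in names}
    for p, q in combinations(names, 2):
        s = jaccard_similarity(profiles[p], profiles[q])
        sim[(p, q)] = sim[(q, p)] = s
    return sim


# Hypothetical band scores (1 = band present, 0 = band absent).
profiles = {
    "sp1": [1, 1, 0, 1, 0, 0],
    "sp2": [1, 0, 0, 1, 1, 0],
    "sp3": [0, 0, 1, 0, 1, 1],
}

sim = similarity_matrix(profiles)
# Jaccard distance is the complement of the similarity.
dist = {pair: 1.0 - s for pair, s in sim.items()}
```

In this toy example sp1 and sp2 share two of the four bands present in either profile, so their similarity is 0.5 and their distance is 0.5.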

RESULTS
The SDS gel electrophoresis of reproducible storage proteins for the 20 Fabaceae species resulted in bands ranging in molecular weight from 14 to >100 kDa. Eighty-nine (89) bands were detected (Figure 1).
The total seed storage proteins segregated into six distinct groups based on the molecular weight of the proteins: 14 to 24, 25 to 30, 31 to 40, 41 to 62, 63 to 100 and >100 kDa. Each group of protein phenotypes consists of several subunits, from which specific proteins can be identified (Kottapalli et al., 2008). Higher polymorphism was recorded for proteins ranging from 31 to 62 kDa, accounting for 43.5% of the total protein bands. The least polymorphism and the fewest protein bands (9.6%) were recorded for the lower molecular weight (<24 kDa) proteins (Table 2). This indicates that the Fabaceae may predominantly express medium molecular weight genome products.
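As an illustration of the grouping above, the following sketch assigns band molecular weights to the six weight classes named in the text and reports the percentage of bands in each class. The class boundaries are taken from the text; treating them as inclusive upper bounds (so that a band between 24 and 25 kDa falls in the 25 to 30 class) is an assumption, and the list of band weights is hypothetical.

```python
from collections import Counter

# Upper bounds (kDa) of the first five weight classes reported in the text.
# Using them as inclusive upper limits is an assumption of this sketch.
CLASSES = [
    (24, "14-24"),
    (30, "25-30"),
    (40, "31-40"),
    (62, "41-62"),
    (100, "63-100"),
]


def weight_class(kda):
    """Return the weight-class label for a band of the given size in kDa."""
    for upper, label in CLASSES:
        if kda <= upper:
            return label
    return ">100"


def class_shares(band_weights):
    """Percentage of bands falling in each weight class."""
    total = len(band_weights)
    if total == 0:
        return {}
    counts = Counter(weight_class(w) for w in band_weights)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical band sizes in kDa, not measurements from this study.
bands = [14.2, 27.8, 32.14, 35.0, 45.5, 58.0, 70.0, 120.0]
shares = class_shares(bands)
```

With these made-up values, two of the eight bands fall in the 31 to 40 kDa class, giving that class a share of 25%.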

Protein profile similarity index
Analysis of the Jaccard similarity coefficient and distance for the protein profiles resulted in a mean similarity of 28.3% and thus 71.7% (approximately 72%) dissimilarity. The pairwise analysis of the species (20) against the protein groups (6) recorded a lowest similarity of 5% and a highest similarity of 50%. Likewise, a high mean Jaccard distance of 0.668 was recorded for the taxa profiled, an indication of the degree of dissimilarity between the taxa.

Cluster analysis
A dendrogram generated from the protein polymorphism data resulted in a phylogenetic tree that grouped the taxa into related clusters. Using average linkage distance, the hierarchical cluster was resolved into four distinct clusters (Figure 2), defining the phylogenetic relationships at the intra-subsectional level for the taxa studied.
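Average-linkage clustering of a distance matrix can be sketched as follows. This is a generic illustration of the technique, not a reproduction of the SPSS analysis: the function names, the four taxa and their pairwise distances are all hypothetical, and the cut threshold is on the raw distance scale rather than the rescaled distance that a statistics package may report.

```python
def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    names: list of taxon labels.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = [(n,) for n in names]
    merges = []

    def between(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = between(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


def cut(names, dist, threshold):
    """Groups obtained by ignoring merges above the distance threshold."""
    groups = {n: {n} for n in names}
    for a, b, height in average_linkage(names, dist):
        if height > threshold:
            break  # average-linkage heights never decrease
        union = groups[a[0]] | groups[b[0]]
        for n in union:
            groups[n] = union
    return sorted({tuple(sorted(g)) for g in groups.values()})


# Hypothetical taxa and Jaccard distances, not data from this study.
names = ["sp1", "sp2", "sp3", "sp4"]
dist = {
    frozenset(("sp1", "sp2")): 0.2,
    frozenset(("sp3", "sp4")): 0.3,
    frozenset(("sp1", "sp3")): 0.8,
    frozenset(("sp1", "sp4")): 0.9,
    frozenset(("sp2", "sp3")): 0.7,
    frozenset(("sp2", "sp4")): 0.8,
}
groups = cut(names, dist, threshold=0.5)
```

Cutting this toy tree at a distance of 0.5 leaves two groups, (sp1, sp2) and (sp3, sp4); a higher threshold merges them into one.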
The dendrogram at 40% cluster distance revealed four clusters. The first cluster (Group A) with eight members:

Genetic diversity
Protein homology of 28.3% was observed across the taxa studied, which represents the measure of similarity and thus the degree of inter-specific closeness amongst the species; 71.7% polymorphism was recorded for the seed lot. These results confirm, as expected, some degree of closeness between the taxa and reveal the level of variation that can exist among species of the same family (Carmona et al., 2010). The taxa studied presented three categories of polypeptides: high, moderate and low molecular weight. With 89 bands, 28.3% similarity and 71.7% protein polymorphism recorded, the protocol applied could be useful for species/cultivar identification and for generating protein markers for the family, particularly as the majority of the protein homology lies within the 30 to 40 and 41 to 62 kDa molecular weight ranges (Figure 1 and Table 2). SDS-PAGE studies recording protein similarities and differences across species and accessions have been reported by Valizadeh (2001) for grain legumes in Iran, Ghafoor and Ahmad (2005) for Vigna mungo, Ishtiaq et al. (2010) for Ranunculaceae and De Britto et al. (2012) for Apocynaceae. These studies, including the present one, establish that SDS-PAGE profiling of seed storage proteins, which are regarded as independent of environmental fluctuations, is a reliable tool for investigating intra- and inter-specific variation in plants (Hames, 1990; Carmona et al., 2010; Tchiagam et al., 2011).
The degree of polymorphism recorded shows the diversity of DNA products (proteins) within the Fabaceae and offers potential for research, industrial and economic applications of such proteins once they are characterized.

Taxonomic characterization
Taxonomically, the family Fabaceae has traditionally been divided into three subfamilies, Caesalpinioideae, Mimosoideae and Papilionoideae; according to the last formal classification by Polhill (1994), prior to the advent of molecular evidence, 39 tribes and 670 genera were recognized for the family.
In the present study, the groups (A to D) emerging in the dendrogram from the Jaccard's similarity matrix revealed a divergence from the traditional taxonomic status of the Fabaceae for the majority of the species studied. Comparatively, only two (A. lebbeck and A. zygia) of the 20 species maintained parallel positions in the phylogenetic trees from the morphological (Figure 3; Polhill, 1994) and the proteomic (Figure 2) evidence. In addition, the species A. hypogaea, which is traditionally grouped in the sub-family Papilionoideae, segregates from the group, forming an out-group (Group D) with D. guineense. In the seed protein profiles of 11 grain legumes, Valizadeh (2001) recorded a similar (single-member) out-group for A. hypogaea.

Figure 3. Traditional delimitation of the 20 Fabaceae species into three sub-families and ten tribes based on morphological characteristics according to Polhill (1994).
The species studied share morphological characteristics that warranted their groupings, such as the tree habit for the tribe Cassieae, long cylindrical indehiscent pods for the genus Cassia, and edible seeds for the grain legumes of the tribe Phaseoleae. However, the present protein profile reflects a considerable re-alignment of the taxa across sub-family and tribal boundaries, which suggests that SDS-PAGE profiling of seed storage proteins is a reliable tool for taxonomic studies and that evidence from such studies can lead to the identification of new groups and, in some cases, to delimitation into a new clade for species like A. hypogaea. When added to the growing pool of molecular (genomic and proteomic) evidence, these data will allow for better clarification and taxonomic review of the family Fabaceae.

Single proteins and leaf protein concentrate applications
Comparative analysis of the profiles of the study taxa highlighted possible legumins of importance. In the present study, D. guineense clustered with A. hypogaea, and earlier profiling and identification of A. hypogaea proteins (Kottapalli et al., 2008) recorded Arachis-specific as well as universal single polypeptides ranging from 18.0 to 96.6 kDa (Javaid et al., 2004). Two such protein spots, with molecular weights of 27.8 and 32.14 kDa and corresponding to the proteins glycinin and Gly1, aligned with similar spots in D. guineense (Catsimpoolas et al., 1971; Kottapalli et al., 2008). These 11S-globulin proteins have been implicated in the reduction of cardiovascular and proatherogenic factors and of the incidence of chronic diseases (Duranti, 2006; Fassini et al., 2011). While genomic mapping may be required to mark the precise genes coding for such proteins, quantitative evidence from techniques like protein profiling is fundamental to such molecular studies.
Storage protein profiling of Fabaceae seeds in this study highlighted a high degree of genetic diversity within the Fabaceae. The level of inter-specific variation may warrant a review of the traditional taxonomic delimitations in the family. The comparative protein banding points to species that may be important sources of single proteins like glycinin as well as for leaf protein concentrate (LPC) production.