A DNA-barcode for Melia volkensii Gürke (Meliaceae) and its phylogenetic relationship with some economically important relatives

The study reports the first DNA barcode and molecular phylogeny of the East African endemic tree species Melia volkensii using the standard two-locus plant barcoding genes (rbcL and matK). The two genes were amplified and the PCR products sequenced. Complete coding sequences were obtained for both genes. The edited and aligned sequences had lengths of 1371 bp for rbcL and 1524 bp for matK. These DNA sequences were deposited in the DNA Data Bank of Japan (DDBJ) with cross-listing in the European Molecular Biology Laboratory (EMBL) and GenBank databases. The deposited gene sequences were then subjected to separate nucleotide BLAST searches in NCBI’s GenBank database. Of the 100 BLAST hits in which the query (M. volkensii) had 96–100% nucleotide sequence similarity for the rbcL gene and 90–100% similarity for the matK gene, only 16 taxa had data for both the rbcL and matK genes. These 16 taxa were used for the phylogenetic analysis and comprised 6, 9 and 1 taxa from the families Meliaceae, Simaroubaceae and Rutaceae, respectively. The barcode allowed adequate discrimination of the taxa into their respective generic and species clades. Availability of a barcode for M. volkensii will ease identification of the species, provide more robust phylogenetic reconstructions and allow for better tracking of its exotic dispersal.

The primary objective of the study was to develop a DNA barcode sequence for M. volkensii. DNA barcoding is the use of nucleotide diversity within a short standardised region of DNA for identification of species (Hebert et al., 2003; Kuzmina et al., 2012; Vijayan and Tsou, 2015). DNA barcoding provides an automated species identification system that is quicker and more reliable than traditional taxonomic methods, which rely on morphological characters (Newmaster and Ragupathy, 2009). DNA barcodes can not only resolve phylogenies of plant taxa but are also useful in ecological forensics such as the tracking of illegal trade in plant products (Kress et al., 2015). Other applications of a DNA barcode include monitoring of exotic dispersion, conservation impact assessments, and authentication of parts used in preparation of herbal medicine and botanical pesticides, such as tree barks, fruits and leaves (Ferri et al., 2008; 2015; Kritpetcharat et al., 2011; Mankga et al., 2013; Mishra et al., 2016).
Until recently, DNA barcoding of plants was hampered by the lack of a standard region of DNA with sufficient universality, sequence quality and species discrimination power (Hollingsworth et al., 2011). The long search for a universal plant barcode culminated in the adoption of a two-locus barcode consisting of the phylogenetically conserved gene for the large subunit of the chloroplast enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (rubisco), also known as rbcL, and the more rapidly evolving chloroplast gene for maturase K (matK) (Kress et al., 2009). The two-locus combination of rbcL and matK genes was adopted by the Consortium for the Barcode of Life Plant Working Group (CBOL, 2009) as the standard or core barcode for land plants.
The rbcL gene is a chloroplast gene of approximately 1400 bp that codes for the large subunit of rubisco, the enzyme that catalyzes carbon dioxide fixation in chloroplasts. The matK gene, approximately 1500 bp, is located within a 2,400 bp group II intron of the chloroplast trnK gene, which codes for the transfer RNA for lysine (Johnson and Soltis, 1994; Vogel et al., 1997; Steane, 2005; Hausner et al., 2006; Barthet and Hilu, 2007). It codes for maturase K, an enzymatic protein that enables the intron to excise itself so that the two exons of the trnK gene can be spliced together.
A secondary objective of the study was to use the novel barcode sequences in a preliminary phylogenetic study of the Meliaceae and related families. A molecular phylogeny based on DNA barcoding could clarify evolutionary relationships between both the well-known and lesser known members of the family.
This study reports the first DNA barcode for Melia volkensii. The availability of such a barcode will enable faster and more accurate identification of the species and a more robust reconstruction of phylogenetic relationships in the family. This will provide insights into the phylogenetic affinities between M. volkensii, well-known members of the family such as A. indica, M. azedarach and S. macrophylla, and the lesser known ones. Phylogenetic affinities at the family and generic levels could also reveal closely related families and genera for novel bio-prospecting for compounds of pharmaceutical and pesticidal importance similar to those found in some members of the Meliaceae.

Plant materials and DNA extraction
DNA was extracted from shoot tips of 20 M. volkensii seedlings obtained from seeds collected from Mavuria provenance in Mbeere, Embu county, Eastern Kenya 37° 39.308'E).
DNA extraction was done using the cetyltrimethylammonium bromide (CTAB) method of Doyle and Doyle (1987), with slight modifications: addition of 10% sodium dodecyl sulphate to the extraction buffer, centrifugation at 16,000 g instead of 6,000 g, and washing of the DNA pellet with 70% ethanol instead of a mixture of 76% ethanol and 10 mM ammonium acetate.
Isolation of the M. volkensii maturase K chloroplast gene (matK) was also carried out in a 25 μl reaction volume. The expected fragment size was 1500 bp (Fazekas et al., 2012). The primers used were Matk1f (5'ACTGTATCGCACTATGTATCA3') and Matk1r (5'GAACTAGTCGGATGGAGTAG3'), also sourced from Inqaba Biotec, South Africa. The reaction mixture contained 1 unit of MyTaq® DNA polymerase (Bioline, USA); 1x MyTaq® buffer (Bioline, USA) containing 3 mM MgCl2 and 2 mM dNTPs; 0.4 μM of forward and reverse primers; and 1 μl of DNA template, brought to a total volume of 25 μl with nuclease-free water. Amplification was done on an MJ Research PTC-100 thermal cycler (USA) with conditions set at 95°C for 1 min; 20 cycles of 95°C for 15 s, 45°C for 15 s and 72°C for 1.5 min; another 20 cycles of 95°C for 15 s, 55°C for 15 s and 72°C for 1.5 min; and a final extension at 72°C for 5 min. PCR products were purified with the EXO/SAP Amplicon purification kit (Affymetrix, Santa Clara, USA). Purified PCR products were sequenced by Inqaba Biotec, South Africa, using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, USA) with an ABI Prism 377 DNA sequencer (Applied Biosystems, USA). The same primers used for the PCR reactions were used in the sequencing reactions.
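The two-stage cycling programme and the 25 μl reaction described above can be written down as data and checked arithmetically. The following is a minimal sketch only: the function names are illustrative, ramping time between temperatures is ignored, and the 10 μM primer working stock is a hypothetical value that is not stated in the text.

```python
# matK cycling programme as described in the text.
# Each step is (temperature in deg C, hold time in seconds).
INITIAL_DENATURATION = (95, 60)
FINAL_EXTENSION = (72, 300)

# Two blocks of 20 cycles that differ only in annealing temperature (45 vs 55 deg C).
CYCLE_BLOCKS = [
    (20, [(95, 15), (45, 15), (72, 90)]),
    (20, [(95, 15), (55, 15), (72, 90)]),
]


def total_hold_seconds():
    """Sum the programmed hold times; ramping between temperatures is ignored."""
    total = INITIAL_DENATURATION[1] + FINAL_EXTENSION[1]
    for n_cycles, steps in CYCLE_BLOCKS:
        total += n_cycles * sum(seconds for _, seconds in steps)
    return total


def primer_volume_ul(final_um, stock_um, reaction_ul=25.0):
    """C1 * V1 = C2 * V2: stock volume needed to reach the final concentration."""
    return final_um * reaction_ul / stock_um


print(total_hold_seconds() / 60)    # minutes of programmed hold time
print(primer_volume_ul(0.4, 10.0))  # assumes a hypothetical 10 uM working stock
```

Under these assumptions the programmed hold time comes to 5160 s, or 86 min, before any ramping overhead.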

Database deposition and phylogenetic reconstruction
The novel M. volkensii rbcL and matK sequences were checked for quality and ambiguous nucleotides resolved in the MEGA6 software suite (Tamura et al., 2013). Identical sequences were obtained for each gene. Processed sequences of the two genes were deposited in the DDBJ/EMBL/GenBank databases. They were assigned the following accession numbers: LC075516 for rbcL and LC075517 for matK.
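The quality check described above was done in MEGA6; purely as an illustration of what such a check involves, the sketch below flags ambiguous (non-A/C/G/T) symbols in a sequence and tests whether its length is a whole number of codons, as would be expected of a complete coding sequence. The function names and the toy sequence are hypothetical and are not data from the study.

```python
UNAMBIGUOUS = set("ACGT")


def ambiguity_report(seq):
    """Return {1-based position: symbol} for every character that is not A, C, G or T."""
    return {
        pos: base
        for pos, base in enumerate(seq.upper(), start=1)
        if base not in UNAMBIGUOUS
    }


def is_whole_codons(seq_length):
    """A complete coding sequence should have a length divisible by three."""
    return seq_length % 3 == 0


# Toy read for illustration only (not a sequence from the study).
toy_read = "ATGTCACCACAAACNGAGACTAAAGCR"
print(ambiguity_report(toy_read))

# Edited lengths reported in the text: 1371 bp (rbcL) and 1524 bp (matK).
print(is_whole_codons(1371), is_whole_codons(1524))
```

Both reported lengths are multiples of three (457 and 508 codons, counting any stop codon), which is consistent with complete coding sequences.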
The sequences were then used to carry out two separate GenBank nucleotide BLAST searches. The first set of 100 BLAST hits gave 96-100% nucleotide sequence similarity for the rbcL gene and 90-100% similarity for the matK gene between the query (M. volkensii) and the respective GenBank sequences of members of the Meliaceae, Simaroubaceae and Rutaceae families. However, only 16 of the retrieved taxa had sequence data for both the rbcL and matK genes, with the rest having data for either rbcL or matK. Since the study intended to use the barcoding genes both separately and after concatenation, phylogenetic reconstruction was limited to the sequences of these 16 taxa. Sequence names, database codes, accession numbers, native distribution and uses of the selected species are listed in Table 1.
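The selection of taxa with data for both loci amounts to intersecting the two hit lists. A minimal sketch follows, with hypothetical hit lists that stand in for the real BLAST output (the study's actual hits are those listed in Table 1):

```python
def taxa_with_both_loci(rbcl_hits, matk_hits):
    """Keep only taxa that appear in both BLAST hit lists, in alphabetical order."""
    return sorted(set(rbcl_hits) & set(matk_hits))


# Hypothetical hit lists for illustration; not the study's BLAST results.
rbcl_hits = [
    "Melia azedarach",
    "Azadirachta indica",
    "Toona sinensis",
    "Ailanthus altissima",
]
matk_hits = [
    "Melia azedarach",
    "Ailanthus altissima",
    "Swietenia macrophylla",
]

shared = taxa_with_both_loci(rbcl_hits, matk_hits)
print(shared)
```

Only taxa in the intersection can contribute to the concatenated rbcL + matK analysis, which is why the dataset shrinks so sharply.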
The retrieved database sequences were also checked for quality and ambiguous nucleotides resolved in the MEGA6 software suite (Tamura et al., 2013). Multiple sequence alignments were performed in MEGA6 using the MUSCLE algorithm (Edgar, 2004) and the aligned sequences used for phylogenetic reconstruction. The evolutionary history was inferred using the maximum likelihood method based on the General Time Reversible (GTR) model (Nei and Kumar, 2000). Initial trees for the heuristic search were obtained automatically by applying the Neighbour-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value. The tree with the highest log likelihood was selected. A total of 1000 bootstrap replicates were performed (Felsenstein, 1985). Phylogenetic trees were edited in FigTree 1.4 (Rambaut, 2012).
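The analysis itself was run in MEGA6. To make the two data-handling steps concrete, the sketch below shows how aligned loci can be concatenated taxon by taxon and how a bootstrap pseudo-replicate is drawn by resampling alignment columns with replacement (the idea behind Felsenstein's bootstrap). It does not infer trees; the toy alignments, taxon labels and function names are hypothetical.

```python
import random


def concatenate(aln_a, aln_b):
    """Join two alignments taxon by taxon; both must cover the same taxa."""
    if aln_a.keys() != aln_b.keys():
        raise ValueError("alignments must contain the same taxa")
    return {taxon: aln_a[taxon] + aln_b[taxon] for taxon in aln_a}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement to build one pseudo-replicate."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


# Toy alignments for illustration only (not sequences from the study).
toy_rbcl = {"taxon_A": "ATGTCA", "taxon_B": "ATGTCG"}
toy_matk = {"taxon_A": "ATGGAG", "taxon_B": "ATGGAA"}

combined = concatenate(toy_rbcl, toy_matk)
rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(combined, rng) for _ in range(1000)]
print(len(replicates), len(replicates[0]["taxon_A"]))
```

Each replicate has the same length as the original alignment; a tree would be inferred from every replicate, and the bootstrap support of a clade is the percentage of replicate trees that contain it.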

RESULTS AND DISCUSSION
PCR amplification was 100% successful for both genes. Gel electrophoresis gave highly resolved bands of ≈ 1400 bp for rbcL and ≈ 1500 bp for matK, as expected (Figure 1). Sequencing success was 95% for both genes, with edited sequence lengths of 1371 bp for the rbcL gene and 1524 bp for matK. These sequences were successfully deposited in the DDBJ/EMBL/GenBank databases and assigned the accession numbers LC075516 (rbcL) and LC075517 (matK). To the best of our knowledge, they are the first barcode deposits for M. volkensii in these databases.
The BLAST searches retrieved taxa belonging to three families: Meliaceae, Simaroubaceae and Rutaceae. This is in agreement with previous reports about the taxonomic proximity of these families (Wiart, 2006). However, most of the taxa had sequence data for either rbcL or matK but not both. Therefore, analysis was limited to the 16 closely related taxa which had sequences for both the rbcL and matK genes. These consisted of 6 members of the Meliaceae family, 9 members of Simaroubaceae and 1 member of Rutaceae (Table 1). Consequently, phylogenetic reconstruction was severely constrained by the limited nature of the data retrieved from the databases. A more comprehensive molecular phylogeny of the Meliaceae will be possible only when more sequence data become available in these databases. Since the family Meliaceae consists of an estimated 51 genera and 550 species (Wiart, 2006), there is vast scope for an expanded molecular phylogeny of the family.
The taxa included in the phylogenetic analysis had percentage sequence similarity scores of 96-99% for the rbcL gene and 90-95% for the matK gene (Table 1). This is in agreement with previous reports of higher discrimination power of matK over rbcL for most plants (Li et al., 2011). This difference was also evident in the pairwise distance matrices (Tables 2 and 3) and phylogenetic trees (Figures 2 and 3), with matK giving larger genetic distances between the species than rbcL and the concatenated rbcL + matK barcode giving intermediate distances (Table 4). This was expected, as the matK gene is reported to have a higher rate of mutation than the rbcL gene (Kress et al., 2009) and is thus more likely to reveal a greater amount of variation between species. The rbcL locus is generally more suitable for determination of evolutionary relationships at the generic level and above (Kress et al., 2005). On the other hand, matK has been more successful in resolving species relationships in several families (Johnson and Soltis, 1994; Hilu and Liang, 1997; Rohwer, 2000).
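Why the concatenated barcode gives intermediate distances can be seen from a simple worked relation. As a sketch only, assume uncorrected p-distances on gap-free alignments (the model-based distances in the tables are not exactly additive, so this holds only approximately for them). Let n be the number of differing sites between two taxa at a locus, L the aligned length of that locus and p = n / L. The symbols are introduced here for illustration.

```latex
\[
p_{rbcL+matK}
  = \frac{n_{rbcL} + n_{matK}}{L_{rbcL} + L_{matK}}
  = \frac{L_{rbcL}\, p_{rbcL} + L_{matK}\, p_{matK}}{L_{rbcL} + L_{matK}},
\qquad
\min(p_{rbcL},\, p_{matK}) \;\le\; p_{rbcL+matK} \;\le\; \max(p_{rbcL},\, p_{matK}).
\]
```

Taking the M. volkensii sequence lengths as illustrative values (1371 bp for rbcL and 1524 bp for matK; the aligned lengths across all taxa may differ), the weights would be 1371/2895 ≈ 0.47 and 1524/2895 ≈ 0.53, so the concatenated distance is a length-weighted mean that necessarily lies between the two single-locus distances.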
All the phylogenetic trees obtained with the separate rbcL and matK sequences and with the concatenated rbcL + matK sequences correctly resolved the 17 taxa into their respective familial clades with 100% bootstrap support (Figures 2, 3 and 4). In each family, the vast majority of branches also had high bootstrap values (> 90%). These barcoding genes also allowed adequate discrimination at generic and species levels, as seen in the clear resolution of the genus Melia (M. volkensii and M. azedarach), genus Swietenia (S. macrophylla and S. mahagoni), genus Picrasma (P. javanica and P. quassioides) and genus Ailanthus (A. integrifolia, A. altissima and A. triphysa). This suggests a possible use of the two barcoding genes, with additional empirical testing, in resolving taxa in the Meliaceae and related families up to the species level. This suggestion is supported by the findings of Kress et al. (2005), which showed that full-length sequences (> 1 kb) of either gene can give enough sequence length to discriminate between species. The sequences obtained in this study were longer than 1 kb and therefore met this criterion.
Despite the limited number of taxa used, the molecular phylogeny obtained in this study provides some useful insights into the evolutionary relationships between M. volkensii and the taxa that were included in the phylogeny. This is one of the suggested applications of a DNA barcode (Kress et al., 2015). The M. volkensii barcode could also be useful in aiding identification of the species and its products, enabling more detailed phylogenetic reconstructions and the tracking of its exotic dispersion. However, for application of the matK + rbcL plant barcode in a more comprehensive study of the Meliaceae, there is an urgent need for sequencing of the rbcL and matK genes for all the estimated 550 species of the Meliaceae and deposition of the data in DNA databases.

Conclusions and recommendations
The plant barcoding genes rbcL and matK resolved the selected taxa up to the species level. A partial molecular phylogeny of the Meliaceae and closely related families was obtained. The main limiting factor was the lack of complete data on rbcL and matK sequences in the DNA repositories for members of these families. This calls for accelerated deposition of more sequence data in order to fill the huge gaps in the DNA libraries. Such data can also be used in future Bayesian inferences.

Figure 1. Agarose gel profiles of the isolated chloroplast rbcL and matK genes. MM = 1 kb ladder, NC = negative control, 1-9 = some of the DNA samples used.

Figure 2. Maximum Likelihood phylogenetic tree for Melia volkensii and 16 closely related species based on rbcL gene, with 1000 bootstraps. Bootstrap support values are shown at nodes. Scale = number of substitutions per site.

Figure 3. Maximum Likelihood phylogenetic tree for Melia volkensii and 16 closely related species based on matK gene, with 1000 bootstraps. Bootstrap support values are shown at nodes. Scale = number of substitutions per site.

Figure 4. Maximum Likelihood phylogenetic tree for Melia volkensii and 16 closely related species based on rbcL + matK concatenated genes, with 1000 bootstraps. Bootstrap support values are shown at nodes. Scale = number of substitutions per site.

Table 2. Estimates of genetic distance between sequences using rbcL alone, based on the number of base substitutions per site. Standard error estimate(s) are shown above the diagonal and were obtained by a bootstrap procedure (1000 replicates).

Table 3. Estimates of genetic distance between sequences using matK alone, based on the number of base substitutions per site. Standard error estimate(s) are shown above the diagonal and were obtained by a bootstrap procedure (1000 replicates).

Table 4. Estimates of genetic distance between sequences using rbcL + matK concatenated sequences, based on the number of base substitutions per site. Standard error estimate(s) are shown above the diagonal and were obtained by a bootstrap procedure (1000 replicates).