Chemical profiling and chemical standardization of Vitex negundo using 13C NMR

Chemical profiling and standardization of the defatted methanol extract of the leaves of Vitex negundo L. were carried out using 13C nuclear magnetic resonance (NMR) spectroscopy followed by chemometric analysis of the chemical shift data. The chemical profile was obtained by k-means clustering, and chemical standardization was achieved using a multivariate control chart. The V. negundo samples comprised four groups: a training set; samples submitted by production farms; commercial samples such as tablets, capsules, and teas; and experimental samples (samples which were allowed to degrade). K-means clustering generated four clusters, which generally corresponded to the four types of samples. The multivariate control chart identified samples whose Hotelling's T² values exceeded the upper control limit; all of these were commercial or experimental samples. The samples were also analyzed by quantitative thin layer chromatography (qTLC) using agnuside as the marker compound. Comparison of the qTLC results with the k-means clusters and the multivariate control chart showed poor correspondence. This indicates that univariate analysis of a plant sample using a marker compound is useful only for quantification of the target compound, whereas chemical profiling and standardization of medicinal plants should use a multivariate method. 
 
 Key words: Vitex negundo, 13C NMR, multi-variate cluster profile, multi-variate control chart.


INTRODUCTION
With the growing interest in medicinal plants today, numerous plants which are traditional home remedies are being developed for commercial production. This entails expansion of the supply chain from sourcing of validated planting material to farming and processing of the raw plant material, to manufacture of finished product. Because many herbal products are sold as dried plant material, such as tablets and teas, there is a need to develop effective methods of standardization and quality assurance. Medicinal plants are very complex mixtures of secondary metabolites which can vary significantly depending on the planting material, environment and farming conditions, age at harvest, storage, and processing.
Quality assurance of herbal products should meet the following needs: verification of plant identity; detection of adulteration or chemical deterioration; and quantification of active components, if known (Kumari and Kotecha, 2016). Quality assurance can be based on the targeted analysis of one or a few compounds (univariate) or on the chemical profile of the plant extract (multivariate) (Ning et al., 2013). Chemical profiling of herbal products refers to the generation of a quantitative molecular description of the plant secondary metabolites (MW < 1,000 Da) in the whole extract in order to establish plant identity and product quality (Yongyu et al., 2011), using analytical methods such as chromatography, spectroscopy, or hyphenated chromatography-mass spectrometry.
*Corresponding author. E-mail: fdayrit@ateneo.edu.
Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License
Nuclear magnetic resonance (NMR) spectroscopy can yield considerable information in an untargeted analysis of a plant extract. NMR is robust and highly reproducible, and it requires minimal sample preparation, which minimizes experimental artifacts and bias. Because of its higher sensitivity, 1H NMR is the most commonly used technique; it has been combined with chemometric methods to profile, fingerprint, or discriminate among crude herbal samples (Bailey et al., 2002; Zulak et al., 2008; Lee et al., 2009; Kim et al., 2010; Mahmud et al., 2014) and for quality control (Wang et al., 2004; Rasmussen et al., 2006; van der Kooy et al., 2008). 1H NMR measurements of herbal medicines have been reported at 1H frequencies from 300 to 800 MHz (Zulak et al., 2008; Kim et al., 2011). The limitation of 1H NMR, however, is that the appearance of the spectrum depends on the magnetic field: the chemical shift difference between two signals expressed in Hz (Δν) scales with the field, whereas the 1H-1H coupling constants (J) are field-independent. 1H NMR spectra taken at different magnetic fields therefore have different Δν/J ratios and different spectral appearances. When Δν/J is less than 20, the spectrum is second order and its appearance is sensitive to the magnetic field strength (Becker, 2000). Thus, data from 1H NMR spectra taken at different magnetic fields cannot be combined.
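As a worked illustration of this field dependence, the short Python sketch below computes Δν/J for a hypothetical pair of coupled protons 0.2 ppm apart with J = 7 Hz (illustrative values, not data from this study), using Δν (Hz) = Δδ (ppm) × spectrometer frequency (MHz). The function name `delta_nu_over_j` is hypothetical; the threshold of 20 is the one cited above.

```python
# Illustrative values only: a hypothetical pair of coupled protons separated by
# 0.2 ppm with a 7 Hz coupling constant (not data from this study).
DELTA_PPM = 0.2
J_HZ = 7.0


def delta_nu_over_j(delta_ppm: float, j_hz: float, spectrometer_mhz: float) -> float:
    """Return Δν/J, where Δν (Hz) = Δδ (ppm) × spectrometer frequency (MHz).

    J is field-independent, so only the numerator changes with the magnet.
    """
    delta_nu_hz = delta_ppm * spectrometer_mhz
    return delta_nu_hz / j_hz


for mhz in (400.0, 500.0):
    ratio = delta_nu_over_j(DELTA_PPM, J_HZ, mhz)
    # Threshold of 20 as cited in the text (Becker, 2000).
    order = "first order" if ratio >= 20 else "second order"
    print(f"{mhz:.0f} MHz: Δν = {DELTA_PPM * mhz:.0f} Hz, Δν/J = {ratio:.1f} ({order})")
```

At 400 MHz the ratio is about 11.4 and at 500 MHz about 14.3. Both are below 20, and because the ratio changes with the field, the multiplet shapes of this pair would differ between the two instruments.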
Compared to 1H NMR, 13C NMR is a more general chemical profiling technique because the signals in a fully 1H-decoupled 13C NMR spectrum are singlets and show no second-order effects: 1H-13C couplings are removed by decoupling, and 13C-13C couplings are negligible at natural abundance. This means that 13C NMR data are amenable to spectral comparison across different magnetic field strengths. Unlike 1H NMR, 13C NMR does not require water suppression, which is another source of spectral variability because it depends on instrument and operator performance. 13C NMR, however, requires much longer acquisition times, and this is its main disadvantage. To date, there are only a few examples of the use of 13C NMR for profiling biological extracts. 13C NMR was used to profile triacylglycerols from the seed oil of Moringa oleifera (Vlahov et al., 2002) and lipid extracts from Atlantic salmon (Aursand et al., 2009). It was also used to profile fractions from a crude extract of Anogeissus leiocarpus, after which hierarchical clustering analysis (HCA) revealed correlations between the 13C signals of the mixture and known compounds in a 13C NMR database (Hubert et al., 2014).
Chemometrics is a family of techniques that applies statistics to large volumes of chemical data, such as spectroscopic signals from a collection of samples, with the objective of gaining insight into the characteristics of the samples through graphical representation or pattern recognition (Wold, 1995). Chemometric analysis is well suited to classifying spectroscopic data from whole plant extracts in order to differentiate plants according to species, origin, processing treatment, age, and other quality parameters (Kim et al., 2010).
The overall objective of this paper is to explore the use of 13C NMR together with multivariate statistical methods for the chemical profiling and standardization of medicinal plants. This work will also compare the use of 13C NMR with 1H NMR. The results from the multi-variate control chart will be compared with a targeted univariate quantitative thin layer chromatography (qTLC) method using a marker compound.

MATERIALS AND METHODS
Study species
Vitex negundo L. is an aromatic shrub found from tropical East Africa to South Asia, Southeast Asia, and Polynesia, and from Japan southward to Malesia; it is widely used in traditional medicine, especially in South and Southeast Asia (GRIN-Global, no date). V. negundo is grown throughout the Philippines on commercial farms that supply the dried leaves to herbal pharmaceutical companies. The iridoid agnuside is a major constituent of the dried leaves of V. negundo (Dayrit and Lagurin, 1994). A validated method has been reported for the analysis of the leaves by qTLC using agnuside as a marker compound (Roy et al., 2015).

Samples
There were 64 samples in total, made up of four sets: training set (n = 15), submitted samples (n = 17), commercial samples (n = 13), and experimental set (n = 19). The training set consisted of V. negundo leaf samples that we collected from 5 locations around the Philippines; these were immediately washed and dried at ≤ 60°C to < 5% moisture. The submitted set consisted of dried or powdered leaves submitted by 5 commercial farms from various parts of the country. The commercial products (n = 13) were tablets, capsules, and tea products purchased from supermarkets and drug stores. The experimental samples (n = 19) comprised a heterogeneous set that included old samples (> 4 years), flowers, plant tops, and samples that were deliberately allowed to degrade (fresh samples were allowed to stand for 3 days before drying).

Sample preparation
To determine the reproducibility of the procedure (extraction and 13C NMR and qTLC analyses), each of the 64 plant samples was extracted and analyzed in duplicate. The results of each duplicate run were not averaged but were considered as a separate sample. Therefore, the number of NMR and qTLC runs is twice the number of samples.
All samples were milled and sieved (30 to 100 mesh). Five grams of plant material were defatted with n-hexane in a Soxhlet apparatus for 4 h. Two grams of the hexane-defatted material were then extracted with methanol in a Soxhlet apparatus for 4 h at 90°C. The same methanol extract of the defatted material was used for both NMR and qTLC.

NMR analysis
To prepare the NMR sample, 0.1 g of the defatted methanolic plant extract was dissolved in 0.7 ml of methanol-d4 (with added TMS; Cambridge Lab., USA) in a 5 mm NMR tube. A measured amount of DMSO was added as an internal standard.
1H NMR spectra were acquired at 400 MHz on a JEOL Lambda 400 NMR spectrometer (9.4 Tesla) and at 500 MHz on a Varian spectrometer (11.75 Tesla). The same spectral parameters were used for both instruments: pulse angle, 45°; number of scans, 4; number of points, 32k. The spectral width was adjusted according to the magnetic field: 7,993 Hz at 400 MHz and 10,000 Hz at 500 MHz. FIDs were processed by exponential multiplication with automatic processing to avoid operator bias. Line broadening was set at 2.4 Hz for the 400 MHz spectra and 3.0 Hz for the 500 MHz spectra.
13C NMR spectra were acquired at the corresponding frequencies: 100 MHz (9.4 Tesla) and 125 MHz (11.75 Tesla). The same spectral parameters were used for both instruments: pulse angle: 45°; broad-band 1H decoupling; number of scans: 2,200; number of points: 32k. The following spectral parameters were adjusted according to the magnetic field: at 100 MHz: spectral width: 27,100 Hz; at 125 MHz: spectral width: 33,875 Hz. FIDs were processed using exponential multiplication with auto-processing to avoid operator bias. Line broadening was set at 1.20 Hz for 100 MHz spectra and 1.5 Hz for 125 MHz spectra.
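The parameters adjusted according to the magnetic field in the two paragraphs above appear to have been scaled in proportion to the observe frequency, so that they are identical when expressed in ppm (ppm = Hz / MHz). The sketch below uses only the values reported above to make that check explicit; the dictionary layout and the function name `hz_to_ppm` are illustrative.

```python
# Reported acquisition/processing parameters in Hz,
# keyed by (nucleus, observe frequency in MHz).
REPORTED_HZ = {
    ("1H", 400.0): {"spectral_width": 7_993.0, "line_broadening": 2.4},
    ("1H", 500.0): {"spectral_width": 10_000.0, "line_broadening": 3.0},
    ("13C", 100.0): {"spectral_width": 27_100.0, "line_broadening": 1.20},
    ("13C", 125.0): {"spectral_width": 33_875.0, "line_broadening": 1.5},
}


def hz_to_ppm(value_hz: float, observe_mhz: float) -> float:
    """Express a frequency interval in ppm of the observe frequency (Hz / MHz)."""
    return value_hz / observe_mhz


for (nucleus, mhz), params in REPORTED_HZ.items():
    sw = hz_to_ppm(params["spectral_width"], mhz)
    lb = hz_to_ppm(params["line_broadening"], mhz)
    print(f"{nucleus} at {mhz:.0f} MHz: spectral width {sw:.2f} ppm, "
          f"line broadening {lb:.4f} ppm")
```

Both 13C spectral widths come out at 271 ppm and both 13C line-broadening factors at 0.012 ppm; for 1H the spectral widths are about 20 ppm and the line broadening 0.006 ppm at either field.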

Data processing and statistical analysis
For the 100 MHz 13C NMR spectra, a bin size of 4 Hz was used across the spectral range of 27,100 Hz; for the 125 MHz spectra, a bin size of 5 Hz was used across the spectral range of 32,768 Hz. The 60 tallest peaks in each 13C NMR spectrum were selected, normalized against the DMSO internal standard, and then aligned. The duplicate extracts were treated as separate samples. NMR peaks that were absent in more than 90% of the samples were removed, which yielded 108 chemical shifts. These were loaded as a table into JMP version 11 (SAS) for chemometric analysis.
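The peak-table construction described above can be sketched in Python with NumPy as follows. This is an illustrative reimplementation, not the JMP workflow used in the study. The input format (one peak list of ppm positions and intensities per run), alignment by snapping peaks to a common 0.04 ppm grid (4 Hz at 100 MHz equals 5 Hz at 125 MHz equals 0.04 ppm), summing of peaks that fall into the same bin, and the parameter `dmso_ppm` (the expected DMSO signal position, which the user must supply) are all assumptions, as is the function name `build_peak_table`.

```python
import numpy as np

BIN_PPM = 0.04  # 4 Hz / 100 MHz = 5 Hz / 125 MHz = 0.04 ppm


def build_peak_table(spectra, dmso_ppm, n_peaks=60, max_absent_frac=0.90, bin_ppm=BIN_PPM):
    """Build a runs x bins intensity table from per-run 13C peak lists.

    spectra: list of (ppm_positions, intensities) pairs, one per run.
    Returns (bin_centres_ppm, table).
    """
    rows = []
    for ppm, inten in spectra:
        ppm = np.asarray(ppm, dtype=float)
        inten = np.asarray(inten, dtype=float)
        # Normalize to the internal standard: the peak nearest dmso_ppm.
        ref = inten[np.argmin(np.abs(ppm - dmso_ppm))]
        # Keep the n_peaks tallest peaks of this run.
        top = np.argsort(inten)[::-1][:n_peaks]
        # Align runs by snapping each peak to the nearest point of a common ppm grid.
        bins = np.rint(ppm[top] / bin_ppm).astype(int)
        row = {}
        for b, value in zip(bins.tolist(), (inten[top] / ref).tolist()):
            row[b] = row.get(b, 0.0) + value  # peaks sharing a bin are summed
        rows.append(row)
    all_bins = sorted(set().union(*rows))
    table = np.array([[row.get(b, 0.0) for b in all_bins] for row in rows])
    # Remove bins that are absent (zero) in more than max_absent_frac of the runs.
    keep = (table == 0).mean(axis=0) <= max_absent_frac
    return np.array(all_bins, dtype=float)[keep] * bin_ppm, table[:, keep]
```

Each row of the returned table corresponds to one run (duplicates kept separate, as in the study) and each column to one retained chemical-shift bin, which is the layout the clustering and PCA steps expect.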
Weighed volumes of each sample were spotted on the TLC plate in 5 mm bands using an automated TLC applicator (CAMAG Linomat 5, Switzerland). Each plate contained 5 calibration bands of the marker compound and six extracts spotted in duplicate. The plates were recorded using a digital camera under UV-254 nm light and processed using QuantiScan v3.0 software (Biosoft, UK).

Lagurin et al. 13
The coefficient of determination (R²) of the calibration line for the marker compound was > 0.99 on every TLC plate.
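A minimal sketch of the calibration step, assuming a straight-line least-squares fit of band area against the amount of agnuside in the five calibration bands. The function names, the units, and expressing the result as a percentage of the mass spotted are assumptions, since the text does not specify how the QuantiScan output was converted to % agnuside. The numbers in the usage below are synthetic.

```python
import numpy as np


def fit_calibration(amounts, areas):
    """Fit area = slope * amount + intercept by least squares; return (slope, intercept, R²)."""
    x = np.asarray(amounts, dtype=float)
    y = np.asarray(areas, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return slope, intercept, r2


def percent_marker(band_area, slope, intercept, mass_spotted):
    """Marker amount read off the calibration line, as % of the mass spotted (same units)."""
    amount = (band_area - intercept) / slope
    return 100.0 * amount / mass_spotted


# Synthetic example: five calibration bands (amounts in µg, areas in arbitrary units).
slope, intercept, r2 = fit_calibration([1, 2, 3, 4, 5], [118, 221, 317, 424, 519])
print(f"R² = {r2:.4f}, plate accepted: {r2 > 0.99}")
```

A plate whose calibration gave R² ≤ 0.99 would not meet the acceptance criterion reported above.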

RESULTS
NMR profiles at different magnetic field strengths
The 1H NMR spectra at 400 MHz and 500 MHz and the 13C NMR spectra at 100 MHz and 125 MHz of the same V. negundo extract are shown in Figures 1 and 2, respectively. The 1H NMR spectra taken at 400 MHz and 500 MHz show significant differences in peak heights and peak patterns, as expected from theory, so the 1H NMR spectra were not subjected to further analysis. In contrast, the 13C NMR spectra taken at 100 MHz and 125 MHz show very similar profiles.

Principal components analysis (PCA) cluster plot
PCA is the most common method for reducing the number of dimensions in a large data set; it creates linear combinations of the original variables so that each sample can be represented using fewer dimensions. PCA has been used to discriminate among commercial feverfew samples (Bailey et al., 2002), for quality control and authentication of chamomile (Wang et al., 2004), to differentiate Artemisia species (van der Kooy et al., 2008), and for metabolite fingerprinting of ginseng (Lee et al., 2009). We initially used PCA to generate the sample clusters. However, PC1 and PC2 accounted for only about 41% of the variability, which meant that a two-component plot was not a sufficiently good model for the 128 runs (Figure 3). Nine PCs (PC1 to PC9) were required to reach 80% explained variability, and there is no simple way to display clusters in that many dimensions.
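The explained-variance figures quoted above (about 41% for PC1 and PC2, nine PCs to reach 80%) come from the eigenvalues of the PCA. A minimal NumPy sketch of that calculation follows; it is not JMP's implementation. Whether the 108 chemical-shift columns were also scaled to unit variance before PCA is not stated, so column scaling is an optional, assumed parameter, and the function names are hypothetical.

```python
import numpy as np


def explained_variance_ratio(X, scale=False):
    """Fraction of total variance carried by each principal component of X
    (rows = runs, columns = chemical-shift bins), in decreasing order."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # mean-centre each column
    if scale:                                 # optional: correlation-matrix PCA
        sd = Xc.std(axis=0, ddof=1)
        Xc = Xc / np.where(sd > 0, sd, 1.0)   # leave constant columns unscaled
    s = np.linalg.svd(Xc, compute_uv=False)   # PC variances are proportional to s**2
    return s ** 2 / (s ** 2).sum()


def n_components_for(ratios, threshold=0.80):
    """Smallest number of leading PCs whose cumulative explained variance >= threshold."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```

Applied to the 128 × 108 peak table X, `explained_variance_ratio(X)[:2].sum()` would correspond to the PC1 + PC2 fraction and `n_components_for(explained_variance_ratio(X))` to the number of PCs needed for 80%; the exact values depend on the scaling choice.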

K-Means cluster plot
An alternative to PCA is k-means clustering, which partitions a data set into a number of clusters specified a priori. K-means cluster analysis has been used to classify different chemotypes of Chamerion angustifolium L., a medicinal plant used in food supplements, according to their geographic origin (Kaškonienė et al., 2015). In this work, the k-means clusters were generated directly from the 13C NMR chemical shift data. The procedure involves calculating the differences $(y_i - \bar{y})$, where $y_i$ is the intensity of a given chemical shift in run $i$ and $\bar{y}$ is the average intensity of that chemical shift over all runs, $i = 1, \ldots, n$. Here, $n = 128$ runs, and there are 108 chemical shifts. The magnitudes of these differences, taken as $(y_i - \bar{y})^2$ to remove the effect of sign, determine the k-means clustering of the samples (Johnson and Wichern, 2007). The k-means clusters obtained for the 128 runs are shown in Figure 4, and the membership of each cluster is summarized in Table 1. Four clusters were defined a priori, and the groupings obtained were consistent with the type of sample. Cluster 1 consists mainly of the experimental set, that is, samples that were intentionally allowed to degrade. Cluster 2 consists mainly of the training set and the submitted samples, which indicates that the commercial farms generally prepared their samples using a good drying protocol. Clusters 3 and 4 consist mainly of the commercial products.
The experimental samples, however, were distributed across both clusters 1 and 2, since their characteristics varied widely depending on the sample treatment.
Figure 4. Multi-variate cluster profile by k-means. Cluster 1 is composed mainly of experimental samples that were deliberately allowed to degrade; cluster 2 is composed mainly of training set and submitted samples; clusters 3 and 4 are composed mainly of commercial products. Symbols denote the training set, submitted samples, commercial products, and experimental set. The numbers refer to the run numbers, which are given in Table 2.
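A minimal NumPy sketch of k-means clustering on the run × chemical-shift table follows. It uses the standard Lloyd iteration, which assigns each run to the nearest cluster centroid by squared Euclidean distance and then moves each centroid to the mean of its members, repeating until assignments stop changing. The farthest-point initialisation and the function name `kmeans` are assumptions; JMP's own initialisation and any column scaling it applies may differ, so cluster numbering and borderline memberships need not match Table 1.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialisation (an assumption): start from the first run, then
    # repeatedly add the run farthest from every centroid chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2)
        centroids.append(X[d2.min(axis=1).argmax()])
    centroids = np.array(centroids)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member runs.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep an empty cluster's centroid in place
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

With the 128 × 108 table as X, `kmeans(X, 4)` would return one of four cluster labels for each run, the a priori number of clusters used in this work.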

Comparison 13C NMR at 100 MHz and 125 MHz
In this experiment, we compared 13C NMR spectra taken at 100 and 125 MHz. The 13C NMR spectra of two samples were acquired at both 100 MHz and 125 MHz, and the data from these runs were added to the k-means cluster analysis. Figure 5 shows the resulting k-means profile, in which the new 100 MHz and 125 MHz data points are indicated. The new data points clustered very closely, indicating that 13C NMR spectra taken at 100 MHz and 125 MHz give very similar results.

Multivariate control chart from 13C NMR data
Nine PCs were used to generate the Hotelling's T² multivariate control chart (Figure 6). The upper control limit (UCL) was set to the T² value of the training set run with the highest T² (in this case, run 123). Runs whose T² exceeded the UCL were therefore rejected on the basis of their 13C NMR profile.
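The control-chart logic can be sketched as follows, assuming the nine PCs are computed from mean-centred data for all runs and that T² for run i is the sum over PCs a of t_ia²/λ_a, where t_ia is the score of run i on PC a and λ_a is the variance of the scores on that PC. The UCL is the largest T² among training-set runs, as described above. Which runs the PCA model was fitted on, and any column scaling, are not stated, so treat both as assumptions; the function names are hypothetical.

```python
import numpy as np


def hotelling_t2(X, n_pc=9):
    """Hotelling's T² per run from the first n_pc PCs of mean-centred X.

    T²_i = sum over a of t_ia**2 / lambda_a, with t_ia the score of run i on PC a
    and lambda_a the variance of the scores on PC a.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_pc].T               # t_ia, shape (runs, n_pc)
    lam = s[:n_pc] ** 2 / (len(X) - 1)      # variance of each PC's scores
    return (scores ** 2 / lam).sum(axis=1)


def control_chart(X, is_training, n_pc=9):
    """Return (T², UCL, rejected), with UCL = largest T² among training runs."""
    t2 = hotelling_t2(X, n_pc)
    ucl = t2[np.asarray(is_training, dtype=bool)].max()
    return t2, ucl, t2 > ucl
```

Because the UCL is taken from the training set itself, no training run can be rejected; any run, from any set, whose T² is strictly larger than that value is flagged.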

qTLC analysis
The 64 V. negundo samples were each analyzed twice by qTLC to measure the agnuside content, giving 128 runs (Table 2). This is a univariate analysis using agnuside as the quantitative marker compound.
There was no clear relationship between a run's % agnuside content as measured by qTLC, its cluster grouping (Figure 4), and its T² value in the control chart (Figure 5). For example, runs 75 to 78 (submitted samples) had low agnuside contents of 2.3, 2.2, 0.4, and 0.4%, respectively, but were below the UCL, whereas runs 103 and 104 (experimental samples) had relatively high agnuside contents (4.9 and 4.8%, respectively) but were rejected on the basis of their T² values. Some runs, such as 69 and 70, had 0% agnuside but were still below the UCL.

DISCUSSION
Official pharmacopoeial methods for the validation of herbal medicines rely on thin layer chromatography (TLC), gas chromatography (GC), or high performance liquid chromatography (HPLC) for the analysis of chemical markers or pharmacologically active components (EDQM, 2007; WHO, 2011). However, these methods, which are based on the targeted analysis of one or two compounds, cannot give an adequate assessment of the quality of a herbal sample that contains hundreds of compounds.
The objective of this work was to determine whether the profile of all carbon atoms generated by 13C NMR is able to provide an accurate multivariate profile of a complex mixture, such as extracts of a medicinal plant.
To do this, four types of samples were obtained: a training set, submitted samples, commercial samples, and experimental samples. The k-means clusters closely agreed with the types of samples that were analyzed. This gives good confidence that 13C NMR followed by multivariate analysis using k-means clustering can accurately generate a chemical profile of the extract. Further, a multivariate control chart was generated, from which an upper control limit (UCL) for the multivariate profiles of the samples could be set.
Comparison of the results of the multivariate control chart and the univariate qTLC analysis using agnuside as marker compound showed poor correspondence. The results showed that a sample can have a high content of agnuside but be above the UCL of the multi-variate control chart. This highlights the difference between a targeted analysis of a single compound and a multivariate chemical profile: a single compound cannot represent the quality of a complex mixture.
To obtain reliable statistical results, a large training set is needed, and the methods of extraction and spectroscopic measurement must be optimized and standardized to avoid bias, maximize reproducibility, and minimize variation. In this procedure, the 60 highest 13C NMR peaks in each spectrum were selected. Using fewer peaks makes the statistics easier to calculate but may decrease chemical reliability; using a larger number of peaks (>60) would require more training set samples, making the procedure more time-consuming. Comparison of the 13C NMR profiles generated at 100 and 125 MHz showed that comparable profiles are generated, whereas the 1H NMR spectra obtained at 400 and 500 MHz were clearly different. This means that 1H NMR profiles are comparable only at the same magnetic field strength, while 13C NMR spectra acquired at different magnetic field strengths may still be compared. However, comparisons of 13C NMR spectra over larger differences in magnetic field are needed to determine how general this result is.
Table 2. Summary of results of qTLC analysis using agnuside as marker compound, cluster grouping, and Hotelling T² value. Runs with T² value above 23.31 are rejected. Selected runs are indicated in the cluster plot (Figure 4) and control chart (Figure 5).

Finally, it is worth noting that NMR is one of several methods that can be used for multivariate or fingerprint analysis of plant extracts. For example, fingerprint analysis of V. negundo seed samples from different regions in China was done using high-performance liquid chromatography (HPLC) with diode array detection combined with hierarchical cluster analysis (HCA) (Shu et al., 2016), and mass spectrometry together with HCA was used for the identification and quantitative analysis of phenolic compounds in V. negundo in order to identify possible chemical markers (Huang et al., 2015).

Conclusions
13C NMR spectra of extracts of medicinal plants can be used to generate a k-means cluster, which accurately represents the chemical profile of the samples. The 13C NMR data can also be used to generate a multivariate control chart which sets the upper control limit based on the 13C NMR profile. Comparison of the multivariate control chart with qTLC results showed poor correspondence. This indicates that a univariate analysis of a plant sample is useful only for quantification of the target compound but cannot be used for chemical profiling and standardization of medicinal plants.