Near infrared spectroscopy (NIRS) technology applied in millet feature extraction and variety identification

Near infrared spectroscopy (NIRS) is widely used for quality detection, classification and variety identification of agricultural products because it is fast and efficient. NIRS experiments were conducted to identify DUN millet, JIN 21 millet and 5 other types of millet. The NIRS characteristic curves and data of the millet samples were collected. The spectroscopic data for the different types of millet were analyzed by discriminant analysis, principal component analysis and neural network technology. The correct classification rate for the calibration set was 98.9%. A BP neural network prediction model for millet was also built. The prediction model based on the original spectra gave the best results, with a correlation coefficient of validation (Rv) of 0.9999, a standard error of prediction (SEP) of 0.0191 and a root mean square error of prediction (RMSEP) of 0.0189. For the first derivative spectra, Rv was 0.9976 and the SEP and RMSEP were 0.1043 and 0.1437, respectively; for the second derivative spectra, Rv, SEP and RMSEP were 0.9835, 0.28735 and 0.2720, respectively. This study lays a foundation for the identification of millet varieties by NIRS.


*Corresponding author. E-mail: guoyuming99@sina.com. Tel: 13934186326. Fax: 0354-6587587.
Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.

INTRODUCTION
Large quantities of millet are grown in Northern China (Lu et al., 2005). Millets are very popular miscellaneous grain crops and are commonly used as a food source (Liu et al., 2012). Millet is an important food because of the variety of rare nutrients it provides, and it is widely regarded as a healthy food (Yang et al., 2012). Therefore, millet planting and related industries have developed substantially in recent years (Yang et al., 2009). Many characteristics of millet varieties have been exploited in production, but the composition and effectiveness of some millets on the market have been exaggerated (Hua, 2010). It is therefore of practical significance to identify millet varieties and their quality. Applying NIRS to agricultural materials is beneficial because the approach is nondestructive, and it has been widely used in agricultural applications such as classifying agricultural products (Qiu et al., 2009; Jia et al., 2014), conducting quality inspection (Burks et al., 2000), analyzing agricultural soil (Zheng et al., 2008), quantifying crop nutrition (Liao et al., 2015) and identifying species (Tolleson et al., 2005; Liu et al., 2014). Lidia Esteve Agelet analyzed damage to soybean and maize kernels by NIRS (Lidia et al., 2012); Zhu (2012) and Zhu et al. (2015) reported on the application of NIRS to the detection of seed quality, using it to identify soybean, cassia seed, bitter beans and three other kinds of hard seeds with an identification rate of 95%; Liang et al. (2009) identified rice varieties and their authenticity by NIRS and showed that a model optimized on characteristic wavebands is effective. Zheng et al. (2015) used NIRS to detect and analyze the quality of peanut seeds, and showed that the average correct recognition rate for 3 peanut varieties was 95%.
The above research provides a useful reference and suggests a variety of uses for NIRS on agricultural products. In this paper, near infrared spectroscopy was used to measure the spectral characteristics of 7 varieties of millet: DUN millet, JIN 21 millet, ZHANG 2 miscellaneous cereals, ZHANG 3 miscellaneous cereals, ZHANG 5 miscellaneous cereals, ZHANG 9 miscellaneous cereals, and ZHANG 10 miscellaneous cereals. The near infrared characteristic curves were then analyzed, principal component analysis and neural network identification models (Zheng et al., 2008; Yuan et al., 2003; Ghamari et al., 2010) were established, and the models were verified.

METHODOLOGY

Material
Test samples were taken from the millet varieties cultivated and popularized in recent years at Shanxi Agricultural University: DUN millet, JIN 21 millet, ZHANG 2 miscellaneous cereals, ZHANG 3 miscellaneous cereals, ZHANG 5 miscellaneous cereals, ZHANG 9 miscellaneous cereals, and ZHANG 10 miscellaneous cereals. With 7 varieties and 36 samples per variety, a total of 252 samples were used.

Instrument and parameters
Experiments were conducted to measure the spectrum of each millet variety. A FieldSpec 3 spectrometer and its transmission accessory, produced by the American company ASD (Analytical Spectral Devices), were used. The spectral sampling interval (Wide Frequency) was 1.5 nm, the measurement range was 350 to 2500 nm, each spectrum was scanned 30 times, and the resolution was 3.5 nm.

Model
The MATLAB scripting language was used to write a variety of functions for network design, training routines and typical neural network activation functions. In multiple linear regression analysis, principal component regression can be used to diagnose collinearity between the independent variables and to give the final regression prediction equation.

The procedure for collecting and analyzing spectrum data
The spectral data of the millet samples were collected with the spectrometer, and the original results were exported with the ASD ViewSpec Pro software. The first and second derivatives were then computed. The original, first derivative and second derivative spectra were exported to EXCEL in ASCII form. A spectral image was then created in MATLAB to find, from these three spectrograms, the characteristic waves with the most obvious change in transmittance. The reflectivities of the different millet varieties at the characteristic wavelengths were listed in EXCEL. Finally, the qualitative discriminations based on the principal component neural network were computed in MATLAB.
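
The derivative step in this procedure can be illustrated with a minimal sketch. The text does not state which derivative algorithm the export software applies, so the central-difference scheme, the function names and the toy spectrum below are assumptions made only for illustration, and Python is used here instead of the MATLAB environment named in the paper.

```python
def first_derivative(wavelengths, values):
    """Central-difference derivative; one-sided differences at the two ends."""
    n = len(values)
    if n != len(wavelengths) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)          # clamp to the first point
        hi = min(i + 1, n - 1)      # clamp to the last point
        deriv.append((values[hi] - values[lo]) / (wavelengths[hi] - wavelengths[lo]))
    return deriv


def second_derivative(wavelengths, values):
    """Second derivative obtained by differentiating the first derivative."""
    return first_derivative(wavelengths, first_derivative(wavelengths, values))


# Hypothetical toy spectrum: value = 0.001 * wavelength^2 on a 1 nm grid.
wl = [float(w) for w in range(400, 411)]
spec = [0.001 * w * w for w in wl]
d1 = first_derivative(wl, spec)
d2 = second_derivative(wl, spec)
```

Derivative spectra amplify noise, which is consistent with the noisier ends of the curves being trimmed before feature selection.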

The extraction of spectral features and near infrared spectroscopy
The average original visible/near infrared spectra of the 7 millet varieties are shown in Figure 1. The figure shows that the spectral curves of the 7 varieties (DUN millet, ZHANG 5 miscellaneous cereals, ZHANG 2 miscellaneous cereals, JIN 21 millet, ZHANG 3 miscellaneous cereals, ZHANG 9 miscellaneous cereals, ZHANG 10 miscellaneous cereals) follow a consistent trend, with relatively strong absorption peaks in the 500-600 nm and 1400-1900 nm bands; in the 600-1900 nm range the varieties differ in transmittance. The spectroscopic data for each type of millet were calibrated in combination with chemical measurement methods in order to establish the discriminant model of millet varieties and to understand the differences between varieties. Figure 2 shows the first derivative of the average visible/near infrared spectra; Figure 3 shows the second derivative of the average visible/near infrared spectra.

The extraction of spectral features
Figures 1 to 3 show that the initial and terminal parts of the spectral curves have higher noise. To mitigate the influence of this noise, the spectra in the 400 to 2400 nm band were selected. Because the relative error at the peaks and troughs of the curves is almost zero, 20 wavelengths with apparent changes in the original, first derivative and second derivative curves were taken as characteristic waves. Principal component analysis was then applied to the characteristic waves to provide the input for the neural network. The characteristic waves of the original spectra were selected at 400, 440, 548, 585, 630, 655, 688, 752, 879, 924, 981, 1071, 1195, 1269, 1457, 1670, 1765, 1805, 1929 and 2204 nm. The characteristic waves of the first derivative spectra were found at 450, 538, 591, 643, 717, 955, 1022, 1139, 1232, 1506, 1755, 1806, 1877, 2012, 2256 nm and so on. The characteristic waves of the second derivative spectra were found at 429, 465, 514, 571, 693, 760, 951, 1107, 1176, 1354, 1435, 1775, 1849, 1929, 2230 nm and so on. To reduce the error, 2 additional data points around each selected wavelength were also taken as characteristic waves.
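
The selection of characteristic waves with neighboring points can be sketched as follows. The wavelength list is the one given for the original spectra; the 1 nm grid, the function name `feature_indices` and the reading of "2 data" as one neighbor on each side are assumptions, since the text does not specify them.

```python
# Characteristic wavelengths of the original spectra, as listed in the text (nm).
ORIGINAL_FEATURE_NM = [400, 440, 548, 585, 630, 655, 688, 752, 879, 924,
                       981, 1071, 1195, 1269, 1457, 1670, 1765, 1805, 1929, 2204]


def feature_indices(wavelengths, centers, n_each_side=1):
    """Indices of the grid point nearest each center plus its neighbors.

    n_each_side=1 adds 2 extra points per center (an assumed reading of the text).
    Neighbors that would fall outside the grid are skipped.
    """
    picked = set()
    last = len(wavelengths) - 1
    for c in centers:
        nearest = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - c))
        for i in range(nearest - n_each_side, nearest + n_each_side + 1):
            if 0 <= i <= last:
                picked.add(i)
    return sorted(picked)


# Hypothetical 1 nm grid over the retained 400-2400 nm band.
grid = list(range(400, 2401))
idx = feature_indices(grid, ORIGINAL_FEATURE_NM, n_each_side=1)
selected_nm = [grid[i] for i in idx]
```

With these assumptions the first center (400 nm) sits on the edge of the band and keeps only one neighbor, so the selection contains 59 points rather than 60.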
Since differences were found among the spectra of the millet varieties, the original, first derivative and second derivative spectra of the 7 varieties were each used in the principal component analysis in order to find the spectrum that best distinguishes the varieties. The principal components were used as the input of the neural network, and the values 1, 2, 3, 4, 5, 6 and 7 were used as its output to denote the varieties (1 - DUN millet, 2 - ZHANG 5 miscellaneous cereals, 3 - ZHANG 2 miscellaneous cereals, 4 - JIN 21 millet, 5 - ZHANG 3 miscellaneous cereals, 6 - ZHANG 10 miscellaneous cereals, 7 - ZHANG 9 miscellaneous cereals), and the corresponding neural network model was established. Table 1 shows the identification results obtained with the principal component neural network.
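
A minimal sketch of this input and output coding is given below. The variety codes follow the text; the power-iteration estimate of the first principal component, the toy data and the rounding rule in `decode_variety` are assumptions for illustration and are not taken from the paper.

```python
import math

# Output codes for the 7 varieties, as listed in the text.
VARIETY_CODES = {
    1: "DUN millet",
    2: "ZHANG 5 miscellaneous cereals",
    3: "ZHANG 2 miscellaneous cereals",
    4: "JIN 21 millet",
    5: "ZHANG 3 miscellaneous cereals",
    6: "ZHANG 10 miscellaneous cereals",
    7: "ZHANG 9 miscellaneous cereals",
}


def center(rows):
    """Subtract the column means from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Dominant eigenvector by power iteration (assumes the uniform start
    vector is not orthogonal to it)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def decode_variety(network_output):
    """Assumed decoding rule: round to the nearest code and clamp to 1..7."""
    code = min(max(int(round(network_output)), 1), 7)
    return VARIETY_CODES[code]


# Hypothetical 2-feature data lying on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(rows)
pc1 = first_component(covariance(centered))
pc1_scores = [sum(x * c for x, c in zip(row, pc1)) for row in centered]
```

In practice several components would be retained as network inputs; only the first is computed here to keep the sketch short.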

DISCUSSION
A neural network has a nonlinear adaptive information processing ability (Ghamari et al., 2010). The BP network is a kind of multilayer feed forward neural network (Shao et al., 2007) whose transfer function is a sigmoid (S-shaped) function and whose output is continuous between 0 and 1. It can achieve any nonlinear mapping from input to output (Huang et al., 2011), and in recent years it has proven very useful in the simulation of nonlinear functions. The general structure is shown in Figure 4.

[Figure 4: network structure with input layer, hidden layer and output layer]
In this paper, the Levenberg-Marquardt (LM) algorithm was used to optimize the training of the neural network (Lera et al., 2002). This method makes the convergence process smooth and completes the iterative training of the network in a relatively short time. The input is passed through a hyperbolic tangent sigmoid (S-shaped) activation function (tansig), and the output layer uses a linear activation function (purelin).
The entire program was written as a MATLAB M-file, and the artificial neural network toolbox was used for debugging and simulation.
For the selected characteristic waves, 30 groups at each level were used for training and 20 groups for prediction. The maximum number of training iterations was 5. The training accuracy goal was 10^-4, the transfer function was tansig, and the training function was trainlm.
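
The tansig and purelin activations named above can be sketched in a single forward pass. The layer sizes, weights and the `forward` helper below are hypothetical, and the Levenberg-Marquardt training step is not reproduced; the sketch only shows how the two activation functions combine.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid. MATLAB defines tansig(n) = 2/(1+exp(-2n)) - 1,
    which is mathematically tanh(n); the output lies between -1 and 1."""
    return math.tanh(x)


def purelin(x):
    """Linear activation: returns its input unchanged."""
    return x


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One tansig layer followed by a single purelin output unit."""
    hidden = [tansig(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(hidden_weights, hidden_biases)]
    return purelin(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights: 2 inputs, 2 tansig units, 1 linear output.
y = forward([0.5, -1.0],
            [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
            [2.0, 1.0], 4.0)
```

A linear output unit lets the network produce unbounded values, which suits a model whose targets are the integer codes 1 to 7.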

The prediction model on the original spectrum
The evaluation parameters of the BP network correction and prediction model on the original spectrum (Qi et al., 2003) are shown in Table 2, the prediction results are shown in Figure 5, and the correction standard error is shown in Figure 6.

The prediction model on the first derivative spectra
The evaluation parameters of the correction and prediction model on the first derivative spectra are shown in Table 2, the prediction results are shown in Figure 7, and the correction standard errors are shown in Figure 8.

The prediction model on the second derivative spectra
The evaluation parameters of the correction and prediction model on the second derivative spectra are shown in Table 2, the prediction results are shown in Figure 9, and the correction standard errors are shown in Figure 10.

Comparison between the prediction models
The evaluation parameters adopted for the models are the standard error of prediction (SEP), the root mean square error of prediction (RMSEP), and the correlation coefficient of validation (Rv).
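
The three evaluation parameters can be computed as in the sketch below. The paper does not give its formulas, so the commonly used chemometric definitions are assumed here: RMSEP as the root of the mean squared residual, SEP as the bias-corrected standard deviation of the residuals, and Rv as the Pearson correlation between reference and predicted values. The sample values are hypothetical.

```python
import math


def rmsep(actual, predicted):
    """Root mean square error of prediction."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def sep(actual, predicted):
    """Standard error of prediction: standard deviation of the residuals
    after removing their mean (bias), with n - 1 in the denominator."""
    n = len(actual)
    residuals = [p - a for a, p in zip(actual, predicted)]
    bias = sum(residuals) / n
    return math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))


def rv(actual, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical reference codes and network predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = {"RMSEP": rmsep(actual, predicted),
           "SEP": sep(actual, predicted),
           "Rv": rv(actual, predicted)}
```

Under these definitions SEP and RMSEP coincide only approximately, and they differ more as the mean residual (bias) grows.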
The comparison of the three BP neural network prediction models shows that the model based on the original spectrum gives the best forecast results, with a correlation coefficient of validation (Rv) of 0.9999, a standard error of prediction (SEP) of 0.0191 and a root mean square error of prediction (RMSEP) of 0.0189. For the first derivative and second derivative spectra, Rv was 0.9976 and 0.9835, SEP was 0.1043 and 0.2735, and RMSEP was 0.1437 and 0.2720, respectively. The standard values, predicted values and residuals for the original, first derivative and second derivative spectra are presented with the prediction results (Figures 5, 7 and 9). Table 1 shows that all three models can distinguish the different varieties of millet; among them, the original spectrum combined with the principal component neural network gives the most accurate results, with a sample accuracy rate of 100%.

CONCLUSIONS
In this paper, the NIRS of 252 samples of 7 millet varieties were measured and analyzed. The experimental results show that the model built with the characteristic waves has higher prediction accuracy than the whole wave model, and that characteristic wave extraction is an effective method for model optimization. NIRS combined with a principal component neural network was used to classify the different varieties of millet. Because of the wide spectral range of the spectrometer, the method adapts well to the discrimination of different millet varieties. The results of the rapid identification analysis show that it is feasible to distinguish millet varieties by using visible/near infrared spectra combined with a principal component neural network model.
Analysis of the standard values, predicted values and residuals of the samples' original, first derivative and second derivative spectra, together with the discriminant results, shows that the original spectra combined with the principal component neural network identify millet best, with a sample accuracy rate of 100%, followed by the first derivative spectra and the second derivative spectra. All three models can distinguish between the different varieties.
When the second derivative prediction model is applied to forecast analysis, the prediction error is very small for all varieties except the first 2. If the interference factors are eliminated, the second derivative spectrum prediction will also achieve very high accuracy. NIRS is a simple, fast method that requires no sample pretreatment for the qualitative analysis of millet varieties. It classifies well and has application value in the field of millet variety discrimination.

Figure 1. The original spectrum curve of millet.

Figure 3. The second derivative spectra curve of millet.

Figure 6. Curves of training error, checking error and testing error.

Figure 8. Curves of training error, checking error and testing error.

Figure 10. Curves of training error, checking error and testing error.

Table 1. Identification results of checking samples for millet samples.