Detection and quantification of cow milk adulteration using portable near-infrared spectroscopy combined with chemometrics

Milk adulteration is common in many countries and attracts wide public attention because of the health hazards it poses, which may include fatal diseases. In this study, a portable near-infrared (NIR) spectrometer combined with multivariate analysis was used to detect and quantify milk adulteration. Fresh cow milk samples were collected from eight dairy farms in Beijing and Hebei province of China. Water, urea, starch and goat milk were used to adulterate milk at 11 different concentrations. The data-driven soft independent modeling of class analogy (DD-SIMCA) method was employed for qualitative analysis. Partial least squares regression (PLSR) was applied for quantitative analysis of the obtained NIR spectral data. The results showed that the DD-SIMCA approach achieved satisfactory classification. With the PLSR model, standard error of prediction (SEP) values of 4.35, 0.34, 4.74 and 5.56 g/L were obtained for water, urea, starch and goat milk, respectively. These results demonstrated the feasibility and reliability of NIR spectroscopy combined with multivariate analysis in predicting the total contents of the investigated adulterants in cow milk.


INTRODUCTION
Milk adulteration is usually carried out so that milk still meets regulatory requirements while its quality is lowered by substitution with cheap substances, admixture, or extraction of valuable milk components (Poonia et al., 2017). It poses serious threats to human health and has become a global concern, particularly in developing countries. Possible reasons for milk adulteration include the high demand for milk across all age groups, the ease of adulteration, and the lack of feasible and accurate detection tools (Kamthania et al., 2014).
As a nutritionally balanced mixture and perishable food, milk has attracted the interest of many researchers. It has been characterized as a valuable source of fat, protein, carbohydrates, vitamins, and minerals (Faraz et al., 2013). Its relatively low cost and high nutritional value make milk a significant part of the human diet. However, increased demand makes it a likely target for adulteration (Salih et al., 2017). According to Moore et al. (2012), milk is among the seven common food items most susceptible to adulteration. Many additives and food ingredients such as starch, rice flour, skim milk powder, whey powder, reconstituted milk, salt, vegetable oil, animal fat, glucose, melamine and urea have been used as milk adulterants to increase the thickness and viscosity of milk and to maintain the composition of fat, protein and carbohydrates (Campos Motta et al., 2014; Singuluri and Sukumaran, 2014; Soomro et al., 2014). Because milk adulteration is common and has consequences for human health, a suitable method for detecting it is needed. So far, many techniques have been investigated for quantitative detection of adulterants in milk, but they are limited by complexity, long analysis times and high costs. Near-infrared (NIR) spectroscopy is one of the techniques that have recently been used for food quality control. It has been reported for the prediction of quality parameters in oranges (Cayuela and Weiland, 2010), monitoring of oil oxidative stability (Allendorf et al., 2012), quantification of trans fat in edible oils (Birkel and Rodriguez-Saona, 2011) and mineral fortification in whole-grain cornmeal (Hassel and Rodriguez-Saona, 2012). This approach has been used to determine several milk components including protein, fat and lactose (Etzion et al., 2004; Kawasaki et al., 2008); however, no study describes its use in monitoring milk authenticity.

*Corresponding author. E-mail: yangshuming@caas.cn.
Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution 4.0 International License.
NIR spectroscopy is distinguished by its simplicity and low cost, which make it well suited to rapid screening. Recently, manufacturers of NIR instruments have developed portable NIR systems, which are fast, simple and precise (Kim et al., 2008).
To our knowledge, no literature has been found concerning the use of a portable NIR analyzer to detect adulterants in milk. This study therefore aims to investigate the performance and evaluate the feasibility of portable NIR spectrometers combined with multivariate analysis. Water, starch, goat milk and urea were used as adulterants for the investigation. The analysis used Chemometrics, a Microsoft Excel add-in that performs most types of multivariate analysis used in chemistry, to help explore the data and build models. Qualitative and quantitative models were developed using data-driven soft independent modeling of class analogy (DD-SIMCA) and partial least squares regression (PLSR), respectively.

Sample preparation
Fresh cow milk samples were collected from eight dairy farms in Beijing and Hebei province of China. A total of 44 samples from the farms were adulterated with water, starch, goat milk and urea at different levels (Table 1). All the samples were pooled and divided into two sets for PLSR: a training set (75% of the samples) and a test set for validation (25% of the samples).
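The 75%/25% division described above can be sketched as follows. This is only an illustration: the text does not state how samples were assigned to each set, so the random shuffle, the seed, and the function name `split_train_test` are assumptions.

```python
import random


def split_train_test(sample_ids, train_fraction=0.75, seed=0):
    """Shuffle sample identifiers and split them into training and test sets.

    Random assignment is an assumption; the source only gives the 75/25 ratio.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 44 adulterated samples, as reported in the text
train_ids, test_ids = split_train_test(range(44))
```

With 44 samples this gives 33 training and 11 test samples.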

Portable NIR spectroscopic analysis
All samples were scanned using a MicroNIR 1700, a miniature NIR spectrometer developed and manufactured by JDSU (JDS Uniphase Corporation, California, USA, www.jdsu.com). It features a small size (45 mm diameter × 42 mm height), low weight (60 g) and short analysis time (a few seconds). It can be controlled and operated from a tablet, smartphone or portable computer. The MicroNIR spectrometer is an ultra-compact instrument designed for measurements in diffuse reflection, transflection or transmission mode. It uses a linear variable filter (LVF), which separates incoming light into different wavelengths, mounted over a diode array detector (DAD). The spectrometer integrates the light sources and readout electronics in a single compact unit.
A white reference measurement was obtained, while a black (or dark) reference was obtained at a fixed place in the room. Absorbance values were calculated as -log[(Sample - Black)/(White - Black)], and the data were sent to the host computer as .csv files. Three replicate spectra were acquired from each sample. The replicates were averaged before the chemometric analysis.
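As a minimal sketch of the calculation described above, the following computes absorbance at each wavelength from raw sample, white and black readings, and averages replicate spectra. The base-10 logarithm is an assumption (the text writes only "log"), and the function names are hypothetical.

```python
import math


def absorbance(sample, white, black):
    """Absorbance per wavelength: -log10((sample - black) / (white - black)).

    Base 10 is assumed; the source does not state the base of the logarithm.
    """
    values = []
    for s, w, b in zip(sample, white, black):
        ratio = (s - b) / (w - b)  # dark-corrected reading relative to white
        values.append(-math.log10(ratio))
    return values


def mean_spectrum(replicates):
    """Average replicate spectra wavelength by wavelength."""
    n = len(replicates)
    return [sum(column) / n for column in zip(*replicates)]
```

In the study, three replicate spectra per sample would be passed to `mean_spectrum` before modeling.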

Statistical analysis
Multivariate analysis was used to develop qualitative and quantitative models with the NIR data sets. Discrimination and quantification of adulterated milk samples were evaluated using soft independent modeling of class analogy (SIMCA) and PLSR, respectively. Chemometric analysis was performed in Microsoft Excel (Pomerantsev, 2014). The Excel file SIMCA template.xlsb is a template for performing DD-SIMCA, an advanced one-class classification method for class modeling (Pomerantsev, 2008; Pomerantsev and Rodionova, 2013), which allows misclassification errors to be computed theoretically. This technique is based on principal component analysis (PCA) applied to the training data of a target class. The PLSR template.xlsb file provides the forms for measuring the performance of projection to latent structures (PLS) regression. PLS regression, carried out here in Excel 16, develops a relationship between the matrix of predictors X and the vector of responses Y.

Spectral characteristics
Figure 1 shows the five normalized mean NIR spectra in the region of 950 to 1650 nm for pure milk and milk adulterated with water, starch, goat milk and urea. The spectra showed two prominent absorption bands centered at around 1070 and 1270 nm. These two bands were associated with the strong O–H stretch overtone/combination of water (Laporte et al., 1999; Rodriguez et al., 2006), which was attributed to the short spectral range (950 to 1650 nm) of the spectrometer. Samples adulterated with urea showed strong absorption at 1063 nm associated with NH4+ deformation, indicating the decomposition of urea, while C=O absorption appeared at 1255 nm (Mortland, 1996). The spectra of milk adulterated with goat milk did not show any visible difference from the control, which could be ascribed to the close similarity between the components of goat milk and those of the control milk.
Nevertheless, the mean NIR spectra were extremely similar to each other, making them difficult to classify by inspection. Therefore, the DD-SIMCA method was applied in this study to distinguish between natural (pure) and adulterated milk samples.

DD-SIMCA classification
DD-SIMCA was first applied to a training dataset in order to assess the performance of the method in the absence of adulterated samples. For this purpose, we verified the model established for natural (pure) milk, which was collected directly from the cows on the farms. First, the test set included pure milk samples collected from cows on the same farms but not included in the training set. Second, we adulterated samples with water, starch, goat milk and urea; these helped verify how the model handled outliers. Milk samples that were essentially dissimilar to the target class (control sample) were taken as 'low quality' adulterated samples, whereas milk samples close to the target class were regarded as 'high quality' adulterated samples. The latter helped analyze the most challenging cases and assess the type II error.
Preliminary PCA performed on the whole data set revealed both strong similarities and differences between pure and adulterated milk samples collected from a variety of dairy farms (Figure 2). Although all milk samples contained the same components, they were adulterated with various levels of adulterants, resulting in spectral discrepancies. Spectral absorbance reflected the concentrations of the components.
Individual decision rules were developed for each adulterant and the control milk sample using the DD-SIMCA method. For this purpose, several milk samples of the target class were collected as the training set. The test set comprised many samples of the target class and adulterated samples. To illustrate the result, pure milk samples were chosen as the target class and the adulterated samples were analyzed against a model developed for this class. The training set contained 75% of both the control and adulterated samples, whereas the test set contained 25%.
The PCA model with four principal components (PCs) explained 74% of the total variance. The decision rules were constructed from chi-squared distributions with Nh = 2 and Nv = 6 degrees of freedom (DoFs). For α = 0.01 (left panel of Figure 3), all training objects were located inside the acceptance area. As for the test set (right panel of Figure 3), the adulterated samples were generally located far from the acceptance area and could easily be classified as outliers. All test samples from the target class (pure control milk) were classified properly, but two samples, one from the subset of milk adulterated with urea (0.5%) and one from milk adulterated with water (5%), were wrongly accepted. This could be attributed to the low concentrations of the adulterants.
SIMCA classification of the other classes was done in a similar way. The final results are shown in Figures 4 to 7.
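The acceptance rule for the values reported above (Nh = 2, Nv = 6, α = 0.01) can be sketched as follows. The sketch assumes the commonly published DD-SIMCA formulation, in which a sample's score distance h and orthogonal distance v are combined into a full distance Nh·h/h0 + Nv·v/v0 and compared with a chi-squared quantile with Nh + Nv degrees of freedom. The symbols h, v, h0, v0 and all function names are not given in this text and are assumptions. The closed-form chi-squared tail used here holds only for an even number of degrees of freedom, which is the case for Nh + Nv = 8.

```python
import math


def chi2_sf_even(x, dof):
    """Upper-tail probability of a chi-squared variable for even dof (closed form)."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive even dof")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def chi2_critical_even(alpha, dof):
    """Critical value c with P(chi2 > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def full_distance(h, v, h0, v0, n_h, n_v):
    """Combined distance of one sample (assumed DD-SIMCA form)."""
    return n_h * h / h0 + n_v * v / v0


def is_accepted(h, v, h0, v0, n_h=2, n_v=6, alpha=0.01):
    """True if the sample falls inside the acceptance area of the target class."""
    return full_distance(h, v, h0, v0, n_h, n_v) <= chi2_critical_even(alpha, n_h + n_v)
```

For α = 0.01 and eight degrees of freedom the critical value is about 20.09, so a sample is accepted as pure milk when its full distance does not exceed that value. In practice h0, v0, Nh and Nv are estimated from the training set of the target class.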
In this study, we demonstrated that the PCA model with four PCs reliably separated the target class (control (pure) milk samples) from all other classes (adulterated milk samples). The proposed classification method recognized such alien objects successfully.
Similar classifications were repeated for the cases where each adulterated milk sample was selected as a target class. The results provided individual decision rules with specific values of α and β errors.

Quantification of adulterants in milk
The adulterant levels in milk were quantified using PLSR analysis to generate calibration models. Individual models were developed using the same spectral regions identified by classification analysis (Table 2). Each PLSR calibration model was then tested for its prediction ability on the independent 25% test set, as shown in Figures 8, 9, 10 and 11. This was an external validation of the PLSR model, because the spectral data in the test set were not used in building it. The figures show that the PLSR models had very good prediction ability according to the root mean square error of prediction (RMSEP). RMSEP is a statistical measure of how well a model predicts new samples that were not used when building it; it expresses the average error to be expected in future predictions when a calibration model is applied to unknown samples.
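To make the calibration step concrete, the following is a minimal pure-Python sketch of single-response PLS regression (PLS1) using the NIPALS algorithm with mean centering. It is not the Excel template used in the study; the function names and the choice of NIPALS are assumptions, and real spectra would have many more predictor columns than this toy interface suggests.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model by NIPALS. X is a list of rows, y a list of responses."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [value - y_mean for value in y]                        # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(value * value for value in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [value / norm for value in w]                        # weight vector
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(value * value for value in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt              # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                  # deflate y
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new spectrum x."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat
```

A model would be fitted on the training spectra with `pls1_fit` and then applied to each test spectrum with `pls1_predict`; the number of components corresponds to the model factors reported for each adulterant.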
PLSR models suggested a good correlation between infrared estimated concentrations and spiked adulterant levels in the milk (Table 2).
The PLSR scatter plots (Figures 8 to 11) illustrated a good fit between the reference levels and NIR predicted values for milk adulteration. Models were developed using 5, 10, and 13 factors and explained more than 97% of the variance in the multispectral data set, with a validation coefficient of determination (R²val) ranging from 0.87 to 0.94, revealing the prediction robustness of the calibration models. The SEP values, an estimate of the error in predicting the adulteration level in an unknown sample, were 4.35, 0.34, 4.74 and 5.56 g/L for estimation of the levels of adulteration with water, urea, starch and goat milk, respectively (Table 2).
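The two error measures used above can be sketched as follows, assuming their standard definitions: RMSEP is the root mean square of the prediction errors on the test set, and SEP is the standard deviation of those errors after removing the bias. The function names and the numbers in the example are illustrative only and are not data from this study.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction over a test set."""
    errors = [p - r for r, p in zip(reference, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def sep(reference, predicted):
    """Bias-corrected standard error of prediction (n - 1 in the denominator)."""
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = sum(errors) / len(errors)
    return math.sqrt(sum((e - bias) ** 2 for e in errors) / (len(errors) - 1))


# Hypothetical reference and predicted adulterant levels
reference_levels = [0.0, 10.0, 20.0, 30.0]
predicted_levels = [1.0, 9.0, 22.0, 29.0]
example_rmsep = rmsep(reference_levels, predicted_levels)
example_sep = sep(reference_levels, predicted_levels)
```

When the bias is close to zero and the test set is reasonably large, the two measures give nearly the same value.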
In Figure 8, the lowest RMSEP value, 4.35% (v/v), was obtained with a model of 13 factors. This region contained bands arising from the combination of OH symmetric and asymmetric stretching modes of water (Maeda, 1995). The error limit of ±4.35% (v/v) was slightly higher than the minimum level of added water in the milk sample set. Therefore, when milk was adulterated with a very small volume of water within the error limit, it was difficult to determine the water content of the sample. This may be attributed to the fact that water is normally the main component of cow milk, so a low level of added water was not obvious in the NIR spectrum. The low SEP values can be associated with the presence of distinct and specific absorption signals for each adulterant used in the spiking process. Our regression modeling showed the same prediction abilities as those described in the earlier literature (Laporte and Paquin, 1999; Rodriguez et al., 2006) and differed only slightly from those reported by Santos et al. (2013).

Conclusion
This study examined the applicability of a portable near-infrared (NIR) spectrometer (MicroNIR 1700) combined with chemometric multivariate analysis for monitoring adulteration in cow milk. The results indicated that the method is suitable for detecting water, urea, starch and goat milk. Further study to validate the feasibility of this technique for monitoring adulterants in cow milk is recommended.

Figure 11. Root mean square errors (RMSE) and partial least squares regression (PLSR) curves for training (circle) and validation (square) data sets of milk adulterated with other milk (goat milk).