Near-infrared spectra quantitative analysis for flue gas of thermal power plant based on wavelength selection

This paper proposes a near-infrared (NIR) spectral quantitative analysis method for the flue gas of thermal power plants based on wavelength selection. In the proposed method, self-adaptive accelerated particle swarm optimization is used to determine the most representative wavelengths of the NIR spectral signals and is combined with partial least squares to predict the concentrations of the components in a real flue gas dataset. The method randomly chooses either the particle's own current optimum or the current global optimum as the reference state, accelerates the update of the flight velocity toward that reference state, and then updates the particle state self-adaptively based on the new velocity. Experimental results on a real flue gas dataset show that the proposed method has higher predictive ability and can overcome premature convergence.


INTRODUCTION
With coal and power shortages making energy saving urgent, using natural gas can decrease the cost of power generation (Lee and Jou, 2012). The flue gas from a gas generating unit mainly consists of methane (CH4), carbon monoxide (CO) and carbon dioxide (CO2). Quantitative analysis of the flue gas can reflect the potential level of environmental pollution. Traditional chemical calibration analysis methods are usually time-consuming. Near-infrared (NIR) spectroscopy is a fast, nondestructive technique and has been widely used for quantitative determination of analytes (Lillhonga and Geladi, 2011). Partial least squares (PLS) is the most popular chemometric method and has been developed for quantitative analysis of NIR spectral data (Marcio et al., 2011). Since not all wavelengths in the spectrum are equally important to the model, wavelength selection is a crucial step of NIR spectroscopy analysis. The most popular wavelength selection method is uninformative variable elimination by PLS (UVE-PLS), which performs better than statistic-based wavelength selection methods (Centner et al., 1996). Moreover, because wavelength selection can be regarded as a combinatorial problem, genetic algorithm (GA) and particle swarm optimization (PSO) combined with PLS have been proposed for wavelength selection (Arakawa et al., 2011; Sorol et al., 2010). Nevertheless, GA and PSO may fall into local minima. In this paper, we propose self-adaptive accelerated particle swarm optimization (SAAPSO) to perform wavelength selection on NIR spectra for building the quantitative model of the analytes.

*Corresponding author. E-mail: yan.zhou@mail.xjtu.edu.cn

THE PROPOSED METHOD
A wavelength selection method based on SAAPSO is presented for building the concentration prediction model of flue gas. Some notation is introduced first. The number of particles in the population is n and the number of dimensions of each particle is d; that is, d equals the total number of wavelengths.
For particle i with location x and velocity v, the reference state p_c is chosen at random as either p_i, the particle's own current optimum, or p_g, the current global optimum, and the velocity is updated toward it. Here r_1 and r_2 are random numbers between 0 and 1, p_ij and p_gj are the positions of the jth dimension of p_i and p_g, respectively, f_c is the selection coefficient, w is the inertia factor and a is the acceleration factor. Then x is updated by formula (1), where ρ is a random number between 0 and 1. For wavelength selection, SAAPSO uses binary coding to encode the location of each particle; each dimension represents a wavelength, where '1' or '0' indicates that the wavelength is selected or dropped, respectively. The velocity and location of each particle are initialized randomly, and the prediction model is then built by PLS with the selected wavelengths. The root-mean-square error of cross-validation (RMSECV) is adopted as the fitness function. Since the aim is to minimize the RMSECV, the optimal individual of SAAPSO after the iterations is the solution of the problem.
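The update described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's exact formulas: the velocity rule v = w·v + a·r_1·(p_c − x), the choice of p_c = p_i when r_2 < f_c (otherwise p_g), the sigmoid rule of standard binary PSO for the location update, and the velocity clamp are all assumed forms. The names `update_particle` and `saapso_select` are hypothetical, and `fitness` stands in for the RMSECV of a PLS model built on the selected wavelengths.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def update_particle(x, v, p_best, g_best, w=0.7, a=2.0, f_c=0.5, rng=random):
    """One assumed SAAPSO-style update of a binary particle.

    x, v    : current location (0/1 list) and velocity (float list)
    p_best  : the particle's own best location (p_i)
    g_best  : the global best location (p_g)
    w, a    : inertia factor and acceleration factor
    f_c     : selection coefficient for choosing the reference state
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2, rho = rng.random(), rng.random(), rng.random()
        # Assumption: reference state is p_i with probability f_c, else p_g.
        p_c = p_best[j] if r2 < f_c else g_best[j]
        # Assumption: a single accelerated pull toward the reference state.
        vj = w * v[j] + a * r1 * (p_c - x[j])
        # Assumption: clamp the velocity so the sigmoid does not saturate.
        vj = max(-6.0, min(6.0, vj))
        new_v.append(vj)
        # Assumption: standard binary-PSO rule for the new location.
        new_x.append(1 if rho < sigmoid(vj) else 0)
    return new_x, new_v


def saapso_select(fitness, d, n=30, iters=100, seed=0):
    """Minimize fitness(mask) over binary masks of length d."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    vs = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness(x) for x in xs]
    g = min(range(n), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    history = [g_fit]  # best fitness after each iteration
    for _ in range(iters):
        for i in range(n):
            xs[i], vs[i] = update_particle(xs[i], vs[i], p_best[i], g_best, rng=rng)
            f = fitness(xs[i])
            if f < p_fit[i]:
                p_best[i], p_fit[i] = xs[i][:], f
                if f < g_fit:
                    g_best, g_fit = xs[i][:], f
        history.append(g_fit)
    return g_best, g_fit, history
```

With d = 473 and a `fitness` that fits PLS on the columns where the mask is 1 and returns the RMSECV, `saapso_select` would return the selected-wavelength mask; the default values of w, a and f_c mirror the settings reported in the experiments.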
In SAAPSO, the search space is expanded and the acceleration factor ensures the convergence rate. Furthermore, the self-adaptively determined new location helps the search escape local optima. The experimental results in the next section further verify the effectiveness of the proposed method.

EXPERIMENTAL RESULTS
To evaluate the effectiveness of the proposed method, a real dataset obtained by measuring the NIR spectra of field flue gas is used in the experiments. The dataset was obtained during a combustion process and includes 106 samples; each sample consists of a spectrum of a mixture of CH4, CO and CO2, with the reference concentrations obtained by gas chromatography (GC). The spectra were measured by a GASMET DX4000 Fourier transform infrared (FTIR) gas analyzer. The spectral wavenumber range is 549.44 to 4238.28 cm^-1 with a resolution of 7.72 cm^-1. Each spectrum contains 473 wavelengths. The concentration ranges of the three analytes are 0 to 0.4598 ppm, 0 to 0.4083 ppm and 0 to 0.3818 ppm, respectively.
In this study, PLS, UVE-PLS, GA combined with PLS (GA-PLS), PSO combined with PLS (PSO-PLS), neighboring PSO combined with PLS (NPSO-PLS) and SAAPSO combined with PLS (SAAPSO-PLS) are used to predict the concentrations of CH4, CO and CO2 in the gas dataset, and the effectiveness of these models is compared. The dataset is split into a calibration set and a validation set by Monte Carlo cross-validation; that is, 80% of the samples are randomly selected as the calibration set and the remaining 20% form the validation set.
To ensure a fair comparison, the population size of GA-PLS, PSO-PLS, NPSO-PLS and SAAPSO-PLS is set to 30 and the number of iterations to 100. Although GA-PLS, PSO-PLS, NPSO-PLS and SAAPSO-PLS are initialized randomly, their initial conditions are kept identical across the experiments. For GA, the crossover probability and the mutation probability are set to 0.7 and 0.05, respectively. For PSO-PLS and NPSO-PLS, both learning factors are 1.7. For PSO-PLS, NPSO-PLS and SAAPSO-PLS, the inertia factor is set to 0.7. For NPSO-PLS, the number of neighbors of each particle is 5. For SAAPSO-PLS, the selection coefficient is 0.5 and the acceleration factor is 2.
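The 80/20 Monte Carlo split described above can be sketched as follows. This is an illustrative sketch only: the function name `monte_carlo_split`, the use of a seeded generator, and rounding 80% of 106 to the nearest integer are assumptions, since the text does not state how the fraction is rounded.

```python
import random


def monte_carlo_split(n_samples, calib_fraction=0.8, seed=None):
    """Randomly split sample indices into calibration and validation sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Assumption: round to the nearest integer (106 * 0.8 = 84.8 gives 85).
    n_calib = round(n_samples * calib_fraction)
    calibration = sorted(indices[:n_calib])
    validation = sorted(indices[n_calib:])
    return calibration, validation
```

Calling `monte_carlo_split(106, seed=0)` gives 85 calibration indices and 21 validation indices under this rounding assumption; repeating the call with different seeds yields the repeated random splits that Monte Carlo cross-validation relies on.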
Since these methods are based on PLS, the number of latent variables is determined according to the RMSECV with leave-one-out cross-validation. For UVE-PLS, the latent variables are selected at each recalculation.
Similarly, for GA-PLS, PSO-PLS, NPSO-PLS and SAAPSO-PLS, the latent variables are re-determined at each iteration. The RMSECV for the calibration set, the root-mean-square error of prediction (RMSEP) for the validation set, the coefficient of determination (R^2), the cross-validation correlation coefficient calculated with leave-one-out cross-validation on the calibration set (R^2_cv), the squared correlation coefficient for the validation set (R^2_p), the number of latent variables (N_lv) and the compression ratio (CR) are used to assess and compare the predictive ability of the various models, where CR is computed from N_t, the total number of wavelengths, and N_s, the number of selected wavelengths. The experiments are implemented in MATLAB 7.0.4 on a personal computer with an Intel i5-2300 CPU and 3 GB of RAM.
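The error measures named above can be illustrated with a short sketch. It makes several assumptions: R^2 is taken to be the coefficient of determination 1 − SS_res/SS_tot (the text may instead mean a squared correlation coefficient), and `loo_rmsecv` accepts hypothetical `fit` and `predict` callables in place of a PLS model, which is not implemented here.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error; RMSEP when applied to validation predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition of R^2)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def loo_rmsecv(X, y, fit, predict):
    """RMSECV with leave-one-out cross-validation.

    fit(X_train, y_train) returns a model; predict(model, x) returns a
    prediction for one sample. Both are placeholders for a PLS model.
    """
    predictions = []
    for i in range(len(y)):
        # Hold out sample i and train on the rest.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        predictions.append(predict(model, X[i]))
    return rmse(y, predictions)
```

In a wavelength selection loop, `loo_rmsecv` would be evaluated on the calibration spectra restricted to the selected wavelengths and used as the fitness value, while `rmse` on the held-out validation set would give the RMSEP.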
The analytical results are summarized in Table 1. For SAAPSO-PLS, the predictive ability is higher and the compression ratio is the largest for all three components. For CH4, the R^2 of SAAPSO-PLS is slightly lower than those of UVE-PLS, GA-PLS, PSO-PLS and NPSO-PLS. For CO2, the R^2 of SAAPSO-PLS is slightly lower than those of PLS, GA-PLS, PSO-PLS and NPSO-PLS, the R^2_cv of SAAPSO-PLS is slightly lower than those of GA-PLS and PSO-PLS, and the R^2_p of SAAPSO-PLS is lower only than that of GA-PLS. Overall, the prediction accuracy of SAAPSO-PLS is higher.
Figure 1(a) to (c) show the scatter plots of measured vs. predicted values of SAAPSO-PLS for CH4, CO and CO2, respectively. Because the data points lie close to the diagonal line, the prediction accuracy of the SAAPSO-PLS model is high. Figure 1(d) to (f) show the iteration curves of GA-PLS, PSO-PLS, NPSO-PLS and SAAPSO-PLS for CH4, CO and CO2, respectively. At the beginning of the iterations, SAAPSO-PLS explores new search areas through the randomly determined reference state, so its search efficiency is limited to a certain extent. Nevertheless, because of the acceleration mechanism, SAAPSO-PLS begins to converge after several iterations and has the fastest convergence rate overall. In summary, the experimental results verify that SAAPSO-PLS can be successfully adopted for quantitative analysis of real NIR spectral data and has strong predictive ability.

CONCLUSION
In this paper, SAAPSO is presented for determining the representative wavelengths and is combined with PLS for predicting the concentrations of the components of real flue gas from a thermal power plant. The proposed method has the following advantages. First, the prediction model of the components of the flue gas dataset built by the proposed method is more accurate. Second, the proposed method explores new areas to avoid premature convergence to a certain extent. Third, the proposed method finds the optimal solution with a faster convergence speed. The experimental results verify the predictive ability and the search performance of the proposed method.

Figure 1. Scatter plots of measured vs. predicted values and iteration curves for the flue gas dataset.

Table 1. Analytical results for the flue gas dataset.