Simultaneous spectrophotometric determination of nitroanilines using genetic-algorithm-based wavelength selection in principal component-artificial neural network

Ternary mixtures of nitroaniline isomers were determined simultaneously in synthetic and real matrices using a genetic algorithm-principal component-artificial neural network (GA-PC-ANN) model. All factors affecting the sensitivity were optimized, and the linear dynamic range for the determination of each nitroaniline isomer was established. Simultaneous spectrophotometric determination of nitroaniline mixtures is difficult because of spectral interferences. A genetic algorithm is a suitable method for selecting wavelengths for principal component-artificial neural network (PC-ANN) calibration of mixtures with almost identical spectra without loss of prediction capacity. The experimental calibration matrix was built by measuring the absorbance over the range 200 to 500 nm for 21 samples containing 1.0 to 17.0, 1.0 to 15.0 and 1.0 to 18.0 μg/ml of m-nitroaniline, o-nitroaniline and p-nitroaniline, respectively. The root mean square errors of prediction for m-nitroaniline, o-nitroaniline and p-nitroaniline were 0.7848, 0.2864 and 0.1851, respectively. The proposed method was successfully applied to the determination of m-nitroaniline, o-nitroaniline and p-nitroaniline in synthetic and water samples.


INTRODUCTION
Nitroanilines are important pollutants in water because of their wide use in many industrial processes, such as the manufacture of pharmaceuticals, dyes and synthetic colors. Furthermore, they are of great environmental concern because of their high toxicity to living things (Gurten et al., 2006). They can be released into the environment either directly as industrial waste or indirectly as breakdown products of herbicides and pesticides. Due to their solubility in water, anilines can readily permeate through soil and contaminate ground water. They can be taken up by humans via the skin, the respiratory tract and the gastrointestinal tract. Because of their toxicity, bioaccumulation and wide distribution in the environment, their separation and determination have become an important topic in environmental analysis. A variety of analytical methods have been reported for the determination of selected anilines. These techniques include gas chromatography (Farroha and Emeish, 1975; Lopez-Aviia and Northcutt, 1981), pulse polarography (Wolff and Nürnberg, 1966), liquid chromatography (Scher and Adamo, 1993), voltammetry (Au Kuzmina et al., 2003; Grabaric et al., 1997; Isaeva et al., 1991; Faller et al., 1996) and spectrophotometry (Ai-Ghabsha et al., 1976; Revanasiddappa et al., 2001; Ghasemi and Niazi, 2007; Rahim et al., 1987).
A drawback of feature selection techniques applied to spectral data is that the selected features (wavelengths) are usually scattered throughout the spectrum. It has already been shown that genetic algorithms (GAs) (Arcos et al., 1997; Depczynski et al., 2000; Lucasius and Kateman, 1993; Hibbert, 1993) can be successfully used as a feature selection technique (Leardi et al., 1992, 1998, 2002; Leardi, 1994, 1996, 2000, 2007). Leardi and Gonzalez (1998) demonstrated that GAs, after suitable modifications, produce more interpretable results, since the selected wavelengths are less dispersed than with other methods. The algorithm used in this paper is an evolution of the algorithm described in Leardi et al. (1998), whose parameters are reported in Table 1.

*Corresponding author. E-mail: goudarzi@shahroodut.ac.ir or goudarzi10@gmail.com.

Parameters of the genetic algorithm (Table 1):
Population size: 30 chromosomes
Average number of variables per chromosome in the original population: 5
Response: cross-validated percentage explained variance (five deletion groups; the number of components is determined by cross-validation)
Maximum number of variables selected in the same chromosome: 30
Probability of mutation: 1%
Maximum number of components: the optimal number of components determined by cross-validation on the model containing all the variables (not higher than 15)
Number of runs: 100
Backward elimination: after every 100th evaluation and at the end (if the number of evaluations is not a multiple of 100)
Window size for smoothing: 3
Simultaneous determination of components in a multicomponent drug formulation can be a difficult task, especially when the analytical characteristics of these components closely resemble one another and other pharmaceutical excipients are present. In the recent past, multivariate chemometric methods for the analysis of multicomponent systems have been widely reported, mostly owing to the advent of fast and affordable computers and rapid scanning spectrophotometers controlled by computer software. Artificial neural networks (ANNs) are data processing systems consisting of a large number of simple, highly interconnected processing elements inspired by biological systems and designed to simulate the neurological processing ability of the human brain. Theoretical background information on ANNs can be found elsewhere (Rumelhart and McClelland, 1986; Fausett, 1994; Schalkoff, 1997).
Computationally, an ANN is an approach for handling multivariate and multi-response data and is hence suitable for modeling, that is, for the search for an analytical function that gives a specified n-variable output for any m-variable input (Zupan and Gasteiger, 1999). Unlike standard modeling techniques, ANN models do not require the mathematical function to be known in advance; they are called 'soft models' because they can represent the experimental behavior of a system when the exact description is missing or too complex. ANNs adapt to any relation between input and output data on the basis of their supervised training. The characteristics that distinguish ANN systems from traditional computing are learning by example, distributed associative memory, fault tolerance and pattern recognition (García-Reiriz et al., 2007; Tang et al., 2006; Absalan and Soleimani, 2004). The flexibility of ANNs and their ability to maintain performance even in the presence of significant amounts of noise in the input data are highly desirable (Fausett, 1994; Despagne and Massart, 1998), since perfectly linear and noise-free data sets are seldom available in practice; this makes ANNs suitable for multivariate calibration modeling. There are reports on the application of ANNs to mixture analysis (Absalan and Soleimani, 2004; Balamurugan et al., 2003; Sathyanarayana et al., 2004; Yin et al., 2001; Ni et al., 2000).
However, most of them employ a separate network for the estimation of each component and calibrate with synthetic binary mixtures. The current work evaluates the performance characteristics of a principal component-artificial neural network (PC-ANN) model trained by the Levenberg-Marquardt algorithm (Hagan and Menhaj, 1994). The use of computed spectral datasets instead of spectra of synthetic mixtures for the calibration models has been demonstrated. A method for routine pharmaceutical quality control of this tablet dosage form by multivariate calibration based on soft modeling using a principal component based back-propagation neural network has been presented. This method was used for the simultaneous spectrophotometric determination of nitroanilines in different samples, and the results show the applicability of the procedure to the analysis of real samples.

THEORY

Principal component analysis
Principal component analysis (PCA) is a multivariate procedure that reduces the dimensionality of the data space while retaining as much information as possible. It can be viewed as a rotation of the existing axes of the p-dimensional variable space $R^p$ to new positions, such that the maximum variability of the data is projected onto the new axes (Hagan, 1966; Berzas et al., 1997; Marbach and Heise, 1990; Goodarzi et al., 2007; Jalali-Heravi et al., 2004; Kalivas, 2001). The first principal component (PC) is the combination of variables that explains the greatest amount of variation. The second principal component explains the next largest amount of variation and is uncorrelated with the first. There can be as many principal components as there are variables. A particular direction that defines a linear latent variable in $R^p$ is described by a vector $\mathbf{b} = (b_1, b_2, \dots, b_p)$, which is usually scaled to length one. The value of the corresponding latent variable u for an object $\mathbf{x} = (x_1, x_2, \dots, x_p)$ is obtained by projecting the object point onto the straight line defined by the direction $\mathbf{b}$. Mathematically, this is a linear combination of the features $x_j$ of the object with the vector components $b_j$:

$$u = x_1 b_1 + x_2 b_2 + \dots + x_p b_p = \mathbf{x}^{T}\mathbf{b}, \qquad \mathbf{b}^{T}\mathbf{b} = 1$$

The vector components $b_j$ are called loadings and the values u are called scores; the loading vectors are given by the eigenvectors of the covariance matrix.
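The projection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data matrix `X` is random placeholder data, and the function name `pca_scores` and the choice of three components are hypothetical, not taken from the original Matlab implementation.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centered data onto the leading eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)                    # mean-center each variable
    cov = np.cov(Xc, rowvar=False)             # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]               # each column is a unit-length vector b
    scores = Xc @ loadings                     # u = x^T b for every object
    explained = eigvals[order] / eigvals.sum() # fraction of variance per component
    return scores, loadings, explained

# Placeholder data: 21 objects described by 10 variables (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(21, 10))
scores, loadings, explained = pca_scores(X, 3)
```

The loading vectors returned by `np.linalg.eigh` are already orthonormal, so the unit-length condition on b holds without an extra scaling step.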

Genetic algorithm
A detailed explanation of GAs can be found in previous work (Hibbert, 1993; Leardi et al., 1992, 1998; Leardi, 1994, 1996, 2000, 2007); a brief introduction to the method is given here. A GA is a method that simulates ideas from Darwin's theory of natural selection and evolution (the struggle for life). In a GA for variable selection, an individual (or chromosome), that is, a candidate solution, represents a set of variables. The basic steps of the algorithm are as follows: (1) a chromosome is represented by a binary bit string and an initial population of chromosomes is created at random; (2) the fitness function of each chromosome is evaluated; (3) according to the fitness values, the chromosomes of the next generation are produced by selection, crossover and mutation operations. In this paper, the genetic algorithm follows Leardi's method (Leardi et al., 1998) (Table 1).
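The three steps can be illustrated with a deliberately simplified Python sketch. It is not Leardi's algorithm: the fitness used here is the in-sample percentage of explained variance of an ordinary least-squares fit (not the cross-validated response of Table 1), there is no backward elimination or smoothing, and the elitism, top-half selection, function names and demonstration data are all assumptions made for illustration. Only the population size (30), the average of five variables per initial chromosome and the 1% mutation probability are taken from Table 1.

```python
import numpy as np

def fitness(mask, X, y):
    """Percentage of variance in y explained by least squares on the selected columns."""
    if not mask.any():
        return 0.0
    A = np.column_stack([X[:, mask], np.ones(len(y))])  # selected variables + intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 100.0 * (1.0 - resid.var() / y.var())

def run_ga(X, y, pop_size=30, n_init=5, p_mut=0.01, n_gen=50, seed=0):
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    # (1) random binary chromosomes with about n_init selected variables each
    pop = rng.random((pop_size, n_var)) < n_init / n_var
    for _ in range(n_gen):
        # (2) evaluate the fitness of every chromosome and rank the population
        fit = np.array([fitness(c, X, y) for c in pop])
        pop = pop[np.argsort(fit)[::-1]]
        children = [pop[0].copy()]                 # keep the best chromosome unchanged
        while len(children) < pop_size:
            # (3) selection from the better half, uniform crossover, mutation
            i, j = rng.integers(0, pop_size // 2, size=2)
            take_i = rng.random(n_var) < 0.5
            child = np.where(take_i, pop[i], pop[j])
            child ^= rng.random(n_var) < p_mut     # flip each bit with probability p_mut
            children.append(child)
        pop = np.array(children)
    fit = np.array([fitness(c, X, y) for c in pop])
    best = int(np.argmax(fit))
    return pop[best], float(fit[best])

# Hypothetical demonstration data: only variables 3 and 7 carry signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 20))
y_demo = 2.0 * X_demo[:, 3] - X_demo[:, 7] + 0.1 * rng.normal(size=40)
best_mask, best_fit = run_ga(X_demo, y_demo)
```

Because an in-sample fit never gets worse when variables are added, this toy fitness tends to favor larger subsets; a cross-validated response, as listed in Table 1, avoids that bias.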

Artificial neural networks (ANNs)
An ANN is a computer-based system derived from a simplified concept of the brain, in which a number of nodes, called processing elements or neurons, are interconnected in a net-like structure. ANNs are nonlinear, which makes them suitable for data processing in which the relationship between cause and result cannot be defined linearly. Three components constitute an ANN: the processing elements, the topology of the connections between the nodes and the learning rules.
The principal components selected from the loading plot were processed by an ANN trained with the back-propagation of errors learning algorithm. Its basic theory and application to chemical problems can be found in the literature (Wythoff, 1993; Zupan and Gasteiger, 1991). The network comprised three node layers: an input, a hidden and an output layer, denoted i, h and o, which respectively indicate the number of nodes in the input, hidden and output layers. The absorbance data versus time were centered and normalized and used as the input for the ANN. The input nodes transferred the weighted input signals to the nodes in the hidden layer, and the hidden nodes did the same for the output layer (Table 2).
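A three-layer network of this kind can be sketched in Python as follows. This is a minimal illustration, not the Matlab NNet Toolbox model: it uses plain gradient-descent back-propagation, a tanh hidden layer and a linear output layer, and the number of hidden nodes, learning rate, epoch count and data are hypothetical. Reading the "3+1" input nodes of Table 2 as three principal component scores plus one bias node is also an assumption.

```python
import numpy as np

def train_ann(X, Y, n_hidden=4, lr=0.1, epochs=2000, seed=0):
    """Train an input-hidden-output network by back-propagation of errors."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    # The extra row in each weight matrix holds the bias weights (the "+1" node).
    W1 = rng.normal(scale=0.5, size=(n_in + 1, n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, n_out))
    ones = np.ones((len(X), 1))
    Xb = np.hstack([X, ones])
    losses = []
    for _ in range(epochs):
        # Forward pass: weighted inputs go to the hidden layer, then to the output layer.
        H = np.tanh(Xb @ W1)
        Hb = np.hstack([H, ones])
        out = Hb @ W2
        err = out - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the output error back through the hidden layer.
        grad_W2 = Hb.T @ err / len(X)
        dH = (err @ W2[:-1].T) * (1.0 - H ** 2)   # derivative of tanh
        grad_W1 = Xb.T @ dH / len(X)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2, losses

# Hypothetical data: 21 samples, 3 principal component scores, 3 concentrations.
rng = np.random.default_rng(1)
pc_scores = rng.normal(size=(21, 3))
conc = np.column_stack([
    pc_scores[:, 0] + 0.2 * pc_scores[:, 1] ** 2,
    pc_scores[:, 1] - 0.5 * pc_scores[:, 2],
    np.tanh(pc_scores[:, 2]),
])
W1, W2, losses = train_ann(pc_scores, conc)
```

A single network with three output nodes predicts all three concentrations at once, in contrast to the separate per-component networks mentioned in the Introduction.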

Reagents and standard solutions
All chemicals used were of analytical reagent grade, and sub-boiling distilled water was used throughout. Stock solutions of nitroaniline isomers were purchased from Fluka. Working standard solutions were prepared daily by appropriate dilution as required. A universal buffer solution (pH 7.0) was prepared according to Lurie (1978).
Table 2. Architecture of the ANN models and their specifications.
No. of nodes in the input layer 3+1

Instrumentation and software
A Scinco (SUV-2120) spectrophotometer controlled by a Hewlett-Packard computer and equipped with a 1 cm path length quartz cell was used for UV-vis spectra acquisition. A Metrohm 692 pH-meter furnished with a combined glass-saturated calomel electrode was calibrated with at least two buffer solutions at pH 3.00 and 9.00. The three-layer back-propagation neural network was implemented in Matlab (version 6.5, MathWorks Inc.) using the NNet Toolbox. The GA variable selection and PCA modeling were also written in the same software.

Linear calibration range
Individual calibration curves were constructed with several points as absorbance versus m-nitroaniline, o-nitroaniline and p-nitroaniline concentrations. For constructing the individual calibration lines, the absorbances were measured at 251, 225 and 381 nm against a blank for m-nitroaniline, o-nitroaniline and p-nitroaniline, respectively. The linear regression equation of the calibration graph was $A = 0.0726\,C_{m\text{-nitroaniline}} + 0.0451$ ($R^2 = 0.996$) for m-nitroaniline over the concentration range 1.0 to 17.0 µg/ml, $A = 0.0775\,C_{o\text{-nitroaniline}} + 0.0772$ ($R^2 = 0.999$) for o-nitroaniline over the range 1.0 to 15.0 µg/ml, and $A = 0.0769\,C_{p\text{-nitroaniline}} + 0.0206$ ($R^2 = 0.9973$) for p-nitroaniline over the range 1.0 to 18.0 µg/ml. The limits of detection, calculated from the characteristics of the calibration lines, were 0.04, 0.07 and 0.05 µg/ml for m-nitroaniline, o-nitroaniline and p-nitroaniline, respectively.
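As a small worked example, the reported regression lines can be inverted to estimate the concentration of a single isomer from its absorbance. The slopes and intercepts below are those given above; the dictionary, the function name and the example absorbance are illustrative assumptions, and the result is only meaningful inside the stated linear range of a single-component solution.

```python
# (slope, intercept) of A = slope * C + intercept, with C in µg/ml
CALIBRATION = {
    "m-nitroaniline": (0.0726, 0.0451),  # measured at 251 nm
    "o-nitroaniline": (0.0775, 0.0772),  # measured at 225 nm
    "p-nitroaniline": (0.0769, 0.0206),  # measured at 381 nm
}

def concentration_from_absorbance(isomer, absorbance):
    """Invert the calibration line: C = (A - intercept) / slope."""
    slope, intercept = CALIBRATION[isomer]
    return (absorbance - intercept) / slope

# Hypothetical reading: A = 0.7711 at 251 nm corresponds to 10 µg/ml m-nitroaniline.
c_m = concentration_from_absorbance("m-nitroaniline", 0.7711)
```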

Procedure of standard calibration set
The concentrations of m-nitroaniline, o-nitroaniline and p-nitroaniline varied between 1.0 and 17.0, 1.0 and 15.0 and 1.0 and 18.0 µg/ml, respectively. The mixed standard solutions were placed in a 10 ml volumetric flask and made up to the final volume with deionized water (final pH 7.0). The absorption spectra were recorded between 200 and 500 nm against a blank of universal buffer. The spectral region between 200 and 500 nm, which corresponds to 301 experimental points per spectrum (the spectra are digitized every 1.0 nm), was selected for analysis because it contains the maximum spectral information on the mixture components of interest. All absorption data were preprocessed by standard mean centering and scaling.
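The preprocessing step can be sketched as follows, assuming that "mean centering and scaling" means column-wise autoscaling (subtract the mean of each wavelength and divide by its standard deviation); the text does not state the scaling explicitly. The spectra below are random placeholders with the dimensions described above (21 samples, 301 wavelengths).

```python
import numpy as np

wavelengths = np.arange(200, 501)              # 200 to 500 nm in 1 nm steps: 301 points
rng = np.random.default_rng(0)
spectra = rng.random((21, wavelengths.size))   # placeholder absorbances, not real data

def autoscale(A):
    """Mean-center each column and scale it to unit standard deviation."""
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1)
    std[std == 0] = 1.0                        # leave constant columns unscaled
    return (A - mean) / std, mean, std

scaled, col_mean, col_std = autoscale(spectra)
```

The stored `col_mean` and `col_std` would be reused to preprocess prediction and real samples in the same way as the calibration set.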

Selection of the optimum chemical conditions
Figure 1 shows the absorption spectra of the individual nitroaniline isomers in aqueous solution at pH 7.0. To investigate the possibility of determining nitroaniline isomers in mixtures, the optimum working conditions were studied under the conditions previously established for each isomer. To select the pH at which the minimum spectral overlap occurs, the influence of the pH of the medium on the absorption spectra of the nitroaniline isomers was studied over the pH range 4.0 to 10.0, and a universal buffer solution of pH 7.0 was selected. Individual calibration curves were constructed with several points, as absorbance versus nitroaniline isomer concentration in the range 1.0 to 17.0, 1.0 to 15.0 and 1.0 to 18.0 µg/ml for m-nitroaniline, o-nitroaniline and p-nitroaniline, respectively. The wavelengths used to generate the calibration curves

Calibration and validation
A calibration matrix of synthetic mixtures of nitroaniline isomers was designed for the genetic algorithm-principal component-artificial neural network (GA-PC-ANN) method. The compositions of the ternary mixtures used in the calibration matrix are summarized in Table 3. For the prediction set, six mixtures were prepared (Table 4). To ensure that the prediction and real samples lie in the subspace of the training set, the score plot of the first principal component versus the second was drawn; all samples fall within the region spanned by the training-set scores.

Prediction set and analysis of real samples
For the prediction set, six mixtures that were not part of the calibration set were prepared and used as an independent test (Table 4). The real samples in this study were collected from different waters (Table 5). The added concentrations ranged from 1.0 to 13.0, 1.0 to 12.0 and 1.0 to 15.0 µg/ml for m-nitroaniline, o-nitroaniline and p-nitroaniline, respectively.

Variable selection
The calibration set comprised 140 variables. The data are presented in Figure 2. At the lower end of the spectrum the curves vary widely, whereas at the higher end the variation is very small. The GA was run on the 140 variables using a PC-ANN regression method in which the maximum number of factors allowed is the optimal number of components determined by cross-validation on the model containing all the variables, and the selected variables were then used to run the PC-ANN. The selected wavelengths, shown in Figure 2, are 244, 245, 246, 329, 330, 331, 332, 333, 334, 335, 345 and 346 nm for m-nitroaniline; 224, 225, 226, 227, 228, 229, 230, 231, 402, 403, 404, 407, 408, 409 and 410 nm for o-nitroaniline; and 228, 229, 230, 231, 255, 256 and 257 nm for p-nitroaniline. The present study shows that the GA can be a good method for feature selection in spectral data sets. The results obtained on the data set of m-nitroaniline, o-nitroaniline and p-nitroaniline mixtures indicate that the predictive ability of models built with the wavelengths selected by the algorithm is often better than that of models using all the wavelengths.
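The selected wavelengths can be turned into column indices of the spectral matrix as sketched below. The wavelength lists are those reported above; the 200 to 500 nm grid in 1 nm steps is taken from the calibration procedure and is an assumption here, since the text also mentions 140 variables, and the placeholder matrix and function name are purely illustrative.

```python
import numpy as np

wavelengths = np.arange(200, 501)  # assumed grid: 200 to 500 nm, 1 nm steps

SELECTED_NM = {
    "m-nitroaniline": [244, 245, 246, 329, 330, 331, 332, 333, 334, 335, 345, 346],
    "o-nitroaniline": [224, 225, 226, 227, 228, 229, 230, 231,
                       402, 403, 404, 407, 408, 409, 410],
    "p-nitroaniline": [228, 229, 230, 231, 255, 256, 257],
}

def select_columns(spectra, isomer):
    """Keep only the absorbance columns at the wavelengths selected for one isomer."""
    idx = np.searchsorted(wavelengths, SELECTED_NM[isomer])
    return spectra[:, idx]

# Placeholder matrix whose first row equals the column index (not real absorbances).
spectra = np.arange(21 * 301).reshape(21, 301)
m_subset = select_columns(spectra, "m-nitroaniline")
o_subset = select_columns(spectra, "o-nitroaniline")
p_subset = select_columns(spectra, "p-nitroaniline")
```

Each reduced matrix would then be passed to PCA and the ANN in place of the full spectrum.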

Statistic parameters
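For the evaluation of the predictive ability of a multivariate calibration model, the root mean square error of prediction (RMSEP) and the relative standard error of prediction (RSEP) can be used (Niazi et al., 2006):

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_{\mathrm{pred},i} - y_{\mathrm{obs},i}\right)^{2}}{n}}$$

$$\mathrm{RSEP}\,(\%) = 100 \times \sqrt{\frac{\sum_{i=1}^{n} \left(y_{\mathrm{pred},i} - y_{\mathrm{obs},i}\right)^{2}}{\sum_{i=1}^{n} \left(y_{\mathrm{obs},i}\right)^{2}}}$$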
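The two statistics can be computed as in the sketch below, which assumes their usual definitions: RMSEP is the square root of the mean squared difference between predicted and observed concentrations, and RSEP is 100 times the square root of the summed squared differences divided by the summed squared observed concentrations. The function names and the two-sample example are illustrative, not values from Table 6.

```python
import numpy as np

def rmsep(y_pred, y_obs):
    """Root mean square error of prediction."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_obs) ** 2)))

def rsep(y_pred, y_obs):
    """Relative standard error of prediction, in percent."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    return float(100.0 * np.sqrt(np.sum((y_pred - y_obs) ** 2) / np.sum(y_obs ** 2)))

# Hypothetical values: observed 3 and 4 µg/ml, predicted 3 and 5 µg/ml.
example_rmsep = rmsep([3.0, 5.0], [3.0, 4.0])  # sqrt(1/2)
example_rsep = rsep([3.0, 5.0], [3.0, 4.0])    # 100 * sqrt(1/25) = 20
```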

Determination of nitroaniline isomers in synthetic mixtures
The predictive ability of the method was determined using six three-component mixtures of nitroaniline isomers (their compositions are given in Table 4). The results obtained by applying the GA-PC-ANN algorithm to these six synthetic samples, together with the recoveries for the prediction series, are listed in Table 4. As can be seen, the recoveries were quite acceptable. The root mean square error of prediction and relative standard error of prediction results are summarized in Table 6. Plots of the predicted versus actual concentrations for the nitroaniline isomers are shown in Figure 3 (line equations and R² values are also shown). These plots show the ability of the model to predict the concentrations of the three nitroanilines.
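Recovery is conventionally expressed as the found concentration as a percentage of the added concentration; the short sketch below assumes that definition, and its numbers are hypothetical, not entries from Table 4.

```python
def recovery_percent(found, added):
    """Recovery (%) = 100 * found / added, both in the same units (e.g. µg/ml)."""
    return 100.0 * found / added

# Hypothetical example: 5.0 µg/ml added, 4.9 µg/ml found.
example_recovery = recovery_percent(found=4.9, added=5.0)
```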

Conclusion
A GA-PC-ANN calibration model was proposed for the simultaneous determination of m-nitroaniline, o-nitroaniline and p-nitroaniline. For predicting the concentrations of unknown samples, modeling with ANNs is more robust, simpler and more practical than standard methods based on calibration lines. Based on the results obtained in this work, the GA-PC-ANN method, trained with the back-propagation of errors learning algorithm, provides a powerful, effective and accurate model for predicting the concentrations of m-nitroaniline, o-nitroaniline and p-nitroaniline in mixed solutions. Non-linear effects resulting from analyte-analyte interactions in this system can be modeled by the artificial neural network. There is no need to know the exact form of the analytical function on which the model should be built, and no complex pretreatment of the samples containing the analytes is required. The technique is simple, fast and affordable.
In the RMSEP and RSEP expressions, $y_{\mathrm{pred}}$ is the predicted concentration in the sample, $y_{\mathrm{obs}}$ is the observed value of the concentration in the sample and n is the number of samples in the validation set. The RMSEP, RSEP and $R^2$ results are summarized in Table 6.

Figure 3. Plots of predicted concentration versus actual concentration for nitroaniline isomers by the GA-PC-ANN method. Concn: concentration.

Table 1. Parameters of the genetic algorithms.

Table 3. Concentration data of the different mixtures used in the calibration set for determination of nitroaniline isomers.

Table 4. Added and found results of synthetic mixtures of nitroaniline isomers by GA-PC-ANN method (µg/ml).

Table 5. GA-PC-ANN results applied on the real matrix samples (µg/ml).

Table 6. Statistical parameters of the optimized matrix using GA-PC-ANN.