Artificial neural networks in predicting energy density of Bambusa vulgaris in Brazil

In this study, the physical and chemical characteristics of Bambusa vulgaris ex J.C. Wendl. var. vulgaris (Bambusa vulgaris) aged 1, 2 and 3 years were evaluated. The objective was to train, validate and evaluate the efficiency of artificial neural networks (ANNs) as predictive tools to estimate the energy density of bamboo stems grown commercially in northeastern Brazil. To that end, samples were collected in a commercial plantation managed for energy production, and their energy properties were determined. Among all the characteristics analyzed, basic density was the one most strongly correlated with bamboo stem energy density. This variable has the great advantage of being easy to estimate, since it requires only the dry mass at 0% moisture and measurements on the saturated sample. The precision of the ANNs with basic density as the predictor of bamboo stem energy density was also verified, showing a low standard error (Syx% = 1.52) and a high coefficient of determination (R2 = 0.98). ANN-estimated values showed no statistical difference (tcal 0.58 ≤ ttab 2.08) from the energy density determined in the laboratory. Therefore, this tool was efficient and is recommended for predicting the energy density of the species under study, with basic density as the only predictive variable.


INTRODUCTION
Biomass converted into biofuels fits into the concept of sustainable development. Bamboo is an attractive source of biomass because, among other factors, it is a perennial grass with good productivity that does not need to be replanted (Guarnetti, 2013).
There are about 1439 species of bamboo worldwide, distributed into 116 genera (Bamboo Phylogeny Group, 2012). Over 4000 uses are registered for this plant (Kuehl and Yiping, 2012). Brazil has the greatest diversity of species among the Latin American countries (Grombone-Buaratini et al., 2011). Over time, bamboo has benefited several generations as a source of both work and income (Almeida, 2010). In East Asia, this plant is employed in the manufacturing of houses, agricultural tools, handicrafts and furniture (Negi and Saxena, 2011). Currently, other uses have been researched, such as activated charcoal (Liu et al., 2010) and other energy sources.
Wood energy potential should be evaluated by means of energy density, which is the product of calorific value and apparent density, expressed in kJ m⁻³. It is an important property because, in addition to encompassing density and calorific value, it reflects the chemical and physical characteristics of wood that matter for energy production in the form of heat. Moreira (2012) regarded energy density as an excellent indicator of energy potential for Bambusa vulgaris.
Nevertheless, these variables must be carefully defined, because both wood density and calorific value can be obtained under different conditions, generating different values for energy density. In general, energy density must be determined under conditions where calorific value and density are estimated at the same moisture content. The exception to this rule is basic density: in this case, the higher calorific value is used to calculate the energy density.
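The product that defines energy density can be illustrated with a minimal sketch. The function name and the numerical values below are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def energy_density(calorific_value_mj_per_kg: float, density_kg_per_m3: float) -> float:
    """Energy density (MJ/m^3) as calorific value (MJ/kg) times density (kg/m^3)."""
    if calorific_value_mj_per_kg < 0 or density_kg_per_m3 < 0:
        raise ValueError("calorific value and density must be non-negative")
    return calorific_value_mj_per_kg * density_kg_per_m3


# Hypothetical illustrative inputs (not data from this study):
higher_calorific_value = 18.0   # MJ/kg
basic_density = 600.0           # kg/m^3

print(energy_density(higher_calorific_value, basic_density))  # 10800.0 MJ/m^3
```

Pairing the higher calorific value with basic density follows the exception described above; for any other density, both quantities would need to refer to the same moisture content.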
For companies that use biomass as fuel, energy density is fundamental as an indicator of energy potential. It is therefore important to use techniques or tools that enable this potential to be estimated, among them artificial intelligence (AI) tools. The use of AI tools to estimate growth and production in forest science is a new and rarely explored subject. Nonetheless, efforts have been made in this direction, showing promising results for eucalyptus and pine species (Castro et al., 2013; Diamantopoulou, 2005; Gorgens et al., 2009; Miguel et al., 2015); yet there are no such studies for bamboos, particularly for energy density prediction.
Among the available AI techniques, artificial neural networks (ANNs) have gained prominence (Binoti et al., 2013; Miguel et al., 2015). The use of ANNs in modeling allows greater accuracy in production estimates and improved decision-making (Angel et al., 2007).
ANNs are mathematical models that make use of artificial intelligence to solve certain complex problems. These networks are formed by simple processing elements (artificial neurons), which are activated by a function known as the activation function. The neurons are linked to each other by connections, represented by coefficients or weights, which are adjusted by training algorithms. An artificial neuron is a simplified model of a real neuron, whose basic properties are connection-based information matching and reproduction; it is thus an information-processing unit within a neural network (Wang et al., 2010).

MATERIALS AND METHODS
Nine stems of Bambusa vulgaris, aged between 1 and 3 years, were collected from commercial areas in the city of Santo Amaro, BA, Brazil. The selected stems were naturally dried inside a well-ventilated shed belonging to the Forest Products Laboratory (LPF), Brazilian Forest Service. When necessary, they were dried in a forced-air circulation oven at 105 ± 3°C until constant weight. Stems were chopped and ground in a Wiley-type mill and then classified in a 0.250-mm sieve system for use in the tests. The proximate (immediate) analysis was performed based on the NBR 8112 standard (ABNT, 1986), with some adaptations: triplicate tests, a material with a particle size below 0.250 mm, use of ceramic crucibles, and for ashes 2 g sample for volatiles.
Samples were prepared for chemical analyses using the methods T 257 cm-85 (TAPPI, 2012) and T 264 om-88 (TAPPI, 1996). For evaluation of the extractive contents in ethanol (toluene), T 204 om-88 (TAPPI, 2007) was used. For ashes without extractives at 525°C, the method T 211 om-93 (TAPPI, 2002) was used. The laboratory analysis procedures LAP-003 and LAP-004, from the National Renewable Energy Laboratory, NREL (Templeton and Ehrman, 1995), were used to determine the contents of lignin. Equation 1 was used for holocellulose content (HC) determination.

ANN adjustments
The input variable for the networks was chosen on the basis of the Pearson correlation between the physical and chemical properties of the bamboo stem and its energy density. The time and cost required to estimate each of these variables were then analyzed. According to Draper and Smith (1998), this type of modeling is justified when, instead of using difficult-to-obtain variables, estimates can be attained from easily accessible variables under pre-established requisites.
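As an illustration of this screening step, the sketch below computes Pearson's correlation coefficient between candidate predictors and energy density. The helper name `pearson_r` and all numerical values are hypothetical (synthetic data, not results from Table 1).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Synthetic values for illustration only (not data from this study).
basic_density = [520.0, 560.0, 600.0, 640.0, 680.0]             # kg/m^3
ash_content = [2.9, 2.4, 2.6, 2.1, 2.3]                         # %
energy_density = [9400.0, 10100.0, 10900.0, 11500.0, 12300.0]   # MJ/m^3

for name, values in [("basic density", basic_density), ("ash content", ash_content)]:
    print(name, round(pearson_r(values, energy_density), 3))
```

A predictor with a coefficient close to +1 or -1 that is also cheap to measure would be the natural candidate for the input layer.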
For ANN fitting, the numerical variables were linearly normalized to the range 0 to 1 (Heaton, 2011). The input layer consisted of a single neuron (1), which represents the basic density of the species, the predictor of the output variable. Bamboo energy density was used as the output.
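A linear normalization to the range 0 to 1 is usually done by min-max scaling. The sketch below assumes that form; the function names and the sample values are hypothetical.

```python
def minmax_scale(values):
    """Linearly rescale values so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map values in [0, 1] back to the original scale defined by lo and hi."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical basic densities in kg/m^3 (illustration only).
densities = [500.0, 600.0, 700.0]
scaled = minmax_scale(densities)
print(scaled)                                   # [0.0, 0.5, 1.0]
print(minmax_inverse(scaled, 500.0, 700.0))     # [500.0, 600.0, 700.0]
```

The inverse mapping is needed to report network outputs in the original units of energy density.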
The networks had one hidden layer. Most networks require at least one hidden layer to solve non-linearly separable problems (Oliveira-Esquerre et al., 2002). The number of neurons in this layer was optimized by the Intelligent Problem Solver (IPS) tool of the Statistica 7.0 software (StatSoft Inc., 2005), using a sigmoidal activation function.
The sigmoid activation function is the most usual in ANN training, and it has the advantage of being differentiable. With a well-designed network layout, any continuous function can be approximated with precision (Ismailov, 2014). The sigmoid function is mathematically expressed as:
$$\varphi(u) = \frac{1}{1 + e^{-\beta u}}$$
where φ = sigmoid activation function; β = estimate of the parameter for the slope of the sigmoidal function; u = activation potential of the function.
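A minimal sketch of a sigmoid activation function with slope parameter β, assuming the standard logistic form 1 / (1 + exp(-βu)). The function names are hypothetical; the derivative is included because differentiability is what gradient-based training relies on.

```python
import math


def sigmoid(u: float, beta: float = 1.0) -> float:
    """Logistic sigmoid 1 / (1 + exp(-beta * u)), written to avoid overflow."""
    z = beta * u
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_derivative(u: float, beta: float = 1.0) -> float:
    """Derivative with respect to u: beta * s * (1 - s), where s = sigmoid(u, beta)."""
    s = sigmoid(u, beta)
    return beta * s * (1.0 - s)


print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```

Larger values of β make the transition between 0 and 1 steeper around u = 0.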
The key element of an ANN is the artificial neuron. It is responsible for processing information: it receives the values of the operating parameters as input (basic density) and returns the desired results as output (energy density). According to Wang et al. (2010), this neuron is a simplification of a biological neuron, and its basic properties consist of connection-based information matching and reproduction. Such connections may be composed of n inputs x1, x2, ..., xn (dendrites) and an output y (axon). The inputs receive weights w1, w2, ..., wn, which represent the synapses and may be negative or positive. Mathematically, this artificial neuron is represented by:
$$Y_k = \varphi(V_k)$$
where Y_k = artificial neuron output; φ = activation function; V_k = linear combiner output, in other words:
$$V_k = \sum_{i=1}^{n} w_i x_i$$
where V_k = linear combiner output; w_i = weight of the i-th input; x_i = i-th input.
Nevertheless, ANN modeling carries a potential problem of overfitting, which consists of an exaggerated learning of the information in the database provided to the network. The ANN becomes excessively trained on this information set and starts to capture noise (errors) instead of the underlying relationships. In short, an overfitted network cannot be applied reliably beyond the data used to train it, since its generalization capacity has been impaired.
Another consideration is the selection of the training algorithm, defined as a set of well-defined rules for solving a learning problem. This choice particularly affects the ability to escape local minima. A good algorithm should have a high capability for both local search and global search (scanning). In the present study, the training algorithm was resilient propagation, as proposed by Riedmiller and Braun (1993), one of the most efficient and recommended for multilayer perceptron ANNs (MLP-ANNs).
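The artificial neuron described in this section (a weighted sum of inputs passed through an activation function) can be sketched as follows. The function name, the choice of a logistic activation, and the input values are assumptions for illustration; no bias term is included because none is mentioned in the text.

```python
import math


def neuron_output(inputs, weights, beta=1.0):
    """Output of one artificial neuron: sigmoid of the weighted sum of its inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Linear combiner: sum of each input multiplied by its synaptic weight.
    v_k = sum(w * x for w, x in zip(weights, inputs))
    # Logistic activation, written to avoid overflow for large |v_k|.
    z = beta * v_k
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical inputs and weights (weights may be negative or positive).
print(neuron_output([1.0, 2.0], [0.5, -0.25]))  # weighted sum is 0.0, so output is 0.5
```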
In this type of algorithm, weight updates are based on information contained in the current data. For this, an individual update value is assigned to each weight. Initially, the weights of all networks were randomly generated (Heaton, 2011). This individual update value then evolves during the learning process, based on the error function. Training continues until the error is reduced to an acceptable level or until the maximum number of epochs (cycles) is reached (Shiblee et al., 2010).
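The idea of an individual update value per weight can be sketched with a simplified resilient propagation loop. This is an illustrative variant without weight backtracking, not necessarily the implementation used by the software in this study. The function names, the toy error function, and the constants (taken from commonly cited defaults) are assumptions.

```python
def sign(value: float) -> int:
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def rprop_minimize(gradient, weights, n_epochs=200, eta_plus=1.2, eta_minus=0.5,
                   step_init=0.1, step_min=1e-6, step_max=50.0):
    """Minimize an error function using only the sign of each partial derivative."""
    weights = list(weights)
    steps = [step_init] * len(weights)   # one update value per weight
    prev_grad = [0.0] * len(weights)
    for _ in range(n_epochs):
        grad = list(gradient(weights))
        for i in range(len(weights)):
            agreement = grad[i] * prev_grad[i]
            if agreement > 0:
                # Same sign as the previous epoch: enlarge the step.
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif agreement < 0:
                # Sign flipped (a minimum was jumped over): shrink the step
                # and skip the update for this weight in this epoch.
                steps[i] = max(steps[i] * eta_minus, step_min)
                grad[i] = 0.0
            weights[i] -= sign(grad[i]) * steps[i]
            prev_grad[i] = grad[i]
    return weights


def toy_gradient(w):
    """Gradient of the toy error (w0 - 3)^2 + (w1 + 1)^2, minimized at (3, -1)."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


fitted = rprop_minimize(toy_gradient, [0.0, 0.0])
print([round(w, 3) for w in fitted])
```

Because only the sign of the gradient is used, the size of each update depends on the history of that weight alone rather than on the magnitude of the error surface slope.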
Network learning was of the supervised type, since two sets of values were given to the network: a set of input values (basic density) and a set of output values (energy density). Training consisted of optimizing the network parameters (synaptic weights) so that the network responds to the inputs as expected and generalizes the same behavior to unseen inputs, continuing until the error between the output patterns reaches the desired minimum value (Haykin, 2001).

ANN training
One hundred multilayer perceptron ANNs (MLPs) were trained. In this type of ANN, there are at least two different layers (Serpen and Gao, 2014). There are several procedures to determine the stopping point of a training process. Care must be taken, since an excessive number of cycles can cause the network to lose generalization power (overfitting), whereas with too few cycles the network may not reach its best performance (underfitting). These problems were avoided by stopping training when the mean squared error fell below 1% or when the root mean square error (RMSE) started to increase again, as suggested by Chen et al. (2014). Training was therefore finalized when one of the criteria was reached, and the best network for estimating bamboo energy density was then selected. The ANN adjustments were made using the Statistica 7.0 software (StatSoft Inc., 2005).
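The two stopping criteria can be sketched as a simple check over per-epoch error histories. The function name and the example histories are hypothetical, and interpreting the 1% threshold as 0.01 on the normalized scale is an assumption.

```python
def stopping_epoch(mse_per_epoch, rmse_per_epoch, mse_threshold=0.01):
    """Return (epoch index, reason) for the first epoch meeting a stopping criterion."""
    if not mse_per_epoch or len(mse_per_epoch) != len(rmse_per_epoch):
        raise ValueError("error histories must be non-empty and equally long")
    for epoch, (mse, rmse) in enumerate(zip(mse_per_epoch, rmse_per_epoch)):
        if mse < mse_threshold:
            return epoch, "mse_below_threshold"
        if epoch > 0 and rmse > rmse_per_epoch[epoch - 1]:
            return epoch, "rmse_increased"
    return len(mse_per_epoch) - 1, "max_epochs"


# Hypothetical error histories (illustration only).
print(stopping_epoch([0.20, 0.08, 0.03, 0.009], [0.45, 0.28, 0.17, 0.09]))
print(stopping_epoch([0.20, 0.10, 0.09], [0.45, 0.30, 0.31]))
```

The first call stops because the error threshold is reached; the second stops because the RMSE starts to rise, which is the usual sign that further cycles would reduce generalization.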

ANN validation
Eighty-one samples were used to fit and validate the neural network for predicting bamboo energy density as a function of basic density. These samples were collected at different stem positions (base, middle and top) of several plants of different ages. Of these, 60 samples (approximately 75%) were randomly selected for fitting, and the remaining 21 (approximately 25%) were used for validation. It is noteworthy that the same drawing of lots selected the samples for both fitting and validation across the various stems, positions and ages. The 21 validation sample units were not part of the fitting database, as suggested by Zucchini (2000), who commented that validation samples must be independent. Moreover, Gujarati and Porter (2011) recommended that nearly 10 to 30% of the samples composing the database should be directed to validation of the adjusted equations.
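A simple random partition of 81 sample units into 60 for fitting and 21 for validation can be sketched as below. The function name, the seed, and the use of plain (non-stratified) shuffling are assumptions; the original draw may have been organized differently.

```python
import random


def split_samples(sample_ids, n_validation, seed=0):
    """Randomly split sample identifiers into disjoint fitting and validation sets."""
    shuffled = list(sample_ids)
    if not 0 < n_validation < len(shuffled):
        raise ValueError("n_validation must be between 1 and len(sample_ids) - 1")
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


fit_ids, val_ids = split_samples(range(81), n_validation=21)
print(len(fit_ids), len(val_ids))                # 60 21
print(round(100 * len(val_ids) / 81, 1))         # 25.9 (percent held out)
```

Keeping the two sets disjoint is what makes the validation sample independent of the fitting database.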
For ANN selection, traditional criteria were adopted to verify the goodness of fit: coefficient of determination (R2), standard error of the estimate (Syx%), and graphical analysis of residuals. For validation, the criteria consisted of the paired t-test, standard error of the estimate (Syx%), aggregate difference (Da%), and absolute mean error (Ei).
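The goodness-of-fit statistics can be sketched as follows. The exact formulas are not given in the text, so the definitions used here are assumptions: R2 as one minus the ratio of residual to total sum of squares, and Syx with n - p - 1 degrees of freedom (p = number of predictors), expressed as a percentage of the observed mean. The function name and the data are hypothetical.

```python
import math


def fit_statistics(observed, estimated, n_predictors=1):
    """Return (R2, Syx, Syx%) under the definitions assumed in the lead-in."""
    n = len(observed)
    dof = n - n_predictors - 1
    if n != len(estimated) or dof <= 0:
        raise ValueError("need equally long sequences with n > n_predictors + 1")
    mean_obs = sum(observed) / n
    ss_res = sum((y - e) ** 2 for y, e in zip(observed, estimated))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1.0 - ss_res / ss_tot
    syx = math.sqrt(ss_res / dof)
    return r2, syx, 100.0 * syx / mean_obs


# Toy values for illustration only (not data from this study).
r2, syx, syx_pct = fit_statistics([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0])
print(round(r2, 3), round(syx, 3), round(syx_pct, 2))
```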

RESULTS AND DISCUSSION
Table 1 shows the correlation values between the physical and chemical properties of bamboo, as well as their significance. Although variables such as ash content, fixed carbon, higher calorific value, and total lignin show significant correlations with energy density, only basic density was used as a predictive variable. This choice is justified by its high correlation with energy density (Table 1) and by the fact that it is more easily estimated than the other variables.
After training 100 ANNs, the one with the best performance was selected. It had a 1-3-1 architecture, that is, a network with three layers and five neurons (Figure 1). Good fits were achieved in predicting bamboo energy density, with low standard errors (absolute Syx = 159.68 and Syx% = 1.52) and a high coefficient of determination (R² = 0.98).
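A forward pass through a 1-3-1 network (one input neuron, three hidden neurons, one output neuron) can be sketched as below. The weights are hypothetical, and the bias terms and the linear output neuron are assumptions, since the text specifies only the sigmoidal activation and the layer sizes.

```python
import math


def sigmoid(u: float) -> float:
    """Logistic sigmoid, written to avoid overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    eu = math.exp(u)
    return eu / (1.0 + eu)


def forward_1_3_1(x, w_hidden, b_hidden, w_out, b_out):
    """Map one normalized input to one output through three sigmoid hidden neurons."""
    if not (len(w_hidden) == len(b_hidden) == len(w_out) == 3):
        raise ValueError("a 1-3-1 network needs three hidden neurons")
    hidden = [sigmoid(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    # Assumed linear output neuron: weighted sum of hidden activations plus bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical parameters (not the fitted network from this study).
y = forward_1_3_1(0.3, w_hidden=[2.0, -1.0, 0.5], b_hidden=[0.0, 0.5, -0.5],
                  w_out=[0.4, -0.3, 0.6], b_out=0.1)
print(round(y, 4))
```

Counting one neuron in the input layer, three in the hidden layer and one in the output layer gives the five neurons mentioned above.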
Even though all fit statistics were useful for selecting models, the graphical analysis of residuals was fundamental for choosing an equation to be applied in the forest sciences. This is because biased errors may occur within a certain range of the variable of interest without being detected by the statistics that measure accuracy. Figure 2 displays the ANN-predicted values of bamboo energy density as a function of basic density compared with the actual energy density, together with the residual distribution.
Figure 2 highlights that the ANN was able to reliably predict the energy density, with residual errors below 5%.
Together with the accuracy statistics (Syx: 1.45% and R²: 0.98), this indicates that an effective estimate of bamboo energy density was obtained using basic density as the predictor variable and ANNs as the modeling tool. ANN reliability was tested by comparing the energy density values estimated by the network with the real values obtained in the laboratory. Of the total sample units (81), approximately 25% (21) were randomly separated for this validation. The validation criteria for ANN adherence to the dataset were the paired t-test, standard error of the estimate (Syx%), aggregate difference (Da) and absolute mean error (Ei), as seen in Table 2.
When compared with the tabulated value, the t-statistic was not significant at the 95% probability level (Table 2), leading to acceptance of the null hypothesis. Therefore, using the ANN to estimate bamboo energy density, with basic density as the predictor variable, was valid and reliable. For a thorough analysis, other statistical parameters related to the behavior of the ANN on the validation sample are also gathered in Table 2.
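The paired t-test compares the calculated statistic with a tabulated critical value; when the absolute calculated value does not exceed the tabulated one (as with tcal 0.58 ≤ ttab 2.08 reported in this study), the null hypothesis of no difference is not rejected. The sketch below uses the standard paired formula; the function name and the data are hypothetical.

```python
import math
import statistics


def paired_t_statistic(observed, estimated):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d, with n - 1 degrees of freedom."""
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [y - e for y, e in zip(observed, estimated)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Toy values for illustration only (not data from this study).
t_cal = paired_t_statistic([11.0, 14.0, 18.0, 20.0], [10.0, 12.0, 15.0, 18.0])
t_tab = 3.182  # two-tailed critical value for 3 degrees of freedom at 95%
print(round(t_cal, 3), "no significant difference" if abs(t_cal) <= t_tab else "significant difference")
```

In this toy case the estimates are consistently lower than the observations, so the test flags a significant difference, the opposite of the outcome reported for the ANN.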
It is worth noting that the ANN previously selected during fitting to predict bamboo energy density showed the same behavior for the standard error of the estimate (Syx%) in validation. This result corroborates the statements of Serpen and Gao (2014), who mentioned the efficiency of ANNs in learning and generalizing data and forms. These authors claimed that ANNs can extract patterns from a given database and reapply them to others with great precision.
The aggregate difference (Da) is a statistic used as a model fit index and corresponds to the difference between the sum of the observed values and the sum of the estimated ones. This index indicates under- or overestimation and is expressed here as a percentage for easier visualization. The ANN developed to predict energy density showed a value of 0.13%, which characterizes an underestimation of this variable. However, since the value of the aggregate difference was very low, it shows the adaptability of the ANN in predicting this property of bamboo.
The mean errors (Ei) generated by the ANN were also analyzed. Values close to zero are desirable, as they show the ability of the network to estimate the variables of interest accurately. The mean error (Ei) generated by the ANN was 62.70 MJ m⁻³, which corresponds to 0.14% in percentage terms. Again, low mean errors are evidence of the potential of an ANN for learning and predicting a variable.
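The aggregate difference and the mean error can be sketched as follows. The text defines Da as the difference between the sums of observed and estimated values; expressing it relative to the observed sum, and computing Ei as the mean of absolute errors, are assumptions. Function names and data are hypothetical.

```python
def aggregate_difference_pct(observed, estimated):
    """Da% = 100 * (sum of observed - sum of estimated) / sum of observed.

    Positive values indicate underestimation; negative values, overestimation.
    """
    total_observed = sum(observed)
    if len(observed) != len(estimated) or total_observed == 0:
        raise ValueError("need equally long sequences with a non-zero observed sum")
    return 100.0 * (total_observed - sum(estimated)) / total_observed


def mean_absolute_error(observed, estimated):
    """Mean of the absolute differences between observed and estimated values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty, equally long sequences")
    return sum(abs(y - e) for y, e in zip(observed, estimated)) / len(observed)


# Toy values for illustration only (not data from this study).
obs = [100.0, 200.0, 300.0]
est = [99.0, 202.0, 297.0]
print(round(aggregate_difference_pct(obs, est), 3))  # 0.333 (slight underestimation)
print(mean_absolute_error(obs, est))                 # 2.0
```

Because positive and negative errors cancel in Da but not in the mean absolute error, the two statistics are best read together.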
Seeking greater accuracy and precision, a more detailed graphical analysis of the residuals was then carried out over the whole range of the variable of interest for the validation data, as shown in Figure 3. The behavior of the ANN was satisfactory, since the residual distribution was compact (± 5%) and homogeneous, without critical trend points. Therefore, this ANN can be considered accurate and valid for estimating bamboo energy density using only basic density as a predictor variable, a finding already supported by the t-test.
Artificial intelligence has great potential for several applications, with emphasis on engineering and agriculture. However, for its most promising application, Thakare and Singhal (2009) asserted the need for direct relationships between the input parameters and the target response, which is defined as the output variable. In these cases, ANNs are developed to achieve a performance typical of a biological system, based on connections of elements similar to biological neurons. Also, Egrioglu et al. (2014) mentioned that ANNs have advantages over conventional techniques, such as generalization, parallelism and the ability to learn, as well as exemption from certain statistical assumptions such as data normality or linearity.
Nevertheless, it is worth emphasizing that the results presented here are valid only for the studied species, because structure and physical and chemical composition vary among species. Further studies should therefore test different neural network settings to achieve a greater correlation between predictive data and responses, which would enhance the accuracy of the estimates of the variables of interest. Inserting a "species" factor, as a categorical variable, into the input layer is an interesting alternative, which may result in a single ANN capable of accurately predicting the energy density of several species within the Bambusa genus.

CONCLUSIONS
Multilayer perceptron artificial neural networks (MLP-ANNs) with a sigmoidal activation function, trained with the resilient propagation algorithm and using basic density as the predictor variable, were accurate and efficient in estimating energy density in stems of B. vulgaris. The results showed no statistical difference from the energy density obtained in the laboratory. Therefore, their use is recommended for the prediction of this variable.

Figure 1. Architecture of the ANN selected for prediction of the bamboo energy density.

Figure 2. Behavior of the ANN in estimating bamboo energy density as compared to actual values, and residual distribution graph.

Figure 3. Validation of behavior between actual and estimated data, and residual analysis of bamboo energy density prediction by means of the neural network.

Table 1. Correlation between physical and chemical variables of bamboo.

Table 2. Real and ANN-estimated values (average, minimum and maximum) for the variable bamboo energy density.