Modeling the apparent volume of bamboo culms from a Brazilian plantation

Brazil has great potential for the cultivation of bamboo species. However, research on the biometric relationships of these plants is scarce, which may limit their management and use. This article evaluates six methods for estimating the total apparent volume of culms of two species of the genus Bambusa: B. oldhamii Munro and B. vulgaris Schrad. ex J.C. Wendl. The models tested to estimate the apparent volume of the culms were: (1) form factor; (2) Husch single-entry volume model; (3) Schumacher-Hall double-entry volume model; (4) 5th-order polynomial taper function; (5) taper function using a polynomial with flexible exponents; and (6) four variants of the data mining technique (models 6-9). Model 5 provided the best fit for B. oldhamii, and model 3 was the most reliable for B. vulgaris. For the pooled data of both species, model 5 provided the best fit, and model 4 was also considered satisfactory. We conclude that model 5 is the most accurate, although models 3 and 4 also generate reliable estimates. Models 3, 4 and 5 may be used by individuals and companies that grow, sell and process the bamboo species studied here.


INTRODUCTION
The term bamboo designates the members of the taxonomic group of large woody grasses belonging to the family Poaceae, subfamily Bambusoideae. Bamboos are widely distributed, occurring in tropical, subtropical and temperate regions of the world (Scurlock et al., 2000). There are approximately 1,200 species and 90 genera of bamboo worldwide (Lobovikov et al., 2007).
Millions of people worldwide depend economically on bamboo in some way (Lobovikov et al., 2007; INBAR, 1999). China leads the trade in bamboo products, with formal exports of approximately US$ 2 billion to the rest of the world in 2010 (INBAR, 2010). However, smaller-scale trade at local and regional levels is not fully captured by statistics and is largely informal. There are no statistics on the production and trade of bamboo resources in Brazil.
On the other hand, Brazil has one of the largest reserves of native bamboo (Judziewicz et al., 1999) and a large potential for the cultivation of indigenous and introduced species. However, Brazil's national report to FAO (Food and Agriculture Organization of the United Nations), entitled FRA-2010 (FAO, 2010), notes that there is no reliable information on the area of forests with bamboo in the country, but that around 9 million hectares in the southeast Amazon are dominated by bamboos. The report also states that there is a private bamboo plantation of 30,000 ha of B. vulgaris, which supplies raw material to a paper mill in the Northeast, and mentions a rapidly growing interest in bamboo, especially for industrial use. Management and conservation of this important natural resource should be based on reliable information. Although the amount of bamboo marketed worldwide is reported as 20 Mt year⁻¹, it is not clear whether this amount refers to dry weight or (more likely) fresh weight at harvest. Studies have shown that the moisture content of the culms varies considerably with season and plant age, so culms of the same weight can have different volumes depending on environmental conditions. For this reason, bamboo resources should be quantified not only in units of weight but also in units of volume.
*Corresponding author. E-mail: alourencorodrigues@gmail.com, Tel: +55 41 33604264. Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
There is a scarcity of mathematical models in the international literature for quantifying the stocks of the main bamboo species. In Brazil, there is also a great lack of biometric data on bamboo species and a scarcity of basic models to estimate the apparent volume of culms (that is, the total volume including the hollow part) or the wood volume (that is, the volume of the woody culm walls). Although wood volume is of major importance for estimating fiber and biomass production, as it represents the effectively useful part of the plant, the apparent volume of the culms is an important variable for logistics from the field to industrial processing, as it reflects the actual dimensions of the culms.
Among the few studies in Brazil is the work of Nascimento and Della Lucia (1994), who fitted volume equations for Dendrocalamus giganteus Wall. ex Munro. More recently, Martins et al. (2013) introduced the use of taper functions to estimate the volume of culms of the genus Bambusa. This type of function is widely employed in forestry to estimate timber volume (Machado et al., 2004; Souza et al., 2008; Fonweban et al., 2010).
These models are usually fitted by ordinary least squares regression, which requires that the statistical assumptions of homoscedasticity, normality and independence of the residuals be satisfied. New approaches to modeling stem volume that do not depend on these assumptions need to be sought, since it is common to find data sets that naturally violate them. Sanquetta et al. (2013a) proposed the data mining technique to model tree biomass, with promising results.
The purpose of this study was to examine methods for modeling the apparent volume of culms of two major bamboo species grown in Brazil. In particular, it evaluated simple models based on the direct application of an average form factor, regression models for volume and taper, and the data mining technique.

MATERIALS AND METHODS
The present study was carried out at the Canguiri Experimental Station of the Federal University of Parana, located in the municipality of Pinhais, on the First Parana Plateau, in the metropolitan region of Curitiba, Parana, Brazil (Figure 1). According to the Köppen classification, the predominant climate is temperate or subtropical, humid mesothermal (Cfb), with severe winters and cool summers. Rainfall occurs in all seasons of the year (Ribeiro et al., 2008).
In February 2013, twenty-four randomly selected culms were cut in an experimental plantation established in December 2008: 12 of B. vulgaris and 12 of B. oldhamii. In May 2014, sampling was completed by measuring another 36 culms (18 per species), for a total of 60 culms, 30 of each species.
Diameters and heights were measured along the culms in sections of 0.5 m, with the measurement point slightly displaced when deformities were detected. Diameters were taken with a caliper, and culm heights (lengths) were measured with a tape. The actual apparent volumes of the culms (including the hollow) were calculated with Smalian's formula, treating each section as a small cylinder with the mean cross-sectional area of its two ends, except the last section, whose volume was calculated with the formula of a cone.
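The volume calculation described above can be sketched in Python. This is a minimal sketch, assuming diameters in cm, heights in m and volumes in m³ (hence the constant 40000), outside diameters taken as hollow-inclusive diameters, and a tip cone running from the last measured section to the top of the culm; the function names are hypothetical and this is not the authors' program.

```python
import math


def section_area(d_cm: float) -> float:
    """Cross-sectional area (m^2) of a culm section, hollow included, for a diameter in cm."""
    return math.pi * d_cm ** 2 / 40000.0


def apparent_volume(diameters_cm, heights_m, total_height_m):
    """Apparent culm volume (m^3): Smalian for each measured section, cone for the tip.

    diameters_cm[i] is the outside diameter measured at heights_m[i];
    heights are in increasing order, starting at the base of the culm.
    """
    if len(diameters_cm) != len(heights_m) or len(heights_m) < 2:
        raise ValueError("need matching lists with at least two measurements")
    volume = 0.0
    # Smalian: mean of the two end areas times the section length
    for i in range(len(heights_m) - 1):
        mean_area = (section_area(diameters_cm[i]) + section_area(diameters_cm[i + 1])) / 2.0
        volume += mean_area * (heights_m[i + 1] - heights_m[i])
    # Tip: cone whose base is the last measured section and whose apex is the culm top
    tip_length = total_height_m - heights_m[-1]
    volume += section_area(diameters_cm[-1]) * tip_length / 3.0
    return volume


# Hypothetical culm measured every 0.5 m up to 1.5 m, 2.0 m tall in total
print(apparent_volume([4.2, 4.0, 3.8, 3.5], [0.0, 0.5, 1.0, 1.5], 2.0))
```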
The models used to estimate the total apparent volume of the culms were the following: (1) form factor; (2) Husch single-entry volume model; (3) Schumacher-Hall double-entry volume model; (4) 5th-order polynomial taper function; (5) taper function using a polynomial with flexible exponents; and (6) data mining (models 6 to 9). In these models, d, h and v are as defined previously; a, b, c, d and z are coefficients to be fitted; and p1 … are the exponents of the flexible polynomial. Taper is a technical term used in forestry to refer to the profile of a tree trunk, and is defined as the rate of decrease in diameter along the trunk (Silva et al., 2011). The term also applies to bamboo culms. The estimated apparent volume of the culms is obtained by integrating the taper function (Equations 6 and 8).
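For orientation, the forms these models usually take in the forestry literature are sketched below. This is an assumption, not the authors' exact parameterization or equation numbering: the log-linear forms of models 2 and 3, the constant 40000 (d = dbh in cm, h = total height in m, v in m³), and the use of β for polynomial coefficients (to avoid a clash with the diameter d) are choices made here; d_i is the diameter at height h_i and f is the mean artificial form factor.

```latex
\begin{align*}
\text{(1) Form factor:}\quad & v = f \, \frac{\pi d^{2}}{40000} \, h \\
\text{(2) Husch:}\quad & \ln v = a + b \ln d \\
\text{(3) Schumacher-Hall:}\quad & \ln v = a + b \ln d + c \ln h \\
\text{(4) 5th-order polynomial:}\quad & \frac{d_i}{d} = \beta_0 + \sum_{j=1}^{5} \beta_j \left(\frac{h_i}{h}\right)^{j} \\
\text{(5) Flexible exponents:}\quad & \frac{d_i}{d} = \beta_0 + \sum_{j=1}^{n} \beta_j \left(\frac{h_i}{h}\right)^{p_j} \\
\text{Volume from (4) or (5):}\quad & v = \frac{\pi}{40000} \int_{h_1}^{h_2} d_i^{2} \, dh_i
\end{align*}
```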

Data mining:
$$d(p,q) = \sqrt{(X_{1p} - X_{1q})^{2} + (X_{2p} - X_{2q})^{2}}$$
Where: d(p,q) = Euclidean distance; X1 and X2 = independent variables (in this study, dbh and h, respectively); and p, q = any pair of culms in the sample, each described by its values of X1 and X2. For this study, the numbers of nearest neighbors considered were 3 and 5.
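The weighted nearest-neighbor estimator of this subsection (3 or 5 neighbors, weights 1/d or 1/d², evaluated by leave-one-out cross-validation) can be sketched as follows. This is a minimal Python sketch, assuming unscaled dbh (cm) and height (m) as the two coordinates, since no standardization is described; the function names and data are hypothetical, and which (k, power) combination corresponds to each of models 6 to 9 is not specified by this sketch.

```python
import math


def distance(p, q):
    """Euclidean distance d(p, q) between two culms described by (dbh, h)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def knn_volume(target, sample, k=3, power=1):
    """Inverse-distance-weighted k-nearest-neighbour estimate of apparent volume.

    target: (dbh, h) of the culm whose volume is unknown.
    sample: list of ((dbh, h), volume) pairs, not containing the target.
    power:  1 for 1/d weights, 2 for 1/d^2 weights.
    """
    neighbours = sorted(sample, key=lambda item: distance(target, item[0]))[:k]
    weighted_sum = weight_total = 0.0
    for dims, vol in neighbours:
        dist = distance(target, dims)
        if dist == 0.0:
            return vol  # a culm with identical dimensions: use its volume directly
        weight = 1.0 / dist ** power
        weighted_sum += weight * vol
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out(data, k=3, power=1):
    """Estimate every culm from all the others (leave-one-out cross-validation)."""
    return [knn_volume(data[i][0], data[:i] + data[i + 1:], k, power)
            for i in range(len(data))]


# Hypothetical ((dbh cm, height m), apparent volume m^3) records
culms = [((3.8, 6.5), 0.0049), ((4.2, 7.1), 0.0062), ((4.6, 7.8), 0.0079),
         ((3.5, 6.0), 0.0040), ((5.0, 8.2), 0.0096), ((4.0, 6.9), 0.0055)]
for k in (3, 5):
    for power in (1, 2):
        print(k, power, leave_one_out(culms, k, power))
```

Because dbh and height are on different scales, a variant that standardizes both variables before computing distances could behave differently; the sketch keeps the plain Euclidean distance as defined above.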
The data mining (DM) technique aims to discover useful information in a data set (Tan, 2009). It is used in learning algorithms, whose metrics are described in detail by Aha (1991) and Bradzil (2003), and is already widespread in several areas and applications; however, its potential for modeling forest resources has been little explored. The method relies on cross-validation: each instance is compared with the rest of the sample, and the instance at the smallest distance is selected, so that the estimated volume is that of the nearest instance. We used variations of the technique with 3 and 5 nearest neighbors, weighted by 1/d and 1/d² (d = distance). In these variations, the 3 (or 5) culms nearest to the culm of interest were considered, and their volumes were weighted by the inverse of the distance, so that closer culms contribute more. As in the simple nearest-neighbor method, to estimate the volume of a culm, the distance between its vector of dbh and height and that of every other culm in the sample must first be calculated; the unknown volume is then the result of this weighting.
Models were fitted for each species separately and for the pooled data of both species (genus level). Ordinary least squares regression was used for models 2, 3 and 4, and stepwise regression for model 5, in which the exponents were varied until the residual sum of squares was minimized; the significance of the coefficients of the regression models was also assessed. Statistical software was used for these fits, and a computer program was written specifically for method 6.
The estimation models were evaluated by five general goodness-of-fit criteria and by graphical analysis of residuals (Table 1).
Table 1. Goodness-of-fit criteria, including the Bayesian Information Criterion or Schwarz Criterion (Schwarz, 1978), and analysis of residuals. Source: Burnham and Anderson (2002). Where: n = number of observations; k = number of parameters of the model. In AIC, AICc and BIC, the value of k must be increased by 1 to account for one degree of freedom for the variance.
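The goodness-of-fit statistics named in this section can be sketched as below. The formulas are common least-squares forms and are an assumption here, not necessarily the exact expressions of Table 1; following the note on AIC, AICc and BIC, one extra parameter is counted for the residual variance. The statistic dW is left out because its formula is not given in this text, and the function name is hypothetical.

```python
import math


def fit_statistics(observed, predicted, k):
    """Goodness-of-fit statistics for a volume model with k fitted coefficients.

    Uses common least-squares forms; for AIC, AICc and BIC one extra
    parameter is counted for the residual variance (K = k + 1).
    """
    n = len(observed)
    big_k = k + 1
    if n - big_k - 1 <= 0:
        raise ValueError("too few observations for AICc")
    mean_obs = sum(observed) / n
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    syx = math.sqrt(ssr / (n - k))  # standard error of the estimate
    fit_term = n * math.log(ssr / n)  # n ln(SSR/n), shared by AIC and BIC
    aic = fit_term + 2 * big_k
    return {
        "R2adj": 1.0 - (ssr / (n - k)) / (sst / (n - 1)),
        "Syx": syx,
        "Syx%": 100.0 * syx / mean_obs,
        "AIC": aic,
        "AICc": aic + 2 * big_k * (big_k + 1) / (n - big_k - 1),
        "BIC": fit_term + big_k * math.log(n),
    }
```

Lower Syx%, AIC, AICc and BIC and higher R²adj indicate a better fit, which is how these indicators are read in the results below.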

RESULTS AND DISCUSSION
The bamboo culms analyzed here averaged approximately 4 cm dbh and 7 m in height at the ages considered. The culms of B. vulgaris were larger than those of B. oldhamii in diameter and height, but not in volume, because B. vulgaris had a smaller form factor than B. oldhamii (Table 2). There was no difference in the hypsometric (height-diameter) relationship between the species (Figure 2a), indicating that their degree of slenderness is almost the same, that is, the relationship between diameter growth and height growth is similar in both.
The h/d ratio (height in m, dbh in cm) was 1.65 and 1.75 for B. oldhamii and B. vulgaris, respectively. These values are higher than those observed for tree species in the study area (Sanquetta et al., 2013b). For tree species, Selle and Vuaden (2010) reported that a decrease in the degree of slenderness as height increases means that for every meter of height growth the plants gain more than one centimeter in dbh, becoming more robust and stable. It should be stressed that static stability behaves differently in trees and bamboos, since bamboos are proportionally thinner and taller. This justifies the use of bamboo for structural parts in construction, particularly in Asia.
The culm profiles of the two species, on the other hand, are very similar, indicating that their taper does not differ despite the differences in diameter and height (Figure 2b).
This is also confirmed by the values of the artificial form factor of the two species, which are numerically close, with a slightly higher value for B. oldhamii, indicating less taper than in B. vulgaris (Table 3). Therefore, in theory, the data of the two species could be pooled rather than treated separately by species.
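The artificial form factor compared here, and its use in the form factor model (model 1), can be sketched as follows. This is a minimal sketch, assuming f = v/(g·h), with g the cross-sectional area at breast height computed from dbh in cm, h in m and v in m³; all names are hypothetical.

```python
import math


def dbh_area(dbh_cm):
    """Cross-sectional area (m^2) at breast height for a dbh in cm."""
    return math.pi * dbh_cm ** 2 / 40000.0


def artificial_form_factor(volume_m3, dbh_cm, height_m):
    """f = v / (g * h): culm volume relative to a cylinder of cross-section g and height h."""
    return volume_m3 / (dbh_area(dbh_cm) * height_m)


def mean_form_factor(culms):
    """Average f over (volume, dbh, height) records, as used by the form factor model."""
    return sum(artificial_form_factor(v, d, h) for v, d, h in culms) / len(culms)


def form_factor_volume(mean_f, dbh_cm, height_m):
    """Form factor model: v = f * g * h with a single average form factor."""
    return mean_f * dbh_area(dbh_cm) * height_m
```

A single mean f implicitly assumes that the form factor does not change with culm size; if f varies systematically with size, this model will over- or underestimate volume at the extremes of the size range.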
Culm analyses are not only interesting from the point of view of plant architecture but are also relevant for estimating culm volume (Inoue et al., 2011, 2012). Because bamboo culms are usually hollow, not only the wood volume but also the apparent volume along the culm profile has to be analyzed. Few reports in the literature describe the profile of bamboo culms. Inoue and Suga (2009) studied the relationship between the surface area of the culms and their other dimensions. Inoue (2013) made a detailed analysis of the culm form of Phyllostachys pubescens Mazel ex J. Houz. in Japan, which indicated that the culm of that species consists of three or four segments with different forms. He also showed that the hypsometric relationship can be expressed as a straight line on a log-log scale. From the author's data, it can be inferred that the h/d ratio exceeds 1 across the general range of diameters and heights, which agrees with the results of this study. Similar hypsometric relations can also be inferred from the work of Yen and Lee (2011).
All coefficients of the equations fitted with models 1 to 5 were significant (p < 0.05), except for coefficient "f" of the 5th-degree polynomial model (Table 3). The values of R²adj and dW obtained with the data of B. oldhamii suggest that the models fitted well (Table 4). However, the values of Syx and Syx% show that not all models fitted well, with relative errors exceeding 20% in some cases. The lowest Syx% was obtained with model 5, which also showed the highest R²adj and dW and the lowest AIC and BIC. Models 3 and 4 approached model 5 in performance, but the distribution of residuals was more balanced with model 5 (Figure 3a, b and c). Model 5 was therefore selected as the best.
Fits for B. vulgaris performed better than those for B. oldhamii for all tested models, according to most of the goodness-of-fit statistics. This was due to the lower dispersion of the data and the higher simple linear correlation of culm diameter and height with volume (0.92 and 0.93 for d-v and h-v, respectively, in B. oldhamii, and 0.94 and 0.96 in B. vulgaris). Syx% reached maximum values of around 15% for B. vulgaris. Model 3 performed best in terms of goodness of fit, followed closely by models 4 and 5. The graphical distribution of residuals indicates that models 3 and 4 had more balanced dispersion along the estimated line (Figure 3d, e and f). In general, model 3 can be selected as the best.
For the pooled data of the two species, the fits lost accuracy, owing to the greater dispersion of the data and the lower correlation between the dependent and independent variables. Model 5 presented the best goodness of fit, followed by model 4, with very similar graphical distributions of residuals (Figure 3g, h and i). Overall, model 5 was selected because of its better performance.
All estimates obtained with model 1 were biased, even though its general indicators apparently suggest a satisfactory fit. This bias can only be detected by graphical analysis of residuals (Figure 4a). It arises because the form factor is not constant with respect to culm diameter, height or volume; on the contrary, it tends to follow a downward curve, that is, smaller plants have higher form factors than taller ones. The fit of model 2 also produced an unacceptable distribution of residuals. The residuals showed a "U"-shaped pattern (Figure 4b), suggesting that the relationship between the dependent and independent variables follows a nonlinear, quadratic trend, which was confirmed with the data of both species. Both models should be rejected for estimating the apparent volume of culms of B. oldhamii and B. vulgaris.
Models 3, 4 and 5 could be recommended for practical application because of their goodness of fit. However, in general, model 5 performed best. The taper function using the polynomial with flexible exponents of Hradetzky (1976) is powerful because it is an extremely flexible polynomial that can take on a variety of shapes. This model has often been selected as the best to describe the bole profile of different tree species in Brazil (Silva et al., 2011; Teo et al., 2013; Kohler et al., 2013). The DM models (6 to 9) did not fit well in this study, with poorer quality indicators than models 3, 4 and 5. Their residuals were also not balanced, showing a trend toward heteroscedasticity and greater dispersion. They could not be used accurately for the estimates for B. vulgaris. Their application would be recommended only if the assumptions for using linear regression were not met, which was not the case in this study. The unsatisfactory DM results were attributed to the small number of individuals in the sample: because the method uses information from the nearest neighbors, its efficiency improves with larger sample sizes.
Models for estimating the shape and volume of tree species are widespread in the literature and have great practical application for different purposes in forest science. In Brazil, such studies of bamboo are extremely limited; more research has been directed toward estimating biomass and productivity (Bonilla, 1991; Mendes, 2005; Silva, 2008; Dallagnol et al., 2013). Even internationally, modeling studies are quite limited, and there is much more research on the use of bamboo than on its management.
The most prominent studies in this field are those of Japanese researchers, who worked on different species of the genus Phyllostachys (Watanabe et al., 1980, 1989; Inoue et al., 2011, 2012). Suga et al. (2011) developed the theory and applications for estimating the volume of P. pubescens in Japan. Their volume equation was derived from the assumptions that the culm form can be expressed by Kunze's equation and that the form factors at two different heights of the culm are stable regardless of culm size; on this basis, they proposed the so-called bidirectional volume equation. Equations for the length and volume of culms of Guadua angustifolia have also been fitted. However, there are no comparative studies of different methodological approaches for estimating the volume of bamboo culms, even in Eastern countries, which have the longest tradition in the management and use of bamboo.
In this study, model 5 had the highest predictive quality. However, it is also the most complex mathematically: its coefficients and exponents cannot be estimated by ordinary linear regression, which requires the stepwise method with trials of different exponent values. Likewise, the taper function must be integrated to calculate the volume. This procedure is even more complex and requires specific software, because the resulting mathematical operation is not straightforward. Model 4 has mathematical properties similar to those of model 5; its distinguishing feature is that its coefficients can be obtained by ordinary linear regression. Integrating the 5th-order polynomial taper function is also somewhat simpler.
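To illustrate the integration step, the sketch below integrates a taper function of the general polynomial form d_i/d = Σ β_j (h_i/h)^p_j term by term: squaring the polynomial gives another sum of powers, each with a closed-form integral. It assumes dbh in cm, heights in m, volume in m³ and non-negative exponents; powers 0 to 5 give a model-4-type function, and fitted exponents give a model-5-type function. The function is hypothetical and is not the software used by the authors.

```python
import math


def taper_volume(dbh_cm, height_m, coefs, powers, h_from=0.0, h_to=None):
    """Apparent volume (m^3) between two heights by integrating a taper function.

    The taper function is taken as d_i / dbh = sum(b_j * x**p_j), x = h_i / h,
    with non-negative exponents p_j. Volume = pi/40000 * integral of d_i^2 dh_i.
    """
    if h_to is None:
        h_to = height_m
    x1, x2 = h_from / height_m, h_to / height_m
    integral = 0.0
    # (sum b_j x^p_j)^2 = sum_j sum_l b_j b_l x^(p_j + p_l), integrated term by term
    for bj, pj in zip(coefs, powers):
        for bl, pl in zip(coefs, powers):
            e = pj + pl + 1.0
            integral += bj * bl * (x2 ** e - x1 ** e) / e
    # Substituting x = h_i / h gives dh_i = h dx, hence the factor height_m
    return math.pi * dbh_cm ** 2 / 40000.0 * height_m * integral
```

Passing `h_from` and `h_to` gives partial volumes between any two heights, which is the property that distinguishes taper functions from total-volume equations.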
Models 1 and 2 are the simplest of those tested, but they did not fit the data of the present study well and, when applied, produced inconsistent estimates. Model 3 is relatively simple compared with model 5, since after obtaining the coefficients a, b and c one only needs to enter the dbh and height of the culm into the equation to obtain its apparent volume. In this case, the coefficients are easily obtained by linear regression solved by least squares. The DM models showed poor goodness of fit to the data of this study. Only four DM models were tested; other models with different distances, weights and numbers of nearest neighbors might give better results. This technique is less well known for estimating the volume of plant stems or culms and may be most useful when the assumptions of regression methods are not met.
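The simplicity of model 3 can be illustrated with a least-squares fit in its usual log-linear form, ln v = a + b ln d + c ln h, which is an assumption about the parameterization. This minimal sketch solves the 3×3 normal equations in pure Python; function names are hypothetical. A log-bias correction factor is often applied when back-transforming to volume; it is omitted here.

```python
import math


def _solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for j in range(col, 4):
                m[r][j] -= factor * m[col][j]
    x = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def fit_schumacher_hall(dbh_cm, height_m, volume_m3):
    """OLS fit of ln v = a + b ln d + c ln h; returns (a, b, c)."""
    rows = [(1.0, math.log(d), math.log(h)) for d, h in zip(dbh_cm, height_m)]
    y = [math.log(v) for v in volume_m3]
    # Normal equations (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return tuple(_solve3(xtx, xty))


def schumacher_hall_volume(coefs, dbh_cm, height_m):
    """Apparent volume from the fitted coefficients: v = e^a * d^b * h^c."""
    a, b, c = coefs
    return math.exp(a) * dbh_cm ** b * height_m ** c
```

Once (a, b, c) are known, predicting a culm's volume only requires its dbh and height, which is why this model is the simpler alternative to the taper functions.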

CONCLUSIONS
Of the nine models tested for estimating the apparent volume of culms of the two bamboo species studied, three may be considered satisfactory: the Schumacher-Hall double-entry volume model (3); the 5th-order polynomial taper function (4); and the taper function using a polynomial with flexible exponents (5), the last of which had the best predictive quality. Model 5 is more complex mathematically, which could be an obstacle to its practical use. However, with the statistical, mathematical and computational resources currently available, its application is fully viable, generating reliable estimates of the total apparent volume of culms of B. oldhamii and B. vulgaris, in addition to allowing partial volumes to be calculated for different lengths and diameters. Model 3 can be a simpler alternative, offering a more appropriate compromise between simplicity and accuracy, because it only requires three coefficients obtained by ordinary linear regression. However, it is only suitable for estimating total apparent volume. Models 3, 4 and 5 may be used by individuals and companies that grow, sell and process the bamboo species studied here.
Sanquetta et al. 3985