Critical variables for estimating productivity in maize as a function of plant population and spacing

The objective of this study was to find a group of independent variables that influence and can be used to estimate maize (Zea mays L.) productivity, modelled by multiple linear regression. A completely randomised experimental design was used in a 2 × 2 factorial scheme, with two populations (45,000 and 65,000 plants ha⁻¹) and two row spacings (0.45 and 0.90 m), with 20 replicates. Soil attributes and maize production components were evaluated. The soil attributes evaluated were bulk density, macroporosity, microporosity, total porosity, soil moisture and mechanical resistance to penetration, at depths of 0-0.15 and 0.15-0.30 m. The maize production components were plant height (PH), height of the first ear insertion (HEI), stalk diameter (SD), number of rows per ear (NRE) and number of grains per row (NGR). There was a positive correlation between the variables and production per hectare, except for grain moisture, soil moisture, macroporosity (0.15-0.30 m) and microporosity (0.00-0.15 m). The number of ears per hectare, the number of grains per row and the 100-grain weight served to estimate maize productivity. The methodology applied in this study was adequate for estimating production with an accuracy of 98% and can be applied to other experiments.

The physical attributes of the soil have been considered by some authors as indicators of the differences between areas under different management systems (Carneiro et al., 2009). According to Nascimento et al. (2014), one of the peculiarities of agriculture is the treatment adopted for the management of agricultural areas, which considers the production area in a homogeneous way, thereby disregarding the natural variability that occurs in areas of production.
The arrangement of maize plants through changes in population density, spacing between rows, plant distribution along the row, and plant variability, is one of the most important management practices for maximising the interception of solar radiation, optimising its use and strengthening grain yield (Fantin et al., 2016).
Multivariate statistical techniques evaluate variables simultaneously, identifying those with real discriminating power and giving an understanding of the relationships between the variables and the quality-class groups that they form (Gerhard et al., 2001).
Research has been carried out on the maize crop in order to identify the direction and intensity of linear relationships between variables or characteristics (Toebe and Cargnelutti Filho, 2013). Freddi et al. (2008), using principal component analysis, a multivariate technique, found that high rates of maize productivity were correlated with good growth in the aerial part of plants under conditions of lower soil density, giving high values for dry matter production of the roots, albeit of small diameter.
Many farmers seek an estimate of productivity before the harvest, as they can then use the production forecast to assess their future transportation and storage needs for the product, as well as likely profits in the marketplace. Productivity estimates are useful for comparisons in trials of hybrids/varieties, for checking production variability in any one area or between different areas, or for comparing different management practices (Rodrigues et al., 2005). The Reetz model, the Emater-MG (2000) method used by Rodrigues et al. (2005), and the method of Bernardon (2005) are presented in their original sources without a statistical or mathematical explanation of how they were derived. These methods are very practical for producers, whereas other methods, such as those used by Holzman et al. (2014), are more sophisticated and hardly accessible to small and medium producers, despite making predictions well in advance. A practical and accessible method or mathematical model is therefore necessary, with a statistical or mathematical construction that supports its derivation, so that data from future experiments can better represent reality and assist the producer in planning.
The aim of this study was to find a group of independent variables that would influence and estimate productivity in maize (Z. mays L.), modelled by multiple linear regression.

MATERIALS AND METHODS
The study was carried out in an experimental area belonging to the Department of Agricultural Engineering of the Federal University of Ceará in Fortaleza in the State of Ceará, Brazil, in a Yellow Red Argisol (EMBRAPA, 2013), located at 03°43'S and 38°32'W at an altitude of 19 m.
According to the Köppen classification, the climate of the region is type Aw, tropical rainy with precipitation in summer-autumn, an annual average temperature of 28°C and annual precipitation of 900 mm.
The experimental area is in the initial stages of setting up a no-tillage system. In November 2014, the forages crotalaria, sorghum and Mombasa grass were planted to form straw for sowing maize in March 2015.
During cultivation of the maize, supplementary irrigation was applied using a conventional sprinkler system. ET0 was estimated using a class A pan installed on grass with a border of 100 m; the pan coefficients were obtained with the Doorenbos and Pruitt (1977) model. To calculate ETc, the Kc for the different phenological stages of the crop was used, which varied between 0.2 and 1.6, as per Guerra et al. (2004).
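The irrigation calculation described above can be sketched as follows. This is a minimal illustration that assumes the usual pan relationships, ET0 = Kp × Epan and ETc = Kc × ET0; the function names and the numerical inputs are hypothetical and are not taken from the experiment.

```python
def reference_evapotranspiration(pan_evaporation_mm, pan_coefficient):
    """ET0 (mm/day) from class A pan evaporation and the pan coefficient Kp."""
    return pan_coefficient * pan_evaporation_mm


def crop_evapotranspiration(et0_mm, crop_coefficient):
    """ETc (mm/day) from ET0 and the crop coefficient Kc of the current stage."""
    return crop_coefficient * et0_mm


# Hypothetical daily values, for illustration only.
pan_evaporation = 7.0   # mm/day read from the class A pan
kp = 0.75               # pan coefficient (assumed value)
kc = 1.2                # crop coefficient, inside the 0.2 to 1.6 range cited in the text

et0 = reference_evapotranspiration(pan_evaporation, kp)
etc = crop_evapotranspiration(et0, kc)
print(f"ET0 = {et0:.2f} mm/day, ETc = {etc:.2f} mm/day")
```

In practice Kp would come from the Doorenbos and Pruitt (1977) model and Kc from the stage-specific values of Guerra et al. (2004), neither of which is reproduced here.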
The maize seed used in the experiment was of the Al Avaré cultivar, considered a low- to high-technology cultivar, with 98% purity and 85% survival. One target population was 65,000 plants per hectare, with a spacing between rows of 0.90 m at a sowing density of six seeds per metre; the other was 45,000 plants per hectare, with a spacing of 0.45 m between rows at a sowing density of three seeds per metre.
During cultivation of the maize crop, base and cover fertilisation were carried out based on chemical analysis of the soil; for base fertilisation, 250 kg ha⁻¹ of the commercial formulation NPK 10-28-20 was used. Cover fertilisation was carried out during the V2, V4 and V8 stages of the maize, using 300 kg ha⁻¹ urea and 120 kg ha⁻¹ potassium chloride. To control fall armyworm (Spodoptera frugiperda), four applications of a Lufenuron product at a dose of 18 g ha⁻¹ of active ingredient, and of Lannat BR phosphorous insecticide (active ingredient: methomyl), were given at the V4, V8, V12 and R1 stages. Sowing was manual, following the order of treatments.
The soil attributes and maize production components were evaluated. The soil attributes evaluated were bulk density, macroporosity, microporosity, total porosity, soil moisture and mechanical resistance to penetration, at depths of 0-0.15 and 0.15-0.30 m. The maize production components were plant height (PH), height of the first ear insertion (HEI), stalk diameter (SD), number of rows per ear (NRE) and number of grains per row (NGR), with the data collected from 10 plants in the working area to determine the mean value of each treatment for these variables. The values for the number of ears per hectare (NEH), production per hectare in kg (PPH) and maize dry matter (DM) were collected in the working area, with the values for grain moisture (GM), the emergence speed index (ESI) and 100-grain weight (HGW) being estimated per hectare.
For PH, ten plants were selected from the working area. The HEI was determined by measuring from the ground surface to the first ear insertion using a tape measure. The SD was obtained with a digital calliper, averaging the largest and smallest diameters measured at the internode located above the first node of adventitious roots. The NRE was obtained by counting the number of rows in ten ears from each plot. To determine the NGR, the number of grains per row was likewise counted in ten ears from each plot, to obtain the average for each treatment. The ESI was calculated from daily counts according to the methodology proposed by Maguire (1962), after applying the emergence test. The HGW was obtained according to the Seed Analysis Rules (BRASIL, 2009).
DM was determined by cutting the maize plants 2 cm above the ground surface in the working area; all the plants were then weighed and the weight of the grains after threshing was subtracted, giving the weight of green straw in grams. To determine PPH, all ears in the working area were collected and then threshed.
Initially, using statistical planning, the minimum number of samples for a normal distribution of the data was calculated. The statistical methodology adopted allows the number of samples necessary for data normality in the experiment to be verified from the standard error of the mean. With the standard error of the mean available, a β error of 10% was considered, and the number of samples to be used in the evaluations was found from a graph of operating characteristic curves (Montgomery, 2004). The minimum number found was 10 samples for each treatment; however, with the idea of improving data normality, a standard number of 20 samples was adopted for each replication.
The soil attribute data were submitted to analysis of variance, and data that presented no significant difference between treatments were eliminated; differences between mean values were compared by Tukey's test at 5% significance. With the remaining variables, a correlation matrix was prepared, and any variables that had no significant correlation with production per hectare were eliminated.
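The correlation screening step can be illustrated with a short sketch. It assumes a two-sided test of the Pearson coefficient based on the t distribution, in which a correlation is significant when |r| exceeds t / sqrt(t² + n − 2); the variable names and data are hypothetical, and the critical t value must be supplied from a table for the chosen significance level.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(t_critical, n):
    """Smallest |r| that is significant for n observations, given the critical t."""
    return t_critical / math.sqrt(t_critical ** 2 + (n - 2))


def keep_correlated(candidates, response, t_critical):
    """Return {name: r} for candidates significantly correlated with the response."""
    threshold = critical_r(t_critical, len(response))
    kept = {}
    for name, values in candidates.items():
        r = pearson_r(values, response)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Hypothetical data with six observations; 2.776 is the two-sided 5% critical t for 4 df.
production = [5.0, 6.1, 7.2, 7.9, 9.1, 10.0]
candidates = {
    "ears_per_ha": [40, 45, 52, 55, 61, 66],
    "unrelated_attribute": [3, 1, 4, 1, 5, 2],
}
kept = keep_correlated(candidates, production, t_critical=2.776)
print(sorted(kept))
```

With the 80 observations of this experiment and a critical t of approximately 1.99, the same rule gives a threshold of roughly |r| = 0.22, which is why even weak correlations can be retained at this sample size.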
Multiple linear regression modelling was then performed, evaluating each assumption, making the necessary variable transformations and verifying the impact of each variable on production using the stepwise method. The last step was model validation using data from experiments carried out in the same area by Santos et al. (2017) and Nicolau (2016), to check the estimating power. All the analyses were carried out using the SPSS Statistics v.22 software (IBM).
The first assumption, the absence of serial autocorrelation between residuals, was checked using the Durbin-Watson statistic (d = 1.887). The VIF was used to diagnose multicollinearity, and all the variables were at an acceptable level (1.011 ≤ VIF ≤ 1.412). Residual normality was checked by the Kolmogorov-Smirnov and Shapiro-Wilk tests (Table 2), neither of which rejected the null hypothesis. Homoscedasticity was also checked (Figure 1) as per Mâroco and Pinheiro (2014). Linearity of the coefficients is guaranteed by the adopted model, in this case the least-squares method (Corrar et al., 2012). The sample size was within the desired range, with 20 replications (observations) for each treatment, for a total of 80 observations (Hair, 2009).
Multiple linear regression identified the variables Log (ears per hectare), Log (number of grains per row), Log (100-grain weight) and Log (number of rows) as significant predictors of Log (production per hectare), as shown in Table 3.
This model is highly significant and explains a high proportion of the variation in Log (production per hectare) (Table 4). In this experiment, plant height was not included in the model; the same result was found by Mourtzinis et al. (2013), where plant height was not included as a significant predictor. Kappes et al. (2017) found no correlation between plant height and grain productivity.
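For readers unfamiliar with the two diagnostics reported above, the following sketch shows how each is defined. The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals, and the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the others. The residuals used here are hypothetical; the analyses in the study were run in SPSS, not with this code.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def vif_from_r2(r_squared):
    """Variance inflation factor from the R² of a predictor on the other predictors."""
    return 1.0 / (1.0 - r_squared)


def r2_from_vif(vif):
    """Inverse of vif_from_r2: the shared variance implied by a reported VIF."""
    return 1.0 - 1.0 / vif


# Hypothetical residuals, ordered as the observations were collected.
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
print(f"d = {durbin_watson(example_residuals):.3f}")

# Shared variance implied by the VIF bounds reported in the text.
for reported_vif in (1.011, 1.412):
    print(f"VIF {reported_vif} -> R² = {r2_from_vif(reported_vif):.3f}")
```

Read this way, the largest reported VIF (1.412) corresponds to a predictor sharing about 29% of its variance with the remaining predictors.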
The next step was to verify by means of the model, the closeness of the productivity estimate to the actual productivity.Data from the experiment carried out by Santos et al. (2017) were used for this, as shown in Table 5.
Nicolau (2016) also carried out research with maize in the same experimental area as the present work, as described in Table 6. The Shapiro-Wilk test was used for normality, as there were fewer than 30 samples (Mâroco and Pinheiro, 2014).

DISCUSSION
In their research, Harrell et al. (1996) and Babyak (2004) found that including variables that have no influence on the dependent variable still causes problems of multicollinearity with other independent variables, which can lead to overfitting, and should therefore be avoided. Including these variables in the analysis consequently makes no sense, as previously shown by Siqueira et al. (2008), Oliveira Júnior et al. (2010) and Nascimento et al. (2014).
Based on Pearson correlation analysis (Table 1), the variables that had no significant correlation with production per hectare were eliminated, including grain moisture, macroporosity (0.15-0.30 m), microporosity (0.00-0.15 m) and stem diameter. Regarding the last of these, Mourtzinis et al. (2013) found similar results in their linear regression model for grain yield, in which stem diameter was not included as a significant predictor. Kappes et al. (2017) found the same result and attributed it to the crop being well supplied with nutrients from the soil and applied fertilisers, and therefore depending little on the translocation of nutrients from stem to grains; stem diameter is considered an important characteristic because the stem is an organ used to store photoassimilates that contribute to grain filling. Grain moisture, soil moisture at both depths, macroporosity (0.15-0.30 m) and microporosity (0.00-0.15 m) correlated negatively with production per hectare; the other variables had a positive correlation. Plants per hectare, number of grains per row and number of ears per hectare correlated with production per hectare. Although these variables correlated with production, the correlation was weak; this may be because the data come from agricultural experiments, with little experimental control over such factors as rainfall and the physical, chemical and biological conditions of the soil, all of which vary over time and with the weather, according to Pimentel-Gomes (2009).
To carry out the multiple linear regression, a preliminary regression was made, in which the assumptions were checked and some were seen to be violated; each variable was therefore transformed by the base-ten logarithmic function, as suggested by Hair (2009) for problems with homoscedasticity. Multiple linear regression employing stepwise selection of variables was used to obtain a parsimonious model that would predict the production per hectare as a function of the independent variables (ears per hectare, number of grains per row, 100-grain weight and number of rows). The assumptions of the model were analysed, namely the absence of serial autocorrelation between residuals, the absence of multicollinearity between independent variables, residual normality, homoscedasticity of residuals and linearity of the coefficients.
The final model found was Log (production per hectare) = -4.866 + 1.009 Log (ears per hectare) + 0.979 Log (number of grains per row) + 0.943 Log (100-grain weight) + 0.940 Log (number of rows). The beta coefficients (Table 7) show the order of importance of the predictors for production: Log (ears per hectare), Log (number of grains per row), Log (100-grain weight) and Log (number of rows). This implies that the number of ears per hectare has a very strong impact on production. The increase in grain productivity with increasing population can be explained by the adjustment in plant development as a function of population density. At low densities, individual plant production is generally high but productivity per area is small, as verified by Vian et al. (2016), who found that the component that best correlated with productivity in a maize crop in an irrigated area with adequate spatial plant uniformity was the number of ears per area.
These variables explain around 0.98 of the variability in production, in agreement with Vian et al. (2016), who reported that in the 2011/2012 crop the production components explained 0.90 of the variation in grain productivity (coefficient of determination). The number of ears per area, 100-grain weight, number of grains per ear and number of grains per row had a direct effect on productivity, with correlations classified as high (0.65 and 0.54) for the first two variables and low (0.26 and 0.23) for the last two variables, respectively. For the other variables that were not included, such as the physical attributes of the soil and the remaining agronomical components, it is possible that in this experiment the other variables had a greater impact on the final regression model, as shown by Mourtzinis et al. (2013).
Applying the properties of logarithms to the multiple regression gives the following expression:
$$\log(\mathrm{PPH}) = -4.866 + 1.009\,\log(\mathrm{NEH}) + 0.979\,\log(\mathrm{NGR}) + 0.943\,\log(\mathrm{HGW}) + 0.940\,\log(\mathrm{NRE})$$
Therefore,
$$\mathrm{PPH} = 13.6 \times 10^{-6} \times \mathrm{NEH}^{1.009} \times \mathrm{NGR}^{0.979} \times \mathrm{HGW}^{0.943} \times \mathrm{NRE}^{0.940}$$
There are various methods for estimating productivity, among them the Reetz (1987) and Emater-MG (2000) methods, which consider some of the variables found in this model. Bernardon (2005) employs a model that uses some of these same variables, but does not explain how the model was derived; for these three cases, there is no mathematical argument showing the construction of the models.
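As a worked illustration, the fitted expression can be evaluated in both its logarithmic and its power form. The coefficients below are those reported in the text; the input values are hypothetical, and the units assumed for HGW (grams per 100 grains) are an assumption of this sketch rather than something stated in the source.

```python
import math

# Coefficients reported in the text for the log10-log10 model.
INTERCEPT = -4.866
EXPONENTS = {"neh": 1.009, "ngr": 0.979, "hgw": 0.943, "nre": 0.940}


def predict_log_pph(neh, ngr, hgw, nre):
    """Log10 of production per hectare from the four production components."""
    values = {"neh": neh, "ngr": ngr, "hgw": hgw, "nre": nre}
    return INTERCEPT + sum(
        EXPONENTS[name] * math.log10(values[name]) for name in EXPONENTS
    )


def predict_pph(neh, ngr, hgw, nre):
    """Production per hectare (kg), obtained by back-transforming the log model."""
    return 10 ** predict_log_pph(neh, ngr, hgw, nre)


def predict_pph_power_form(neh, ngr, hgw, nre):
    """Same prediction written as a product of powers."""
    constant = 10 ** INTERCEPT  # about 13.6e-6, as in the text
    return (
        constant
        * neh ** EXPONENTS["neh"]
        * ngr ** EXPONENTS["ngr"]
        * hgw ** EXPONENTS["hgw"]
        * nre ** EXPONENTS["nre"]
    )


# Hypothetical plot: 60,000 ears/ha, 35 grains per row, 30 g per 100 grains, 14 rows per ear.
estimate = predict_pph(neh=60000, ngr=35, hgw=30, nre=14)
print(f"Estimated PPH: {estimate:.0f} kg/ha")
```

Both functions return the same value up to floating-point rounding, which is a quick way to confirm that the power form is a faithful rewriting of the logarithmic one.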
The multiple regression should be tested for each region, adapting the constants to account for changes in the mean values of these variables due to genetic or environmental factors, as verified by Vázquez et al. (2012), Menezes et al. (2015) and DuoBu et al. (2013). For the producer, these methods for estimating productivity are more practical and economical than the methods used by Holzman et al. (2014) and Li et al. (2014). The model can be calibrated by solving a linear system for the unknowns A, B, C, D and E using the equation:
$$\log(\mathrm{PPH}) = A + B\,\log(\mathrm{NEH}) + C\,\log(\mathrm{NGR}) + D\,\log(\mathrm{HGW}) + E\,\log(\mathrm{NRE}) \qquad (2)$$
The values for Log (PPH), Log (NEH), Log (NGR), Log (HGW) and Log (NRE) can be obtained from earlier data or from previously constructed pilot projects, so that a linear system can be set up for the unknowns A, B, C, D and E.
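The calibration idea can be sketched as follows, assuming the model has the form Log (PPH) = A + B Log (NEH) + C Log (NGR) + D Log (HGW) + E Log (NRE) and that exactly five observations are available, so the system is square. The solver is plain Gaussian elimination with partial pivoting; the five plots are hypothetical and their PPH values are generated from the coefficients reported in the text, so the calibration should simply recover those coefficients.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("observations do not determine the coefficients")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def calibrate(observations):
    """Return [A, B, C, D, E] from five (pph, neh, ngr, hgw, nre) observations."""
    matrix = [
        [1.0, math.log10(neh), math.log10(ngr), math.log10(hgw), math.log10(nre)]
        for _, neh, ngr, hgw, nre in observations
    ]
    rhs = [math.log10(pph) for pph, *_ in observations]
    return solve_linear_system(matrix, rhs)


# Hypothetical pilot plots (neh, ngr, hgw, nre); PPH is synthesised from the reported model.
reported = [-4.866, 1.009, 0.979, 0.943, 0.940]
plots = [
    (60000, 35, 30, 14),
    (45000, 38, 32, 16),
    (65000, 30, 28, 12),
    (50000, 33, 34, 14),
    (55000, 36, 26, 18),
]
observations = []
for neh, ngr, hgw, nre in plots:
    log_pph = reported[0] + sum(
        c * math.log10(v) for c, v in zip(reported[1:], (neh, ngr, hgw, nre))
    )
    observations.append((10 ** log_pph, neh, ngr, hgw, nre))

coefficients = calibrate(observations)
print([round(c, 3) for c in coefficients])
```

With real field data there would usually be more than five observations and some measurement noise, in which case a least-squares fit would be a more robust choice than an exactly determined system.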
Both the estimated and actual data achieved normality (Tables 5 and 6); the paired t-test was then applied, as it was the most appropriate in this case (Mâroco and Pinheiro, 2014).
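The paired comparison can be illustrated with the standard paired t statistic, t = mean(d) / (sd(d) / sqrt(n)), where d is the difference between estimated and observed productivity for each pair. The data below are hypothetical and are not taken from Santos et al. (2017) or Nicolau (2016); the critical value must be read from a t table for n − 1 degrees of freedom.

```python
import math


def paired_t_statistic(estimated, observed):
    """Paired t statistic for the mean difference between two matched samples."""
    differences = [e - o for e, o in zip(estimated, observed)]
    n = len(differences)
    mean_diff = sum(differences) / n
    variance = sum((d - mean_diff) ** 2 for d in differences) / (n - 1)
    return mean_diff / math.sqrt(variance / n)


# Hypothetical estimated and observed productivity (kg/ha) for five plots.
estimated_pph = [8650, 7900, 9100, 8300, 8800]
observed_pph = [8600, 8000, 9050, 8350, 8700]

t_value = paired_t_statistic(estimated_pph, observed_pph)
T_CRITICAL_4_DF = 2.776  # two-sided 5% critical value for 4 degrees of freedom
differs = abs(t_value) > T_CRITICAL_4_DF
print(f"t = {t_value:.3f}, significant difference at 5%: {differs}")
```

A statistic inside the critical bounds, as in this example, means the estimated and observed means cannot be distinguished at the 5% level.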
It was found that the expression was good at estimating productivity, irrespective of the type of data collection or management, although a significant difference in mean values was seen for pair 14, which can be avoided by calibrating the model (Table 7).
PPH = Production per hectare in kg; NEH = Number of ears per hectare; NGR = Number of grains per row per ear; HGW = 100-grain weight; NRE = Number of rows per ear.

Table 2. Test of residual normality.

Table 3. Coefficients for the dependent variable, Log (production per hectare in kg)(a).

Table 4. Model summary for the dependent variable, yield per hectare(e).

Table 5. Test of normality of the productivity estimate*.
BS1: Brachiaria intercropped with the maize, sown on the same day as the maize; BS2: Brachiaria intercropped with the maize, sown at stage V4 in the maize; MS1: Mombasa grass intercropped with the maize, sown on the same day as the maize; MS2: Mombasa grass intercropped with the maize, sown at stage V4 in the maize; CS1: Crotalaria intercropped with the maize, sown on the same day as the maize; CS2: Crotalaria intercropped with the maize, sown at stage V4 in the maize; T: Monocropped maize, control. *Source: Santos et al. (2017).

Table 6. Test of normality of the productivity estimate**.

Table 7. Mean values for comparing productivity between pairs.
SV: Source of variation; BS1: Brachiaria intercropped with the maize, sown on the same day as the maize; BS2: Brachiaria intercropped with the maize, sown at stage V4 in the maize; MS1: Mombasa grass intercropped with the maize, sown on the same day as the maize; MS2: Mombasa grass intercropped with the maize, sown at stage V4 in the maize; CS1: Crotalaria intercropped with the maize, sown on the same day as the maize; CS2: Crotalaria intercropped with the maize, sown at stage V4 in the maize; T: Monocropped maize, control; M1: disk mechanism; M2: shaft mechanism; C1: Cover of Crotalaria; C2: Cover of Mombasa grass; C3: Cover of sorghum; T1: corn on bare soil with disk; T2: corn on bare soil with shaft; t: Paired t-test at 5%.