Comparison of spatial interpolators for variability analysis of soil chemical properties in Cuamba (Mozambique)

The knowledge of spatial distribution of soil attributes, particularly chemical ones, which is very relevant for agricultural planning. Several studies have focused on spatial interpolation of soil properties, but only a few of them have been undertaken in sub-Saharan Africa. This study aims to analyse the spatial variability of hydrogen potential (pH) and electric conductivity (EC) within an agricultural region in Cuamba district of Mozambique. Efficiency of a deterministic and a stochastic interpolator were compared, namely Inverse Distance Weighting (IDW) and Ordinary kriging, respectively. Soil samples were collected at random locations scattered through the study region, and were later analyzed in water and soil laboratory. These point data were then used to produce interpolated surfaces of soil chemical properties. Efficiency of spatial interpolation methods was assessed based on prediction errors’ statistics derived from cross-validation. Results show that ordinary kriging was less biased and more accurate than IDW at samples’ locations. Hence, maps produced using the former method are a valuable contribution for the spatial characterization of soil quality, according to its chemical properties. Considering the spatial patterns of pH, southeast area is characterized by clayey soils, which has a high fertility potential for food crops.


INTRODUCTION
Obtaining a correct spatial distribution, various soil attributes is very relevant in agricultural planning (Kravchenko, 2003;Mueller et al., 2001), particularly for planting and maintenance of crops. Spatial distribution of soil attributes may affect various elements such as hydrologic responses and yield potential. Moreover, if there is detailed knowledge about spatial variability of different attributes, local application of nutrients and fertilizers, or even lime, can be optimized and more effective, thereby improving production system (Behera  Souza et al., 2010).
High cost and amount of time required for soil samples collection has motivated researchers to develop methods for creating soil maps from sparse soil data (Bishop and McBratney, 2001). Geostatistics techniques have been increasingly applied in the assessment of spatial variability of parameters of interest for agrarian sciences, thus allowing for mapping, quantification and modeling of continuous phenomena through interpolation of sampled points in space (Goovaerts, 1999). However, challenges still exist in determining the best method of interpolation for that purpose (de Moura Guerreiro et al., 2017;Piccini et al., 2014;Reza et al., 2010).
There are only a few studies characterizing the spatial variability of soil characteristics in Mozambique, particularly in northern provinces which are among the most fertile country areas (e.g., Nankani et al., 2006). Some of available studies are reported by Maria and Yost (2006) and Maria (2004) regarding the study of soil nutrient status, spatial variability of soil chemical properties and fertilization requirements in Cabo Delgado, Nampula and Manica provinces of Mozambique, and a survey of soil fertility status of four agroecological zones of Mozambique, respectively. Moreover, to best of our knowledge, little or no documentation exists on the comparison of spatial interpolators for soil properties in Mozambique.
The main objective of this study was to characterize the spatial variability of hydrogen potential (pH) and electric conductivity (EC) in Administrative Post of Cuamba in Mozambique. In order to obtain suitable interpolated surfaces of these soil properties, we assessed the efficiency of Inverse Distance Weighting (IDW) and Ordinary Kriging (OK). Spatial patterns of pH and electric conductivity may provide valuable information for assessing soil condition for plant growth.

Study region
Republic of Mozambique lies on the eastern coast of Southern Africa. Country region was grouped into ten agroecological zones and four zonal research centers (Walker et al., 2006), but it can be broadly divided into three geographical regions (Batidzirai et al., 2006;Nankani et al., 2006): North (Niassa, Cabo Delgado, and Nampula), Center (Zambézia, Tete, Manica, and Sofala) and South (Inhambane, Gaza, and Maputo province). Agroecological zones differ from each other in terms of physiographic, climatic and soil characteristics. The northern region covers approximately 50% of the total land area, corresponding to agroecological zones R7, R8, R9, and R10, and supports about 33% of the country's population.
This region is characterized by high population growth and land use pressure. Low natural forest productivity associated with wide unproductive land abandoned by farmers have encouraged Mozambique's government to promote exotic forest plantation for companies, that plant large-scale monoculture (Mbanze et al., 2013). This study was carried out at Administrative Post of Cuamba ( Figure  1), which is situated in District of Cuamba, south of Niassa Province, and classified as agroecological zone R7. Administrative Post of Cuamba shares borders with Administrative Post of Lúrio in the east, Administrative Post of Etarara in the southeast, and with Administrative Posts of Mitande and Metarica in the north. The study region has an area of 216 215 ha, where altitudes vary from 200 to 500 m above sea level. It has a wavy relief that is sometimes interrupted by rocky formations. This region is physiographically characterized by a low plateau area that gradually passes to a more dissected relief with steeper slopes, starting from sub-planaltic zone of transition to lateral zone.
In Cuamba region, as in most of northern country part, arid and dry weather and sub-humid dry, where average annual rainfall can vary from 800 to 1500 mm. As for mean temperature during plants growing period, there are some areas that exceed 25°C, although this may vary annually between 20 and 25°C on average (Marques et al., 2001).
Physiography of region is dominated by alternation of inter-rivers valleys that may alternate with dambos (complex shallow wetlands) due to their width, depth and position in relation to rivers. River's valleys are dominated by alluvial dark soils that are characterized by a dense to moderate texture, which are moderately or poorly drained and subject to regular flooding (MAE, 2005). Hydrogeomorphic soils of varying texture can be found in dambos. On slopes of intermediate interfluves, soils are moderately well drained, varying in color from brown to yellowish, and have a clayey texture (Marques et al., 2001). In Cuamba district, total cultivated area is approximately 49 000 ha and number of agricultural explorations is almost 31 000 (INE, 2012), thus average land size under cultivation is about 1.58 ha per farmer. According to Lukanu et al. (2004), besides cropping, farmers also obtain their food and income through livestock such as goats, chickens, ducks, pigs, sheep and doves.

Data
In order to know the geographic location where soil samples should be collected, 40 random points were generated within Administrative Post of Cuamba ( Figure 2). During the collection of soil samples, a GPS (Global Positioning System) locator was used to determine the direction and data of geographic location, of each point that was previously defined. However, it was impossible to obtain samples at all points due to inaccessibility of certain locations.   Hence, only 33 soil samples were collected. Soil samples were collected using a hand probe of 18 cm. Values of pH and EC, were obtained through laboratory analyses of those soil samples using a pHmeter and a Conductometer, respectively.

Spatial interpolators
A flow chart summary of all procedures undertaken is shown in Figure 3. Data of pH and EC were organized and processed in order to evaluate interpolation efficiency of two methods. Values of IDW and OK were obtained using the geostatistical analyst extension of ArcGIS © 10.2. Before implementing spatial interpolation, an exploratory data analysis was undertaken aiming to uncover underlying patterns of soil attributes, as well as to detect possible outliers and anomalies, that could influence spatial analysis efficiency.
IDW is a deterministic spatial interpolator that accounts for distance relationships only. A predicted value of the attribute is computed as a weighted sum of data points located within a given search neighborhood, centered at the location of the point being predicted. Weight values are computed based on the inverse of distance (raised to some power) between data points and location being predicted.
Smoothness of the predicted surface increases as the power parameter value increases. Hence, we considered a power of two, which is the most common value of this parameter. In this case, the method is sometimes named Inverse Square Distance (ISD). Kriging methods are a family of stochastic spatial interpolators used to predict values of a random field at an unobserved location, using a set of samples from neighboring locations. Kriging estimators vary depending on the adopted model for trend, given by the expression ( ) in the general linear regression estimator ( ) of Z attribute, defined as (Goovaerts, 1999): (1) where is the predicted location and are samples' locations; ( ) is the weight assigned to datum ( ); ( ) and ( ) are expected values of the random variables ( ) and ( ); ( ) is the number of samples closest to location being predicted. OK accounts for local variations of mean ( ) by limiting its domain of stationarity to a local neighborhood. Unlike deterministic interpolators, kriging assigns optimal weights to each sample point considering not only the distance between the samples and the unknown point, but also the spatial autocorrelation of the attribute. Kriging weights are obtained by solving the kriging system. When developing the system's equations, the semivariogram model is assumed known. The semivariogram evaluates the dispersion of values as a function of distance and, under stationarity hypotheses, it is the inverse function of spatial covariances.
Considering Tobler's first law of geography, nearby points in space tend to have values more similar than points farther away, thus it is expected that the variability (dispersion) of values increases with increasing distance between points' locations. While spatial autocorrelation decreases with increasing distance between  points, semivariogram increases with increasing distance between points. The semivariogram model used in the kriging process must guarantee that kriging equations are solvable (non-singular). Therefore, a mathematical semivariogram model is usually selected from a set of authorized ones and it is fitted to experimental semivariogram values calculated from data for given angular and distance classes. Most common models are spherical, exponential and gaussian (da Silva Gualberto et al., 2016;de Moura Guerreiro et al., 2017;Hooshm et al., 2011;Robinson and Metternicht, 2006). A nugget effect model, which is a constant structure (C0), can also be added to create a linear model of regionalization. The nugget effect model, represents a discontinuity at origin due to small-scale variation. On its own it would represent a purely random variable, with no spatial correlation. Data posting maps did not show evidence of anisotropy in attributes, thus omnidirectional experimental semivariograms were modeled. This means that spatial autocorrelation of both attributes was assumed identical in all directions (i.e isotropic). Theoretical semivariograms adjusted for pH ( Figure 4) and EC ( Figure 5) were spherical models: where ̂( ) is the estimated semivariance for h distance between observations; a is the range parameter that takes values 7 and 9 km for pH and EC, respectively; C1 is the sill parameter that takes values 0.390 and 2900 for pH and EC, respectively. The spherical model shows a progressive decrease of spatial dependence of the attribute until a distance (a) at which there is no spatial dependence. In both cases, we chose not to include a nugget effect model (C0), because there was no reason to believe that there is a high variability of attributes over short distances.

Prediction errors' statistics
IDW and OK are local interpolators, which means that these This strategy guaranteed that the circle's radius was small in areas where there was a high density of sample points, but larger where there was a lower density of points. Different combinations of minimum and maximum number of samples were evaluated considering prediction errors' statistics obtained through crossvalidation (Falivene et al., 2010), namely the Mean Error (ME) and the Root Mean Square Error (RMSE): (3) where z*(xα) and z(xα) are predicted and measured values at samples' locations , respectively.
To reduce data clusters' influence, the circle was also divided into equal angle sectors and different combinations of minimum and maximum number of samples were then evaluated. Best modelling parameters of local neighborhoods (Table 1) were those that provided a ME value closest to zero and a smallest RMSE value.
Those prediction errors' statistics were also used to compare the performance of the interpolation methods considered (e.g Hooshm et al., 2011;Robinson and Metternicht, 2006). ME allows checking if prediction is biased, while RMSE assesses prediction accuracy.

Exploratory data analysis
A statistical summary of pH and EC properties is presented in Table 2. Considering the classification used by Maria and Yost (2006), soil samples of Administrative Post of Cuamba vary between acid and slightly alkaline, since pH values range from 5.40 to 7.89. Only 14.7% of samples have pH values smaller than six (acid), while approximately 20% have pH values greater than 7.4 (slightly alkaline). Electric conductivity has an average value of 295.55 µS/cm.
The coefficient of variation of both soil attributes was smaller than 20%, thus their measurements have moderate variability in the study domain. However, since EC is more heterogeneous than pH, its interpolated surface might be less precise. Exploratory data analysis did not suggest the existence of potential outliers, which could mislead the spatial analysis. Although normality is neither an hypothesis nor an assumption of interpolation methods considered (e.g Goovaerts, 1997), normality of attributes' data was investigated using the Shapiro-Wilk test (Shapiro and Wilk, 1965). Considering its results (Table 2), there is evidence that both attributes are normally distributed at 5% significance level, since pvalues of the test statistic are greater than 0.05.
Strongly acidic soils are likely to have high conductivity, because soils with low pH might have high soluble salt content and therefore high electrical conductivity. Even though pH measurements did not reveal strongly acidic soils in the study region, we investigated the possible existence of a linear relationship between pH and EC using the t-test of Pearson's correlation coefficient, which was equal to 0.24. As expected, results show that there is no evidence to reject the null hypothesis of independence between the two soil properties at 5% significance level (p-value = 0.1874). This indicates that pH and EC have distinct spatial patterns.

Efficiency of interpolation methods
Interpolated surfaces of pH and EC were produced using IDW and OK (Figure 6 and 7, respectively). IDW maps exhibit more homogeneous surfaces than those of OK, thus they are less informative. This result is caused by a greater smoothing effect of the IDW method.
In areas where there is little or no data, the interpolator moves towards the overall mean, thus smoothing the prediction surface. This is not only controlled by the number of averaged neighbors, but also by the method itself and other method parameters (e.g. semivariogram in kriging) (Falivene et al., 2010). Results of prediction errors' statistics obtained by cross-validation show that OK was more efficient than IDW in the interpolation of both soil properties (Table 3), particularly in the case of EC. Both methods have a ME close to zero in pH interpolation, thus providing evidence of unbiased predictions at samples' locations. However, IDW was much more biased than kriging in the interpolation of EC, most likely because the OK estimator is the Best Linear Unbiased Estimator (BLUE) if its assumptions hold, and EC has more heterogeneous sample values than pH. Furthermore, RMSE values of OK predictions were smaller than those of IDW for both soil properties. Hence, kriging produced more accurate predictions at samples' locations than IDW. Similar conclusions were reported by Reza et al. (2010) when comparing the efficiency of those two methods in the interpolation of soil properties in Dhalai district of Tripma, India. OK also performed better than IDW for pH and EC in topsoil of a dry land sheep and cropping farm, located in Shire of Wickepin, in southwest of Western Australia (Robinson and Metternicht, 2006).

Spatial variability of soil properties
Considering the interpolation methods' efficiency results, the analysis of spatial patterns of soil properties will be based on OK maps of pH (Figure 6b), EC ( Figure 7b) and Land cover ( Figure 8).
As expected from exploratory data analysis, most of Administrative Post of Cuamba has slightly acidic (6.1<pH<6.5) and neutral (6.6<pH<7.3) soils, where the former are mainly located in the northwest and southwest (Figure 6b). These areas are covered by closed broadleaved deciduous forests and closed to open shrub land, respectively.
Additionally, patterns of acidic soils (pH < 6) are mainly located in the southwest of the study region. This area is mainly characterized by closed to open shrub land and crop lands (Figure 8). However, cropland area is very small compared to other classes of land cover, implying that a large area with favorable pH conditions for agricultural production was unexplored. Five slightly alkaline areas (pH > 7.4) were scattered in the center of the study region, with open broadleaved deciduous forest as major land cover.
Soil pH helps decision makers and farmers to identify different kinds of chemical reactions that can occur in a specific soil. Toxic elements such as aluminum and manganese are a major cause for crop failure in acidic soil. This is because these elements are more soluble at soils with low pH. Several plants grow best when pH is between 7 and 9 (neutral and alkaline soils). Lower and higher pH levels, particularly lower than three and higher than 11, dramatically reduce plant growth. All plants are affected by extremes of pH, but each plant has its own  level of tolerance to acidity and alkalinity. However, 5.8 to 7.2 pH levels contributes for normal yield for most vegetables such as lettuce, tomato, cucumber, cabbage, strawberry, carrot, pea and pepper (Lake, 2000). On other hand, to get maximum yields for legumes and cereals, a pH ranging from 6.0 to 7.0 is desirable as it permits optimum availability of nutrients in soil. Lake (2000) mentioned that, in general, a soil pH of 5.2 to 8.0 provides optimum conditions for most agricultural plants.
EC of soils varies depending on moisture amount held by soil particles, and it can be measured in a scale of 0 to 1000 ms/ m. It has been used at field scales to evaluate different anthropogenic properties (Corwin and Lesch, 2005). For example, Corwin et al. (2003) reported that crop yield inconsistently correlates to EC because of influence of soil properties (salinity, water, texture, among others). Wiatrack et al. (2009) mentioned that EC is an important characteristic used for mapping spatial variability of soils within a production field, and that showed a correlation with other soil properties such as organic carbon, cation exchange capacity (CEC) and depth of topsoil.
High values of EC of soils are commonly found in clayey soils. Highest values (Figure 7b) were found in northcentral and southeast regions. According to the land cover map (Figure 8), these areas are characterized by closed broadleaved deciduous forest, closed to open shrub land and mosaic vegetation (croplands). Type of soil (clay), which is representative at Administrative Post of Cuamba (Marques et al., 2001), has the advantage of nutrient retention by using slow-realize mineral fertilizers and this type of soil is good for growing vegetables like beans, potato, cabbage, pea.
In summary, the southeast region of the study area has favorable pH, which is good for growing most food crops. Moreover, this area is characterized by clayey soils and high EC, which are favorable for nutrient retention. Therefore, the southeast region has a high fertility potential for food crops.

Conclusions
This study aimed at analyzing the spatial variability of pH and EC in Administrative Post of Cuamba, Mozambique. Interpolated surfaces of these soil properties were produced using IDW and OK. The latter was less biased and more accurate at samples' locations, thus exhibiting a better performance than IDW. Therefore, the spatial variability analysis of soil properties was based on OK maps. Results show that soil has favorable pH in most of the study domain, but a large area in the south is still unexplored for agricultural production. Moreover, areas identified with acidic soils can become more productive by artificial liming (i.e by the application of calcium-rich materials). Alternatively, crops that are more tolerant to acidity can be grown in such areas. For example, the pigeon pea (Cajanus cajan, L., Millsp.) crop is increasingly being used in Mozambique, because it is moderately tolerant to acidic conditions which is drought resistant due to deep rooting (Rocha et al., 2017).