Spatial and temporal characterization of water quality in the Cuiabá River Basin of Central Brazil

1 Programa de Pós-Graduação em Recursos Hídricos, Universidade Federal de Mato Grosso, Campus de Cuiabá, 78060-900, Cuiabá-MT, Brazil. 2 Secretaria de Estado do Meio Ambiente – Mato Grosso, Brazil. 3 Departamento de Solos e Engenharia Rural, Universidade Federal de Mato Grosso, Campus de Cuiabá, 78060-900, Cuiabá-MT, Brazil. 4 Instituto Federal de Educação, Ciência e Tecnologia de São Paulo, Campus Matão, 15991-502, Matão-SP, Brazil.


INTRODUCTION
Decreasing quality and quantity of water resources have become a global concern, even in countries with high water potential, such as Brazil (Colletti et al., 2010). The high concentration of the urban population in Brazil has caused numerous environmental problems for water resources, including contamination of rivers by domestic and industrial sewage, flooding caused by improper occupation of floodplains, inadequate management of urban drainage, and lack of collection and adequate disposal of urban waste (Tucci et al., 2000).
The Cuiabá River, located in the Central-West region of Brazil, is a waterway of major importance for multiple uses, such as water supply and wastewater dilution. This river is one of the main tributaries of the Northern Pantanal of Mato Grosso State, the largest floodplain in the world (Zeilhofer et al., 2006). Water use conflicts, changes in aquatic biota, and degradation of the water quality of this river have previously been described (Figueiredo and Salomão, 2009; Laabs et al., 2002; Zeilhofer et al., 2006).
*Corresponding author. E-mail: sergiobatista@outlook.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
Water quality is a concern because water from this river supplies approximately one third of the population of the state and has a significant impact on the complex, rich Pantanal ecosystem. A systematic monitoring program is required to obtain accurate estimates of variations in surface water quality due to variations in physical, chemical, and biological parameters caused by seasonal effects and land use in the basin (Simeonov et al., 2003; Toledo and Nicolella, 2002). However, large amounts of data obtained in systematic studies can be difficult to interpret. Matrices with multiple dimensions are usually generated, but the most relevant environmental information is included in only a small number of variables, while the remaining variables add little to the interpretation of the results in terms of quality (Vidal et al., 2000; Simeonov et al., 2003; Toledo and Nicolella, 2002; Andrade et al., 2007b; Felipe-Sotelo et al., 2007).
Multivariate statistical analysis provides an alternative approach to understanding the water quality of a study region and apportioning its pollution sources (Wu et al., 2009). Among these tools, factor analysis and hierarchical cluster analysis have become increasingly widespread due to the diffusion of chemometric knowledge (chemometrics being the science of extracting information from chemical systems by data-driven means) and the availability of software that significantly reduces the labor of calculation (Mas et al., 2010).
Several studies have addressed the evaluation of water quality data using multivariate analysis in water bodies around the world (Brodnjak-Vončina et al., 2002; Felipe-Sotelo et al., 2007; Mendiguchía et al., 2004; Kazi et al., 2009; Pejman et al., 2009), including Brazil (Silva and Sacomani, 2001; Zeilhofer et al., 2006; Palácio et al., 2009; Alexandre et al., 2010; Pereira-Filho et al., 2010; Palácio et al., 2011). This study aimed to identify the water quality variables responsible for the spatial and temporal variation of water quality, as well as the similarities and dissimilarities among stations on the Cuiabá River in the hydrological year 2010/2011 that could be affected by urban effluent discharge and diffuse pollution.

Study area
This study was performed in the Cuiabá River basin, a sub-basin of the upper Paraguay River basin. In the section that was investigated, which extends for approximately 680 km, the river passes through several cities, including the main municipalities in the basin, Cuiabá, the capital city of Mato Grosso, and Várzea Grande, which together have a population of more than 800 thousand inhabitants. Before passing the urban areas of Cuiabá and Várzea Grande, the river runs through areas of livestock, agriculture, protected areas, and small towns of low population density. In the urban area of Cuiabá, approximately 70% of the domestic effluent is discharged into the river without any treatment. After Cuiabá, there are areas of livestock and agriculture and point sources of domestic sewage discharge from the urban areas of St. Antônio do Leverger (approximately 18,500 inhabitants) and Barão de Melgaço (approximately 7,600 inhabitants). At the end of its route, the Cuiabá River enters the Pantanal region.
Water samples were collected from 13 sampling stations (Table 1) at which water quality has been monitored by the Environmental Department of Mato Grosso State (SEMA/MT) since 1995. These monitoring stations are located on the upper, middle, and lower stretches of the Cuiabá River (Figure 1).

Sample collection and treatment
We performed six sampling campaigns, with samples taken in triplicate bimonthly between July 2010 and May 2011. This period corresponded to an annual hydrological cycle, with sampling occurring in both the rainy (October to April) and dry (May to September) seasons. Total precipitation was 17.2 mm in July-September 2010, 1484 mm in October 2010-March 2011, and 188 mm in April-June 2011. Water samples were collected approximately 20 cm below the water surface using polyethylene bottles with capacities of one litre (preserved by adding 50% sulfuric acid solution until pH < 2) and two litres (not chemically preserved). Samples for bacteriological analyses were collected using sterilized 100 ml plastic bags. After sampling, the samples were transported under refrigerated conditions to the laboratory of SEMA-MT, where the analyses were performed within 24 h. The sampling methodology was based on the Standard Methods for the Examination of Water and Wastewater (American Public Health Association, 2005) and on the Water Sample Collection Guide (CETESB, 1988).
The field measurements of pH, water temperature, electrical conductivity (EC), dissolved oxygen (DO), salinity, total dissolved solids (TDS), and oxidation-reduction potential (ORP) were conducted with an HI 9828 HANNA Multi-parameter Meter equipped with a pH/ORP/DO/EC/Temperature HI 769828 sensor with a 20 m cable. Analyses were performed using ultrapure water produced with a Milli-Q Advantage Ultrapure Water Purification System. All reagents used were analytical grade (J.T. Baker®, Chemis® and Synth® brands). The spectrophotometric readings for chemical oxygen demand (COD), total nitrogen, total phosphorus, orthophosphate, nitrate nitrogen, nitrite nitrogen, and sulfate were obtained with a DR 5000™ UV-Vis Hach spectrophotometer.
The color measurements were acquired with an Aquacolor Cor PoliControl portable colorimeter. Cation analyses (sodium, potassium, magnesium, calcium, and ammonia nitrogen) were performed using an ICS-1000 Dionex ion chromatograph with an analytical column (Dionex CS16, 3 mm, coupled to precolumn CG16, 3 mm), a 25 ml loop, 20 mM sulfuric acid eluent solution (H2SO4, J.T. Baker brand), an eluent flow rate of 0.36 ml min⁻¹, and a 2 mm CSRS 300 suppressor (Cation Self-Regenerating Suppressor). Unpreserved samples were filtered (0.45 µm pore size) before injection (1 ml) into the ion chromatograph. Microbiological analyses were performed according to the enzyme substrate method, using Colilert® culture medium and IDEXX® cards.

Data treatment
Censored data, that is, values below the method's limit of detection (LOD), were replaced by values corresponding to half the LOD (LOD/2), according to procedures proposed elsewhere (Farham et al., 2002) and used by others (Navarro et al., 2006; Terrado et al., 2007; Felipe-Sotelo et al., 2008). After replacing the censored values, an analysis of the missing data was performed to evaluate whether corrective actions, such as deletion of variables or cases, were needed to enable factor analysis.
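The LOD/2 substitution described above can be illustrated with a minimal sketch. The function name, the nitrite values, and the LOD of 0.02 mg/L are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
def substitute_censored(values, lod):
    """Replace results below the limit of detection (LOD) with LOD/2.

    `None` marks a missing result and is left untouched so that missing
    data can be assessed separately afterwards.
    """
    half_lod = lod / 2.0
    return [v if v is None or v >= lod else half_lod for v in values]


# Hypothetical nitrite results (mg/L) with an assumed LOD of 0.02 mg/L.
nitrite = [0.05, 0.01, None, 0.02, 0.004]
cleaned = substitute_censored(nitrite, lod=0.02)
print(cleaned)  # [0.05, 0.01, None, 0.02, 0.01]
```

Missing results are deliberately kept apart from censored ones here, since the two were handled in separate steps.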
To detect atypical observations from the univariate perspective, the data were converted into standard scores (Z scores), and observations with scores higher than 3 were identified. For multivariate detection, the Mahalanobis D² was calculated, identifying those cases with values of D²/df above 3.0, where df is the number of variables involved (Hair et al., 2009). A Pearson's correlation matrix was constructed to verify whether the data matrix had a sufficient number of correlations to justify the application of principal component analysis. Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy were also computed. The sphericity test examines the matrix as a whole and tests whether the correlation matrix presents significant correlations among the variables. In the sphericity test, significant values (p < 0.05) indicate the existence of a sufficient number of correlations among variables, thus permitting the continuation of the analysis (Hair et al., 2009). The KMO measure takes normalized values (between 0 and 1) and expresses the proportion of variance that the variables have in common, that is, the proportion due to common factors. Values less than 0.5 indicate that factor analysis is inappropriate for the treatment of the data (Dziuban and Shirkey, 1974). The descriptive statistical and multivariate analyses were performed with the Statistical Package for Social Sciences (SPSS), version 19.0.
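As a rough illustration of the two screening rules, the sketch below flags cases by Z score and by Mahalanobis D²/df using NumPy. It is not the SPSS procedure used in the study: the data are synthetic, the function names are hypothetical, and the use of the absolute Z score and of the sample standard deviation are assumptions.

```python
import numpy as np


def univariate_outliers(X, threshold=3.0):
    """Boolean mask of entries whose absolute Z score exceeds the threshold."""
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return np.abs(z) > threshold


def mahalanobis_d2_over_df(X):
    """Mahalanobis D² of each case from the centroid, divided by the number of variables."""
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return d2 / X.shape[1]


# Synthetic data: 200 cases x 3 variables, with one case made deliberately atypical.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [8.0, -8.0, 8.0]

flag_uni = univariate_outliers(X).any(axis=1)   # atypical in at least one variable
flag_multi = mahalanobis_d2_over_df(X) > 3.0    # atypical in the multivariate sense
print(flag_uni[0], flag_multi[0])
```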

Factor analysis - principal component analysis (PCA)
After processing the data, PCA was performed. In this study, it was not necessary to standardize the data beforehand, since the analysis was performed on the correlation matrix. The correlation matrix was transformed, by estimation, into a factor matrix containing the factor loadings of each variable on each factor obtained. The loadings of each variable on the factors were then interpreted to identify the latent structure of the variables (Hair et al., 2009).
The latent root criterion was chosen to define the number of factors to be extracted, and all factors with eigenvalues greater than 1 were considered significant. This criterion is most reliable when the number of variables is between 20 and 50, as in the present study (Hair et al., 2009). The VARIMAX method (orthogonal rotation) was applied to the normalized data for the evaluation of the spatial and temporal variability of water quality (Andrade et al., 2007b; Palácio et al., 2011). Statistical software was used to obtain the rotated component matrices as well as the scores, loadings, eigenvalues, and eigenvectors.
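The extraction and rotation steps can be sketched as follows, assuming NumPy and synthetic data. The function names are hypothetical, and the varimax routine is a commonly used SVD-based formulation without Kaiser normalization, so it should not be read as a reproduction of the SPSS routine used in the study.

```python
import numpy as np


def pca_latent_root(X):
    """PCA on the correlation matrix, retaining components with eigenvalue > 1."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                        # latent root criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # unrotated loadings
    explained = eigvals[keep] / eigvals.sum()   # share of total variance
    return eigvals, loadings, explained


def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (no Kaiser normalization)."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ grad)
        R = u @ vt
        d_new = s.sum()
        if d != 0.0 and d_new / d < 1.0 + tol:  # criterion stopped improving
            break
        d = d_new
    return L @ R


# Synthetic example: two latent factors drive six observed variables.
rng = np.random.default_rng(1)
f = rng.normal(size=(300, 2))
noise = 0.3 * rng.normal(size=(300, 6))
X = np.column_stack([f[:, 0]] * 3 + [f[:, 1]] * 3) + noise

eigvals, loadings, explained = pca_latent_root(X)
rotated = varimax(loadings)
print(loadings.shape[1], round(float(explained.sum()), 3))
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is the same before and after rotation; only the distribution of the loadings among the retained components changes.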

Hierarchical cluster analysis (HCA)
In this study, the HCA aimed to observe the similarities and dissimilarities among sampling stations, grouping together the stations with similar water quality characteristics in order to determine which characteristics are consequences of natural environmental conditions and which result from land use and occupation in the Cuiabá River basin (Figure 4). To perform the cluster analysis, data were standardized into standard scores (Z scores). The squared Euclidean distance was used in the HCA. Average linkage clustering and Ward's method were used as hierarchical cluster algorithms (Hair et al., 2009; Palácio et al., 2011).
To define the optimal number of clusters, the percentage variation in heterogeneity (Palácio et al., 2009; Alexandre et al., 2010; Palácio et al., 2011), determined from the clustering coefficient in SPSS, was adopted as a stopping rule, because large percentage variations in the coefficient identify stages at which the clusters being combined are significantly different (Hair et al., 2009).
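The stopping rule amounts to computing the percentage change in the agglomeration coefficient between consecutive stages. A minimal sketch follows; the coefficients are hypothetical and are not the values obtained in the study.

```python
def percentage_changes(coefficients):
    """Percentage increase in the agglomeration coefficient from each stage to the next."""
    return [
        100.0 * (later - earlier) / earlier
        for earlier, later in zip(coefficients, coefficients[1:])
    ]


# Hypothetical agglomeration coefficients for stages 1 to 6 (not the study's values).
coeffs = [10.0, 12.0, 15.0, 30.0, 36.0, 90.0]
changes = percentage_changes(coeffs)
for stage, change in enumerate(changes, start=1):
    print(f"stage {stage} -> {stage + 1}: {change:.1f}%")
```

In this hypothetical series, the jump of 100% between stages 3 and 4 would mark a merge of dissimilar clusters. The even larger jump at the final stage corresponds to the merge into a single cluster, which tends to be large regardless of the data.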

Descriptive statistics and data correlation
After data treatment, the original matrix (234 cases × 31 variables) was reduced to 201 cases × 28 variables. Chemical oxygen demand, fluoride, and suspended solids were removed to obtain a complete matrix that could be subjected to principal component analysis. Variables exhibiting high standard deviations, such as turbidity, total residue, and total coliforms (Table 2), displayed high variability during the sampling period, which was mainly attributed to the variation in precipitation during the study period. These variables are influenced by the amount of solid material that is carried to the river by runoff. Other variables, such as temperature, pH, salinity, DO, BOD, and ammonia nitrogen, presented very similar mean and median values, indicating an apparent symmetry of the data distribution, which can also be observed from the low skewness values.
A large number of significant correlations (p < 0.05) among the variables were identified (Table 3), but due to the large number of observations, the critical value of r was low (0.196 for α = 0.05). Because the significance test for r proves only that the significant correlations differ from zero, it is important to focus on correlations that are greater than 0.5 (p = 4.4 × 10⁻⁵), as indicated by Helena et al. (2000) and Andrade et al. (2007b).
During the rainy season, turbidity, total phosphorus, and Escherichia coli exceeded the maximum limits established by Brazilian legislation (CONAMA Resolution nº 357/2005), mainly at the sampling sites near the urban area. Sewage effluents from industrial and domestic sources and surface runoff of waste could explain the increase in these parameters. Dissolved oxygen presented lower values during the rainy season only at the Barão de Melgaço and Porto Cercado sampling sites, due to the increase in organic matter near the Pantanal floodplain.
Among the observed correlations, there were associations between salinity and conductivity (r = 0.98), salinity and total dissolved solids (r = 0.99), and total dissolved solids and conductivity (r = 0.99); these variables are strongly correlated because they all reflect the presence of salts in the water. These salts may be natural, due to the geological and soil conditions in the region, particularly in the upper course of the river. The salts may also have anthropogenic origins, such as the silting process in the water bodies or the diffuse pollution that occurs in the basin. Similar results were found in a study conducted in the upper Acaraú basin, Ceará (Andrade et al., 2007b). For this correlation matrix, the KMO measure (0.773) and Bartlett's test of sphericity (approximate chi-square of 6526.300, significance of 0.000) revealed that there were sufficient correlations to apply factor analysis.
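For readers who wish to see how these two diagnostics are obtained from a correlation matrix, the sketch below computes Bartlett's chi-square statistic and the overall KMO measure with NumPy. The data are synthetic and the function names are hypothetical; the p-value is omitted because it would require a chi-square distribution function from an additional library.

```python
import numpy as np


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom for a p x p correlation matrix."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)            # log of the determinant of R
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)             # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)       # mask selecting off-diagonal entries
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)


# Synthetic example: 300 cases, six variables driven by two latent factors.
rng = np.random.default_rng(2)
f = rng.normal(size=(300, 2))
X = np.column_stack([f[:, 0]] * 3 + [f[:, 1]] * 3) + 0.3 * rng.normal(size=(300, 6))
R = np.corrcoef(X, rowvar=False)

chi2, df = bartlett_sphericity(R, n=X.shape[0])
print(round(float(chi2), 1), df, round(float(kmo(R)), 3))
```

With the 28 variables retained in the present matrix, the same formula gives 28 × 27 / 2 = 378 degrees of freedom for the sphericity test.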

Factor analysis - PCA
Considering the latent root criterion, 7 components were retained, which were composed of 19 of 28 variables from the original data matrix.The 7 components combined explained approximately 75.74% of the total variance.
In this matrix (Table 4), factor loadings represent the degree of association (correlation) of each variable with each factor. The 1st component explains the greatest amount of variance, and this factor is composed of 7 variables with high loadings (greater than 0.50). Factor loadings of ± 0.50 or greater are considered practically significant (significance level α = 0.05 and power level of 80% for a sample size equal to or greater than 120), and loadings greater than 0.70 are considered indicative of a well-defined structure (in this case, the factor explains approximately 50% of the variance of the variable).
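The link between a loading and the share of a variable's variance explained by a factor is simply the squared loading, as the short sketch below shows for the two thresholds mentioned above (the function name is hypothetical).

```python
def variance_explained_by_loading(loading):
    """Percentage of a variable's variance accounted for by a factor with the given loading."""
    return 100.0 * loading ** 2


for loading in (0.50, 0.70):
    share = variance_explained_by_loading(loading)
    print(f"loading {loading:.2f} -> {share:.0f}% of the variable's variance")
```

A loading of 0.70 therefore corresponds to about 49% of the variance, which is the basis for the "approximately 50%" rule of thumb.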
The 1st and 2nd components together explain approximately 44.49% of the variance of the data. The 1st component (PC1) was assigned to the variables TDS, conductivity, salinity, calcium, alkalinity, hardness, and potassium. This 1st component is basically a component of dissolved salts, particularly potassium and calcium salts. The occurrence of similar variables in the 1st component and the correlation of these variables with the natural process of weathering of the geological components of the soil have also been reported in other studies (Singh et al., 2004; Andrade et al., 2007b).
The 2nd component (PC2) of this study was attributed to the parameters turbidity, sulfate, and color, which were positively correlated with the factor, and to the parameters phosphate and nitrite, which were negatively correlated. Samples collected during the rainy season are related to the positive values of component 1 (Figure 1) and to variables with negative loadings in component 2 (PC2), while samples collected during the dry season are related to the variables with positive loadings in component 2 (PC2). Observing the sample distribution as a function of the collection station in Figure 2, the highest scores recorded for the 1st component are related to Station 1 (Marzagão). This station is located in the municipality of Nobres, a region with geological records (the Araras, Raizama, and Puga Formations) that are composed mainly of sediments and rocks rich in calcium and magnesium carbonates, such as sandstones (Figueiredo and Salomão, 2009). When incorporated in the water, these compounds increase the concentration of calcium ions and carbonates, increasing the hardness and alkalinity and, therefore, the conductivity and salinity.
During the rainy season, there is an increase in turbidity and color due to surface runoff. This runoff is aggravated by the removal of riparian vegetation in many parts of the basin and by localized processes of siltation. Large amounts of sediments and organic matter reach the tributaries and flow into the Cuiabá River, which results in increased color and turbidity. The transport of sediment in the Cuiabá River basin has already been reported, and the increase in sediment transport was attributed to urbanization, with sediment deposition in the surroundings of St. Antônio do Leverger, at the beginning of the Pantanal Plain (Figueiredo and Salomão, 2009).
Sulfate can be derived from anthropogenic activities, mainly from fertilizers and water and wastewater treatment, which occur closer to the urban areas of Cuiabá and Várzea Grande. The 3rd component, which accounted for 9.60% of the explained variance, can be attributed to the discharge of domestic and industrial effluents and the leaching of organic matter from livestock areas in the sub-basin, because this component was composed of the variables E. coli and ammonia nitrogen. These variables presented high values, mainly at the Marzagão Station (Station 1), the stations in the urban areas of Cuiabá and Várzea Grande (Stations 7, 8, and 9), and the St. Antônio do Leverger station (Station 10).
The other components each explain approximately 3 to 7% of the total variance. The 4th component is positively related to the variable chloride and negatively related to the variable ORP. The influence of chloride is related to domestic effluent discharges in urban areas (Helena et al., 2000; Andrade et al., 2007a). The increase in the concentration of chlorides (highly reactive chemical species) decreases the ORP (which indicates the tendency of a chemical species to be reduced). The 5th component can be explained by the presence of limestone rich in magnesium, particularly in Província Serrana, where the upper course of the Cuiabá River is located (Figueiredo and Salomão, 2009). The 6th and 7th components are represented by the variables total phosphorus and nitrate. Both variables are indicative of agricultural and urban pollution processes (Andrade et al., 2007a; Campanha et al., 2010; Pereira-Filho et al., 2010). However, because they load on different components, they may reveal different aspects of this pollution, as nitrates more closely represent point sources, while total phosphorus represents diffuse sources or a different combination of both sources.
The results obtained here to explain the influence of organic pollution near the urban stations of the Cuiabá River and the improvement of the water quality downstream from the urban area of Cuiabá and Várzea Grande are similar to those of a previous study (Zeilhofer et al., 2006). Considering the 17 parameters that were employed in our PCA matrix, the weight of organic pollution (discharge of domestic and industrial effluents) was minimized due to the influence of dissolved salts and surface runoff. The absence of the parameter COD also explains some of the differences between the results of our study and those of previous studies (Zeilhofer et al., 2006).

HCA
After analyzing the diagrams obtained, the dendrogram obtained with Ward's method presented a greater segregation of clusters in the middle stages and was therefore selected for the HCA. When the percentage difference in the clustering coefficient was evaluated to estimate the number of clusters formed (stopping rule), the highest percentage difference was observed between stages 11 and 12 (Table 5). However, the single-cluster solution, which results from the union of 2 very heterogeneous groups, presents a high value for the stopping rule and is not acceptable (Hair et al., 2009). Thus, the increase registered between stages 7 and 8 was taken as the largest relevant one, with a variation of 30.57%. The cutoff mark was the distance of approximately 225 in the dendrogram obtained by Ward's method (Figure 3), resulting in 5 groups.
Station 1 (Marzagão) differs from the other stations in the dendrogram (Figure 4). There were significant variations in the values of turbidity, color, and total phosphorus, as well as in the concentrations of dissolved salts, as indicated by PCA. These variations, which are most likely attributable to seasonality, contribute to the characterization of a unique station (Figure 5).
The beginning of the operation of the APM-Manso water reservoir in 2001 regulated the water flow in the Cuiabá River. Station 1, which is located upstream from this reservoir, is not affected by this regulation, which has a strong influence on the water quality parameters at sites located downstream from the reservoir, mainly Stations 2-6. At these locations (Stations 2-6), the values of the analyzed parameters were low in comparison to those of other stations. These results are a reflection of the land use and occupation in this stretch, which still has significant conservation of riparian forests and a low degree of anthropization along the banks.
Proximity to point sources of effluent discharge influenced the water quality parameters at Stations 7 and 8, which are located downstream from 3 polluted tributaries that serve as flow channels for the untreated effluents from the Urban Agglomeration. There are also considerable industrial effluent discharges as well as solid waste disposal along this stretch, which suggests that, among the uses of this water body, the dilution of effluent is prominent (Zeilhofer et al., 2006). This is consistent with the influence observed in the PCA for component 3, which reflects the high values of E. coli and ammonia nitrogen measured at these stations. Stations 9 to 12 are located along a stretch of the Cuiabá River downstream from the Urban Agglomeration.
These stations presented high concentrations of organic matter but lower concentrations than those recorded in the urban area, which may indicate autodepuration of the organic load coming from the Urban Agglomeration of Cuiabá and Várzea Grande. This phenomenon may occur due to a reduction in the quantity of point sources of effluent discharge in this stretch as well as the small contribution from diffuse sources of pollution, as there are few human activities with significant impact potential in this region.
Station 13 (Porto Cercado) differs from the other stations in this cluster solution because it is located within the floodplain in the Pantanal of Mato Grosso. Among the results presented for this station, low concentrations of DO and low levels of coliforms are highlighted. The increase in organic matter is significant, particularly during the floods in this plain, due to sedimentation of solid particles caused by the decrease in slope, which occurs downstream of the town of Santo Antônio do Leverger (Figueiredo and Salomão, 2009). The stretches indicated by the HCA are consistent with the spatial gradient indicated in another study (Figueiredo and Salomão, 2009), in which 4 distinct stretches were indicated: I) from Cuiabazinho Springs to Manso Junction (where Station 1 is located); II) after Manso until the beginning of the urban areas of Cuiabá and Várzea Grande (Stations 2-6); III) urban areas (Stations 7 and 8); and IV) Pantanal (Station 13). However, the present study suggests a better segmentation of the stretch that begins after the urban area along the Cuiabá River. This study considers a stretch that begins downstream from the urban areas of Cuiabá and Várzea Grande and extends to the urban area of Barão de Melgaço (Stations 9-12), and a separate stretch that begins within the floodplain, due to the peculiar behavior of the analytical parameters presented at Station 13. These results will contribute to improved management of the environmental monitoring of the water quality of the studied basin by the Department of Environment of Mato Grosso State (SEMA-MT) and may be useful for a better understanding of other watersheds.

Conclusions
PCA identified 7 components composed of 19 variables that are responsible for the variation in the data from the Cuiabá River basin and that together accounted for 75.74% of the data variation. The 1st axis (29.08%), composed of 7 variables, represents the dissolved ions component, thus reflecting the natural process of weathering of geological components of the soil, particularly in the stretch of the upper Cuiabá River that is characterized by regions containing limestone rocks. The 2nd component (15.42%), composed of 5 variables, was associated with human activities that produce diffuse pollution (agricultural and urban) and point sources of effluent discharge. Samples from this sub-basin presented a wide variation in color and turbidity, influenced mainly by the transport of sediments, which occurs principally during the rainy season and is aggravated by the processes of human occupation in the basin.
HCA revealed the existence of 5 clusters of stations that present homogeneous characteristics. Stations 1 and 13 (Marzagão and Porto Cercado) present a high degree of dissimilarity when compared to the other stations and are considered unique clusters. Three other homogeneous groups were identified: II) after the mouth of the Manso River until the beginning of the urban areas of Cuiabá and Várzea Grande (Stations 2-6); III) the urban area (Stations 7 and 8); and IV) after the urban area until the vicinity of the Pantanal (Stations 9-12); these groups characterize regions of low, high, and moderate pollution in the basin, respectively. The water quality in the Cuiabá River is influenced mainly by natural phenomena, but processes driven by urban agglomerations and diffuse pollution in the basin are increasing and contributing to quality degradation. This kind of information is important for the state environmental agency to improve water management and to regulate occupation and wastewater discharge in the basin.

Figure 1. Location of the sampling stations in the Cuiabá River basin.

Figure 3. PCA biplot (scores and variables), classified according to the collecting station.

Figure 4. Dendrogram of the 13 sampling stations studied in the Cuiabá River basin.

Figure 5. Sampling stations classified according to the dendrogram in the Cuiabá River basin.

Table 1. Locations of the sampling sites in the Cuiabá River basin.

Table 2. Statistical description of the chemical, physical, and biological parameters determined in the 13 stations along the Cuiabá River basin.

Table 3. Binary correlations between the studied variables determined in the Cuiabá River basin.

Table 4. Explained variance and component matrix of PCA (Varimax rotation) applied to 28 chemical, physical, and biological parameters determined in the 13 stations along the Cuiabá River basin.

Table 5. Stopping rule for the HCA - Ward's Algorithm Method.