Categorical statistical approach to satellite retrieved rainfall data analysis in Nigeria

This paper describes a comprehensive statistical assessment of rain gauge and satellite-based monthly and annual rainfall measurements over Nigeria for the period 2001 to 2010. Two statistical methods were employed for inter-comparison and validation of the Tropical Rainfall Measuring Mission (TRMM) 3B43 V7 rainfall product. TRMM was selected because of its recently developed algorithms for estimating 3D rain distribution from visual spectrum radiance, radar and microwave sensors. The results of the continuous statistical assessment of the rainfall algorithms show good agreement with rain gauge measurements in terms of correlation coefficient and improved mean error tests. The geometric means of the correlation coefficient over all locations for the annual, monthly, wet and dry periods are 0.43, 0.79, 0.64 and 0.41 respectively. The categorical assessment is based on the threshold recommended by the International Telecommunication Union Radiocommunication Sector (ITU-R) for radio propagation. At the ITU-R recommended threshold of 1 to 10% percentage error for radio propagation applications, the Accuracy and frequency bias index (FBI) are 0.629 and 0.901 for the annual analysis and 0.558 and 0.416 for the monthly analysis. The overall performance of the TRMM-based rainfall assessment is encouraging, but further improvement is still needed for accurate and sufficient global rainfall estimation.
 
 Key words: TRMM 3B43 V6, satellite data, cross-validation, categorical statistics, radio propagation, ITU-R.


INTRODUCTION
Advances in satellite technology and remote sensing techniques have brought about the development and launch of space-borne measuring instruments for continuous monitoring of rainfall and its associated parameters in space and time. A number of meteorological satellites have been launched in the last few decades, and some satellite rainfall products, such as PERSIANN, CMORPH and GSMaP, are freely available in real time on the internet via the web or File Transfer Protocol (FTP). Satellite-based precipitation estimates provide greater spatial coverage with higher temporal frequency than many of the current rain gauge networks.
Satellite rainfall estimation, coupled with ground validation and calibration, offers good prospects for an accurate global rainfall database, particularly for remote areas and oceans (Arkin and Xie, 1994; Chandrasekar et al., 2008). However, the coarse spatial resolution of many of the satellite precipitation products currently available is one of the hindrances to their wider adoption in many applications, and the fact that the precipitation products are derived indirectly from either thermal or microwave radiance observations makes them less accepted than direct measurements taken with rain gauges. Nevertheless, satellite data remain the most valuable source of information on clouds and rainfall, most importantly over water bodies where conventional rain gauge measurement is practically impossible (Todd and Bailey, 1995). Recent research has shown that two-thirds of the world's precipitation falls in the tropics and that 75% of the region is covered by oceans and seas (NOAA, 2013). Consequently, precipitation over the tropics can best be quantified through the use of satellite weather monitoring instruments.

*Corresponding author. E-mail: rosmiwati@ieee.org. Tel: +604-5996056. Fax: +604-5941023.
In recent studies, several satellite rainfall retrieval products have been subjected to cross-validation tests over many regions to ascertain the accuracy of their rainfall estimates. The performance of satellite precipitation estimates over land areas has been reported to be highly dependent on the rainfall regime and on the temporal and spatial scale of the retrievals (Ebert, 2007). The continuous statistical approach to validation of satellite retrieval algorithms (Ji, 2006; Wagner et al., 2008; Omotosho and Oluwafemi, 2009; Jianxin and David, 2009) has shown good agreement between satellite and ground rainfall data, with percentage bias errors ranging from 1 to 15%.
In the validation study of Nigerian rainfall estimates reported by Omotosho and Oluwafemi (2009), TRMM 3B43 V6, among other TRMM satellite rainfall products, was found to be in agreement with ground measurements. However, in a validation study over East Africa of the NOAA-CPC Rainfall Estimator (RFE), the rainfall estimates were found not to agree with in-situ rain gauge data over a study region in Ethiopia (Dinku et al., 2007). That study specifically observed the significant role of topography in satellite rainfall estimation. The algorithms are reported to be incapable of detecting orographically induced rainfall. This does not imply that they cannot detect the daily occurrence of precipitation, but it leads to a low detection rate relative to the magnitude of rainfall on a daily basis. Therefore, continuous statistical analysis has been reported not to be a good approach for inter-comparison and performance evaluation of satellite retrieval algorithms (Matina et al., 2006).
Although TRMM precipitation products have been extensively validated at ground sites around the world (Adler et al., 1994; Ferraro and Marks, 1995; Petty and Krajewski, 1996; Tsintikidis et al., 1997; Conner and Petty, 1998; Hsu et al., 1999; Mircea and Emmanouil, 2001), very few of these sites lie in Africa. Thus, an explicit validation for Africa is necessary to ensure confidence in the TRMM estimates for the region. Hence, in this study, a comprehensive categorical statistical analysis is carried out on satellite and ground rainfall measurements over Nigeria by comparing the TRMM 3B43 V7 rainfall product with rain gauge data. Owing to the sparsely deployed rain gauge networks and inadequately controlled measurement of rainfall, the analysis is limited to the state capitals of the country, where measured rainfall data are available.

STUDY AREA
Precipitation in Nigeria is inadequately measured and controlled as a result of the country's sparsely populated rain gauge networks. In fact, most remote areas are not covered by precipitation measuring instruments. With the advent of satellite technologies and applications such as broadband internet facilities, satellite television and GSM for mobile communications, accurate measurement of rainfall is crucial. Rainfall has long been recognised as one of the atmospheric effects that has serious impacts on radio wave propagation (Aydin and Daisley, 2002; Pratt et al., 2003; Mandeep et al., 2008). The effect becomes more important as the frequency of operation increases, especially above 12 GHz. Therefore, for reliable and efficient communication design and many other applications, precipitation needs to be adequately quantified and controlled. In view of this, satellite-based rainfall estimation, which offers greater spatial coverage with higher temporal frequency than ground measurement, is becoming more popular. Hence, there is a need to evaluate the performance of satellite rainfall retrieval algorithms over Nigeria.

Ground measurement data for Nigeria
Nigeria lies wholly within the tropical zone, between latitudes 4°N and 14°N and longitudes 2°E and 15°E. The country is divided into six regions: North-West, North-East, North-Central, South-West, South-East and South-South. There are wide climatic variations across the regions of the country. The frequency of rainfall events and the associated rain accumulation increase from the North-East region down to the South-South region. In Nigeria, there are two distinct seasons: a wet season from April to October, with generally lower temperatures, and a dry season from November to March, with midday temperatures of 38°C or more. In the coastal and southeastern portions of Nigeria, the rainy season usually begins in February or March as moist Atlantic air, known as the southwest monsoon, invades the country. The peak of the rainy season occurs through most of northern Nigeria in August, when air from the Atlantic covers the entire country. In southern regions, this period marks the August dip in precipitation known as the August break. Around November, moist air from the Atlantic converges with hot, dry and often dust-laden air from the Sahara, locally known as the Harmattan.
The study locations are divided into six regions according to the country's classification, and each region is represented by its state capitals. The locations of the rain gauges, as represented by the state capitals, are shown in Figure 1.
Ten years of rain data, from 2001 to 2010, were collected from the rain gauge network of the Nigeria Meteorological Centre (NiMet), Oshodi, Lagos. Missing data in this study are filled using the Coefficient of Correlation Weighting Method (CCWM) as defined in Equation 1,
where $P_m$ is the missing precipitation value to be interpolated (which also represents precipitation at the base station $m$); $n$ is the number of stations; $P_i$ is the precipitation at station $i$; and $r_{mi}^2$ is the square of the correlation coefficient between stations $m$ and $i$, the correlation coefficient being the ratio of the covariance of the two data sets to the product of their standard deviations (Ramesh et al., 2005).
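Equation 1 itself is not reproduced in this text. As a minimal sketch, assuming the commonly used weighted-average form of CCWM in which each neighbouring station's value is weighted by its correlation with the base station, the gap filling could look like the following. The function names, the choice of squared correlations as weights (suggested by the squared correlation term in the definitions above) and the sample values in the usage note are illustrative assumptions, not NiMet data or the authors' code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ccwm_fill(history_base, history_neighbours, current_neighbours, power=2):
    """Estimate the missing value P_m at base station m.

    history_base: past values at the base station.
    history_neighbours: one list of past values per neighbouring station,
        covering the same periods as history_base.
    current_neighbours: the neighbours' values P_i for the missing period.
    power: exponent applied to the correlation (2 is an assumption).
    """
    weights = [abs(pearson_r(history_base, h)) ** power for h in history_neighbours]
    weighted_sum = sum(w * p for w, p in zip(weights, current_neighbours))
    return weighted_sum / sum(weights)
```

For example, `ccwm_fill([1, 2, 3, 4], [[2, 4, 6, 8], [1, 2, 3, 5]], [10, 20])` returns a value between 10 and 20 that leans towards the neighbour whose record correlates more strongly with the base station.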

TRMM based data for Nigeria
The Tropical Rainfall Measuring Mission (TRMM) satellite, launched in November 1997, was the first space-borne weather measuring instrument (Kummerow et al., 1998). It was designed to measure rainfall and its associated parameters in the tropics, but since the orbit was raised to an altitude of 403 km in August 2001, the swaths of the TRMM Microwave Imager (TMI) cover latitudes up to 50° North and South. Details of the TRMM mission and the instruments on the satellite can be found in Kummerow et al. (1998). The basic TRMM sensors are a passive microwave radiometer known as the TRMM Microwave Imager (TMI), a microwave radar unit called the Precipitation Radar (PR) and a visible and infrared radiometer called the Visible and Infrared Scanner (VIRS). These sensors are combined with other infrared (IR) and gauge measurements to produce rainfall products such as TRMM 3B42 V7, 3B43 V7, TMI 2A12 and many others.
In this work, the TRMM space-borne rainfall sensor is employed because of its newly developed algorithms for estimating the 3D rain distribution from visual spectrum radiance, radar and microwave sensors on the basis of a multi-sensor approach to precipitation analysis. The TRMM rainfall product is a real-time data logging system for which the data set is produced six to nine hours after acquisition. The TRMM 3B43 V7 database combines precipitation estimates from different measuring instruments, such as microwave, infrared and radar data, with rain gauge measurements. The algorithm is based on the concepts of Huffman et al. (1995). TRMM 3B43 V7 is a monthly gridded rainfall estimate with a spatial resolution of 0.25° by 0.25°, which is a product of rain archives of different estimates (NASA, 2011). One tile of 0.25° by 0.25° corresponds to an area of approximately 25 km by 25 km in Nigeria. The topographic features of the locations considered are shown in Table 1.
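The size of a grid tile on the ground can be checked with a short calculation. The sketch below assumes a spherical Earth with roughly 111.32 km per degree of latitude, which is an approximation introduced here and not a value taken from the product documentation. Under that assumption a 0.25° tile over Nigeria comes out at about 27 to 28 km on a side, the same order as the figure quoted above.

```python
from math import cos, radians

# Approximate length of one degree of latitude on a spherical Earth (assumption).
KM_PER_DEGREE = 111.32


def cell_size_km(lat_deg, resolution_deg=0.25):
    """Return (north-south, east-west) extent in km of one grid tile."""
    north_south = resolution_deg * KM_PER_DEGREE
    # Meridians converge towards the poles, so the east-west extent
    # shrinks with the cosine of the latitude.
    east_west = north_south * cos(radians(lat_deg))
    return north_south, east_west


# Latitudes spanning Nigeria, south to north.
for latitude in (4.0, 9.0, 14.0):
    ns, ew = cell_size_km(latitude)
    print(f"lat {latitude:4.1f}: {ns:.1f} km x {ew:.1f} km")
```

Because Nigeria lies close to the equator, the east-west shrinkage is small, so the tiles are nearly square over the whole country.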

STATISTICAL METHODS
Two statistical methods are employed in this paper: the continuous and the categorical approach. The categorical statistical approach has been reported to be the most appropriate method for validating precipitation retrieval algorithms (Matina et al., 2006). The details of both methods are explained in the "Continuous statistical method" and "Categorical statistical method" parts of this work.

Continuous statistical method
A number of continuous statistical measures are employed to determine the ability of the TRMM 3B43 V7 algorithms to accurately identify and quantify the magnitude of rainfall and to detect the associated systematic errors. The statistical measures for the performance assessment are expressed in Equations 3 to 7,
where $SD_i$ is the TRMM estimated precipitation, $GD_i$ is the corresponding ground measurement, $N$ is the number of data points, $E_i$ is the mean absolute error, $\sigma$ is the standard deviation and cov is the covariance between the ground and satellite data.
The Bias or Mean Absolute Error (MAE) estimates the average difference between the satellite and gauge values, while Fitness measures the accuracy of the prediction algorithm. The value of the Fitness measure ranges from 0 to 1, with 1 corresponding to an ideal estimate (Wilks, 1995). Improved Symmetric Mean Absolute Percentage Error (SMAPE) is an accuracy measure based on percentage (or relative) error (Armstrong, 1985; Flores, 1986). It measures the symmetric difference between the estimated and measured values. Root Mean Square Error (RMSE) measures the average magnitude of the errors with a focus on extreme values, while Pearson's correlation coefficient $r_{SG}$ evaluates the degree of linear association between the two datasets (Ebert and McBride, 2000).
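Equations 3 to 7 are not reproduced in this text, so the following is only a sketch using textbook definitions of the measures named above. Two points are assumptions rather than the authors' formulas: the SMAPE shown is one common symmetric form and may differ from the "Improved" variant used in the paper, and the Fitness expression 1/(1 + E) is inferred from the value pairs reported in the performance evaluation (an error of 0.0009 alongside a Fitness of 0.9991, and 0.4007 alongside 0.7139). The function names are hypothetical.

```python
from math import sqrt


def continuous_scores(sd, gd):
    """Continuous measures for satellite estimates sd against gauge values gd."""
    n = len(sd)
    diffs = [s - g for s, g in zip(sd, gd)]
    bias = sum(diffs) / n                      # signed mean difference
    mae = sum(abs(d) for d in diffs) / n       # mean absolute error
    rmse = sqrt(sum(d * d for d in diffs) / n)
    # Symmetric relative error in [0, 1]; pairs that are both zero add nothing.
    smape = sum(
        abs(s - g) / (abs(s) + abs(g))
        for s, g in zip(sd, gd)
        if abs(s) + abs(g) > 0
    ) / n
    mean_s, mean_g = sum(sd) / n, sum(gd) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sd, gd)) / n
    sigma_s = sqrt(sum((s - mean_s) ** 2 for s in sd) / n)
    sigma_g = sqrt(sum((g - mean_g) ** 2 for g in gd) / n)
    r = cov / (sigma_s * sigma_g)              # Pearson correlation
    return {"bias": bias, "mae": mae, "rmse": rmse, "smape": smape, "r": r}


def fitness(error):
    """Map a non-negative error to (0, 1]; 1 means a perfect estimate (assumed form)."""
    return 1.0 / (1.0 + error)
```

A signed bias near zero can hide large errors of opposite sign, which is why MAE and RMSE are reported alongside it; in the test values used here the bias is exactly zero while the MAE is not.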

Categorical statistical method
The 2 × 2 contingency table is employed to define categorical measures for the verification of satellite algorithm estimates against rain gauge measurements. A contingency table plays the role of a scatter plot for categorical variables. The entries of the table are a convenient presentation of the raw verification dataset from which many statistical inferences can be drawn (Murphy, 1993, 1995). To verify the accuracy of the algorithm estimates, four combinations of estimated and measured data are used: Hit, Miss, False Alarm and Null. The total numbers of measured and estimated rainfall occurrences and non-occurrences are given on the lower and right sides of the contingency table and are called the marginal distribution. Hit represents an event estimated to occur that did occur, Miss an event estimated not to occur that did occur, False Alarm an event estimated to occur that did not occur, and Null an event estimated not to occur that did not occur. The contingency table is shown in Table 2.
The validation analysis is based on the following rules:
(i) If the percentage bias error is within the threshold and positive, allocate "Hit".
(ii) If the percentage bias error is within the threshold and negative, allocate "False Alarm".
(iii) If the percentage bias error exceeds the threshold and is positive, allocate "Miss".
(iv) If the percentage bias error exceeds the threshold and is negative, allocate "Null".
The following measures are computed from the contingency table to describe particular aspects of TRMM algorithm performance relative to frequency of detection. The probability of detection (POD) measures the percentage of real precipitation events that are correctly detected by the satellite algorithms, the false alarm ratio (FAR) measures the fraction of false alarms in the satellite estimates, the Accuracy estimates the percentage of the estimated values that are correctly predicted, and the frequency bias index (FBI) is the ratio of satellite rain estimates to the actual precipitation events (Scheel et al., 2011). The percentage bias threshold recommended for meteorological applications is from 1 to 20%. However, a bias error of 1 to 10% is recommended by the ITU-R for radio propagation applications (ITU-R P.618, 2008). Therefore, emphasis is laid on the ITU-R recommendation because this work is tailored to applications in radio wave propagation.
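The allocation rules and the four scores can be sketched as follows. The formulas for POD, FAR, Accuracy and FBI are the standard contingency-table definitions, used here because the paper's own expressions are not reproduced in this text. The sign convention for the percentage bias error (satellite minus gauge, relative to gauge), the treatment of a zero error as positive, and the function names are assumptions made for illustration; the gauge value is assumed to be non-zero.

```python
from collections import Counter


def classify(satellite, gauge, threshold_pct):
    """Allocate one satellite/gauge pair to a contingency cell, rules (i) to (iv)."""
    pbe = 100.0 * (satellite - gauge) / gauge   # assumed sign convention
    if abs(pbe) <= threshold_pct:
        return "hit" if pbe >= 0 else "false_alarm"
    return "miss" if pbe > 0 else "null"


def tally(pairs, threshold_pct):
    """Count the four cells over an iterable of (satellite, gauge) pairs."""
    counts = Counter(classify(s, g, threshold_pct) for s, g in pairs)
    return counts["hit"], counts["miss"], counts["false_alarm"], counts["null"]


def categorical_scores(hits, misses, false_alarms, nulls):
    """Standard 2 x 2 contingency-table scores."""
    total = hits + misses + false_alarms + nulls
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "Accuracy": (hits + nulls) / total,
        "FBI": (hits + false_alarms) / (hits + misses),
    }
```

An FBI of 1 means the number of estimated events equals the number of observed events; values below 1 indicate underestimation and values above 1 overestimation, which is how the FBI results are read in the performance evaluation.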

PERFORMANCE EVALUATION
The suitability of the TRMM 3B43 V7 rainfall retrieval algorithms for rainfall estimation over Nigeria has been examined. The results of the continuous statistical analysis reveal random negative and positive bias in the satellite estimates across the entire database considered. Over all locations, the highest and lowest positive biases for the annual analysis are 336.11 and 0.32 mm, which correspond to Benin and Gusau respectively. For the monthly, wet and dry analyses, the highest biases are 31.68, 5.65 and 10.23 mm and the lowest are 1.4, 0.004 and 0.0 mm respectively.
The monthly inter-comparison between the ground-measured and satellite-retrieved rainfall is shown in Table 3. The Mean Absolute Error (MAE) over all stations ranges from 0.0009 to 0.4007. The highest MAE is found at FCT Abuja and the lowest at Abeokuta. The comparative plots of the ten-year monthly average along with the bias error are shown in Figure 2(a-d). The scatter plot of 10-year rainfall accumulation for the satellite and ground data is shown in Figure 3(a). The plot shows a good correlation, with R² of 0.8522. The Fitness of the TRMM 3B43 V7 estimates compared with ground measurements is found to be very high for all stations. The highest Fitness is 0.9991, which corresponds to the Abeokuta station, and the lowest in the table is 0.7139 for Abuja. The highest values of Improved SMAPE and RMSE are 0.2401 and 14.99, while the lowest values are 0.0005 and 2.38, with corresponding locations at Makurdi and Abeokuta respectively.
The geometric means of the correlation coefficient for the annual, monthly, wet and dry analyses are 0.43, 0.79, 0.64 and 0.41 respectively. The variation in the correlation coefficient is shown in Figure 3(b). The estimates of the TRMM algorithms appear to be affected by season. The dry season has the lowest correlation coefficient, followed by the annual, wet and monthly analyses, which indicates that the TRMM algorithms underestimate rainfall in the dry season compared with the wet season. The details are shown in Table 4. Similar results have been reported by researchers across the globe (Adeyewa and Kenji, 2003; Feidas, 2009; Sergio et al., 2009). This may be a result of surface background effects, which are largely influenced by topography and by orographic effects arising from elevated terrain, associated with barrier width, slope steepness, updraft speed and windward shielding from larger weather and rain-bearing systems (Gomez, 2007). The plots of correlation coefficient as a function of station are shown in Figure 4(a-d).
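For reference, the geometric mean used to summarise the station correlation coefficients can be computed as in the sketch below. It is defined only for strictly positive values, so how a negative or zero station correlation was handled in the paper is not known; this sketch simply rejects such input. The function name is hypothetical and the values in the usage note are illustrative, not taken from Table 4.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean via the mean of logarithms; requires positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return exp(sum(log(v) for v in values) / len(values))
```

For example, `geometric_mean([0.5, 0.8])` is the square root of 0.4, about 0.632, which is slightly lower than the arithmetic mean of 0.65. The geometric mean is never larger than the arithmetic mean, so a few weakly correlated stations pull the summary value down more strongly.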
Rain or no-rain categorical statistics, namely accuracy, frequency bias index (FBI), probability of detection (POD) and false alarm ratio (FAR), are computed for percentage bias thresholds of 5, 10, 15 and 20%. The overall accuracy is season dependent (Table 5). At the 10% bias threshold, the accuracy values for the annual, monthly, wet and dry analyses are 0.629, 0.553, 0.559 and 0.583 respectively, so accuracy is highest for the annual analysis and lowest for the monthly analysis. These values indicate that more than half of all rainfall observations over the 10-year average are correctly detected and estimated. The plot is given in Figure 5(a).
The frequency bias index increases with the bias threshold for all seasons. FBI for the annual analysis ranges from 0.613 to 1.406, which indicates underestimation at lower bias thresholds and overestimation of the measured accumulation at higher bias thresholds. For the monthly, wet and dry analyses, FBI remains below 1 at all thresholds (29.4 to 62.3%, 32.7 to 74.6% and 22.8 to 39.6% respectively), indicating underestimation. Similar results have been reported by other researchers (Gottschalck et al., 2005; Su et al., 2008; Daniel et al., 2009). The plot is shown in Figure 5(b). At the 10% bias threshold, the annual FBI of 0.901 is close to the ideal value of 1, indicating an almost accurate estimate of ground rainfall.
The probability of detection increases gradually from 0.449 at the 5% bias threshold to 0.666 at 20%. POD decreases from the annual analysis through the wet, monthly and dry analyses as the percentage bias increases. At the 10% bias threshold, POD is 0.571, which indicates that more than half of the observed rain events were correctly detected by the TRMM algorithms. The plot of the variation is shown in Figure 5(c).
The false alarm ratio follows the same trend with respect to both bias threshold and season. The plot of false alarm ratio as a function of season is shown in Figure 5(d). The minimum FAR is 0.113, for dry season rainfall accumulation at the 5% bias threshold. This value indicates that about 11% of the predicted rainfall events did not occur. FAR for the dry season analysis is lower than that of the annual, monthly and wet season analyses at all bias thresholds.

CONCLUSION
Ten years of monthly rainfall measurements, from 2001 to 2010, were selected for validation analysis and performance evaluation of the space-borne rainfall estimates of the TRMM 3B43 V7 algorithms. Two different statistical approaches were employed. The results of the continuous statistical analysis for RMSE, Improved SMAPE and correlation coefficient reveal good agreement. The best correlation coefficient, RMSE and Improved SMAPE for the 3B43 algorithm versus gauge are 0.8522, 2.38 and 0.005 respectively. However, the correlation coefficient has been reported not to be an appropriate measure for rain algorithm inter-comparisons because the frequency distribution of rain is highly skewed and not Gaussian (Martina et al., 2006).
In view of this, the categorical statistical approach was used for further testing of the algorithms. The results of the rain or no-rain categorical statistics reveal that the performance of the satellite estimates on an annual and monthly basis for Nigeria is good. This is a result of the low false alarm ratios of approximately 10% recorded in the analysis. At the ITU-R recommended threshold of 1 to 10% percentage error for radio propagation applications, the accuracy and FBI are 0.629 and 0.901 for the annual analysis and 0.558 and 0.416 for the monthly analysis. This statistical performance for Nigeria is acceptable compared with results from other locations and countries. Notwithstanding, more improvement is still needed in the algorithms. It is hoped that this valuable assessment will be useful in the ongoing Global Precipitation Measurement Mission (GPM) 2013.

Figure 1. Map of Nigeria showing locations of state capitals.

Figure 2. Ten-year mean and bias of satellite and ground data for (a) annual, (b) monthly, (c) wet season and (d) dry season analysis.

Figure 3. Plot of (a) ten-year monthly accumulation of ground and satellite data and (b) geometric mean correlation.

Table 1. Topographic features and 10-year annual mean rainfall accumulation of TRMM 3B43 V6 and NiMet.

Table 3. Continuous statistical analysis for monthly inter-comparison.

Table 4. Geometric mean of continuous statistical analysis.

Table 5. Results of the categorical analysis.