A filter-based Fisher g-test approach for periodicity detection in time series analysis

Periodicity is an interesting property of many time series data sets. A period can be defined as a self-repeating pattern, and this pattern provides useful information about the inherent structure of a cyclic data set. In this paper, a filter-based Fisher g-test approach is introduced, in which the filtering is based on singular spectrum analysis. To evaluate the performance of the proposed approach, we have performed a comprehensive simulation study in which its power and running time are compared with those of the non-robust and robust Fisher g-tests. The results confirm the superiority of the proposed approach under various criteria; it is insensitive to heavy contamination by outliers and to short time series.


INTRODUCTION
Periodicity provides useful information about the inherent structure of a cyclic data set. The human respiration pattern, for example, is an important periodic process, and deviation from normal periodic behavior is observed in many diseases. Periodicity can be used to derive the signature of normal breathing patterns and thereby facilitate abnormality detection. Periodicity not only helps to understand the properties of a single time series, but can also capture complex relationships among multiple time series. For example, the heart rate, chest volume and blood oxygen concentration can be related through their periodic patterns. A fundamental nonparametric tool for detecting the periodicities of time series data is the periodogram. Although it is a basic spectrum estimation tool that is widely used in different applications, it is not a consistent estimator of the spectral density function. Despite this inconsistency, the periodogram is a useful tool for developing statistical inference methods for the spectrum, since its statistical properties are known. Consequently, many of the traditional statistical tests for the detection of periodic time series, such as Fisher's test (Fisher, 1929), can be expressed in terms of the periodogram. These methods provide exact tests because they are based on a Gaussian assumption and a type of least squares estimation; however, they are not robust and can fail if the original noise assumptions do not hold. In many applications, the exact noise characteristics are unknown and can be remarkably non-Gaussian. Furthermore, the observed time series can exhibit outliers, short length and distortion from the original waveform. Therefore, the computational methods should preferably be robust to such anomalies in the data. To solve this problem, a robust version of the Fisher g-test has been introduced by Wichert et al. (2004) and Ahdesmaki et al. (2005).
We review this method in this study and compare it with the proposed approach thereafter. Here we consider an alternative approach.
In this approach, we first filter the perturbed data in order to reduce the effect of outliers, and then apply the Fisher g-test to the filtered series. It is expected that the results obtained by this approach are more effective than those of the first two approaches (the non-robust and robust Fisher g-tests), as we do not remove outliers. Furthermore, our proposed approach works very well even for small sample sizes. Moreover, we reduce the noise level in order to improve the data quality. In line with this research, it has been shown that noise reduction is important for curve fitting in linear and nonlinear regression models (Hassani et al., 2009a, 2010a). The next challenge is to choose a proper filtering technique. There are several linear and nonlinear methods for filtering noisy data. It has been shown that techniques based on the singular value decomposition (SVD) are more effective than others for noise reduction and filtering (Golyandina et al., 2001). Here, we use the singular spectrum analysis (SSA) technique, which is an SVD-based approach, as a filtering tool. SSA is designed to look for nonlinear, nonstationary, and intermittent or transient behaviour in an observed time series and, following its successful application in the physical sciences, applications in economics and finance are now also finding favour (Thomakos et al., 2002; Hassani and Zhigljavsky, 2009; Hassani et al., 2009b). It should be noted that there are several modifications of the SSA procedure (Golyandina et al., 2001; Hassani, 2010); here, however, we use the basic version of SSA.
The structure of this paper is as follows. First, we introduce the periodogram as a standard periodicity detection tool and use it to obtain Fisher's test and its robust version, after which the new approach based on SSA is introduced. This is followed by a presentation of the results of a comparison based on simulation studies. Finally, a brief conclusion is presented.

THE PERIODOGRAM AND FISHER G-TEST
Consider the model

$$y_t = \beta \cos(\omega t + \phi) + e_t, \qquad t = 1, \ldots, n, \tag{1}$$

where $e_t$ is the noise term with distribution $N(0, \sigma^2)$. Under Model 1 we can test the null hypothesis $H_0: \beta = 0$ against the alternative $H_1: \beta > 0$. The fundamental nonparametric tool for spectrum estimation is the periodogram, defined as follows:

$$I_n(\omega) = \frac{1}{n}\left|\sum_{t=1}^{n} y_t e^{-i\omega t}\right|^2 = \sum_{s=-(n-1)}^{n-1} \hat{R}(s)\cos(s\omega), \tag{2}$$

where $\hat{R}(s)$ is the sample autocovariance function at lag $s$. We are likewise able to test whether a series contains multiple ($m$) periodic components by postulating the model

$$y_t = \sum_{k=1}^{m} \beta_k \cos(\omega_k t + \phi_k) + e_t.$$

It should be noted that if we observe that the periodogram contains a number of sharp peaks, we should not conclude immediately that each of these peaks corresponds to a genuine periodic component of the series $y_t$. Instead, we need to apply a suitable test to the periodogram peak to determine whether its value is significantly larger than that which would be likely to arise if there were no genuine periodic components in the model. The usual procedure is to start by plotting the periodogram ordinates at the standard frequencies $\omega_k = 2\pi k/n$, $k = 1, \ldots, q$, where $q = [(n-1)/2]$, and then to test the value of the largest observed peak. Fisher (1929) derived an exact test for the detection of hidden periodicities of unspecified frequency in time series based on the following statistic:

$$g = \frac{\max_{1 \le k \le q} I_n(\omega_k)}{\sum_{k=1}^{q} I_n(\omega_k)}. \tag{3}$$

Test statistic (3) is known as Fisher's g-test statistic. Since the g-statistic divides the maximum periodogram ordinate by the sum of all periodogram ordinates, large values of $g$ indicate a strong periodic component and can lead to the rejection of the null hypothesis. Fisher showed that (for the case of $n$ odd) the exact distribution of $g$ under $H_0$ is given by:

$$P(g > z) = \sum_{j=1}^{a} (-1)^{j-1} \binom{q}{j} (1 - jz)^{q-1} = q(1-z)^{q-1} - \frac{q(q-1)}{2}(1-2z)^{q-1} + \cdots + (-1)^{a-1}\frac{q!}{a!\,(q-a)!}(1-az)^{q-1},$$

where $a$ is the largest integer less than $1/z$. Thus, for any given significance level $\alpha$, we can use this distribution to find the critical value $g_\alpha$ such that $P(g > g_\alpha) = \alpha$. If the $g$ value calculated from the series is larger than $g_\alpha$, then we reject the null hypothesis and conclude that the series $y_t$ contains the specified periodic component (Wei, 1990).
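The test described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function name `fisher_g_test` is hypothetical, NumPy is assumed to be available, and the periodogram is taken at the Fourier frequencies 2πk/n for k = 1, ..., [(n-1)/2] after centring the series.

```python
import numpy as np
from math import comb

def fisher_g_test(y):
    """Return Fisher's g-statistic and its exact p-value under Gaussian white noise."""
    y = np.asarray(y, dtype=float)
    n = y.size
    y = y - y.mean()                       # centre the series (zero frequency is not tested)
    q = (n - 1) // 2                       # number of Fourier frequencies 2*pi*k/n, k = 1..q
    # Periodogram ordinates at the Fourier frequencies; the 1/n scaling cancels in g.
    ordinates = np.abs(np.fft.fft(y)[1:q + 1]) ** 2 / n
    g = ordinates.max() / ordinates.sum()
    a = min(q, int(1.0 / g))               # terms with j > 1/g vanish
    # Alternating sum for P(g > z) evaluated at the observed g.
    p = sum((-1) ** (j - 1) * comb(q, j) * max(0.0, 1.0 - j * g) ** (q - 1)
            for j in range(1, a + 1))
    return g, min(max(p, 0.0), 1.0)        # clip small rounding errors

# Usage: a noisy cosine of amplitude 2 observed at n = 101 points.
rng = np.random.default_rng(0)
t = np.arange(1, 102)
g, p = fisher_g_test(2.0 * np.cos(2 * np.pi * 10 * t / 101) + rng.normal(size=101))
```

The alternating sum is evaluated directly, which is adequate for series of moderate length; for very long series the binomial coefficients become too large for floating-point arithmetic and a log-domain evaluation would be preferable.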

Robust Fisher g-test
Let us return to the spectrum estimation problem. As mentioned in Priestley (1981), the periodogram $I_n(\omega)$ is equivalent to the correlogram spectral estimator based on $\hat{r}(k)$, that is:

$$\hat{S}(\omega) = \sum_{k=-(n-1)}^{n-1} \hat{r}(k)\, e^{-i\omega k},$$

where $\hat{r}(k)$ is the biased estimator of the autocorrelation function. Since time series data are often contaminated with different types of outliers, this spectral estimate and the resulting test are unreliable in many cases. To overcome this problem, we consider a rank-based autocorrelation estimator for spectrum estimation. This estimator is a moving-window extension of the Spearman rank correlation coefficient, quantifying the association between the sequences $\{y_j\}$ and $\{y_{j+k}\}$. More specifically, we consider the correlation coefficient between the data ranks $R_y(i)$ and $R'_y(i)$, defined by Ahdesmaki et al. (2005) as:

$$\tilde{\rho}(k) = C \sum_{i \in I_k} \left[R_y(i) - \bar{R}\right]\left[R'_y(i) - \bar{R}\right],$$

where $C$ is a normalization factor, $I_k$ is the set of indices $i$ for which both $y_i$ and $y_{i+k}$ are observed, $R_y(i)$ denotes the rank of $y_i$ in the set $S = \{y_j : j \in I_k\}$, $R'_y(i)$ denotes the rank of $y_{i+k}$ in the set $S' = \{y_{j+k} : j \in I_k\}$, and $\bar{R}$ is the mean rank. Replacing $\hat{r}(k)$ with $\tilde{\rho}(k)$ in the correlogram estimator gives a robust spectral estimate, and evaluating Fisher's g-statistic on this estimate gives the test statistic that we call the 'robust Fisher g-test statistic'. Note that the exact distribution of this g-statistic, for example under the Gaussian noise assumption, is unknown. Therefore, the significance values have to be obtained by simulation. Moreover, this method requires intensive numerical computations.
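To make the idea concrete, the following Python sketch computes a Spearman-type rank autocorrelation, plugs it into a correlogram spectral estimate and obtains a p-value by random permutation. It is a simplified stand-in under stated assumptions and not the exact estimator of Ahdesmaki et al. (2005): the normalization, the maximum lag `max_lag`, the use of absolute spectral values, the permutation scheme and all function names are choices made here for illustration only.

```python
import numpy as np

def ranks(x):
    """Ranks 1..len(x); ties are broken by position (continuous data assumed)."""
    r = np.empty(x.size)
    r[np.argsort(x, kind="stable")] = np.arange(1, x.size + 1)
    return r

def rank_autocorr(y, k):
    """Spearman rank correlation between the sequences {y_i} and {y_{i+k}}."""
    if k == 0:
        return 1.0
    a, b = ranks(y[:-k]), ranks(y[k:])
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def robust_g(y, max_lag=None):
    """g-statistic computed from a rank-based correlogram spectral estimate."""
    y = np.asarray(y, dtype=float)
    n = y.size
    max_lag = n // 2 if max_lag is None else max_lag    # illustrative default
    rho = np.array([rank_autocorr(y, k) for k in range(max_lag + 1)])
    q = (n - 1) // 2
    omega = 2 * np.pi * np.arange(1, q + 1) / n          # Fourier frequencies
    lags = np.arange(1, max_lag + 1)
    # Correlogram estimate: rho(0) + 2 * sum_k rho(k) cos(omega k)
    spec = np.abs(rho[0] + 2 * np.cos(np.outer(omega, lags)) @ rho[1:])
    return spec.max() / spec.sum()

def robust_g_pvalue(y, n_perm=200, seed=0):
    """Permutation p-value: shuffling the series destroys any periodicity."""
    rng = np.random.default_rng(seed)
    g_obs = robust_g(y)
    g_null = np.array([robust_g(rng.permutation(y)) for _ in range(n_perm)])
    return g_obs, (1 + np.sum(g_null >= g_obs)) / (n_perm + 1)
```

Because every permutation requires a full set of rank correlations, the cost grows quickly with the series length and the number of permutations, which illustrates why the robust test is computationally intensive.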

FILTER BASED FISHER TEST
Here, we propose a filter-based approach for circumstances in which there are outliers in the data set. We use singular spectrum analysis (SSA), a powerful technique for time series analysis that incorporates elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing (Golyandina et al., 2001). In what follows, we give a brief explanation of the SSA method (Hassani, 2007).

Singular spectrum analysis
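The SSA technique consists of two complementary stages, decomposition and reconstruction, each of which includes two separate steps. The original time series is decomposed into a number of additive time series, each of which can be easily identified as being part of the modulated signal or as being part of the random noise. This is followed by a reconstruction of the original series. A brief description of the method is given here. Consider the real-valued non-zero time series $Y_N = (y_1, \ldots, y_N)$ of length $N$. Let $L$ ($1 < L < N$) be the window length and $K = N - L + 1$. The trajectory matrix is the $L \times K$ Hankel matrix

$$\mathbf{X} = [X_1, \ldots, X_K], \qquad X_j = (y_j, \ldots, y_{j+L-1})^{T}.$$

Let $\lambda_1 \ge \cdots \ge \lambda_L \ge 0$ be the eigenvalues of $\mathbf{X}\mathbf{X}^{T}$ and $U_1, \ldots, U_L$ the corresponding orthonormal eigenvectors. If $d$ denotes the number of non-zero eigenvalues and $V_i = \mathbf{X}^{T} U_i / \sqrt{\lambda_i}$ ($i = 1, \ldots, d$), then the SVD of the trajectory matrix $\mathbf{X}$ can be written as:

$$\mathbf{X} = \mathbf{E}_1 + \cdots + \mathbf{E}_d, \qquad \mathbf{E}_i = \sqrt{\lambda_i}\, U_i V_i^{T}.$$

The first $r$ eigenvectors $U_1, \ldots, U_r$ ($r \le d$) span the subspace associated with the signal. According to the basic SSA algorithm, the $L$-dimensional data are projected onto this $r$-dimensional subspace, and the subsequent averaging over the anti-diagonals gives an approximation to the original series. The main postulate of the SSA procedure is that this approximation has the least noise effect; we therefore expect the results obtained by this method to have high precision.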
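The two stages of basic SSA can be sketched as follows. This is a minimal illustration, assuming NumPy; the function name `ssa_filter` and its defaults (window length equal to half the series length, r = 2 retained components) are chosen here for illustration and are not taken from the authors' code.

```python
import numpy as np

def ssa_filter(y, L=None, r=2):
    """Basic SSA: embed, truncate the SVD to r components, diagonally average."""
    y = np.asarray(y, dtype=float)
    N = y.size
    L = N // 2 if L is None else L          # window length
    K = N - L + 1
    # Decomposition, step 1: trajectory (Hankel) matrix, column j = (y_j, ..., y_{j+L-1})
    X = np.column_stack([y[j:j + L] for j in range(K)])
    # Decomposition, step 2: singular value decomposition
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Reconstruction, step 1: keep the r leading components
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]
    # Reconstruction, step 2: average entries (i, j) with i + j constant
    total = np.zeros(N)
    count = np.zeros(N)
    for i in range(L):
        total[i:i + K] += Xr[i]
        count[i:i + K] += 1
    return total / count

# Usage: filter a noisy cosine with window length N/2 and r = 2.
rng = np.random.default_rng(1)
t = np.arange(1, 51)
filtered = ssa_filter(2.0 * np.cos(0.5 * t + 0.3) + rng.normal(0.0, 0.5, t.size))
```

A noise-free sinusoid has a trajectory matrix of rank two, so with r = 2 the filter returns it unchanged; for a noisy sinusoid the two leading components retain most of the signal and discard most of the noise.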

SIMULATION STUDIES
Let us now evaluate the performance of our proposed approach using a simulation study. The test signal model is as follows:

$$y_t = \beta \cos(\omega t + \phi) + e_t, \tag{8}$$

where $t = 1, \ldots, N$, the phase $\phi$ is chosen uniformly at random and $e_t$ is an i.i.d. noise sequence. We consider two types of noise: (i) Gaussian noise (zero mean); (ii) Gaussian noise and impulsive noise.
For the second case, we select several data points at random and multiply them by a constant.
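A simulated series of this kind could be generated as in the sketch below. The function name `simulate_series`, the phase range (-π, π), the contamination fraction `contam` and the multiplier `factor` are hypothetical choices made for illustration; the values used in the study are not stated in this section.

```python
import numpy as np

def simulate_series(N, beta, omega, sigma=1.0, contam=0.0, factor=10.0, rng=None):
    """Cosine signal with random phase, Gaussian noise and optional impulsive noise."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, N + 1)
    phi = rng.uniform(-np.pi, np.pi)             # assumed phase range
    y = beta * np.cos(omega * t + phi) + rng.normal(0.0, sigma, N)
    n_out = int(round(contam * N))               # number of contaminated points
    if n_out > 0:
        idx = rng.choice(N, size=n_out, replace=False)
        y[idx] *= factor                         # impulsive noise: multiply by a constant
    return y

# Usage: 50 observations with 10% of the points multiplied by 10.
y = simulate_series(50, beta=2.0, omega=0.5, contam=0.1, rng=np.random.default_rng(3))
```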

Power test
Let us now examine the power of our proposed test, that is, the probability that the test rejects a false null hypothesis. The power is estimated for three different procedures: the non-robust Fisher g-test, the robust Fisher g-test and the filter approach based on SSA. We have also considered different time series lengths and different noise parameters. The simulations were repeated 1000 times. The test power has been calculated as follows: using the GeneCycle package written by Ahdesmaki et al. (2005), we obtain the p-values of both the Fisher g-test and the robust Fisher g-test. The proportion of the 1000 simulation runs in which the false null hypothesis is rejected then gives the power of the test. Another point that we must clarify is the choice of the SSA parameters. Certainly, the choice of the parameters depends on the data and on the analysis we have to perform, and many rules have been proposed in the literature (Golyandina et al., 2001; Golyandina, 2010). Following the common recommendation, we set the window length to half of the time series length. The number of singular values needed for the filtering stage, r, depends on the structure of the series (Golyandina et al., 2001). Here we use r = 2 singular values to refine the series, as we have a simple sine series and there is no intercept in the model. To gain a better understanding of the effect of filtering and to evaluate the performance of the proposed approach, we run our simulation studies with several alpha levels, signal parameters, percentages of contamination and noise levels. For all of these, the case-specific noise assumptions are used for both the null hypothesis (H0: β = 0) and the alternative hypothesis (H1: β > 0). Figures 1 to 4 present the results. Solid, dashed and dotted lines denote SSA, the non-robust Fisher g-test and the robust Fisher g-test, respectively.
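The power calculation described above can be sketched as a small Monte Carlo loop. The sketch is self-contained and written in Python rather than with the R package cited above; the helper names, the frequency `omega`, the series length and the number of replications are assumptions made for illustration. Applying Fisher's exact p-value to the filtered series is likewise an assumption of this sketch.

```python
import numpy as np
from math import comb

def fisher_p(y):
    """Exact p-value of Fisher's g-test computed from the periodogram."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    q = (y.size - 1) // 2
    ordinates = np.abs(np.fft.fft(y)[1:q + 1]) ** 2
    g = ordinates.max() / ordinates.sum()
    a = min(q, int(1.0 / g))
    p = sum((-1) ** (j - 1) * comb(q, j) * max(0.0, 1.0 - j * g) ** (q - 1)
            for j in range(1, a + 1))
    return min(max(p, 0.0), 1.0)

def ssa_filter(y, L, r=2):
    """Basic SSA reconstruction from the r leading singular components."""
    N = y.size
    K = N - L + 1
    X = np.column_stack([y[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]
    total, count = np.zeros(N), np.zeros(N)
    for i in range(L):
        total[i:i + K] += Xr[i]
        count[i:i + K] += 1
    return total / count

def estimate_power(beta, N=50, omega=0.5, sigma=1.0, alpha=0.05,
                   n_rep=200, use_filter=True, seed=0):
    """Proportion of simulated series for which the null hypothesis is rejected."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, N + 1)
    rejections = 0
    for _ in range(n_rep):
        phi = rng.uniform(-np.pi, np.pi)
        y = beta * np.cos(omega * t + phi) + rng.normal(0.0, sigma, N)
        if use_filter:
            y = ssa_filter(y, L=N // 2, r=2)     # window length N/2, r = 2
        rejections += int(fisher_p(y) < alpha)
    return rejections / n_rep
```

Calling the same function with `beta=0` estimates the rejection rate under the null hypothesis, which is a useful check of the nominal level α before power values for different procedures are compared.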
Figure 1 shows the test power for periodicity detection at alpha levels 0.01, 0.05, 0.10 and 0.15 for Model 8 (considering β = 2 and = 0.05). As can be seen from the figure, there is a clear difference between the power of the filter-based approach and that of the other approaches in all cases. Figure 2 compares the three methods for β = 0.5, 1.0, 1.5 and 2.0. The results confirm the superiority of the filter-based approach. Figure 3 shows the results for four noise levels, σ = 1.5, 2, 2.5 and 3. As can be seen from the figure, the power of the new approach relative to the other approaches improves as the noise level increases. Different percentages of contamination are considered in Figure 4. The results indicate that the power of the filter-based Fisher g-test remains high even under high percentages of contamination.

Running time
Let us now consider the performance of all the aforementioned tests with respect to running time. Table 1 presents the results. The results indicate that the proposed approach also runs faster than the robust Fisher g-test.

CONCLUSION
Our simulation results provide strong evidence that filtering with SSA and then applying the Fisher g-test is more robust than the robust Fisher g-test.
The proposed approach yields powerful results in finding periodicity in time series. As illustrated in the simulation study, the proposed filter-based approach clearly performs better than the Fisher test and its robust version in several respects. Moreover, the results confirm that the running time of the filter-based approach is substantially less than that of the robust version that has been used so far.