New application of principal component regression in estimation of electrical energy consumption in an abnormal automatic meter reading system

This paper proposes a new application of principal component regression (PCR) for estimating electrical energy consumption in the case of an abnormal automatic meter reading (AMR) system. Such events occur in a delivery metering system because of problems such as incorrectly set or connected meters in electrical systems, broken metering accessories, etc. The estimation is performed using MATLAB. Unclean sampled input data are used to estimate the target output data. The mean absolute percentage error (MAPE) is used as the measure of estimation performance. In the proposed estimation, load profiles obtained from the AMR are used as input data for training, to create the estimation model, and for testing, to validate the model. The estimated results are verified by comparing the proposed PCR application with other applications, namely simple linear regression (SLR) and multiple linear regression (MLR). The proposed PCR gives the lowest MAPE for the lost electrical energy estimation.


INTRODUCTION
Presently, energy meter technologies have developed rapidly (Alahakoon and Yu, 2016). In particular, the cost of energy meter technology has been greatly reduced. The reason is that most electrical energy providers pay attention to the development of energy meter technology and of the energy consumption data recording system used for customers' monthly billing.
In the past, most electrical energy providers chose a mechanical or electronic meter for measuring the customer's electrical energy consumption and for the monthly billing operation, because its price was lower than that of a smart meter. At present, however, those providers require database technology, energy management and electrical energy consumption history. Automatic meter reading (AMR) serves those requirements and is designed to be used with smart meters (Paris et al., 2014; Stephen et al., 2014).
A smart meter has an advantage over the electronic meter in its capacity for electrical energy consumption history, where the data history period limit is 45 days, and the costs of the two are now nearly the same. The trend is therefore that mechanical meter usage is decreasing and that the mechanical meter will be replaced by the smart meter at present and in the near future.

*Corresponding author. E-mail: kantikoon@hotmail.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.

The prominent feature of electrical energy data recording in the smart meter is that it is designed to be used with the AMR, so that the electrical energy consumption data in the energy meters of all customers are recorded in the database of the AMR.
Smart grids are an extension built upon AMR systems (Arif et al., 2013; Khan et al., 2014). AMR systems are implemented using many communication technologies, including power-line, radio frequency and mobile networks such as GPRS/GSM. As many energy providers seek to upgrade toward smart grids, replacing mechanical meters across the entire infrastructure may not be economical. For that reason, most providers developing smart grids have prioritized the regions suitable for smart grid development.
In Thailand, the Provincial Electricity Authority (PEA) has formulated a smart grid framework (Meenual and Thongchai, 2009). The current smart grid development of PEA is in the preparation stage, which includes selecting and adapting suitable PEA smart grid technologies, setting the implementation plan, and developing the necessary PEA smart grid foundation. PEA intends to launch a pilot project of the PEA smart grid in the near future.
Currently, soft computing methods are widely used to determine the reliability of the distribution system or to predict the electricity load demand (Singh et al., 2013). From the viewpoint of economic profit, distribution system loss can be classified into two types, namely technical loss and non-technical loss. Technical loss arises from losses in transmission lines and distribution equipment, such as copper loss. Unlike technical loss, non-technical loss is caused by abnormal metering equipment, or by violation and meter tampering by the customer.
Several methods are used for non-technical energy loss estimation, which is aimed at claiming payment from customers in cases of energy theft, equipment damaged by the customer, or force majeure. One method commonly used by most providers bases the estimate on the average monthly electricity consumption of the three months before the abnormal period, or of the three months after the abnormal period ends. Such methods have limitations that affect the performance and reliability of the estimation. In particular, they require the electrical consumption behavior to be relatively stable and continuous.
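The three-month averaging method described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `three_month_average_kwh` and the monthly readings are hypothetical and do not come from the paper or from PEA practice.

```python
def three_month_average_kwh(monthly_kwh):
    """Estimate a lost month as the mean of the last three monthly readings (kWh)."""
    if len(monthly_kwh) < 3:
        raise ValueError("need at least three monthly readings")
    last_three = monthly_kwh[-3:]
    return sum(last_three) / 3.0


# Hypothetical monthly consumption (kWh) recorded before the abnormal period.
history = [12000.0, 12600.0, 11400.0, 12300.0]
estimate = three_month_average_kwh(history)
print(f"estimated monthly energy: {estimate:.1f} kWh")
```

The sketch makes the limitation visible: the estimate ignores any change in consumption behavior inside the abnormal period, so it is only reasonable when the load is stable.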
Research on lost energy estimation under abnormal metering of electricity customers is rare. One example uses simple linear regression (SLR), with a focus on reducing the input to a minimum number of variables for estimating the output, but some significant data are then limited. Multiple linear regression (MLR) (Black and Henson, 2014) was used to overcome the limitation of the SLR application. However, MLR is still limited when the input data used for simulating the model are sometimes unclean.

Kantikoon and Kinnares

In this paper, the sampled data are taken from the energy consumption data of industrial energy customers in Thailand. These energy data are managed using the AMR system (Ladarat and Naetiladdanon, 2015) of PEA in Thailand. The energy data history recorded in the AMR data center is used to increase the efficiency of energy data management for customers and providers. The load profile of energy consumption is then used as input data for estimating the real energy in the case of an abnormal metering system.

OPERATION OF AUTOMATIC METER READING
At present, most energy providers have a growing interest in automatic meter reading (AMR) (Figure 1). The reason is that several business factors have become necessary for competitive advantage, such as the cost-saving capacity of energy providers and the energy saving achieved through monitoring and management by customers. The main part of an AMR system is the equipment for measuring and recording electrical energy usage. The smart meter is an important piece of electrical measuring equipment, acting as a real-time link for service and communication between providers and customers. The communication system is necessary for AMR because it plays an important part in automatically transmitting energy consumption data such as load profiles, monthly billing data, and alarm logs of the smart meter. These data are transmitted to a meter interface unit (MIU) and saved in the database of the AMR system (AMR data center) every fifteen minutes. The energy provider can therefore use energy consumption data monitoring to detect non-technical loss or illegal electricity usage (Erdene et al., 2013). Similarly, customers can observe and manage their own energy consumption demand.
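To make the fifteen-minute recording interval concrete, the following Python sketch models one load-profile record and generates the timestamps of one day. The record layout (`LoadProfileRecord` and its field names), the start date and the sample values are assumptions for illustration only; the paper does not specify the AMR database schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LoadProfileRecord:
    """One hypothetical 15-minute load-profile entry in the AMR data center."""
    timestamp: datetime
    voltage: tuple      # assumed order: phases a, b, c (V)
    current: tuple      # assumed order: phases a, b, c (A)
    energy_kwh: float   # energy recorded for the interval


def interval_timestamps(start, days, minutes=15):
    """Return the start time of every recording interval over the given days."""
    step = timedelta(minutes=minutes)
    count = days * 24 * 60 // minutes
    return [start + i * step for i in range(count)]


stamps = interval_timestamps(datetime(2016, 1, 1), days=1)
# Illustrative values only, not measured data.
first = LoadProfileRecord(stamps[0], (230.0, 231.0, 229.5), (10.0, 10.2, 9.8), 1.7)
print(len(stamps), "records per day; last starts at", stamps[-1].time())
```

With a fifteen-minute interval, one day yields 96 records per meter, which is the granularity at which the later estimation is carried out.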

PRINCIPAL COMPONENT REGRESSION
Linear regression analysis has been widely used in the electrical field for load demand forecasting. Normally, linear regression algorithms are classified into two types according to the number of input variables: simple linear regression (SLR) for a single input and multiple linear regression (MLR) for multiple inputs. In this case study, both SLR and MLR are used for comparison with the proposed algorithm.
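As a minimal sketch of the single-input case, SLR can be fitted with the closed-form least-squares formulas. The function name `fit_slr` and the current and energy samples below are hypothetical; the paper performs its regressions in MATLAB and does not list code.

```python
def fit_slr(x, y):
    """Least-squares fit of y = slope * x + intercept for a single input."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples: one phase current (A) and interval energy (kWh).
current = [10.0, 12.0, 14.0, 16.0]
energy = [2.5, 3.0, 3.5, 4.0]
slope, intercept = fit_slr(current, energy)
estimated = [slope * c + intercept for c in current]
```

MLR generalizes this to several inputs at once, which is where the matrix formulation of the regression becomes necessary.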
Figure 2 shows the proposed algorithm of the principal component regression (PCR) models consisting of MLR and the principal component analysis (PCA).
The regression estimation procedure has two parts, model creation and estimation. The regression learns a function that maps input variables to their target output. Conventionally, that function is a static function enabling the estimation of responses for new input variables. The multiple regression model can be expressed as

$$y = XW + \varepsilon$$

where $X$ holds the input variables and $y$ the output variables. A linear relationship $W$ is supposed between these two sets of variables, and $\varepsilon$ is the estimation error. $W$ can be estimated as

$$\hat{W} = (X^{T}X)^{-1}X^{T}y$$

The equivalent matrix norm equation is expressed as

$$\hat{W} = \arg\min_{W}\lVert y - XW\rVert^{2}$$

The input matrix can be decomposed as

$$X = USV^{T}$$

where $U$ is an orthogonal matrix of scaled principal component scores, $V$ is an orthonormal matrix of eigenvectors, and $S$ is a diagonal matrix of the singular values, with dimensions matching those of $X$.
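The idea of regressing on principal components instead of on the raw inputs can be sketched with NumPy as follows. This is an illustration under assumptions, not the authors' MATLAB implementation: the function names `fit_pcr` and `predict`, the centering of the data, and the synthetic inputs are all hypothetical choices made for the example.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Regress y on the first n_components principal components of X."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                       # center the inputs
    # SVD of the centered inputs: Xc = U @ diag(s) @ Vt
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T               # eigenvectors, shape (features, k)
    T = Xc @ P                            # component scores, shape (samples, k)
    # T.T @ T is diagonal with entries s**2, so least squares is a division
    gamma = (T.T @ (y - y_mean)) / s[:n_components] ** 2
    w = P @ gamma                         # coefficients in the original input space
    b = y_mean - x_mean @ w               # intercept
    return w, b


def predict(X, w, b):
    """Apply the fitted linear model to new inputs."""
    return X @ w + b


# Synthetic stand-in data (not AMR measurements): 200 samples, 6 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_w = np.array([0.5, -0.2, 0.1, 1.0, 0.8, -0.4])
y = X @ true_w + 1.5

w_full, b_full = fit_pcr(X, y, n_components=6)  # all PCs: same fit as least squares
w_5pc, b_5pc = fit_pcr(X, y, n_components=5)    # reduced model with 5 PCs
y_hat = predict(X, w_5pc, b_5pc)
```

Keeping all components reproduces the ordinary least-squares solution, while dropping the smallest components avoids dividing by near-zero singular values when the inputs are nearly collinear or noisy.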

MODELING
In this case study, the aim is to estimate the energy consumption data of an industrial electricity customer of PEA in Thailand. The estimate is used in the case of non-technical energy loss resulting from abnormal metering in the AMR system, which records data every fifteen minutes. The data selection and the statistical error are also described in this section.

Data selection
The proposed PCR is used for electrical energy estimation with two models, a training model and a testing model.
The variables of the training model consist of the sampled input data and the sampled output data. The sampled input data are the voltages and currents, whereas the sampled output data are the energy in kilowatt-hours (kWh), where one kWh is the energy equivalent to one kilowatt of power sustained for one hour. These data are recorded before the abnormal metering period.
The variables of the testing model consist of the sampled input data, which are the voltages and currents recorded during the abnormal metering period.
The parameters of the model simulation used in the proposed PCR application are shown in Table 1. The items are the abnormal and normal input variables, the input PC variables, and the output variables in units of kilowatt-hours, which are used for the estimation.
The flowchart of the process for estimating electrical energy consumption with the PCR application is shown in Figure 3.

Statistic error
In this paper, the statistical error is calculated as follows. Because the results of this case study are energy estimates for every fifteen minutes, MAPE is used to evaluate the estimation accuracy of each fifteen-minute energy value. The formula for MAPE is expressed as follows,

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_{i}-\hat{y}_{i}}{y_{i}}\right|$$

where $n$ is the number of observations, $y_{i}$ is the measured energy at the 15 min interval $i$, and $\hat{y}_{i}$ is the estimated energy at the same interval.
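The MAPE calculation over a series of fifteen-minute values can be sketched in plain Python. The function name `mape` and the sample values are hypothetical, and the check for zero measured values is an added safeguard that the paper does not discuss.

```python
def mape(actual, estimated):
    """Mean absolute percentage error, in percent, over paired observations."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, e in zip(actual, estimated):
        if a == 0:
            raise ValueError("MAPE is undefined when a measured value is zero")
        total += abs((a - e) / a)
    return 100.0 * total / len(actual)


# Hypothetical measured and estimated energy (kWh) for three intervals.
measured = [100.0, 200.0, 400.0]
estimated = [110.0, 190.0, 400.0]
error_percent = mape(measured, estimated)  # approximately 5.0 percent
```

Because each term is divided by the measured value, intervals with very small consumption can dominate the average, which is worth keeping in mind when reading MAPE figures for lightly loaded periods.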

SIMULATION RESULTS
This simulation estimates the energy for an abnormal voltage or current of any phase. PCR is employed with 5 principal components (PCs) of the input parameters in the case of an abnormal voltage of any phase, and with 3 PCs of the input parameters in the case of an abnormal current of any phase. The proposed estimation results are compared with the estimation results of MLR using 5 normal input parameters and of SLR using only 1 normal input parameter.
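The choice of the 5 normal inputs for MLR can be illustrated with a short Python sketch that drops the abnormal measurement from the six phase quantities. The labels `Va` to `Ic`, the helper names and the sample row are assumptions for illustration; the paper does not give its variable names.

```python
# Assumed labels for the six measured inputs: three voltages, three currents.
INPUTS = ["Va", "Vb", "Vc", "Ia", "Ib", "Ic"]


def normal_inputs(abnormal):
    """Return the inputs that remain usable when one measurement is abnormal."""
    if abnormal not in INPUTS:
        raise ValueError(f"unknown input: {abnormal}")
    return [name for name in INPUTS if name != abnormal]


def select_columns(rows, names):
    """Keep only the named columns from rows ordered like INPUTS."""
    indices = [INPUTS.index(name) for name in names]
    return [[row[i] for i in indices] for row in rows]


# Hypothetical record in which the phase-a voltage reading is abnormal.
rows = [[0.0, 231.0, 229.5, 10.0, 10.2, 9.8]]
usable = normal_inputs("Va")
mlr_rows = select_columns(rows, usable)
print(usable, mlr_rows)
```

SLR would go one step further and keep only a single column from this list, which explains why it carries the least information of the three approaches compared.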
To demonstrate the estimation error clearly, unclean input voltage data, which bring out the advantages of the proposed method over the other methods, are used. Unclean input voltage data are frequently found when oxide forms at the joints of the electrical circuit wires or at the wire terminals of the meter, especially where the meter is installed in areas with high pollution.
In the training, the input and output data are obtained from the AMR system before the abnormal metering occurs. For example, the recorded input voltage data and current data of phases a, b, and c are illustrated in Figures 4 and 5, respectively. The recorded electrical energy consumption data, or target data, are shown in Figure 6.
The input data are used for MLR in order to confirm the accuracy in comparison with PCR. The estimation error results for irregularities in the form of abnormal input voltage data of phases a, b, and c are shown in Figure 7.
The estimation error results using MLR in the case of abnormal input current data of phases a, b, and c are shown in Figure 8.
The same output and input data used for training with MLR are modeled in the proposed PCR using the MATLAB program. The 5 PCs matrix for the input training data in the case of an abnormal voltage of any phase is shown in Table 2. The 3 PCs matrix for the input training data used to create an MLR model in the case of abnormal input current data of any phase is shown in Table 3. The parameters of the regression matrix W and the estimation error matrix ε are shown in Table 4; they are used as the weight and bias of the testing model, which estimates the output testing data in the case of abnormal input voltage data of any phase.
Also, the parameters of the regression matrix W and the estimation error matrix ε for estimating the output testing data in the case of abnormal input current data of any phase are shown in Table 5.
The testing data obtained from the AMR consist of the voltage and current data of phases a, b and c, as shown in Figures 9 and 10, respectively. The measured electrical energy consumption for testing is shown in Figure 11.
The input and output data for testing, measured at other times, are used to confirm the accuracy of the estimation. According to the results in Figures 12 to 17, the estimated values are close to the measured values. They show that the proposed energy estimation method offers satisfactory performance.
For the MAPE results of the proposed PCR, as shown in Table 6, the MAPE is 4.48% in the case of abnormal input voltage data of phase a, 3.55% in the case of abnormal input voltage data of phase b, and 3.23% in the case of abnormal input voltage data of phase c, for which the MAPE is equal to that of the MLR application.
In addition, SLR uses the single best input current of the model (which is not the abnormal current) to estimate the output energy data in all cases of abnormal input voltage data; the corresponding MAPE values are listed in Table 6.

Table 4. W and ε matrix of PCR for the simulated model in each case of abnormal input voltage data.

For the MAPE results of the proposed PCR, as shown in Table 7, the MAPE is 4.58% in the case of abnormal input current data of phase a, 3.75% in the case of abnormal input current data of phase b, and 8.03% in the case of abnormal input current data of phase c. The MAPE of the proposed PCR is the lowest among the applications compared.
For the MAPE results of MLR, the normal currents and voltages of all phases are used to estimate the output energy data in the case of abnormal input current data of any phase; the MAPE is 5.00% in the case of abnormal input current data of phase a, 5.21% in the case of abnormal input current data of phase b, and 9.22% in the case of abnormal input current data of phase c.
For the MAPE results of SLR, the single best input current of the model (which is not the abnormal current) is used to estimate the output energy data in all cases of abnormal input current data; the MAPE is 13.86% in the case of abnormal input current data of phase a, and 10.06% in the case of abnormal input current data of phases b and c.
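The MAPE values reported above for the abnormal-current cases can be compared directly with a few lines of Python. Only the figures quoted in the text are used; the dictionary layout and variable names are illustrative.

```python
# MAPE (%) quoted in the text for abnormal input current data of each phase.
mape_current = {
    "PCR": {"a": 4.58, "b": 3.75, "c": 8.03},
    "MLR": {"a": 5.00, "b": 5.21, "c": 9.22},
    "SLR": {"a": 13.86, "b": 10.06, "c": 10.06},
}

# Method with the lowest MAPE for each phase.
best = {
    phase: min(mape_current, key=lambda method: mape_current[method][phase])
    for phase in "abc"
}

# Gap between MLR and PCR in percentage points.
gap_vs_mlr = {
    phase: round(mape_current["MLR"][phase] - mape_current["PCR"][phase], 2)
    for phase in "abc"
}
print(best, gap_vs_mlr)
```

On these figures PCR has the lowest MAPE for every phase, with the gap to MLR ranging from 0.42 to 1.46 percentage points.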

CONCLUSION
This paper has proposed a new application of PCR for estimating electrical energy consumption in the case of abnormal metering in an AMR system. The error results show the effective performance of the proposed PCR application. The simulation results and the comparison between the proposed PCR, MLR and SLR show that the proposed application gives the lowest MAPE when the input voltage or current data of any phase are abnormal. By contrast, MLR estimates the output energy using 5 normal input data, and SLR using only the single best input data.

Figure 1. Structure of the AMR system.
In the regression model, $\varepsilon$ is the error, in which $\sigma^{2}$ is the variance of the observation. In the estimation step, a training part is used for model creation from the set of input/output variable pairs. With the estimated regression coefficient, the output variables can be predicted for the input variables. Generally, Equation 4 requires that $X^{T}X$ be invertible. However, this condition may not always be satisfied. To avoid the unclean input data for training and the non-invertibility problem of MLR, PCR replaces the input variables by principal components to estimate the output variables. In particular, PCR first projects $x$ onto a low-dimensional subspace, where $P$ stands for the eigenvectors of the subspace of dimension $k$, which can be obtained by eigenvalue decomposition techniques. Therefore, the solution reduces to a regression of the output variables on the principal components.

Table 1 notes: current of phases a, b, and c; 1st to 5th principal components; kWh is the electrical energy used in one hour.
Figure 3. Flowchart diagram for the proposed estimation. Flowchart boxes: input data (voltages and currents recorded every 15 minutes for the normal and abnormal data); output data (the electrical energy consumption); divide data into training and testing patterns; use principal component analysis to determine the principal components that replace the independent variables for estimating the dependent variables; training (calculate output of training pattern; record weights and biases); select abnormal testing model; testing (calculate output of testing pattern; calculate and record MAPE of testing pattern); end.

Figure 4. The input voltage data of the training.

Figure 5. The input current data of the training.
The estimation is made using the proposed PCR application and the other applications. The estimated energy results and the estimation error of the proposed PCR in the case of abnormal input voltage data of phases a, b, and c are shown in Figures 12 to 14, respectively. Figures 15 to 17 show the estimated energy consumption and the estimation error of the proposed PCR in the case of abnormal input current data of phases a, b and c, respectively.

Figure 7. The output estimated energy errors by MLR for the abnormal input voltage data of each phase (a, b, c).
Table 3. 3 PCs matrix of the simulated model using PCR in each case of abnormal input current data.

Table 5. W and ε matrix of PCR for the simulated model in each case of abnormal input current data.

Figure 16. The estimated output data by the proposed PCR for the abnormal current of phase b and the corresponding error compared with the measured output.

Table 1. Parameters of the model simulation using the proposed application.

Table 2. 5 PCs matrix of the simulated model using PCR in each case of abnormal input voltage data.

Table 6. MAPE comparison of the proposed PCR application with MLR and SLR in case of abnormal input voltage data at any phase.

Table 7. MAPE comparison of the proposed PCR application with MLR and SLR in case of abnormal input current data at any phase.