Knowledge discovery from consumer behavior in electronic home appliances market in Chennai by using data mining techniques

The global economy is growing every year, and in the coming decades marketers entering new national markets will need to understand how data mining techniques can reveal consumer behavior, which will be vital for consumer research. The sensitivity of available data mining methods to outlying measurements in the observed data is discussed as a major drawback of existing methods. The psychological and social processes involved in consumer behaviour form the subject matter of this study. A positivist approach, which studies cause and effect in consumer behaviour, is combined with an interpretive emphasis on understanding the emotional, non-rational aspects of the process. The scope of this paper is to: (1) provide knowledge discovery in consumer behavior, and (2) provide experience in applying k-means data mining techniques and consumer behavior concepts to marketing management decisions. The methodology uses systematic sampling and a prepared questionnaire to discover knowledge from consumer behaviour, predominantly through data mining. By extracting hidden predictive information from large databases, organizations can recognize valuable customers, predict future behaviors, and make practical, knowledge-driven decisions. The study is based on market segmentation, in which retailers realize that they can no longer sell whatever they bought but have to begin competing for their business. This paper proposes k-means clustering methods and dendrograms suitable for the analysis of data in management applications.


INTRODUCTION
Chennai is one of the largest metropolitan cities in South India, with a huge electronics market, including electronic goods imported from Singapore, Malaysia and Japan, that flourishes across the city. Since 2011, the Chennai consumer durables industry has seen substantial development owing to globalization policies in India. Globalization and import policies have made Chennai a predominant location for the electronics market. A changing standard of living, higher disposable income coupled with greater affordability, and a surge in advertising have been instrumental in bringing about a major change in consumer behavior patterns. The increase in disposable income is supported by a rise in the number of dual-income nuclear families.
*Corresponding author. E-mail: vijayambaphd@gmail.com.
Stable income gains, consumer financing and hire-purchase schemes have become major drivers of the Chennai consumer durables industry. For luxury consumer goods, such as high-end home appliances like colour (LED) televisions, refrigerators, washing machines and split air conditioners, retailers join with banks and financing companies to market their goods persistently. Superior technology and growing competition have narrowed the price gap between durable goods. Several global players such as Sony, Whirlpool, Philips, Samsung and LG are well established in the consumer durables sector in India, with competition from strong Indian players such as Voltas, Videocon, Bajaj Electricals, Blue Star, Carrier, Godrej and MIRC Electronics.
In management applications, data mining is often performed with the aim of extracting information relevant for making predictions and/or decisions, where decision making can be described as selecting an activity or series of activities among several alternatives. Decision making involves uncertainty as one of the aspects that influence the outcome. The development of computer technology has made it possible to implement partially or fully automatic decision support systems, which are complex systems that assist the decision-making process and can compare different options in terms of their risk. These systems can solve a variety of complex tasks, analyze different information components, extract information of different types, and deduce conclusions for management, financial or econometric applications (Gunasekaran and Ngai, 2012; Brandl et al., 2006), making it possible to find the best available decision within the framework of evidence-based management (Briner, 2009). Unlike classical statistical procedures, data mining methodology does not aim to generalize its results beyond the data summarized in a given database. In data mining, one does not usually assume a random sample from a certain population, and the data are analyzed and interpreted as if they constituted the whole population. The work of a statistician, on the other hand, is often based on survey sampling, which also requires designing questionnaires, training interviewers, working with databases, aggregating the data, computing descriptive statistics and estimating nonresponse. Moreover, a statistician commonly observes a smaller number of variables on relatively small samples, which is another difference from the data mining context.
Data mining tools and techniques answer business questions about the past by mining the available data, and they make customer relationship management achievable. Data mining software offers various techniques, each with its own advantages and challenges for different types of applications. One example is a robust version of k-means cluster analysis tailor-made for high-dimensional applications (Gao and Hitchcock, 2010).

Review of literature
The consumer is king in the age of consumerism (McGuire, 2000). Consumer behaviour is defined as "the behaviour of consumers in deciding to buy or use or not to buy or use or dispose or not to dispose of the products which satisfy their needs" (Schiffman and Kanuk, 1995; Chunawalla, 2000; Solomon et al., 2001). Purchase involvement has been one of the major issues in consumer behaviour because it can be a significant mediator of that behaviour and can fundamentally influence the consumer's evaluation processes on certain objects (Mitchell, 1981). The last few years have witnessed growing demand for different consumer products (Chunawalla, 2000). The increase in demand is a result of an increase in people's income, and in discretionary income too (Arora, 1995). Researchers have made significant efforts to classify and explain purchase involvement. Cohen (1983) highlights the plurality of views that coexist in the relevant literature regarding its meaning. Indeed, it is a concept often described as a pot-pourri of ideas (Laurent and Kapferer, 1985; Mittal and Lee, 1989). Some definitions emerging in the literature provide further clarification and illustrate its application to related concepts, such as motivation, goals and personality. Mittal and Lee (1989) summarize some of the most practical involvement definitions, including: "Involvement is said to reflect the extent of personal relevance of the decision to the individual in terms of her basic values, goals and self-concept" (Engel et al., 1982); "Involvement is an internal state variable and indicates the amount of arousal, interest or drive evoked by a particular stimulus or situation" (Mitchell, 1979, 1981). Two definitions have been adopted as most suitable for this research. First, Rothschild (1984) developed a generic definition which integrates involvement with other variables which either determine it, or are determined by it, as follows: "Involvement is a state of motivation, arousal or interest. This state exists in a process. It is driven by current external variables (the situation; the product; the communications) and past internal variables (enduring; ego; central values). Its consequences are types of searching, processing and decision-making." The second is the definition of Mittal and Lee (1989), who directly relate involvement to a goal object, and thus to needs, motives and benefits.
"Data mining" is defined as a sophisticated data search capability that uses statistical algorithms to discover patterns and correlations in data (Harry Newton). Data mining discovers patterns and relationships hidden in data (Edelstein, 1995), and is part of a larger process called "knowledge discovery", which describes the steps that must be taken to ensure meaningful results. The purpose of data mining is to try to extract all meaningful knowledge from the shape of data (Esling and Agon, 2012). Data mining software does not, however, eliminate the need to know the business, understand the data, or be aware of general statistical methods. A common disadvantage of popular data mining methods (linear regression, classification analysis, machine learning) is their high sensitivity (non-robustness) to the presence of outlying measurements in the data. Data mining does not find patterns and knowledge that can be trusted automatically without verification (Kalina, 2013). Data mining helps business analysts to generate hypotheses, but it does not validate the hypotheses.

Objectives of the study
1. To assess the factors influencing knowledge discovery from the consumer behaviour of the respondents.
2. To group the respondents into groups based on the factors of active participation in purchasing electronic products, using data mining techniques.
3. To study the impact of consumer behaviour in purchasing involvement on the influence of data mining techniques in family purchase decisions of durable goods.

Hypotheses
- There is no significant relationship between demographics and consumer behaviour in knowledge discovery.
- There is a relationship between consumer behaviour and purchasing persuasion.

METHODOLOGY
Data mining can be characterized as a process of information or knowledge extraction from data sets, which leads to revealing and investigating systematic associations among individual variables.
The most common clustering methods are hierarchical clustering and k-means clustering (Vintr et al., 2012).
The best market conditions for successful segmentation seem to depend on three factors: the potential, accessibility and size of the prospective segment. In addition, the study adopts five forms of data segmentation: geographic, demographic, psychological, segmentation by use and segmentation by benefit. The study was conducted in Chennai city, in the state of Tamil Nadu, India, taking the database from various electronic home appliances retail shops in each zone, namely the Central, North, West and South zones. The researcher adopted a cluster (group) sampling procedure for the data collection. The entire population was divided into Central, North, South and West Chennai based on geographical location, using the customer directory as the source (Table 1). For each selected store in each part of Chennai city, all possible areas were identified. Among them, a few areas were selected using the systematic sampling method, covering 50% of the areas from each cluster (Table 1). From each selected area, the required number of respondents was selected by judgment (purposive) sampling, using common criteria such as reference groups, subject knowledge, occupational status and willingness to cooperate with the study. The prepared questionnaires were distributed among the respondents visiting the shops for the purpose of the survey. The respondents were chosen through friends and relatives and by using the customer database, including telephone numbers, as a source for identification. Of the 500 respondents contacted, only 417 returned usable questionnaires, owing to incompleteness and other survey difficulties. Consumer behaviour in purchasing involvement was analyzed from these 417 usable questionnaires.
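The systematic sampling step can be illustrated with a minimal sketch. It assumes that covering 50% of the areas corresponds to taking every second area after a random start; the function name `systematic_sample` and the area labels are hypothetical and are not taken from the study.

```python
import random


def systematic_sample(units, fraction=0.5, seed=None):
    """Select every k-th unit after a random start, with k = round(1 / fraction)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    step = max(1, round(1 / fraction))   # sampling interval, 2 for a 50% sample
    rng = random.Random(seed)
    start = rng.randrange(step)          # random start within the first interval
    return units[start::step]


# Hypothetical area labels for one zone (illustrative only).
areas = [f"Area {i}" for i in range(1, 11)]
selected = systematic_sample(areas, fraction=0.5, seed=1)
print(selected)
```

With ten areas and a sampling interval of two, five areas are selected whichever start is drawn.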
In hierarchical cluster analysis, dendrograms are used to visualize how clusters are formed. The methodology of this paper involves dendrograms and cluster analysis by k-means.

The questionnaire
Before the survey was administered, the questionnaire was pre-tested with a small group of customer respondents, and the results were satisfactory. The first part of the questionnaire consisted of questions relating to consumer demographics, namely age, education, income, type of family and family life cycle stage. The second part consisted of questions relating to consumer behaviour and purchasing involvement assessment. For this purpose, the popular Purchasing-Involvement Scale developed by Slama and Taschian (1985) was used. A five-point Likert scale (1 = Strongly disagree, 2 = Disagree, 3 = Not certain or undecided, 4 = Agree, 5 = Strongly agree) was used to assess the personality of the respondent. The third part related to the measurement of the purchase influence of the various family members at each stage of the product purchase decision-making process. Consumer durables such as televisions, refrigerators and washing machines were chosen for this study because they are expensive products and because they are the most common products, used by almost every household.

Analysis and interpretation
The analysis was carried out in two steps. In the first step, the respondent demographics and socio-economic characteristics were profiled, and in the second step, the responses for consumer behaviour and purchasing involvement assessment were analyzed through factor analysis.
Table 2 shows that the sample is representative and includes respondents of various age groups, different

Consumer behaviour and purchasing involvement assessment
Factor Analysis by principal component is applied on 15 variables of consumer behavior and purchasing  nor discount and bargained offers.

Factors of consumer behaviour of respondents in purchase involvement
The principal component method has reduced the fifteen variables to five predominant factors for respondents in purchasing involvement. At this point it is essential to classify respondents based on their perception of the five predominant factors. Two-step hierarchical cluster analysis, a dendrogram and an agglomeration schedule were used to determine the number of clusters in the sample. A dendrogram is used to evaluate the cohesiveness of the clusters formed and provides information about the appropriate number of clusters to retain. The agglomeration schedule shows the cases or clusters combined at every stage, the distances between the cases or clusters being combined, and the last cluster level at which a case joined the cluster. The results of all three methods justified the presence of three clusters of respondents based on purchasing involvement. K-means cluster analysis is used in this context to identify the existence of distinct groups of respondents. ...,param1,val1,param2,val2,...)
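How an agglomeration schedule arises can be shown with a short sketch. It assumes average linkage with Euclidean distance, which the paper does not specify, and uses six hypothetical two-dimensional factor scores rather than the study data; the function names are illustrative.

```python
import math
from itertools import combinations


def average_linkage(points, members_a, members_b):
    """Mean Euclidean distance over all pairs drawn from two clusters."""
    total = sum(math.dist(points[i], points[j])
                for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def agglomeration_schedule(points):
    """Merge the two closest clusters until one remains.

    Returns a list of (stage, cluster_a, cluster_b, distance) tuples.
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    schedule = []
    next_id = len(points)
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        distance, a, b = min(
            (average_linkage(points, clusters[a], clusters[b]), a, b)
            for a, b in combinations(sorted(clusters), 2)
        )
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        schedule.append((len(schedule) + 1, a, b, distance))
        next_id += 1
    return schedule


# Hypothetical factor scores for six respondents (illustrative only).
points = [(1.0, 1.0), (1.2, 0.9), (4.0, 4.1), (4.2, 3.9), (8.0, 1.0), (8.1, 1.2)]
schedule = agglomeration_schedule(points)
for stage, a, b, distance in schedule:
    print(f"stage {stage}: merge {a} and {b} at distance {distance:.3f}")
```

In this toy data the first three merges occur at small distances and the fourth at a much larger one; a jump of this kind in the schedule, or in the heights of a dendrogram, is the usual indication of how many clusters to retain (three here).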

Description of Kmeans
IDX = kmeans(X,k) partitions the points in the n-by-p data matrix X into k clusters. This iterative partitioning minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. Rows of X correspond to points, columns correspond to variables. kmeans returns an n-by-1 vector IDX containing the cluster indices of each point. By default, kmeans uses squared Euclidean distances. When X is a vector, kmeans treats it as an n-by-1 data matrix, regardless of its orientation.
[IDX,C] = kmeans(X,k) returns the k cluster centroid locations in the k-by-p matrix C.
[IDX,C,sumd,D] = kmeans(X,k) returns distances from each point to every centroid in the n-by-k matrix D.
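The iterative partitioning described above can be sketched in plain Python. This is an illustrative re-implementation of the basic (Lloyd) iteration with squared Euclidean distances, not the routine used in the study; the `init`, `max_iter` and `seed` parameters and the data are assumptions, and cluster indices start at 0 rather than 1.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(X, k, init=None, max_iter=100, seed=0):
    """Return (idx, C, sumd, D) for the n-by-p data X and k clusters.

    idx  : cluster index of each point (0-based)
    C    : k centroid locations
    sumd : within-cluster sums of point-to-centroid distances
    D    : n-by-k distances from each point to every centroid
    """
    rng = random.Random(seed)
    # Use the supplied starting centroids, or pick k distinct points at random.
    C = [list(c) for c in (init if init is not None else rng.sample(X, k))]
    idx = None
    for _ in range(max_iter):
        D = [[sq_dist(p, c) for c in C] for p in X]
        new_idx = [row.index(min(row)) for row in D]   # nearest centroid
        if new_idx == idx:                             # assignments stable
            break
        idx = new_idx
        for j in range(k):
            members = [p for p, i in zip(X, idx) if i == j]
            if members:                                # keep old centroid if empty
                C[j] = [sum(col) / len(members) for col in zip(*members)]
    D = [[sq_dist(p, c) for c in C] for p in X]
    sumd = [sum(D[n][j] for n in range(len(X)) if idx[n] == j) for j in range(k)]
    return idx, C, sumd, D


# Hypothetical factor scores for six respondents (illustrative only).
X = [(1.0, 1.0), (1.2, 0.9), (4.0, 4.1), (4.2, 3.9), (8.0, 1.0), (8.1, 1.2)]
idx, C, sumd, D = kmeans(X, 3, init=[X[0], X[2], X[4]])
print(idx)
```

The starting centroids are fixed here so that the result is reproducible; with random starts the iteration can settle in a poorer local minimum, which is why several restarts are commonly used.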
The following are the results of the k-means cluster analysis (Table 4 and Figure 4). The final cluster centers table shows the mean values for the three clusters, which reflect the attributes of each cluster. For example, the mean values of Carefulness and Undisturbed for the first cluster are 3.42 and 2.16 respectively. This means that the first cluster of respondents gives high importance to Carefulness and less importance to Undisturbed. It is also evident from Table 4 that no particular factor is heavily loaded on any particular cluster segment. The rank of the clusters on every factor is also given in Table 4 (Figures 1, 2 and 3). The description of all three clusters, along with their labels, is as follows.

Cost conscious respondents
The first cluster has a low mean value of 2.69. It is ranked third in factors such as Carefulness, Shrewdness, Cost Consciousness, Triviality and Undisturbed. This means that respondents in this segment are not driven by cost, consider judicious shopping one of the main concerns in life, and are always concerned about making the best shopping deals. They attach great importance to brand names. They also look for the best value for money. Hence, this group can be designated as cost conscious respondents.

Economic cognizant respondents
The second cluster has a lower moderate mean value of 2.74 and is ranked first in factors such as Cost Consciousness. It is ranked third in Undisturbed. This means that respondents in this segment are cautious consumers who spend a lot of time and effort on buying a product at the cheapest possible price, irrespective of the brand name. They even decide to buy and then wait for offers and discounts.

Luxury seeking respondents
The third cluster has a mean value of 3.08 and is ranked first in the mean value of factors such as Carefulness, Triviality and Undisturbed. It ranks third in terms of Shrewdness. This means that these respondents regard shopping wisely as a rather petty issue compared with thinking about how to make more money. They are not interested in discount offers and bargaining. They do not like to worry about making the best deal when they go shopping, and they like to spend money as they please. Hence, this cluster can be designated as luxury seeking respondents.
The final cluster centers table shows that the three clusters differ in the mean value of all five factors (Table 5 and Figure 5). The ANOVA table indicates that the mean values differ significantly among the three clusters. All five factors contribute significantly to dividing respondents into three segments based on purchasing involvement. The number of cases in each cluster table indicates that 155 of the 417 respondents are in cluster I (cost conscious respondents), followed by 139 respondents in cluster II (economic cognizant respondents) and 123 respondents in cluster III (luxury seeking respondents) (Table 6). This means that 37% of respondents are cost conscious consumers and 33% are economic cognizant consumers. This implies that respondents are more influenced by cost and economy in durable purchases.
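The segment shares quoted above follow directly from the cluster sizes; a small check is sketched below, with variable names that are illustrative only.

```python
# Cluster sizes as reported in the text (Table 6).
cluster_sizes = {
    "Cost conscious": 155,
    "Economic cognizant": 139,
    "Luxury seeking": 123,
}

total = sum(cluster_sizes.values())  # 417 usable questionnaires
shares = {name: round(100 * n / total, 2) for name, n in cluster_sizes.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Rounded to two decimals, the shares are 37.17%, 33.33% and 29.50%; truncating the third share instead of rounding gives 29.49%.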

Relationship between consumer behavior in purchasing involvement segments and purchase decision making of durables
It is essential to analyze the factors that determine segmentation based on consumer behaviour in purchasing involvement (Figure 6). Chi-square analysis was carried out to find out whether consumer behaviour in purchasing involvement has an impact on the influence of respondents at the various stages of the purchase decision-making process (Table 7 and Figure 7).
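The chi-square test of association used here can be sketched as follows. The contingency table is hypothetical: its row totals match the three segment sizes, but the column categories and cell counts are invented for illustration and are not the values behind Table 7.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: rows are the three segments, columns are three
# assumed decision-maker categories (for example husband, wife, joint).
observed = [
    [60, 50, 45],   # cost conscious (155)
    [40, 55, 44],   # economic cognizant (139)
    [30, 38, 55],   # luxury seeking (123)
]
stat, dof = chi_square_statistic(observed)
CRITICAL_5_PERCENT_DF4 = 9.488  # tabulated chi-square critical value, 4 d.f.
print(f"chi-square = {stat:.3f}, d.f. = {dof}, "
      f"significant at 5%: {stat > CRITICAL_5_PERCENT_DF4}")
```

The statistic is compared with the tabulated critical value for the corresponding degrees of freedom; a value above it indicates an association between segment membership and the decision-making role.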

Conclusion
This paper fills a gap in statistical methodology for data mining by introducing k-means clustering methods and dendrograms. Data mining methods are routinely used by practitioners in everyday management applications. This paper argues that consumer behaviour is a crucial aspect of data mining that has received only little attention so far, although the sensitivity of data mining methods to outlying observations has been repeatedly reported as a serious problem (Wong, 1997; Yeung et al., 2010). The study has assessed five key consumer behaviour factors of respondents, namely Carefulness, Shrewdness, Cost Consciousness, Triviality and Undisturbed. Based on these key consumer behaviour dispositions, Chennai respondents can be divided into three segments: cost conscious respondents (37.17%), economic cognizant respondents (33.33%) and luxury seeking respondents (29.49%). This implies that Chennai respondents are preoccupied with the value of the product and the savings made through smart purchases. The association between demographics and consumer behaviour clusters was found to be significant only in the case of occupation, which means that there is a major difference between housewives and working wives in their consumer behaviour in terms of purchasing involvement. The reason for this may be related to their contribution to the family income as well as to their exposure to mass media and society. The findings of this study through data mining techniques may result from the fact that respondents are contributing more and have thus expanded their power. Another viewpoint is that decision making might be influenced by the relative expertise of individuals (French and Raven, 1959). It is possible that, as more respondents join the labour force and engage in other activities outside the home, their skill in different areas has increased. In earlier periods, the spouse may have been considered the "authority" in many of the decision areas; contemporary respondents may have gained knowledge and experience as well as confidence in these areas. More experience outside the home may lead to increased proficiency in the product areas. Respondents are more or less as knowledgeable as their spouses in most of the decision-making areas. This increased capability has strengthened their role in decision making and in the family purchase decision-making process for home appliances. This paper introduces new tools for clustering analysis using an intuitively clear requirement to down-weight less reliable observations. The dendrogram and k-means clustering help to discover the knowledge factor from the selected questionnaires, and k-means has played a predominant role in knowledge discovery from the consumer behaviour analysis; there are further opportunities for research in other areas of data mining tools and techniques. The k-means approach still requires extensive evaluation with the aim of detecting possible overfitting by means of cross-validation or bootstrap methodologies.

Figure 6. Criteria segments discriminant plot for consumer behaviour in purchasing involvement.

Figure 7. Association between purchasing involvement segments and purchase decision making of washing machines.

Table 1. Electronic home appliances stores: zone-wise division of Chennai City.
types of occupation. The typical respondent in this study is a working graduate in the age group of 31 to 40 years, with a monthly family income of Rs. 10001 to Rs. 20000, married, and living in a nuclear family with two children. Table 2 also shows an almost equal representation of respondents from all the stores of the four zones of Chennai city, and the sample is hence representative of the Chennai population.

Table 3. Factor analysis results.

Table 4. Results of K-means cluster analysis.

Table 5. ANOVA (Analysis of Variance) for the factors of consumer behaviour in purchasing involvement.

Table 6. Number of cases in each cluster.

Table 7. Chi-Square value for association between purchasing involvement segments and purchase decision making of washing machines.
Figure 5. Results of analysis of variance.