Clusters and factors associated with complementary basic education in Tanzania mainland

Complementary Basic Education in Tanzania (COBET) is a community-based programme initiated in 1999 to give over-aged children (children above school age) an opportunity to access the formal education system. The COBET programme was analysed using secondary data collected from 21 regions from 2008 to 2012. Cluster analysis was applied to classify the 21 regions in terms of enrolment by cohort, dropout, gender and regional per capita Gross Domestic Product (GDP). The cluster analysis grouped the 21 regions into four distinct clusters: the first cluster contained nine regions, the second four regions, the third seven regions and the fourth only one region. The clusters differ from one another, with cluster four (Dar es Salaam region) having the minimum dropout and cluster two (Kilimanjaro, Mbeya, Arusha and Iringa regions) the minimum enrolment among all clusters. The study concluded that enrolment by cohort, dropout, gender, regional per capita GDP and time in years can be used to classify regions into four distinct clusters. Among the factors associated with enrolment and dropout in COBET centres, time in years, cohort (age) and cluster were statistically significant at the 0.05 level. This study recommends that new plans be initiated based on these classifications to make the programme sustainable, and that a new tracking system be set up to follow up COBET students after they complete their studies.


INTRODUCTION
The Sustainable Development Goals (SDGs) emphasize estimates based on national aggregates as well as on variation across population groups defined by group and individual characteristics. According to the fundamental policy, non-formal education is generalized as out-of-school education, as distinguished from formal education, which is obtained in schools. However, either type may include, at certain stages, some aspects of the other. Because some over-aged children cannot be enrolled in formal primary school, Complementary Basic Education was established.
According to UNICEF (2006), Complementary Basic Education in Tanzania (COBET, or its Kiswahili equivalent MEMKWA) was a programme initiated in 1999 to provide an opportunity to acquire basic education to out-of-school children aged 8 to 18 years. Johnson et al. (2005) show that the programme was initiated with a special focus on girls, orphans and vulnerable children, who follow a specialized three-year course of study.
In Tanzania, as in most Sub-Saharan African countries, priority in education favours male children over female children. A UNICEF (2015) report revealed that female children continue to be severely disadvantaged by exclusion from education systems despite positive progress in recent years. According to Segumba (2015), dropout is also higher among female than male children. The UNICEF (2006) report defined Orphans and Vulnerable Children (OVC) as children who are at risk of missing school, come from households with poor food security, suffer from anxiety and depression, and are at higher risk of exposure to human immunodeficiency virus infection and acquired immune deficiency syndrome (HIV/AIDS).
According to Segumba (2015), the number of out-of-school children increased because of dropout caused by sickness, pregnancy, lack of food in the household, forced labour, fear of teachers, excessive corporal punishment, overcrowded classrooms, ineffective teaching, persistent poor performance, long distances to school, lack of food provision in school and poor administration. These factors made the demand for COBET higher than the availability of centres, which are known to suffer from limited resources; as a result, most of them have closed. Dropping out of school has also emerged as a major threat to achieving the Education for All (EFA) goals, because it threatens the very fabric of education in terms of the inputs and outputs of its structure, organization and provision.
That is why an assessment of COBET was vital. Based on the work of Ngodu (2010), the dropout rate was highest in Standard/Grade III-IV in 2008/09, and the average dropout rate increased from 3.4% in 2005/06 to 3.7% in 2008/09. Most studies of COBET focused on pilot districts, used qualitative techniques and selected small samples with no statistical justification (for example, Levira (2002) and Michael (2008)); this study therefore employs quantitative methods. The UNICEF (2015) report also shows that education outcomes vary by region; for example, sub-Saharan Africa has the lowest gender parity proportion of all regions.
According to the Tanzania Development Report (2014), the Kilimanjaro, Arusha and Dar es Salaam regions are more developed than all other regions. Likewise, according to the Tanzania Development Report (2014) and the UNICEF (2011) report, dropout and enrolment vary regionally.

Edwin 495
This, therefore, calls for a study that assesses the number of students enrolled across the regions by triangulating COBET data (that is, dropout, age, year, gender and enrolment) with regional per capita Gross Domestic Product (GDP). The study also identifies the factors associated with COBET enrolment and dropout so that policy makers are made aware of them and can take the measures needed to reduce dropout and increase enrolment.

Research objectives
The main objective of this study was to classify Tanzanian regions by triangulating the number of COBET enrolments by cohort (age), dropout, gender, year and regional per capita GDP. Moreover, the study examined the factors associated with the number of enrolments and dropouts in COBET centres, using secondary data collected from the Ministry of Education and Vocational Training (MoEVT) and regional per capita GDP data collected from the National Bureau of Statistics (NBS).

MATERIALS AND METHODS
Given the diversity of the student populations' needs, as well as of teachers' availability, a country-wide evaluation was performed to classify regions based on total enrolment by cohort (age), dropout, year, gender and regional per capita GDP from 2008 to 2012. Cluster analysis was used to group regions with similar characteristics. The clusters formed were then analysed to identify the variation between them, after which the factors associated with enrolment and dropout were analysed.

Summary statistics
Table 1 shows that enrolment and dropout for both males and females declined over time, being highest in 2008 and lowest in 2012. Although, according to Johnson et al. (2005), COBET was introduced to favour girls over boys, the results show that male (boys') enrolment was higher than that of girls in all years.

Cluster analysis
The main objective of cluster analysis is to discover natural groupings of items or variables. Hierarchical cluster analysis is a major statistical method for finding relatively homogeneous clusters of cases based on measured characteristics. It starts with each case as a separate cluster, i.e., there are as many clusters as cases, and then combines clusters sequentially, reducing the number of clusters by one at each step until only one cluster is left.
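The agglomerative procedure described above can be sketched in a few lines of Python. This minimal sketch uses Ward's criterion (the linkage applied later in this study): at each step it merges the pair of clusters whose union gives the smallest increase in total within-cluster sum of squares. The function names and the toy points are hypothetical, not the regional data; the sketch does not standardize variables, and its merge costs are sums of squares, whereas statistical packages may report dendrogram heights on a different scale.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if clusters a and b merge.

    Uses the closed form n_a * n_b / (n_a + n_b) * ||centroid(a) - centroid(b)||^2.
    """
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_agglomerate(points, k):
    """Start with one cluster per case and merge until k clusters remain.

    Returns (clusters, merges): clusters holds lists of point indices, and
    merges records (left_indices, right_indices, cost) for every merge step.
    """
    clusters = [[i] for i in range(len(points))]  # as many clusters as cases
    merges = []
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            cost = ward_cost([points[m] for m in clusters[i]],
                             [points[m] for m in clusters[j]])
            if best is None or cost < best[0]:
                best = (cost, i, j)
        cost, i, j = best
        merges.append((clusters[i], clusters[j], cost))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
    return clusters, merges


# Hypothetical two-variable profiles (not the study's data).
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
groups, history = ward_agglomerate(toy, k=3)
```

Here the two nearby pairs merge first at the smallest cost, and the distant fifth point remains a cluster on its own, much as an outlying region would. Running with `k=1` continues until a single cluster is left, which is the full merge sequence a dendrogram draws.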
According to Antonenko et al. (2012), cluster analysis is an important technique for examining data in educational research. Johnson and Wichern (1992) describe how the data are grouped on the basis of similarities. In its most general … Romero and Ventura (2007).
Cluster analysis and k-means analysis can be used as data mining techniques, and they can be applied in education, an area that differs from the usual domains of data mining studies. The research conducted by Erdoğan and Timor (2005), which focused on cluster analysis, reveals that using this technique in education may yield more varied and significant findings and may lead to an increase in the quality of education.
For this study, cluster analysis was used to group regions with similar characteristics in terms of COBET enrolment by cohort, year, dropout and regional per capita GDP. The clusters formed were then further analysed to determine the variation between clusters, as well as whether cluster membership is a determinant of COBET dropout and enrolment.

Poisson regression model
In basic education research, one often encounters situations where the outcome variable is numeric but takes the form of counts. Poisson regression models are often used to model count data. As Kutner et al. (2005) show, Poisson regression models are appropriate for count data because they describe the dispersion of the dependent variable around its expected value with a probability distribution suited to variables that take only non-negative integer values.
Daniel (2008) also supports the idea of Kutner et al. (2005) that a Poisson variate can take any non-negative integer value. The Poisson regression model is a nonlinear model in which the expected response is a count. The Poisson distribution is characterized by a parameter $\mu$, and the probability that the variable $Y$ equals the value $y$ is given by

$$P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots \qquad (1)$$

The Poisson mean in generalized linear models (GLMs) is commonly modelled using a log link, $\log \mu = \mathbf{x}'\boldsymbol{\beta}$. For this model, the mean satisfies the exponential relationship

$$\mu = \exp(\mathbf{x}'\boldsymbol{\beta}).$$

In this study, robust standard errors were used, as recommended by Cameron and Trivedi (2009).
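To make the log-link model and the robust standard errors concrete, the following is a minimal pure-Python sketch that fits $\log \mu_i = \mathbf{x}_i'\boldsymbol{\beta}$ by Newton-Raphson and computes Huber-White (sandwich) standard errors of the kind Cameron and Trivedi (2009) recommend. It is not necessarily how the study's software computed them: the helper names and the toy counts are hypothetical, the design matrix is assumed to start with an intercept column, and no small-sample correction is applied (some packages apply one).

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = x_i' beta by Newton-Raphson and return (beta, robust_se).

    X: list of covariate rows whose first entry is 1.0 (intercept); y: counts.
    robust_se uses the sandwich A^-1 B A^-1 with A = X' diag(mu) X and
    B = sum_i (y_i - mu_i)^2 x_i x_i'; no small-sample correction.
    """
    p = len(X[0])

    def means(beta):
        # inverse of the log link: mu_i = exp(x_i' beta)
        return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]

    def information(mu):
        # Fisher information X' diag(mu) X
        return [[sum(r[j] * r[k] * m for r, m in zip(X, mu)) for k in range(p)]
                for j in range(p)]

    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the overall mean
    for _ in range(max_iter):
        mu = means(beta)
        score = [sum(r[j] * (yi - m) for r, yi, m in zip(X, y, mu)) for j in range(p)]
        a_inv = invert(information(mu))
        step = [sum(a_inv[j][k] * score[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break

    mu = means(beta)
    a_inv = invert(information(mu))
    meat = [[sum(r[j] * r[k] * (yi - m) ** 2 for r, yi, m in zip(X, y, mu))
             for k in range(p)] for j in range(p)]
    cov = [[sum(a_inv[j][a] * meat[a][b] * a_inv[b][k]
                for a in range(p) for b in range(p)) for k in range(p)]
           for j in range(p)]
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]


# Hypothetical counts for two groups (x = 0 and x = 1); not the COBET data.
X = [[1.0, 0.0]] * 3 + [[1.0, 1.0]] * 3
y = [2, 3, 4, 5, 7, 6]
beta, robust_se = fit_poisson(X, y)
```

With a single binary covariate the fitted coefficients reduce to logs of the group means (here $\log 3$ for the intercept and $\log 2$ for the group effect, since the group means are 3 and 6), which gives a quick check on the fitting loop.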

FINDINGS AND DISCUSSION
The main aim of this study was to classify regions based on common characteristics. Hierarchical cluster analysis was performed using Ward's method to partition the data points into disjoint groups, so that data points belonging to the same cluster are similar while data points belonging to different clusters are dissimilar.
According to Dymnicki (2011), Ward's method is a clustering method that uses centroids to represent clusters and optimizes a squared-error function: at each step it merges the pair of clusters whose union gives the smallest increase in the total within-cluster sum of squared distances to the cluster centroids. In this analysis, a dendrogram is presented to visualize the clusters formed based on regional per capita GDP, enrolment and dropout rate at the regional level.
Figure 1 shows the four distinct clusters formed. The distance represents the variation between clusters. First, clusters 1 and 2 merged at a Euclidean distance of approximately 4 units; the combined cluster (clusters 1 and 2) then merged with cluster 3 at approximately 8 units; and finally clusters 1, 2 and 3 merged with cluster 4 at approximately 25 units. Thus, there was great variation between the first three clusters (clusters 1, 2 and 3) and cluster 4 (Dar es Salaam region): the fourth cluster joined the others at a distance 17 units greater than the distance of 8 units at which the first three clusters merged.
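The reading of Figure 1 above can be expressed as simple arithmetic on the merge heights. This small sketch assumes only the three approximate top-level heights reported in the text (4, 8 and 25 units); the within-cluster merges that build each of the four clusters are not reported and are ignored, so the count starts at four groups. The function names are hypothetical.

```python
def groups_after_cut(n_groups, merge_heights, cut):
    """Groups left when a dendrogram joining n_groups groups is cut at height `cut`.

    Every merge that happens at or below the cut removes one group.
    """
    return n_groups - sum(1 for h in merge_heights if h <= cut)


def largest_gap(merge_heights):
    """Largest jump between successive merge heights, with the height it ends at."""
    hs = sorted(merge_heights)
    return max((b - a, b) for a, b in zip(hs, hs[1:]))


# Approximate Euclidean merge distances between the four clusters, read from Figure 1.
heights = [4, 8, 25]
```

A cut below 4 units (but above the unreported within-cluster merges) keeps the four clusters, while any cut between 8 and 25 units leaves only two groups: Dar es Salaam and the rest. The 17-unit jump from 8 to 25 is the largest gap between successive merges, which is why Dar es Salaam stands apart.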
The 21 regions were grouped into four clusters (Figures 1 and 2). The first and largest cluster consists of 9 regions, namely Manyara, Mara, Lindi, Mtwara, Ruvuma, Morogoro, Rukwa, Tanga and Mwanza. The second cluster consists of 4 regions, namely Kilimanjaro, Mbeya, Arusha and Iringa. The third cluster, the second largest, consists of 7 regions, namely Dodoma, Singida, Kagera, Shinyanga, Kigoma, Pwani and Tabora. The last and smallest cluster consists of Dar es Salaam region only. This classification places Dar es Salaam in a cluster of its own because it leads all regions in regional per capita GDP and has the highest literacy rate. The first, second and third clusters also show close relationships among regions in the same cluster in terms of regional per capita GDP, literacy rate, female-male literacy gap and urbanization, as is also supported by the NBS (2011) Tanzania Mainland report. These classification results were further supported by the Ministry of Education and Vocational Training (MoEVT) report, whose tests showed Dar es Salaam to differ from other regions in terms of illiteracy rate, female-male literacy gap and poverty levels.

Within- and between-group characteristics
Using Ward's method, the distances between clusters were examined. According to Loureiro, Torgo and Soares (2004), cluster analysis can also be used for outlier detection, with all the variables from the original data set used for the description. Cluster summary statistics showing the variation between clusters are presented in Figures 3 and 4.
Based on the clustering results, the smallest cluster, which consists of Dar es Salaam, is regarded as an outlier. UNICEF (2006) suggested several factors that may explain the characteristic features of this outlier. These include per capita income (deepening poverty), weather conditions, food insecurity, migration, lack of adequate educational facilities such as books, facilitators, classrooms and desks, and the willingness of guardians to take their children to Complementary Basic Education centres before or even after joining the centres. Moreover, the United Republic of Tanzania Vice President's report (2005) suggested urbanization as a cause of the outlier.
Figure 3 shows that enrolment for clusters 1 and 2 declined from 2008 to 2010 and then rose in 2011. Enrolment in cluster 3 declined in all years. No clear pattern was observed in cluster 4. Figure 4 shows that the dropout rate declined from one year to the next in all four clusters.

Multivariate Poisson regression model for enrolment
A preliminary analysis was done to check the relationship between each predictor and the response. All covariates were significant at the 5% level and were therefore included in the multivariate analysis. Table 2 presents the parameter estimates, together with the robust standard errors, of the final model.
The results show that cluster, cohort, regional per capita GDP and time in years were significant predictors of enrolment (p<0.05), whereas gender was not (p>0.05). The mean number of COBET enrolments varies from one cluster to another. Controlling for the other covariates in the model, the mean number of enrolments for cluster 3 (coefficient 0.3798) was higher than that for cluster 1, whereas the mean numbers for cluster 2 (-0.6682) and cluster 4 (-1.2618) were lower than that for cluster 1.
Cohort was another significant predictor of the mean number of enrolments: the model shows that mean enrolment increases with cohort. The results also show that COBET enrolment increases with regional per capita GDP. For time in years, the parameter estimate was -0.164; hence mean COBET enrolment decreases over the years. Gender was not statistically significant (p>0.05); the coefficient for females is -0.7169 (negative) with male students as the reference group, which implies that the estimated mean enrolment for males was higher than that for females (Table 2).
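Because the model uses a log link, each coefficient is a difference in log expected counts, and exponentiating it gives a rate ratio, holding the other covariates fixed. The short sketch below applies this to the coefficients quoted above, assuming cluster 1 and males are the reference categories and that time enters as a numeric year variable; the dictionary and function names are hypothetical.

```python
import math

# Coefficients (log scale) quoted in the text for the enrolment model;
# cluster 1 and males are the reference categories.
ENROLMENT_COEFFICIENTS = {
    "cluster 2": -0.6682,
    "cluster 3": 0.3798,
    "cluster 4": -1.2618,
    "year": -0.164,
    "female (not significant)": -0.7169,
}


def rate_ratio(coefficient):
    """Multiplicative change in the expected count for a one-unit change in the covariate."""
    return math.exp(coefficient)


def percent_change(coefficient):
    """Percentage change in the expected count implied by a log-link coefficient."""
    return (math.exp(coefficient) - 1.0) * 100.0


for name, coef in ENROLMENT_COEFFICIENTS.items():
    print(f"{name:>26}: rate ratio {rate_ratio(coef):.3f}, change {percent_change(coef):+.1f}%")
```

For example, $e^{-0.164} \approx 0.849$, so expected enrolment is roughly 15% lower with each additional year, and $e^{-1.2618} \approx 0.283$, so the expected enrolment in cluster 4 is about 28% of that in cluster 1, other covariates held fixed.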

Multivariate Poisson regression model for dropout
A preliminary univariate analysis was also done to check the relationship between each predictor and the response. All covariates were significant at the 5% level and were included in the multivariate analysis. In addition, the interactions between gender and cluster and between year and gender were associated with the mean number of dropouts. Some cluster-by-year interaction terms for individual clusters were also included in the model, because the overall interaction effect was significant and the model converged.
Table 3 presents the parameter estimates, together with the robust standard errors, of the final model. The results show that the mean number of dropouts varies from one cluster to another. Controlling for the other covariates in the model, the mean numbers of dropouts for cluster 2 (-0.5014) and cluster 4 (-1.4856) were lower than that for cluster 1, whereas that for cluster 3 (0.2439) was higher. The effect of cluster on the mean number of dropouts also depends on gender. The other significant predictors of the mean number of dropouts were cohort and year.
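The statement that the effect of cluster depends on gender can be written out for the log-link model. This is a sketch under assumptions the text does not state: dummy (treatment) coding with cluster 1 and males as reference categories, $C_k$ an indicator for cluster $k$, $F$ an indicator for females, and $\beta_k$, $\delta$ and $\gamma_k$ our own labels for the cluster, gender and cluster-by-gender coefficients (the remaining terms, including the year interactions, are abbreviated as $\cdots$):

$$\log \mu = \beta_0 + \sum_{k=2}^{4} \beta_k C_k + \delta F + \sum_{k=2}^{4} \gamma_k\, C_k F + \cdots$$

$$\frac{\mu(\text{cluster } k,\ \text{male})}{\mu(\text{cluster } 1,\ \text{male})} = e^{\beta_k}, \qquad \frac{\mu(\text{cluster } k,\ \text{female})}{\mu(\text{cluster } 1,\ \text{female})} = e^{\beta_k + \gamma_k}$$

Under this coding, the cluster coefficients quoted above describe males at the reference level of any other interacting covariate (such as year, since some cluster-by-year terms were included): $e^{-0.5014} \approx 0.606$ for cluster 2, $e^{-1.4856} \approx 0.226$ for cluster 4 and $e^{0.2439} \approx 1.276$ for cluster 3. The corresponding ratios for females also require the interaction coefficients $\gamma_k$ from Table 3.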
The model shows that the mean number of dropouts decreases over the years and increases with cohort. Gender was not significantly associated with mean dropout; the coefficient for females was -51.58 (negative), and since males were the reference group, this implies that the estimated mean dropout for males was higher than that for females. Dropout for both males and females decreased over the years. The results also reveal that mean dropout was not related to regional per capita GDP (p>0.05).

Conclusions
On the basis of the research findings, the following conclusions were drawn. The 21 regions of Tanzania Mainland can be grouped into four dissimilar clusters. There are variations between these clusters, with cluster four (Dar es Salaam region) having the minimum dropout and enrolment of all clusters. Based on the results of the Poisson regression models, the significant predictors of enrolment and dropout were the same, namely time in years, cohort (age) and cluster, except for regional per capita GDP, which was a significant predictor of enrolment but not of dropout.

RECOMMENDATIONS
This study found variations between the clusters identified. Since more than 15 years have passed since the programme was established, an evaluation of the programme may be vital to determine whether the objectives of its establishment have been attained. More research should be done on how COBET can be made a sustainable programme, because dropout from formal schools still occurs and the availability of COBET centres gives those who drop out another chance for schooling.

Figure 1. Ward's linkage dendrogram showing four clusters of regions.

Figure 2. Map showing distribution of four clusters of regions.

Figure 3.

Figure 4. Mean regional dropout for the four clusters.

Table 1. Complementary basic education enrolment and dropout in relation to gender.

Table 2. Parameter estimates and robust standard errors of the multivariate Poisson model for enrolment.
*Indicates reference category.

Table 3. Parameter estimates and standard errors of the multivariate Poisson model for dropout.
*Indicates reference category.