Coastal climate and beach dynamics at Ponta do Ouro, Mozambique

Annual field surveys conducted at Ponta do Ouro in southern Mozambique show that coastal variability is driven by high wave energy and the resulting northward longshore drift. A log-spiral headland-bay system is marked by 80 m tall forested dunes in the south that give way to a broad, flat sandy beach. Marine climate processes that affect coastal erosion and accretion are studied using local and remote data sets. Since the surveys began, the climate has undergone a prolonged dry spell (2002-2007) followed by increased run-off and easterly winds (2010-2013) that have rebuilt the beaches. Sand grain sizes range from 240 to 410 µm (coarser in the south, finer in the north), and the sand is mobilized by frequent longshore wind events > 10 m/s. Ocean drifters reveal a northward current of 0.6 m/s in the outer surf zone and an onshore gyre of 0.1 m/s in the recessed bay. Wave-driven sand transport is estimated at 5 × 10⁶ kg/yr/m. While the upper beach has flattened due to pedestrian traffic and urban development, the lower beach has recovered from erosion owing to more frequent easterly waves and rainy weather.


INTRODUCTION
Geological history, wave characteristics, tidal range and littoral drift shape coastlines and their associated hazards, such as instability, erosion and inundation (Kaliraj et al., 2013). The bathymetry and shelf slope influence sediment deposition (Trenhaile, 1997; Ridderinkhof et al., 2000), while the concentration of suspended sediment is related to wave energy, surf-zone depth and longshore wave-driven currents (Christie and Dyer, 1998) and can be monitored by satellite imagery (Bassoullet et al., 2000; Lee et al., 2004). Wave propagation over a narrow shelf delivers high energy to the coastline and mobilizes more sediment (Carter et al., 1990; Frihy and Lotfy, 1997; Lacey and Peck, 1998; Manson et al., 2005). Changes in the width of beach dunes can be traced to natural forces and to human activities that deplete coastal vegetation, particularly compaction at access points (Maktav et al., 2002). Seasonal to multi-annual variations in wave energy produce cycles of shoreline erosion and accretion (Wright and Short, 1984; Chauhan et al., 1996; Benumof et al., 2000; Georgiou and Schindler, 2009; Saravanan et al., 2011; Smith et al., 2014). Rapid changes accompany storm surges and tourism development (Chandrasekar et al., 2000; Van Rijn, 2009).
Log-spiral bays are found on many of the world's wave-dominated coastlines (Carter, 1988). They consist of a hook-shaped up-coast headland and an open down-coast bay, whose beach processes transition from dissipative to reflective stages, associated with coarser grain size and a steepening slope. Silvester (1984) found that the log-spiral shape depends on the prevailing oblique wave energy and the distance between headlands. Large log-spiral bays such as Algoa Bay on the south coast of Africa have wave-driven longshore currents of 1 m/s at the headland and 0.1 m/s in the leeward bay under prevailing westerly winds (Goschen and Schumann, 2011), which induce longshore sand transport of 3 × 10⁵ m³/yr. Sediment suspended by breaker turbulence is pushed along by the current, in conjunction with bedload transport. Sand also moves on the beach face by swash and wind, accounting for ~15% of the total (Silvester and Hsu, 1993).
Sandy beaches backed by elevated dunes are a prominent feature of the coastline between South Africa and Mozambique (26-27.5°S and 32-33.5°E). Coastal sediments are mobilized by wave-driven longshore currents on the outer edge of the surf zone. Sediment available for reworking into coastal landforms is derived from past geological fluctuations and shelf deposits. Variations in sea level have left a system of inland lakes and wetlands behind a line of recently formed sand dunes. Small rivers drain into these wetlands, so the main source of coastal sediment is distant South African rivers such as the Umfolozi and Tugela. Northward near-shore currents carry these sediments toward Mozambique. The warm Agulhas Current sustains coral reef growth, tropical fisheries and comfortable air and sea temperatures of ~25°C. Yet the weather is changeable because marine storms pass at regular intervals, even in summer (Tinley, 1985). The lower beach experiences a tidal range of ~1.8 m and wave action from the southeast (Figure 1a). Sand bars line the coast < 100 m offshore and create breaker zones linked by rip currents.
The warm eastern seaboard incubates a great diversity of marine and terrestrial life. About 10 million people live within 10 km of the coast from Maputo to Durban, and economic production has shifted toward the coast over the past 30 years. Coastal tourism accounts for a growing share (about one-third) of Mozambique's $15 B economy. The coastal zone is urbanizing, so an understanding of its dynamic nature is essential for effective management under rising sea levels (~0.17 cm/yr). Our focus here is Ponta do Ouro at the southern border of Mozambique, where a prominent headland diverts the winter waves. The town was deserted during the 1975-1990 civil war but has since recovered with the advent of coastal tourism. People flock to the beaches for recreation and to dive on the coral reefs (~100 000 visits/yr), bringing economic benefits and environmental consequences (Bjerner and Johansson, 2001; Perry, 2001; Jury et al., 2011b).
Our study region is exposed to passing storms from the Indian Ocean and Mozambique Channel. There are no weather stations or oceanographic moorings, so a variety of secondary datasets are used to describe regional conditions, supplemented by in-situ field data collected as part of a long-term coastal monitoring project. One aim of this work is to support sustainable development by generating knowledge of the coastal dynamics; specifically, to study the apparent cycle of coastal erosion and accretion and its relation to changes in coastal weather and storm-induced waves.

Background
Ponta do Ouro is located at 26.84°S, 32.89°E (Figures 1a and 4a) atop low sand dunes with an easterly view of the Indian Ocean. The town covers ~10 km² and lies 120 km south of the capital, Maputo. The sandy soils drain quickly and have little organic content for crop production (Table 1b). The southern coastal dunes have gradients near 35%, while the town is relatively flat. The vegetation is sub-humid savannah comprising dry coastal forests that transition abruptly to prairie grasslands. Farming capacity is limited by low soil moisture; consequently most food is imported.
Until 2000 the natural landscape was conserved except for cropping around the wetlands and some deforestation for fuel-wood collection (Faria and Sitoi, 1996). Residential developments have since spread over the grassland near the point. There are no rivers in Ponta do Ouro, only two small lakes ~2 km inland. There is currently no water distribution system; water is drawn from wells of reasonable quality. Pre-war surveys recorded over 5000 animal species in southern Mozambique (Faria and Sitoi, 1996). Biodiversity has largely recovered since the civil war, despite a dysfunctional municipality. Ponta do Ouro's population grew from 1600 in 2005 to 3000 by 2014, with rural development spreading inland and tourism along the coast.
Ponta do Ouro extends prominently ~1 km seaward as a narrow band of 80 m forested sand dunes in the south that give way to 10 m sparsely vegetated dunes in the recessed bay. Sheets of sand penetrate the dune forest on the south side of the headland and are continually revegetated. The warm waters, tropical fishery and coral reefs attract divers (Bjerner and Johansson, 2001), and tourist resorts have proliferated in the leeward bay.

DATA AND METHODS
Annual field surveys have been made since 2000 to understand the coastal dynamics and resource use through photographs, opinion polls and objective data collection. Low-tide beach profiles have been surveyed by theodolite in the bay next to the headland and at the hotel further north, so that evolving beach profiles can be tracked. Google Earth scenes from the past decade were viewed to delineate the beach edge. Sand samples of ~200 g were collected twice on the lower beach and foredune. Samples were dried and sieved at sizes from 1000 to 50 µm, and the grain size distribution was analyzed. Nutrient analysis determined the proportions of P, Ca, K, Mg, NO3, C, cations and moisture, as well as pH. Littoral zone currents were monitored using a drifting drogue launched in the surf and tracked at one-minute intervals with two theodolites on the foredune. Drifter tracking was repeated on many occasions in different seasons, and frequency histograms were calculated. Local wind patterns were observed during two surveys, and composite vectors were calculated for the prevailing directions: (1) onshore, (2) southerly and (3) northerly. Continuous monthly discharge of the Umfolozi River, 185 km to the south, was obtained from the South African Department of Water Affairs website.
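The drifter processing described above can be sketched as follows: intersect the two theodolite sight lines to get a position, difference successive fixes to get velocity, and bin the speeds. This is a minimal illustration, assuming the sight lines have already been converted to angles measured counterclockwise from the x axis of a local beach frame (a hypothetical convention) and that fixes are one minute apart; the function names are illustrative, not the survey's actual software.

```python
import numpy as np

def triangulate(a_xy, a_deg, b_xy, b_deg):
    """Drifter position (m) from two theodolite sight lines.

    a_xy, b_xy: station coordinates in a local frame (m).
    a_deg, b_deg: sight-line angles counterclockwise from the x axis
    (hypothetical convention; field bearings must be converted first).
    """
    a, b = np.radians(a_deg), np.radians(b_deg)
    # Solve A + t*(cos a, sin a) = B + s*(cos b, sin b) for t and s.
    m = np.array([[np.cos(a), -np.cos(b)],
                  [np.sin(a), -np.sin(b)]])
    t, _ = np.linalg.solve(m, np.subtract(b_xy, a_xy))
    return np.asarray(a_xy, dtype=float) + t * np.array([np.cos(a), np.sin(a)])

def drifter_velocity(x_m, y_m, dt_s=60.0):
    """Velocity components and speed (m/s) between successive fixes."""
    u = np.diff(np.asarray(x_m, dtype=float)) / dt_s   # alongshore (x)
    v = np.diff(np.asarray(y_m, dtype=float)) / dt_s   # cross-shore (y)
    return u, v, np.hypot(u, v)

def speed_histogram(speeds, bin_width=0.1, max_speed=1.5):
    """Relative frequency (%) of speeds; values above max_speed are not counted."""
    n_bins = int(round(max_speed / bin_width))
    edges = np.linspace(0.0, n_bins * bin_width, n_bins + 1)
    counts, _ = np.histogram(speeds, bins=edges)
    return 100.0 * counts / max(len(speeds), 1), edges
```

For example, fixes 36 m apart along the shore at one-minute intervals give 0.6 m/s, the mean northward drifter speed reported for the outer surf zone.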
Apart from the in-situ surveys, regional characteristics (26-27.5°S and 32-33.5°E) were analyzed at 5 km resolution from NASA satellite-derived surface temperature, rainfall (Joyce et al., 2004) and marine suspended sediment (or water clarity, euphotic depth). Spatial maps were averaged over 2000-2013, and monthly time series were extracted for the grid point at Ponta do Ouro. Coastal weather was analyzed for surface wind and moisture using Climate Forecast System (CFS) reanalyses at 20 km resolution (Saha et al., 2010). Wave generation was studied using histograms of daily CFS marine winds sorted for easterly (-U) and southerly (+V) gales, and the associated composite weather maps were plotted. Ocean wave data were analyzed from the South African Data Centre for Oceanography ship database for a grid point off Ponta do Ouro, and from European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis by direction sector, as height-period histograms (Sterl and Caires, 2005). Ocean reanalysis mean maps and frequency histograms were calculated using the 8 km Hybrid Coordinate Ocean Model (HYCOM) reanalysis (Chassignet et al., 2009), based on Navy coupled data assimilation (since 2005). Long-term sea surface height trends were analyzed from the nearest tide gauges, together with ship-based sea surface temperature anomalies. The results are presented from the regional to the micro-scale; some figures are called out of order to maintain the flow of discussion.
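The sorting of daily winds into gale classes and the compositing of the associated weather maps can be sketched as below. The 10 m/s cut-off is a hypothetical value chosen for illustration (the study's threshold is not stated), and each composite is expressed as an anomaly from the all-days mean, which is one common convention rather than necessarily the one used here.

```python
import numpy as np

def gale_composites(u, v, field, threshold=10.0):
    """Count gale days and composite a daily field for each gale class.

    u, v: daily wind components (m/s); easterly winds give negative U,
    southerly winds give positive V.
    field: daily maps with shape (ndays, ny, nx), e.g. sea-level pressure.
    threshold: gale cut-off in m/s (hypothetical value).
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    field = np.asarray(field, dtype=float)
    clim = field.mean(axis=0)                 # all-days mean map
    result = {}
    for name, mask in (("easterly", -u > threshold), ("southerly", v > threshold)):
        # Composite anomaly: mean over the selected days minus the all-days mean.
        comp = field[mask].mean(axis=0) - clim if mask.any() else None
        result[name] = (int(mask.sum()), comp)
    return result
```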

Marine climate pattern and trend
Figure 1a-d summarizes the coastal climate. The topography consists of a 70 km plain of lowlands < 100 m in elevation. The coastline near the border is slightly convex and thus exposed to the marine weather. The wind speed pattern (Figure 1b) is dominated by a strong gradient from high speeds (7 m/s) offshore to low speeds (3 m/s) inland. Marine suspended sediment is relatively low (Figure 1c), but a coastal strip ~10 km wide has values up to 0.3 kg/m³ that pass northward with wave-driven currents. The satellite rainfall map (Figure 1d) reveals a marked dry zone (< 1 mm/day) that extends along the coast north and south of Ponta do Ouro. Further inland and offshore, rainfall increases to 2.5 mm/day. The low rainfall relates to accelerated winds over the convex coastline.
Mean maps and histograms from the 8 km HYCOM ocean reanalysis are analyzed in Figure 2. The frequency distribution of daily longshore currents is Gaussian, with a 45% occurrence of northward flow and a median near zero (Figure 2a). Southward currents are equally likely outside the surf zone. The mixed layer depth near the coast averages ~20 m, and its histogram has an inverted-Gaussian shape with many shallow and deep cases due to calm and stormy weather (Figure 2b). The mean temperature map reveals a 15 km wide coastal strip of 24°C water, with values increasing offshore to 25°C (Figure 2c).
Mixed layer depths (Figure 2d) are greatest in the 10-50 km coastal margin, reflecting stronger winds there. Meridional currents at the coast are weak (Figure 2e), and there is a coastward reduction in onshore flow (Figure 2f) that characterizes the land-sea interface. The onshore flow traps sediment in a narrow coastal corridor, as seen in Figure 1c.
Monthly time series since 2000 are given in Figure 3a-c. The water budget from CFS reanalysis (Figure 3a) indicates that 2000 was wet, but 2002-2005 were dry with little run-off. Soil moisture remained near 200 mm (10%) from 2002 until 2010. Thereafter, many rainy spells increased soil moisture, run-off and Umfolozi River discharge in January 2011 and from October 2012 to January 2013, delivering more sediment to the coastal zone. Spectral analysis of Umfolozi discharge anomalies back to 1948 (not shown) indicates cycles at 2-4 and 11 years, consistent with Corbello and Stretch (2012b).
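The spectral analysis mentioned above can be sketched with a plain periodogram. This minimal version assumes a gap-free monthly series starting in January; it removes the mean annual cycle to form anomalies and reports the periods with the most power. The original analysis method is not specified, so this is one possible approach, not necessarily the one used.

```python
import numpy as np

def monthly_anomalies(q):
    """Remove the mean annual cycle from a monthly series starting in January."""
    q = np.asarray(q, dtype=float)
    months = np.arange(q.size) % 12
    clim = np.array([q[months == m].mean() for m in range(12)])
    return q - clim[months]

def dominant_periods(anom, n=3):
    """Return the n periods (years) with the largest periodogram power."""
    anom = np.asarray(anom, dtype=float) - np.mean(anom)
    power = np.abs(np.fft.rfft(anom)) ** 2
    freqs = np.fft.rfftfreq(anom.size, d=1.0 / 12.0)   # cycles per year
    order = np.argsort(power[1:])[::-1] + 1             # skip the zero frequency
    return 1.0 / freqs[order[:n]]
```

A real discharge record would show broad spectral peaks rather than single lines, so in practice the power near the 2-4 and 11 year bands would be inspected rather than only the top few bins.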
Satellite-estimated land and sea temperatures (Figure 3b) exhibited regular annual cycles. Land temperatures rose quickly in early summer, while sea temperatures peaked in late summer. The seasonal amplitude exceeded the inter-annual variance over the study period (2000-2013). Marine euphotic depth, or seawater clarity, varied from 100 m in summer to 50 m in winter (Figure 3b), as storm-generated waves and associated near-shore currents stir the sand. Of all the variables reviewed, land surface temperature is best correlated with euphotic depth (r = +0.67, N = 168), indicating that warm weather is associated with clear water.

Winds and waves
Surface winds (Figure 3c) are plotted for offshore and inland points. The U time series exhibits a strong coast-to-inland gradient and was most negative (onshore from [...] 6b generates strong local winds that converge leeward of the headland (Figure 6e). ECMWF wave climate histograms are analyzed for the three directions (Figure 6a-c). ESE waves are characterized by 1-3 m height and 5-6 s period; SSW waves have longer periods and greater heights, while NNE waves have short periods (4-5 s). Each wave regime generates a different coastal response. The short-period waves from the NNE and ESE dampen the longshore currents, trapping sediment and building the beaches. Long-period waves from the SSW, by contrast, have a deeper reach and put sand in suspension for northward export. Wave production is constrained by Madagascar to the northeast, eastward storm movement, and minimal heat flux except during southerly winds (Figure 6c).
The north-easterly scenario adds a wind-blown dimension. Wind speeds > 10 m/s, capable of mobilising sand of ~300 µm grain size (Kaczmarek et al., 2005) [...] easterly directions (Figure 6a,d) the flow splits and winds decline at the coast, aiding sand deposition.

Coastal dynamics
Sand samples from Ponta do Ouro have grain sizes in the range 240-410 µm. Coarser sediments are found south of the point, and finer sediments in the recessed bay (Figure 7b). Wave-induced longshore drift (Figure 8a) creates a sandbar northwest of the point, which acts as a sediment bypass and produces a fine wave for recreational surfing. The seaward excursion of the current across the headland leaves an onshore gyre in the recessed bay, where beach and dune sedimentation are decoupled (Psuty, 1992). Further north the beach widens and coastal dunes grow (Figure 4c).
Offshore coral reefs are too deep to refract or dissipate wave energy around Ponta do Ouro, so back-beach berms and cliffed foredunes are common. Concave sections of dune face, exposed headland reef, chunks of sandstone on the beach (Figure 5b) and beach profiles (Figure 7a) give evidence of coastal recession and recovery from 2005 to 2013, depending on the marine climate (Table 2).

Sediment budget
Littoral transport processes depend on wave climate and sediment characteristics (Miller, 1999). These data are available from climatological sources and project information (Figures 3a-c and 7b). From summer to winter, the wave climate has a direction of 140-170°, a height of 1.6-1.9 m and a period of 6.5-7.8 s. Ocean swells > 4 m occur ~10% of the time and correspond with cyclonic storms (Corbello and Stretch, 2012a). Surf zone currents then reach 1 m/s, whereas during anticyclonic intervals the currents subside. The mean wave energy is calculated as $E = \int H^2 \, T \sin\theta \, dP$, integrated over the frequency distribution $P$, where $H$ is wave height, $T$ is wave period and $\theta$ is the diffracted wave angle (~30°). A value of 9 kW/m is calculated at Ponta. As at many southern hemisphere locations, high wave energy is maintained throughout the year. About 85% of longshore sediment transport takes place in the shallow surf zone (Aijaz and Treloar, 2003).
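The wave-energy expression above gives no physical constant, so a runnable version needs one. This sketch assumes the standard deep-water wave power per metre of crest, ρg²H²T/(64π), as the coefficient for H²T, a seawater density of 1025 kg/m³, and histogram bin centres as representative H and T; the optional sin θ factor follows the formula in the text. It is an illustration of the weighting, not a reproduction of the study's calculation.

```python
import numpy as np

RHO = 1025.0   # seawater density, kg/m^3 (assumed)
G = 9.81       # gravitational acceleration, m/s^2

def mean_wave_power(heights, periods, freq, theta_deg=None):
    """Histogram-weighted wave power per metre of crest (W/m).

    heights, periods: bin centres of a height-period histogram (m, s).
    freq: occurrence of each bin (any units; normalised here).
    theta_deg: if given, multiply by sin(theta) as in the text's formula.
    """
    h = np.asarray(heights, dtype=float)
    t = np.asarray(periods, dtype=float)
    p = np.asarray(freq, dtype=float)
    p = p / p.sum()                                    # occurrence -> probability
    power = RHO * G**2 * h**2 * t / (64.0 * np.pi)     # deep-water flux (assumed)
    if theta_deg is not None:
        power = power * np.sin(np.radians(theta_deg))
    return float(np.sum(p * power))
```

A single bin with H = 1.75 m and T = 7 s gives about 10.5 kW/m before any angular factor, the same order as the 9 kW/m quoted above.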
Littoral transport is estimated following Van Wellen et al. (2000) and Esteves et al. (2009) as $Q_s = \int_Z V C \, dz$, where $V$ is the longshore current, $C$ is the sediment concentration (~0.3 ± 0.1 kg/m³) and the integral is taken over the depth $Z$. Drifter results at Ponta indicate $V$ ~ 0.6 ± 0.2 m/s (Figure 8a), similar to the theoretical longshore current (Kaliraj et al., 2013), $V = 20 \, s \, (gH)^{1/2} \sin 2\theta$, where $s$ is the beach slope, $g$ is gravitational acceleration, $H$ is the diffracted wave height (~1 m) and $\theta$ ~ 30°. From these inputs, it is estimated that ~5 × 10 [...] sand bypass and human activities. Ongoing surveys in 2013 found that the lower beach had regained mass while the upper beach had flattened (Figure 7a).
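The arithmetic behind the transport estimate can be made explicit. This sketch assumes V and C are uniform over a hypothetical 1 m mixing depth (roughly the diffracted wave height); the slope value in the usage note is likewise illustrative, back-calculated so that the longshore-current formula returns about 0.6 m/s, and is not a surveyed value.

```python
import math

G = 9.81                                  # gravitational acceleration, m/s^2
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def longshore_current(slope, height_m, theta_deg, coeff=20.0):
    """V = coeff * s * sqrt(g H) * sin(2 theta), in m/s."""
    return coeff * slope * math.sqrt(G * height_m) * math.sin(math.radians(2.0 * theta_deg))

def transport_kg_per_yr_per_m(v_ms, conc_kg_m3, depth_m):
    """Q_s = integral of V*C over depth, for V and C uniform over depth_m."""
    return v_ms * conc_kg_m3 * depth_m * SECONDS_PER_YEAR
```

With V = 0.6 m/s, C = 0.3 kg/m³ and a 1 m depth the flux is about 5.7 × 10⁶ kg/yr/m, consistent with the 5 × 10⁶ kg/yr/m quoted in the abstract; a slope of s ≈ 0.011 with H = 1 m and θ = 30° gives V ≈ 0.6 m/s.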

SUMMARY
Coastal dunes are mobile features, yet town planners and managers often view the beach as static. The headland at Ponta extends < 1 km seaward and shelters a bay from wave action and northward drift. Our results show that easterly waves, which favour accretion, increased from 2011 to 2013. Southerly waves, which favour erosion, were most frequent around 2005. Northerly winds suppress longshore currents and build the dunes.
The natural quasi-decadal cycle of beach erosion and accretion is amplified because dry spells coincide with southerly waves, while wet spells (river sediment) coincide with easterly waves. These assertions require another decade of data to understand wave-sediment relationships in the context of storm events. Human pressure on marine resources continues despite the partial recovery of tidal fauna after beach driving was banned in 2003. Ponta do Ouro is the only town on this coast that supports waterfront housing and tourist facilities. Development there is at risk of beach recession from inadequate planning and storm surges. Offshore waters are warming (+0.01°C/yr) and sea levels are rising (Figure 7c), so impacts are ongoing. A pedestrian boardwalk along the foredune is recommended to limit erosion. The management of eco-tourism development should be guided by scientific insight and serviced by local government to sustain this beautiful and dynamic coast.

INTRODUCTION
Data mining is the process of discovering patterns in large data sets. Distributed Data Mining (DDM) (Datta et al., 2009) is an important and active area of research because of the challenges and applications associated with extracting previously unknown knowledge from very large real-world databases. Document clustering groups similar documents into a single cluster. To cluster documents accurately, the similarity between a pair of documents must be defined (Anna Huang, 2008). The quality of information retrieval in both centralized and decentralized environments can be improved by using an advanced clustering framework (Khaled et al., 2009). Distributed document clustering algorithms perform clustering based on the availability of the distributed resources (Eshref et al., 2003). Alongside recent algorithmic and conceptual advances, an advanced clustering framework is needed for processing large distributed document datasets. This need arises from the decentralization of the huge volume of documents to be processed (Datta et al., 2009) and the inability of central supercomputers to process such large numbers of documents. Centralized, data-warehouse-based mining cannot scale to that extent (Khaled et al., 2009). Storing and processing mass documents in a distributed environment can solve this problem. However, problems such as data distribution, fault tolerance and system communication are difficult to handle in such parallel and distributed environments. To solve these problems, new tools, technologies and frameworks for distributed processing are emerging (Surendra and Xian-He, 2011). One of the most popular of these is Hadoop, an open-source software framework for handling large amounts of distributed data in a distributed environment.

*Corresponding author. E-mail: judithjegan@gmail.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.

Distributed document clustering
Several distributed document clustering algorithms have recently been proposed to cluster documents from distributed resources. Their main objective is to move the computation (the clustering algorithm) to the documents at each node of the distributed site, instead of moving all the documents to a central node and performing the computation there. A local model is computed by applying the clustering algorithm at each node, and the local models are aggregated to produce optimized clusters. The issues in distributed clustering algorithms can be categorized into algorithmic issues and implementation issues. Many conceptual and algorithmic changes (Yang et al., 2006; Datta et al., 2009; Khaled et al., 2009; Odysseas et al., 2011; Hu et al., 2013) have recently been made to traditional clustering algorithms, adding concepts such as fuzzy theory, swarm intelligence, genetic algorithms, ontologies, WordNet and word sense disambiguation to increase the efficiency and quality of the algorithms. The implementation issues relate to the distributed environment in which the distributed document clustering algorithms are studied. Recent research on distributed document clustering in environments such as peer-to-peer (P2P) networks is reviewed here, with particular attention to clustering quality. Datta et al. (2009) proposed an approximate P2P K-means algorithm in which each node synchronizes only with the nodes directly connected to it. This is more effective for a dynamic network, but it is still sensitive to how documents are distributed across peers, and the quality of the generated clusters is a concern. Hammouda and Kamel (2009) proposed a hierarchically distributed peer-to-peer clustering algorithm (HP2PC) for large P2P systems. In this work the K-means algorithm is applied to the data in each peer node to generate a set of centroids that are passed to neighbors until they reach the supernode, which holds the centroids of the whole dataset. Scalability is a concern as more nodes are added to the network. The authors measured clustering quality by the skewness of the similarity histograms of individual clusters, which deteriorates as the hierarchy deepens. Decentralized probabilistic text clustering for peer-to-peer networks was proposed by Papapetrou et al. (2012). This work uses a probabilistic approach based on a Distributed Hash Table (DHT) to assign documents to clusters, which improves the scalability of the algorithm, but speed-up decreases as the dataset grows. Clustering quality is also a concern in this work. Thangamani and Thangaraj (2012) proposed an effective fuzzy semantic clustering scheme for decentralized networks using a multi-domain ontology model. Their results show better clustering when fuzzy concepts are combined with semantic concepts such as ontologies, but scalability issues remain. Thus the issues identified for clustering algorithms in the peer-to-peer environment are scalability, speed-up and the distribution of input data. These can be overcome by adjusting how the algorithm is implemented in the distributed environment where it is studied. Moreover, this traditional distributed computing approach may not meet next-generation requirements for distributed processing. Many easy-to-use distributed processing tools have evolved to handle the drastic increase in data. The Hadoop MapReduce framework (Wan et al., 2009; Lei Qin et al., 2011; Ping et al., 2011) can be used for distributed computation that overcomes these issues and improves algorithmic performance. The proposed work addresses the implementation issues identified in the peer-to-peer environment using Hadoop, a scalable tool for distributed processing. Work that uses Hadoop for mining document collections was therefore reviewed carefully.
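The idea of moving the computation to the data can be illustrated with one round of distributed K-means in which each node sends only per-cluster sums and counts to a coordinator. This is a minimal sketch, assuming all nodes share the same current centroids and use Euclidean distance; it is not an implementation of any of the specific P2P algorithms cited above.

```python
import numpy as np

def local_model(docs, centroids):
    """Node-side step: assign local documents, return per-cluster sums and counts."""
    docs = np.asarray(docs, dtype=float)
    d2 = ((docs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sums = np.zeros_like(centroids, dtype=float)
    counts = np.zeros(len(centroids))
    np.add.at(sums, labels, docs)     # unbuffered add handles repeated labels
    np.add.at(counts, labels, 1.0)
    return sums, counts

def aggregate(models, centroids):
    """Coordinator step: count-weighted mean of the node models."""
    sums = sum(m[0] for m in models)
    counts = sum(m[1] for m in models)
    new = np.array(centroids, dtype=float)
    keep = counts > 0                 # empty clusters keep their old centroid
    new[keep] = sums[keep] / counts[keep, None]
    return new
```

Because sums and counts add, the aggregated centroids equal those of a centralized pass over all documents, while each node transmits only k vectors and k counts instead of its documents.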
Yang et al. (2010) proposed MapReduce-based distributed Latent Semantic Indexing (LSI) and K-means for document clustering, comparing it with standalone LSI and with distributed K-means LSI implemented using socket programming. The results show a great improvement in speed-up and scalability. Kehua et al. (2012) proposed a distributed MST (minimum spanning tree) algorithm based on the MapReduce programming model; a distributed MST text clustering algorithm was designed and implemented and its performance compared, although its speed-up can still be improved. Ibrahim et al. (2012) proposed a MapReduce-based particle swarm optimization clustering (MR-CPSO) algorithm, which treats clustering as an optimization problem and searches for the best solution. The results show that MR-CPSO scales well as dataset size increases. Liu and Ge (2012) proposed a MapReduce-based name disambiguation system, in which document clustering is parallelized by dividing the data among a number of map and reduce tasks and disambiguation is performed using LSI. Hu et al. (2013) [...] To the best of the authors' knowledge, none of these algorithms considers a MapReduce-based hybrid of PSO, K-means and LSI that improves the quality and performance of clustering. The proposed work aims to improve both the speed-up and the quality of the clustering algorithm.

Particle swarm optimization algorithm
With PSO, clustering is treated as an optimization problem: the goal is to find optimal cluster centroids rather than an optimal partition. The optimal centroids minimize the intra-cluster (within) distance while maximizing the distance between clusters. PSO performs a global search (Ibrahim et al., 2012) to determine the optimal centroids. The PSO algorithm is based on the social behavior of flocking birds. Birds in a flock are represented as particles, and each particle is considered as a document. A particle carries information such as its location and velocity. A particle's location represents one solution, and a new solution is generated when the particle moves to a new location. This new solution is evaluated using the fitness function in Equation (1), which is the average distance between documents and cluster centroids.


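$$f = \frac{1}{N_c} \sum_{i=1}^{N_c} \left( \frac{1}{P_i} \sum_{j=1}^{P_i} d(t_i, n_{ij}) \right) \qquad (1)$$

where $d(t_i, n_{ij})$ is the distance between document $n_{ij}$ and the cluster centroid $t_i$, $P_i$ is the number of documents in cluster $i$, and $N_c$ is the number of clusters. The velocity and position of each particle are updated with the standard PSO equations:

$$v_i \leftarrow w\, v_i + c_1 r_1 \left(p_i - x_i\right) + c_2 r_2 \left(p_g - x_i\right) \qquad (2)$$

$$x_i \leftarrow x_i + v_i \qquad (3)$$

where $x_i$ and $v_i$ are the position and velocity of particle $i$, $p_i$ is its personal best position, $p_g$ is the global best position, $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration constants, and $r_1, r_2$ are random numbers in [0, 1]. This process is repeated for a maximum number of iterations, generating the optimal centroids.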
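A compact PSO centroid search using the fitness of Equation (1) and the standard PSO velocity and position updates is sketched below. It assumes the common formulation in which each particle encodes a full set of N_c centroids (the text describes particles in terms of documents), Euclidean distance for d, and widely used parameter values w = 0.72 and c1 = c2 = 1.49; all of these are assumptions for illustration. Lower fitness is better, since Equation (1) is an average distance.

```python
import numpy as np

def fitness(centroids, docs):
    """Equation (1): mean over non-empty clusters of the average doc-centroid distance."""
    d = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    per_cluster = [d[labels == i, i].mean()
                   for i in range(len(centroids)) if np.any(labels == i)]
    return float(np.mean(per_cluster))

def pso_centroids(docs, n_clusters, n_particles=10, iters=100,
                  w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = np.random.default_rng(seed)
    docs = np.asarray(docs, dtype=float)
    # Each particle is an (n_clusters, dim) array of centroids, seeded on random documents.
    pos = np.stack([docs[rng.choice(len(docs), n_clusters, replace=False)]
                    for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p, docs) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)   # velocity update
        pos = pos + vel                                                      # position update
        f = np.array([fitness(p, docs) for p in pos])
        better = f < pbest_f                                  # keep personal bests
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()                # global best so far
    return gbest, float(pbest_f.min())
```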

K-Means clustering algorithm
The K-means algorithm is sensitive to the selection of initial cluster centroids, which it uses to maximize intra-cluster (within) similarity and minimize inter-cluster similarity. It performs a localized search starting from these initial centroids (Wan et al., 2009); K-means uses randomly generated seeds as the initial cluster centroids. Each document is compared with all the cluster centroids (Datta et al., 2009) and assigned to a cluster based on similarity (Anna, 2008). The Jaccard similarity measure used is

$$\mathrm{SIM}_J(t_a, t_b) = \frac{t_a \cdot t_b}{|t_a|^2 + |t_b|^2 - t_a \cdot t_b} \qquad (4)$$

where $t_a$ and $t_b$ are n-dimensional vectors over the term set. It compares the summed weight of the shared terms with the summed weight of the terms that are present in either of the two documents but are not shared. The cluster centroids are recalculated as the mean of the document vectors that belong to each cluster, using Equation (5):

$$c_j = \frac{1}{n_j} \sum_{d_j \in Q_j} d_j \qquad (5)$$

where $n_j$ is the number of document vectors that belong to cluster $Q_j$ and $d_j$ is a document vector that belongs to $Q_j$.
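The Jaccard similarity of Equation (4) and the centroid update of Equation (5) combine into the K-means loop below. This is a minimal single-machine sketch, assuming dense term-weight vectors and a fixed number of iterations; it only makes the assignment and update steps concrete.

```python
import numpy as np

def jaccard(ta, tb):
    """Equation (4): extended Jaccard similarity of two term-weight vectors."""
    ta, tb = np.asarray(ta, dtype=float), np.asarray(tb, dtype=float)
    dot = float(ta @ tb)
    denom = float(ta @ ta + tb @ tb - dot)
    return dot / denom if denom > 0 else 0.0

def kmeans_jaccard(docs, centroids, iters=10):
    docs = np.asarray(docs, dtype=float)
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # Assignment: each document joins its most similar centroid.
        sims = np.array([[jaccard(d, c) for c in centroids] for d in docs])
        labels = sims.argmax(axis=1)
        # Update, Equation (5): centroid = mean of member document vectors.
        for j in range(len(centroids)):
            members = docs[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```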

Latent semantic indexing (LSI) algorithm
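LSI is a method of dimensionality reduction that can improve the efficiency of clustering. It applies singular value decomposition (SVD) to the document-term matrix, $A = U \Sigma V^T$, and keeps only the $k$ largest singular values:

$$A_k = U_k \Sigma_k V_k^T$$

which is the least-squares best-fit approximation of matrix $A$ with $k$ singular values. The number of retained singular values, $k$, sets the reduced dimensionality.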
Applying SVD to the PSO-KMeans (PKMeans) clusters enhances performance by capturing the important semantic structure in the association of terms while reducing dimensionality.
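The rank-k truncation can be computed with NumPy's SVD. This sketch assumes documents are the rows of A and returns both the approximation A_k and the k-dimensional document coordinates U_k Σ_k, which is what a subsequent clustering step would typically use; the choice of k is left to the caller.

```python
import numpy as np

def lsi(a, k):
    """Rank-k LSI of a document-term matrix a (documents as rows)."""
    u, s, vt = np.linalg.svd(np.asarray(a, dtype=float), full_matrices=False)
    a_k = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]   # A_k = U_k S_k V_k^T
    docs_k = u[:, :k] * s[:k]                     # documents in the latent space
    return a_k, docs_k, s
```

The residual satisfies ||A - A_k||_F² = Σ_{i>k} σ_i², the sum of the discarded squared singular values, which is the least-squares optimality referred to above.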

OVERVIEW OF THE PROPOSED METHODOLOGY
The different steps followed in this proposed methodology are summarized as:
1. Choosing a corpus of documents.
2. Preprocessing the text documents and Vector Space Model representation based on MapReduce.

Document preprocessing and representation based on MapReduce
Document preprocessing and vector space representation are performed with the MapReduce framework for efficient representation of the documents. The input set of plain-text documents is transformed into a form (Datta et al., 2009) that can be included in the vector space model. These preprocessing steps are performed in parallel using the MapReduce programming methodology. Common words (stopwords) are removed. Stemming reduces words to their base form, or stem; Porter's algorithm (Porter, 1980) is the de facto standard for stemming. To represent the documents in the Vector Space Model (Salton et al., 1975), each document must be transformed from its full-text version into a document vector that describes its content. Each document is represented by a vector $d = (tf_1, tf_2, \ldots, tf_n)$, where $tf_i$ is the frequency (TF) of term $i$ in the document. The $tf_i \cdot idf_i$ representation of the documents is computed in parallel using the MapReduce methodology.
To represent the documents in the same term space, the following are determined: the number of times each term appears in a given document, the number of terms in each document, the number of documents in which each term appears, and the total number of documents. Each component of the vector d then becomes $tf_i \cdot idf_i$. The result is a document-term matrix (Jianxiong and Watada, 2011) that holds the term weights of each document. The weight of a term t in document d, within a collection of documents D, is

$$\mathrm{tfidf}(d, t) = \mathrm{tf}(d, t) \times \log\frac{|D|}{\mathrm{df}(t)}$$

where df(t) is the number of documents in which term t appears, tf(d, t) is the frequency of term t in document d, and |D| is the total number of documents.
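The counting pattern above maps naturally onto map and reduce functions. The following plain-Python simulation is a sketch only: it does not use Hadoop's API, assumes the documents are already stopword-filtered and stemmed, and uses the natural logarithm (the base is not specified in the text).

```python
import math
from collections import defaultdict

def map_terms(doc_id, tokens):
    """Map: emit ((doc_id, term), 1) for every token in one document."""
    for term in tokens:
        yield (doc_id, term), 1

def reduce_sum(pairs):
    """Reduce: sum the values for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def tfidf(corpus):
    """corpus: {doc_id: [tokens]} -> {(doc_id, term): tf(d, t) * log(|D| / df(t))}."""
    # First job: term frequency tf(d, t) per (document, term) key.
    tf = reduce_sum(kv for doc_id, toks in corpus.items()
                    for kv in map_terms(doc_id, toks))
    # Second job: each (doc, term) key contributes 1 to df(term).
    df = reduce_sum((term, 1) for (_, term) in tf)
    n_docs = len(corpus)
    return {(d, t): c * math.log(n_docs / df[t]) for (d, t), c in tf.items()}
```

A term that appears in every document, such as "hadoop" in a two-document corpus where both mention it, receives weight zero, since log(|D|/df) = log 1.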

Proposed distributed document clustering algorithm based on MapReduce
The proposed algorithm is based on the MapReduce methodology on the Hadoop framework. Hadoop provides the ability to transparently distribute the documents (Lei et al., 2011; Ping et al., 2011) to one or more entities and apply operations to each subset. The proposed algorithm consists of two phases of MapReduce operations: Phase I generates optimal centroids using PSO, and Phase II performs K-means clustering using the PSO-generated centroids. Figure 1 depicts the complete methodology of the proposed distributed document clustering algorithm. The Latent Semantic Indexing (LSI) technique is applied to the resulting document-term matrices, truncating them to reduced dimensions that describe the relationships between terms. The proposed algorithmic steps are given below. The Hadoop Distributed File System (HDFS) stores the input document vectors and the initial input document centroids.

Phase I
1. The Map function splits the input documents into several data blocks (64 MB each) with the initial document velocity and position. The fitness evaluation function evaluates the position of each document vector and assigns it a fitness value, computed as the average distance between the document and cluster centroids as in Equation (1). The document position with the best fitness value in the entire document set is taken as the global best solution.
2. The document velocity and position values are updated using Equation (2) and Equation (3).
3. Steps 1 and 2 are repeated until the maximum number of iterations, fixed at 100, is reached.
4. The Reduce function updates the document vectors based on the new centroids. This process is repeated until all the document vectors are updated.
5. The optimal centroids generated are stored in HDFS along with the input document vectors.

Phase II
1. The Map function splits the input documents into several data blocks (64 MB each). For each data block, the similarity between the input document vectors and the PSO-generated optimal centroids is evaluated using the Jaccard similarity given in Equation (4).
2. The Reduce function collects the map outputs and updates each centroid as the mean of all documents in its cluster.
3. All the optimal centroids are aggregated, and the dimensionality of the generated centroids is reduced using the Latent Semantic Indexing technique, which performs singular value decomposition on the resulting smaller matrices. Working on these smaller matrices reduces the computational overhead and increases the speed of the algorithm.
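One Phase II iteration can be simulated in-process to show how the work is divided between Map and Reduce. This sketch assumes that the Jaccard similarity of Equation (4) is the extended (Tanimoto) form commonly used for weighted document vectors, x·y / (|x|^2 + |y|^2 - x·y). A dictionary stands in for Hadoop's shuffle phase, the LSI step is omitted, and all function names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Extended (Tanimoto) Jaccard similarity between two weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def map_block(block, centroids):
    """Map: emit (cluster_id, doc_vector) for each document in one data block."""
    for doc in block:
        best = max(range(len(centroids)), key=lambda j: jaccard(doc, centroids[j]))
        yield best, doc


def reduce_cluster(docs):
    """Reduce: the new centroid is the component-wise mean of the cluster's documents."""
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]


def kmeans_iteration(blocks, centroids):
    """Run Map over every block, group by cluster id, then Reduce each group."""
    grouped = defaultdict(list)          # stands in for Hadoop's shuffle/sort
    for block in blocks:
        for cid, doc in map_block(block, centroids):
            grouped[cid].append(doc)
    new = list(centroids)                # empty clusters keep their old centroid
    for cid, docs in grouped.items():
        new[cid] = reduce_cluster(docs)
    return new


blocks = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],  # data block 1 (toy values)
    [[0.0, 1.0, 1.0], [0.0, 0.8, 1.0]],  # data block 2 (toy values)
]
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
new_centroids = kmeans_iteration(blocks, centroids)
```

When two centroids tie for a document, it goes to the lowest-numbered one, because `max` returns the first maximal element.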

EXPERIMENTAL RESULTS
The distributed environment is set up as a Hadoop cluster. Each node consists of an Intel i7 3 GHz CPU, 1 TB of local hard disk storage reserved for HDFS, and 8 GB of main memory. All nodes are connected by a standard gigabit Ethernet network in a flat topology, and parallel jobs are submitted to the Hadoop (MapReduce) environment. The Reuters document dataset (Reuters-21578 text collection, Distribution 1.0) and several other large document datasets, including RV1, were processed across the distributed nodes; together these labeled datasets total approximately 36,398 MB and cover about 860 diverse subjects from almost every domain of the knowledge base. A variety of evaluation metrics exist for assessing clustering performance (Anna, 2008; Datta et al., 2009; Khaled et al., 2009). The performance of the proposed algorithm is evaluated by varying the document set size and the number of nodes. The evaluation criteria are clustering quality, measured by purity, and execution time; the speedup of the proposed algorithm is also evaluated.

Purity
This metric evaluates whether the documents in a cluster come from a single category (Anna, 2008). The purity of cluster C_j is formally defined as:
$$P(C_j) = \frac{1}{n_j}\max_h\left(n_j^h\right)$$
where n_j is the number of documents in cluster C_j and n_j^h is the number of documents from cluster C_j assigned to category h, so that max_h(n_j^h) is the number of documents from the dominant category in C_j. For an ideal cluster the purity value is 1, because it contains only documents from a single category.
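The purity definition translates directly into code. In the sketch below each cluster is given as the list of true category labels of its documents. The size-weighted overall purity, sum over j of (n_j / n) P(C_j), is a common way of aggregating per-cluster values and is an assumption here, since the aggregation used in the experiments is not stated. The function names and category labels are hypothetical.

```python
from collections import Counter


def cluster_purity(labels):
    """P(C_j) = max_h n_j^h / n_j, given the true category of each document in C_j."""
    return max(Counter(labels).values()) / len(labels)


def overall_purity(clusters):
    """Assumed aggregation: size-weighted average of per-cluster purity."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_purity(c) for c in clusters)


# Hypothetical clustering of 7 documents into 2 clusters.
clusters = [["earn", "earn", "acq"], ["grain", "grain", "grain", "crude"]]
```

Here the two clusters have purity 2/3 and 3/4, and the weighted overall purity is 3/7 × 2/3 + 4/7 × 3/4 = 5/7.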
The PSO parameters used for evaluation are an inertia weight w = 0.72 and acceleration constants c1 = c2 = 1.7. Figures 2a and 2b show that the purity values of the proposed algorithm increase as the number of clusters and nodes is varied. The results show that the proposed hybrid algorithm assigns documents to the correct cluster with a purity value of 0.75, higher than that of the standalone K-Means algorithm. The purity of the clustering results after 50 iterations was observed to be better than that of the K-Means results.

Execution time
The execution time of the proposed hybrid algorithm is evaluated by measuring the time taken for clustering different document sizes and by increasing the number of nodes.
Figure 3 shows the execution time of the proposed algorithm as the number of clusters and nodes increases. Execution time increases with the number of clusters. Figure 3a shows that the proposed algorithm takes less time than the MapReduce-based K-Means algorithm. Figure 3b shows that execution time decreases almost linearly as nodes are added to the Hadoop cluster: it is highest on a single node and falls as the number of nodes increases.

Speedup
Speedup is the relative increase in speed of one system over another. Here it is measured by comparing the performance of the proposed algorithm on a standalone system that uses the LFS (Local File System) with the performance of the proposed algorithm on the Hadoop cluster.
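Expressed as a ratio, speedup is the standalone execution time divided by the Hadoop execution time, so values above 1 mean the cluster is faster. The short sketch below assumes this ratio definition. The timings in it are hypothetical placeholders, not results from the experiments, and the function names are illustrative.

```python
def speedup(t_standalone, t_cluster):
    """Speedup of the Hadoop run relative to the standalone (LFS) run."""
    if t_standalone <= 0 or t_cluster <= 0:
        raise ValueError("execution times must be positive")
    return t_standalone / t_cluster


def speedup_by_nodes(t_standalone, times_by_nodes):
    """Map each cluster size (number of nodes) to its speedup over the standalone run."""
    return {nodes: speedup(t_standalone, t) for nodes, t in sorted(times_by_nodes.items())}


# Hypothetical timings in seconds, for illustration only (not measured results).
curve = speedup_by_nodes(1200.0, {2: 650.0, 4: 400.0, 8: 240.0})
```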

DISCUSSION
Extensive experiments were performed on the proposed MapReduce-based hybrid clustering algorithm, which addresses performance issues by increasing cluster quality and reducing execution time. PSO improves K-means clustering by performing a global search for the best solution to the clustering problem. This overcomes two major drawbacks of the K-means algorithm (Cui et al., 2005): sensitivity to the selection of initial cluster centroids and convergence to local optima. PSO is an iterative algorithm that searches for the optimal solution based on a specific similarity measure; here, that measure is the Jaccard coefficient.
The Jaccard coefficient compares the summed weight of the terms shared by two documents with the summed weight of the terms that are present in either document but not shared; for non-negative term weights its value ranges from 0 to 1. The quality of the generated clusters therefore depends primarily on this similarity measure, since it determines the membership of documents in each cluster. To increase cluster quality, intra-cluster similarity must be maximized and inter-cluster similarity minimized: documents within a cluster should be similar to one another and dissimilar to documents in other clusters. An advantage of the Jaccard coefficient is that it tends to find more coherent clusters (Anna, 2008). In this paper, the performance of the K-means algorithm is improved by using PSO-optimized centroids, and cluster membership is determined by Jaccard similarity.
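The intra-/inter-cluster criterion can be checked numerically. The sketch below, which assumes the extended (Tanimoto) form of the Jaccard coefficient for weighted vectors, computes the average similarity of document pairs within the same cluster and of pairs drawn from different clusters. A good clustering should give a high first value and a low second value. The function names and toy vectors are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Extended (Tanimoto) Jaccard similarity; lies in [0, 1] for non-negative weights."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def mean_similarity(pairs):
    """Average Jaccard similarity over a list of (vector, vector) pairs."""
    sims = [jaccard(a, b) for a, b in pairs]
    return sum(sims) / len(sims) if sims else 0.0


def intra_inter(clusters):
    """Return (average within-cluster similarity, average between-cluster similarity)."""
    intra = [(a, b) for c in clusters for a, b in combinations(c, 2)]
    inter = [(a, b) for c1, c2 in combinations(clusters, 2) for a in c1 for b in c2]
    return mean_similarity(intra), mean_similarity(inter)


clusters = [
    [[1, 0, 0], [1, 1, 0]],  # cluster A (toy term weights)
    [[0, 0, 1], [0, 1, 1]],  # cluster B (toy term weights)
]
intra, inter = intra_inter(clusters)
```

For these toy vectors the within-cluster average is 0.5 and the between-cluster average is 1/12, the pattern a coherent clustering should show.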
The new centroids are recomputed after each iteration and all documents are reassigned based on these new centroids.
The effect of the clustering algorithms on clustering quality is evaluated after each iteration, with purity as the quality measure: the higher the purity value, the better the clustering solution (Anna, 2008). Table 1 shows the average relative purity values of the individual MapReduce-based algorithms and of the proposed hybrid algorithm for the given input document datasets. It shows that the purity of the proposed hybrid algorithm is higher than that of the standalone algorithms.
The Latent Semantic Indexing (LSI) algorithm also affects the purity values of the PSO-optimized clusters. In general, how LSI is evaluated under different parameter settings depends on the target application. LSI can be used to reflect the semantic structure of documents, and also as a dimensionality reduction technique; here, it is used mainly to improve cluster quality. In addition, the computational cost of the Singular Value Decomposition (SVD) involved in LSI is greatly reduced, because the algorithm is applied to the smaller document-term matrices produced by the PKMeans algorithm. Table 2 describes the initial feature space dimensions of the input document dataset.
In this paper, term frequency and inverse document frequency (TFIDF) are combined as the feature weighting scheme. The scheme is based on the idea that a feature appearing many times in a document should receive more weight, whereas a feature appearing in many documents is not very useful for distinguishing between documents and should therefore receive a lower weight. The effect of dimensionality on clustering quality is measured using purity. Dimensions are reduced with LSI based on k singular values: the original document-term matrix of each cluster generated by the K-Means algorithm is reduced to k dimensions, determined by the singular values of that matrix. For the experimental evaluation, the number of dimensions was varied over 50, 100, 150, 200, 250 and 300.
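The per-cluster LSI step can be sketched with NumPy's singular value decomposition. The function keeps the k largest singular values and returns both the reduced document coordinates and the rank-k approximation of the matrix. It assumes the input is already a tf-idf document-term matrix for one cluster. The experiments use k between 50 and 300, whereas the toy matrix below uses k = 2, and the function name `lsi_reduce` is hypothetical.

```python
import numpy as np


def lsi_reduce(doc_term, k):
    """Truncated-SVD (LSI) reduction of an (n_docs, n_terms) matrix to k dimensions.

    Returns the (n_docs, k) document coordinates U_k S_k and the rank-k
    approximation U_k S_k V_k^T of the original matrix.
    """
    u, s, vt = np.linalg.svd(doc_term, full_matrices=False)  # s is sorted descending
    k = min(k, len(s))
    docs_k = u[:, :k] * s[:k]      # scale each latent axis by its singular value
    approx = docs_k @ vt[:k, :]    # map back to term space for comparison
    return docs_k, approx


rng = np.random.default_rng(0)
doc_term = rng.random((6, 8))      # toy 6-document x 8-term tf-idf matrix
docs_2d, approx_2 = lsi_reduce(doc_term, k=2)
```

Keeping all singular values reproduces the original matrix exactly, so k controls the trade-off between compression and fidelity that the purity experiments explore.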
The algorithm is run for about 50 iterations. Table 3 shows the purity values for the document set under different dimensionalities: purity is highest for the proposed hybrid algorithm in the range of 50 to 100 dimensions and degrades as the number of dimensions increases.
The execution time of the proposed MapReduce-based hybrid algorithm is compared with that of the centralized hybrid algorithm in Figure 5. In this paper, the Hadoop Distributed File System (HDFS) is used for distributed storage while executing the MapReduce algorithms. It distributes data automatically and stores intermediate and final results across the nodes, which reduces communication overhead and makes parallel execution of tasks considerably faster. The centralized hybrid algorithm, by contrast, uses the Local File System (LFS). The effect of the file system on execution time is evaluated by comparing the HDFS-based algorithm with the LFS-based algorithm, and Figure 5 shows that the Hadoop-based hybrid algorithm takes less time than the centralized LFS-based algorithm.
Thus, a scalable hybrid PKMeansLSI algorithm is proposed using the MapReduce distributed methodology to overcome the inefficiency of clustering large datasets. The results indicate that the hybrid PKMeansLSI algorithm can be successfully parallelized with MapReduce running on commodity hardware. Most centralized clustering algorithms suffer from scalability problems as dataset size increases, and are computationally expensive. For these reasons, distributed computation is essential for clustering large-scale data. In order to develop a good distributed clustering algorithm that takes big data into consideration, it is

INTRODUCTION
In the developing world, only 62% of deliveries are attended by skilled attendants, compared with 99% in the developed world (UN, 2007). In regions such as Sub-Saharan Africa and South Asia, fewer than one-third of deliveries are attended by a doctor, nurse or midwife (Koblinsky et al., 2006; Graham et al., 2006).

LITERATURE REVIEW
The literature shows that community linkages have succeeded in increasing skilled attendant delivery care in Peru and Afghanistan (Lema et al., 2009; Abdulai et al., 2007). Othero et al. (2008) showed the success of the dialogue model in implementing integrated management of childhood illnesses (IMCI) in Nyando District, Kisumu County; however, this intervention has not been applied to skilled attendant births. Focused antenatal care (FANC, 2007) was designed to promote delivery by skilled attendants for all women in order to reduce maternal mortality (MOH, 2007). In Kenya, use of skilled attendant delivery care is 43% despite high (92%) first antenatal care (ANC) attendance (KDHS, 2009). Many mothers attend ANC yet choose to deliver without skilled attendants (56%), even though the services are available at primary care level within their communities. Owing to the low (43%) rate of skilled births, maternal mortality increased from 414 per 100,000 live births in 2003 (KDHS, 2003) to 488 per 100,000 in 2008 (KDHS, 2009), and WHO reported an MMR of 530 per 100,000 (WHO, 2010), raising the need for innovative community interventions to reverse this trend.
To save the lives of mothers and newborns, skilled birth services must reach the poor and marginalized communities in rural areas where so many are dying (2010 decade report). These deaths are caused by postpartum haemorrhage, puerperal sepsis, obstructed labour, induced abortion and eclampsia, and could be averted if preventive measures were taken and adequate care were made available within communities.
For every woman who dies of pregnancy-related causes, an estimated 20 women experience acute or chronic morbidity, often with tragic consequences (Reichenheim et al., 2009). The three delays that contribute to maternal deaths occur at the individual level (delay in decision making), during transportation (delay due to lack of access) and at the health facility (delay in taking action).
CHWs trained on the dialogue model bridged the gap (weak linkages) and empowered pregnant mothers with information and skills, supporting and enabling them to make informed choices regarding the place of childbirth (Benzeval et al., 1995). CHWs mediated between the community and service providers, promoting understanding of the benefits of early ANC attendance and subsequent skilled attendant care.
Many deliveries (56%) took place at home with unskilled attendants: only 44% of births are attended by skilled birth attendants, and 43% take place in health facilities (KDHS, 2009). CHW interventions included health education and communication at the health facility, so that mothers understood the importance of early ANC attendance, and sensitization in the community on the benefits of skilled attendant births. Because every pregnancy carries risks, CHWs used persuasive communication with mothers on individual birth plans (IBP) and on the need to know the complications that can occur during pregnancy, labour and the postpartum period.
Communication with women about their lived experiences during social interactions through frequent dialogue improved community awareness on the importance of skilled attendant care in regard to the health of the mother and the unborn child.Prenatal care visits were used to educate mothers on how to avoid logistical barriers at the time of delivery (Hatt et al., 2007).
The community health strategy links communities to formal health systems; however, the linkage has not improved health facility delivery (HFD): although 39 community units have been established in Rachuonyo District, HFD has remained low (23%). The linkage was strengthened by training CHWs on the community dialogue model in a rural context to address the barriers between mothers and formal health systems.
Other studies conducted in Kenya and elsewhere have shown health worker attitude, inadequate health education, access to health facility and transport issues as barriers to provision of health facility delivery care in rural settings (Naanyu et al., 2011;Owino et al., 2012;Perkins, 2009;Magoma et al., 2009;Mbaruku et al., 2009).

METHODOLOGY
To achieve the research objectives, a prospective study design was adopted. A prospective study is a longitudinal study in which a defined group of individuals who share certain experiences and/or are exposed to a particular intervention is followed over time.
This design was ideal for this study because the researcher, having been in contact with the facility and having noted high first ANC attendance against low skilled birth attendance, adopted a five-phase process for the study. The study was implemented in five phases, which involved exploring mothers' experiences with individual birth plans (IBP), or birth preparedness.

*Corresponding author. E-mail: mothissy@gmail.com
Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
The study was conducted with rural communities in Kabondo, Rachuonyo South Sub County, Homa Bay County.The study targeted pregnant mothers attending ANC at Kabondo SDH and the eligible population for the study was 760.Participants were purposively recruited by community health workers from the facility's catchment population.The study was conducted from April 2013 to March 2014.
A team of four research assistants was recruited from local communities in the study district. Research assistants had at least a Form IV level of education, good interpersonal skills, and the ability to communicate fluently in English and Dholuo (the local language). Research assistants were paired in teams to conduct interviews, with one person leading the interview and the other taking notes and managing the recording equipment. All interviews were tape recorded.
In-depth interview guides were developed for pregnant and postnatal mothers, and a key informant interview guide for health workers. An observation checklist was developed to verify the equipment available at the health facility for provision of skilled delivery care. A pilot study was carried out at a different health facility to pre-test the instruments and determine their reliability. The instruments were examined by an expert from the Department of Community Health, Great Lakes University of Kisumu, whose suggestions were used to make the necessary corrections before data collection.
The researcher sought permission from the Nyanza Provincial Director of Medical Services, who granted permission for data collection in writing. The researcher also informed the Medical Officer of Health in charge of Rachuonyo South District, then visited the sampled health facility and asked the officer in charge to allow her to administer the research instruments. Health workers and CHWs were interviewed at the health facility, where registers and summarized reports were also reviewed. The research team administered the instruments to each sampled respondent. Community health workers guided the research team to the homesteads of the pregnant mothers, where interviews were conducted and the responses recorded using digital tape recorders.
A total of 21 pregnant mothers, 5 postnatal mothers, 5 CHWs and 5 health workers were interviewed. Mothers were interviewed in their rural contexts (homes) and health workers at the health facility. Sample size was guided by the point of saturation, that is, the point in data collection when no new or relevant information emerged from participants; the researcher took this as the point at which no more data needed to be collected. Post-intervention in-depth interviews were conducted with mothers who had been taken through the dialogue, to explore the effectiveness of dialogue in improving deliveries.
Data from the tape recorders were transcribed and then translated from Dholuo to English. The transcripts were entered into Microsoft Excel and analyzed thematically; the coding process involved identifying the major themes in each transcript. During data analysis, the identified themes were compared across transcripts to determine differences and similarities in participants' perspectives on childbirth and on the factors influencing women's decisions to seek skilled care. Key informant interviews with health workers complemented this information by allowing us to explore in more detail individual experiences with reproductive health care services in the rural context. Triangulation was used to validate the findings by comparing the themes identified from the in-depth and key informant interview transcripts with the participant observations. Discrepancies between the observations and the transcripts were addressed through follow-up informal discussions with care providers. Clearance was obtained from the Great Lakes University Ethical Review Committee, which approved the study in writing. Consent was sought from every participant before any interview, and interviews proceeded only once consent was given. Participants were asked to sign a consent form and were assured of the confidentiality of the information gathered. No identifying information was recorded from any participant, and data summaries described participants only in aggregate form.

Demographic characteristics of participants
A total of 21 pregnant mothers were interviewed about their experiences with their current pregnancy and their preparedness for childbirth (birth plan). All mothers were married and aged between 15 and 35 years, except one who was 37. The respondents were Christians of various denominations. Their education levels ranged from Standard 8 to Form 4, and they were allowed to choose the language most convenient to them for the interview. Their occupations were small-scale farming, small-scale business and housework, and one worked as a cook at an orphanage. Gestation at the first visit ranged from 2 to 8 months, with only 2 mothers starting ANC during the first trimester. Five postnatal mothers were interviewed after the dialogue to explore the effectiveness of the information imparted during home visits and their choice of place for childbirth.

Knowledge of expected date of delivery (EDD)
All mothers knew their last monthly period (LMP), which was adequate to assist health workers in calculating the expected date of delivery (EDD). After calculating the EDD, the health worker informed the mother so that she would know the probable time when her baby may be born. Knowledge of the EDD is part of the Individual Birth Plan (IBP), so that the mother can plan how to reach the hospital during early labour.
Few mothers stated that they were told the expected date of delivery at the hospital, although some could actually name the date. However, many stated that they were not told the date and therefore did not know it. One mother said, "I do not know they did not inform me at the hospital may be they wrote it on the clinic book I do not know." Other mothers stated that they were told the date but had forgotten it. When one mother was asked what advice she would give any pregnant mother in her community, she said, "I can encourage her to be patient because it's only God who knows her date of delivery." After the dialogue, all mothers were able to state the EDD, as this excerpt shows: "I was reminded on the teaching from ANC during home visits by the CHW. The EDD was explained to me well and this helped me on decision to deliver in hospital. Time for delivery reached when the nurses' strike was on and when I went to the facility I was turned away and there was nothing I would do but went to community-based providers (TBAs) who helped me. A neighbour escorted me to the TBA who examined me and delivered my baby. I was told how I can help myself after delivery, saved some money (500) to use at the time of delivery. I paid ksh.150 for motorbike transport and 300 to the TBA for the service received."

Knowledge of ANC frequency
At her first ANC visit, a mother is cared for as if this were the only visit she would make during the pregnancy. She is given the entire package, including health education and any other services she may require during the whole pregnancy, and a plan for subsequent visits is written on the ANC card. Most mothers interviewed did not know the recommended frequency of ANC attendance; they stated that they are given the next appointment at every attendance. One mother said, "they make appointments for you after every one month, sometimes you go after 2 months then sometimes its every month, I don't know." Another mother said, "I cannot really say accurately because personally I do not follow the appointments strictly, sometimes I forget and at times the date finds me far away, maybe on a long journey, I can finish even one week before remembering the appointment date so I even forget any injection that I would have been given. I don't know about that maybe you can help me." After the dialogue, mothers were able to explain ANC frequency: "I was prepared for childbirth, hospital delivery is good because if there is a problem you can be helped; ANC attendance is four times, the first one should be at the time you realise that you are pregnant. I preferred hospital delivery against home delivery; I was tested for malaria, TB and HIV. When I went to the hospital for delivery I was helped by the midwife and I can share my experience with other mothers in the community to deliver in hospital."

Knowledge of complications
All participants stated that hospitals do not know about a condition called "rariw" and that it is better to go to TBAs, because they know it well and have a treatment for it. The condition called "rariw" was explained as either a sexually transmitted infection or a urinary tract infection. Mothers are given herbs for it, and this was preferred to going to hospital for treatment. Knowledge of complications is part of the Individual Birth Plan (IBP), so that the mother prepares for any dangers that may occur during pregnancy, labour and the immediate postpartum period.
Limited knowledge of complications was evident among mothers who had attended ANC more than once. When a primigravida was asked about the danger signs experienced during pregnancy, she said, "I do not know because I have never given birth before." A pregnant mother said, "the baby can have mal presentation sometimes, or like in my case I get severe stomach cramps so I have to seek professional help. There is another complication; I do not know if it affects only me we call it 'rariw' (a barrier that occupies the space of the baby). Now that delivery is free I do not see anything that can prevent mothers from delivering in the hospital, and what I have mentioned about 'rariw' most hospitals do not know about it, it is better to go to traditional midwives because they know it well and have treatment for it." After the dialogue, knowledge of complications increased among CHWs and mothers, who were able to list the complications that can occur during pregnancy, during labour and delivery, and during the postpartum period.
During pregnancy, complications include bleeding before the birth of the baby (antepartum haemorrhage, APH), malpresentation when the baby is mature for delivery, swollen legs, breathlessness, dizziness, nausea and vomiting, and low blood volume (anaemia). During labour and delivery, they include tiredness, inability to push, bleeding, nausea and vomiting, a big baby, obstructed labour, and malpresentations such as breech and transverse lie. After delivery, they include bleeding after the birth of the baby (postpartum haemorrhage, PPH), which can be severe enough to cause anaemia and shock, after-pains (ojiwo), severe headache, dizziness, malaria (chills), weakness, retained placenta, and severe backache (nyatong tong), which can kill the mother.

Decision to seek care
In some communities, the decision to seek care is dictated by other family members; in this study, however, most mothers decided to attend clinic on their own. One mother said, "I decided to go for ANC because it was for my own benefit, I would be given tetanus injections and treatment in case of any illness and also acquire a clinic card, I decided for myself, I just informed my husband of my decision." Some mothers also said that CHWs had influenced their decisions to attend clinic. One mother said, "Leah, a CHW came and advised me to go to the ANC; she also advised that I should save some money to use on the delivery date." Some mothers stated that what their own mothers had told them earlier in life influenced their decision to attend clinic.
Others said they had gone to the hospital for routine care and were told at that time to start attending clinic. One said, "I did not know my condition at first till I went to the hospital and got tested for HIV, so that is when they told me to come in April and start attending clinic." Mothers who had delivered previously had different experiences and decided to attend for various reasons. One said, "I valued my health apparently, during my 5th pregnancy I experienced prolonged labor that went on for a week so I decided to start ANC early this time." The ANC card is given to the mother on the first visit; it documents her profile, medical history, obstetric history and every service she receives at each visit until she delivers. Mothers saw different purposes for this card, as this excerpt shows: "If you do not have a card you can be scared, but if you have a card you do not fear because you were with your people and they know you well. Sometimes if you have a problem you are sure of being helped in the hospital. You can take the card as a shield (kaka kuot) when for sure you know you will not deliver in the hospital but for caution in case you develop a problem." Another pregnant mother said, "I can advise a mother in the community to start attending clinic so that on the date of the delivery she has a clinic book because when you do not have a clinic book, health workers may not attend to you in time even if you have complications." The decision to seek skilled delivery depended on the mother herself and, where possible, her spouse, and was also influenced by the teaching from CHWs, as this excerpt shows: "I decided I will follow the teaching I got from CHW on the benefits of hospital delivery and my husband supported the idea."

Gestation at first ANC
First ANC attendance should occur during the first trimester so that high-risk cases can be detected and treated early; however, most mothers did not start ANC during the first trimester, which denied them the chance of early detection of complications and appropriate referral. Many mothers first visited the antenatal clinic during the third trimester.
Antenatal clinic attendance enables health workers to give health education and to evaluate individual mothers carefully, given that every pregnancy has its own customs and may pose risks at any stage. Health education was not mentioned as part of the ANC package. Reactions to the reception given by health workers at the ANC were mixed. Typically, on arrival at the health facility, pregnant mothers were issued with the clinic book, had their heights, weights and blood pressure taken, and then underwent tests including malaria and HIV tests. As one mother described it, "They then gave me medicine and also did counselling, they also gave me insecticide treated mosquito net."

Information gained at ANC
Health education is part of the package of care during ANC, and every mother is educated on the Individual Birth Plan (IBP). However, it emerged that health workers had little time for individualized care and health education. A pregnant mother stated, "health workers need to talk to mothers politely on benefits of delivering in hospital; I have been at the antenatal clinic but I have not been taught anything. A..... when in hospital you line for your turn to see the doctor then leave for home." Mothers who had attended several ANC visits were unable to list complications or the basic items necessary for delivery; however, information on birth preparedness improved after the dialogue: "Scheduled clinic attendance and preparedness for childbirth, and hospital delivery is good because if there is a problem you can be helped. ANC attendance is scheduled four times during the entire pregnancy, the first one at the time when you realise that you are pregnant. Hospital delivery preferred against home delivery so that you are tested for malaria, TB and HIV. If there was a problem CHW would come to visit at home for teaching with other family members."

DISCUSSION
The findings revealed a number of similarities between this study and previous studies conducted in Kenya and other countries. The participants' demographic characteristics were similar to those reported in studies conducted in other countries.
All mothers knew their last monthly period (LMP), which was adequate to assist health workers in calculating the EDD; knowing the EDD helps a mother recognize labour, since labour occurs around the EDD. Mothers who do not know their EDD may not seek delivery care in hospital on time when labour starts, or may recognize labour only in the second stage. In such cases, the mother may not reach hospital in time for skilled delivery care.
Knowing when a woman is due to deliver was considered important, and there was general consensus that attending antenatal care at the health facility would enable the woman to know the probable date of delivery and, in turn, prepare adequately for the arrival of the baby. Mothers did not know the EDD even after more than one ANC visit, which indicates that they gained limited information from health workers at ANC. After the dialogue, all mothers stated the EDD, and some mentioned that this knowledge helped them decide to seek hospital delivery (Perkins et al., 2009).
Mothers benefit from just a few antenatal visits, as long as those visits are thorough. In a normal pregnancy, mothers should receive at least four thorough, comprehensive, personalized antenatal visits spread over the entire pregnancy, the first of which should take place in the first trimester. Each visit should be treated as if it were the only visit the mother would make. Antenatal care is more beneficial in preventing adverse pregnancy outcomes when it is sought early in pregnancy and continued through to delivery. Early detection of problems in pregnancy leads to more timely referral of mothers in high-risk categories or with complications. This is particularly true in Kenya, where three-quarters of the population live in rural areas and physical barriers pose challenges to health care delivery.
Complications that were considered traditional were believed to be best treated by TBAs rather than by health workers. Health workers also knew of these conditions and perceived that communities do not seek such services from health facilities. A common complication mentioned by all participants was "rariw/God", a barrier believed to occupy the space of the baby. Participants stated that hospitals do not know about it, and that it is better to go to TBAs because they know it well and have a treatment for it. The condition called "rariw" was explained as either a sexually transmitted infection or a urinary tract infection. Women are given herbs, and this is preferred to going to hospital for treatment.
It was revealed that the symptoms or problems women experience during pregnancy determined the type of provider they would go to for care and advice. They were likely to seek facility-based care if they experienced unexplained bleeding, severe abdominal pain, haemorrhage or spontaneous abortion, pelvic inflammatory disease, or anaemia. Skilled caregivers were perceived to be better equipped and trained to provide specialized treatment for problems perceived as medical in nature. Community-based providers (TBAs), by contrast, were perceived as having unique expertise in managing problems that fall outside the realm of western medicine (Perkins et al., 2009).
Readiness for skilled attendant care and for complications is promoted to reduce delays, so that all women receive appropriate care promptly. Human resources at rural health facilities could be complemented by trained local community health workers (CHWs) to reduce the workload of nurses (Decade report, 2010). CHWs could also provide more regular education and counselling, and encourage use of skilled delivery care (Abdulai, 2007). Alleviating misconceptions and fears, as in the case of "rariw", would reduce the gap in maternal health knowledge during prenatal care and contribute to increased utilization of skilled delivery care.
In many communities, the decision to seek care is dictated by other family members such as the husband, mother-in-law or senior wives. In this study, mothers decided on their own to attend the antenatal clinic and to seek skilled delivery. This finding agrees with a previous study in which recently delivered mothers stated that they decide on their own what care to receive and are supported by their husbands (Magoma et al., 2010). Obtaining a Mother's Card (antenatal card) was also cited as a reason to attend facility-based antenatal care. Some see the Mother's Card as a "ticket" to skilled care during delivery. Without it, women who present for delivery care or for treatment of obstetric complications may be turned away from the facility or subjected to scolding and abuse by facility-based staff.
TBAs mentioned the importance of obtaining an antenatal card more often than other categories of respondents did. They noted that they asked women to obtain an antenatal card before seeking their services, so that they could be sure the woman was unlikely to have any complication (Perkins et al., 2009). TBAs ensured that mothers had ANC cards because the card records whether a woman is likely to have complications. Mothers attend all clinic visits and adhere to the doctor's prescriptions for the required period up to the last minute. But when they think that things could change at the last minute, requiring an operation or an episiotomy, they become fearful and opt to deliver at home (Naanyu et al., 2011).
Mothers should attend ANC as soon as they realize that they are pregnant, for early diagnosis of complications and appropriate referral. First ANC attendance should occur during the first trimester in order to detect and treat high-risk cases early. However, most mothers did not start ANC during the first trimester, which denied them the chance of early detection of complications and appropriate referral. Many mothers visited the antenatal clinic during the third trimester.
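Whether a first ANC visit falls in the first trimester can be checked from the LMP and the visit date. The sketch below assumes trimester cut-offs of 14 and 28 completed weeks of gestation; guidelines differ by about a week, and the function names are hypothetical.

```python
from datetime import date


def gestational_weeks(lmp: date, visit: date) -> float:
    """Gestational age at a visit, in (fractional) weeks since the LMP."""
    return (visit - lmp).days / 7


def trimester(weeks: float) -> int:
    # Assumed convention: first trimester < 14 weeks,
    # second trimester 14 to < 28 weeks, third trimester from 28 weeks.
    if weeks < 14:
        return 1
    if weeks < 28:
        return 2
    return 3


def first_visit_on_time(lmp: date, first_visit: date) -> bool:
    """True if the first ANC visit fell in the first trimester."""
    return trimester(gestational_weeks(lmp, first_visit)) == 1
```

For an LMP of 1 January 2024, a first visit on 1 March 2024 (about 8.6 weeks) counts as on time. A first visit on 1 August 2024 (about 30 weeks) falls in the third trimester.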
To achieve the MDG target of reducing the maternal mortality ratio by three quarters by 2015, along with the MDG target of decreasing infant mortality, the international community has placed an emphasis on increasing antenatal care (ANC) and skilled attendant delivery care (Perkins et al., 2009). Many women do not complete four ANC visits. Mothers who attend regular ANC learn the recommended frequency of visits during pregnancy as part of the individual birth plan (IBP), which prepares the mother for the expected baby. This information should be imparted during ANC. However, in these findings mothers did not know the frequency, revealing that health education at ANC is inadequate to impart this information (Perkins et al., 2009, Cater, 2010).
Information on the birth plan is shared during this time, and any misconceptions that mothers bring from the community are corrected appropriately.
There was a widespread feeling among study participants that dialogue between providers and clients is vital, and that treatment options should be presented in a clear and transparent manner. Many study participants observed, however, that skilled attendants did not provide clients with sufficient information (Perkins et al., 2009).
Mothers who had attended more than one ANC visit were unable to list the roles of a husband and the basic items necessary for delivery, revealing that health workers are too busy to give health education. Health workers are overwhelmed with various activities at the health facility and are therefore unable to conduct health education during ANC visits. The task of conducting health education could be shifted to community health workers, who are the link between the community and the health facility. Health education would influence mothers' decisions to complete four ANC visits and subsequently to deliver in hospital.
Communication with mothers about their expectations and perceptions of health facility deliveries would improve community awareness of the importance of skilled attendant delivery care for the health of the mother and the unborn child. Prenatal care visits could also be used to educate mothers on how to avoid logistical barriers at the time of delivery (Hatt et al., 2007; Cotter et al., 2006; Magoma et al., 2010; van Eijk, 2006; Titaley et al., 2010). After the dialogue, all mothers were confident in stating the frequency of ANC visits during the entire period of pregnancy.
The study concluded that mothers lack adequate information from their interaction with health workers during ANC, given that health workers are few and are engaged in the integration of services. Preparedness for skilled attendant birth was inadequate, as evidenced by limited information on ANC frequency, complications and the EDD. Mothers were therefore unable to recognize the signs of labour in time to reach the health facility for skilled delivery.
Dialogue was effective in improving information on birth preparedness and skilled attendance. The role of CHWs, supported by CHEWs, as a link between the household and the formal health system was a key element in the dialogue process. CHWs were critical in bridging the gap between theory and practice, and between demand and supply.
The study recommended facility improvement and the provision of adequate staffing, supplies and equipment. Equipment was worn out and needed urgent replacement. There is a need for task shifting of the health education role from health workers to CHWs, because health workers were overwhelmed with the integration of services and unable to provide adequate birth preparedness information to mothers. CHWs need an allocation of resources to enable them to strengthen the linkages between the community and the health facility.

INTRODUCTION
Concrete is considered a composite material in which the aggregates form the solid skeleton and act as reinforcement, ensuring mechanical performance under the various stresses. In this context, many studies focus on the correlation between the proportions, size distribution and mechanical properties of aggregates on the one hand, and concrete strength on the other.
*Corresponding authors. E-mail: safi_brahim73@yahoo.fr. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.
In […] cycles at lower stress levels (Cachim et al., 2000; Lee and Barr, 2002). The flexural strength under cyclic loads (fatigue) is a key criterion for evaluating mass concrete intended for road applications. These tests consist of subjecting prismatic samples to bending loads lower than the failure load and determining the number of loading cycles the concrete can sustain. Loading levels typically vary between 50 and 80% of the flexural strength at the time of the test, with up to one million cycles according to the ASTM standard. At present, the concept of a fatigue limit rests on the hypothesis that the S-N curve has a horizontal asymptote beyond 10⁶ or 10⁷ cycles. Any sample not broken at 10⁷ cycles is therefore considered to have an infinite (very long) life. The possible forms of the S-N curve are shown in Figure 1, which distinguishes four zones (Bathias and Pineau, 2008):
Zone I corresponds to oligocyclic fatigue, also known as low-cycle fatigue (LCF), defined by a small number of cycles (up to about 10⁵ cycles). It is characterized by testing at high stress amplitudes.
Zone II, the zone of limited endurance, generally lies between 10⁵ and 10⁷ cycles. This is the high-cycle fatigue (HCF) region. In this zone, a lower loading leads to a greater number of cycles to failure (Nf).
Zone III covers fatigue at a very large number of cycles (more than 10⁷ cycles). It generally concerns tests on piezoelectric machines that load samples at very high frequencies. In case III(a), the curve tends towards a limit defined as a conventional horizontal asymptote (Papadopoulos and Panoskaltsis, 1996). In case III(b), failures continue to be observed, caused by crack initiation at the surface or at an inclusion in the sample, as in the case of steel (Blanche, 2012).
Zone IV shows the case of a material that continues to be damaged even at lower stresses. The other two cases show that the material has reached a stress below which microstructural mechanisms are perfectly reversible or their irreversibility is negligible (Mughrabi, 2006).
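The cycle ranges of the zones described above can be expressed as a simple classifier. The sketch below assumes the approximate limits 10⁵ and 10⁷ act as hard cut-offs. Zone IV describes a curve shape (continued damage at low stress) rather than a cycle range, so it is not represented. The function name is hypothetical.

```python
def fatigue_zone(n_cycles: float) -> str:
    """Map a number of cycles to failure onto the S-N zones I-III.

    Assumed hard boundaries: Zone I below 1e5 cycles, Zone II from 1e5
    to 1e7 cycles, Zone III above 1e7 cycles.
    """
    if n_cycles <= 0:
        raise ValueError("cycle count must be positive")
    if n_cycles < 1e5:
        return "I (LCF)"
    if n_cycles <= 1e7:
        return "II (HCF, limited endurance)"
    return "III (very high cycle fatigue)"
```

A specimen surviving the one million cycles used in road-concrete testing therefore sits in Zone II. This is why an endurance limit read at 10⁶ cycles is only a conventional one.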
According to research on the mechanical behavior of materials, fatigue failure takes place under repetitive or cyclic loading whose maximum value is substantially smaller than the estimated safe static load. In concrete, these changes are mainly related to the progressive growth of internal microcracks. This growth increases irrecoverable strain, which manifests as changes in the material's mechanical properties (Sun et al., 1999; Shang and Song, 2006; Hasan et al., 2008; Li et al., 2011; Shi et al., 1993; Lee and Barr, 2004; ACI 215R-74, 1997). However, Murdock and Kesler noted that fatigue failure in concrete is due to the progressive deterioration of the bond between the coarse aggregate and the cementitious matrix. These authors found that the reduction of the specimen section leads to its failure through fracture of the matrix (Murdock and Kesler, 1958).
Many studies have already been carried out on the effect of the nature and type of aggregate on the mechanical performance of concrete (Malesev et al., 2010; Tavakoli and Soroushian, 1996; Corinaldesi and Moriconi, 2010; Thomas et al., 2013; Corinaldesi, 2011; Zaetang et al., 2013). The physical properties and mechanical performance of recycled aggregate concrete have been compared with those of concrete made with natural aggregates. Other work has examined recycled aggregate concretes with fly ash in the binder mixture (Lima et al., 2013; Abo-El-Enein, 2014). These studies have shown that improving the physical properties can lead to a significant improvement in the mechanical performance of concrete. A few studies have dealt with the effect of coarse aggregate parameters on the flexural fatigue resistance of concrete. However, experimental data on the fatigue behavior of concrete made with different types of coarse aggregate are still insufficient. The present study therefore focuses on the effect of coarse aggregate parameters on the flexural fatigue behavior of concrete.

Materials used
The cement used was a Portland cement (CEM II 42.5) with a specific gravity of 3.15. Its chemical and mineralogical composition is given in Table 1. Natural sand (0/5 mm) was used as fine aggregate. Three coarse aggregate types, each in two size classes (8/15 and 15/25), were obtained from three different quarries (CA1, CA2 and CA3). Table 2 gives the physical and mechanical characteristics of these aggregates, and Figure 2 gives their grain-size distribution. According to this distribution, the aggregates have practically the same size and fraction in both classes (8/15 and 15/25).

Concrete mixtures
To study the effect of the nature and parameters of the aggregates on concrete behavior under fatigue stresses, a series of concrete mixtures was designed using the Dreux-Gorisse method. The mix design targets a minimum strength Rn = 35 MPa and a workability (slump) of about Af = 10 cm, for a granular coefficient G = 0.5. After mixing, the workability of each mixture was measured with the Abrams cone according to EN 12350-2 (Testing fresh concrete - Part 2: Slump test). Prismatic (70 × 70 × 280 mm³) and cylindrical (Ø160 × 320 mm) concrete samples were cast for each mixture. One day after casting, the samples were stored in water at 21 ± 1°C until the various tests and measurements were carried out. The prismatic samples were used for the flexural and fatigue tests, and the cylindrical samples for the compressive test.
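The Dreux-Gorisse method links the target strength to the cement/water ratio C/E through a Bolomey-type law, commonly written f_c28 = G·σ_c·(C/E - 0.5). The sketch below solves that law for C/E. It assumes the nominal cement class (42.5 MPa) is used as σ_c. The text states neither the cement strength value nor any safety margin applied to Rn, so the result is indicative only. The function name is hypothetical.

```python
def cement_water_ratio(target_strength_mpa: float,
                       cement_class_mpa: float,
                       granular_coefficient: float) -> float:
    """Solve f_c28 = G * sigma_c * (C/E - 0.5) for the ratio C/E.

    target_strength_mpa   -- required strength f_c28 (Rn in the text)
    cement_class_mpa      -- cement strength sigma_c (assumed: nominal class)
    granular_coefficient  -- G, depends on aggregate quality and size
    """
    return target_strength_mpa / (granular_coefficient * cement_class_mpa) + 0.5
```

With Rn = 35 MPa, G = 0.5 and σ_c = 42.5 MPa this gives C/E ≈ 2.15, i.e. a water/cement ratio of about 0.47.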
The concrete specimens were prepared and checked according to the ASTM C 597-1980 standard. The tests were carried out using ultrasonic pulse velocity (UPV) testing. The system consists of several functional units (pulser/receiver, transducers and display devices), as schematically described in ASTM C597-97 (ASTM C597-97, 1993).
Non-destructive evaluation of concrete by ultrasound is a technique commonly used in research. Breysse (2012) shows that the choice of a given model type has no material consequence, since all models lead to roughly the same quality of evaluation: the error of each model is much smaller than that due to measurement uncertainties.
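In ASTM C597, the pulse velocity is simply the path length divided by the measured transit time. The sketch below illustrates this. The 280 mm direct-transmission path along the prism used in the example is an assumption, since the text does not state the transducer arrangement.

```python
def pulse_velocity(path_length_m: float, transit_time_s: float) -> float:
    """Ultrasonic pulse velocity V = L / T, in m/s."""
    if transit_time_s <= 0:
        raise ValueError("transit time must be positive")
    return path_length_m / transit_time_s
```

For example, a hypothetical transit time of 70 µs over 0.28 m gives 4000 m/s.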
The bending tests were performed on a Toni Technik bending frame with a 100 kN force sensor, computer-controlled through testXpert software version 7.0. The test protocol follows NF EN 12390.
Compression tests were performed on the two halves obtained from the bending test, using a test bed with a maximum capacity of 3000 kN. The test speed was 10 mm/min. Bending tests were conducted at 28 days of curing.
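The text names neither the loading configuration nor the support span. The sketch below therefore covers both standard formulas for the flexural strength of a prism: four-point (third-point) loading, f = F·l/(b·d²), and three-point loading, f = 3F·l/(2b·d²). The 210 mm span (three times the 70 mm depth) in the example is an assumption, and the function name is hypothetical.

```python
def flexural_strength(load_n: float, span_mm: float, width_mm: float,
                      depth_mm: float, four_point: bool = True) -> float:
    """Flexural strength of a prism in MPa (N/mm^2).

    four_point=True : f = F * l / (b * d^2)      (two loads at third points)
    four_point=False: f = 3 * F * l / (2 * b * d^2)  (single central load)
    """
    f = load_n * span_mm / (width_mm * depth_mm ** 2)
    return f if four_point else 1.5 * f
```

For a hypothetical 7 kN failure load on a 70 × 70 mm section over a 210 mm span, this gives about 4.29 MPa in four-point bending and 6.43 MPa in three-point bending. The configuration therefore matters when comparing results.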

Fatigue test
The main objective of this part of the work is to determine the influence of aggregate properties on the fatigue behavior of the concrete at various fatigue stress levels, using prismatic specimens (280 × 70 × 70 mm). The test protocol applies load levels of 50, 60, 70 and 80% of the maximum bending strength, with a load ratio of 0.6 between the minimum and maximum loads.
Fatigue tests were performed on a high-frequency (30-300 Hz) electromechanical resonance fatigue testing machine (Vibrophore; Figure 1). Using the principle of resonance, the machine applies alternating sinusoidal loads with variable amplitude and mean load, which allows the S-N curve to be obtained with low energy consumption.
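Each fatigue level defines a sinusoidal load envelope. The sketch below reads the 0.6 ratio as R = F_min/F_max, the only reading that gives a valid load window, since F_max/F_min cannot be below 1. From the static failure load it derives the maximum, minimum, mean and amplitude loads, and evaluates F(t) = F_mean + F_a·sin(2πft). Function names are hypothetical.

```python
import math


def cyclic_load(static_failure_load_kn: float, stress_level: float,
                r_ratio: float = 0.6):
    """Return (F_max, F_min, F_mean, F_amplitude) in kN for one load level.

    stress_level -- fraction of the static bending failure load (e.g. 0.8)
    r_ratio      -- assumed R = F_min / F_max
    """
    f_max = stress_level * static_failure_load_kn
    f_min = r_ratio * f_max
    f_mean = (f_max + f_min) / 2
    f_amp = (f_max - f_min) / 2
    return f_max, f_min, f_mean, f_amp


def load_at(t_s: float, freq_hz: float, f_mean: float, f_amp: float) -> float:
    """Instantaneous sinusoidal load F(t) = F_mean + F_a * sin(2*pi*f*t)."""
    return f_mean + f_amp * math.sin(2 * math.pi * freq_hz * t_s)
```

At the 80% level with a hypothetical 10 kN static failure load, the load oscillates between 4.8 and 8 kN around a mean of 6.4 kN.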
Fatigue testing machines apply cyclic loads to test specimens. Fatigue testing is a dynamic testing mode and can be used to simulate how a component or material will behave or fail under real-life loading conditions. Such machines can apply tensile, compressive, bending and/or torsional stresses and are often used for springs, suspension components and biomedical implants.

Workability of studied concretes
The workability results of the concretes are given in Figure 3. According to these results and Figure 4, all concrete mixtures have a workability acceptable for concrete construction. All the concretes studied also have practically the same workability, indicating that aggregate type has virtually no effect on workability.

Non-destructive testing
After non-destructive ultrasonic testing of the prismatic samples of the studied concretes, the measured pulse velocity was used to estimate the compressive strength with Equation (2) (Table 3):
$$R_c = 0.08177\, e^{0.00147\, V_t} \qquad (2)$$
where V_t is the longitudinal pulse velocity (m/s) and R_c the estimated compressive strength (MPa). The average velocities for all concrete specimens, with aggregates from the different quarries, are above 4200 m/s. This value indicates good-quality concrete according to ACI 228.2R-98 (American Concrete Institute Report ACI 228.2R-98, 1998).
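The empirical relation R_c = 0.08177·e^(0.00147·V_t) quoted above can be evaluated directly. The units are assumed (V_t in m/s, R_c in MPa) because the text does not state them. With these units, the 4200 m/s threshold maps to roughly 39 MPa, which is consistent with the 35 MPa design target.

```python
import math


def compressive_strength_from_upv(v_t: float) -> float:
    """Estimated compressive strength R_c = 0.08177 * exp(0.00147 * V_t).

    Assumed units: v_t in m/s, result in MPa. The coefficients are the
    empirical ones quoted in the text and are specific to these concretes.
    """
    return 0.08177 * math.exp(0.00147 * v_t)
```

Because the relation is exponential, a 100 m/s change in velocity changes the estimate by about 16% (e^0.147 ≈ 1.16). Small measurement errors in V_t therefore matter.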

Destructive testing
The studied concretes were also subjected to destructive testing, measuring the flexural and compressive strength of all specimens after 28 days of curing. The results are given in Table 4.
The results show that strength can be influenced by the aggregate properties, which have a direct impact on the properties of concrete. Figure 5 shows the relationship between the aggregate parameters, namely the Micro-Deval and Los Angeles coefficients, and the flexural and compressive strengths. It shows that these strengths are inversely proportional to the aggregate parameters, and the relationship is most pronounced for the Los Angeles coefficient.

Fatigue test
The fatigue machine is computer-controlled through the testXpert (12.1) software, which records the fatigue test parameters. These are the resonant frequency and its variations, changes in static and dynamic loads, the ratio "r" and the number of cycles. At the beginning of the test, the machine sweeps the frequency up to the resonant frequency of the test piece, at which point the phase shift between the mass-spring system and the specimen is zero. Table 4 illustrates the results obtained for the different types of specimens.
The results can be discussed through two parameters: the number of cycles and the resonance frequency. The latter is proportionally related to the compactness of the material (Table 5). The specimens based on CA2 aggregates exhibit the highest resonance frequency (90 Hz), followed by those of CA1 and CA3. From this parameter, it may be concluded that CA2 aggregates have a higher stiffness than the others. This is confirmed by the aggregate characterization results (Los Angeles and Micro-Deval).
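The link drawn above between resonance frequency and stiffness follows from the idealized mass-spring model f = (1/2π)·√(k/m): at constant oscillating mass, a stiffer specimen raises the frequency. The sketch ignores damping and the machine's own springs, so it is illustrative only. In a real resonance machine, the specimen stiffness combines with that of the machine.

```python
import math


def resonant_frequency(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Undamped natural frequency f = sqrt(k / m) / (2 * pi), in Hz."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2 * math.pi)
```

Because f scales with √k, quadrupling the stiffness only doubles the frequency. Moderate frequency differences between mixtures can therefore reflect larger stiffness differences.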
Figure 6 gives the fatigue results of all the studied concretes, presenting the applied loading level as a function of the number of cycles. The fatigue curves clearly follow the trend described in the literature: a concave portion and a linear portion, representing the reduction of the stress as the number of applied cycles increases. The CA2 aggregate-based concrete has the best fatigue behavior of all the mixtures and can sustain about 70% of the maximum load for one million cycles. For this concrete, the endurance limit lies around that stress level. For the CA1 and CA3 aggregate-based concretes, the limit is estimated at around 55% of the maximum load. The curves obtained for the CA3 aggregate-based specimens are similar to the theoretical Wöhler model, which suggests that the load distribution during the test was more homogeneous in this case than in the others. Taking into account the mixture homogeneity, the concrete strength and the main characteristics of the aggregates (Micro-Deval and L.A.), the flexural fatigue behavior of concrete with CA1 aggregates appears much better than that of the other concretes (Figure 7).
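A common way to summarize such fatigue data is a semi-logarithmic Wöhler-type line, S = a - b·log10(N), fitted by least squares to the (load level, cycles to failure) pairs. The sketch below assumes that form. The authors report no fitted parameters, so a and b must come from measured data such as Table 5. Function names are hypothetical.

```python
import math


def fit_s_log_n(stress_levels, cycles):
    """Least-squares fit of S = a - b * log10(N); returns (a, b).

    stress_levels -- load levels as fractions of the static strength
    cycles        -- corresponding numbers of cycles to failure
    """
    xs = [math.log10(n) for n in cycles]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(stress_levels) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, stress_levels))
    slope = sxy / sxx  # negative for a decreasing S-N line
    return mean_y - slope * mean_x, -slope


def cycles_at(stress_level, a, b):
    """Invert the fitted line: N = 10 ** ((a - S) / b)."""
    return 10 ** ((a - stress_level) / b)
```

With fitted a and b, cycles_at(0.55, a, b) estimates the life at 55% of the maximum load. A linear S-log N line cannot represent a true horizontal asymptote, so it describes only the Zone II part of the curve.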

Conclusion
This study investigated the influence of aggregate parameters on the mechanical properties of concrete under static and dynamic loading. The results show that aggregates with Los Angeles and Micro-Deval parameters similar to those of coarse aggregate 3 (CA3) provide better adhesion with the cement paste, and therefore greater flexural strength. By contrast, the compressive strength and the Los Angeles coefficient are inversely proportional: aggregates that have a high […] The endurance limit is also inversely proportional to the Los Angeles coefficient. Aggregates with good resistance to fragmentation (a low LA coefficient), such as CA2, provided an endurance limit of around 70% of the maximum bending load.

Figure 2. Frequency histogram of coastal (a) V current and (b) mixed layer depth based on daily HYCOM data at Ponta do Ouro: point in (c). Mean ocean maps from 8 km HYCOM reanalysis of: (c) temperature, (d) mixed layer depth, (e, f) V and U currents (shaded and vectors).

Figure 4. (a) Digital-earth composite photo of Ponta do Ouro in 2013, with bay inset showing the 2005 shoreline and 2010 cusp (red lines). (b) Aerial photo of the point viewed toward the south, and (c) of the bay viewed toward the north.
are NNE 15% of the time, mostly in spring: 27 Sep 2012, 29 Aug 2013 and 19 Sep 2013.Local winds are channeled and accelerate around the headland.There is an inland gradient of wind ~ 1 m/s per 100 m owing to friction of the headland compared with the adjacent sea.During

Figure 5. Photos at Ponta do Ouro of (a) wind-blown dunes outside the point in 2013 (at S), (b) headland reef at low tide in 2007 (at E), (c) leeward bay beach in 2003, (d) receded beach and flattened dunes in 2013 (at N in Fig. 8a).

Figure 6. Regional composite patterns of: (a) east wave scenario in late summer; (b) south wave scenario in winter, with region of turbulent flux > 150 W/m² shaded yellow; (c) north wind scenario in spring. Block arrows denote pattern translation. (d-f) Corresponding observed local winds with length scale. Insets on left are ECMWF wave climatology histograms per direction. Red dot is Ponta do Ouro.

Figure 8. (a) Drifter tracks 23-25 April 2002, with vectors representing mean values in surf and bay zones. Frequency distribution of all drifter tracks: (b) direction and (c) speed (m/s). Labels in (a) refer to sites of sand and soil samples, and beach profiles (dashed).
[…] proposed a Fuzzy Approach to Cluster Text Documents Based on MapReduce. Fuzzy sets are used to categorize text documents, and a parallel text clustering framework is built on MapReduce according to the proposed text clustering procedure. Patil and Nandedkar (2014) proposed MapReduce-based K-Means and hierarchical clustering algorithms, which improve performance but lack clustering quality.
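The MapReduce formulation of K-Means referred to above splits each iteration into two steps. A map step assigns every point to its nearest centroid, keyed by centroid index, and a reduce step averages the points per key. The single-process sketch below covers one iteration. It is a generic illustration, not Patil and Nandedkar's implementation, and it omits the combiner and the distributed shuffle that a framework such as Hadoop would provide.

```python
import math
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (nearest_centroid_index, (point, 1)) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        yield idx, (p, 1)


def reduce_phase(pairs, dim):
    """Reduce: sum coordinates and counts per key, then average."""
    totals = {}
    counts = defaultdict(int)
    for idx, (p, c) in pairs:
        acc = totals.setdefault(idx, [0.0] * dim)
        for j, v in enumerate(p):
            acc[j] += v
        counts[idx] += c
    return {idx: tuple(v / counts[idx] for v in acc) for idx, acc in totals.items()}


def kmeans_iteration(points, centroids):
    """One K-Means iteration expressed as map followed by reduce."""
    new = reduce_phase(map_phase(points, centroids), len(centroids[0]))
    # A centroid with no assigned points is kept unchanged.
    return [new.get(i, c) for i, c in enumerate(centroids)]
```

The map step needs only the current centroids, which are small enough to broadcast to every node. The data points themselves never move between iterations, which is what makes K-Means a good fit for MapReduce.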

Figure 1. Proposed Distributed Clustering Algorithm based on MapReduce Framework.

Figure 2. Analysis by varying the number of clusters and number of nodes.

Figure 3. Execution time analysis by varying the number of clusters and number of nodes. (a) Execution time analysis based on number of clusters; (b) execution time analysis based on number of nodes.

Figure 4 shows that the speedup of the proposed algorithm increases linearly but is stable with an increase in the number of clusters. Figure 5 describes the
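The text does not define speedup. The usual definition is S(n) = T(1)/T(n), the single-node execution time divided by the time on n nodes, with parallel efficiency E(n) = S(n)/n. The sketch below assumes those definitions; linear speedup corresponds to E(n) ≈ 1.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup S(n) = T(1) / T(n)."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, n_nodes: int) -> float:
    """Parallel efficiency E(n) = S(n) / n; 1.0 means perfectly linear speedup."""
    return speedup(t_single, t_parallel) / n_nodes
```

For example, a hypothetical job taking 100 s on one node and 25 s on four nodes has a speedup of 4 and an efficiency of 1.0.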

Figure 5. Execution time comparison of the algorithm.

Figure 2. Particle size distribution of coarse aggregates.

Figure 4. Workability evolution of the studied concretes by aggregate type.

Figure 7. Number of cycles obtained at each applied load level.

Table 1. Beach sand / soil characteristics (a) south of Ponta and (b) inland from the point.

Table 2. Number of days per 3-year period with wave-generating storm winds > 10 m/s lasting > 3 days, averaged over a 15° × 15° area southeast of Ponta.

Table 1. Purity analysis of algorithms.

Table 2. Initial feature space dimensions using terms.

Table 3. Impact of clustering quality on dimensions for clustering algorithms.

Table 2. Characteristics of the aggregates used.

Table 3. Compressive strength estimated from longitudinal ultrasonic pulse velocity.

Table 4. Flexural and compressive strength of the studied concretes at 28 days.

Table 5. Results of fatigue testing of concrete samples.