New methodology for microarray spot segmentation and gene expression analysis

DNA microarray analysis is central to genome mapping and is considered one of the most recent and important technologies for exploring the genome. Each microarray image contains millions of pieces of information about genes. One of the key steps in microarray analysis is to extract gene information from the gene spots; this information represents the gene expression levels in the microarray. This paper proposes a new methodology to improve microarray spot analysis based on extracted spot segments. It analyzes each spot segment area independently rather than analyzing all the spot areas of the microarray image together. The paper provides a formal model to enhance the intensity values obtained from the gene expression levels of the microarray at any expressed intensity level. It also presents the adaptive threshold technique used for microarray segmentation. The experimental results show that the mean gene expression intensity value was 87.77.

interaction. Almost all the software systems tested require human intervention (Moffitt et al., 2011): the user must specify the geometry of the array, such as the number of grids and the number of rows and columns. Examples include SPOT from UCSF (Emmanouil et al., 2009), IMAGENE from Bio Discover (Rueda and Vidyadharan, 2006) and DAPPLE from University of Washington (Buhler et al., 2000).
The first step in microarray image analysis is addressing the spot locations, the second is spot segmentation (Tiwari, 2005; Wang et al., 2007; Efron et al., 2001), and the third is estimating the intensities of the gene spots (Newton et al., 2001; Yang et al., 2002).
Five main issues should be considered when analyzing microarray images: gridding the microarray, determining the location of the spots in the microarray image, removing noise and unwanted particles, removing the background and, finally, enhancing the image.
The main aim of microarray image analysis is to determine the percentage of gene in every spot of the microarray, which is done by determining the percentage of red or green color in each spot. The analysis of each spot depends fundamentally on the segmentation method used to extract it. Current approaches to extracting the spots and performing further analysis depend on analyzing and enhancing global values of the microarray image for all the spots. This paper proposes a new methodology to improve microarray spot analysis based on extracted spot segment areas, taking into account the background and noise of each segment. It analyzes each spot segment area independently rather than analyzing all the spot areas of the microarray image together. The methodology involves noise removal, background subtraction, color percentage calculation and gene profiling. The paper provides a formal model to enhance the intensity values obtained from the gene expression levels of the microarray at any expressed intensity level.
The remainder of the paper is organized as follows. First, the related work is discussed, followed by the data presentation. The proposed algorithm and its application to microarray images are then presented and, finally, the conclusions are summarized.

LITERATURE REVIEW
Segmentation in microarray analysis is classified into four types: fixed circle segmentation, adaptive circle segmentation, histogram segmentation and adaptive shape segmentation (Zacharia and Maroulis, 2008; Antoniol and Ceccarelli, 2004; Giannakeas and Fotiadis, 2008; Bariamis et al., 2010). The fixed circle segmentation algorithm assumes that every spot is a perfect circle and that all spots have the same size (Zacharia and Maroulis, 2008). The adaptive circle segmentation algorithm assumes that the spot has a circular shape and permits adjusting the size of each spot (Antoniol and Ceccarelli, 2004; Stefano and Luo, 2004). It provides more accurate results than the fixed circle algorithm (Zacharia and Maroulis, 2008; Wu and Yan, 2003).
Hudaib et al. 127
The histogram segmentation (Giannakeas and Fotiadis, 2008) uses a clustering algorithm to partition the pixels based on their intensity values.
The adaptive shape segmentation segments a spot by its shape, using either seeded region growing (SRG) (Bariamis et al., 2010) or globally optimal geodesic active contours (GOGAC) (Alhadidi et al., 2006). The spot is adaptive in size and can be of irregular shape (Jain et al., 2003; Lee et al., 2000). Alhadidi et al. (2006) developed an algorithm for determining the grid in microarray images based on the positions of the spots. Clustering is also used in microarray image segmentation (Saal et al., 2002). Clustering has shown some advantages when applied to microarray image segmentation, such as reducing computational time and producing a complete segmentation despite the false edges of the spots (Saal et al., 2002; Mangalam, 2001; Kamberova and Shah, 2002). However, clustering produces irregularly shaped spots, so clustering-based algorithms produce noisy pixels in the foreground regions and incorrect quantization measures for spot intensity (Bariamis et al., 2010; Meher et al., 2011).
The segmentation algorithms extract gene spots based on either the background of the whole microarray image or a local segmentation of each sub-image. However, microarray images consist mostly of low-intensity features that are not well distinguished from the background, which leads to errors that propagate to all stages of the microarray analysis (Zacharia and Maroulis, 2008; Adwan et al., 2013; Fuyong, 2013). In this paper, we propose an algorithm that divides the microarray image into small images and extracts each gene spot using an adaptive threshold based on the background of each small image.

PROPOSED METHODOLOGY
The proposed methodology segments the gene spots from the spot area using an adaptive threshold. It removes the noise from the spot area and calculates the intensity of each spot individually in order to reduce the errors that result from processing the background of the whole microarray.
Figure 1 shows the steps of the proposed methodology. The input of the algorithm is the microarray image, and the target output is the gene expression intensity level. The steps of the methodology are described in the following subsections.

Crop microarray image
The microarray image represents the whole genome mapping. It consists of thousands of sub-microarrays. The microarray image is cropped into sub-microarray images. Each sub-microarray image is then used to extract each spot area separately.
The microarray is cropped using the crop image tool in MATLAB by determining the position of the spots; this step is done manually. Figure 2 shows a microarray image and a cropped sub-microarray image.

Separate red and green layers
The microarray is stained with two dyes, Cy5 (red) and Cy3 (green) (Newton et al., 2001), usually Cy3 for the control and Cy5 for the experimental channel (Newton et al., 2001; Alhadidi et al., 2006). The microarray is scanned at 540 nm (green) for the control (Cy3) and at 630 nm (red) for the experimental channel (Cy5). The microarray image is produced by scanning the microarray into monochromatic images, which are registered into two channels, red and green. The red and green layers are produced by the Cy5 and Cy3 dyes respectively, with a zero blue component (Antoniol and Ceccarelli, 2004; Kim et al., 2001). The output of this process is a sub-microarray image with two colors, red and green.
The red and green layers are separated from the sub-microarray image using the MATLAB red Map and green Map tools. Figure 3 shows the red and green layers of a sub-microarray image.
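As a minimal illustration of this step, the sketch below splits an RGB sub-microarray image into its red (Cy5) and green (Cy3) layers with NumPy slicing. It assumes the image is stored as a height × width × 3 array in RGB order; the function name `split_red_green` is hypothetical and does not correspond to the MATLAB tools used in the paper.

```python
import numpy as np

def split_red_green(rgb):
    """Return the red (Cy5) and green (Cy3) layers of an RGB image array."""
    rgb = np.asarray(rgb)
    if rgb.ndim != 3 or rgb.shape[2] < 3:
        raise ValueError("expected an H x W x 3 RGB array")
    red = rgb[:, :, 0].copy()    # channel 0: red layer
    green = rgb[:, :, 1].copy()  # channel 1: green layer
    return red, green

# Tiny synthetic image: one red-dominant pixel and one green-dominant pixel.
image = np.array([[[200, 30, 0], [10, 180, 0]]], dtype=np.uint8)
red_layer, green_layer = split_red_green(image)
```

The blue channel is ignored because, as stated above, the scanned image has a zero blue component.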

Gridding and applying the grid on red and green layers
This step determines the spot positions in the sub-microarray image using grid lines. The grid lines are drawn at the center point between any two adjacent spot rows or columns. The intersections of these lines form a square that surrounds each spot.
To apply gridding, we first calculate the intensity of the spot rows and spot columns, where the intensity represents the value of the red and green colors in each pixel. The autocorrelation function is then applied to enhance the self-similarity of the vertical and horizontal intensity averages of the rows and columns. After that, we calculate the center of the peak for each spot row and spot column. The lines of the grid are drawn at the center of each peak. Figure 4 shows the grid lines on the red and green layers.
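The role of autocorrelation in gridding can be sketched as follows. This is a simplified illustration rather than the paper's exact procedure: it assumes a 1-D mean-intensity profile (for example, the column means of one layer), estimates the spot spacing from the first autocorrelation peak, and then places lines one period apart where the profile is darkest. The names `estimate_period` and `grid_lines` are hypothetical.

```python
import numpy as np

def estimate_period(profile):
    """Estimate spot spacing in pixels from a 1-D mean-intensity profile."""
    p = np.asarray(profile, dtype=float)
    p = p - p.mean()
    # Autocorrelation for non-negative lags only (index 0 is lag 0).
    ac = np.correlate(p, p, mode="full")[p.size - 1:]
    # The first local maximum after lag 0 is taken as the period.
    for lag in range(1, ac.size - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]:
            return lag
    raise ValueError("no periodic structure found in profile")

def grid_lines(profile, period):
    """Place lines one period apart at the offset covering the least intensity."""
    p = np.asarray(profile, dtype=float)
    costs = [p[offset::period].sum() for offset in range(period)]
    offset = int(np.argmin(costs))
    return list(range(offset, p.size, period))

# Synthetic column profile: six spots, each occupying a 10-pixel cell.
column_profile = np.tile([0, 0, 1, 3, 5, 5, 3, 1, 0, 0], 6)
period = estimate_period(column_profile)
lines = grid_lines(column_profile, period)
```

On a real layer, the profile could be obtained with `layer.mean(axis=0)` for columns and `layer.mean(axis=1)` for rows; noisy profiles may need smoothing before the peak search.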

Extracting the spot segment
This step extracts the spot segments according to the spot locations in the grid. The intersections of the grid lines (rows and columns) are used to determine the coordinates of the spot segments. Every segment has four coordinates, (x1, y1), (x2, y2), (x3, y3) and (x4, y4), where the (x, y) values correspond to the grid line intersection points (Alhadidi et al., 2006; Fuyong, 2013). Each extracted segment contains only one spot. The intensity value of every spot represents the expression level of the gene that the spot represents (Figure 5).
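A minimal sketch of segment extraction, assuming the grid is given as sorted lists of row and column line positions in pixels (the function name `extract_segments` and the dictionary layout are illustrative choices, not part of the paper):

```python
import numpy as np

def extract_segments(layer, row_lines, col_lines):
    """Cut a 2-D layer into one rectangular segment per grid cell."""
    layer = np.asarray(layer)
    segments = {}
    for r, (y1, y2) in enumerate(zip(row_lines[:-1], row_lines[1:])):
        for c, (x1, x2) in enumerate(zip(col_lines[:-1], col_lines[1:])):
            # The four corners are (x1, y1), (x2, y1), (x1, y2), (x2, y2).
            segments[(r, c)] = layer[y1:y2, x1:x2]
    return segments

# 6 x 6 toy layer cut into 2 rows x 3 columns of segments.
layer = np.arange(36).reshape(6, 6)
cells = extract_segments(layer, row_lines=[0, 3, 6], col_lines=[0, 2, 4, 6])
```

Each entry of `cells` is a view onto one grid cell, so later steps can process every spot segment independently.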

Enhancing the extracted segment
Enhancing the extracted segment is a key step before calculating the gene expression level of the microarray image. This enhancement involves three steps: (i) applying logarithmic transformation, (ii) enhancing the contrast, and (iii) image sharpening.

Applying logarithmic transformation (LG)
Since microarray spots differ in shape and vary widely in brightness, the logarithmic transformation is applied using Equation 1 (Jiao and Sun, 2010). This step aims to equalize large variations in spot color magnitude and increases the visibility of low-brightness spots.
In Equation 1, c is a constant and f is the value of the image pixels; the greater the value of c, the better the appearance of the intensity values.
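Equation 1 is not reproduced in the text, so the sketch below assumes the commonly used form of the logarithmic transformation, g = c · log(1 + f), with c and f as defined above. The scaling of c to an 8-bit range is an illustrative choice, not a value taken from the paper.

```python
import numpy as np

def log_transform(f, c=1.0):
    """Apply g = c * log(1 + f) element-wise (assumed form of the transform)."""
    f = np.asarray(f, dtype=float)
    return c * np.log1p(f)

# Choose c so that an 8-bit input range maps back onto 0..255.
c = 255.0 / np.log1p(255.0)
pixels = np.array([0.0, 10.0, 255.0])
enhanced = log_transform(pixels, c)
```

With this scaling a dim pixel value of 10 is lifted to roughly 110 while the extremes 0 and 255 are preserved, which is the intended effect on low-brightness spots.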

Enhancing the contrast
Poor absorption by some microarray genes causes low spot intensity at the red and green levels (Antoniol and Ceccarelli, 2004; Giannakeas and Fotiadis, 2008). Therefore, the contrast of these spots needs to be enhanced to resolve the low spot intensity. The nonlinear contrast stretching shown in Equation 2 (Kim et al., 2001) is used to solve this problem. In Equation 2, L is the total number of gray levels in the microarray image, nk is the number of pixels with gray value rk, n is the total number of pixels in the spot image, and Pin denotes the probability density of the gray values in the spot image.
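Equation 2 itself is missing from the text. The symbols defined above (L, nk, rk, n and a probability density) match standard histogram equalization, so the following sketch assumes that form, s_k = (L − 1) · Σ_{j≤k} n_j / n. It is an illustration under that assumption, not a confirmed reproduction of the paper's equation.

```python
import numpy as np

def equalize(segment, levels=256):
    """Histogram-equalize an integer-valued spot segment with gray values 0..levels-1."""
    seg = np.asarray(segment).astype(np.int64)
    hist = np.bincount(seg.ravel(), minlength=levels)  # n_k for each gray level r_k
    pdf = hist / seg.size                               # n_k / n
    cdf = np.cumsum(pdf)                                # cumulative distribution
    mapping = np.round((levels - 1) * cdf).astype(np.int64)
    return mapping[seg]

# A dim, low-contrast segment whose gray values span only 50..70.
dim_segment = np.array([[50, 50], [60, 70]])
stretched = equalize(dim_segment)
```

After the mapping the brightest pixel reaches 255 and the gray values are spread over a much wider range than the original 50 to 70.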

Image sharpening
Microarray images are taken either by a digital scanner or by a digital camera, which produces soft images without sharp edges. Sharp edges are needed for the segmentation step. The shock filter is applied to sharpen these edges (Jiao and Sun, 2010). The shock filter enhances the image and produces a sharp discontinuity, called a shock, at the borderline between the spots and the background.
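The idea behind a shock filter can be sketched with the simplified update u ← u − dt · sign(Δu) · |∇u|, where Δu is the Laplacian. The version below uses central differences for brevity (practical implementations typically use an upwind scheme), and the parameters `iterations` and `dt` are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def shock_filter(image, iterations=10, dt=0.1):
    """Sharpen edges with a simplified Laplacian-based shock filter."""
    u = np.asarray(image, dtype=float).copy()
    for _ in range(iterations):
        gy, gx = np.gradient(u)           # derivatives along rows and columns
        grad_mag = np.hypot(gx, gy)
        # 5-point Laplacian with replicated borders.
        p = np.pad(u, 1, mode="edge")
        lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
        # Darken the convex (low) side of an edge, brighten the concave (high) side.
        u -= dt * np.sign(lap) * grad_mag
    return u

# A soft horizontal edge repeated over three rows.
soft_edge = np.tile([0.0, 0.0, 1.0, 3.0, 7.0, 9.0, 10.0, 10.0], (3, 1))
sharpened = shock_filter(soft_edge, iterations=1, dt=0.1)
```

After one step the jump between the two central pixels grows from 4.0 to 4.6, which shows the edge becoming steeper while flat regions stay unchanged.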

Segmentation using adaptive threshold
To calculate the gene expression level of the spots, we first need to segment the spots from the extracted spot segments. To achieve this segmentation, the constant mean adaptive threshold of the extracted spot segment intensity is applied by calculating the average intensity (M) of the foreground and the background using Equation 3.
In Equation 3, IF denotes the intensity values of the foreground and IB the intensity values of the background. Because of the variation in the spot intensities, the average intensity (M) is not suitable as a threshold value for some spot segments. Therefore, the threshold (T) is calculated using Equation 4 (Kamberova and Shah, 2002).
In Equation 4, C is a constant and W is the spot segment size.
The constant value (C) is calculated as the mean of the experimental results obtained from testing the set of spot images. In our experiments, we calculate the constant value (C) using 300 extracted spot segments. All of the extracted spot images showed the same behavior, with the value of the constant falling in the range of 0.01 to 0.5.
The spot segment size (W) is the window size, which represents the dimensions of the spot segments extracted from the microarray image. W is calculated using Equation 5.

$W = w \times h$ (5)
where w is the distance between two adjacent horizontal grid lines and h is the distance between two adjacent vertical grid lines. Because the grid lines are parallel, the perpendicular distance between them is constant. The distance between grid line L1 and grid line L2 is the distance between the two intercepts of these lines (Fuyong, 2013).
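Equations 3 and 4 are not reproduced in the text, so their exact form is unknown here. Purely as an illustration of a mean-based adaptive threshold applied per segment, the sketch below assumes a common "local mean minus constant" rule, T = M − C, with M taken as the mean intensity over the window of size W from Equation 5 and intensities normalized to [0, 1]. The rule, the default value of `c` and the function name are assumptions, not the paper's definitions.

```python
import numpy as np

def segment_spot(segment, c=0.05):
    """Threshold one spot segment with an assumed rule T = M - C."""
    seg = np.asarray(segment, dtype=float)
    rows, cols = seg.shape
    window_size = rows * cols        # W = w * h (Equation 5)
    m = seg.sum() / window_size      # mean intensity M over the window
    t = m - c                        # assumed threshold form, not Equation 4
    return seg > t, t

# 4 x 4 segment: a bright 2 x 2 spot (0.8) on a dim background (0.1).
segment = np.full((4, 4), 0.1)
segment[1:3, 1:3] = 0.8
mask, threshold = segment_spot(segment, c=0.05)
```

Because the threshold is computed from each segment alone, a dim spot on a dim local background can still be separated, which is the motivation given for working per segment rather than on the whole image.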

Gene expression intensity
The extracted segment usually includes the spot, the background, noise, and other objects such as pieces of the surrounding spots or other artifacts. To calculate the gene expression intensity, a spot measurement model is used, as shown in Equation 6 [R].
Spot measured value = A/B +  + ebi (6) Where, A/B is the log-transformed true fold change of gene of condition A with respect to condition B (log (intensity of red color / intensity of green color )), -dye effect, emeasurement error with E[e] = 0 and Var(e) =  2 , and bi is the background intensity.However, this model does not taking the other objects and the other artifacts (OOA) that may exist in the extracted segment into consideration.Therefore, in order to enhance the spot measure value (Equation 6), we propose equation 7 to obtain better for quality gene expression intensity (GEI) value that takes all factors (OOA) into consideration.We subtracted the background intensity of the extracted segment for each spot rather than taking the whole image background.And this is due to that the whole image background intensity gave false and inaccurate result for some spots area's with low intensity values.
The experimental results for tested sets of spots show a negative values when we used the whole image background.But, using the extracted segment background area with all its objects and other artifact, it gave positive results even with the low intensity values.
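The effect of local versus whole-image background subtraction can be illustrated with a small numeric sketch. It simplifies the model by treating every non-spot pixel of the segment (background plus other objects and artifacts) as the local background; the function name and the whole-image background value of 45 are hypothetical.

```python
import numpy as np

def corrected_intensity(segment, spot_mask, background=None):
    """Mean spot intensity minus a background estimate.

    If no background is given, the mean of the non-spot pixels of this
    segment is used as the local background.
    """
    seg = np.asarray(segment, dtype=float)
    mask = np.asarray(spot_mask, dtype=bool)
    if background is None:
        background = seg[~mask].mean()
    return seg[mask].mean() - background

# A weak spot (30) on a dim local background (20).
segment = np.array([[20.0, 20.0, 20.0],
                    [20.0, 30.0, 20.0],
                    [20.0, 20.0, 20.0]])
spot_mask = segment > 25
local_value = corrected_intensity(segment, spot_mask)          # 30 - 20
global_value = corrected_intensity(segment, spot_mask, 45.0)   # 30 - 45
```

Here the local correction gives a positive value (10), whereas a brighter whole-image background drives the same weak spot negative (−15), mirroring the behavior described above.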

EXPERIMENTAL RESULTS
The experiments and the comparison of the methodology were carried out on a set of real microarray images obtained from a genome database (Image Dataset, 2016). The dataset used is the whole yeast genome (ISB version, provided by Laura Hoopes and Allen Kuo) (Figure 2). The yeast genome image contains 20 sub-microarray images, each of which contains 480 extracted segments.
The experimental results show that the proposed methodology is able to calculate the intensity for all of the studied sub microarray images.
Table 1 shows the calculated gene expression intensity (GEI) for the 20 sub-microarray images. The average of the calculated GEI is 87.77.
To evaluate the proposed methodology, we compare its results with those of the Gaussian Mixture Models (GMM) of Emmanouil et al. (2009). The comparison was carried out on a set of real microarray images, available at (Image Dataset, 2016), and the results are presented in Table 2. The average of the calculated GEI is 87.77 for the proposed methodology and 79.01 for GMM. The experimental results show that the proposed methodology is able to segment the spots and calculate the gene expression intensity, and that its results are comparable with those of GMM. Figure 6 presents the comparison between the two methodologies.

Table 2. Comparison between the proposed method and GMM.

Some spots in the extracted segments were hazy with low resolution, as shown in Figure 7. Even so, the methodology was able to calculate the intensity for such images, because the enhancement steps were able to handle them and produce correct results. Some spots were not fully expressed in the microarray images, owing to incorrect staining of the microarray or incorrect capture of some parts of the microarray image; this leads to low intensity values in the extracted segments. The proposed methodology was able to detect and identify the low intensity of such spots because it relies on applying the adaptive threshold to the extracted segment images rather than to the full microarray image.
Segmentation methods that rely on spot shape, such as fixed circle segmentation or adaptive circle segmentation (Zacharia and Maroulis, 2008; Antoniol and Ceccarelli, 2004), are not able to detect spots with unclear shapes because they depend on the shape of the spot and not on the extracted segment of the spot. Some cases are presented in Figure 8.

However, the proposed methodology is able to calculate the intensity of spots with unclear shapes because it segments using an adaptive threshold rather than the spot shape, and it calculates the intensity of every extracted segment rather than extracting the spot based on its shape.
A comparison between the features of the proposed methodology and those of other methodologies and techniques is summarized in Table 3. The comparison covers spot extraction, enhancement methods, noise removal, background, spot feature analysis and the spot size used in microarray analysis.
To test the ability of the proposed methodology to obtain the gene expression intensity, we tested its accuracy on the 20 sub-microarray images of the yeast genome with added noise, namely Gaussian white noise, salt and pepper noise, and multiplicative noise. The aim of this test is to show the ability of the proposed methodology to extract the spot and measure its intensity regardless of the conditions under which the microarray was prepared and the resulting noise. Since the proposed methodology relies on extracting each spot segment separately and removing the noise with the proposed steps, it was able to calculate the intensity with these different added noises. The results of the intensity accuracy calculation are shown in Table 4.
The microarray image is first modified by adding Gaussian white noise with zero mean (m = 0) and variance v = 0.01; we then calculate the GEI of the sub-microarray images after adding the Gaussian noise (we call it G-GEI). The microarray image is next modified by adding salt and pepper noise with a noise density of 0.05, and we calculate the GEI of the sub-microarray images after adding the salt and pepper noise (we call it SP-GEI). Finally, the microarray image is modified by adding multiplicative noise using Equation 9 (Goodman, 1976):
$J = I + n \cdot I$ (9)
where I is the original image, J is the noisy image, and n is uniformly distributed random noise with mean 0 and variance v = 0.04. We then calculate the GEI of the sub-microarray images after adding the multiplicative noise (we call it M-GEI). Table 4 presents the accuracy results for the 20 sub-microarray images after adding Gaussian, salt and pepper, and multiplicative noise.
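The three noise models can be sketched as follows, assuming images with intensities normalized to [0, 1]. The function names are hypothetical, no clipping is applied, and the uniform noise bounds are derived from the fact that U(−a, a) has variance a²/3.

```python
import numpy as np

def add_gaussian(img, mean=0.0, var=0.01, rng=None):
    """Add Gaussian white noise with the given mean and variance."""
    rng = np.random.default_rng() if rng is None else rng
    return img + rng.normal(mean, np.sqrt(var), img.shape)

def add_salt_pepper(img, density=0.05, rng=None):
    """Set a fraction `density` of pixels to 0 (pepper) or 1 (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    r = rng.random(img.shape)
    out[r < density / 2] = 0.0
    out[r > 1 - density / 2] = 1.0
    return out

def add_multiplicative(img, var=0.04, rng=None):
    """Apply J = I + n * I with n uniform, mean 0 and variance `var`."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.sqrt(3.0 * var)  # U(-a, a) has variance a**2 / 3
    n = rng.uniform(-a, a, img.shape)
    return img + n * img

# Apply each noise model to a flat mid-gray test image.
rng = np.random.default_rng(0)
flat = np.full((200, 200), 0.5)
g_noisy = add_gaussian(flat, rng=rng)
sp_noisy = add_salt_pepper(flat, rng=rng)
m_noisy = add_multiplicative(flat, rng=rng)
```

The GEI would then be recomputed on each noisy image to obtain G-GEI, SP-GEI and M-GEI.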
The results show that the proposed methodology was able to calculate the gene expression intensity values for the different spot segments after adding noise. This is due to the local enhancement methods, which are applied to each spot segment separately before the intensity of each spot, representing the gene expression level, is calculated. The mean accuracy for the 20 yeast genome sub-microarray images was 87.77. After the addition of Gaussian white noise, salt and pepper noise, and multiplicative noise, the means of the GEI values were 87.40 for G-GEI, 86.73 for SP-GEI and 85.72 for M-GEI.

CONCLUSION
This paper proposes a new methodology for calculating spot gene expression intensity in microarray images. The proposed methodology analyzes the spots in the extracted segments independently and applies the adaptive threshold to each segment separately, rather than analyzing all the spots together with respect to the global background and noise of the microarray image. It also improves the microarray measurement model used to calculate the gene expression intensity values. The experimental results showed that the proposed methodology was able to calculate the intensity values of the spots in all extracted segments with an accuracy of 87.77.

Figure 3. Red and green layers of a sub-microarray image.

Figure 4. Grid lines on red and green layers.

Figure 5. Extracting the spot segment based on spot location in the grid.
GEI = A/B + + e -(bi+ OOA) (7) Where, A/B is the log-transformed true fold change of gene of condition A with respect to condition B (log (intensity of red color / intensity of green color )), -dye effect, emeasurement error with ……… (2)

Figure 8 .
Figure 8. Spot segments with small portion of spot gene expression.

Table 1 .
The gene expression intensity.
E[e] = 0 and Var(e) =  2 , bi is the background intensity and OOA is the objects and the other artifacts intensity values.

Table 3. Comparison between proposed methodology and proposed techniques.