Seeking colorectal carcinoma related genes based on regulation network

Colorectal carcinoma (CRC) is one of the most frequent cancers worldwide and has very high mortality. In this study, microarray data were used to construct a regulation network to identify potential genes related to CRC. The results showed that SP1, RELA, STAT1, PPARA and TP53 arise as hub nodes in the transcriptome network, in accordance with previous studies. New transcription factors and target genes related to CRC were also identified; for example, HIF1A and NFIC both regulated PFKFB3 in CRC. Overall, the results demonstrate that transcriptome network analysis is useful for identifying CRC-related genes.


INTRODUCTION
Colorectal carcinoma (CRC), commonly known as bowel cancer, includes cancerous growths in the colon, rectum and appendix and, depending on the definition used, can include those found in the anus as well. Major advances have been made in our understanding of the molecular events leading to the formation of CRC. CRC originates from the colon epithelial cells, most frequently as a result of mutations in the Wnt signaling pathway (Najdi et al., 2009). CRC is the third leading cause of cancer-related death in both men and women in industrialized countries (Arnold et al., 2005). The most commonly mutated gene in colorectal cancer is the APC gene.
The APC protein is a "brake" on the accumulation of β-catenin protein. Without APC, β-catenin accumulates to high levels and translocates (moves) into the nucleus, binds to DNA, and activates the transcription of genes that are normally important for stem cell renewal and differentiation but that can cause cancer when inappropriately expressed at high levels. Besides, the p53 protein, produced by the TP53 gene, normally monitors cell division and kills cells if they have Wnt pathway defects.
*Corresponding author. E-mail: luxs.xueshu@gmail.com. #These authors contributed equally.
Eventually, a cell line acquires a mutation in the TP53 gene and transforms the tissue from an adenoma into an invasive carcinoma. Other apoptotic proteins commonly deactivated in colorectal cancers are TGF-β and DCC (Deleted in Colorectal Cancer).
TGF-β has a deactivating mutation in at least half of colorectal cancers (Wu et al., 2009). PTEN, a tumor suppressor, normally inhibits PI3K, but can sometimes become mutated and deactivated (Markowitz and Bertagnolli, 2009). DCC commonly has a deletion of its chromosome segment in colorectal cancer. Some oncogenes encoding the proteins KRAS, RAF and PI3K are overexpressed in colorectal cancer.
DNA microarray analysis is applied as a global approach to investigate physiological mechanisms in health and disease (Spies et al., 2002). High-throughput microarray experiments have been designed to analyze gene expression patterns and identify potential target genes for CRC (Joyce and Pintzas, 2007). Genomic expression profiling has evolved as a useful tool to identify novel pathomechanisms in human cancer (Guo, 2003; Vazquez-Naya et al., 2010).
The purpose of this paper is to propose a method in which a transcriptome network is constructed to find a set of transcription factors that regulate the differentially expressed genes in CRC. Furthermore, significantly regulated pathways were identified by an impact analysis method.

MATERIALS AND METHODS
Pathway data
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a collection of online databases dealing with genomes, enzymatic pathways and biological chemicals (Kanehisa, 2002). The PATHWAY database records networks of molecular interactions in cells, and variants of them specific to particular organisms (http://www.genome.jp/kegg/). In total, 130 pathways involving 2287 genes were collected from KEGG.

Regulation data
There are approximately 2600 proteins in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors (Wachi et al., 2005). The combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development (Brivanlou and Darnell, 2002). These transcription factors are grouped into 5 superclass families, based on the presence of conserved DNA-binding domains. The TRANSFAC database contains data on transcription factors, their experimentally proven binding sites and their regulated genes (Wingender, 2008).
The Transcriptional Regulatory Element Database (TRED) was built in response to the increasing need for an integrated repository of both cis- and trans-regulatory elements in mammals (Jiang et al., 2007). TRED collects transcriptional regulation information, including transcription factor binding motifs and experimental evidence. Its curation currently focuses on the target genes of 36 cancer-related transcription factor (TF) families. From TRANSFAC (http://www.generegulation.com/pub/databases.html), 774 regulatory relationships between 219 TFs and 265 target genes were collected. From TRED (http://rulai.cshl.edu/TRED/), 5722 regulatory relationships between 102 TFs and 2920 target genes were collected. Combining the two regulation datasets gave a total of 6328 regulatory relationships between 276 TFs and 3002 target genes (Table 1).
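The combined total (6328) is smaller than the plain sum 774 + 5722 = 6496, which is what one would expect if pairs present in both sources are counted once. A minimal Python sketch of such a merge is given here, treating each dataset as a collection of (TF, target) pairs. The function name and the toy records are hypothetical and are not taken from TRANSFAC or TRED; identifier normalization is reduced to upper-casing for illustration.

```python
def merge_regulation(*sources):
    """Union several collections of (TF, target) pairs, counting shared pairs once."""
    pairs = set()
    for source in sources:
        # Upper-casing is a stand-in for real gene-identifier normalization.
        pairs.update((tf.upper(), target.upper()) for tf, target in source)
    tfs = {tf for tf, _ in pairs}
    targets = {target for _, target in pairs}
    return pairs, tfs, targets


# Hypothetical toy records, for illustration only.
transfac_like = [("SP1", "MET"), ("TP53", "CDKN1A")]
tred_like = [("SP1", "MET"), ("RELA", "ICAM1"), ("TP53", "MDM2")]

pairs, tfs, targets = merge_regulation(transfac_like, tred_like)
print(len(pairs), len(tfs), len(targets))
```

Because the pair shared by both toy lists is stored once, the merged count is lower than the sum of the two list lengths.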

Differentially expressed genes (DEGs) analysis
For the GSE22242 dataset, the limma method (Smyth, 2004) was used to identify DEGs. Only genes with a fold change larger than 2 and a p-value less than 0.05 were selected as DEGs.
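The selection step can be sketched in Python as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: limma itself is an R package and is not reproduced here, the per-gene log2 fold changes and p-values are supplied directly as toy values, and the fold-change cutoff is assumed to apply in both directions (up- and down-regulation).

```python
import math


def select_degs(results, fc_cutoff=2.0, p_cutoff=0.05):
    """Return genes passing both the fold-change and the p-value cutoff.

    results maps gene -> (log2 fold change, p-value). In practice these
    values would come from a limma fit; here they are given directly.
    """
    log2_cutoff = math.log2(fc_cutoff)  # fold change 2 equals |log2FC| of 1
    return sorted(
        gene
        for gene, (log2_fc, p_value) in results.items()
        if abs(log2_fc) > log2_cutoff and p_value < p_cutoff
    )


# Hypothetical toy statistics, for illustration only.
toy_results = {
    "GENE_A": (1.8, 0.001),   # up, significant: kept
    "GENE_B": (-2.3, 0.020),  # down, significant: kept (symmetric cutoff assumed)
    "GENE_C": (0.4, 0.001),   # fold change too small: dropped
    "GENE_D": (3.0, 0.300),   # p-value too large: dropped
}
degs = select_degs(toy_results)
print(degs)
```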

Co-expression analysis
To assess potential regulatory relationships, the Pearson correlation coefficient (PCC) was calculated for all pairwise comparisons of gene expression values between TFs and DEGs. Regulatory relationships with an absolute PCC larger than 0.6 were considered significant.
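The co-expression filter can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, gene names and expression vectors are hypothetical, and a constant expression profile is treated as having an undefined correlation and is never kept.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(candidate_pairs, expression, threshold=0.6):
    """Keep (TF, target) pairs whose absolute PCC exceeds the threshold."""
    kept = {}
    for tf, target in candidate_pairs:
        r = pearson(expression[tf], expression[target])
        if abs(r) > threshold:  # NaN compares as False and is dropped
            kept[(tf, target)] = r
    return kept


# Hypothetical expression values across five samples.
expression = {
    "TF_X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_UP": [2.0, 4.0, 6.0, 8.0, 10.5],
    "GENE_DOWN": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_WEAK": [3.0, 1.0, 4.0, 1.0, 5.0],
}
candidates = [("TF_X", "GENE_UP"), ("TF_X", "GENE_DOWN"), ("TF_X", "GENE_WEAK")]
kept = coexpressed_pairs(candidates, expression)
print(sorted(kept))
```

Both strongly positive and strongly negative correlations pass the filter, which matches the use of the absolute PCC in the text.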

Gene ontology analysis
The BiNGO analysis (Maere et al., 2005) was used to identify overrepresented gene ontology (GO) categories in biological process (Harris et al., 2004).

Regulation network construction
Using the regulation data collected from the TRANSFAC and TRED databases, we matched the relationships between differentially expressed TFs and their differentially expressed target genes. Based on the two regulation datasets and the pathway relationships of the target genes, we built the regulation networks with Cytoscape (Shannon et al., 2003). Based on the significant relationships (PCC > 0.6 or PCC < -0.6) between TFs and their target genes, 61 putative regulatory relationships were predicted between 29 TFs and 37 target genes.

Significance analysis of pathway
We adopted an impact analysis that considers not only the statistical significance of the set of pathway genes but also other crucial factors, such as the magnitude of each gene's expression change, the topology of the signaling pathway and the interactions between its genes (Draghici et al., 2007). In this model, the Impact Factor (IF) of a pathway $P_i$ is calculated as the sum of two terms:

$$IF(P_i) = \log\left(\frac{1}{p_i}\right) + \frac{\sum_{g \in P_i} |PF(g)|}{|\overline{\Delta E}| \cdot N_{de}(P_i)} \qquad (1)$$

The first term is a probabilistic term that captures the significance of the given pathway $P_i$ from the perspective of the set of genes contained in it. It is obtained by using the hypergeometric model, in which $p_i$ is the probability of obtaining at least the observed number of differentially expressed genes, $N_{de}$, just by chance (Tavazoie et al., 1999; Draghici et al., 2003).
The second term sums up the absolute values of the perturbation factors (PFs) for all genes $g$ on the given pathway $P_i$. The PF of a gene $g$ is calculated as follows:

$$PF(g) = \Delta E(g) + \sum_{u \in US_g} \beta_{ug} \cdot \frac{PF(u)}{N_{ds}(u)} \qquad (2)$$

In this equation, the first term, $\Delta E(g)$, captures the quantitative information measured in the gene expression experiment and represents the normalized measured expression change of the gene $g$. The second term is a sum of the PFs of all genes $u$ directly upstream of the target gene $g$, each normalized by the number of downstream genes of $u$, $N_{ds}(u)$, and weighted by a factor $\beta_{ug}$ that reflects the type of interaction: $\beta_{ug} = 1$ for induction and $\beta_{ug} = -1$ for repression (KEGG supplies this information about the type of interaction between two genes in the description of the pathway topology). $US_g$ is the set of all such genes upstream of $g$. In Equation 1, the total perturbation is normalized with respect to the size of the pathway by dividing it by the number of differentially expressed genes on the given pathway, $N_{de}(P_i)$. To make the IFs as independent as possible from the technology, and also comparable between problems, the second term in Equation 1 is also divided by the mean absolute fold change, $|\overline{\Delta E}|$, calculated across all differentially expressed genes. The results of the pathway significance analysis are shown in Table 3.
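The two-term impact factor described above can be sketched in Python as follows. This is a simplified illustration, not the published implementation: it assumes the pathway graph is acyclic so that each PF can be computed by recursion over upstream genes (cyclic pathways would need a linear solve instead), it uses the natural logarithm for the probabilistic term, and all gene names, edges and counts are made-up toy values.

```python
import math


def hypergeom_upper_tail(n_total, n_de_total, n_pathway, n_de_pathway):
    """P(X >= n_de_pathway) when n_pathway genes are drawn without replacement
    from n_total genes, of which n_de_total are differentially expressed."""
    upper = min(n_pathway, n_de_total)
    favorable = sum(
        math.comb(n_de_total, k) * math.comb(n_total - n_de_total, n_pathway - k)
        for k in range(n_de_pathway, upper + 1)
    )
    return favorable / math.comb(n_total, n_pathway)


def perturbation_factors(delta_e, edges):
    """Compute PF(g) for every gene; edges are (upstream, downstream, beta)
    with beta = +1 for induction and -1 for repression. Assumes no cycles."""
    upstream, n_downstream = {}, {}
    for u, g, beta in edges:
        upstream.setdefault(g, []).append((u, beta))
        n_downstream[u] = n_downstream.get(u, 0) + 1
    cache = {}

    def pf(g):
        if g not in cache:
            cache[g] = delta_e.get(g, 0.0) + sum(
                beta * pf(u) / n_downstream[u] for u, beta in upstream.get(g, [])
            )
        return cache[g]

    genes = set(delta_e) | set(upstream) | set(n_downstream)
    return {g: pf(g) for g in genes}


def impact_factor(p_value, pfs, n_de_pathway, mean_abs_delta_e):
    """Probabilistic term plus the normalized total perturbation."""
    perturbation = sum(abs(v) for v in pfs.values())
    return math.log(1.0 / p_value) + perturbation / (mean_abs_delta_e * n_de_pathway)


# Toy pathway: A induces B, A represses C, B induces C.
delta_e = {"A": 2.0, "B": -1.0, "C": 0.0}
edges = [("A", "B", 1), ("A", "C", -1), ("B", "C", 1)]
pfs = perturbation_factors(delta_e, edges)

# 10 genes overall, 4 of them DE; the pathway has 3 genes, 2 of them DE.
p_i = hypergeom_upper_tail(10, 4, 3, 2)
impact = impact_factor(p_i, pfs, n_de_pathway=2, mean_abs_delta_e=1.5)
print(round(p_i, 4), pfs["C"], round(impact, 4))
```

In the toy pathway, gene C has no measured change of its own but still receives a nonzero PF from its upstream genes, which is the point of including topology in the score.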

Regulation network between TFs and pathways
To further investigate the regulatory relationships between TFs and pathways, we mapped DEGs to pathways and obtained a regulation network between TFs and pathways (Figure 2).
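One plausible reading of this mapping step is sketched here in Python: a TF is linked to a pathway whenever at least one of its differentially expressed targets is a member of that pathway. The function name, pathway names and gene names are hypothetical, and the exact rule used to draw Figure 2 may differ.

```python
def tf_pathway_edges(tf_target_pairs, pathway_genes, degs):
    """Link each TF to the pathways that contain its differentially expressed targets.

    Returns a dict mapping (TF, pathway) to the set of targets supporting the link.
    """
    gene_to_pathways = {}
    for pathway, genes in pathway_genes.items():
        for gene in genes:
            gene_to_pathways.setdefault(gene, set()).add(pathway)

    links = {}
    for tf, target in tf_target_pairs:
        if target not in degs:
            continue  # only differentially expressed targets are mapped
        for pathway in gene_to_pathways.get(target, ()):
            links.setdefault((tf, pathway), set()).add(target)
    return links


# Hypothetical toy inputs, for illustration only.
tf_target_pairs = [
    ("NFIC", "GENE_1"),
    ("PPARA", "GENE_1"),
    ("PPARA", "GENE_2"),
    ("SP1", "GENE_3"),
]
pathway_genes = {
    "Pathway X": {"GENE_1", "GENE_2"},
    "Pathway Y": {"GENE_2", "GENE_3"},
}
degs = {"GENE_1", "GENE_2"}

links = tf_pathway_edges(tf_target_pairs, pathway_genes, degs)
for (tf, pathway), supporting in sorted(links.items()):
    print(tf, pathway, sorted(supporting))
```

Keeping the supporting targets for each link makes it possible to trace every TF-pathway edge back to the gene-level relationships behind it.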

RESULTS
Regulation network construction in colorectal carcinoma
To obtain pathway-related DEGs of colorectal carcinoma, we used the publicly available microarray dataset GSE22242 from GEO. After microarray analysis, genes with a fold change larger than 2 and a p-value less than 0.05 were selected, giving 594 DEGs from GSE22242. To obtain the regulatory relationships, the co-expression value (|PCC| > 0.6) was chosen as the threshold. Finally, we obtained 61 regulatory relationships between 29 differentially expressed TFs and their 37 differentially expressed target genes. By integrating these regulatory relationships, a regulation network of colorectal carcinoma was built between TFs and their target genes (Figure 1). In this network, SP1, RELA, STAT1, PPARA and TP53, which have higher degrees, form a local network, suggesting that these TFs may play an important role in colorectal carcinoma. In addition, MET was regulated by 6 TFs and CHGA was regulated by 4 TFs in our network.
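The notion of a hub used here (a node with a high degree) can be illustrated with a short Python sketch that counts, for each node, the number of distinct regulatory edges it takes part in. The edge list is a hypothetical toy example and does not reproduce the edges of Figure 1.

```python
from collections import Counter


def node_degrees(edges):
    """Count the distinct edges touching each node of a TF-target network."""
    degree = Counter()
    for tf, target in set(edges):  # ignore duplicated edges
        degree[tf] += 1
        degree[target] += 1
    return degree


def regulators_of(edges, target):
    """List the TFs that regulate a given target gene."""
    return sorted({tf for tf, t in edges if t == target})


# Hypothetical toy network, for illustration only.
edges = [
    ("SP1", "MET"),
    ("TP53", "MET"),
    ("RELA", "MET"),
    ("SP1", "CHGA"),
    ("SP1", "TP53"),
]
degree = node_degrees(edges)
print(degree.most_common(2))
print(regulators_of(edges, "MET"))
```

A TF that is itself a target of another TF, like TP53 in the toy network, accumulates degree from both roles.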

GO analysis of the regulation network in colorectal carcinoma
Several GO categories were enriched among the genes in the regulatory network, including positive regulation of transcription, regulation of transcription from RNA polymerase II promoter, positive regulation of gene expression and positive regulation of cellular metabolic process (Table 2).

Significant pathway in colorectal carcinoma
To identify the relevant pathways changed in CRC, we used a statistical approach at the pathway level. Significance analysis at the single-gene level may suffer from the limited number of samples and from experimental noise, both of which can severely limit the power of the chosen statistical test.
Pathway-level analysis provides an alternative that relaxes the significance threshold applied to single genes and may lead to a better biological interpretation. We therefore adopted a pathway-based impact analysis method that combines several factors, including the statistical significance of the set of differentially expressed genes in the pathway, the magnitude of each gene's expression change, the topology of the signaling pathway and the interactions between its genes. The impact analysis method yielded many significant pathways, including phosphatidylinositol signaling system, adherens junction, and antigen processing and presentation (Table 3).

Regulation network between TFs and pathways in colorectal carcinoma
To further investigate the regulatory relationships between TFs and pathways, we mapped DEGs to pathways and obtained a regulation network between TFs and pathways (Figure 2). In the network, SP1, RELA, STAT1, PPARA and TP53 appeared as hub nodes linked to many colorectal carcinoma related pathways. NFIC and PPARA both regulated butanoate metabolism; valine, leucine and isoleucine degradation; the PPAR signaling pathway; and terpenoid backbone biosynthesis. Fatty acid metabolism, retinol metabolism and metabolism of xenobiotics by cytochrome P450 were each regulated by many TFs.

DISCUSSION
The regulation network constructed for CRC shows that our method links many TFs and pathways that are closely related to CRC. SP1, RELA, STAT1, PPARA and TP53 arise as hub nodes in our transcriptome network, in agreement with previous studies. Many target genes and pathways were also identified as playing an important role in CRC, directly or indirectly. DNA-dependent protein kinase (DNA-PK) functions in the repair of DNA double-strand breaks in colorectal cancer and is regulated by Sp1 through recognition elements in the DNA-PK promoter region (Hosoi et al., 2004). Telomere repeat binding factor 2 (TRF2) plays a key role in the protective activity of telomeres and is overexpressed in CRC. One study showed that the transcription factor Sp1 is also involved in the upregulation of TRF2; that is, the overexpression of TRF2 in CRC is mediated by Sp1 regulation (Dong et al., 2009).
NF-κB is always bound to REL proteins to form a complex. Among these complexes, the p50 (NFKB1)/p65 (RELA) heterodimer is the major one, so NF-κB usually refers to a p50-p65 (RelA) dimer. The activity of NF-κB is controlled by the inhibitory protein IκB and by IKK (IκB kinase). The expression of RelA/NF-κB is increased in CRC tissue and plays an important role in the transition from colorectal adenoma with low-grade dysplasia to adenocarcinoma in the pathogenesis of human colon cancer (Yu et al., 2003, 2004).
STAT1 is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by receptor-associated kinases and then form homo- or heterodimers that translocate to the cell nucleus, where they act as transcription activators. The expression levels of STAT1 have been found to be reduced in transformed intestinal epithelial cells, consistent with tumor suppressor properties of STAT1 (Klampfer, 2008). MicroRNAs (miRNAs) have emerged as important gene regulators and are recognized as key players in colon cancer, and STAT1 has been verified as a direct miR-145 target (Gregersen et al., 2010). Peroxisome proliferator-activated receptor alpha (PPARA) is a nuclear transcription factor that influences the expression of target genes involved in cell proliferation, cell differentiation, and immune and inflammatory responses. PPARA has been proposed as a candidate tumor suppressor gene (TSG) in the 22q13 region that contributes to the carcinogenesis and progression of sporadic CRC (Zheng et al., 2006). Tumor protein p53 is a DNA-binding protein that responds to diverse cellular stresses by regulating target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair or changes in metabolism.
TP53 mutations are found in CRCs. The majority of TP53 mutations occur in the core domain (exons 5 to 8), the most important region responsible for folding and, therefore, for stabilization of the tertiary structure of the protein. These mutations result in the loss of the DNA-binding ability of p53 and eventually of its function (Al-Kuraya, 2009). The MYC protein is a multifunctional nuclear phosphoprotein and transcription factor that plays a role in cell cycle progression, apoptosis and cellular transformation. Over-expression of c-myc occurs in CRC and may play an important role in its carcinogenesis (Sanchez-Pernaute et al., 2005). The proto-oncogene MET encodes the hepatocyte growth factor receptor, which has tyrosine-kinase activity. The primary single-chain precursor protein is post-translationally cleaved to produce the alpha and beta subunits, which are disulfide linked to form the mature receptor. c-Met is over-expressed and plays an important role in the invasion and metastasis of human CRC cells (Herynk, 2002). Besides genes supported by previous studies, some genes that have not been widely studied in CRC were also identified by our method, such as CHGA and ADH1C. The CHGA protein is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins and is found in the secretory vesicles of neurons and endocrine cells. The presence of CHGA-positive cells, occurring as scattered elements or in clusters within human colonic adenocarcinomas, has been documented (Romeo et al., 2002). ADH1C (class I alcohol dehydrogenase, gamma subunit) is a member of the alcohol dehydrogenase family that exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Chronic alcohol consumption is a risk factor for colorectal cancer. Polymorphism of ADH1C results in different acetaldehyde concentrations following ethanol oxidation in alcohol-related colorectal carcinogenesis (Homann et al., 2009).
The regulation network between TFs and pathways in CRC shows that our method links many pathways closely related to CRC, such as the PI3K-Akt pathway, adherens junction, and antigen processing and presentation. The PI3K-Akt pathway can be regarded as an important signaling mechanism for the growth and maintenance of colon carcinoma cells. The expression of PIK3CG, a catalytic subunit of phosphatidylinositide 3-kinase (PI3K), was found to be reduced in colon cancer cells by CpG hypermethylation, raising the hypothesis that PIK3CG might contribute to the growth and progression of colorectal cancers (Semba et al., 2002). Similarly, mutations in the PIK3CA gene, which encodes the p110α catalytic subunit of PI3K, have been reported in human colorectal cancer; these mutations activate the downstream targets Akt and p70S6K in vivo and confer NIH 3T3-transforming ability, thereby promoting carcinogenesis (Ikenoue et al., 2005).
Adherens junctions, desmosomes and tight junctions are the three primary types of cell-cell junctional complexes. Intercellular adhesion in adherens junctions is mediated by E-cadherin, which is involved in the organization and maintenance of intestinal epithelial structure and in the suppression of tumour invasion. E-cadherin is associated with the actin cytoskeleton through cytoplasmic proteins such as the catenins, together forming the cadherin/catenin complex. One study showed that decreased expression of the cadherin/catenin complex was associated with colorectal tumour progression and invasion (Hao et al., 1997). Palladin is an integral component of adherens junctions and plays a role in the localization of E-cadherin to the junctions. The loss of palladin may be an integral part of the epithelial-mesenchymal transition, an early step in the metastatic spread of colon carcinoma (Tay et al., 2010).
The antitumor inflammatory response is known to inhibit tumor growth in CRC. The density and functionality of tumor-infiltrating lymphocytes (TIL) are regulated by the antigen processing machinery through regulator proteins such as transporters associated with antigen processing (TAP) and major histocompatibility complex (MHC) class I antigen. TAP1 and TAP2 expression were found to be significantly associated with MHC class I antigen expression. An increased density of CD8(+) TIL was predominantly found in TAP1, TAP2 and MHC class I antigen-positive cases. Down-regulation of the antigen processing machinery would cause a loss of the inflammatory response in colorectal cancer (Kasajima et al., 2010).
A basic understanding of the mechanisms underlying the function of CRC genes is important, and a deeper understanding of transcription factors and the genes they regulate remains an area of intense research activity. Our regulation network is useful for investigating the complex interaction mechanisms of transcription factors and their target genes in CRC. The results indicate that some new transcription factors and target genes are related to CRC. In addition, many pathways were linked with CRC through our method, such as phosphatidylinositol signaling system, adherens junction and antigen processing. These results may have important implications for drug discovery and pharmacology against CRC (Vazquez-Naya et al., 2010). However, further experiments are still needed to confirm these conclusions.

Figure 1. Regulation network construction in colorectal carcinoma.

Table 1. Regulation data from TRANSFAC and TRED.

Table 2. GO biological process analysis.