African Journal of
Biotechnology

  • Abbreviation: Afr. J. Biotechnol.
  • Language: English
  • ISSN: 1684-5315
  • DOI: 10.5897/AJB
  • Start Year: 2002
  • Published Articles: 12501

Full Length Research Paper

DEXseq and Cuffdiff approaches weighing differential spliced genes exons modulation in estrogen receptor β (Erβ) breast cancer cells

Noel Dougba Dago
  • Noel Dougba Dago
  • Unité de Formation et de Recherche (UFR) Sciences Biologiques, Unité Pédagogique et de Recherche de Génétique, Université Péléforo Gon Coulibaly BP1328 Korhogo, Côte d’Ivoire.
  • Google Scholar
Nafan Diarrassouba
  • Nafan Diarrassouba
  • Unité de Formation et de Recherche (UFR) Sciences Biologiques, Unité Pédagogique et de Recherche de Génétique, Université Péléforo Gon Coulibaly BP1328 Korhogo, Côte d’Ivoire.
  • Google Scholar
Inza Jesus Fofana
  • Inza Jesus Fofana
  • Unité de Formation et de Recherche (UFR) Sciences Biologiques, Unité Pédagogique et de Recherche de Génétique, Université Péléforo Gon Coulibaly BP1328 Korhogo, Côte d’Ivoire.
  • Google Scholar
Jean-Luc Aboya Moroh
  • Jean-Luc Aboya Moroh
  • Unité de Formation et de Recherche (UFR) Sciences Biologiques, Unité Pédagogique et de Recherche de Génétique, Université Péléforo Gon Coulibaly BP1328 Korhogo, Côte d’Ivoire.
  • Google Scholar
Hermann-Desire Lallie
  • Hermann-Desire Lallie
  • Unité de Formation et de Recherche (UFR) Sciences Biologiques, Unité Pédagogique et de Recherche de Génétique, Université Péléforo Gon Coulibaly BP1328 Korhogo, Côte d’Ivoire.
  • Google Scholar
Didier Martial Yao Saraka
  • Didier Martial Yao Saraka
  • Unité de Formation et de Recherche (UFR) Sciences Biologiques, Unité Pédagogique et de Recherche de Génétique, Université Péléforo Gon Coulibaly BP1328 Korhogo, Côte d’Ivoire.
  • Google Scholar
Olefongo Dagnogo
  • Olefongo Dagnogo
  • Unité de Formation et de Recherche (UFR) Biosciences, Laboratoire de Pharmacodynamie Biochimie, Université Felix Houphoet-Boigny BP V 34 Abidjan 01, Côte d’Ivoire.
  • Google Scholar
Souleymane Silue
  • Souleymane Silue
  • Unité de Formation et de Recherche (UFR) Sciences Biologiques, Unité Pédagogique et de Recherche de Génétique, Université Péléforo Gon Coulibaly BP1328 Korhogo, Côte d’Ivoire.
  • Google Scholar
Armel Herve Nwabo Kamdje
  • Armel Herve Nwabo Kamdje
  • Department of Biomedical Sciences, Faculty of Science, University of Ngaoundere P. O. Box 454 Ngaoundere, Cameroon.
  • Google Scholar
Lamine Baba-Moussa
  • Lamine Baba-Moussa
  • Laboratoire de Biologie et de Typage Moléculaire en Microbiologie, Faculté des Sciences et Techniques, Université d’Abomey-Calavi, Cotonou, Benin.
  • Google Scholar
Alessandro Weisz
  • Alessandro Weisz
  • Laboratory of Molecular Medicine and Genomics, Department of Medicine and Surgery, University of Salerno, Via S. Allende, 1, Baronissi, SA 84081, Italy.
  • Google Scholar


  •  Received: 28 December 2016
  •  Accepted: 09 June 2017
  •  Published: 21 June 2017

 ABSTRACT

While the alternative transcription and splicing mechanisms have long been known for some genes like oncogenes, their prevalence in almost all multi-exon genes has been recently realized with the increasing application of high-throughput experimental methods, named Next Generation RNA Sequencing (NGS). Henceforth, understanding the regulation of these processes in comparisons between cell types and cancer requests, sensitive and specific bioinformatics as well as bio-statistic approaches, that is, Cufflinks/Cuffdiff, DEXseq and RESEM, detecting gene transcript/isoforms and exons abundance is necessary. Isoforms and exons expression analysis by NGS is complicated by several sources of measurement variability causing numerous statistical defies. Here, with the purpose of minimizing this statistical challenge, we integrated both Cufflinks/Cuffdiff and DEXseq bioinformatics approaches assessing whole alternative splicing (AS) events, focusing on alternative transcripts regulation and their exons modulation respectively, by processing our previous prepared Estrogen Receptor β (Erβ+ and Erβ-) breast cancer (BC) cells, stimulated by estradiol (E2). Results showed that Estradiol (E2) induced Erβ+ BC (Erβ+E2), exhibited dissimilar reply as opposed to the other’s analyzed BC cell lines in terms of intragenic, exons and junction reads count ratio. Relationship analysis between expressed genes and transcript isoforms, suggested a substantial role of alternative promoters in AS event occurrence in Erβ+ BC as opposed to Erβ- BC. Indeed, merging Cufflinks/Cuffdiff and DEXseq approaches, 79 multi-exon genes were detected as statistically differentially modulated (spliced) in Erβ+ hormone induced BC cell line, and around 38% of these spliced genes claimed to be induced by alternative promoters. The present survey discriminated between several cancer specific alternative splicing genes like LIFR a BC metastasis suppressor, PBX1 a pioneer factor defining aggressive Erβ- BC and PHLPP2 a tumor suppressor, as exhibiting significant exon modulation in early AS occurrence in hormone responsive Erβ+ BC exclusively. Although, our findings supported dissimilar reply comparing both Cuflinks/Cuffdiff and DEXseq approaches called AS events, it is noteworthy to underline their relative agreement, evaluating spliced genes functional annotation as well as their complementarity performing whole AS survey. This study therefore proposed the integration between Cufflinks/Cuffdiff and DEXseq tools as a reasonable complementary methodology assessing full AS pattern in hormone responsive Erβ BC cells.

 

Key words: Cufflinks/Cuffdiff, DEXseq, RNA-Seq, alternative splicing (AS), exons, transcript isoforms, estrogen receptor β (ERβ), breast cancer (BC) cells.


 INTRODUCTION

Alternative splicing is a central cellular process that produces different mRNA transcript isoforms from a single gene. The qualitative and quantitative identification of such transcript isoforms is more complex as well as essential for understanding the different roles of alternatively spliced genes occurrence in a cell. However, detection of disease-specific transcript isoforms is an important task because aberrant splicing is known to be responsible for various diseases (Kim et al., 2008) and associated with different cancer types (Christofk et al., 2008; Venables et al., 2009). Several studies provided an intriguing insight into the mechanism of cancer specific alternative transcription and alternative splicing, which have long been implicated in the development of cancer. Cancer-specific isoforms have enhanced proliferative, invasive, and migratory abilities and provide a survival advantage to the tumor cells, suggesting that there is specific manipulation of the alternative event regulation in cancer that is beyond the general lack of fidelity of the splicing or the alternative transcription regulatory machinery. It is conceivable that the balanced expression of isoforms, rather than just activation or inhibition of those genes, may hold the key to impeding tumor growth and accordingly it is important to target the disease associated genes at the isoform level rather than at the gene level. The well-known application of exon arrays (Moller-Levet et al., 2009) and the advent of massive parallel sequencing named Next Generation RNA Sequencing (NGS-RNA-Seq) are allowing whole cancer genomes and transcriptomes to be sequenced with extraordinary speed and accuracy, providing insight into the bewildering complexity of isoform-specific expression in cancer genomes (Cancer Genome Atlas, 2012). Detecting alternative isoform regulation is inherently difficult in RNA-Seq, as sequencer reads are often one or more orders of magnitude shorter than the transcripts themselves. While there are several utilities that attempt to de-convolute read data into isoform abundances, the accuracy and robustness of these methods is difficult to establish (Chandramohan et al., 2013; Zhang et al., 2014). In addition, transcript isoforms expression estimates seem to vary considerably between different tools, and generally depend on the quality and completeness of the transcript assembly (Kanitz et al., 2015; Rehrauer et al., 2013). Existing well-established methods detect alternative splicing process mainly by considering sequencing reads that map uniquely to single isoforms or by assembling transcripts and estimating the most likely isoform abundance levels according to the given sequencing reads (Jeffrey and Zhong, 2011). Short Nassa et al., 2011; Tarallo et al., 2011).


 MATERIALS AND METHODS

Erβ+ breast cancer cell model preparation and gene and transcript isoforms differential expression analysis
 
MCF7 and 5B12 breast cancer (BC) cell line model have been prepared to mimic Erβ/Erβ; (Erβ+) and Erβ/Erα; (Erβ-) breast cancer (BC). The preparation of Erβ+ and Erβ- BC cell lines have been entirely described in Nassa et al. (2011), Grober et al. (2011) and Dago et al. (2015). Differential genes, transcript isoforms and exons expression analysis, assessing the effect of early stimulation of estradiol (E2) hormone on alternative splicing occurrence in breast tumor cell models, have been performed by Cufflinks/Cuffdiff (Trapnell et al., 2013) and DEXseq (Anders et al., 2012) bioinformatics/bio-statistic packages respectively. Next generation RNA sequencing (NGS RNA-Seq) data used for the present analysis have been deposited in the Gene Expression Omnibus genomics data public repository (http://www. ncbi.nlm.nih.gov/geo/) with Accession Number GSE64590.
 
RNA sequencing (RNA-Seq) data generation and gene and/or transcript isoforms expression measurement
 
1 µg micrograms of high-quality total mRNA was used as starting material for the Illumina mRNASeq library preparation kit and was prepared to manufacturer’s directions (Illumina). Libraries were sequenced on the Illumina Genome Analyzer II as 101 base pair paired-end reads. Tophat v.2.0.8 (Kim et al., 2013) was used to align all reads including junction-spanning reads back to the human genome (Homo sapiens Ensembl GRCh37). The reads alignment quality and distribution were estimated using SAMtools. Cufflinks v2.1.1 (Trapnell et al., 2013) was used to identify differential spliced genes, isoform transcript and gene expression changes between analyzed experimental groups and/or conditions. We defined statistical significance in expression q-value and/or adjusted p-value for multiple testing ≤ 0.05, and Fragments per Kilobase of exon per Million reads mapped (FPKM)>0.5, since FPKM is a measure of expression used in high-throughput sequencing data that is normalized for both transcript length and total number of reads sequenced.
 
DEXseq measuring exons differential modulation usage by RNA-sequence
 
DEXseq is a package for the statistical programming language R (R Development Core Team, 2009) available as open source software via the Bioconductor project (Gentleman et al., 2004). For the preparation steps, namely the flattening of the transcriptome annotation to counting bins and the counting of the reads overlapping each counting bin, two Python scripts are provided, which are built on the HTSeq framework (Anders et al., 2015). The first script takes a GTF file with gene models and transforms it into a GFF file listing counting bins, the second takes such a GFF file and an alignment file in the SAM format and produces a list of counts. The R package is used to read these counts, estimate the size factors and dispersions, fit the dispersion-mean relation and test for differential exon usage. Then, exons with significant change and/or modulation at a false discovery rate lower or equal to 0.05 (FDR≤0.05) have been selected as involve in alternative splice events. All bioinformatics and biostatistics analyses and comparisons  were  implemented  and  performed   using   in-house scripts written in Unix and R.
 
Functional annotation gene ontology analysis by DAVID
 
We performed Database for Annotation, Visualization and Integrated Discovery (DAVID http://david.abcc.ncifcrf.gov/) analysis (Glynn et al., 2003) focusing exclusively on the set of differentially spliced gene transcripts and/or isoforms, discriminated by both Cufflinks/Cuffdiff and DEXseq approaches. Then, we performed a gene ontology (GO) survey (FDR ≤0.05 with at least 2 fold enrichment) by processing significantly differentially spliced genes, evaluating alternative transcript and exon change and/or modulation in multi-exon genes.


 RESULTS

Reads sequences from both Erβ+ and Erβ- human BC cell line models high-throughput RNA sequencing (RNA-Seq) statistical analysis
 
mRNA sequencing experiment basing on illumina genome analyzer II (GAII) processing both Erβ+ and Erβ- hormone induced breast cancer (BC) cell lines, yielded approximately 94911602-99876796 million pair-end read (101 bp) sequences (Table 1). From these reads, low quality sequences, were eliminated, resulting in around 65-71 million reads corresponding to 68-70.2% of total reads for each processed replicate sample (Table 1). In total around 65-71 million reads were aligned to the Homo sapiens Ensembl GRCh37 reference genome (Table 1). The number of reads per genes and transcript isoforms were further normalized to Fragment per Kilobases of exon per Million mapped reads (FPKM). Then, in order to include a maximum number of genes and transcript isoforms, reducing as possible statistical type I error in calling differentially modulated and spliced genes as well as exon modulation events, we adopted 0.5 FPKM, as genes and/or transcript isoforms expression level threshold in both processed Erβ+ and Erβ- BC cell exemplars (Figure 1). Student test analysis, based on both loaded junctions and found junction parameters from transcriptome and/or genome reconstruction through Cufflinks package, suggested a reasonable difference (p-value <0.01) between Erβ+ BC cell line under estradiol stimulus (Erβ+E2) and the other’s analyzed BC cell conditions (Table 1). In other words, the present result supported that Erβ+ and Erβ- BC cell lines in normal conditions (Erβ+noE2, Erβ-noE2) as well as Erβ- BC cell line induced by E2 (Erβ- E2), exhibited the same behaviors as opposed to Erβ+E2 breast cancer cell reacting to E2 induction (Table 1).
 
 
 
Since each analyzed condition have been processed in three technical replicates, Erβ+E2_Rep1 indicates the replicate 1 of Erβ+ BC cell line condition under estradiol (E2) treatment, while Erβ+noE2_Rep1 indicates replicate 1 of the same cell line with any E2 treatment The same nomenclature has been adopted for Erβ-E2 and Erβ+noE2 analyzed samples conditions.
 
Genomic re-distribution of RNA sequencing (RNA-Seq) reads sequences evaluating breast cancer cell line reply to estradiol (E2) hormone stimulus
 
Next, we focused on genomic re-distribution of reads sequences, merging read count analysis results from HTSeq/DEXseq with those from Tophat2/Cufflinks software’s for each analyzed hormone induced Erβ BC cell line conditions (Erβ+E2, Erβ+noE2, Erβ-E2 and Erβ-noE2) appraising early alternative splicing events monitoring differential spliced genes and exons change in breast tumor. The present survey highlighted a strong difference in read distributions between Erβ+E2 BC cells and the other’s analyzed conditions (Erβ+noE2, Erβ-E2 and Erβ-noE2) (Figure 2). These results seem to be in agreement with above reported loaded and/or found junction analysis (Table 1), supporting subtly an agreement between HTSeq/DEXseq and Tophat2/ Cufflinks approaches, in genome reconstruction process and/or in genomic reads re-distribution analysis. Furthermore, the present survey exhibited a constancy rate, measuring exon read count as well as intragenic and intron reads values comparing Erβ+noE2, and both Erβ-E2 and Erβ-noE2 breast cancer cells (Figure 2), hypothesizing a similitude between the former’s assessing alternative splicing pathway in the present analyzed BC cells, reinforcing the link between estrogen receptor  beta (Erβ) and early transcription and mRNA splicing events in hormone responsive BC cells. Considering as a whole, the present results supported a dynamic molecular reply of Erβ+ BC cells as opposed to Erβ- BC in terms of alternative splicing events (Figure 2).
 
 
Assessment of expressed genes and their transcript isoforms proportion comparing hormone responsive Erβ+ and Erβ- breast cancer cell lines
 
We evaluated change in differential expressed gene and transcript isoforms including exclusively genes and transcript isoforms that exhibit a FPKM expression value ≥0.5 and significant statistical differential change at an adjusted p-value ≤0.05 by Cufflinks/Cuddiff approach in processed hormone responsive BC cell lines. Then, basing on previous analysis (Figure 1), we processed in total 6714 and 52499 genes and transcript/isoforms respectively (Figure 3). As expected, change in gene and transcript isoforms strongly contrast between both analyzed estrogen hormone responsive Erβ+ and Erβ- BC cell lines (Figure 3A and B) since, estimated Pearson correlations values comparing log 2 fold change assessing measurement between expressed genes and transcript isoforms resulted to lower 0.5 (R2 =0.23 and R2=0.31 for Erβ+ and Erβ- BC cell  respectively),  advising weak agreement between change in genes and their respective transcript isoforms, especially in Erβ+ BC cells (Figure 3). Hence, we focused on the cases where only transcript isoforms expression were significantly altered, alerting alternative promoter usage in alternative transcript isoforms modulation as well as in alternative splicing event occurrence in estrogen responsive BC cell line. Indeed, we showed that situations for which transcript isoform expression was significantly alerted as opposed to their corresponding genes resulted 2 fold more in hormone induced Erβ+ BC cell (Figure 3C). Taking together, the present results suspected a considerable involvement of alternative promoter usage in early alternative splicing occurrence in hormone responsive Erβ+ BC cell lines as opposed to Erβ- BC cell lines (Figure 3).
 
 
Identification of differentially spliced genes (DSGs) in hormone responsive Erβ+ breast cancer cell lines by Cufflinks/Cuffdiff approach
 
To identify the differences in splice ratios between Erβ+E2  and Erβ+noE2 BC cell line (Erβ+E2 vs. Erβ+noE2), we employed Cufflinks/Cuffdiff v2.1.1 package, which calculates the changes in the relative splice abundances by quantifying the square root of the Jensen Shannon divergence on all the primary transcripts that produce two or more isoforms. It is essential to note that the distributions of genes, and primary transcripts, and isoform expression level (FPKM) are comparable between the samples that are taken for the differential splicing test. Then, 213 genes randomly distributed on human chromosomes have been detected as significantly
 
differentially spliced (DSGs) at a false discovery rate ≤0.05 (FDR ≤0.05) monitoring early AS occurrence in hormone responsive Erβ+ BC cell lines (Figure 4 and Supplementary Table 1). Furthermore, basing on this result as well as on previous one (data non shown), we showed that Erβ+ BC cell lines exhibited 3 fold more differentially spliced genes with respect to Erβ- BC cells replying to estradiol hormone stimulus (Supplementary Tables 1 and 2) confirming previous results and observations (Figures 2 and 3). Therefore, we were interested in investigating DSGs exons change monitoring early AS pathway in hormone responsive Erβ+ BC, since exons modulation have been recognized as suitable process understating mechanism of transcript isoforms expression, proposing the former’s as potential tumors biomarkers as well as therapeutic target.
 
 
Assessment of differential spliced multi-exonic genes (DSGs), exons modulation by DEXseq in Erβ breast cancer cell lines
 
Here,  we   analyzed   through   the   DEXseq   approach, differential exons expression change, referring to previous detected differential spliced multi-exonic genes (Cufflinks/Cuffdiff) in hormone induced Erβ+ BC cell line. This analysis looks for difference across conditions (in Erβ+E2 vs. in Erβ+noE2) between quantities that are directly observable from shotgun data (read count data), such as the relative usage of each exon. Then, performing the test for differential exons used considering exon with at least 10 reads count in at least one analyzed condition and controlling false discovery rate (FDR) with the Benjamini Hochberg method, at 5% threshold (statistical stringency), 3682 and 122 exons from 1438 and 75 multi-exonic genes, claimed to be significantly differentially modulated in hormone responsive Erβ+ and Erβ- BC cell lines respectively (Supplementary Materials 1 and 2). Also, this result suggested that exons change ratio in analyzed differentially spliced multi-exonic genes was 1.6 fold more higher in hormone responsive Erβ+BC cell lines when compared to Erβ- BC cell line, supporting a potential high number of significantly alternatively modulated transcript isoforms involvement characterizing processed hormone responsive Erβ+BC cell line, highlighting   the   links   between   Erβ   and    early    AS occurrence in E2 hormone induced breast cancer. However, significantly modulated exons proportion survey by processing differentially spliced multi-exonic genes revealed that exons change in the analyzed hormone responsive BC cell lines, meanly regard the first 10 gene transcripts exons (more than 25%), advising and/or suspecting weakly solicitation of ending transcripts exons yielding alternative transcript isoforms in hormone responsive BC cells (Table 2).
 
 
DAVID analysis assessing functional annotation of differentially spliced genes discriminated by both Cufflinks/Cuffdiff and DEXseq tools
 
68 genes out the 73 processed differentially spliced genes (DSGs) discriminated by Cufflinks/ Cuffdiff approach at 5% false discovery rate in hormone responsive Erβ+ BC (Supplementary Table 2) were converted in a new list for functional annotation analysis by DAVID package showing at Benjamini correction referring to a p-adjusted value at 5%, that (i) 75% of differentially alternatively spliced genes called by Cufflinks/ Cuffdiff were genes that code for at least two isoforms due to pre-mRNA splicing event and that (ii) 74% of these genes were recognized as alternative splice variant. Furthermore, 42.64% of alternative spliced genes categorized by Cufflinks/ Cuffdiff approach have been discriminated to be localized in the nucleus (Figure 5A). In the same tendency  and   focusing   on   alternative   spliced exons events called by DEXseq methodology at 5% false discovery rate threshold with 2 fold exon change (high alternative splicing evidence), we showed that 55.35% of analyzed alternative spliced genes, coded for at least two isoforms due to pre-mRNA splicing event and that 54.91% of their transcript isoform variants were recognized as alternative splice variant. We also showed that 37.84% of all analyzed alternative spliced genes called by DEXseq approach have been discriminated to be localized in the nucleus (Figure 5B). These results, delicately suggested a relative agreement between the two considered and/or analyzed bioinformatics tools (Cufflinks/ Cuffdiff and DEXseq) assessing the performance of alternative splicing pathway in breast cancer cell line, since mRNA maturation in eukaryotic cells happened in cell’s nucleus. Finally, even if the present results suggested DEXseq approach as discriminating highest number of potential alternative splicing event in term of functional annotation as opposed to Cufflinks/Cuffdiff (Figure 5), it is noteworthy to underline their similitude evaluating (i) splice and isoform variant, (ii) alternative spliced genes localization in the nucleus and (iii) nucleolus in the present functional annotation survey (Figure 5).
 
 
Integration between Cufflinks/Cuffdiff and DEXseq data assessing alternative splicing event in Erβ breast cancer cell line
 
More than 34.74% of DSGs  (multi-exonic  genes) discriminated by Cufflinks/Cuffdiff approach in the hormone responsive Erβ+ BC cell, exhibited at least one significant exons change as DEXseq exon differential analysis (Figure 4). Indeed, LIFR gene, knows as a breast cancer metastasis suppressor, MATA2 gene involves in apoptosis process in human hepatoma, NBPF10 gene associated with several types of cancer, PBX1 gene that results engaged in the progression of breast cancer and PHLPP2 gene that inhibits cancer cell proliferation acting as tumor suppressor, detected as significantly differentially spliced by Cufflinks/Cuffdiff approach, displayed a significant exons change in the present analyzed hormone responsive BC cell line (Figure 4), proposing both Cufflinks/Cuffdiff and DEXseq data merging process as a valid methodology monitoring AS pathway in breast cancer. As DEXseq provides exons splicing visualization, we reported in Figure 6 an example of exons modulation evaluating alternative splicing happening in two fully differentially spliced genes MATA2 and PBX1 in E2 hormone induced BC exclusively (Erβ+ BC). Moreover, several generic tumor biomarkers such as SEPT9, known as a candidate for the ovarian tumor suppressor, PPP1R13B and NOTCH2 genes affecting cells differentiation implementation, proliferation and apoptotic programs and NBPF10 gene which results associated with several type of cancer, were recognized as exhibiting significant exons change in the hormone responsive Erβ+ BC cell line (Figure 4). We also showed that 36.11% of detected spliced genes merging both  Cuffdiff  and DEXseq approaches were regulated by significant alternative promoter use (Figure 4). Interestingly, well-noted biomarker genes like MATA2, SEPT9, NOTCH2 and PPP1R13B claimed to be under alternative promoter usage (p-value≤0.05) (Figure 4 and Supplementary Table 3), assessing alternative splicing pathway in Erβ+ BC cells. So, the present results suspecting the involvement of some remarkable cancer biomarkers controlling AS path in hormone responsive BC cell lines, proposed Cufflinks/Cuffdiff and DEXseq approaches data integration as reasonable and complementary scheme weighing whole AS pathways in breast cancer cells.
 


 DISCUSSION

Alternative splicing (AS) is a means of expressing several or many different transcripts from the same genomic DNA and results from the inclusion of a subset of the available exons for a particular protein. By excluding one or more exons, certain protein domains may be lost from the encoded protein, which can result in protein function damage or gain. Several types of alternative splicing have been described resulting in exon skipping; alternative 5′ or 3′ splice sites; mutually exclusive exons and much more rarely and intron retention, approving the complexity of AS study and the needed of accurate bioinformatics or bio-statistics systems evaluating this phenomena in eukaryotic cells. However, many alternative splicing events have been noted in human development, especially in the brain and the testes (Grabowski and Black, 2001; Venables, 2002) as well as in cancer, including the use of alternative individual splice sites, alternative exons, and alternative introns. Therefore, whole alternative splicing survey must include capable sensitive and specific bioinformatics tools, able to measure accurately alternative transcript isoforms as well as multi-exonic genes transcripts exons abundance, since recent Next Generation RNA Sequencing (NGS RNA-Seq) analysis provides innovative platform exploring in detail cell transcriptome and/or genome. However, a recurrent challenge in RNA-Seq experimentation regards application of adequate bioinformatics tools analyzing huge quantity and complex yielded data. Also, RNA-Seq coupled with well-established bioinformatics approaches; that is, Cufflinks/Cuffdiff (Trapnell et al., 2013) and DEXseq (Anders et al., 2012), allowed a suitable whole genome and transcriptome reconstruction measuring gene transcript isoforms as well as exons expression level, performing differential expression analysis between cells types. In such analysis, annotated regions of the considered genome can be expressed (that is, exons), describing how the pre-mRNAs are spliced into transcripts. While there are several utilities that attempt to de-convolute read data into isoform abundances, the accuracy and robustness of these methods is difficult to establish  (Chandramohan  et  al.,  2013;  Zhang   et   al.,  2014). Isoform expression estimates seem to vary considerably between different tools, and generally depend on the quality and completeness of the transcript assembly (Kanitz et al., 2015; Rehrauer et al., 2013). Also, Cufflinks/Cuffdiff methodology is not more informative regarding exons modulation measuring alternative splice transcript isoforms pattern. Simon Anders et al. (2012) demonstrate the versatility of DEXseq package by applying it to several data sets facilitating the study of regulation and function of alternative exon usage on a genome-wide scale. Guided by these observations, we proposed merging between both Cufflinks/Cuffdiff and DEXseq investigating meticulously alternative transcript isoforms exons abundance in our previous studied hormone responsive Erβ+ breast cancer cells, since Erβ significantly affects estrogen-induced early transcription and mRNA splicing in hormone-responsive BC cells (Dago et al., 2015). Then, genomic re-distribution of RNA-Seq reads sequences from hormone responsive Erβ BC cells by HTSeq/DEXseq and TopHat/Cufflinks, suggested strong difference (p-value <0.05) between estrogen induced Erβ+ BC cells (Erβ+E2) and the other’s analyzed breast cancer cells conditions (Erβ+ with any E2 stimulus; Erβ+noE2, Erβ- induced by E2; Erβ-E2 and Erβ- without any E2 stimulus; Erβ-noE2), since exhibiting dissimilar attitude and/or behaviors considering loaded and fund reads junction as well as exon intragenic and intron reads distribution (Figure 2 and Table 1), subtly evoking agreement between both Cufflinks and DEXseq approaches in genomic and transcriptomic reconstruction analysis. In addition, this analysis suggested high variability between Erβ+E2 and Erβ+noE2 conditions suspecting a high number of differentially expressed genes and/or transcript isoforms in the latter’s as opposed to hormone induced Erβ- BC cells. However, comparative analysis assessing the relationship between change in expressed genes and their respective transcript isoforms, showed a strong evidence of alternative promoter used in alternative splicing pattern processing hormone responsive Erβ+ BC cell line, as the ratio between expressed transcript isoforms as oppose to expressed genes, was 2 fold more higher in the latter’s (Erβ+ BC) when compared to Erβ-BC cell line (Figure 3). Taking together, these results reinforced alternative splicing evidence in hormone responsive Erβ+BC as opposite to Erβ- BC cell lines alerting a significant participation of alternative promoter regulating AS events occurrence in breast cancer process. Several studies have shown that the occurrence of alternative transcriptional termination and splicing is higher in genes with alternative promoters, and the choice of alternative promoter and transcription termination can influence the alternative splicing pattern of the pre-mRNA (Winter et al., 2007; Albulescu et al., 2012). Nevertheless, it is estimated that participate to the alternative splicing phenomena proposing genes transcript isoforms exons modulation   as   a    suitable    approach    understanding alternative transcript isoform expression in human genome. Hence, detailed alternative splicing analysis by NGS RNA-Seq need innovative bioinformatics scheme capable to integrate accurately transcript isoforms and exon expression analysis. Based on this, we combined differentially spliced genes and transcript isoforms exons modulation assessing early AS occurrence in estradiol hormone induced Erβ+ BC cells, emphasizing strong involvement of transcript isoforms exons change in alternative splicing mechanism in breast tumor. Then, integrating Cufflinks/ Cuffdiff and DEXseq approaches, our findings proposed alternative promoter’s usage as well as significant exon change as key molecular events favoring AS pattern in hormone responsive breast cancer cells (Figure 4). Also, it is noteworthy to underline the weak involvement of ended exons of transcript isoforms evaluating alternative splicing occurrence in the present hormone induced Erβ BC cells (Table 2). Furthermore, around 34.74% of significantly differentially spliced genes by Cufflinks/ Cuffdiff have shown significant exons change in DEXseq analysis testing for differential usage of exon regions as a proxy for alternative isoform regulation as well as providing a powerful suite of visualization tools (Figure 6) (Love et al., 2014). Interestingly, the present analysis identified several remarkable cancer biomarkers, like MDM2 and NF1 genes, known as cancer specific alternative splicing genes (Venables, 2004), PBX1 gene, revealed as a novel pioneer factor defining aggressive Erβ- breast tumors, as it guides Erα genomic activity to unique genomic regions promoting a transcriptional program favorable to breast cancer progression (Magnani et al., 2011), LIFR gene, knows as a breast cancer metastasis suppressor (Chen et al., 2012), MATA2 gene, involved in human colon cancer progression (Chen et al., 2007) and PHLPP2 tumor suppressor (Liu et al., 2011) and AIB1/NCOA3 hormone signaling in breast cancer as exhibiting significant exon change in hormone responsive Erβ BC cells, demonstrating the key role of exon modulation in early estrogen induced breast cancer alternative splicing pattern (Figure 4, Supplementary Materials 1 and 2). Furthermore, 36.11% of detected differentially spliced genes by merging Cuffdiff and DEXseq approaches were recognized as including alternative promoter, and some well-noted cancer biomarker (MATA2, SEPT9, NOTCH2 and PPP1R13B) claimed to be under alternative promoter usage (p-value≤0.05) (Figure 4 and Supplementary Table 3), monitoring early alternative splicing pathway in our processed hormone responsive Erβ+ BC cells (Figure 4). So, the present results by evoking the involvement of some remarkable cancer biomarkers, controlling AS pattern in the present hormone responsive BC cell lines, proposed Cufflinks/Cuffdiff and DEXseq genomic data integration, as a reasonable complementary scheme assessing whole AS occurrence in hormone responsive breast cancer cell.  Moreover,  concordance  between  Cufflinks/ Cuffdiff and DEXseq has been supported in part by DAVID functional annotation analysis since 42.64 and 37.84% of alternative spliced genes categorized by previous mentioned approaches respectively, have been discriminated to be localized in cell’s nucleus, suggesting a strong contribution of the latter’s (alternative sliced genes) regulating AS pattern occurrence in hormone responsive BC cell line (Figure 5). Also, even if the present study admitted conflicting results between Cufflinks/Cuffdiff and DEXseq approaches in alternative splicing analysis (Figure 5), we showed that an adequate integration between these bioinformatics/biostatistics tools can help to wholly investigate alternative splicing occurrence in breast cancer disease reducing false discovery event rate. Furthermore, accuracy in statistical analysis processing alternative transcript isoforms regulation in genomic and transcriptomic studies, has been supported by recent work introducing a new method that builds on the statistical techniques used by the well-established DEXseq package to detect differential usage of both exonic regions and splice junctions helping differential usage of novel splice junctions without the need for an additional isoform assembly step (Stephen and James, 2016).


 CONCLUSION

RNA-Seq providing more than simple measurements of gene and/or transcript-level expression, can be used to study more complex regulatory phenomena at the isoform level, even when the isoforms in question are unannotated. Numerous tools have been developed to detect alternative isoform regulation exhibiting in some cases conflicting results. In contrast to this tendency, the present study proposed integration between well-established Cufflinks/Cuffdiff and DEXseq approaches as reasonable system assessing alternative splicing events in hormone responsive Erβ breast cancer cells. Finally, our findings exhibited significant exon modulation of multi-exonic gene transcripts regulated by alternative promoters, as recurrently solicited in early AS pattern in estrogen induced BC cells.


 CONFLICT OF INTERESTS

The authors have not declared any conflict of interests.


 SUPPLEMENTARY

 



 REFERENCES

Albulescu LO, Sabet N, Gudipati M, Stepankiw N, Bergman ZJ, Huffaker TC, Pleiss JA (2012). A quantitative high-throughput reverse genetic screen reveals novel connections between Pre-mRNA splicing and 5′ and 3′ end transcript determinants. PLoS Genet. 8(3):e1002530.
Crossref

 

Anders S, Reyes A, Huber W (2012). Detecting differential usage of exons from RNA-seq data. Genome Res. 22(10):2008-2017.
Crossref

 
 

Anders S, Pyl PT, Huber W (2015). HTSeq A Python framework to work with high-throughput sequencing data. Bioinformatics 31(2):166-169.
Crossref

 
 

Cancer Genome Atlas (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature 487:330-337.
Crossref

 
 

Chandramohan R, Wu PY, Phan H, Wang MD (2013). Benchmarking RNA-Seq quantification tools. Conf. Proc. IEEE Eng. Med. Biol. Soc. pp. 647-650.
Crossref

 
 

Chen D, Sun Y, Wei Y, Zhang P, Rezaeian AH, Teruya-Feldstein J, Gupta S, Liang H, Lin HK, Hung MC, Ma L (2012). LIFR is a breast cancer metastasis suppressor upstream of the Hippo-YAP pathway and a prognostic marker. Nat. Med. 18(10):1511-1517.
Crossref

 
 

Chen H, Xia M, Lin M, Yang H, Kuhlenkamp J, Li T, Sodir NM, Chen YH, Josef-Lenz H, Laird PW, Clarke S, Mato JM, Lu SC (2007). Role of methionine adenosyltransferase 2A and S-adenosylmethionine in mitogen-induced growth of human colon cancer cells. Gastroenterology 133(1):207-218.
Crossref

 
 

Christofk HR, Vander Heiden MG, Harris MH, Ramanathan A, Gerszten RE, Wei R, Fleming MD, Schreiber SL, Cantley LC (2008). The M2 splice isoform of pyruvate kinase is important for cancer metabolism and tumour growth. Nature 452:230-233.
Crossref

 
 

Dago DN, Claudio S, Antonio R, DomenicoM, Giorgio G, Giovanni N, Maria R, Francesca R, Roberta T, Alessandro W (2015). Estrogen receptor beta impacts hormone-induced alternative mRNA splicing in breast cancer cells. BMC Genomics 16(367):1-13.
Crossref

 
 

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J (2004). Bioconductor: open software development for computational biology and bioinformatics Genome Biol. 5(10):1-39.
Crossref

 
 

Glynn DJ, Brad TS, Douglas AH, Jun Y, Wei G, Clifford HL and Richard AL (2003). DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 4:1-11.
Crossref

 
 

Grabowski PJ, Black DL (2001). Alternative RNA splicing in the nervous system. Prog. Neurobiol. 65:289-308.
Crossref

 
 

Grober OM, Mutarelli M, Giurato G, Ravo M, Cicatiello L, De Filippo MR, Ferraro L, Nassa G, Papa MF, Paris O, Tarallo R (2011). Global analysis of estrogen receptor beta binding to breast cancer cell genome reveals an extensive interplay with estrogen receptor alpha for target gene regulation. BMC Genomics 12(36):1-16.
Crossref

 
 

Jeffrey AM, Zhong W (2011). Next-generation transcriptome assembly. Nat. Rev. Genet. 12:671-682.
Crossref

 
 

Kanitz A, Gypas F, Gruber AJ, Gruber AR, Martin G and Zavolan M (2015). Comparative assessment of methods for the computational inference of transcript isoform abundance from RNAseq data. Genome Biol. 16:1-26.
Crossref

 
 

Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14(4):1-13.
Crossref

 
 

Kim E, Goren A, Ast G (2008). Insights into the connection between cancer and alternative splicing. Trends Genet. 24:7-10.
Crossref

 
 

Liu J, Stevens PD, Li X, Schmidt MD, Gao T (2011). PHLPP-mediated dephosphorylation of S6K1 inhibits protein translation and cell growth. Mol. Cell. Biol. 31(24):4917-4927.
Crossref

 
 

Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12):550.
Crossref

 
 

Magnani L, Elizabeth BB, Xiaoyang Z, Mathieu L (2011). PBX1 Genomic Pioneer Function Drives ERα Signaling Underlying Progression in Breast Cancer PLoS Genet. 7(11):e1002368.

 
 

Moller-Levet CS, Betts GN, Harris AL, Homer JJ, West CM, Miller CJ (2009). Exon array analysis of head and neck cancers identifies a hypoxia related splice variant of LAMA3 associated with a poor prognosis. PLoS Comput. Biol. 5(11):e1000571.
Crossref

 
 

Nassa G, Tarallo R, Ambrosino C, Bamundo A, Ferraro L, Paris O, et al. (2011). A large set of estrogen receptor beta-interacting proteins identified by tandem affinity purification in hormone-responsive human breast cancer cell nuclei. Proteomics 11(1):159-165.
Crossref

 
 

Pal S, Gupta R, Kim H, Wickramasinghe P, Baubet V, Showe LC, et al. (2011). Alternative transcription exceeds alternative splicing in generating the transcriptome diversity of cerebellar development. Genome Res. 21(8):1260-72.
Crossref

 
 

Pal S, Ravi G, Ramana VD (2012). Alternative transcription and alternative splicing in cancer. Pharmacol. Ther. 136:283-294.
Crossref

 
 

Paris O, Ferraro L, Grober OM, Ravo M, De Filippo MR, Giurato G, Nassa G, Tarallo R, Cantarella C, Rizzo F, Di Benedetto A (2012). Direct regulation of microRNA biogenesis and expression by estrogen receptor beta in hormone-responsive breast cancer. Oncogene 31(38):4196-206.
Crossref

 
 

R Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing. URL 

View.

 
 

Rehrauer H, Opitz L, Tan G, Sieverling L, Schlapbach R (2013). Blind spots of quantitative RNA-seq: the limits for assessing abundance, differential expression, and isoform switching. BMC Bioinformatics 14:370.
Crossref

 
 

Stephen WH, James CM (2016). Detection and visualization of differential splicing in RNA-Seq data with JunctionSeq. Nucleic Acids Res. 44(15):1-15.

 
 

Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, Schmidt D (2008). A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321:956-960.
Crossref

 
 

Tarallo R, Bamundo A, Nassa G, Nola E, Paris O, Ambrosino C, Facchiano A, Baumann M, Nyman TA, Weisz A (2011). Identification of proteins associated with ligand-activated estrogen receptor alpha in human breast cancer cell nuclei by tandem affinity purification and nano LC-MS/MS. Proteomics 11(1):172-179.
Crossref

 
 

Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L (2013). Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31(1):46-53.
Crossref

 
 

Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28(5):511-515.
Crossref

 
 

Venables JP (2002). Alternative splicing in the testes. Curr. Opin. Genet. Dev.12:615-619.
Crossref

 
 

Venables PJ (2004). Aberrant and Alternative Splicing in Cancer. Cancer Res. 64:7647-7654.
Crossref

 
 

Venables,JP, Klinck R, Koh C, Gervais-Bird J, Bramard A, Inkel L, Durand M, Couture S, Froehlich U, Lapointe E, Lucier JF, Thibault P, Rancourt C, Tremblay K, Prinos P, Chabot B, Elela SA (2009). Cancer-associated regulation of alternative splicing. Nat. Struct. Mol. Biol.16:670-676.
Crossref

 
 

Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB (2008). Alternative isoform regulation in human tissue transcriptomes. Nature 456:470-476.
Crossref

 
 

Winter J, Kunath M, Roepcke S, Krause S, Schneider R, Schweiger S (2007). Alternative polyadenylation signals and promoters act in concert to control tissue-specific expression of the Opitz syndrome gene MID1. BMC Mol. Biol. 8(105):1-12.
Crossref

 
 

Zerbino DR, Birney E (2008).Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res.18(5):821-829.
Crossref

 
 

Zhang ZH, Jhaveri DJ, Marshall VM, Bauer DC, Edson J, Narayanan RK, Robinson GJ, Lundberg AE, Bartlett PF, Wray NR, Zhao QY (2014). A comparative study of techniques for differential expression analysis on RNA-Seq data. PLoS One 9(8):e103207.
Crossref

 

 




          */?>