Identification and analysis of cassava genotype TME3 bacteria artificial chromosome libraries for characterization of the cassava mosaic disease

Cassava is an economically important crop in sub-Saharan Africa; however, its yield potential is constrained by cassava mosaic disease (CMD) infection. Classical genetics and biotechnology are being harnessed to overcome the disease and secure yields for farmers. The CMD2 resistance locus flanked by three simple sequence repeats (SSR) markers and one sequence characterized amplified region (SCAR) marker were mapped in West African genotypes and shown to impart qualitative resistant to all species of CMGs. However, gene(s) associated with the CMD2 locus and their mode of actions remains unknown. In an effort to discover gene(s) located in CMD2 locus region, TME3 BAC collections were screened for the presence of CMD2 flanking markers. CMD susceptible and resistant cassava genotypes were found to contain 100% of the markers flanking CMD2 locus. SNPs and nucleotide deletions were identified within the marker sequences but there was no evidence of trait and marker association. All the SSR markers flanking CMD2, and the more recently characterized CMD3 loci were to be located on chromosome 12. Through BAC pools library hybridization with marker probes, 130 BACs were identified, but only 23 BACs contained at least CMD2 specific two markers. Whole BAC sequencing identified five clones that mapped to the marker regions. BAC29 assembled into a 100 kb contig and encoded tandem repeats of three full length R genes (3.5 kb) and two partial repeats. These R genes were conserved and highly expressed in CMD susceptible and CMD resistant cassava genotypes. Promoter sequences derived from R genes showed similar transient expression of GUS as 35S promoter. On cassava genome V6.1 BAC29 sequences were mapped to chromosome 16, eliminating their potential role in CMD resistant.


INTRODUCTION
Cassava genetic improvement through classical breeding and biotechnology targets various traits such as disease and pest resistance, enhanced nutritional and reduced cyanogenic content, improved starch quality as well as decreased post-harvest deterioration (Legg et al., 2015;Akinbo et al., 2012;Welsch et al., 2010;Jørgensen et al., 2005;Lokko et al., 2005;Siritunga et al., 2004). In sub-Saharan Africa, cassava production is constrained by two viral diseases namely Cassava mosaic disease (CMD) caused by multiple species of single stranded DNA viruses in the family geminiviridae, and Cassava brown streak disease (CBSD) caused by two species of RNA viruses in the family Potyviridae (Legg et al., 2011(Legg et al., , 2015. CMD is endemic to all cassava production regions of sub-Saharan Africa (Harimalala et al., 2015;Legg et al., 2015;Patil and Fauquet, 2009). As a result, classical breeding programs have focused significant attention on mitigating the negative effects caused by CMD. Through marker assisted breeding, three quantitative trait loci (QTL) namely CMD1, CMD2 and CMD3 have been independently mapped in cassava genotypes namely TMS30572, TME3 and TMS97/2205 (Rabbi et al., 2014;Okogbenin et al., 2012;Lokko et al., 2005;Akano et al., 2002;Fregene and Puonti-Kaerlas 2002). During the 1980s and 1990s, a CMD epidemic occurred in East Africa. A successful approach in managing the crisis was achieved through deployment of the West African landraces TME14 and TME204 which carry the CMD2 locus (Legg, 1999). These cultivars are now among the most popular genotypes in Kenya and Uganda due to their inherent resistance to CMD, good cropping characteristics, cooking qualities and taste. However, they are highly susceptible to CBSD. Others traditional East African cultivars, such as Ebwanatereka, possess desirable organoleptic traits but are highly susceptible to both CMD and CBSD (Kaweesi et al., 2014). Transgenic technologies allow the introduction of multiple traits into an individual cultivar (Taylor et al., 2012). Various breeding programs and research groups are embarking on the use of such techniques to stack multiple resistance traits onto genetic backgrounds with inherent resistance to CMD and improving the popular but CMD susceptible genotypes .
The West African cassava landraces carrying the CMD2 resistant locus are popular targets for genetic engineering ; V rs hur t 2012; Taylor et al., 2012) and as parents for cassava genetic improvement programs (Rabbi et al., 2014;Okogbenin et al., 2012;Akano et al., 2002). Qualitative CMD resistance conferred by CMD2 is reported to be stably inherited as a single locus and to confer broad specificity to cassava-infecting geminiviruses (Rabbi et al., 2014;Okogbenin et al., 2007). CMD2 has been mapped on the genetically similar cassava landraces TME1, TME7 and TME14 (Rabbi et al., 2014) using molecular markers designated as sequence characterized amplified region (SCAR) marker RME1, simple sequence repeats (SSR) markers; SSRY28, NS158 and NS169 and single nucleotide polymorphism (SNP) markers (Rabbi et al., 2014;Lokko et al., 2005;Akano et al., 2002). SSR and SNP markers linked with the CMD2 locus are collocated on chromosome 12 of cassava genome version 6.1 (International Cassava Genetic Map Consortium [ICGMC] 2015). The current version of cassava genome anchor 72% (382 MB) of cassava sequences on genetic map (ICGMC, 2015) and genomic scan identified gaps within the chromosomal region carrying CMD2 locus. This indicates that there are a significant number of genes missing from the existing cassava reference genome, especially within highly repetitive regions. Therefore, in its current form, the cassava reference genome is not sufficiently reliable to unravel all sequences coding for genes harbored in CMD2 locus. In addition, the sequenced cassava genome was derived from the partial inbred line AM560-2, a progeny of the Latin-American cassava cultivar MCOL1505 (Prochnik et al., 2012).
According to Okogbenin et al. (2007), all cassava germplasm originating from South America are susceptible to CMD. This means that alleles specific to CMD resistance are most likely not present in the AM560-2 genome. In the absence of a fully sequenced and annotated cassava genome from a CMD resistant genotype, bacterial artificial chromosomes (BACs) were adopted to construct large insert libraries for positional cloning of the CMD2 locus in cassava landrace TME3. This covered 10.1X of the haploid genome, indicating 99% chance of finding any given sequence (Tomkins et al., 2004). BACs are preferred as they enabled stable cloning of large genome insert sizes of 100-200 kb and can be propagated easily in E. coli cells without undergoing rearrangements Shi et al., 2011). In genomics, large-insert DNA libraries are utilized in the development of physical maps of the genome (Tomkins et al., 2004), for high throughput sequencing of genomes (Prochnik et al., 2012), and in map-based cloning of agronomically important genes such as those imparting disease resistance (Ragupathy et al., 2011). Using BAC by BAC sequencing, resistance gene analogues (RGAs) were isolated and mapped from TMS30001, a CMD resistant genotype derived from Manihot glaziovii, and from cultivar MECU72 carrying whitefly resistant loci (Tomkins et al., 2004). Recently, high quality draft-genomes have been assembled through integration of BAC-based physical maps and BAC-end sequences (Wang et al., 2014;International Barley Genome Sequencing Consortium, 2012), indicating the power of the resources and associated technologies for elucidating genomic information in crop plants.
Map based cloning of cassava was initiated through  (CIAT, 2006). BAC end sequencing of clones carrying markers BAC33a, BAC33b, BAC36b and SBAC33c were constructed into 13 contigs spanning across the CMD2 region based on the position of markers RME1 and BAC33b located on either side of the CMD2 locus (CIAT, 2007).
In the present study, attempts were made to assemble sequences and identify genes residing in the CMD2 locus. Thirteen BAC clones were obtained from CIAT courtesy of Dr Luis Augusto, and additional BACs libraries hybridizing to the CMD2 flanking markers were procured from CUGI. Results described here reported presence of CMD2 and CMD3 loci flanking markers in all cassava genotypes irrespective of CMD resistant status. TME3 BAC library hybridization with markers flanking CMD2 locus and whole BAC sequencing identified sequences mapping to markers on either side of CMD2. However, contigs were not long enough to transverse the entire CMD2 locus region. Tandem repeats of five disease resistance genes were identified in one of the clones and their diversity studied in different cassava genotypes. However, they were mapped to chromosome 16 of the current cassava reference genome.

Selection of candidate BAC libraries spanning around CMD2 locus
Four markers that mapped closest to the CMD2 locus and described by Okogbenin et al. (2012) were PCR amplified from cassava genotype TME3 using the conditions and primer sequences reported by Okogbenin et al. (2012). Amplicons corresponding to the full length marker DNA sequence were cloned into Zero blunt Topo vector (Life Technologies, Carlsbad, CA, USA) and confirmed through sequencing using M13-F (5´-GTAAAACGACGGCCAG-3') M13-R (5´-CAGGAAACAGCTATGAC-3´) primers.
Four high density colony filters were hybridized with each probe to cover the entire TME3 BAC libraries. Hybridizations were performed overnight at 60°C, followed by two stringent washes at 60°C with 0.1% SDS and 1X SSC for 1 h. Images of the hybridizations were recorded by phosphor screens and read by a Typhoon 9400 imager (GE Healthcare, Piscataway, NJ, USA). The coordinates of the BAC clones were identified on filters using Hybdecon software (CUGI). Bacterial clones corresponding with positive signals were isolated using a sterile toothpick and grown overnight in 3 ml of liquid LB medium containing 12.5 μg/μ chloramphenicol. Plasmid DNA was extracted from three colonies per BAC using PureLink® Quick Plasmid Miniprep Kit (Life Technologies, Carlsbad, CA, USA). The BAC hybridization results were verified through PCR using CMD2 flanking marker primers as reported by Okogbenin et al. (2012)

BAC sequence assembly and annotation
Raw sequences obtained from full BAC sequencing were downloaded, demultiplexed by QIIME (Caporaso et al., 2010) and trimmed using Cutadapt (Martin, 2011) to remove adaptor sequences followed by filtering of sequences of bacteria and vector origin by randomized Numerical Aligner (Vezzi et al., 2012) Clean reads from each BAC were assembled using Sanger Sequence Assembly Software (DNASTAR) using default settings and all contigs larger than 5 kb used for the present analysis. Comparative analysis of BAC sequences was then performed to identify cassava genomic regions harboring homologous sequences.
Coding regions of BAC sequences were identified through BLASTN using cassava EST sequences (http://cassava.igs.umaryland.edu/blast/db/EST_asmbl_and_single. fasta) as query and the BAC clone sequences as the subjects. The potential coding sequences were blasted against cassava genome V6.1 (http://phytozome.jgi.doe.gov/) and homologous genes identified based on functional annotation. To increase the level of confidence, homology searches in NCBI were performed to reveal putative genes based on 95-100% nucleotide identities. For disease resistant genes, conserved protein domains were identified using HMMER software suite (http://pfam.xfam.org/). Primers described in Supplementary Table 1 were used to study diversity of full length disease resistant genes in various cassava genotypes.

Amplification of putative CMD resistance genes from cassava genotypes
From one of the BACs sequenced, hereafter referred to as BAC29, three tandem repeats of putative full length resistance (R) genes containing motifs for coiled-coil nucleotide binding site leucine rich repeat (CC-NBS-LRR) and two truncated R genes were identified. To identify nucleotide diversity of the R genes in cassava genotypes, primers were designed from BAC29 sequences to independently amplify 3.5 and 1.6 kb of full length and truncated R genes, respectively (Supplementary Table 1). The promoter regions of putative CMD2 gene(s) was targeted for amplification using primers designed upstream of the genes to cover 1.5-2.0 kb (Table  1). Total DNA was extracted from CMD2 types cassava genotypes TME3, Oko iyawo, TME7, TME204, CMD1 types TMS30001, TMS30572, CMD3 types TMS97/2205 TMS98/0505 and susceptible types TME117, 60444 and Ebwanatereka, using DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the m uf tur r's i stru tio s Th DNA w s qu tifi o N oDrop (Thermo scientific Waltham, MA, USA) and 100 ng mixed with 22 µl A uPrim ™ Pfx Sup rMix (Lif T h o ogi s C r sb CA USA), 1 µl of each primer (0.5 µM final concentration) and subjected to the following PCR conditions; one cycle of 5 min at 94°C followed by 35 cycles of amplification (35 s at 94°C, 30 s at 52°C and 3 min (R gene ORF), 1.5 min (promoter) at 68°C ) and a final cycle of 10 min at 68°C. The PCR amplicons were analyzed by running on a 1% (w/v) agarose gel electrophoresis at 120 V for 30 min. The desired fragments were excised from the gel and purified using PureLink® Quick Gel Extraction Kit (Life Technologies, C r sb CA USA) s p r th m uf tur r's i stru tio s

Cloning putative R genes and their endogenous promoters
The gel purified genes and promoter PCR products were cloned i to Z ro B u t TOPO o i g v tor fo owi g th m uf tur r's protocol (Life Technologies, Carlsbad, CA, USA). Two microliters of ligation mix was transformed into 50 μ of One Shot TOP10 chemically competent E. coli cells (Life Technologies, Carlsbad, CA, USA) as described by the manufacturer. After one hour i ub tio t 37°C i 250 μ LB medium minus antibiotics, 50 μ of transformed cells were plated on LB agar plates containing 50 μg/m k my i , 100 mM Isopropy β-D-1-thiogalactopyranoside (IPTG) and 20 mg/ml X-gal and incubated overnight at 37°C. Six individual white colonies per clone were grown in liquid LB media containing 50 μg/m k my i for 8 h P smi DNA w s purifi using Purelink Quick plasmid miniprep kit (Life Technologies, Carlsbad, CA, USA) and confirmed for insert size through double digestion of 100 ng plasmid DNA with Xmal and Xbal restriction enzymes (New England Biolabs, Ipswich, MA USA). Six clones per cassava genotype were submitted to Genewiz (http://www.genewiz.com) for Sanger sequencing using M13-F, M13-R, and additional primers designed in the middle of gene to cover the entire open reading frame (ORF), shown in Supplementary Table 1. The sequence contigs were assembled using SeqMan Pro and aligned with MegAlign Pro of DNASTAR Lasergene 12.2 (DNASTAR, USA). Maximum likelihood phylogenetic trees were constructed to study the diversity of sequences. The sequences of putative promoters were analyzed for promoter elements using program PLACE web Signal Scan (http://www.dna.affrc.go.jp/PLACE/signalup.html). To study the promoter activities, clones were constructed into binary vector pCambia5000 using promoter sequences derived from CMD resistant and susceptible genotypes respectively to drive the expression of beta-glucuronidase (GUS) reporter gene whereas 35s promoter was used as the control. The promoter activity was determined through leaf infiltration transient expression in Nicotiana benthamiana following the procedures described by Wroblewski et al. (2005).

Northern blot analysis of potential R genes
To determine the level of putative R gene expression in different cassava genotypes, total RNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) protocol (Doyle and Doyle 1990). Following extraction, RNA was incubated at 37°C for one hour with 4 μ of TURBO DNA-fr ™ Kit (Lif T h o ogi s Carlsbad, CA, USA) to remove contaminating DNA. The RNA was quantified on NanoDrop (Thermo scientific Waltham, MA, USA) and 10 μg trophor s i 1% turi g g ros g for 2 h t 80 V before transfer to a positively charged Hybond nylon membrane (Amersham, UK) using DEPC treated 20X SSC (0.3 M trisodium citrate and 3.0 M sodium chloride) for 12 h. Membrane bound RNA was subjected to UV at 120,000 microjoules/cm 2 using a Stratalinker UV crosslinker 1800 (Stratagene, La Jolla, CA) and pre hybridized in Digoxigenin (DIG) Easy Hyb solution (Roche, Indianapolis, IN, USA) for 1 h and hybridization overnight at 42°C with DIG-labeled probes specific to each R gene.

Cloning of TOM1 gene homolog from cassava
A protein homologous to tobamovirus multiplication protein 1 (TOM1) designated Manes.16G009700.1 was identified in the same genomic region harboring a cluster of NBS-LRR genes that were homologous to BAC29 R genes. Primers TOM1-1 AGAGAATGACCAGAATGCCAGTGC and TOM1-2 TTACCGAATAGGGTGATATTGCGCCG were used to amplify 850 bp transcript of the gene from cassava genotypes TME3, TME204, 60444 and Ebwanatereka. The RT-PCR products were cloned into Zero Blunt TOPO vector and transformed into one shot TOPO ten competent cells. Colonies were screened and positive transformants were sequenced. Sequence contigs were assembled using SeqMan Pro and aligned with MegAlign Pro of DNASTAR Lasergene 12.2 (DNASTAR, USA).

Analysis of CMD2 locus flanking markers in different cassava genotypes
Cassava genotypes possessing resistance or susceptibility to CMD were evaluated for the presence of markers flanking CMD2 and CMD3 resistance loci. Analysis of CMD2 flanking markers revealed presence of two RME1 marker alleles with the upper band corresponding to 740 bp while the shorter fragment was about 500 bp in size ( Figure 1A). Two alleles of RME1 were identified in cassava genotypes TME117, Ebwanatereka (both CMD susceptible) and in TMS98/0505 (CMD resistant). One allele of RME1 corresponding to 740 nt was observed in cassava genotypes TME3, TME204, Oko-iyawo, (all CMD resistant), TME7 and 60444 (both CMD susceptible). On the other hand, genotypes conferring quantitative CMD resistance mediated by CMD1 (TMS30572) and CMD3 (TMS97/2205) respectively contained only the shorter, 500 bp sized fragment ( Figure 1A). Sequence analysis of the two RME1 alleles revealed that RME1 sequences encode for a partial nucleotide binding motif of a disease resistance gene. Homology match of RME1 sequences on AM560-2 genome revealed imperfect matches, an indication of highly repetitive sequence ( Figure 1A).
The SSR markers flanking the CMD2 and CMD3 loci were found to be present in all cassava genotypes ( Figures 1A and B). However, analysis of sequences using single stranded conformation polymorphism (SSCP) showed differences across the genotypes ( Figure  1B). Sequence analysis of SSRY28 derived from TME3 and 60444 respectively revealed a 12 nucleotide deletion in TME3 starting from position 114. The position of CMD2 flanking markers was identified as follows: SSRY28 on chromosome12:7541033 -7541198, NS169 on chromosome12:7731640 -7731970, NS158 on chromosome12:7731640 -7731816 whereas CMD3 flanking marker NS198 was located on chromosome12:1353345 -1353192 of cassava genome v6.1 (Table 1). Based on the positions of the markers flanking CMD2 and CMD3 loci, the CMD3 locus is located upstream of the CMD2 locus in the same cassava genomic region.

High-throughput TME3-BAC pools library hybridization with CMD2 marker probes
The four cassava genetic markers RME1, SSRY28, NS169 and NS158 were PCR amplified from cassava genotype TME3, cloned into zero blunt topo and confirmed with EcoRI restriction digestion ( Figure 2A). Each marker probe was independently hybridized to two replicates of high density colony filters containing 70,000 TME3 BAC clones and images recorded by phosphor screens and read by a Typhoon 9400 imager ( Figure 2B). Eighteen BAC clones hybridized with markers NS169 and NS158 respectively while 23 clones were positive with SSRY28 and 89 with RME1 (Table 2). Cross hybridization was detected only for SSRY markers whereby 23 BACs positively hybridized with atleast two marker probes (Supplementary Table 2).

Amplification of CMD2 markers from candidate BAC clones using PCR
The 23 BAC clones identified to carry more than one marker were subjected to PCR analysis using primers specific to each marker (Figure 3). Ten clones were positively identified with RME1, two clones with SSRY28, six clones with NS158 ( Figure 3B and C). None of the clones were positive with NS169 ( Figure 3D). The two strategies of BAC screening gave conflicting results, whereby some BAC clones that positively hybridized with marker probe were scored as negative using PCR  Table 2). The discrepancy was higher in SSR markers when compared with SCAR marker (RME1). The discrepancy between the two assays may be attributable to contamination of clones during transfer or perhaps non-specific hybridization.
Eleven BAC clones that constructed around BAC33a and SBAC33b as reported by CIAT (2007) were procured from CUGI and further screened with CMD2 mapping markers. PCR analysis of RME1 marker identified a fragment corresponding to 740 bp from five clones (29E13, 70C06, 145G15, 34L16 and 17N21) indicating the presence of RME1 marker ( Figure 3E).
Thirteen clones derived from contigs constructed on the bases of mapping marker RME1 and BAC33b (CIAT, 2007) were screened for the presence of CMD2 locus flanking markers by PCR (Supplementary Figure 1). The RME1 marker was detected in BAC23 and BAC32, respectively, while marker NS198 was detected in BAC12 and BAC22 (Supplementary Figure 1 PCR) indicating they were derived upstream of RME1. However, none of the thirteen BACs contained markers NS158, NS169, while SSRY28 mapped downstream of the CMD2 locus ( Supplementary Figure 1 PCR). As a result, information pertaining to the position of marker BAC33b or the marker specific primer sequences are not clear and the contigs may therefore have been constructed upstream

Mapping of TME3 BAC sequences to the cassava genome
Ten clones identified using hybridization and confirmed using PCR, were fully sequenced (Table 3). Sequences derived from seven BACs met the quality control threshold, allowing further analysis to be performed. The longest contig assembled was 93474 bp in size derived from BAC 52H02, whereas the majority of contigs from other BACs ranged from 5200 to 42000 bp (Table 3). At the chromosomal scale, five of the seven BAC clones were mapped to chromosome XII with two BACs mapping on chromosome II and XVII of cassava genome version 6.1 respectively (Table 3). All five BAC clones identified on chromosome XII were all positive for the presence of the RME1 marker, and based on chromosomal coordinates they were all constructed from the same genomic region (Table 3). However, none of the BACs sequences were homologous to any of the SSR maker sequences (Table 3). Therefore, it was not possible to construct a contiguous sequences spanning from the RME1 marker to SSR markers SSRY28, NS158, and NS169.

Sequencing of BAC clones constructed from the region around CMD2 locus, from markers RME1 marker and BAC33b
Among the thirteen BAC clones previously constructed as reported by CIAT (2007), only one BAC, referred to as BAC29, contained opening reading frames (ORFs) for R genes (Figure 4). This entire BAC was assembled to 100 kb and annotated (Figure 4). Analysis of the BAC29 sequences revealed three tandem repeats of putative Rgenes (CMD2a, CMD2b and CMD2c) and two partial Rgenes (CMD2d and CMD2e) ORFs in opposite direction (Figure 4). The three NBS-LRR encoding genes identified from TME3 BAC29 were 3.5 kb long with 93-98% nucleotide identity to each other. The truncated R genes were missing two motifs, whereby CMD2d covered only 636 amino acid sequences. While CMD2e was interrupted by a repetitive sequence insertion and

Sequence analysis of R genes isolated from different cassava genotypes
The full length and truncated R genes were identified in cassava genotypes; TME3, Oko-iyawo, TME7, TME204, TMS30001, TMS30572, TMS97/2205, TMS98/0505, 60444 and Ebwanatereka using PCR and sequencing ( Figure 5A). Alignment of contig sequences revealed that the R genes were highly conserved in different cassava genotypes and were clustered into five gene families corresponding to the BAC29 R genes ( Figure 5B). However, there were divergent sequences in CMD2A that Table 3. TME3 BAC clone sequences mapped to AM560-5 genome version 6.1. showed 100% nucleotide identities with CMD2B and CMD2C ( Figure 5B).

Northern blot analysis of putative CMD2 genes in cassava
Expression of CMD2A, CMD2B and CMD2C in cassava genotypes TME3, Oko iyawo, TME7, TME204, TMS30001, TMS30572, TMS97/2205, TMS98/0505, 60444 and Ebwanatereka was determined from 10 µg of cassava total RNA extracted from leaves of greenhouse grown plants. Based on the intensity of the signal on Northern blot, the expression of CMD2A, CMD2B and CMD2C was uniform across all cassava cultivars regardless of their CMD resistant status ( Figure 6). This result indicates CMD2A, CMD2B and CMD2C genes are constitutively upregulated in all cassava genotypes investigated in this study.

Transient expression of GUS driven by putative CMD2 promoter
The ability of the promoter of CMD2A, CMD2B and CMD2C to drive expression of the GUS visual marker gene in N. benthamiana was studied. GUS staining was observed tobacco leaves under control of by the putative native CMD2 promoter and 35S promoters (Figure 7) and no difference was observed in the intensity of GUS staining between these two chimeric expression cassettes. GUS was also transiently expressed by putative CMD2 promoters derived from cassava genotypes TME3 and 60444. Based on this result and Northern blot analysis (Figure 6), it can be concluded that the isolated R genes are actively transcribed in different cassava genotypes regardless of their CMD resistant status.

Tobamovirus multiplication protein 1 homolog in cassava
In addition to the disease resistant gene cluster identified (Supplementary Figure 2), a tobamovirus multiplication protein 1 (TOM1) homolog was identified in the same genomic region (  (Figure 8). The TOM1 family comprises vacuolar membrane proteins that act to inhibit replication of tobamoviruses in tomato when present as mutated alleles (Ishibashi et al., 2010;Yamanaka et al., 2002). Presence of such mutations in CMD resistant or CMD susceptible cassava genotype may be could be implicated in antiviral defense. However, sequence analysis revealed 100% nucleotide identities of TOM1 gene homolog in cassava genotypes TME3, TME204 (both CMD resistant), 60444 and Ebwanatereka (both CMD susceptible) (Figure 8). Based on high sequence similarity of TOM1 homologs in CMD susceptible and resistant cassava genotypes, their involvement in antiviral defense would seem to be unlikely.

DISCUSSION
In this study, molecular markers associated with CMD resistance as reported by Okogbenin et al. (2012) and Rabbi et al. (2014) were identified in all genotypes irrespective of their response to CMD infection. These results agree with Asare et al. (2014) who detected CMD2 flanking markers in both CMD susceptible and resistant genotypes, and contradicts Bi et al. (2010) who reported absence of these markers in the susceptible genotype 60444. Studies by Parkes et al. (2015), reported absence of all marker alleles associated with CMD2 resistance in 18% of F1 progenies derived from a cross between TME11 (CMD2 type) and Dabodabo (local cultivar), despite being resistant to CMD, indicating a different source of CMD resistance. Alternatively, recombination may have occurred in subsequent crossing delinking the marker from the gene, indicating  poor linkage between the markers and the CMD2 resistance mechanism. In cassava genotype TMS97/2205, it was shown here that the RME1 marker was not detected ( Figure 1A) as previously reported by Okogbenin et al. (2012). TMS97/2205 has a pedigree of TME6 (CMD2) and TMS30572 (CMD1) (Okogbenin et al., 2012;Dixon et al., 2010) and shows very high resistant to all CMGs under field (Okogbenin et al., 2012) and greenhouse conditions (unpublished data). It is therefore important to stress that data generated in the present study strongly suggests that the CMD2 markers are not close enough to the gene to be reliably used as a fully diagnostic tools for the presence of functional CMD2 resistance. Based on this result, the presence or absence of the previously described marker alleles may not give conclusive evidence of the presence of gene(s) conferring functional resistance to CMD and, importantly, therefore not completely reliable for screening materials during early stages of breeding programs.
The SCAR marker (RME1) and SSR markers (NS158 and NS169) are located 4.5 CentiMorgans (cM) and 7.1 cM from CMD2 locus (Parkes et al., 2015;Lokko et al., 2005;Akano et al., 2002). However, phenotypic data for these markers in F1 mapping population has not been reported (Okogbenin et al., 2013). Sixty eight percent of the total phenotypic variation of CMD2 is explained by SSRY28 at a distant of 13 cM from the CMD2 locus (Lokko et al., 2005;Akano et al., 2002), whereas NS198, that is reportedly linked with the CMD3 locus, explains 11% of variation in CMD resistance in F1 segregating populations (Okogbenin et al., 2012). One would expect that CMD resistant genotypes would contain 100% marker-trait association. However, CMD resistant loci have been described using eight molecular markers (Rabbi et al., 2014;Okogbenin et al., 2012;Lokko et al., 2005;Akano et al., 2002) that are sparsely distributed, making it impracticable to precisely associate flanking markers with CMD resistant status. High quality genome assembly and dense genetic maps are essential tools for mapping to allow identification of loci associated with traits (ICGMC, 2015; Gaur et al., 2015;Zhang et al., 2015). In soybean, for example, a high density genetic map saturated with 2500 molecular markers were developed to precisely map QTLs conferring resistance to Fusarium graminearum on 300 kb region of chromosome 8 (Acharya et al., 2015). Likewise, Zhang et al. (2015) developed linkage map containing 8007 markers to identify weeping trait in an ornamental woody plant Prunus mume. Such quality density maps are not be available for cassava. SSR markers flanking the CMD2 and CMD3 loci, respectively were identified on chromosome 12 of cassava reference genome (AM560-2) (http://phytozome.jgi.doe.gov). SSR markers NS169 and NS158 sequences are both derived from the intron of the cassava gene designated Manes.12G074900.1 annotated to code for SWItch/Sucrose Non-Fermentable (SWI/SNF). This is part of an ATP-dependent gene family found in both eukaryotes and prokaryotes that plays a central role in DNA packaging (Hargreaves and Crabtree, 2011). Conversely, marker SSRY28 is derived from cassava transcript Manes.12G074000.1 that codes for PREPHENATE DEHYDRATASE (P PROTEIN) involved in phenylalanine and tyrosine biosynthesis (Yoo et al., 2013;Maeda et al., 2011). CMD3 flanking marker NS198 is derived from cassava transcript Manes.12G016800.1 coding for protein phosphatase 2A (PP2A) involved in cell cycle control (Richard and Elder, 2005). The RME1 marker was shown to code for the partial motif of a disease resistance gene and mapped to several genomic regions. All three genes carrying sequences of SSR markers are key in the regulation of plant growth and development. As a result, the markers developed from such genes may not be sufficiently polymorphic to allow distinction between CMD susceptible and resistant genotypes. Single-strand conformation polymorphism (SSCP) of SSR markers on PAGE gel ( Figure 1B) showed potential nucleotide differences in diverse cassava genotypes. Sequence analysis of SSRY28 revealed four nucleotide deletion at nucleotide position 114 in CMD resistant genotype TME3 but not in CMD susceptible genotype 60444.
In plants, it has been demonstrated that disease resistant genes are mostly located as clusters of homologous or heterologous sequences, and in some cases occur as both (Leister, 2004). Global analysis of the BAC clone sequences for the presence of disease resistance genes revealed NBS domains in each of the five BAC clones containing the RME1 fragment that mapped to Chromosome 12 (Table 3). For BAC29, a tandem repeat of NBS-LRR were mapped on chromosome 16 of cassava reference genome version 6. In the same chromosomal region, a cluster of 14 NBS-LRR genes were identified (Supplementary Figure 2). These NBS domains were homologous to NBS described in cassava genotypes TME3, TME117 and wild relative M. euprinosa and M. brachyandra (Gedil et al., 2012). The presence of numerous R genes in BAC29 corroborate with previous reports that indicate that R genes are usually present as clusters resulting from evolutionary pressure to counter changes in pathogen virulence proteins . In the same locus, the TOM1 homolog was identified, indicating a genomic region with high density of defense-related genes. Clusters of genes effective against bacteria, fungus, viruses and oomycetes have been mapped within the same vicinity on soybean chromosome 13 Innes et al., 2008;Sandhu et al., 2005) and in chromosome 1 and chromosome 2 in lettuce (Christopoulou et al., 2015;McHale et al., 2009).
The number of disease resistant genes present in plant genomes varies greatly across species. For example, 143 R genes have been characterized in coconut (Puch-Hau et al., 2015) whereas cassava proteome has been reported to encode 28 Toll/interleukin-1 receptor Nucleotide-binding domain shared by apoptotic protease activating factor 1 (TIR-NB-ARC-LRR), 177 non-TIR-NB-ARC-LRR putative proteins, two with TIR-LRR domains (Soto et al., 2015). Other studies have identified 228 NBS-LRR genes present in the cassava reference genome, grouped into two subfamilies; 181 CC-domaincontaining (CNL) and 47 TIR-domain-containing (TNL) (Lozano et al., 2015). A diverse cluster of 99 genes highly homologous to full-sized NBS-LRR genes have been identified in the reference genome encoding none, or only a small part, of the NBS domain (Lozano et al., 2015). The full length RME1 marker sequences are homologous to cassava genes encoding for NB-ARC (Figure 1). It can therefore be hypothesized that the RME1 marker is homologous to many endogenous disease resistant genes in the cassava genome while the second RME1 allele (shorter 500 nt) would fit the truncated version of NB-ARC gene.
TME3 BAC library hybridization with RME1 fragment revealed 89 BAC clones positive for RME1 compared with 41 BAC clones for the three SSR markers, confirming a high number of R genes in cassava ( Table  2). The RME1 full length sequence hits several places in cassava genome indicating that it is a highly repetitive sequence. This concurs with the fact that NBS-LRR are encoded by one of the largest gene families in plants carrying highly conserved NBS motifs (McHale et al., 2006). NBS-LRR motifs are in general conserved across many genes with pathogen resistance function (Hammond-Kosack and Parker, 2003;Glowacki et al., 2011).
Three full length and two truncated putative R genes occurring as tandem repeats were found in BAC29 sequences ( Figure 4). This is in agreement with the recent report by Soto et al. (2015) who described the presence of clusters of full length and shorter versions of genes encoding NB-ARC-LRR in the cassava reference genome. Annotation of BAC29 and five other BAC clones mapping to the CMD2 locus genomic region and carrying RME1 marker sequences, revealed that putative R genes were localized in genomic region rich in transposable elements indicating a region undergoing fast evolution. This r su t fits with th mo of " rms-r " b tw pathogens and disease resistant genes that drives selective pressure to evolve R genes with specificity to new pathogen virulence proteins (Christopoulou et al., 2015). The truncated R genes identified in BAC29 showed a large deletion at three prime end (CMD2D) and a large insertion (7 kb) on CMD2e at nucleotide position 2237, indicating potential pseudogenes. Large deletions and insertions leading to frameshifts have been associated with condensed mRNAs and proteins. In most cases, disease resistant genes coding for partial NBS domain are regarded as pseudogenes (Luo et al., 2012). Truncated NBS-LRR with non-coding capacity have been located in genomic regions as clusters adjacent full length functional NBS-LRR in Lotus japonicus, Medicago truncatula and potato Lozano et al., 2012;Ameline-Torregrosa et al., 2008).
Cassava disease resistance genes have not been widely reported. Sequence analysis of R genes from different cassava genotypes revealed a high percentage of nucleotide identity indicating high levels of conservation ( Figure 5B). Importantly, however, this was without association with presence or absence of functional CMD resistance loci. Resistance to CMD is governed by: single dominant gene(s) found within the CMD2 locus (Rabbi et al., 2014;Akano et al., 2002), multigenic recessive resistance harbored by the CMD1 loci (Fregene and Puonti-Kaerlas, 2002) or synergism between CMD2 and an additional locus referred to as CMD3 (Okogbenin et al., 2012). Therefore, the identified NBS-LRR may not be involved in CMD resistance. Biolistic inoculation of cassava genotypes with cassavainfecting geminiviruses indicates that CMD resistance lacks the typical R gene mediated hypersensitive response (Patil and Fauquet, 2015). Studies by Louis and Rey (2015) identified resistance gene analogs (RGAs) encoding for resistance protein analogs (RPAs) that were uniquely overexpressed in TME3 recovering from South African cassava mosaic virus (SACMV) infection. However, none of the identified RGAs in TME3 localized with the CMD2 locus.
The cassava reference genome version 6.1 contains NBS-LRR clusters on chromosome 16 that are homologous to BAC29 and cassava isolated R genes. Northern blot analysis performed here indicated constitutive expression of BAC29 derived R gene in all diverse cassava genotypes ( Figure 6). This may indicate their involvement in plant developmental functions while their role of antiviral defense remains to be demonstrated.
Through BAC end sequencing, 13 BAC clone libraries were constructed spanning regions around the CMD2 locus using the RME1 and BAC33b markers (CIAT, 2007). Full length BAC clone sequencing and assembly enabled construction of contigs of 100 kb in two independent clones while the shortest BAC contig assembled was 4.0 kb. However, these results indicated that some of the clones, for example BAC29, 52H02 and 52H13 were mapping to different genomic regions that have not been associated with CMD2 resistance. Approximately, 60% of reads derived from BACs such as 003H20, BAC33, BAC31 and BAC38 mapped to the E. coli genome, with the remaining 40% either not homologous to any cassava genomic region, or assembled poorly into a few short 1 -2 kb contigs scattered throughout the genome. Other BACs contained sequence reads that did not map to cloning vector PIndigoBac536 BAC and had no hits within the cassava genome. Additional constrains encountered in using the TME3 BAC clone library were that detailed description of this library, such as position on physical map of cassava, BAC end sequences, and maps to physical clones in the freezer collection seems to be missing. These discrepancies observed in TME3 BACs studies here may have resulted from a mix-up of clones during library construction or processing. Another possibility that cannot be ruled out is low quality of the TME3 BAC libraries. Hybridization of markers to entire TME3 BAC libraries only provided clones originating from the marker region and did not form contiguous sequences that traversed through the CMD2 genomic region. Presence of E. coli sequences would also indicate contamination and poor quality of TME3 BAC clones library. This study therefore raises serious questions regarding the reliability of this TME3 BACs collection and the technical challenges of using it for identification of sequences such as those coding for CMD2 gene(s).

Conclusion
In this study, a BAC library was used in an attempt to study and identify genes residing within the CMD2 locus. CMD2 locus was mapped with four markers that did not show polymorphism in different cassava genotypes. Although, BAC libraries have been successfully utilized for map based cloning in other species, the present study was not able to identify clones spanning across the CMD2 locus, despite utilizing contig maps and markers previously reported. Sequences from some of the libraries reported in CMD2 region were found to be located in different chromosomal regions, certainly TME3 BAC library appears to be of insufficient quality at the CMD2 locus to be useful for identifying genes involved in this CMD resistance mechanism. It is likely that it would also not be a functional useful tool for discovery of other genes of importance in cassava. With regards to CMD, the situation is further compounded by the fact that the Cassava genome V6.1 contains gaps in CMD2 region making it difficult to design probes for screening potential BACs for chromosomal walking. Efforts should therefore focus on developing polymorphic molecular markers closely linked with the CMD2 locus that can easily be utilized to screen for CMD resistance. In addition, there is need for whole genome sequencing of CMD2 type cassava to identify sequences coding for genes harbored in CMD2 region. Transcriptome data should also be generated from CMD2 cassava genotypes to unravel the mode of action of CMD2 gene(s) and functional validation was performed through gene knock down or overexpression studies. New BAC libraries should be created from the same DNA source and be highly valuable is allowing accurate cloning of the gene(s) involved in CMD2 resistance.