Genetic relationship and the occurrence of multiple gene resistance to coffee berry disease (Colletotrichum kahawae, Waller & Bridge) within selected Coffea arabica varieties in Kenya

1 Kenya Agricultural and Livestock Research Organization (KALRO) Coffee Research Institute, P. O. Box 4-00232, Ruiru, Kenya. 2 Department of Agricultural Science and Technology, School of Agriculture and Enterprise Development, Kenyatta University, P.O Box 43844 00100, Nairobi, Kenya. 3 Kenya Agricultural and Livestock Research Organization (KALRO) Sugar Research Institute, P. O. Box 4440100, Kisumu, Kenya.

*Corresponding author. Email: jgimase@yahoo.com.
Author(s) agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.

INTRODUCTION
Coffee belongs to the family Rubiaceae and the genus Coffea, in which over 124 species have been characterized (Davis et al., 2011). Despite the diversity of this genus, only two species, Coffea arabica L. and Coffea canephora P., are of economic importance, contributing about 70 and 30% of the total world market share, respectively (Setotaw et al., 2020). C. arabica is an allotetraploid species (2n = 4x = 44) that exhibits diploid-like meiotic behavior. It is the only tetraploid species of the Coffea genus; the rest of the species are diploid (Spiniso-Castillo et al., 2020). C. arabica is believed to have arisen from spontaneous hybridization between two diploid species, C. canephora and Coffea eugenioides (Lashermes et al., 1999, 2011). The species is autogamous, with about 10% out-crossing (Bikila et al., 2017). In contrast, C. canephora is diploid (2n = 2x = 22), highly diverse (Bertrand et al., 2003) and resistant to common diseases, and is thus a good source of genes for disease resistance (Ky et al., 2001).
Next-generation sequencing (NGS) technologies, such as genotyping-by-sequencing (GBS) and Diversity Arrays Technology sequencing (DArTseq), provide markers that are widely used in genome-wide analysis (Spiniso-Castillo et al., 2020). GBS is more informative than predesigned single nucleotide polymorphism (SNP) arrays, especially for wild germplasm, because it is unbiased and provides information on rare alleles, whereas DArTseq is based on complexity reduction, using restriction enzymes that target the gene-coding regions of the genome (Pailles et al., 2017). The restriction enzymes separate low-copy sequences, which are the more informative for marker discovery in breeding, from the repetitive regions of the genome (Courtois et al., 2013; Pailles et al., 2017).
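The complexity-reduction idea can be illustrated with a small in-silico double digest. This is a minimal sketch, not the DArTseq protocol: it assumes a PstI/MseI enzyme pair (a combination commonly used with DArTseq, but not named in this study), ignores methylation sensitivity, and treats the size window (`min_len`, `max_len`) as a hypothetical parameter. Only fragments with one PstI end and one MseI end are kept, which is how a whole genome is reduced to a smaller, reproducible subset of loci.

```python
import re

# Assumed enzyme pair (not stated in the source): name -> (recognition site,
# cut offset within the site). PstI cuts CTGCA^G, MseI cuts T^TAA.
ENZYMES = {"PstI": ("CTGCAG", 5), "MseI": ("TTAA", 1)}


def cut_sites(seq):
    """Return sorted (cut_position, enzyme) pairs for every recognition site."""
    sites = []
    for name, (motif, offset) in ENZYMES.items():
        # Zero-width lookahead so overlapping sites are all reported.
        for match in re.finditer(f"(?={motif})", seq):
            sites.append((match.start() + offset, name))
    return sorted(sites)


def pst_mse_fragments(seq, min_len=50, max_len=500):
    """(start, end) of fragments with one PstI end and one MseI end in the size window."""
    sites = cut_sites(seq.upper())
    kept = []
    for (start, left), (end, right) in zip(sites, sites[1:]):
        if {left, right} == {"PstI", "MseI"} and min_len <= end - start <= max_len:
            kept.append((start, end))
    return kept


genome = "CTGCAG" + "A" * 60 + "TTAA" + "C" * 20 + "TTAA"
print(pst_mse_fragments(genome))  # [(5, 67)]
```

In the toy sequence the MseI-MseI fragment is discarded, leaving a single PstI-MseI fragment of 62 bp.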
Genetic variation is controlled by the segregation of multiple genes, and the variance contributed by each locus is too small to be investigated individually; hence the need to analyze sets of loci (Bikila et al., 2017). DArTseq-based SNP markers have been used successfully to determine genetic relationships within the genus Coffea by Garavito et al. (2016) and Spiniso-Castillo et al. (2020).
Coffee berry disease (CBD), caused by the specialized hemibiotrophic fungal pathogen Colletotrichum kahawae (Waller & Bridge) (Waller et al., 1993), is a key constraint on Arabica coffee production in Africa (Hindorf and Omondi, 2011; Van Der Vossen et al., 2015; Diniz et al., 2017; Vieira et al., 2019). CBD epidemics can destroy 50 to 80% (Van Der Vossen and Walyaro, 2009; Diniz et al., 2017), and at times up to 100% (Giddisa, 2016), of the developing berries 4 to 16 weeks after anthesis on a susceptible variety when no control measures are applied. Controlling CBD with intensive fungicide spray programs (8 to 12 rounds per year) increases the cost of production by up to 40% (Van Der Vossen and Walyaro, 2009). These chemicals also contribute to environmental pollution (Gichuru et al., 2008), so resistant varieties are the most cost-effective and environmentally friendly approach to CBD management. Resistance to CBD is governed by three genes, carried by the varieties Rume Sudan (R gene), HDT (T gene) and K7 (k gene); R and T are dominant while k is recessive (Van Der Vossen and Walyaro, 1980). The breeding program for resistance to CBD in Kenya started in 1971, with the main goal of developing cultivars that combine resistance to CBD, high production, good beverage quality and compact growth amenable to high-density planting (Van Der Vossen and Walyaro, 1980). Using conventional approaches, genes for resistance to CBD were introduced into susceptible C. arabica varieties by crossing them with donor varieties and backcrossing to the standard varieties to restore desirable attributes (Walyaro, 1983). However, this approach takes a long time to develop a coffee variety because of the long juvenile period of the Coffea genus (Moncada et al., 2016). The low genetic diversity of C. arabica also hinders the identification and selection of superior genotypes by traditional breeding methods.
To overcome these constraints, molecular markers have been used as supporting tools to discriminate genotypes accurately and accelerate coffee breeding programs. Gichuru et al. (2008) identified a DNA marker for the T gene (popularly designated Ck-1), linked to the simple sequence repeat (SSR) primer locus Sat 235. This marker was validated by Alkimim et al. (2017), who confirmed that the Sat 235 marker co-segregates with the T gene. A recent study by Gimase et al. (2020a) identified a putative DNA marker for the R gene using single nucleotide polymorphism (SNP) markers.
The CBD-resistant cultivar R11 is an F1 hybrid derived from a cross between a specific female and a specific male population (Omondi et al., 2001), while Batian is a pure line selected from the R11 pollen parents (Omondi et al., 2001) after several generations of selfing to fix the CBD resistance genes (Gichimu et al., 2014). SL 28 is a Bourbon-type single-tree selection that combines high yield, high quality and drought tolerance but is highly susceptible to CBD (Walyaro, 1983). The main objective of this study was to evaluate the genetic relationships within the selected C. arabica cultivars R11 and Batian, their resistance donor parents HDT and Rume Sudan (RS), and the recurrent parent SL 28 using DArTseq-based SNP markers, and to identify genotypes within R11 and Batian with multiple-gene resistance to CBD conferred by the T and R genes.

MATERIALS AND METHODS
This study was carried out at the Kenya Agricultural and Livestock Research Organization, Coffee Research Institute (KALRO-CRI) in Ruiru, Kenya. Ruiru lies in the upper midland agro-ecological zone 2 (UM2) at 1° 06'S and 36° 45'E, at an altitude of 1620 m above sea level. Rainfall is bimodal, totalling 1063 mm per annum, and the mean annual temperature is 19°C, ranging from 12.8 to 25.2°C (Jaetzold et al., 2006).
The study material comprised 91 genotypes: 61 crosses of the variety R11, 27 families of the variety Batian, and the varieties SL 28, HDT and Rume Sudan.

Sample collection, genomic DNA extraction and genotyping of SNP markers
Fresh leaves were randomly picked from each of the 91 coffee genotypes, kept in cool boxes and taken to the laboratory for DNA extraction. Samples were collected with the LGC Genomics plant sample collection kit (www.lgcgenomics.com): six leaf disks were cut and placed in each strip of the 96-deep-well sample plate, which was sent to the Integrated Genotyping Service and Support (IGSS) platform (https://ordering.igssafrica.org/cgibin/order/login.pl) for DNA extraction and genotyping. Genomic DNA was extracted using the standard cetyltrimethylammonium bromide (CTAB) protocol of Doyle and Doyle (1987). DNA quality and quantity were evaluated by 0.8% agarose gel electrophoresis, and the DNA concentration was adjusted to 50 ng/μL. The genomic DNA samples were then sent to Diversity Arrays Technology (DArT) Pty Ltd., Canberra, Australia (http://www.diversityarrays.com) for sequencing and SNP marker identification. GBS was performed following the standard protocol described by Elshire et al. (2011), and next-generation sequencing was carried out on the Illumina HiSeq 2500 platform.
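Adjusting every sample to 50 ng/μL is a C1V1 = C2V2 dilution. The helper below is a minimal sketch of that arithmetic; the 100 μL final volume and the example stock concentration are hypothetical, since the source reports only the target concentration.

```python
def normalization_volumes(stock_ng_per_ul, target_ng_per_ul=50.0, final_ul=100.0):
    """Volumes (DNA stock, diluent) in microlitres from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        # Dilution cannot raise concentration; such samples would need
        # re-extraction or concentration instead.
        raise ValueError("stock concentration is below the target")
    dna_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return dna_ul, final_ul - dna_ul


# Hypothetical sample quantified at 200 ng/uL, made up to 100 uL at 50 ng/uL.
print(normalization_volumes(200.0))  # (25.0, 75.0)
```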
SNP calling was carried out with the DArT-soft14 algorithm within the KDCompute pipeline developed by Diversity Arrays Technology (https://kdcompute.seqart.net/kdcompute/plugins). In the primary pipeline, the FASTQ files were first processed to filter out poor-quality sequences, so that the assignment of sequences to specific samples via the barcode split region was consistent and reliable (Nemli et al., 2017; Barilli et al., 2018). Identical sequences were collapsed into FASTQ call files, which were used in the secondary pipeline by DArT P/L's proprietary SNP calling algorithm (DArT-soft14) to process the sequence data (Barilli et al., 2018). Because an open-access genome assembly of allotetraploid C. arabica with reliable sorting of homoeologous sequences is not yet available (Scalabrin et al., 2020), the filtered sequence reads were aligned against the finer, publicly available diploid C. canephora genome (http://coffeegenome.org/coffeacanephora) as a reference, to identify SNP markers in the C. arabica genome (Sant'Anna et al., 2018) and determine their genomic positions.
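The two primary-pipeline steps named above, barcode-based quality filtering and collapsing identical sequences, can be sketched as follows. This is an illustration only: DArT's pipeline is proprietary, and the Phred+33 encoding, the rule that every barcode base must reach Q30, and the `barcodes` mapping are assumptions made for the example.

```python
from collections import Counter, defaultdict


def phred(quality, offset=33):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - offset for ch in quality]


def collapse_reads(fastq_lines, barcodes, min_barcode_q=30):
    """Demultiplex reads, drop those with an unreliable barcode, collapse identical tags.

    fastq_lines: FASTQ text lines, four per record.
    barcodes: hypothetical mapping of barcode sequence -> sample name.
    Returns {sample: Counter({tag_sequence: read_count})}.
    """
    lines = [line.rstrip("\n") for line in fastq_lines]
    tags = defaultdict(Counter)
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                # Every barcode base must reach the threshold, so that the
                # read is assigned to its sample reliably.
                if min(phred(qual[:len(barcode)])) >= min_barcode_q:
                    tags[sample][seq[len(barcode):]] += 1
                break  # first matching barcode wins
    return dict(tags)


fastq = [
    "@r1", "ACGTTTGCA", "+", "IIIIIIIII",
    "@r2", "ACGTTTGCA", "+", "IIIIIIIII",
    "@r3", "ACGTTTGCA", "+", "!!!!IIIII",  # barcode quality 0: discarded
]
print(collapse_reads(fastq, {"ACGT": "sample_1"}))
# {'sample_1': Counter({'TTGCA': 2})}
```

Collapsing to counts means each distinct tag is processed once downstream, while the read depth is kept for later SNP calling.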

SNP marker quality analysis and evaluation of genome-wide relationships
SNP loci with more than 30% missing data, rare SNPs with a minor allele frequency (MAF) below 5% and loci with heterozygosity (Ho) above 90% were removed (Garot et al., 2018). Genetic relationships among the study genotypes were determined by principal component analysis (PCA) and hierarchical clustering, both implemented within the clustering analysis component of the KDCompute plugins system (https://kdcompute.seqart.net/kdcompute/plugins).
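The three filtering thresholds can be expressed directly in code. This sketch assumes genotypes coded as alternative-allele dosage (0, 1, 2, with `None` for missing), a diploid-like coding consistent with the disomic behaviour of C. arabica; KDCompute's own implementation may compute these statistics differently.

```python
def locus_stats(calls):
    """Missing rate, minor allele frequency (MAF) and observed heterozygosity (Ho).

    calls: one genotype per individual as alternative-allele dosage
    (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative),
    or None for a missing call.
    """
    observed = [c for c in calls if c is not None]
    missing = 1 - len(observed) / len(calls)
    if not observed:
        return missing, 0.0, 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    ho = sum(1 for c in observed if c == 1) / len(observed)
    return missing, maf, ho


def passes_qc(calls, max_missing=0.30, min_maf=0.05, max_ho=0.90):
    """Keep a locus unless it has >30% missing data, MAF < 5% or Ho > 90%."""
    missing, maf, ho = locus_stats(calls)
    return missing <= max_missing and maf >= min_maf and ho <= max_ho


print(passes_qc([0, 0, 1, 2, 2, None]))  # True: ~17% missing, MAF 0.5, Ho 0.2
print(passes_qc([0] * 20))               # False: monomorphic, MAF 0
```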

Genomic DNA extraction, amplification, and electrophoresis using SSR primer locus Sat 235
A total of 59 genotypes were analyzed: 27 R11 crosses, 27 Batian families and five controls (HDT, Robusta, Rume Sudan, and the susceptible cultivars SL 28 and Caturra). Healthy leaves were picked and genomic DNA extracted following the method of Diniz et al. (2005), with minor modifications to the extraction buffers. About 500 mg of fresh leaves were ground and transferred to 2 mL Eppendorf tubes. After grinding, 1 mL of extraction solution was added, and the tubes were shaken vigorously for 5 min and immediately placed in a 65°C water bath for 40 min. The samples were then centrifuged for 5 min at 13000 rpm and the supernatant transferred to a new tube, to which 1 mL CIA (chloroform:isoamyl alcohol, 24:1) was added; the tubes were shaken for 10 min and centrifuged for 5 min at 12000 rpm. The supernatant was transferred to another tube, an equal volume of ice-cold isopropanol was added, and the mixture was kept at -20°C for 1 h.
The content was centrifuged at 1300 rpm for 5 min, the supernatant discarded and the pellet washed with 70% ethanol. This step was repeated twice and, after drying, the pellets were treated with 190 μL TE (Tris-EDTA buffer plus RNase 10 mg μL-1) for 30 min at 37°C and then for 5 min at 65°C. The DNA was then purified by adding 100 μL TE, 100 μL water, 100 μL 5 M NaCl and 100 μL 0.5 M EDTA. The samples were homogenized, incubated on ice for 30 min and centrifuged for 5 min at maximum speed, and isopropanol was added. After drying, the pellet for each genotype was dissolved in a volume of TE buffer appropriate to the amount of DNA quantified with a spectrophotometer, and stored at 4°C. DNA quality was determined by running the samples on a 1% agarose gel alongside a Lambda DNA standard of known concentration, for comparison and quantification.
The polymerase chain reaction (PCR) was carried out in a total volume of 25 μL containing 10 ng/μL genomic DNA template, 0.4 μM Sat 235 SSR primer, 75 μM of each dNTP, 2.5 μM MgCl2, PCR buffer 1x TBE [75 mM Tris-HCl; 0.5 Na2 EDTA (pH 8.0)], 20 mM boric acid and 1 unit of Taq DNA polymerase (Gene-On, Germany). Amplification was carried out in a Eurogene thermocycler (TECHNE, UK). The program consisted of one cycle of initial denaturation at 94°C for 5 min; 35 cycles of denaturation at 94°C for 30 s, primer annealing at 55°C for 30 s and elongation at 72°C for 1 min 30 s; and a final extension at 72°C for 10 min.
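The cycling program can be written down as data, which makes the run length easy to check. This sketch uses only the temperatures, hold times and cycle number given above; it ignores ramp times between temperatures, so an actual run on the thermocycler will take somewhat longer than the computed total.

```python
# (step, temperature in degrees C, hold time in seconds), as given in the text.
INITIAL = [("initial denaturation", 94, 5 * 60)]
CYCLE = [("denaturation", 94, 30), ("annealing", 55, 30), ("elongation", 72, 90)]
FINAL = [("final extension", 72, 10 * 60)]
N_CYCLES = 35


def expand_program(initial, cycle, n_cycles, final):
    """Flatten the program into the ordered list of steps the thermocycler runs."""
    return list(initial) + list(cycle) * n_cycles + list(final)


def total_hold_seconds(steps):
    """Sum of hold times only; temperature ramps are not modelled."""
    return sum(seconds for _step, _temp, seconds in steps)


steps = expand_program(INITIAL, CYCLE, N_CYCLES, FINAL)
print(len(steps), total_hold_seconds(steps) / 60)  # 107 102.5
```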
The Sat 235 amplification products were electrophoresed on a 2.3% (w/v) agarose gel in 1x TBE buffer and visualized on a UV trans-illuminator after staining in 60% ethidium bromide solution.

Confirmation of the occurrence of multiple gene resistance to CBD conferred by the T and R genes
The presence or absence of the T gene was determined from the amplified fragment by comparison with the standards HDT, Robusta and SL 28, while genotypes carrying the DNA marker for the R gene were identified by searching for the SNP marker sequences in the GBS-based DArTseq marker result files of the study genotypes.
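The scoring logic described above can be sketched as a few small functions. The band labels `"ck1"` and `"arabica"` are hypothetical placeholders for the actual Sat 235 fragment sizes, and the R-marker calls are assumed to use DArT's 1-row SNP coding (0 = homozygous reference, 1 = homozygous SNP allele, 2 = heterozygous). Which allele of the R marker is associated with resistance is not stated here, so treating the SNP allele as the resistance-linked one is also an assumption.

```python
# DArT "1-row" SNP coding (assumed): 0 = homozygous reference allele,
# 1 = homozygous SNP allele, 2 = heterozygous, None = missing call.
SNP_DOSE = {0: 0, 1: 2, 2: 1}


def score_ck1(bands):
    """Classify the Sat 235 banding pattern of one sample (codominant marker).

    bands: set of scored band labels. "ck1" stands for the introgressed,
    T-gene-linked fragment and "arabica" for the alternative fragment; both
    labels are hypothetical placeholders for the real fragment sizes.
    """
    has_ck1, has_alt = "ck1" in bands, "arabica" in bands
    if has_ck1 and has_alt:
        return "heterozygous"
    if has_ck1:
        return "homozygous"
    if has_alt:
        return "absent"
    return "no amplification"


def carries_r_marker(call):
    """True if the sample has at least one copy of the (assumed) resistance-linked SNP allele."""
    return call is not None and SNP_DOSE[call] > 0


def multiple_gene_resistance(ck1_bands, r_call):
    """Flag genotypes with marker evidence for both the T and the R gene."""
    has_t = score_ck1(ck1_bands) in ("homozygous", "heterozygous")
    return has_t and carries_r_marker(r_call)


print(score_ck1({"ck1", "arabica"}))         # heterozygous
print(multiple_gene_resistance({"ck1"}, 2))  # True
```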

RESULTS

Analysis of the SNP markers and evaluation of genome-wide relationships
DArTseq generated 2280 good-quality SNP markers (MAF > 5% and Ho < 90%), of which 1575 were aligned to the 11 chromosomes of the C. canephora reference genome; these markers were well distributed across the genome (Figure 1) and were therefore used in further analysis. The PCA showed that all the Batian genotypes from the three crosses clustered together, except two genotypes, CR8-155 and CR30-809, which deviated from this cluster (Figure 2) and were closely related to Rume Sudan. The Batian genotypes were closely related to the cultivar SL 28. About 60% of the R11 crosses (38) formed one main cluster, while the remaining 23 were spread evenly across the PCA plot. HDT showed no close relationship with any of the other genotypes in the study. PC1 was the most important component, accounting for 42% of the total variation.
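The share of variation captured by PC1 can be computed from a genotype dosage matrix. The sketch below is one way to do it, not the KDCompute implementation: it column-centres the matrix without scaling, assumes no missing calls, and finds the leading eigenvalue of the samples x samples Gram matrix by power iteration, so that PC1's share is that eigenvalue divided by the matrix trace.

```python
import random


def centred(matrix):
    """Column-centre a samples x markers dosage matrix (no missing values)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def pc1_share(matrix, iterations=200, seed=0):
    """Fraction of total (unscaled) variance captured by the first principal component."""
    x = centred(matrix)
    # Gram matrix G = X X^T: its non-zero eigenvalues equal those of X^T X,
    # and the shared 1/(n - 1) covariance factor cancels in the ratio below.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]
    total = sum(gram[i][i] for i in range(len(gram)))  # trace = total variance
    if total == 0:
        return 0.0
    rng = random.Random(seed)
    v = [rng.random() for _ in gram]
    for _ in range(iterations):
        w = [sum(g * vi for g, vi in zip(row, v)) for row in gram]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm == 0:
            return 0.0
        v = [wi / norm for wi in w]
    # Rayleigh quotient v^T G v of the converged unit vector = leading eigenvalue.
    gv = [sum(g * vi for g, vi in zip(row, v)) for row in gram]
    return sum(vi * gvi for vi, gvi in zip(v, gv)) / total


# Two independent, equally variable markers: PC1 explains half the variance.
print(round(pc1_share([[1, 0], [-1, 0], [0, 1], [0, -1]]), 3))  # 0.5
```

Working on the samples x samples matrix keeps the problem small when, as here, markers (1575) far outnumber genotypes (91).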
Consistent with the PCA, hierarchical clustering (Figure 3) grouped all the Batian genotypes into one cluster together with SL 28, except CR8-155 and CR30-809, which were closely related to Rume Sudan. Thirty-eight R11 crosses formed one major cluster, while the remaining 23 formed four smaller clusters. HDT did not cluster with any of the study genotypes, while Rume Sudan was closer to Batian and to three R11 genotypes (R11-6, 22 and 195). The dendrogram also showed that the R11 crosses, the Batian families from the three crosses, RS and SL 28 all had low genetic diversity, of about 10% or less, while HDT had a relatively high diversity of about 20%.
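Hierarchical clustering of this kind can be sketched with average linkage (UPGMA) on a simple allele-sharing distance. The source does not state which distance or linkage method KDCompute used, so both choices here are assumptions; the merge heights returned are the distances at which branches join in the dendrogram.

```python
from itertools import combinations


def allele_distance(a, b):
    """Mean per-locus allele difference between two dosage vectors, on a 0-1 scale."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def distance_table(matrix):
    """Pairwise distances keyed by frozenset({i, j}) of row indices."""
    return {frozenset((i, j)): allele_distance(matrix[i], matrix[j])
            for i, j in combinations(range(len(matrix)), 2)}


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns (nested-tuple tree, merge heights)."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = dict(dist)
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        heights.append(d.pop(pair))
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            # Size-weighted mean distance from the merged cluster to cluster k.
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, heights


genotypes = [[0, 0, 0, 0], [0, 0, 0, 1], [2, 2, 2, 2]]
tree, heights = upgma(["A", "B", "C"], distance_table(genotypes))
print(tree, heights)  # ('C', ('A', 'B')) [0.125, 0.9375]
```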

Occurrence of the T (Ck-1) gene in the varieties R11 and Batian
The amplification products showed that all 27 R11 crosses evaluated carry the DNA fragment for the T gene (Ck-1), which was introgressed from the C. canephora genome (Plate 1). The Ck-1 fragment was heterozygous in all the R11 genotypes. It was also present, in a homozygous state, in HDT and Robusta, but absent in Rume Sudan, SL 28 and Caturra. Similarly, all 27 Batian families analyzed carry the Ck-1 fragment (Plate 2), which was again present in HDT and Robusta but absent in SL 28 and Rume Sudan. Of the 27 Batian genotypes, 15 were homozygous and 12 were heterozygous for the Ck-1 gene (Plate 2).

DISCUSSION

The genome-wide relationships within the study varieties
The genome-wide analysis revealed a narrow genetic base within the C. arabica varieties R11, Batian, SL 28 and Rume Sudan, with a diversity index of less than 10%. This is attributed to the derivation of these varieties from a few individual collections, whose subsequent dispersal has progressively narrowed their genetic base further (Baruah et al., 2003; Setotaw et al., 2013). The variety HDT had a higher diversity index, of about 20%. HDT is a natural interspecific cross between C. arabica and C. canephora and usually diverges from commercial cultivars for most agronomic traits (Agwanda et al., 1997). The diversity expressed in HDT is due to genes introgressed from the C. canephora genome (Lashermes et al., 1999). Although HDT was used as a donor parent for CBD resistance in the varieties R11 and Batian, its high diversity was not reflected in these genotypes. This is most likely due to filial advancement from the original crosses, so that the existing progenies retain less of the initial C. canephora genome (Gichuru, 2007). The study revealed uniformity among individuals of the three crosses that make up the variety Batian and further showed that they are closely related to SL 28. Batian was obtained from complex crosses between CBD donor parents (HDT, Rume Sudan and K7) and susceptible varieties (SL 28, SL 34, Bourbon and Tanganyika drought-resistant selections), which were then backcrossed to SL 28 and selfed to restore and fix genes for desirable attributes such as superior beverage quality (Gichimu et al., 2014). The close relationship between SL 28 and the individual Batian genotypes suggests that these attributes were successfully restored.
The cultivar R11 is also relatively uniform: 60% of the crosses clustered together, while the rest fell into four small clusters. R11 is an F1 hybrid between complex crosses (those selected as Batian) as pollen parents and Catimor (a cross between HDT and Caturra) isolines as seed parents (Omondi et al., 2001). Variation among R11 crosses has been reported for various traits in previous studies, including resistance to CBD (Omondi et al., 2001; Gichimu et al., 2014), beverage quality (Gichimu et al., 2012) and yield (Gichimu et al., 2013).

Occurrence of the T gene (Ck-1) within the C. arabica varieties R11 and Batian
All 27 R11 crosses were confirmed to carry the T gene. R11 is a composite F1 hybrid made up of 66 different crosses (Omondi et al., 2001). A previous study by Gichimu et al. (2014) confirmed the occurrence of the T gene in 34 R11 crosses, so this study brings to 61 the total number of R11 crosses confirmed, through marker-assisted selection, to carry the T gene. R11 inherited the T gene from two sources: the seed parent Catimor and the pollen parents. The seed parent Catimor comprises several lines derived from a cross between HDT (a spontaneous cross between C. arabica and C. canephora) and Caturra (a C. arabica cultivar highly susceptible to CBD) (Gichuru et al., 2008). The pollen parents are complex crosses between the CBD-resistant varieties HDT and Rume Sudan and the susceptible varieties SL 28, SL 34, SL 4, N39 and Bourbon (Van Der Vossen et al., 1981; Agwanda et al., 1997; Omondi et al., 2001).
Similarly, all 27 Batian genotypes analyzed were confirmed to carry the Ck-1 gene; 15 of them were homozygous and hence stable for the gene. Batian is a selection from the pollen parents of R11 and hence inherited the T gene from HDT, one of the resistance sources in the complex crosses (Gichimu et al., 2014).
A study by Alkimim et al. (2017) of three CBD-resistant genotypes in Brazil found Ck-1 in all three, two in a homozygous and one in a heterozygous state, and confirmed that the Sat 235 marker co-segregates with the gene. Similarly, Mtenga (2016) reported the occurrence of Ck-1 in CBD-resistant genotypes from Tanzanian and Ethiopian accessions. In that study, Sat 235 did not amplify the T-gene fragment in Rume Sudan, since the CBD resistance gene in this variety lies at a different locus. The present study further confirms this finding, as the Ck-1 fragment was not amplified in Rume Sudan.

Occurrence of the R gene within the C. arabica varieties R11 and Batian
Eleven genotypes were confirmed to carry the SNP 100034991|F|0-44:C>T-44:C>T, while none of the genotypes carried the SNP marker 100025973|F|0-59:T>C-59:T>C. The latter was comparatively rare, with a lower minor allele frequency (MAF = 0.28) than 100034991|F|0-44:C>T-44:C>T (MAF = 0.42), out of a possible maximum of 0.5 (Zhang et al., 2015; Gimase et al., 2020a). The SNP marker 100025973|F|0-59:T>C-59:T>C may have been lost in both R11 and Batian during backcrossing of the complex crosses to SL 28 to restore good cup quality and high yield (Agwanda et al., 1997), since that selection was not guided by DNA markers (Omondi, 1998). This confirms the SNP marker 100034991|F|0-44:C>T-44:C>T as a reproducible marker within C. arabica genotypes carrying the resistance gene inherited from Rume Sudan. The polymorphism of this locus between Rume Sudan and SL 28 indicates its ability to discriminate between variants for resistance to CBD and its suitability for marker-assisted selection (MAS) (Rouet et al., 2019). Polymorphic genomic loci are used as genetic markers to determine the co-segregation of alleles with qualitative traits in populations derived from crosses or in naturally occurring populations (Motazedi et al., 2019). Gimase et al. (2020b) found the SNP marker 100034991|F|0-44:C>T-44:C>T to be linked to two genes, a finding further supported by the inheritance study of Van Der Vossen et al. (1980), which reported two alleles of the R gene in Rume Sudan (R1R1).
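The marker identifiers quoted above encode a clone ID, the SNP position within the sequence tag and the two alleles, and MAF values like those reported can be recomputed from genotype calls. The regular expression below is inferred from the two IDs in this section only, and the calls are assumed to use DArT's 1-row coding (0 = homozygous reference, 1 = homozygous SNP allele, 2 = heterozygous); treat both as illustrative assumptions.

```python
import re

# Pattern inferred from IDs such as "100034991|F|0-44:C>T-44:C>T" (assumption).
ALLELE_ID = re.compile(r"^(\d+)\|([A-Z])\|0-(\d+):([ACGT])>([ACGT])-\d+:[ACGT]>[ACGT]$")

# Copies of the SNP allele per call code, assuming DArT 1-row coding.
SNP_DOSE = {0: 0, 1: 2, 2: 1}


def parse_allele_id(allele_id):
    """Return (clone_id, snp_position, reference_base, snp_base)."""
    match = ALLELE_ID.match(allele_id)
    if match is None:
        raise ValueError(f"unrecognised AlleleID: {allele_id!r}")
    clone, _strand, pos, ref, alt = match.groups()
    return int(clone), int(pos), ref, alt


def minor_allele_frequency(calls):
    """MAF over non-missing calls; the maximum possible value is 0.5."""
    doses = [SNP_DOSE[c] for c in calls if c is not None]
    p = sum(doses) / (2 * len(doses))
    return min(p, 1 - p)


print(parse_allele_id("100034991|F|0-44:C>T-44:C>T"))  # (100034991, 44, 'C', 'T')
print(minor_allele_frequency([0, 0, 1, 2, None]))      # 0.375
```

Because MAF takes the smaller of the two allele frequencies, it cannot exceed 0.5, which is why 0.42 is close to the maximum and 0.28 marks the comparatively rarer variant.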

Conclusion
The study confirmed the close relationship among the C. arabica varieties, as shown by their narrow genetic base: Batian is a very uniform cultivar, genetically very close to SL 28, while R11 is fairly uniform. Consistent with previous studies, this work further affirms the uniformity within the C. arabica genome. The study also revealed that all the genotypes analyzed within the CBD-resistant varieties R11 and Batian carry the T gene, and that 11 of them also carry the R gene and therefore have multiple-gene resistance to CBD. The study further confirmed the codominant nature of the DNA markers for the T gene (Ck-1) and the R gene, given their ability to discriminate between homozygous and heterozygous variants within the resistant genotypes. This study shows that selecting arabica coffee varieties with multiple-gene resistance to CBD is feasible. The genotypes confirmed to carry both genes for resistance to CBD are recommended for further distribution to growers, since their resistance is less likely to be broken down by new races of the pathogen.