Linkage disequilibrium and association mapping of drought tolerance in cotton ( Gossypium hirsutum L . ) germplasm population from diverse regions of Pakistan

Drought stress is a major abiotic stress that limits crop production. Molecular association mapping techniques through linkage disequilibrium (LD) can be effectively used to tag genomic regions involved in drought stress tolerance. With the association mapping approach, 90 genotypes of cotton Gossypium hirsutum, from diverse regions of Pakistan were used. The morpho-physiological traits of all genotypes were evaluated in greenhouse under well-watered and drought stress conditions. Mean squares from analysis of variance for all morpho-physiological traits revealed highly significant variations (P≤0.05) between water levels and genotypes. Cotton varieties were screened for polymorphism with 180 simple sequence repeat (SSR) markers. Out of these 180 SSR markers, 95 were polymorphic. Genotyping of the selected 95 SSR primer pairs generated 57.5% polymorphism, and the number of polymorphic alleles per primer was 2.10. Population structure, linkage disequilibrium, and association mapping between pairs of SSR marker loci were studied. The significance of pairwise LD (P≤0.005) among all possible SSR loci was evaluated at significant threshold values (R 2 ≥0.05); 7.1% of the SSR marker pairs showed significant pairwise LD in 90 accessions of G. hirsutum. Also we observed a significant (R 2 ) LD between 13 pairs of SSR loci; each pair within the same chromosome in a range of 180 cM between NAU1230 and NAU3095 loci in chromosome (D5) and 1.612 cM between NAU462 and NAU3414 in chromosome A9. This indicates tight linkage between two alleles on the same chromosome. Markers, NAU3414, NAU2691, NAU1141 and NAU1190 were associated with more than single traits under drought treatments. Highest phenotypic variance explaining (R 2 ) was ascribed to NAU3011 chromosome D13 significantly (p0.001) associated with root length under drought treatment.


INTRODUCTION
Global climate change poses serious problems for sustained crop production.Due to the continuous water deficit for agricultural production, development of drought tolerant crop to meet the food and fibre demand has become a necessity (Saeed et al., 2012).Increasing aridity of semi-arid regions and limited water resources have led to a crucial necessity for improving crop drought resistance (Passioura, 2007).There is extensive research on genetic and breeding programs of cotton and it has a long history of improvement through conventional breeding and selection with frequent long-term yield achievements.The identification of drought-related QTLs plays an essential role in crop improvement through marker assisted selection.DNA marker studies have laid basis for revealing the molecular basis for the traits related to drought tolerance (YongSheng et al., 2009).Among the variety of genetic markers, SSR markers have shown high potential to detect polymorphism (Dongre et al., 2011;Dahab et al., 2013) and have been used extensively for cotton genome mapping and marker assisted selection (He et al., 2007).Researchers have mapped QTLs for morphological traits (Liang et al., 2014), physiological traits (Saeed et al., 2011), earliness (Li et al., 2013), yield (LiFang et al., 2010) and fibre traits (Islam et al., 2014).Abdelraheem et al. (2015) assessed a backcross inbred line (BILs) population derived from a cross between Pima cotton, 'Pima S-7' (Gossypium barbadense) and Upland cotton 'Sure-Grow 747' (Gossypium hirsutum) for their drought tolerance morphological traits using PEG induced osmotic stress.A total of six QTLs were detected for plant height (PH), fresh shoot and root weight (RW), explaining 10.9 to 19.2% of the phenotypic variation (PV).Rodriguez-Uribe et al. (2014), studying drought tolerance of cotton, stated that, of a total of 110 drought-responsive genes identified through a microarray analysis, there was 79% expression.Saeed et al. (2011) evaluated F2, F2:3 and F3:4 populations derived from an intraspecific cross between two G. hirsutum lines for morphological, physiological and yield traits.Seven QTLs were detected for osmotic potential, osmotic adjustment, plant height, and seed cotton yield.Association mapping, based on linkage disequilibrium (LD), is a new methodology which examines thousands of polymorphisms for assessing QTL effect.It is more effective compared to linkage analysis since it does not require generation of segregating populations/large numbers of progeny.Association mapping has three advantages: increased mapping resolution, reduced research time, and greater allele numbers.It is a powerful technique used to identify genomic regions linked to specific variants of a phenotypic trait (Saeed et al., 2014).Genome-wide association in plants has wide range of use and there are many reports of association studies on many crops such as barley (Gutiérrez et al., 2011;Cockram et al., 2010), rice (Shao et al., 2011), bread wheat (Yu et al., 2012), maize (Poland et al., 2011), triticale (Niedziela et al., 2012), bean (Shi et al., 2011), sugar beet (Würschum et al., 2011), cotton fiber quality traits (Abdurakhmonov et al., 2009) and cotton salinity stress tolerance (Saeed et al., 2014).Cotton Gossypium spp. is widely used for natural fiber in the textile industry.The worldwide commercial effect of cotton production is ~$500 billion per annum with consumption of ~115-million bales or ~27million metric tons (MT) of fibre (Chen et al., 2007).Cotton is one of the main warm-season cultivars, grown during summer in arid and semiarid regions where water is limited (Singh et al., 2007).The adaptive traits of plants to unfavourable environmental conditions include numbers of physiological, morphological and biochemical features of whole plant (Saeed et al., 2012).Genes involved in molecular mechanisms can be tagged with the help of molecular mapping approaches.In our present study, we assessed extent of LD in the G. hirsutum germplasm from diverse regions of Pakistan.The aims of this study were to assess the population structure, linkage disequilibrium (LD), and association of molecular markers with drought stress tolerance of cotton (G.hirsutum L.) in a collection of 90 elite cotton germplasm accessions.

Plant
The plant material consisted of 90 genotypes of cotton G. hirsutum (Table 1) collected from diverse regions of Pakistan.The cotton varieties were grown in green house during March-April 2010.Four seeds of each genotype were sown after soaking overnight at field capacity of 2-3 cm 3 depth in 8 polythene bags of 25×5 cm 2 , filled with ~ 1.15 kg of compost soil (peat, sand, soil, 1:1:1).They were arranged in a randomized complete block design with three replications and two treatments.One set consists of 4 bags kept as a control (W1) and the other as water stressed (W2).After germination, only one plant/bag was kept for data recording.Standard pH (6.5), temperature (25 ± 2°C), humidity (50%) and light requirements (13 h photoperiod) for cotton growth were maintained throughout the total duration of experiment.Seedlings grown under both stress and non-stress conditions were irrigated and fertilized till the development of the first true leaf, and thereafter, seedlings grown under control condition (W1) were watered daily to keep the soil at field capacity.Water stress condition was developed by withholding water supply to the seedlings grown under water stress condition (W2), and the effect of water stress was monitored visually and with soil moisture meter (HH2 Theta Probe Type, Delta-T device, Cambridge, England).At initial wilting stage (observed visually), when soil had 14 to 16% soil moisture content, the stressed plants were watered to relieve the sign of wilting but not enough to reach field capacity.The experiment was continued for 45 days from the date of emergence till the 3 rd main stem leaf was fully expanded.The plants grown under normal water supply and stressed conditions were measured for morphological and physiological parameters.
On 26th April, 2010 green-house experiment was completed and the following parameters were measured.First, plant length (PH) and plant fresh weight (PFW) were recorded after the plants were separated into shoot and root parts, and data were recorded for shoot length (SL), root length (RL), fresh shoot weight (FSW) and fresh root weight (FRW).The respective shoots and roots of all plants were then oven-dried at 70°C till a constant dry weight was reached.The dry weight of shoot and root of respective plants were recorded and summed up to get the dry plant weight (DPW).The root shoot ratio (RSR) was calculated using the formula: RSR= DRW/DSW.Relative water contents (RWC) were calculated using the following formula: RWC = [(Fresh weight -Dry weight) / (Turgid weight -Dry weight)] × 100.

SSR genotyping
For extraction of the genomic DNA from each accession group, 4 to 5 young fully expanded leaves from each plant were collected and stored at −80°C.The genomic DNA was isolated from the frozen leaf tissues using the cetyltrimethyl ammonium bromide (CTAB) method described (Zhang et al., 2000).The DNA samples were stored at -20°C until further use.The DNA quality was evaluated with 1% agarose gel electrophoresis prepared using 0.5X TAE buffer, and ethidium bromide (10 ng/100 ml) was added to the gel to stain the DNA bands.The samples were electrophoresed for approximately 30 min after which the products were viewed using an ultraviolet trans-illuminator and photographed using the Syngene Gel Documentation System.The DNA concentration was estimated by the absorbance at 260 nm.The working DNA samples (containing 50 ng μL -1 ) were stored at 4°C for genotyping.SSR primer pairs used were from different sources: NAU from Nanjing Agricultural University, Nanjing, China (Han et al., 2006); BNL primers from Research Genetics Co. (Huntsville, AL, USA, http://www.resgen.com);JESPR from sequences of Reddy et al. (2001); TM from Dr. John Yu, USDA-ARS, Crops Germplasm Research Unit, TE, USA; CIR from Nguyen et al. (2004) (Table 2).Details about these markers can be found at www.cottonmarker.org.All 90 accessions were genotyped using a 180 core set of SSR marker primers.These chromosome-specific primer pairs were selected using the results of different laboratories and published papers (Siu et al., 2000;Han et al., 2006;Shen et al., 2005;Abdurakhmonov et al., 2007).They were based on information related to important QTLs and chromosome distribution.The PCR amplifications were performed in a 10 μl reaction mix containing 1 μl 10× PCR buffer, 0.2 μl dNTPs (5 mM each), 0.1 μl 25 mM MgCl2, 0.1 μl Taq DNA polymerase, and 1 μl (50 ng) genomic DNA.The microsatellites were amplified using the standard polymerase chain reaction (PCR) procedures described by Zhang et al. (2000).Two millilitre of PCR products was separated vertically on denaturing 16% polyacrylamide gels with 5× TBE buffer at 180 V for 45 min and stained with silver (Bassam et al., 1991).A 50 bp DNA ladder was used to estimate allele sizes in

Silver staining and development of bands
After a specific migration of the band on the gel, the gels were placed in fixing solution (40% ethanol and 10% glacial acetic acid) for 20 min.They were washed three times with distilled water and stained with silver staining solution (0.2% AgNO3) for 20 min.After staining, the gels were again washed three times with distilled water for 20 s, and the developing solution (3% NaOH and 0.05% formaldehyde) was applied for 3 to 5 min.

Inference of population structure for association mapping
The STRUCTURE software is a DOS, Windows, UNIX (Solaris) and Linux based database that performs a model-based clustering method for gathering the occurrence of population structure, finding diverse genetic populations, allocating individuals to populations, and classifying migrants and admixed individuals.Complementary studies on genotypic data for evaluating the population structure before continuing with LD analysis were performed by a modelbased approach; they were implemented in the software package STRUCTURE (Pritchard et al., 2000) to identify subgroups in cotton cultivars under study.Admixture model under independent allele frequencies using the burn-in time of 50,000, and a number of MCMC repeats at 100,000 were used (Pritchard et al., 2000), with the K ranging from 2 to 15.

Extent of linkage disequilibrium and marker-trait association analysis
The genome-wide LD between pairs of SSR marker loci was studied using the software package TASSEL ver.2.1.Linkage disequilibrium was estimated by a weighted average of squared allele-frequency correlations (R 2 ) between SSR loci.The significance of pairwise LD (P-values≤0.005)among all possible SSR loci was evaluated using TASSEL.The LD values between all pairs of SSR loci were plotted as LD plots using TASSEL to estimate the general view of genome-wide LD patterns and to evaluate LD structures.The marker-trait associations were calculated by GLM association test incorporating Q matrices from STRUCTURE2.2 into TASSEL software package (Bradbury et al., 2007).To assess significant marker-trait associations P-marker  0.05 was used.

Phenotypic variation
The cotton varieties under study revealed a wide range of phenotypic variation in morpho-physiological traits under both control and drought treatments (Table 3).Mean squares from analysis of variance for all morphophysiological traits revealed highly significant variation (P≤0.05) with respect to water levels and genotypes.However, the interaction between water levels and genotypes was significant for shoot length, fresh shoot weight and dry shoot weight (Table 1).Mean value of all 90 genotypes indicated significant reduction in all seedling traits.The considerable amount of genotypic variance apparent in all traits shows that variance under water stress conditions is genetically determined and selection of varieties\lines for drought tolerance on the basis of seedling traits is possible.Correlation coefficients between means in the stressed and nonstressed conditions were positive and highly significant (P ≤ 0.01) for most of the traits (Table 4).The frequency distribution of morpho-physiological traits under control (W1) and water stressed (W2) treatments indicated considerable genotypic variance (Figure 1).

SSR genotyping, population structure, pairwise linkage disequilibrium and LD decay
The selected SSR primer pairs generated a total of 241 SSR alleles, of which 147 (60%) were polymorphic, resulting in 57.5% polymorphism.The average number of polymorphic alleles per primer was 2.10.To determine the population structure and number of subgroups in cotton cultivars under study, a model-based approach, implemented in the software package STRUCTURE (Pritchard et al., 2000), the distribution of log probability of data, LnP(D), did not show a clear peak against any value of K, but by the use of parameter ΔK, rate of change in the log probability of the data, graph peaked against a value of K = 5.This confirmed 5 subpopulations in the germplasm at significant threshold values (R 2 ≥0.05).7.1% of the SSR marker pairs showed significant pairwise LD in 90 accessions of G. hirsutum in our study.Plots for pairwise LD between SSR markers demonstrated significant LD blocks in the genome-wide LD analysis.We observed a significant (R2) LD between 13 pairs of SSR loci within the same chromosome in range of 180 cM between NAU1230 and NAU3095 loci in chromosome D5 and 1.612 cM and between NAU462 and NAU3414 in chromosome A9.Triangle plots for pairwise LD between SSR markers demonstrated significant LD blocks in the genome-wide LD analysis.Genome-wide LD decay was assessed by plotting r2 LD values as a function of genetic distance in cM.Two long stretches of LD blocks were observed on chromosomes A3 and D9, extending to a distance of 180 and 77 cM respectively (Table 5).Genome-wide LD at r 2 > 0.1 rapidly decayed within ~1.61 to 11 cM, indicating a strong potential for association mapping (Saeed et al., 2014).The percentage of SSR loci pairs in LD observed in 90 G. hirsutum (7.1%) was comparable with reports in cotton (11 and 12%) (Abdurakhmonov and Addukarimov, 2008) maize (10%) (Remington et al., 2001) and sorghum (8.7%) (Hamblin et al., 2004).A high recombination rate in allopolyploid cottons was reported (Brubaker et al., 1999) and it might be one of the factors explaining the observed low level of pairwise LD in cotton, along with mutation, selection, and genetic drift that occurred in the domestication of G. hirsutum germplasm.

Marker-trait association for morpho-physiological traits
A total of 21 marker loci identified by GLM analysis were significantly associated (P≤0.001) with phenotyped traits under both control and drought treatment (Table 6).Out of these 21 markers, NAU3414, NAU2691, NAU1141 and NAU1190 were associated with more than one morphophysiological trait under drought treatments (Table 4).Phenotypic variance explained (R 2 ) value ranging from 9.91 to 19%.Highest phenotypic variance explaining (R 2 ) was ascribed to NAU3011 chromosome D13 significantly (P≤0.001)associated with root length under drought treatment.This locus appeared to be a major locus as it is associated with highest phenotypic variance.NAU3414 located on chromosome A9 was associated with

DISCUSSION
Currently, association mapping methods has been used in diverse plant species such as bread wheat (Yu et al., 2012;Phumichai et al., 2012), barley (Cockram et al., 2008), triticale (Niedziela et al., 2012) and bean (Shi et al., 2011).For molecular studies, there should be a reasonable degree of variability present among the organism of interest; only then the molecular approaches can identify the genetic case underlying this variability.As there was a significant variability shown in our experimental material under greenhouse conditions, thus our molecular study findings are of future significance.In our cotton germplasm, the number of polymorphic alleles detected per primer pair ranged from one to eight, with 2.10 alleles per primer pair.The SSR markers revealed a considerable amount of variation in the sampled genome, even though the overall polymorphism detected for these cotton cultivars was relatively low.The narrow genetic base of cotton has been mentioned in many studies using such molecular markers as SSRs (Bertini et al., 2006;Zhang et al., 2011;Kalivas et al., 2011), within Upland cultivars, which generally reveal a low level of genetic variety.There was little variation in the estimation of the molecular diversity among the upland cultivars (G.hirsutum).However, Abdurakhmonov et al. (2007) reported that the genetic distance for the Upland cultivars was in 0.01 to 0.28 range.The simple sequence repeats (SSR) allelic diversity found in our population for association analysis is approximately the same as the total diversity presented in more extended studies.The same mean number of alleles per locus as in our study was found in a collection of 106 accessions, with 2.13 SSR alleles (Guo et al., 2006).In this study at significant threshold values (R 2 ≥0.05), 7.1% of the SSR marker pairs showed significant pairwise LD in a total of 90 accessions of G. hirsutum.This is comparable with previous reports on cotton: 11 to 12% of SSR loci pairs in the exotic G. hirsutum accessions (Abdurakhmonov and Abdukarimov, 2008), 4% SSR markers in G. hirsutum variety accessions (Abdurakhmonov et al., 2009) and 3% cotton germplasm from China and USA (Saeed et al., 2014).In our study, cotton germplasms used are from Pakistan; whereas in previous reports, the cotton germplasm used were of African, Australian, and Latin American, Mexican, Uzbek, China and USA ecotypes.The identification of QTLs' morpho-physiological traits related to drought tolerance has been reported in many studies: they include drought-related QTLs for morphological traits (Liang et al., 2014), physiological traits (Saeed et al., 2011), earliness (Li et al., 2013), yield (LiFang et al., 2010) and fibre traits (Islam et al., 2014).In our study, significant marker-trait associations were found in a total of 21 marker loci which were significantly associated (P ≤0.001) with phenotyped data under both control and drought treatment.Markers, NAU3414, NAU2691, NAU1141 and NAU1190 were associated with more than one traits under drought treatments.Highest phenotypic variance explaining (R 2 ) was ascribed to NAU3011 loci significantly (P≤0.001)associated with root length under drought treatment.This locus appeared to Dahab et al. 2611 be a major locus as it is associated with highest phenotypic variance.NAU3414 located on chromosome A9 was associated with maximum number of traits (shoot length, shoot fresh weight and dry shoot weight whose value ranged from 11.3, 17.9 and 11.29% respectively).Six markers were exclusively associated with drought treatment (W 2 ).This study also proved that association mapping approach has strong potential to assess significant marker-trait associations, save much time and cost compared to linkage mapping approach.

Table 2 .
List of SSR markers used in the study.
base pairs (bp) for gels.DNA bands of SSRs were developed with silver staining, and recorded.

Table 3 .
Mean squares of the ANOVA of morpho-physiological traits.

Table 5 .
The pairwise genome linkage disequilibrium (LD) between pairs of SSR markers in same chromosomes and their distances.
maximum number of traits (shoot length, shoot fresh weight and dry shoot weight whose value ranged from 11.3, 17.9 and 11.29% respectively).Six markers were exclusively associated with drought treatment (W 2 ).