Genetic variation of twenty autosomal STR loci and evaluation of their importance for forensic genetic purposes

The aim of this study was twofold: first, to determine the genetic structure of the Iraqi population and, second, to evaluate the importance of these loci for forensic genetic purposes. FTA® Technology (FTA™ paper DNA extraction) was used to extract DNA. Twenty (20) STR loci and Amelogenin, namely D3S1358, D13S317, Penta E, D16S539, D18S51, D2S1338, CSF1PO, Penta D, TH01, vWA, D21S11, D7S820, TPOX, D8S1179, FGA, D1S1656, D5S818, D6S1043, D12S391, D19S433 and Amelogenin, were amplified using the PowerPlex® 21 kit. Polymerase chain reaction (PCR) products were detected on a 3730xL genetic analyzer, and the data were then analyzed with PowerStatsV1.2. Based on the allelic frequencies, several statistical parameters of genetic and forensic efficiency were estimated. These include the homozygosity and heterozygosity, the effective number of alleles (n), the polymorphism information content (PIC), the power of discrimination (PD) and the power of exclusion (PE). The power of discrimination for the tested loci ranged from 75 to 96%; therefore, these loci can be safely used to establish a DNA-based database for the Iraqi population. The high PIC values of the selected markers also confirm their usefulness for genetic polymorphism studies and linkage mapping programs in humans. The mean observed heterozygosity, mean expected heterozygosity and mean PIC across the 20 loci were 0.77, 0.81 and 0.78, respectively, indicating high gene diversity.


INTRODUCTION
Microsatellites are stretches of DNA containing varying numbers of short tandem repeats (Klintschar et al., 2006) located between unique sequences. DNA regions with repeat units that are 2 bp to 7 bp in length, known as short tandem repeats (STRs) or simple sequence repeats (SSRs), are generally called microsatellites (Ellegren, 2004).
Long repeat units may contain several hundred to several thousand bases in the core repeat (Butler and Hill, 2012). Within the DNA there are both length and sequence polymorphisms (Silvia et al., 2009). DNA can be used to study human evolution by analyzing regions of the human genome that are not subject to selection pressure (Mats et al., 2007; Imad et al., 2014). In addition, DNA typing provides vital information in medico-legal work, and its polymorphisms allow for further biological studies (Walkinshaw et al., 1996).
Microsatellites have been found to be evenly distributed in the genome, on all chromosomes and in all regions of the chromosome (Ensenberger et al., 2010; Imad et al., 2014). They can also be found inside gene coding regions, in introns, and in non-gene sequences. Most microsatellite loci are small, ranging from a few to a few hundred repeats, and this small size is important for PCR-facilitated genotyping. In general, microsatellites containing a higher number of repeats are more polymorphic.
The Cooperative Human Linkage Center (http://www.chlc.org) evaluates the genetic markers, and the loci were selected from there. Table 1 provides particulars on the additional STR loci, together with the chromosomal location and repeat sequence for each core STR locus (Ruitberg et al., 2001; Klintschar et al., 2004; Klintschar et al., 2005). The repeat motif for each STR marker is listed on this basis. A significant fact is that STR allele sizes are measured relative to an internal size standard during electrophoresis. Depending on which DNA strand is labeled with a dye, an allele may therefore have a different apparent measured size.
The PowerPlex® 21 System is compatible with automated PCR instruments and with the ABI PRISM® 3100, 3100-Avant, 3130, 3130xl, 3500 and 3500xL Applied Biosystems Genetic Analyzers. The PowerPlex® 21 System is used in the United States, Europe and Asia, and it increases discriminatory power and data-sharing possibilities by incorporating informative loci. The PowerPlex® 21 System includes the 13 CODIS core STR loci and two loci commonly used in Europe (D1S1656 and D12S391).
In China, the D6S1043 locus is commonly used. Amelogenin, Penta D, Penta E, D2S1338 and D19S433 are additional markers used throughout the world. In forensic casework and DNA databases, the addition of new autosomal loci is very important for increasing the discrimination power of human forensic identification.
This study was aimed at investigating the genetic variation and forensic efficiency parameters of 20 autosomal STR loci in random unrelated individuals from the middle and south of Iraq.

Population
The study population comprised four hundred (400) healthy, randomly chosen individuals from provinces in the middle and south of Iraq (Baghdad, Babil, Diwania and Basrah). The number and ethnicity of individuals were chosen to obtain a population sample with the highest possible representation of the major ethno-religious and tribal groups living in these central and southern areas of the country.

DNA extraction
DNA was extracted from all dried blood samples on FTA cards following the manufacturer's procedure as described in Whatman FTA Protocol BD01, except that the volume of Whatman FTA purification reagent was reduced by half (Dobbs et al., 2002). A 1.2 mm diameter disc was punched from each FTA card with a puncher. The discs were transferred to new Eppendorf tubes and washed three times in 100 μl Whatman FTA purification reagent. Each wash was incubated for 5 min at room temperature with moderate manual mixing, and the reagent was discarded between washing steps. The discs were then washed twice in 200 μl TE buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 8.0), the buffer was discarded and the discs were left to dry at room temperature for 1 h.

Typing
Typing was performed on the ABI Prism® 3130xl Genetic Analyzer 16-capillary array system (Applied Biosystems, Foster City, CA, USA), following the manufacturer's protocols, with POP-7™ Polymer, Data Collection Software and GeneMapper® V3.2 software (Applied Biosystems, Foster City, CA, USA). STR genotyping was conducted by comparing the sizes of a sample's alleles with the sizes of the alleles in the allelic ladders for the same loci.
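The comparison of fragment sizes against an allelic ladder can be illustrated with a short sketch. This is not the GeneMapper® algorithm: the function name `call_allele`, the ladder sizes and the ±0.5 bp matching window are assumptions chosen only for illustration.

```python
def call_allele(fragment_size, ladder, tolerance=0.5):
    """Return the ladder allele closest in size to fragment_size,
    or None if no ladder allele lies within `tolerance` base pairs."""
    best_allele, best_diff = None, None
    for allele, size in ladder.items():
        diff = abs(fragment_size - size)
        if best_diff is None or diff < best_diff:
            best_allele, best_diff = allele, diff
    if best_diff is not None and best_diff <= tolerance:
        return best_allele
    return None  # off-ladder fragment: would need manual review


# Hypothetical ladder for one tetranucleotide locus (allele name -> size in bp).
example_ladder = {"8": 120.0, "9": 124.0, "10": 128.0, "11": 132.0}

# Two fragments close to ladder alleles and one that falls between alleles.
calls = [call_allele(size, example_ladder) for size in (124.2, 131.7, 126.0)]
```

In this sketch a fragment that falls outside every window is returned as `None` rather than being forced onto the nearest allele, so that possible off-ladder variants are not silently miscalled.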

Quality control
Experimental procedures were performed according to the guidelines of the external blind proficiency test of the GEDNAP (http://www.gednap.org) (Rand et al., 2002; Rand et al., 2004).

Statistical data analysis
PowerStatsV1.2 (Promega, Madison, USA) was used to calculate the observed heterozygosity (Ho), power of discrimination (PD), probability of exclusion (PE), and polymorphism information content (PIC). The Arlequin software program (Schneider et al., 1998) was used to conduct the exact test of population differentiation. In addition, Arlequin was used for the expected heterozygosity (He), Hardy-Weinberg equilibrium (HWE) and linkage equilibrium tests, as well as for the F-statistics. Where test results with P-values less than 0.05 were observed, the Bonferroni correction was applied to the data. The Bonferroni procedure (Weir, 1996) adjusted the rejection level for the smallest P-value at an overall level of α = 5% to 0.05/x, where x is the number of tests conducted on the data. The Ho and He values were calculated by means of the same software program (Cotton et al., 2000; Wiegand et al., 1993); physical positions are from Schneider et al. (1998).
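The Bonferroni adjustment described above amounts to dividing the overall significance level by the number of tests. The sketch below assumes the simple (non-sequential) form of the procedure; the function names and the example P-values are hypothetical and are not taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test rejection level alpha / x for x tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Return the P-values that remain below alpha / x."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p for p in p_values if p < threshold]


# Hypothetical HWE P-values, one test per locus (x = 20).
example_p = [0.001, 0.04, 0.2] + [0.5] * 17
still_significant = significant_after_bonferroni(example_p)
```

With 20 tests the per-test level is 0.05/20 = 0.0025, so a nominal P-value of 0.04 would no longer count as a departure from expectation.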

Allele frequency of common autosomal genetic loci
After the samples had been collected and the DNA extracted and PCR amplified, the samples were genotyped for the 20 STR loci of interest. The genotyping information was then converted into allele frequencies by counting the number of times each allele was observed. Allele frequencies for each of the 20 STR loci in the Iraqi population sample are shown in Tables 2 and 3.
Some alleles were not sampled sufficiently, and the estimate of an allele frequency is uncertain if the allele is so rare that it is represented only once or a few times in a dataset. It is therefore recommended that each allele be observed at least five times before it is used in forensic calculations (Butler, 2007). The minimum allele frequency is 5/(2n), where n is the number of individuals sampled and 2n is the number of chromosomes (as autosomes are in pairs due to inheritance of one chromosome from each parent).
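The counting approach and the minimum allele frequency 5/(2n) can be sketched as follows. The function names and the four example genotypes are hypothetical; only the formula 5/(2n) and the sample size of 400 come from the text. For n = 400 the minimum frequency is 5/800 = 0.00625.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by counting alleles over 2n chromosomes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)
    return {allele: count / total for allele, count in counts.items()}


def minimum_allele_frequency(n_individuals, min_count=5):
    """Minimum frequency 5 / (2n) applied to alleles seen fewer than five times."""
    return min_count / (2 * n_individuals)


# Hypothetical genotypes at one locus for four individuals.
example_genotypes = [("8", "9"), ("9", "9"), ("8", "11"), ("9", "11")]
freqs = allele_frequencies(example_genotypes)

# Minimum allele frequency for the sample size reported in this study.
min_freq = minimum_allele_frequency(400)
```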
The number of alleles and the expected heterozygosities detected in the Iraqi population are good indicators of the genetic polymorphism within the sample. Generally, the number of alleles observed is highly dependent on the sample size, because populations contain unique alleles that occur at low frequencies and because the number of observed alleles tends to increase with the number of individuals sampled. The number of alleles scored for each marker is also an indicator of the future usefulness of the marker for genetic screening.
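Two summary measures of polymorphism named in this study, the effective number of alleles and the polymorphism information content (PIC), can be computed from allele frequencies alone. The sketch below uses the standard textbook definitions, which the text does not spell out, so the exact formulas applied by PowerStats are an assumption here; the example frequencies are hypothetical.

```python
from itertools import combinations


def effective_number_of_alleles(freqs):
    """n_e = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_sq - cross


# Hypothetical allele frequencies at a three-allele locus.
example_freqs = [0.5, 0.25, 0.25]
n_e = effective_number_of_alleles(example_freqs)
pic = polymorphism_information_content(example_freqs)
```

Both measures increase as alleles become more numerous and more evenly distributed, which is why loci with many alleles of similar frequency are the most informative.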
Finding the same number of alleles for certain loci in various populations (e.g., Iranian, Syrian, Emirati, Qatari and Egyptian populations) may indicate common ancestries (Reyhaneh et al., 2009; Alshamali et al., 2003; Ana et al., 2006; Clotilde et al., 2007). The frequency and the number of alleles, however, may indicate the degree of inbreeding within each population and thus reflect the homogeneity of the population.
In recent years, short tandem repeat (STR) systems have gained importance in forensic analysis of biological specimens as well as in paternity testing, as an alternative to restriction fragment length polymorphism (RFLP) analysis (Edwards et al., 1991; Hammond et al., 1994; Nakamura et al., 1987). The analysis of STR polymorphisms by PCR-based methods offers certain advantages over RFLP typing: (1) STR loci can be typed with a high degree of specificity and sensitivity in a short time period, (2) these loci can be successfully amplified from a limited amount of DNA even if it is degraded, and (3) typing of multiple loci can be accomplished in a single multiplex reaction (Hochmeister et al., 1991; Lins et al., 1996; Mohammad and Imad, 2013a, b).

The amelogenin locus
The amelogenin locus, which occurs on both the X and Y chromosomes and enables sex typing (Sullivan et al., 1993), was also located within the reference human genome sequence. AMELX is located on the X chromosome at 10.676 Mb. AMELY is located on the Y chromosome at 6.441 Mb. Amplification of Amelogenin generates products of different lengths from the X and Y chromosomes.

Observed heterozygosity and expected heterozygosity
A higher heterozygosity means that more allele diversity exists and therefore there is less chance of a random sample matching. Observed and expected heterozygosity across all 20 loci are presented in Table 4, and the observed heterozygosity varied among the studied populations, as illustrated in Table 5. The observed heterozygosity in a population depends on the number and the frequency of alleles at each locus. Moreover, the distribution of genotypes in a population sample may deviate from HWE expectation in a number of ways. These include an excess of homozygotes (and a corresponding lack of heterozygotes), an excess (or deficiency) of one or more classes of heterozygotes, or a combination of those states. There are populations with low heterozygosity, lower than 65% in most tested loci.
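A minimal sketch of how observed and expected heterozygosity can be obtained from a list of genotypes is given below. It assumes the uncorrected estimator He = 1 − Σp², whereas software such as Arlequin may apply a small-sample correction; the function names and example genotypes are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Ho: proportion of individuals carrying two different alleles."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2) under Hardy-Weinberg proportions (uncorrected)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical genotypes at one locus for four individuals.
example_genotypes = [("8", "9"), ("9", "9"), ("8", "11"), ("9", "11")]
ho = observed_heterozygosity(example_genotypes)
he = expected_heterozygosity(example_genotypes)
```

An Ho noticeably below He at a locus corresponds to the excess of homozygotes mentioned above and is one of the patterns an HWE test is designed to detect.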

Paternity index
The probability that a randomly selected man could pass on the obligate allele is determined from a database that lists the frequency distribution of individual alleles within a given genetic system. The combined paternity index is an odds ratio that indicates how many times more likely it is that the alleged father is the biological father than a randomly selected unrelated man of similar ethnic background. The paternity index was high for all STR loci analyzed; it ranged from 2.651 (TPOX) to 2.864 (D21S11).
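Per-locus values of this kind are often reported as a typical paternity index, (H + h)/(2H), where H and h are the numbers of homozygous and heterozygous individuals. Whether this is the formula behind the values above is an assumption, since the text does not state it; the function name and the example counts are hypothetical.

```python
def typical_paternity_index(n_homozygotes, n_heterozygotes):
    """Typical paternity index (H + h) / (2H), assumed formula."""
    if n_homozygotes <= 0:
        raise ValueError("at least one homozygote is needed for a finite index")
    return (n_homozygotes + n_heterozygotes) / (2 * n_homozygotes)


# Hypothetical locus: 80 homozygotes and 320 heterozygotes among 400 individuals.
example_tpi = typical_paternity_index(80, 320)
```

Under this definition the index depends only on the homozygosity of the locus, so loci with fewer homozygotes give higher values.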

Random match probability
The match probability is the probability of a random match between two unrelated individuals drawn from the same population. It is the sum of the squared frequencies of all genotypes; it ranged from 0.011 to 0.168.
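The definition above, the sum of the squared genotype frequencies, translates directly into code. The sketch below is illustrative only; the function name and the example genotypes are hypothetical.

```python
from collections import Counter


def matching_probability(genotypes):
    """Sum of squared genotype frequencies at one locus."""
    # Sort the alleles so that ("8", "9") and ("9", "8") count as one genotype.
    counts = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    n = len(genotypes)
    return sum((count / n) ** 2 for count in counts.values())


# Hypothetical genotypes at one locus for four individuals.
example_genotypes = [("8", "9"), ("9", "8"), ("9", "9"), ("8", "11")]
pm = matching_probability(example_genotypes)
```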

Power of discrimination
The power of discrimination was 75% for TPOX, ranged from 80 to 89% for the D3S1358, D13S317, D5S818, D12S391, vWA, Penta D, D16S539, D1S1656 and CSF1PO loci, and ranged from 91 to 96% for the rest of the loci. The highest PD observed in some populations is presented in Table 6. The Penta E and Penta D loci included in the PowerPlex® 21 PCR amplification kit were not typed in the Turkish, Emirati, Iranian or Qatari populations because different kits were used in those genotyping studies. The Combined Discrimination Power (CDP) of the 20 STR loci for the population of the middle and south of Iraq was calculated as 0.999999972. These results mean that these loci can be safely used to establish a DNA-based database for the Iraqi population.
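A combined discrimination power is conventionally obtained from the per-locus values as 1 − Π(1 − PD_i), which assumes that the loci are independent. The sketch below illustrates that convention with hypothetical PD values; it is not a recalculation of the figure reported in this study.

```python
import math


def combined_power_of_discrimination(pd_values):
    """CPD = 1 - product(1 - PD_i), assuming independent loci."""
    return 1.0 - math.prod(1.0 - pd for pd in pd_values)


# Hypothetical per-locus PD values for three loci.
example_cpd = combined_power_of_discrimination([0.75, 0.80, 0.96])
```

Because the per-locus non-discrimination probabilities multiply, each additional independent locus raises the combined value, even if the locus is only moderately informative on its own.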

Chance of exclusion
The Power of Exclusion (PE) can be calculated to express how rare it would be to find a random man who could not be excluded as the biological father of the child (Fisher, 1951; Chakraborty and Stivers, 1996; Butler,

Table 1. Information on 21 autosomal STR loci present in the PowerPlex® 21 System kit. Adapted from /www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists. The 13 CODIS core loci are highlighted in bold font.