Next-generation sequencing platforms for latest livestock reference genome assemblies

Next-generation sequencing (NGS), or high-throughput sequencing, refers to a range of modern sequencing technologies that allow scientists to sequence millions to billions of reads in a single run, far more quickly and cheaply than the previously used Sanger (first-generation) sequencing method. NGS platforms are classified as second-generation platforms, which require polymerase chain reaction (PCR) amplification, and third-generation platforms, which do not require PCR amplification before sequencing. The appearance of NGS technologies has revolutionized the genome sequencing of organisms, including farm animals. These platforms give scientists access to the latest and most detailed information about genetic markers responsible for economically important traits. Identifying candidate genes responsible for these traits in different species can bring down the overall cost of livestock breeding by improving productivity and disease resistance. Even though the output and error rate of third-generation sequencing platforms (Pacific Biosciences and Oxford Nanopore) remain to be improved, they offer long read lengths with relatively cheap sequencing and easy sample preparation. Oxford Nanopore MinION devices are the most portable third-generation DNA sequencers available: about the size of a small cell phone, they can be plugged into the USB port of a laptop.


INTRODUCTION
Advances in molecular genetics are transforming the animal breeding industry, which has become big business since these methods were first applied in the last few decades. Molecular genetics has been used to improve the efficiency of reproduction, to determine the genetic value of animals (genetic markers, candidate genes) (Singh et al., 2014) and to support animal breeding decisions (William, 2016). Until recently, first-generation sequencing technologies were widely used to sequence the DNA of many organisms (Franca et al., 2002). However, the limitations of these technologies in throughput, scalability, time, cost and resolution led commercial manufacturers to develop ground-breaking next-generation sequencing (NGS) technologies. Major NGS methods widely used in many laboratories include Roche/454 (www.454.com), Illumina Genome Analyzer (https://www.illumina.com), Ion Torrent sequencing (https://www.thermofisher.com) and the Applied Biosystems SOLiD™ System (www.marketing.appliedbiosystems.com), which are called second-generation sequencing technologies. Pacific Biosciences (https://www.pacificbiosciences.com) and Oxford Nanopore (https://www.nanoporetech.com) are the most recent platforms and are classified as third-generation sequencing technologies. Together, these new sequencing technologies are known as "Next Generation Sequencing Technologies".
NGS enables researchers to generate large amounts of genomic data cost-effectively and in a short period, even without any amplification of the DNA molecules. These technologies have enabled researchers to identify DNA regions across the whole genome [quantitative trait loci (QTL)] that are associated with a particular trait of interest. Single nucleotide polymorphism (SNP) genotyping is among the most powerful and accessible technologies and provides useful genetic tools for animal breeding programs (Seidel, 2010; Koopaee and Koshkoiyeh, 2014). Over the past five years, large numbers of SNPs have been discovered in livestock species through whole-genome association studies (WGAS) at relatively low cost (Wiggans et al., 2017). Although the diversity of livestock species limits the application of sequence data to all of them, quick, accurate and low-cost access to animal genome sequence data through NGS has many applications in livestock husbandry. DNA microarrays, chromatin immunoprecipitation (ChIP) or chromatin immunoprecipitation sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole-genome genotyping, de novo assembly and reassembly of genomes, genome-wide structural variation analysis, mutation detection and carrier sequencing are among the recent technologies used to improve livestock productivity (Bai et al., 2012; Diaz-Sanchez et al., 2013). Currently, many genome sequencing centers and scientists are busy generating up-to-date genomic information for several species using the latest NGS versions. Livestock species such as poultry are among those that have received considerable attention from genome sequencing projects. Access to the latest genome sequencing tools, methods and genomic information helps livestock scientists to elucidate complex traits and to use the information gained in livestock breeding programs. Thus, this paper aims to summarize the most important available sequencing platforms and the latest livestock genome assemblies produced using those sequencing technologies.

Genomic technologies in livestock breeding
Estimated breeding values (EBVs) for traits of interest are the main criteria for selecting the individual animals that will be the parents of the next generation (Dekkers, 2012).

Gatew and Tarekegn 1233
Traditionally, selection of individuals for genetic improvement of livestock breeds has been based on EBVs derived from phenotypic characteristics only. Although this classical method has contributed significantly to livestock genetic improvement, it has several limitations: a long wait until an individual expresses the phenotype, the need for a large population, the high cost of phenotype recording, strong environmental effects, low accuracy, and low effectiveness for sex-limited traits and for complex, low-heritability traits such as fertility and reproduction (Bhat et al., 2016; MacNeil, 2016). To overcome these limitations, scientists and manufacturing companies have investigated and established genomic technologies such as genome-wide association, genome sequencing, genetic markers and genotyping techniques. Genomic technologies provide powerful information for animal breeding programs by characterizing and mapping the loci that affect a trait of interest, particularly complex quantitative traits (Koopaei and Koshkoiyeh, 2011; Fleming et al., 2018). The advancement of genomic technologies has boosted the ability of breeders to map genes for economically important traits such as feed efficiency, milk yield, beef quality, health and behavior (Blasco and Toro, 2014). Moreover, selection decisions can be made at an early age, even for sex-limited and low-heritability traits, at relatively low cost and with good accuracy. Next-generation sequencing technologies are new molecular techniques for improving livestock performance.

Next generation sequencing technologies for livestock industry
Genomes are the blueprint of life: they control an organism's structural characteristics and performance throughout its lifetime, through single genes or through multiple genes situated at different loci. DNA sequencing technologies are important tools that give breeders both the structural and the functional characteristics of genomes (Bai et al., 2012). Before the introduction of next-generation sequencing techniques in 2005, DNA sequencing was carried out using the Maxam and Gilbert (1977) and Sanger et al. (1977) sequencing methods (Heather and Chain, 2016).
The first method uses chemicals to break up DNA in order to determine its sequence, while the Sanger method works by making copies of DNA strands and monitoring which nucleotides are added. Because these methods were slow, expensive and time-consuming, new techniques appeared that offered high throughput from multiple samples at a lower cost than the previous techniques (Reuter et al., 2015); these are known as "Next Generation Sequencing (NGS) Technologies" or "High Throughput Sequencing Technologies" (Kchouk et al., 2017). NGS techniques are entirely new technologies that give livestock scientists the opportunity to extract essential genetic information, such as SNPs and multiple candidate genes, at one time and at a much lower cost than first-generation sequencing methods (Ansorge, 2016).
Assembling an individual animal's entire genome sequence, or specific region(s) of interest, is very important for livestock breeders because it provides more timely and accurate information for improving the quality of their herds (Dekkers, 2012; Taylor et al., 2016). Genome sequences of major livestock species such as poultry, cattle, horse and sheep, generated with high-throughput automated sequencing techniques, are either completed or nearing completion, and SNP libraries for these species are growing rapidly (Bai et al., 2012). This has enabled livestock scientists to apply genomic selection in livestock breeding programs. However, these applications still face some challenges, such as incorporating linkage disequilibrium (LD) information from HapMap projects, data storage, and especially appropriate statistical analysis of high-dimensional structured genomic data (Fan et al., 2010). Therefore, breeders have enhanced the response to selection for these traits by selecting individuals according to records and pedigree information.
NGS platforms can be categorized into second- and third-generation sequencing technologies based on when they appeared. Among the NGS technologies listed earlier, the latter two platforms (Pacific Biosciences and Oxford Nanopore) are the most recent and are classified as third-generation sequencing technologies, while the rest are second-generation technologies (Pareek et al., 2011). Third-generation NGS technologies sequence individual DNA molecules without a prior amplification step, that is, they perform single long-molecule sequencing without clonal amplification, whereas second-generation NGS platforms rely on PCR to grow clusters of a given DNA template, which is their drawback (Khodakova et al., 2016).

Illumina (Solexa) sequencing
Illumina has contributed a great deal to the advancement of sequencing platforms in terms of simplicity, flexibility and capacity, so that they can be applied in human and animal genomics research. Its goal is to apply innovative sequencing technologies to the analysis of genetic variation and function. The sequencing-by-synthesis approach used by Illumina currently dominates the sequencing industry: more than 90% of the world's DNA sequencing data is generated by Illumina. Before introducing the MiSeq and HiSeq platforms, which can sequence up to 15 and 600 Gbp respectively, Illumina purchased Solexa in 2007; the Solexa sequencer had been released in 2005 (Barba et al., 2014) and was capable of sequencing 1 Gbp in a single run (Illumina). The Illumina (Solexa) sequencer is based on sequencing-by-synthesis (SBS) chemistry, which identifies single bases as they are incorporated into DNA strands. Solexa sequencing uses four proprietary, fluorescently labeled, modified nucleotides and a special DNA polymerase enzyme to sequence the millions of clusters present on the flow cell surface (Heather and Chain, 2016).

Roche/454 sequencing
The first next-generation DNA sequencing machine relied on the detection of light through the pyrosequencing method, which was developed in 1996 at the Royal Institute of Technology in Stockholm and introduced to the market by 454 Life Sciences in 2005 (https://www.454.com); it was upgraded to the GS FLX Titanium series three years later (Pillai et al., 2017). Pyrosequencing, a "sequencing by synthesis" method, is based on the detection and quantification of DNA polymerase activity, which is carried out using the enzyme luciferase.
The Roche/454 sequencing platform sequenced 580,069 bp of the Mycoplasma genitalium genome at 96% coverage and 99.96% accuracy in a single run for the first time. This system was the first NGS technology to sequence a complete human genome, producing 400 Mb per run with a maximum read length of 450 bp at the beginning, later increased to about 700 bp (Berglund et al., 2011). Roche/454 sequencing has also been successful for both confirmatory sequencing and de novo sequencing (Fakruddin et al., 2013).

Ion Torrent sequencing
The Ion Personal Genome Machine (PGM) sequencing platform was commercialized by Life Technologies, now Thermo Fisher, in 2010. It is one of the sequencing technologies that contributed to large-scale transcriptome studies in the last decade (Yuan et al., 2016). The platform offers several different types of chips and instruments to increase its performance. The throughput of these chips ranges from 50 Mb to 15 Gb, with runtimes between 2 and 7 h (Goodwin et al., 2016). So far the system has been used mainly on small genomes and for targeted sequencing. However, a new system and new chips have been announced that will allow it to push into the high-throughput territory of whole-genome sequencing. Ion Torrent launched its follow-on system, the Ion Proton, in 2012; it allows for the larger, higher-density chips needed for exome and whole-genome sequencing (https://www.thermofisher.com).
The Ion Torrent sequencing platform employs a technique analogous to pyrosequencing, but unlike other second-generation technologies it does not use an enzymatic detection reaction or optically detected, fluorescently labeled nucleotides (Rothberg et al., 2011; Salipante, 2014). It detects the release of hydrogen ions (H+), a by-product of nucleotide incorporation, as a quantifiable change in pH through a novel coupled silicon detector (Quail et al., 2012). The resulting change in pH is detected by an integrated complementary metal-oxide-semiconductor (CMOS) and ion-sensitive field-effect transistor (ISFET) sensor. The detected pH change is converted into a voltage signal that is roughly, though imperfectly, proportional to the number of nucleotides incorporated (Goodwin et al., 2016).
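The proportionality between signal and incorporated nucleotides can be illustrated with a minimal Python sketch. It assumes that each flow signal has already been normalized so that one incorporated nucleotide gives a value close to 1.0, and it uses a hypothetical repeating flow order "TACG"; neither the function name nor these values come from the platform's software. Because the signal is only imperfectly proportional, simple rounding of this kind is where homopolymer length errors would arise.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Convert normalized per-flow signals into a base sequence.

    signals    : one normalized signal per nucleotide flow (about 1.0 per base).
    flow_order : repeating order in which nucleotides are flowed
                 (hypothetical value chosen for illustration).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporated nucleotides;
        # a signal near 2.0 is read as a homopolymer of length 2, and so on.
        homopolymer_length = max(0, int(signal + 0.5))
        bases.append(nucleotide * homopolymer_length)
    return "".join(bases)


# Illustrative (made-up) signals for two cycles of T, A, C, G flows.
flow_signals = [1.05, 0.02, 2.10, 0.00, 0.00, 0.96, 0.03, 3.08]
sequence = call_flowgram(flow_signals)
print(sequence)
```

A signal that falls midway between two integers (for example 2.5) cannot be resolved reliably, which is consistent with the imperfect proportionality noted above.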

Applied Biosystems SOLiD sequencing
Sequencing by Oligo Ligation Detection (SOLiD) is one of the platforms of Life Technologies (Thermo Fisher). Like the Roche/454 and Ion Torrent platforms, SOLiD clonally amplifies templates by emulsion PCR, and like Illumina and Roche/454 it uses an optical detection system (Levy and Myers, 2016). The former platforms use sequencing-by-synthesis reactions, whereas the Applied Biosystems SOLiD platform uses ligation-mediated synthesis chemistry for sequencing (Valouev et al., 2008; Levy and Myers, 2016).
The SOLiD sequencing procedure is composed of a series of probe-anchor binding, ligation, imaging and cleavage cycles that elongate the complementary strand. For bead-based preparations, the method begins by attaching amplified DNA fragments to micro-beads. The beads are then deposited on a glass slide to which the DNA fragments can be fixed. The glass slide can be segmented into up to eight chambers to facilitate scaling up the number of samples analyzed. The 8-mer oligonucleotides, each with a fluorescent label at the end, are sequentially ligated to the DNA fragments. The resulting product is then removed and the process is repeated for 5 more cycles with hybridized primers. The fundamental properties of the SOLiD sequencing procedure that contribute to its high accuracy (https://www.appliedbiosystems.com) are as follows: two bases are interrogated in each ligation reaction, providing increased specificity; the primer is periodically reset for five independent rounds of extension, improving the signal-to-noise ratio of the system; each base is interrogated twice in independent primer rounds; and four dyes are used to encode the sixteen possible two-base combinations, a design that enables error checking.
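How four dyes can encode sixteen two-base combinations may be sketched as follows. The example uses the commonly described formulation of two-base (color-space) encoding, in which each base is given a 2-bit code and the color of a dinucleotide is the exclusive-or of the two codes; the function names and the numeric labels 0 to 3 for the dyes are assumptions made for illustration and are not taken from the vendor's software.

```python
# 2-bit codes for the bases; the color of a dinucleotide is the XOR of the codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(sequence):
    """Return one color (0-3) for every pair of adjacent bases."""
    return [
        BASE_TO_BITS[first] ^ BASE_TO_BITS[second]
        for first, second in zip(sequence, sequence[1:])
    ]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        previous_bits = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous_bits ^ color])
    return "".join(bases)


reference = "ATGGC"
variant = "ATCGC"  # single-base change at the third position

reference_colors = encode_colors(reference)
variant_colors = encode_colors(variant)
print(reference_colors, variant_colors)
print(decode_colors("A", reference_colors))
```

Because every base contributes to two adjacent colors, a true single-base change alters two neighboring colors (as in the example), whereas an isolated single-color difference is more likely a measurement error. This is the error-checking property referred to above.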

Pacific Biosciences single molecule real time (SMRT) sequencing
SMRT sequencing is a parallelized single-molecule real-time sequencing method developed by Pacific Biosciences (PacBio) of California, Inc. and introduced as a third-generation sequencing technique (Shin et al., 2013). The first commercially available long-read single-molecule platform was the PacBio RS system, marketed in 2011 (Nakano et al., 2017). It can be applied to whole-genome sequencing, targeted sequencing, complex population analysis, RNA sequencing and epigenetic characterization. This platform evolved into further systems, such as the RS II and Sequel, to correct the low-quality reads generated by the PacBio RS (Ardui et al., 2018), and it is the most widely used third-generation sequencing technology.
PacBio SMRT technology performs the sequencing reaction on silicon dioxide chips called Zero-Mode Waveguides (ZMWs) (Eid et al., 2009; Ambardar et al., 2016). The platform relies on a sequencing-by-synthesis approach and real-time detection of fluorescently labeled nucleotides as they are incorporated. ZMWs exploit the properties of light passing through openings with a diameter smaller than its wavelength. Each ZMW contains a DNA polymerase attached to its bottom, together with the target DNA fragment to be sequenced. The fluorescent dye of each incorporated nucleotide can be identified while the polymerase synthesizes the complementary strand at its normal speed. Detecting the labeled nucleotides makes it possible to determine the DNA sequence.
Key advantages of the SMRT sequencing platform over other sequencing technologies, as explained by Roberts et al. (2013), include long read lengths (for de novo assemblies of novel genomes); direct measurement of individual molecules; template preparation without PCR amplification; recording of the kinetics of each nucleotide incorporation reaction; simpler and improved genomic assembly; and better understanding of disease heritability. However, PacBio SMRT technology is limited by markedly higher error rates (Carneiro et al., 2012; Ardui et al., 2018).

Nanopore single molecule sequencing
Nanopore sequencing is the most recent third-generation technology, released by the UK-based company Oxford Nanopore Technologies in 2014 (Lee et al., 2016). This company develops and sells nanopore sequencing products (including the portable DNA sequencer, MinION) for the direct, electronic analysis of single molecules (https://www.nanoporetech.com). The MinION was first announced at the Advances in Genome Biology and Technology (AGBT) conference in Florida in February 2012, but it was not publicly available until an early access program known as MAP (MinION Access Program) began in April 2014 (Timp et al., 2012). The MinION is an inexpensive, pocket-sized, portable, high-throughput sequencing apparatus that is powered from the USB port of a laptop computer and produces real-time data. The device offers multi-kilobase reads and a streamed mode of operation that allows reads to be processed as they are generated. This technology is capable of generating very long reads (about 50,000 bp) with minimal sample preparation (Wang et al., 2015).
Nanopore sequencing was developed to overcome the limitations of short-read sequencing technologies and to enable sequencing of large DNA molecules in a short period of time from easily prepared libraries. A nanopore is simply a small hole with an internal diameter of about 1 nm. The nanopore is built into an electrically resistant artificial membrane, and a voltage is applied across the membrane. DNA molecules are prepared according to one of the standard library preparation protocols, which involve attaching a leader adaptor and a motor protein to one strand of DNA (Magi et al., 2017). During sequencing, an ionic current passes through the pore and is partially blocked by the nucleotide passing through it (Diaz-Sanchez et al., 2013). If each passing nucleotide yields a characteristic residual ionic current, then the record of the current will correspond to the DNA sequence (Derrington et al., 2010); in other words, changes in the electric current indicate which base is present.
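The idea that a characteristic residual current identifies each base can be illustrated with a deliberately simplified Python sketch. The current levels below are hypothetical values chosen only for illustration, and the sketch assumes that the trace has already been segmented into one current value per nucleotide; actual base calling works on noisy signals influenced by several neighboring nucleotides at once and is considerably more complex.

```python
# Hypothetical residual current levels (in picoamperes) for each base.
# These numbers are illustrative only and are not measured values.
HYPOTHETICAL_LEVELS_PA = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}


def call_bases(current_trace_pa):
    """Assign each current value to the base with the nearest reference level."""
    called = []
    for value in current_trace_pa:
        nearest_base = min(
            HYPOTHETICAL_LEVELS_PA,
            key=lambda base: abs(HYPOTHETICAL_LEVELS_PA[base] - value),
        )
        called.append(nearest_base)
    return "".join(called)


# One (made-up) current measurement per nucleotide passing through the pore.
trace = [51.2, 78.9, 69.5, 61.0, 49.0]
print(call_bases(trace))
```

When the levels for different bases lie close together relative to the measurement noise, values are assigned to the wrong base, which gives an intuition for why error rates are a central concern for this technology.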
Nanopore DNA sequencing offers exciting potential advantages over short-read sequencing technologies, including sensitive detection from limited starting material, ultra-long reads, fast time to results, low cost and a small footprint (https://nanoporetech.com). Despite all these advantages, a high error rate (15 to 40%) was a problem for earlier versions of nanopore DNA sequencing technology (Magi et al., 2016).
To overcome the high error rate of the previous versions, a new MinION version for ultra-long single-molecule reads, named R9.4, became available (Jain et al., 2017), increasing median accuracy to 92% and greatly increasing yield, to 127,000 to 217,000 reads per flow cell across four sequenced flow cells (Leggett and Clark, 2017).
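Accuracy figures such as these are often easier to compare on the Phred quality scale, Q = -10 log10(error rate), which is the standard convention for reporting sequencing quality. The following minimal Python sketch converts the accuracy values quoted in this review to that scale; the function name is an assumption made for illustration, and the conversion treats each figure as a simple per-base accuracy.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred-scaled quality."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


# Accuracy values quoted in the text, expressed as fractions.
quoted_accuracies = {
    "Nanopore, earlier versions (40% error)": 0.60,
    "Nanopore, earlier versions (15% error)": 0.85,
    "MinION R9.4 (median)": 0.92,
    "Roche/454 (first reported run)": 0.9996,
}

for label, accuracy in quoted_accuracies.items():
    print(f"{label}: Q = {accuracy_to_phred(accuracy):.1f}")
```

On this scale a median accuracy of 92% corresponds to roughly Q11, and every additional 10 units represents a tenfold reduction in the error rate.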

Performance of next-generation DNA sequencers
Sequencing technology is evolving rapidly, and new versions of each commercially available DNA sequencing platform continue to be released. The older platforms and their newer versions show similarities and differences relative to one another in performance and in sequencing mechanism or operating principle (Table 1).

AVAILABLE GENOME SEQUENCES OF LIVESTOCK USING NGS TECHNOLOGIES
The discovery of conserved genome sequences in farm animals is opening encouraging opportunities for genetic improvement (primarily of production traits). In addition, genome sequencing can serve basic science (understanding the evolutionary relationships between species) and human health and medical research (Pool and Waddell, 2002; Alföldi and Lindblad-Toh, 2013). Following the official completion of the Human Genome Project in June 2000 (Venter et al., 2001), genomics research organizations became interested in the genomes of other species, including farm animals (Dodgson et al., 2010; Guo et al., 2018), in order to increase the ability of animal agriculture to provide high-quality, low-cost and safe animal products to the consumer. It is now possible to carry out genome sequencing, uncover a large number of genetic polymorphisms and report the estimated number of genes for most farm animal species using high-throughput next-generation sequencing (HT-NGS) platforms.
Chicken (Gallus gallus domesticus) was the first agricultural bird to have its genome sequenced and published (International Chicken Genome Sequencing Consortium, 2004). However, the initial sequence assembly treated the chicken as a model organism for phylogenetics and embryology that bridges humans and other vertebrates (Burt, 2007; Warren et al., 2017). A second round, consisting of sequencing of problem areas plus an assembly using additional linkage and radiation hybrid map data, was done in 2006. Subsequently, chicken genome sequencing reached a new level in order to provide a dense single nucleotide polymorphism (SNP) map. Different sequencing centers are collaborating to sequence the full genomes of the chicken and other common farm animals using high-throughput next-generation sequencing platforms, since most animal traits are complex and a single locus accounts for very little of the phenotypic diversity (Bolormaa et al., 2011). High-quality reference genome assemblies are also required in order to analyze the resulting sequencing data (Seemann et al., 2015). Consequently, the latest reference genomes for the most common farm animals, such as chicken, pig, cattle, sheep, goat and horse, have been partially or completely sequenced and are publicly available (Table 2).

CONCLUSIONS
In the past few decades, new (high-throughput) sequencing technologies have been widely used to sequence [...] However, they have a high error rate and relatively low throughput. Therefore, the performance of these two platforms needs to be improved over the current models, with a focus on minimizing the error rate and increasing data output. Currently, the latest reference genome sequences produced using NGS are available to livestock scientists for common farm animals, poultry and insects such as chicken, cattle, sheep, goat, horse and honey bee. However, it is difficult to obtain partial or full reference genome sequence information for donkey and camel, even though they are among the common farm animals. This indicates that donkey and camel are still neglected animals but play an

Table 1. Operating principles and performance of next-generation sequencing platforms.

Table 2. Latest representative genome sequences of livestock.