Scientific Research and Essays

  • Abbreviation: Sci. Res. Essays
  • Language: English
  • ISSN: 1992-2248
  • DOI: 10.5897/SRE
  • Start Year: 2006
  • Published Articles: 2768

Full Length Research Paper

Characterization of de novo assemblies of quasispecies from next-generation sequencing via complex network modeling

Mattia C. F. Prosperi1*, Sandro Meloni2,3, Iuri Fanti4, Stefano Panzieri3, Giovanni Ulivi3 and Marco Salemi1
  1 Department of Pathology, Emerging Pathogens Institute, Immunology and Laboratory Medicine, University of Florida, Gainesville, Florida, USA.
  2 Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain.
  3 Department of Computer Science and Automation, Faculty of Computer Science Engineering, University of Roma TRE, Rome, Italy.
  4 Clinic of Infectious Diseases, Catholic University of the Sacred Heart, Rome, Italy.

  •  Accepted: 03 August 2012
  •  Published: 23 August 2012

Abstract

Several worldwide pandemics, such as those of influenza, human immunodeficiency virus, and coronavirus, are caused by viral quasispecies. Characterizing the quasispecies harbored by a host is essential to unveil the mechanisms underlying pathogen evolution, infection, and spread at the epidemic level. Next-generation sequencing (NGS) produces many thousands of sequence fragments (reads) from a single sample, allowing full-genome sequencing at high resolution. This work introduces an original approach for the de novo assembly (reconstruction of a full genome without the need for a reference genome) of NGS reads into the quasispecies present in the sample, using biased random walks over an overlap graph. The proposed framework is shown to reconstruct viral quasispecies successfully at different levels of diversity, using both simulated and empirical data. In addition, a broad set of measures describing the topological properties of the overlap graphs is examined in order to highlight differences among the data sets and therefore among the population structures.

Key words: Next-generation sequencing, genome assembly, quasispecies, complex network, random walk, de novo assembly.
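
To make the idea of a biased random walk over an overlap graph more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that reads are nodes, that a directed edge links two reads when a suffix of the first matches a prefix of the second, and that the walk is biased in proportion to the overlap length. The abstract does not specify the bias actually used, and all function names and the `min_overlap` parameter are hypothetical.

```python
import random


def overlap_len(a, b, min_overlap):
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_overlap."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_overlap=3):
    """Directed graph as a dict: edge i -> j is weighted by the overlap length."""
    graph = {i: {} for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_len(a, b, min_overlap)
                if k:
                    graph[i][j] = k
    return graph


def biased_walk(reads, graph, start, rng):
    """Walk from `start`, choosing each unvisited successor with probability
    proportional to its overlap weight (an assumed bias), and merge the reads
    visited into a single sequence."""
    contig, node, visited = reads[start], start, {start}
    while True:
        choices = [(j, w) for j, w in graph[node].items() if j not in visited]
        if not choices:
            return contig
        nodes, weights = zip(*choices)
        nxt = rng.choices(nodes, weights=weights, k=1)[0]
        contig += reads[nxt][graph[node][nxt]:]  # append only the non-overlapping tail
        visited.add(nxt)
        node = nxt


# Toy reads (made up for illustration) that tile one short sequence.
reads = ["ACGTAC", "TACGGA", "GGATTC"]
graph = build_overlap_graph(reads, min_overlap=3)
print(biased_walk(reads, graph, 0, random.Random(0)))  # prints ACGTACGGATTC
```

In this toy case every read has at most one successor, so the walk is deterministic. With divergent variants in the sample, a read can have several successors, and repeated walks from the same start could then recover different candidate sequences.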