Journal of Bioinformatics and Sequence Analysis

  • Abbreviation: J. Bioinform. Seq. Anal.
  • Language: English
  • ISSN: 2141-2464
  • DOI: 10.5897/JBSA
  • Start Year: 2009
  • Published Articles: 50

Full Length Research Paper

UniDPlot: A software to detect weak similarities between two DNA sequences

Marc Girondot1,2* and Jean-Yves Sire3
  1Laboratoire d’Écologie, Systématique et Évolution, UMR 8079 Centre National de la Recherche Scientifique, Université Paris Sud et ENGREF, 91405 Orsay cedex 05, France. 2Département de Systématique et Evolution, Muséum National d’Histoire Naturelle de Paris, 25 rue Cuvier, 75005 Paris, France. 3Université Pierre and Marie Curie-Paris 6, UMR 7138 "Systématique, Adaptation, Evolution", 7 quai St-Bernard, 75005 Paris, France.
Email: [email protected]

  •  Accepted: 21 June 2010
  •  Published: 31 October 2010

Abstract

Searching for DNA sequence similarity is a crucial step in many evolutionary analyses, and several bioinformatic tools are available for this task. The basic local alignment search tool (BLAST) is the most commonly used algorithm and is highly efficient. However, it often fails to identify sequences showing very weak similarity. An alternative is the Dot Plot, but this graphical method is not suitable for the analysis of large sequences (e.g., hundreds of kilobases), which is increasingly required in the context of genome sequencing programs. As an alternative to the classical Dot Plot method, we designed UniDPlot, which searches for weak similarity either between two large sequences (e.g., genome regions) or between one large sequence and a short one (e.g., exons). The UniDPlot methodology contracts the output of the Dot Plot similarity matrix along the length of the larger sequence and defines statistical limits of significance using a bootstrap procedure. To illustrate the efficiency of this method, we used UniDPlot to search for the fate of the gene that encodes the major enamel protein, amelogenin, in chicken. We show that amelogenin was invalidated through a pseudogenization process, yet we recovered its entire sequence in the chicken genome. Using UniDPlot, we identified a pseudogene that was not detected by classical methods. UniDPlot can be used to search for missing genes, or motifs of various sizes, in different genomic contexts.

Key words: DNA sequence similarity, UniDimensional plot (UniDPlot) software, genome.
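
The contraction and bootstrap steps described in the abstract can be illustrated with a minimal Python sketch. It is not the UniDPlot implementation: the abstract does not give the scoring scheme, the window size, the contraction rule or the resampling scheme, so everything below is an assumption made for illustration. The sketch scores windows by identity count, contracts the dot-plot matrix by taking the maximum over the shorter sequence, and resamples the shorter sequence with replacement to obtain a significance limit. All function and parameter names (`contracted_profile`, `bootstrap_threshold`, `window`, `n_boot`, `quantile`) are hypothetical.

```python
import random


def match_count(a, b):
    """Number of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b))


def contracted_profile(long_seq, short_seq, window):
    """Collapse a windowed dot-plot matrix to one score per position of long_seq.

    Entry (i, j) of the matrix is the identity count between the windows
    starting at long_seq[i] and short_seq[j]. The profile keeps, for each i,
    the maximum over all j (an assumed contraction rule).
    """
    if window < 1 or window > min(len(long_seq), len(short_seq)):
        raise ValueError("window must fit inside both sequences")
    short_windows = [short_seq[j:j + window]
                     for j in range(len(short_seq) - window + 1)]
    profile = []
    for i in range(len(long_seq) - window + 1):
        target = long_seq[i:i + window]
        profile.append(max(match_count(target, w) for w in short_windows))
    return profile


def bootstrap_threshold(long_seq, short_seq, window,
                        n_boot=200, quantile=0.95, seed=0):
    """Significance limit from resampled copies of short_seq.

    Each replicate draws the bases of short_seq with replacement (assumed
    resampling scheme) and records the highest profile score obtained by
    chance. The limit is the requested quantile of those maxima.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_boot):
        resampled = "".join(rng.choices(short_seq, k=len(short_seq)))
        maxima.append(max(contracted_profile(long_seq, resampled, window)))
    maxima.sort()
    index = min(int(quantile * n_boot), n_boot - 1)
    return maxima[index]


def significant_positions(long_seq, short_seq, window, **kwargs):
    """Positions of long_seq whose profile score exceeds the bootstrap limit."""
    threshold = bootstrap_threshold(long_seq, short_seq, window, **kwargs)
    profile = contracted_profile(long_seq, short_seq, window)
    hits = [i for i, score in enumerate(profile) if score > threshold]
    return hits, threshold
```

Under these assumptions, calling `significant_positions(genome_region, exon, window=20)` would return the start positions in the larger sequence whose best window score exceeds the chance level, together with that limit. The result is a one-dimensional profile along the larger sequence rather than a two-dimensional plot, which is what makes the approach practical for sequences of hundreds of kilobases. The pure-Python loops are quadratic in sequence length and would need vectorising for real genomic data.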