Searching for DNA sequence similarity is a crucial step in many evolutionary analyses, and several bioinformatic tools are available for this task. The basic local alignment search tool (BLAST) is the most commonly used of these and is highly efficient. However, it often fails to identify sequences showing very weak similarity. An alternative is the Dot Plot, but this graphical method is not suitable for analyzing large sequences (e.g., hundreds of kilobases), which is increasingly required in the context of genome sequencing programs. As an alternative to the classical Dot Plot method, we designed UniDPlot, which allows searching for weak similarity either between two large sequences (e.g., genome regions) or between one large sequence and a short one (e.g., exons). The UniDPlot methodology contracts the output of the Dot Plot similarity matrix along the length of the larger sequence, while defining statistical limits of significance using a bootstrap procedure. To illustrate the efficiency of this method, we used UniDPlot to investigate the fate, in chicken, of the gene that encodes the major enamel protein, amelogenin. Although we showed that amelogenin was inactivated through a pseudogenization process, we recovered its entire sequence in the chicken genome. Using UniDPlot, we thus identified a pseudogene that was not detected by classical methods. UniDPlot can be used to search for missing genes, or for motifs of various sizes, in different genomic contexts.
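The idea of contracting a Dot Plot matrix to one dimension and setting a significance limit by bootstrap can be illustrated with a minimal Python sketch. It is not the UniDPlot implementation: the function names `window_scores` and `bootstrap_threshold`, the identity-count scoring, the choice of the maximum as the contraction, and the shuffling of the short sequence as the resampling scheme are all assumptions made for illustration.

```python
import random


def window_scores(long_seq, short_seq, window):
    """Contract a windowed dot-plot matrix to a 1-D profile along long_seq.

    Assumption: each cell is the number of identical bases between two
    windows, and the contraction keeps the best cell per long_seq position.
    """
    if window < 1 or window > min(len(long_seq), len(short_seq)):
        raise ValueError("window must fit inside both sequences")
    n_long = len(long_seq) - window + 1
    n_short = len(short_seq) - window + 1
    profile = []
    for i in range(n_long):
        best = 0
        for j in range(n_short):
            # Identity count for the window pair (i, j) of the dot-plot matrix.
            matches = sum(
                1 for k in range(window) if long_seq[i + k] == short_seq[j + k]
            )
            best = max(best, matches)
        profile.append(best)
    return profile


def bootstrap_threshold(long_seq, short_seq, window,
                        n_boot=200, quantile=0.95, seed=0):
    """Estimate a significance limit from shuffled copies of short_seq.

    Assumption: shuffling preserves base composition but destroys order,
    so the maxima of the shuffled profiles approximate chance similarity.
    """
    rng = random.Random(seed)
    letters = list(short_seq)
    maxima = []
    for _ in range(n_boot):
        rng.shuffle(letters)
        maxima.append(max(window_scores(long_seq, "".join(letters), window)))
    maxima.sort()
    index = min(len(maxima) - 1, int(quantile * len(maxima)))
    return maxima[index]


# Hypothetical toy data: a short motif embedded in a longer sequence.
long_seq = "ACGT" * 5 + "GGATTACAGG" + "TTGCA" * 4
short_seq = "GGATTACAGG"
profile = window_scores(long_seq, short_seq, window=10)
limit = bootstrap_threshold(long_seq, short_seq, window=10, n_boot=50)
candidates = [pos for pos, score in enumerate(profile) if score > limit]
```

Positions in `candidates` are those whose contracted score exceeds the bootstrap limit. On real genomic regions this quadratic loop would be too slow, and both the scoring scheme and the resampling would need to follow the published method.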
Key words: DNA sequence similarity, UniDimensional plot (UniDPlot) software, genome.
Copyright © 2021 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0