The 21st-century emergence of genomic medicine is shifting the paradigm in biomedical science from the population phenotype to the individual genotype. In population-genetic studies of the biology of disease and health disparities, human populations are often defined by the most common alleles in the group. This definition makes it difficult to categorize individuals in the population who do not carry the most common allele(s). Various epidemiological studies have shown an association between common genomic variation, such as single nucleotide polymorphisms (SNPs), and common diseases. We hypothesize that information encoded in the structure of SNP haploblock variation in the human leukocyte antigen-disease related (HLA-DR) region of the genome illuminates molecular pathways and cellular mechanisms involved in the regulation of host adaptation to the environment. In this paper, we describe the development and application of the normalized information content (NIC), a novel metric based on SNP haploblock variation. The NIC translates biochemical DNA sequence variation into a biophysical quantity that is derived from Boltzmann’s canonical ensemble in statistical physics and is widely used in information theory. Our normalization of this information metric allows comparison of unlike, or even unrelated, regions of the genome. We report here NIC values calculated for HLA-DR SNP haploblocks constructed with Haploview, a product of the International Haplotype Map Project. These haploblocks were scanned for potential regulatory elements using ConSite and miRBase, two publicly available bioinformatics tools. We found that all of the haploblocks with statistically low NIC values contained putative transcription factor binding sites and microRNA motifs, suggesting a correlation with genomic regulation.
Thus, we were able to relate a mathematical measure of information content in HLA-DR SNP haploblocks to biologically relevant functional knowledge embedded in the structure of DNA sequence variation. We submit that the NIC may be useful in analyzing the regulation of molecular pathways involved in host adaptation to environmental pathogens and in decoding the functional significance of common variation in the human genome.
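To make the idea of a normalized information measure concrete, the following is a minimal illustrative sketch and not the definition of the NIC used in this paper. It assumes that a haploblock is summarized by the haplotype strings observed in a sample, that information is measured as Shannon entropy in bits, and that normalization divides by the maximum entropy attainable for the number of distinct haplotypes observed. The function name `normalized_entropy` and the example haplotypes are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(haplotypes):
    """Shannon entropy of haplotype frequencies, scaled to the range [0, 1].

    Assumption (not taken from the paper): the entropy is divided by
    log2(k), where k is the number of distinct haplotypes observed, so
    that blocks with different numbers of haplotypes can be compared.
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    k = len(counts)
    if k < 2:
        # Zero or one distinct haplotype carries no uncertainty.
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    return entropy / math.log2(k)


# Hypothetical haplotype samples for two blocks of different lengths.
block_a = ["ACG", "ACG", "TCG", "TCG"]       # two haplotypes, equally common
block_b = ["ACGTT"] * 9 + ["ACGTA"]          # one dominant haplotype

print(normalized_entropy(block_a))
print(normalized_entropy(block_b))
```

Because the result is dimensionless and bounded, values for blocks with different lengths or different numbers of haplotypes fall on a common scale, which is the property the normalization described above is meant to provide.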
Key words: Information theory, entropy, genomic variation, biological information.
Copyright © 2021 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0