Querying large databases requires new approaches that reduce the amount of information returned. We present here a new way to query our SQUAT database. SQUAT contains formal concepts, each representing an association between a set of genes that are simultaneously overexpressed and the biological situations in which those genes are overexpressed. We explored the relevance of querying “self-explaining” formal concepts that obey a double constraint: (1) the concept should contain, among its genes, at least one transcription factor (TF), and (2) at least one gene in the concept should contain in its promoter a transcription factor binding site (TFBS) for the identified TF. The present work demonstrates that: (1) such “self-explaining” formal concepts exist in SQUAT; (2) mining only those “self-explaining” formal concepts greatly reduces the number of concepts that have to be analyzed; and (3) two such “self-explaining” concepts were analyzed further, and their biological relevance was demonstrated.
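The double constraint can be illustrated with a minimal sketch. It is not the SQUAT implementation: the `FormalConcept` class, the function names, and the toy gene identifiers (`TF_A`, `GENE_1`, and so on) are all hypothetical. The sketch also assumes that a TF's own promoter may satisfy constraint (2), which the text neither states nor excludes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormalConcept:
    """Hypothetical container: co-overexpressed genes and their situations."""
    genes: frozenset
    situations: frozenset


def explaining_tfs(concept, transcription_factors, promoter_tfbs):
    """Return the TFs of the concept that satisfy both constraints.

    transcription_factors: set of gene names known to encode a TF.
    promoter_tfbs: dict mapping a gene name to the set of TFs that have
                   a binding site (TFBS) in that gene's promoter.
    """
    hits = set()
    # Constraint (1): the TF must be one of the concept's genes.
    for tf in concept.genes & transcription_factors:
        # Constraint (2): at least one gene of the concept carries a TFBS
        # for that TF (assumption: the TF gene itself is allowed).
        if any(tf in promoter_tfbs.get(gene, set()) for gene in concept.genes):
            hits.add(tf)
    return hits


def is_self_explaining(concept, transcription_factors, promoter_tfbs):
    """True if at least one TF of the concept satisfies both constraints."""
    return bool(explaining_tfs(concept, transcription_factors, promoter_tfbs))


# Toy data, invented for illustration only.
tfs = {"TF_A", "TF_B"}
tfbs = {"GENE_1": {"TF_A"}, "GENE_3": {"TF_A"}}

c1 = FormalConcept(frozenset({"TF_A", "GENE_1", "GENE_2"}), frozenset({"S1"}))
c2 = FormalConcept(frozenset({"GENE_2", "GENE_3"}), frozenset({"S2"}))
c3 = FormalConcept(frozenset({"TF_B", "GENE_3"}), frozenset({"S1", "S3"}))

# Keep only the concepts that pass the double constraint.
selected = [c for c in (c1, c2, c3) if is_self_explaining(c, tfs, tfbs)]
```

In this toy example only `c1` is retained: `c2` contains no TF, and `c3` contains `TF_B` but no gene whose promoter carries a binding site for it.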
Key words: Data mining, gene expression, large database, formal concepts.
Copyright © 2021 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0