A hybrid multilevel text extraction algorithm in scene images

Tahani Khatib; Huda Karajeh; Hiba Mohammad; Lama Rajab

doi:10.5897/SRE2014.6146

Scientific Research and Essays

Abbreviation: Sci. Res. Essays
Language: English
ISSN: 1992-2248
DOI: 10.5897/SRE
Start Year: 2006
Published Articles: 2768

Full Length Research Paper

Tahani Khatib
Department of Computer Information Systems, King Abdullah II School for Information Technology, University of Jordan, 11942 Amman, Jordan.

Huda Karajeh
Department of Computer Information Systems, King Abdullah II School for Information Technology, University of Jordan, 11942 Amman, Jordan.

Hiba Mohammad
Department of Computer Information Systems, King Abdullah II School for Information Technology, University of Jordan, 11942 Amman, Jordan.

Lama Rajab
Department of Computer Information Systems, King Abdullah II School for Information Technology, University of Jordan, 11942 Amman, Jordan.

Article Number - 14908D250363
Vol. 10(3), pp. 105-113, February 2015
https://doi.org/10.5897/SRE2014.6146

Received: 15 December 2014
Accepted: 22 January 2015
Published: 15 February 2015

Abstract

Text in scene images often provides vital semantic information for visual content understanding, indexing, and analysis; as a result, text extraction has become a significant research area in image processing and computer vision. In this paper, we propose a new hybrid multilevel algorithm to extract text from various scene images. The algorithm first converts the Red-Green-Blue (RGB) image to grayscale for color reduction. Next, in the image preprocessing phase, it applies edge detection and mathematical morphological operations to extract edges. The resulting binary image then passes through three successive levels in a multilayer fashion. Connected component labeling and text candidate selection take place at each level, using different analysis criteria. We use the structural features of connected components as the basic criteria for selecting candidate text; these features include the area, width, length, and condense intensity mean of the connected components. Afterwards, horizontal projection profile analysis is used to further refine the candidate text areas and to eliminate non-text regions. The proposed algorithm is evaluated on a set of fifty images chosen from a well-known text locating test dataset, KAIST. Extensive experiments show high robustness under different environments, such as indoor, outdoor, shadow, night, and light, and for different text properties, such as varying font size and style and complex backgrounds and textures. The algorithm effectively extracts textual content from scene images, with high average Precision, Recall, and F-Score of 90.1, 99, and 94.3%, respectively.
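
As a quick consistency check on the reported figures, the short sketch below recomputes the F-Score from the stated Precision and Recall. It assumes the balanced F-measure (the harmonic mean of Precision and Recall); the abstract does not state which F-Score definition was used, and the function name `f_score` is illustrative only.

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract does not say which F-Score variant was used;
    the balanced (F1) form is assumed here.
    """
    return 2 * precision * recall / (precision + recall)


# Precision and Recall as reported in the abstract, in percent.
reported_precision = 90.1
reported_recall = 99.0

print(round(f_score(reported_precision, reported_recall), 1))  # prints 94.3
```

Under that assumption, the recomputed value agrees with the 94.3% reported above.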

Key words: Multilevel text extraction, hybrid text extraction, edge detection, connected components, text candidates, morphological operations, horizontal projection profile.
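
The stages named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not name the edge detector, the morphological operators, or any thresholds, so the forward-difference edge map, the 3x3 dilation, the BT.601 grayscale weights, and every numeric threshold below are assumptions chosen for illustration. The three-level structure and the intensity-mean feature are not reproduced, and all function names are hypothetical.

```python
from collections import deque


def to_gray(rgb):
    # Assumed BT.601 luma weights; the abstract does not specify the conversion.
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def edge_map(gray, thresh=60.0):
    # Stand-in edge detector: thresholded forward differences (hypothetical threshold).
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gray[y][min(x + 1, w - 1)] - gray[y][x]
            gy = gray[min(y + 1, h - 1)][x] - gray[y][x]
            edges[y][x] = 1 if abs(gx) + abs(gy) > thresh else 0
    return edges


def dilate(binary):
    # 3x3 binary dilation, one basic morphological operation.
    h, w = len(binary), len(binary[0])
    return [[1 if any(binary[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))) else 0
             for x in range(w)] for y in range(h)]


def connected_components(binary):
    # 8-connected labeling by breadth-first search; returns features per component.
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y0 in range(h):
        for x0 in range(w):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            comps.append({"area": len(pixels), "top": min(ys), "left": min(xs),
                          "height": max(ys) - min(ys) + 1,
                          "width": max(xs) - min(xs) + 1})
    return comps


def select_candidates(comps, min_area=4, max_area=500, max_aspect=10.0):
    # Hypothetical criteria on area and width/height ratio.
    return [c for c in comps
            if min_area <= c["area"] <= max_area
            and c["width"] / c["height"] <= max_aspect]


def horizontal_projection(binary):
    # Count of foreground pixels in each row.
    return [sum(row) for row in binary]


def text_bands(profile, min_count=1):
    # Group consecutive rows with at least min_count pixels into (start, end) bands.
    bands, start = [], None
    for y, count in enumerate(profile):
        if count >= min_count and start is None:
            start = y
        elif count < min_count and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands
```

A typical call chain would be `dilate(edge_map(to_gray(image)))`, followed by `connected_components`, `select_candidates`, and finally `text_bands(horizontal_projection(mask))` on the retained regions. In practice an image-processing library would replace the hand-written loops; pure Python is used here only to keep the sketch self-contained.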

This article is published under the terms of the Creative Commons Attribution License 4.0
