Estimating the size of Arabic indexed web content

Abdulrahman Alarifi; Mansour Alghamdi; Mohammad Zarour; Batoul Aloqail; Heelah Alraqibah; Kholood Alsadhan; Lamia Alkwai

doi:10.5897/SRE11.1708

Scientific Research and Essays

Abbreviation: Sci. Res. Essays
Language: English
ISSN: 1992-2248
DOI: 10.5897/SRE
Start Year: 2006
Published Articles: 2768

Full Length Research Paper


Abdulrahman Alarifi*, Mansour Alghamdi, Mohammad Zarour, Batoul Aloqail, Heelah Alraqibah, Kholood Alsadhan and Lamia Alkwai

Computer Research Institute, King Abdulaziz City for Science and Technology, P. O. Box 6086, Riyadh 11442, Riyadh, Saudi Arabia.

Article Number - D973D1028832
Vol. 7(28), pp. 2472-2483, July 2012
https://doi.org/10.5897/SRE11.1708

Accepted: 07 June 2012
Published: 26 July 2012

Copyright © 2024 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.

Abstract

Various initiatives to increase Arabic Web content have been undertaken in recent years, and search engines now report that the Arabic portion of the Web has grown relative to overall Web content. An accurate estimate of the size of Arabic Web content is crucial for those interested in studying and enriching it. In this paper, we propose a statistics-based system to estimate the size of Arabic indexed Web content using three popular search engines: Google, Yahoo and Bing. Our system selects sample words from an Arabic corpus to estimate the size of the Arabic Web content indexed by each search engine and the overlap among them. We used Arabic Wikipedia as the corpus, as it provides diversified content accessed by a large number of Internet users. Our results show that, as of December 2010, the Arabic indexed Web content comprised an estimated 2 to 2.1 billion pages.

Key words: World Wide Web, the Web, search engine, index size, Arabic content, Internet, corpus.
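
The abstract does not give the estimation formula, so the following is only a minimal sketch of how a corpus-based index size estimate is commonly computed, not the authors' actual system. It assumes that the fraction of corpus documents containing a word approximates the fraction of indexed pages containing it, and that two engines' indexes can be combined by inclusion-exclusion. The function names, the placeholder words, and all numbers are hypothetical illustrations, not data from the paper.

```python
from statistics import median


def estimate_index_size(hit_counts, corpus_doc_freq, corpus_size):
    """Estimate the number of pages in a search engine index.

    Assumption (not from the paper): if a word occurs in df of the
    corpus_size corpus documents, and the engine reports `hits` pages
    for it, then index size is roughly hits * corpus_size / df.
    The median over all sample words damps outliers.
    """
    estimates = []
    for word, hits in hit_counts.items():
        df = corpus_doc_freq.get(word, 0)
        if df > 0:  # skip words that never occur in the corpus
            estimates.append(hits * corpus_size / df)
    if not estimates:
        raise ValueError("no sample word occurs in the corpus")
    return median(estimates)


def estimate_union(size_a, size_b, overlap_fraction_of_a):
    """Combine two index sizes by inclusion-exclusion.

    overlap_fraction_of_a is the (hypothetical) measured share of
    engine A's pages that engine B also indexes.
    """
    return size_a + size_b - overlap_fraction_of_a * size_a


# Hypothetical sample: placeholder words, made-up counts.
doc_freq = {"word1": 100, "word2": 50, "word3": 200}
hits = {"word1": 1000, "word2": 600, "word3": 1800}
size = estimate_index_size(hits, doc_freq, corpus_size=1000)
combined = estimate_union(size, 8000, overlap_fraction_of_a=0.5)
```

Taking the median rather than the mean is a design choice of this sketch: reported hit counts for individual words can be far off, and the median keeps a single bad word from dominating the estimate.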

