Search engine coverage of South African and Afrikaans websites

  • Johan Breytenbach, Department of Computer Science and Informatics, University of the Free State, Bloemfontein
  • Theo Macdonald, Department of Computer Science and Informatics, University of the Free State, Bloemfontein
Keywords: Internet, search engine, web search, coverage bias, language bias, website visibility, developing countries, Afrikaans

Abstract

Search engines are web-based systems used for retrieving information from the Internet. They have economic power because of their position between information providers and information seekers. Search engines can influence the flow of information – and with it possible business transactions – through the way information is indexed, stored, and presented as search results. If a search engine covers website content from one group of information providers (grouped by country or language) well, to the detriment of another group, this has economic implications for both groups. It is known that certain developed countries, and the language(s) of these countries, have better coverage than other developed countries and their languages. This study investigates for the first time the website coverage of a developing country, South Africa, and one of its indigenous languages, Afrikaans.

What does the existence of search engine country bias and/or linguistic bias imply for developing countries such as South Africa? South African information providers would have reason for concern if information seekers' attention were continually routed abroad by biased search engines. South African information seekers would also be done an injustice if they could not find local web content (or content in their indigenous languages) because of poor search engine coverage. Biased search results guide them away from cheaper, more convenient local information. Search engines are negatively affected in turn when users tire of poor local coverage and unwanted international search results and turn to other tools, such as local search engines, for information retrieval.

How severe are the effects of search engine bias on developing countries such as South Africa? The body of knowledge on search engine bias is very limited. This study is motivated by the argument that, given the possible negative economic implications of such bias for developing economies, more research on this topic is justified and urgently needed.

The study revealed that Western website content enjoys better coverage than South African website content. Further investigation showed that English website content enjoys better coverage than website content in Afrikaans. There is, therefore, a proven search engine bias in favour of Western developed countries and the English language.

Website visibility is also studied as a possible cause of search engine bias. It seems plausible that a relationship exists between the coverage of websites by search engines and how visible these websites are to a search engine's crawlers. To determine website visibility, the number of inlinks pointing to each sample domain was counted. This study shows that the higher visibility of websites from developed countries is a cause of search engine bias in favour of these websites.

With website visibility proven as a cause of bias, the study indicates that South Africa, and possibly other developing countries, is lagging far behind in the race to create highly visible websites surrounded by well-covered hyperlink structures – the kind of websites most likely to be covered by search engine crawlers. It is the responsibility of information providers from developing countries to create more hyperlinks between websites from their countries, and to create visible website content in their indigenous languages.

Search engine coverage bias has negative economic implications for developing countries such as South Africa. This paper investigates the severity of country-related coverage bias against websites from the South African domain(s) and the correlation between website coverage and website visibility. Other possible causes of coverage bias found in the literature include indexing algorithms, ranking algorithms, and lexicons that struggle with non-English content. Information providers' lack of knowledge about website coverage and search engine tools is discussed as another possible cause of country bias.
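
As an illustration of the kind of coverage-versus-visibility comparison described above, the following is a minimal sketch, not the authors' procedure. It assumes that coverage is measured as the fraction of a site's pages that a search engine has indexed, that visibility is the inlink count per sample domain, and that a Spearman rank correlation is an acceptable measure of association. The abstract does not state which statistic was used, and all function names here are hypothetical.

```python
from math import sqrt


def coverage_ratio(pages_indexed, pages_on_site):
    """Assumed coverage measure: share of a site's pages found in the index."""
    if pages_on_site <= 0:
        raise ValueError("pages_on_site must be positive")
    return pages_indexed / pages_on_site


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))
```

Given one inlink count and one coverage ratio per sample domain, `spearman(inlink_counts, coverage_ratios)` returns a value between -1 and 1. A value near 1 would be consistent with more visible domains also being better covered, although a correlation on its own does not establish that visibility causes coverage.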

Published
2010-01-13
Section
Original Research