
Discussion Paper - Search Engines as E-Commerce Support Tools.

Date: June 2000

The different Types of Search Engine
Search and Navigation Applications
Marketing Applications
Revenue Generation
Features and Effectiveness of Search Engines
References
Appendix A

Search engines may be viewed as both catalysts and major switch-points for WWW traffic. They can accelerate the process of finding information, and channel users directly to specific URLs without having to 'drill down' through multiple categories or navigate many hyperlinks. They can also be frustratingly obtuse, and deliver huge lists of links to pages with irrelevant content. Successful use of search engines therefore requires a good understanding of their operation and their respective strengths and weaknesses for different tasks.

THE DIFFERENT TYPES OF SEARCH ENGINE  

Strictly speaking, search engines such as Google (www.google.com) and FAST (www.alltheweb.com) are automated, software-driven systems which are quite distinct from directories such as Yahoo! (www.yahoo.com) and Open Directory (http://dmoz.org). Search engines comprise three essential elements:

  1. a software spider or crawler, which traverses the WWW and extracts textual data;
  2. an index or database, which stores that data and acts as a central storehouse of all pages visited by the spider;
  3. a software algorithm, which matches WWW pages held in the database against specific keyword queries entered by a user, and lists them in order of relevance. Algorithms are complex, and are constantly being fine-tuned to improve the relevance and accuracy of results.
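
The three elements can be illustrated with a minimal Python sketch. It does not describe how any particular engine works: the pages are a hand-made dictionary standing in for the spider's output, the names build_index and search are invented for illustration, and ranking by simple word count is an assumed stand-in for the far more complex algorithms the engines actually use.

```python
from collections import Counter, defaultdict


def build_index(pages):
    """Map each word to the pages containing it, with occurrence counts."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index


def search(index, query):
    """Return URLs containing every query word, most occurrences first."""
    words = query.lower().split()
    if not words:
        return []
    # Start with pages holding the first word, then narrow by the rest.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    scores = {url: sum(index[w][url] for w in words) for url in candidates}
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages, as if already fetched by a spider.
PAGES = {
    "a.example/stars": "stars and planets astronomy stars",
    "b.example/film": "film stars of the stage",
    "c.example/garden": "garden plants",
}
INDEX = build_index(PAGES)
print(search(INDEX, "stars"))
```

Even this toy version shows why the index matters: the query is answered from the stored word lists, without revisiting any page.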

Directories on the other hand are compiled by people, and rely on a hierarchical structure to provide a convenient means of classifying and retrieving relevant pages according to easily understandable categories.

In practice there are many hybrids and other variations of the search and directory type processes, and the term search engine is now commonly used to encompass all varieties - as applied here, unless a specific distinction is to be made from directories, in which case the italicised form search engines will be used.

Hotbot (www.hotbot.com) for example combines results from the Inktomi search engine (www.inktomi.com), the Open Directory index, and Direct Hit (www.directhit.com). Alta Vista (www.altavista.com, and ~.co.uk) makes use of a number of other services, including Ask Jeeves (www.askjeeves.com), RealNames (www.realnames.com), and Open Directory as well as its own web crawler. In addition, 'meta-crawlers' enable users to compile results from a number of engines together - both to save time and obtain more comprehensive coverage of the WWW. Meta-crawler searches are available directly online (e.g. Dogpile, www.dogpile.com and The Big Hub, www.thebighub.com) or can be prepared offline and then run from the desktop or workstation (e.g. Copernic, www.copernic.com and WebFerret, www.ferretsoft.com/netferret/).
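
The way a meta-crawler might combine several result lists can be sketched in a few lines of Python. This is an illustration only: the result lists are invented, merge_results is a hypothetical name, and scoring each URL by the sum of its reciprocal ranks is an assumed merging rule, not one documented for Dogpile, Copernic or any other product named above.

```python
def merge_results(result_lists):
    """Merge ranked URL lists into one list, removing duplicates.

    Each URL scores 1/rank in every list where it appears, so a page
    ranked highly by several engines rises to the top.
    """
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results from two engines for the same query.
ENGINE_A = ["x.example", "y.example", "z.example"]
ENGINE_B = ["y.example", "w.example"]
MERGED = merge_results([ENGINE_A, ENGINE_B])
print(MERGED)
```

The merged list is longer than either input, which is the coverage benefit the text describes, while the duplicate entry appears only once.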

Some search engines, such as Go/Infoseek (www.go.com) and the first wave of established engines (e.g. Alta Vista www.altavista.com, Excite www.excite.com, Yahoo!) as noted by Green (5), have transformed themselves into portals, in an attempt to become destinations in their own right (rather than transition points) and provide extra services and commercial links to generate more revenue. Banga and Cross (1) show examples of the different types of search engine available, and comprehensive lists and descriptions are given by Notess (12), and Sullivan (14).

Interestingly there has been a reversal recently from the portal-centric trend espoused by the early search engines, to a more dedicated search facility, stripped of advertising and other cross selling. The new Raging Search (www.raging.com) from AltaVista, and Google are good examples - ideal for pure (or power) searchers who do not want any additional frills or distractions.

The benefits of search engines as e-commerce support tools can be categorised into three main types:

  • Search and Navigation
  • Marketing and Promotion
  • Revenue Generation

SEARCH AND NAVIGATION APPLICATIONS  

The most obvious use of search engines - a means of finding information - can be broken down further into:

  • Knowledge discovery - exploration of the unknown
  • Information retrieval - recovering information that is known to exist
  • Direct path navigation - eliminating intermediate hyperlinks.

Search engines play a vital role in locating documents on the WWW, helping consumers, businesses and academic researchers to find information that would be difficult or impossible to find by other means. With well over 800 million publicly available pages (Lawrence and Giles, 11) - and rapidly growing - the WWW represents a huge, albeit chaotically organised resource. If handled correctly, search engines can be used to locate valuable information, conferring significant competitive and economic benefits for a business. For example, a company may use search engines to:

  • discover improvements in technology / production methods / processes which help to reduce costs / improve a product or service
  • find new potential customers
  • reduce purchase costs by finding cheaper / better quality suppliers
  • expand geographical reach - from 'local' territory, to global, for both customers and suppliers.

Knowledge discovery can be conducted in a very speculative manner - searching for information which may not be known to exist, and making discoveries in a somewhat serendipitous manner. Successful searching however requires an understanding of the different engines available - their subject specialities and operational features, index size and freshness, quality of matching and ranking algorithms, and speed and presentation of results.

HitsToSales (9) provides a comprehensive list of search engines (over 1,000), and many reviews and popularity rankings of the major engines have been conducted by Notess (12) and Sullivan (14). Searches must be clearly defined, and keywords or phrases used to target the required information as precisely as possible. Some engines such as Alta Vista offer a high degree of search refinement with the use of Boolean operators, enabling many unwanted references to be excluded (e.g. stars +astronomy -film -theatre -stage to find only the astronomical type of star and eliminate 'film stars').
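
The effect of the + and - operators in the example query can be sketched as a simple Python filter. The interpretation used here is an assumption made for illustration (terms prefixed with + must appear, terms prefixed with - must not, and unprefixed terms are optional unless no + term is given); the exact rules differ between engines, and parse_query and matches are hypothetical names.

```python
def parse_query(query):
    """Split a query into optional, required (+) and excluded (-) terms."""
    optional, required, excluded = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return optional, required, excluded


def matches(text, query):
    """Return True if the page text satisfies the query."""
    words = set(text.lower().split())
    optional, required, excluded = parse_query(query)
    if any(term in words for term in excluded):
        return False
    if not all(term in words for term in required):
        return False
    if required:
        return True
    # With no required terms, at least one plain term must appear.
    return any(term in words for term in optional)


QUERY = "stars +astronomy -film -theatre -stage"
print(matches("the stars in astronomy", QUERY))
print(matches("film stars and astronomy", QUERY))
```

The second page mentions astronomy but is still rejected, because it also contains an excluded word.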

Search engines also provide a short cut to a destination, and are therefore valuable navigation features in their own right on many large web sites. They offer an alternative and much faster route to a specified document, eliminating the need to negotiate multiple links through a hierarchic catalogue structure for example (e.g. in Amazon.com). This may be done on a global basis (all the WWW), or just locally (within a given web site or intranet). Providing a local search facility can benefit a business web site by improving the site's usability, thereby increasing the chances of a positive interaction with users.

MARKETING APPLICATIONS  

Search engines are a two way channel. As well as directing users to documents relevant to specified keywords, they can also be seen as a means of funnelling prospects to a target company's web site. Thus a company which 'seeds' the search engines with the right information, can gain a competitive advantage over other businesses which rank lower in the list of search results for the same set of user entered keywords.

The 9th GVU survey (7) (see Fig. 1) indicated that 85% of users find new web sites via a search engine, and it is commonly observed that people rarely look beyond the first 30 links in a list of matching results.

Fig.1 (from Georgia Tech. GVU Survey, December 1998)

Search engines are thus an important marketing opportunity, and the top positions for popular keyword combinations are fiercely contested. Achieving a high search engine ranking can generate a significant volume of traffic for a web site, and as this can be done with very little capital outlay, search engines can yield the highest ROIs (return on investment) compared with other promotional methods.

In the absence of geographical landmarks, search engines perform a valuable function in locating businesses on the WWW and in alleviating many users' fears of being lost in cyberspace. Search engines are one of a number of methods to 'put a business on the map'.

Search engines attempt to present the most relevant matches to a given query at the top of the list. Although searches frequently result in an overwhelming number of results, the most useful links are usually near the top. Simply registering with the engines is thus rarely enough to generate traffic. Rather, a knowledge of the different engines' ranking criteria is necessary to achieve good positions - at least in the top 30 to be effective. Some of the more popular methods to improve position are briefly described in Appendix A. A more extensive discussion of these issues is given by Sullivan (14).

In addition to the largely free marketing opportunities that search engines provide, they can also be used as a more traditional advertising vehicle to display banner or text ads on a paid for basis. Ads can be triggered by keywords (typically $50 CPM) or delivered randomly. An initiative by RealNames which is partnering with many of the major engines (including AltaVista, Google, Go, and MSN Search) allows brand names to be purchased and given precedence on relevant search enquiries. Thus for a minimum entry fee of $100, a company can have its own URL prominently featured in the results list whenever its designated brand name or word is entered in a search box.
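
The arithmetic behind CPM (cost per thousand impressions) pricing can be worked through in Python. Only the $50 CPM figure comes from the text; the impression count and the 1% click-through rate are hypothetical values chosen to make the sums easy to follow, and the function names are invented for illustration.

```python
def campaign_cost(impressions, cpm):
    """Cost of an ad campaign priced per thousand impressions (CPM)."""
    return impressions / 1000 * cpm


def cost_per_visitor(impressions, cpm, click_through_rate):
    """Cost of each visitor delivered, given the share of impressions clicked."""
    clicks = impressions * click_through_rate
    if clicks == 0:
        raise ValueError("no clicks, so cost per visitor is undefined")
    return campaign_cost(impressions, cpm) / clicks


# $50 CPM is quoted in the text; 200,000 impressions and a 1%
# click-through rate are hypothetical.
print(campaign_cost(200_000, 50))
print(cost_per_visitor(200_000, 50, 0.01))
```

Under these assumptions the campaign costs $10,000 and each visitor costs about $5, which gives a yardstick for comparing paid placements with unpaid search engine positioning.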

It is important to realise that there is a very murky and volatile distinction between aggressive optimisation techniques, and what is sometimes called search engine 'spoofing' or 'spamdexing'. The dynamics between optimisers trying to improve their web site positioning, and search engines trying to deliver more relevant and accurate results means that generally accepted codes of practice are constantly changing. Very aggressive techniques (e.g. use of invisible or camouflaged text, excessive repetition of keywords, serving different pages to spiders etc.) may result in a site being de-listed entirely, and are ultimately counterproductive - on commercial, social and professional grounds.

Search engines can be a powerful marketing tool, as evidenced by numerous success stories - e.g. Hardaway (8) reports 75% customer growth through search engines alone - but they are best used as part of a coherent and balanced marketing plan - not in isolation.

REVENUE GENERATION  

Search engines are an economic force in their own right; the plethora of different engines available is testament to their attractiveness as a business model. Search engines can deliver an income stream from:

  • Banner advertising (on fixed rate or CPM basis)
  • Text advertising (e.g. Google)
  • Keyword sales (e.g. RealNames)
  • Commercial registration (e.g. $199 for LookSmart; GoTo)
  • Selling search services to other sites/intranets (e.g. $1,999 per month for Google's Gold service)
  • Affiliate programs
  • Operation as a Portal, earning revenue from screen space and/or percentage of transactions made by associate businesses.

As search engines and their portal sites can attract millions of visitors daily, they occupy a prime volume of cyberspace and can command high fees for advertising and associated marketing activities. In 1999 for example, according to Junnarkar (10), CDNow paid Lycos $18.5m for a 3 year deal, Ameritrade paid $25m for 2 years with AOL, and First USA announced a $500m arrangement with AOL to be the exclusive credit card marketeer. A search engine which can earn income directly in these various ways, is perhaps the ultimate business support tool - but the entry costs are very high, and available to only relatively few. Search engines often have academic origins.

FEATURES AND EFFECTIVENESS OF SEARCH ENGINES  

For most people a search engine is a facility to be used, either for finding information, or as a marketing tool. The effectiveness of a search engine depends very much on the nature of the search - the subject matter, and complexity of query - and of course on the characteristics of the engine itself. A detailed review of such features is beyond the scope of this paper, but users should be aware at least of the following as important criteria of performance:

  • Index size
  • Depth of coverage
  • Search refinement options
  • Search methodology
  • Ranking methodology
  • Commercial bias
  • Geographical bias
  • Subject specialisations

There has been much debate recently about the reported sizes of search engines and how much of the WWW they are able to cover. Lawrence and Giles (11), reported that in early 1999 the largest search engine was only covering about 16% of publicly available web pages, and eleven of the largest engines combined only indexed about 40% of the WWW. (Note: these statistics are highly qualified, and are at best rough estimates). They found these engines had failed to keep up with the dramatic growth of the WWW over the previous 18 months, and that there tended to be a bias in their page inventories towards US (rather than non US) sites, and towards commercial rather than educational sites.

Since the Lawrence and Giles report, a flurry of competitive activity amongst the major engines has changed the position considerably, with large increases in the size of the indexed databases. Only self-reported data from the engines is currently available, and this is summarised below (based on an estimated 1 billion publicly available pages on the WWW).

Fig. 2 (from Search Engine Watch, www.searchenginewatch.com). [Inktomi, FAST, AltaVista, Northern Light, Excite, Google, Go, Lycos]

In spite of this growth, there is still little overlap in the major engines' search results, so choice of engine is still critical. For example, Notess (12) found in his tests with 14 engines that 36% of results were found exclusively by one engine, while another 27% were found by only two.
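
An overlap figure of this kind can be computed from the engines' result lists, as the Python sketch below shows. The three result sets are invented purely for illustration and do not reproduce Notess's data or method; found_by_exactly is a hypothetical helper name.

```python
from collections import Counter


def found_by_exactly(results_by_engine, n):
    """Fraction of distinct URLs returned by exactly n engines."""
    counts = Counter()
    for urls in results_by_engine.values():
        # Use a set so a URL listed twice by one engine counts once.
        counts.update(set(urls))
    if not counts:
        return 0.0
    hits = sum(1 for seen in counts.values() if seen == n)
    return hits / len(counts)


# Hypothetical results for one query from three engines.
RESULTS = {
    "engine_a": ["p1", "p2", "p3"],
    "engine_b": ["p2", "p4"],
    "engine_c": ["p2", "p3", "p5"],
}
print(found_by_exactly(RESULTS, 1))
print(found_by_exactly(RESULTS, 2))
```

In this made-up case three of the five distinct pages are found by a single engine only, so a searcher relying on one engine would miss at least some of them.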

The major battle amongst search engines at present appears to be finding the right balance between coverage of the WWW (i.e. size of inventory) and delivering search results with a high degree of relevancy. There is a tendency for the smaller engines to score well on relevancy, whilst the larger ones provide good coverage but unreliably ranked results. Thus searches for obscure material are best conducted with at least several of the larger engines together to try and achieve sufficient coverage, while more popular/commonly available material can be found quickly with a small engine.

The algorithms that search engines use to match and rank results are also critical, and are constantly being refined to try and improve relevancy. At present most engines use a brute force word based index, which is insensitive to the context and meaning of neighbouring material, and may be even further subverted by web authors' spamming. Search results often miss important sources altogether, but include a mass of unrelated junk material. To a large extent this is due to the problems of synonymy and polysemy of language and the inability of search engines to infer the real intentions of search queries. Many people have difficulty using search engines (according to Tanaka (15), 70% of users are dissatisfied and fewer than 6% manage to use Boolean operators to refine searches), so there is much work ahead to improve the human/search engine interface.

This is perhaps why the human selected directory system of Yahoo! and others is so successful as it enforces a more human oriented structure on a database, and so scores more highly on relevancy. Against this must be levelled the fact that human selectors cannot maintain pace with the growth of the WWW, and the inventory of directories is invariably smaller than that of search engines.

Ask Jeeves attempts to foster a user friendly approach with its natural language style of entry, but the mechanics underlying the response are still relatively crude, though results are drawn from a directory.

Direct Hit tries to improve the relevancy of results by applying a 'popularity' weighting based on measurements of frequency and length of visits by users to pages listed in previous searches. In this way the most popular pages rise to the top of the search results lists.
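
A popularity weighting of this general kind can be sketched in Python. Direct Hit's actual formula is not given here, so the scoring rule below (total seconds spent on a page, which combines how often it is visited with how long visitors stay) is an assumption made for illustration, as are the function name and the sample visit data.

```python
def rerank_by_popularity(results, visit_durations):
    """Reorder search results so the most-used pages come first.

    results: URLs in the engine's original order.
    visit_durations: URL -> list of visit lengths in seconds, one entry
    per recorded visit from earlier searches.
    """
    def score(url):
        # Sum of visit lengths = number of visits x average length.
        return sum(visit_durations.get(url, []))

    # sorted() is stable, so pages with equal scores keep their original order.
    return sorted(results, key=lambda url: -score(url))


# Hypothetical original ranking and visit log.
ORIGINAL = ["a.example", "b.example", "c.example"]
VISITS = {"a.example": [10], "b.example": [120, 90]}
RERANKED = rerank_by_popularity(ORIGINAL, VISITS)
print(RERANKED)
```

A page that nobody has visited yet scores zero and sinks, which hints at one weakness of pure popularity measures: new pages struggle to be seen.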

Google is a relatively new search engine with a successful strategy of applying a link based ranking measure to improve relevancy - on the assumption that a page with more external links to it will be a higher quality and more authoritative site. This has further benefits as Google's core inventory of 200 million pages can be expanded to 350 million pages if necessary by retrieving the externally linked pages (see Fig. 2).
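
The assumption stated above, that a page with more external links pointing to it is more authoritative, can be expressed as a simple count. The Python sketch below implements only that counting idea; it is not Google's ranking algorithm, and the link data and the name inbound_link_counts are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def inbound_link_counts(links):
    """Count links to each page that come from a different host.

    links: iterable of (source_url, target_url) pairs.
    """
    counts = Counter()
    # set() ignores repeated copies of the same link.
    for source, target in set(links):
        if urlparse(source).netloc != urlparse(target).netloc:
            counts[target] += 1
    return counts


# Hypothetical link data gathered by a crawler.
LINKS = [
    ("http://a.example/", "http://c.example/"),
    ("http://b.example/", "http://c.example/"),
    ("http://c.example/about", "http://c.example/"),  # internal, ignored
    ("http://a.example/", "http://b.example/"),
]
COUNTS = inbound_link_counts(LINKS)
print(COUNTS.most_common())
```

Links from a site to its own pages are excluded, since a site could otherwise raise its own score at will.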

Other developments are afoot to improve search engine performance, for example the Clever project (3), which, like Google, makes use of hyperlinks as a measure of quality. Clever goes further though in identifying expert sites (high on authority) and hub sites (with many links), and applies an iterative routine to assign scores to these vectors, and so arrive at a more qualified evaluation of the importance of a site for a particular subject (similar to citation analysis), rather than rely on just the total number of links.
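
The iterative routine can be sketched in Python as a basic hubs-and-authorities calculation. This is a simplified illustration, not the Clever project's code: the link graph is invented, the function name is hypothetical, and refinements such as restricting the graph to pages on one subject are left out.

```python
import math


def hubs_and_authorities(links, iterations=20):
    """Score pages as hubs and authorities from (source, target) links."""
    pages = sorted({page for link in links for page in link})
    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in.
        auth = dict.fromkeys(pages, 0.0)
        for source, target in links:
            auth[target] += hub[source]
        # Hub: sum of the authority scores of the pages linked to.
        hub = dict.fromkeys(pages, 0.0)
        for source, target in links:
            hub[source] += auth[target]
        # Normalise so the scores stay bounded between passes.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}
    return hub, auth


# Hypothetical link graph for one subject.
LINK_GRAPH = [
    ("guide.example", "maker.example"),
    ("guide.example", "rival.example"),
    ("blog.example", "maker.example"),
    ("club.example", "maker.example"),
]
HUB, AUTH = hubs_and_authorities(LINK_GRAPH)
print(max(AUTH, key=AUTH.get), max(HUB, key=HUB.get))
```

The two scores reinforce each other: a good hub is one that points to good authorities, and a good authority is one pointed to by good hubs, which is why the calculation has to be repeated until the scores settle.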

In the future, search engines may well implement elements from semantic network theory to cure synonymy errors, make more use of Bayesian probability theory (as the new Kenjin engine does rather erratically now, www.kenjin.com) and apply a better understanding of the morphology of the WWW (e.g. to focus on the principal central CORE and IN and OUT structures identified by Broder et al. (2)).

Search engines are indeed powerful and multi-faceted tools that online businesses should exploit to the full - but like any tool, successful handling requires a thorough knowledge of their operation and weaknesses, as well as their strong points.

REFERENCES - SEARCH ENGINES  

  1. Banga K., Cross S., 2ECOM07 Support Tools/Search Engine Presentation, M.Sc. course, 12 May 2000.
  2. Broder A., Kumar R., Maghoul F., Raghavan P., Rajagopalan S., Stata R., Tomkins A., Wiener J., Graph Structure in the Web, 2000. http://www.almaden.ibm.com/cs/k53/www9.final
  3. Chakrabarti S., Dom B., Kumar S.R., Raghavan P., Rajagopalan S., Tomkins A. Hypersearching the Web, Scientific American, June 1999, http://www.sciam.com/1999/0699issue/0699raghavan.html
  4. Charlton E., Web Visibility: Strategies for Standing out from the Crowd, Financial Times, 10/02/2000, page 16.
  5. Green D., Search Insider: David Green considers the search engines' claims about web coverage, Information World Review, October 1999, pp 31-32.
  6. Greenberg I., Garber L., Searching for New Search Technologies, IEEE Computer magazine, August 1999, pp 4-11.
  7. GVU Survey Results - www.searchenginewatch.com/reports/gvu.html
  8. Hardaway, The Customer is Always Right, www.bcentral.com/success/hardaway.html
  9. HitsToSales - A list of search engines (approx. 1,000) - www.hitstosales.com/searchlinks.html
  10. Junnarkar S., Hu J., How Prime is Portal Real Estate?, CNet News, 20 May 1999, http://news.cnet.com/news/0-1007-202-342577.html
  11. Lawrence S., Giles L., Accessibility of Information on the Web, Nature, Vol. 400, June 1999, pp 107-109.
  12. Notess G., Search Engine Showdown, www.searchengineshowdown.com/
  13. Rosenfeld L., Morville P., Information Architecture for the World Wide Web, chapter 6, O'Reilly and Associates, 1998.
  14. Sullivan D., Search Engine Watch. Many articles, reviews, technical information and reports on search engines: www.searchenginewatch.com
  15. Tanaka J., The Perfect Search, Newsweek. 27 September 1999.
  16. Thompson B., I'm feeling lucky, (interview with Sergey Brin, president and co-founder of Google), Internet Magazine, February 2000, pp 34-38.

APPENDIX A  

Search Engine Positioning Tactics

  • Register with relevant search engines - this is best done manually, but automated software submission may also be used (e.g. WebPosition Gold, PositionAgent). Although engines should spider all sites eventually, this is often a very hit or miss affair, so formal notification is necessary, and certainly ensures a much faster entry into engines' indices. Many engines are charging a fee for accelerated entry of new sites (e.g. Yahoo!), and Looksmart (www.looksmart.com) is the first to withdraw free submissions and insist on minimum payment of $199 for every new commercial registration.
  • Keyword Placement - keywords or phrases that users are expected to enter into search engines when seeking the products/services on offer by the target web site, must be appropriately seeded in the relevant pages of the site. Optimisation is a black art that requires keywords be used in the right locations (e.g. Titles, Headers, body copy, URLs, anchor text, etc.) with the right frequency (e.g. not more than 7 times per page), and density (relative to the total volume of words on a page).
  • Meta Tags - Not all search engines recognise meta tags, but some of the most important, such as Alta Vista and Hotbot do. Less than 50% of commercial organisations use meta tags correctly, so a business that does implement these tags properly will benefit. Title tags and Keyword meta tags may be used to assist in the ranking process by some engines, but they must relate to the page contents. The Description meta tag plays no part in ranking (therefore does not require keyword 'stuffing') but is often used in search results lists, so should be made as explicit and appealing as possible.
  • Link Popularity - used by many engines (especially Google) as an important criterion for relevance. Sites with very few external links pointing to them may find it very difficult to achieve a high ranking. Sites should therefore maximise links to them by swapping links with non-competitive, ideally complementary businesses, or join link exchange schemes.
  • Doorway Pages - optimised for search engines (to obtain good positions for specific keywords with a given ranking algorithm) and intended to act as a gateway between search engine and target web page. Searchers might thus click on a link to a doorway page from a list of search results, then follow a link from the doorway to the final destination. Forwarding visitors from a doorway page to the main web site can be automated with the meta-refresh tag, but this is now viewed by some engines as 'spoofing' and is therefore discounted.
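
The keyword frequency and density measures mentioned under Keyword Placement can be checked with a short Python routine. This is a rough sketch: the word-splitting rule and the name keyword_stats are assumptions, the sample copy is invented, and the limit of 7 occurrences is simply the example figure quoted above, not a rule published by any engine.

```python
import re

MAX_OCCURRENCES = 7  # example ceiling quoted in the Keyword Placement tactic


def keyword_stats(text, keyword):
    """Return (occurrences, density) of a single keyword in page text.

    Density is occurrences divided by the total number of words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    count = words.count(keyword.lower())
    density = count / len(words) if words else 0.0
    return count, density


# Hypothetical body copy for a page selling widgets.
COPY = "Cheap widgets for sale. Our widgets ship fast."
COUNT, DENSITY = keyword_stats(COPY, "widgets")
print(COUNT, DENSITY, COUNT <= MAX_OCCURRENCES)
```

Running the same check over titles, headers and body copy separately would show where on the page a keyword is concentrated.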
    © 2002 Search Engine Booster        Contact: info@search-engine-booster.com