Discussion Paper - Search Engines as E-Commerce Support Tools.
Date: June 2000
The Different Types of Search Engine
Search and Navigation Applications
Marketing Applications
Revenue Generation
Features and Effectiveness of Search Engines
References
Appendix A
Search engines may be viewed as both catalysts and major switch-points
for WWW traffic. They can accelerate the process of finding information,
and channel users directly to specific URLs without having to 'drill
down' through multiple categories or navigate many hyperlinks. They
can also be frustratingly obtuse, and deliver huge lists of links
to pages with irrelevant content. Successful use of search engines
therefore requires a good understanding of their operation and their
respective strengths and weaknesses for different tasks.
THE DIFFERENT TYPES OF SEARCH ENGINE 
Strictly speaking, search engines such as Google (www.google.com)
and FAST (www.alltheweb.com) are automated, software-driven systems,
quite distinct from directories such as Yahoo! (www.yahoo.com)
and Open Directory (http://dmoz.org).
Search engines comprise three essential elements:
- a software spider or crawler, which traverses the WWW and extracts
textual data, which is then stored in an
- index or database that acts as a central storehouse of all pages
visited by the spider
- a software algorithm which matches WWW pages held in the database
against specific keyword queries entered by a user, and lists them
in order of relevancy. Algorithms are complex, and are constantly
being fine-tuned to improve the relevancy and accuracy of results.
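The three elements above can be illustrated with a toy sketch. It rests on stated assumptions and does not describe how any real engine works. The "web" is a hypothetical in-memory dictionary of pages, and the example.* addresses are invented. The spider follows links breadth-first. The index is an inverted index that maps each word to the pages containing it. "Relevancy" is taken, very crudely, as the number of times the query words occur on a page.

```python
import re
from collections import Counter, defaultdict, deque

# Hypothetical miniature web: URL -> (page text, outgoing links).
WEB = {
    "a.example/": ("Astronomy news: stars and galaxies", ["b.example/", "c.example/"]),
    "b.example/": ("Film stars and theatre gossip", ["a.example/"]),
    "c.example/": ("Observing stars with a small telescope", []),
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def crawl(start):
    """Spider: visit pages breadth-first from `start`, storing their text."""
    seen, queue, pages = set(), deque([start]), {}
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        queue.extend(links)
    return pages

def build_index(pages):
    """Index: map each word to a Counter of {url: occurrences on that page}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index

def search(index, query):
    """Matching algorithm: rank pages by total occurrences of the query words."""
    scores = Counter()
    for word in tokenize(query):
        scores.update(index.get(word, Counter()))
    return [url for url, _ in scores.most_common()]

index = build_index(crawl("a.example/"))
print(search(index, "stars telescope"))
# -> ['c.example/', 'a.example/', 'b.example/']
```

Real engines weight words by where they appear and by many other signals. The pure word count used here is what makes "film stars" rank alongside astronomy pages, which is exactly the weakness discussed later in this paper.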
Directories on the other hand are compiled by people, and
rely on a hierarchical structure to provide a convenient means of
classifying and retrieving relevant pages according to easily understandable
categories.
In practice there are many hybrids and other variations of the
search and directory processes, and the term search engine
is now commonly used to encompass all varieties. It is used in that
broad sense here unless a specific distinction from directories is
intended, in which case the italicised form search engines is used.
Hotbot (www.hotbot.com) for
example combines results from the Inktomi search engine (www.inktomi.com),
the Open Directory index, and Direct Hit (www.directhit.com).
Alta Vista (www.altavista.com and www.altavista.co.uk) makes use of
a number of other services, including
Ask Jeeves (www.askjeeves.com),
RealNames (www.realnames.com),
and Open Directory as well as its own web crawler. In addition,
'meta-crawlers' enable users to compile results from a number of
engines together - both to save time and obtain more comprehensive
coverage of the WWW. Meta-crawler searches are available directly
online (e.g. Dogpile, www.dogpile.com
and The Big Hub, www.thebighub.com)
or can be prepared offline and then run from the desktop or workstation
(e.g. Copernic, www.copernic.com
and WebFerret, www.ferretsoft.com/netferret/).
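The text does not say how a meta-crawler combines the lists it collects. One simple possibility is rank-based fusion, shown here purely as an assumed illustration; it is not the documented method of Dogpile, The Big Hub, Copernic or WebFerret. Each engine awards points to the URLs it lists, with more points for higher positions, so a page found by several engines accumulates points from each. The engine lists and URLs in the example are hypothetical.

```python
from collections import defaultdict

def merge_results(result_lists, depth=10):
    """Fuse several engines' ranked URL lists into one list (Borda-style count).

    On each engine the URL at 0-based position i earns depth - i points, so a
    URL listed by several engines accumulates points from each; positions
    beyond `depth` are ignored.
    """
    points = defaultdict(int)
    for results in result_lists:
        for i, url in enumerate(results[:depth]):
            points[url] += depth - i
    # Highest total first; ties broken alphabetically so the order is stable.
    return sorted(points, key=lambda url: (-points[url], url))

engine_a = ["x.example/", "y.example/", "z.example/"]
engine_b = ["y.example/", "w.example/"]
print(merge_results([engine_a, engine_b]))
# -> ['y.example/', 'x.example/', 'w.example/', 'z.example/']
```

Rewarding agreement between engines is one way a meta-crawler can partly compensate for the unreliable ranking of any single engine. It also removes duplicate URLs as a side effect.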
Some search engines, such as Go/Infoseek (www.go.com), and the first
wave of established engines (e.g. Alta Vista www.altavista.com,
Excite www.excite.com and Yahoo!) have, as noted by Green (5),
transformed themselves into portals in an attempt to become
destinations in their own right (rather than transition points),
providing extra services and commercial links to generate more
revenue. Banga and Cross (1) show examples of the different types of
search engine available, and comprehensive lists and descriptions are
given by Notess (12) and Sullivan (14).
Interestingly, there has recently been a reversal of the portal-centric
trend espoused by the early search engines, towards a more dedicated
search facility stripped of advertising and other cross-selling.
The new Raging Search (www.raging.com) from AltaVista and Google are
good examples, ideal for pure (or power) searchers who do not want any
additional frills or distractions.
The benefits of search engines as e-commerce support tools can be
categorised into three main types:
- Search and Navigation
- Marketing and Promotion
- Revenue Generation
SEARCH AND NAVIGATION APPLICATIONS
The most obvious use of search engines, as a means of finding
information, can be broken down further into:
- Knowledge discovery - exploration of the unknown
- Information retrieval - recovering information that is known
to exist
- Direct path navigation - eliminating intermediate hyperlinks.
Search engines play a vital role in locating documents on the WWW,
helping consumers, businesses and academic researchers to find information
that would be difficult or impossible to find by other means. With
well over 800 million publicly available pages (Lawrence and
Giles, 11), and growing rapidly, the WWW represents a huge,
albeit chaotically organised resource. If handled correctly, search
engines can be used to locate valuable information, conferring significant
competitive and economic benefits for a business. For example, a
company may use search engines to:
- discover improvements in technology, production methods or processes
which help to reduce costs or improve a product or service
- find new potential customers
- reduce purchase costs by finding cheaper / better quality suppliers
- expand geographical reach - from 'local' territory, to global,
for both customers and suppliers.
Knowledge discovery can be highly speculative, searching for
information which may not be known to exist and making discoveries
in a somewhat serendipitous manner. Successful searching, however,
requires an understanding of the different engines
available - their subject specialities and operational features,
index size and freshness, quality of matching and ranking algorithms,
and speed and presentation of results.
HitsToSales (9) provides a comprehensive list of search
engines (over 1,000), and many reviews and popularity rankings of
the major engines have been conducted by Notess (12) and
Sullivan (14). Searches must be clearly defined, and keywords
or phrases used to target the required information as precisely
as possible. Some engines such as Alta Vista offer a high degree
of search refinement with the use of Boolean operators, enabling
many unwanted references to be excluded (e.g. stars +astronomy -film
-theatre -stage to find only the astronomical type of star and eliminate
'film stars').
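The Boolean refinement in the "stars" example can be sketched as a simple filter. The rules below are a simplification assumed for illustration and are not AltaVista's documented behaviour. A page matches if it contains every +term, none of the -terms, and at least one of the remaining plain terms (when there are any). The page names and texts are hypothetical.

```python
import re

def parse_query(query):
    """Split an AltaVista-style query into required, excluded and plain terms."""
    required, excluded, plain = set(), set(), set()
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.add(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.add(term[1:])
        else:
            plain.add(term)
    return required, excluded, plain

def matches(page_text, query):
    """True if the page has every +term, no -term and, if any, a plain term."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    required, excluded, plain = parse_query(query)
    return (required <= words
            and not (excluded & words)
            and (not plain or bool(plain & words)))

query = "stars +astronomy -film -theatre -stage"
pages = {
    "p1": "Astronomy: how stars are born",
    "p2": "Film stars and astronomy buffs",
    "p3": "Stars of the silver screen",
}
print([name for name, text in pages.items() if matches(text, query)])  # ['p1']
```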
Search engines also provide a short cut to a destination, and are
therefore valuable navigation features in their own right on many
large web sites. They offer an alternative and much faster route
to a specified document, eliminating the need to negotiate multiple
links through a hierarchical catalogue structure (e.g. at
Amazon.com). This may be done on a global basis (all the WWW),
or just locally (within a given web site or intranet). Providing
a local search facility can benefit a business web site by improving
the site's usability, thereby increasing the chances of a positive
interaction with users.
MARKETING APPLICATIONS 
Search engines are a two way channel. As well as directing users
to documents relevant to specified keywords, they can also be seen
as a means of funnelling prospects to a target company's web site.
Thus a company which 'seeds' the search engines with the right information
can gain a competitive advantage over other businesses which rank
lower in the list of search results for the same set of user-entered
keywords.
The 9th GVU Survey (7) (see Fig. 1) indicated that 85% of users find
new web sites via a search engine, and it is commonly observed that
people rarely look beyond the first 30 links in a list of matching
results.

Fig. 1 (from Georgia Tech GVU Survey, December 1998)
Search engines are thus an important marketing opportunity, and
the top positions for popular keyword combinations are fiercely
contested. Achieving a high search engine ranking can generate a
significant volume of traffic for a web site, and as this can be
done with very little capital outlay, search engines can yield the
highest ROIs (return on investment) compared with other promotional
methods.
In the absence of geographical landmarks, search engines perform
a valuable function in locating businesses on the WWW and in alleviating
many users' fears of being lost in cyberspace. Search engines are
one of a number of methods to 'put a business on the map'.
Search engines attempt to present the most relevant matches to
a given query at the top of the list. Although searches frequently
result in an overwhelming number of results, the most useful links
are usually near the top. Simply registering with the engines is
thus rarely enough to generate traffic. Rather, knowledge of the
different engines' ranking criteria is necessary to achieve good
positions, which need to be at least in the top 30 to be effective. Some of the
more popular methods to improve position are briefly described in
Appendix A. A more extensive discussion of these issues is given
by Sullivan (14).
In addition to the largely free marketing opportunities that search
engines provide, they can also be used as a more traditional advertising
vehicle to display banner or text ads on a paid-for basis. Ads can
be triggered by keywords (typically at $50 CPM, i.e. cost per thousand
impressions) or delivered randomly. An initiative by RealNames, which
is partnering with many of the major engines (including AltaVista,
Google, Go and MSN Search), allows brand names to be purchased and
given precedence on relevant
search enquiries. Thus for a minimum entry fee of $100, a company
can have its own URL prominently featured in the results list whenever
its designated brand name or word is entered in a search box.
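Here is a worked example of CPM pricing. At the typical $50 CPM quoted above, 200,000 keyword-triggered impressions cost 200 × $50 = $10,000. The sketch below also converts CPM into an effective cost per visitor. The 2% click-through rate used in the example is a hypothetical figure, not one from this paper.

```python
def cpm_cost(impressions, cpm_dollars):
    """Dollar cost of `impressions` ad views priced at `cpm_dollars` per thousand."""
    return impressions / 1000 * cpm_dollars

def cost_per_visitor(cpm_dollars, click_through_rate):
    """Effective dollar cost of each visitor, given the fraction of viewers who click."""
    return cpm_dollars / (1000 * click_through_rate)

print(cpm_cost(200_000, 50))       # 10000.0
print(cost_per_visitor(50, 0.02))  # 2.5, assuming a 2% click-through rate
```

The comparison matters because a keyword-triggered ad is shown only to people who have just searched for a related term. It can therefore justify a higher CPM than a randomly delivered banner if it produces more visitors per thousand impressions.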
It is important to realise that there is a very murky and volatile
distinction between aggressive optimisation techniques and what
is sometimes called search engine 'spoofing' or 'spamdexing'. The
dynamics between optimisers trying to improve their web site positioning,
and search engines trying to deliver more relevant and accurate
results means that generally accepted codes of practice are constantly
changing. Very aggressive techniques (e.g. use of invisible or camouflaged
text, excessive repetition of keywords, serving different pages
to spiders etc.) may result in a site being de-listed entirely,
and are ultimately counterproductive - on commercial, social and
professional grounds.
Search engines can be a powerful marketing tool, as evidenced by
numerous success stories - e.g. Hardaway (8) reports 75%
customer growth through search engines alone - but they are best
used as part of a coherent and balanced marketing plan - not in
isolation.
REVENUE GENERATION 
Search engines are an economic force in their own right; the plethora
of different engines available is testament to their attractiveness
as a business model. Search engines can deliver an income stream
from:
- Banner advertising (on fixed rate or CPM basis)
- Text advertising (e.g. Google)
- Keyword sales (e.g. RealNames)
- Commercial registration (e.g. $199 for LookSmart; GoTo)
- Selling search services to other sites/intranets (e.g. $1,999
per month for Google's Gold service)
- Affiliate programs
- Operation as a Portal, earning revenue from screen space and/or
percentage of transactions made by associate businesses.
As search engines and their portal sites can attract millions of
visitors daily, they occupy a prime volume of cyberspace and can
command high fees for advertising and associated marketing activities.
In 1999 for example, according to Junnarkar (10), CDNow paid
Lycos $18.5m for a 3 year deal, Ameritrade paid $25m for 2 years
with AOL, and First USA announced a $500m arrangement with AOL to
be the exclusive credit card marketer. A search engine which can
earn income directly in these various ways is perhaps the ultimate
business support tool - but the entry costs are very high, putting
it within reach of relatively few. Search engines do, however, often
have academic origins.
FEATURES AND EFFECTIVENESS OF SEARCH ENGINES
For most people a search engine is a facility to be used, either
for finding information, or as a marketing tool. The effectiveness
of a search engine depends very much on the nature of the search
- the subject matter, and complexity of query - and of course on
the characteristics of the engine itself. A detailed review of such
features is beyond the scope of this paper, but users should be
aware at least of the following as important criteria of performance:
- Index size
- Depth of coverage
- Search refinement options
- Search methodology
- Ranking methodology
- Commercial bias
- Geographical bias
- Subject specialisations
There has been much debate recently about the reported sizes of
search engines and how much of the WWW they are able to cover. Lawrence
and Giles (11) reported that in early 1999 the largest search
engine was only covering about 16% of publicly available web pages,
and eleven of the largest engines combined only indexed about 40%
of the WWW. (Note: these statistics are highly qualified, and are
at best rough estimates). They found these engines had failed to
keep up with the dramatic growth of the WWW over the previous 18
months, and that there tended to be a bias in their page inventories
towards US (rather than non US) sites, and towards commercial rather
than educational sites.
Since the Lawrence and Giles report, a flurry of competitive activity
amongst the major engines has changed the position considerably,
with significant increases in the size of the indexed databases. Only self-reported
data from the engines is currently available, and this is summarised
below (based on an estimated 1 billion publicly available pages
on the WWW).

Fig. 2 (from Search Engine Watch, www.searchenginewatch.com): self-reported index sizes of Inktomi, FAST, AltaVista, Northern Light, Excite, Google, Go and Lycos.

In spite of this growth, there is still little overlap in the major
engines' search results, so choice of engine is still critical.
For example, Notess (12) found in his tests with 14 engines that
36% of results were found by only one engine, while a further
27% were found by only two.
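Statistics of the kind Notess reports can be computed from the result lists themselves. The sketch below is an assumed illustration, not Notess's actual procedure. Given the URLs each engine returned for a query, it reports what fraction of all distinct URLs was found by exactly one engine, by exactly two, and so on. The engine names and URLs are hypothetical.

```python
from collections import Counter

def coverage_profile(results_by_engine):
    """Map k -> fraction of distinct URLs that exactly k engines returned."""
    engines_per_url = Counter()
    for urls in results_by_engine.values():
        engines_per_url.update(set(urls))  # count each URL once per engine
    total = len(engines_per_url)
    found_by = Counter(engines_per_url.values())  # k -> number of such URLs
    return {k: n / total for k, n in sorted(found_by.items())}

results = {
    "engine1": ["u1", "u2", "u3"],
    "engine2": ["u2", "u4"],
    "engine3": ["u2", "u3"],
}
print(coverage_profile(results))  # {1: 0.5, 2: 0.25, 3: 0.25}
```

A large share of URLs at k = 1 is exactly the low overlap described above. Each of those pages would be missed by anyone relying on a different single engine.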
The major battle amongst search engines at present appears to be
finding the right balance between coverage of the WWW (i.e. size
of inventory) and delivering search results with a high degree of
relevancy. There is a tendency for the smaller engines to score
well on relevancy, whilst the larger ones provide good coverage
but unreliably ranked results. Thus searches for obscure material
are best conducted with several of the larger engines together
to achieve sufficient coverage, while more popular or commonly
available material can be found quickly with a small engine.
The algorithms that search engines use to match and rank
results are also critical, and are constantly being refined to try
to improve relevancy. At present most engines use a brute-force,
word-based index, which is insensitive to the context and meaning
of neighbouring material, and which can be further subverted by web
authors who spam the engines. Search results often miss important
sources altogether, yet include a mass of unrelated junk material.
To a large extent this is due to the problems of synonymy (different
words with the same meaning) and polysemy (one word with several
meanings) in language, and the inability of search engines to infer
the real intentions behind search queries. Many people have difficulty
using search engines (70% of users are dissatisfied, and fewer than 6%
manage to use Boolean operators to refine searches; Tanaka (15)), so
there is much work ahead to improve the human/search engine interface.
This is perhaps why the human-selected directory system of Yahoo!
and others is so successful: it imposes a more human-oriented
structure on a database, and so scores more highly on relevancy.
Against this must be levelled the fact that human selectors cannot
maintain pace with the growth of the WWW, and the inventory of directories
is invariably smaller than that of search engines.
Ask Jeeves attempts to foster a user-friendly approach with its
natural language style of entry, but the mechanics underlying the
response are still relatively crude, even though results are drawn
from a directory.
Direct Hit tries to improve the relevancy of results by applying
a 'popularity' weighting based on measurements of frequency and
length of visits by users to pages listed in previous searches.
In this way the most popular pages rise to the top of the search
results lists.
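Direct Hit's actual weighting is not described here, so the following is a hypothetical sketch of the general idea. A page's text-matching score is boosted according to how often, and for how long, earlier searchers visited it. The popularity formula, the five-minute cap on visit length, the `weight` parameter and the example URLs are all invented for illustration.

```python
def popularity(clicks, total_seconds, cap_seconds=300):
    """Hypothetical popularity: visit count scaled by average visit length (capped)."""
    if clicks == 0:
        return 0.0
    average_visit = min(total_seconds / clicks, cap_seconds)
    return clicks * average_visit / cap_seconds

def rerank(results, usage, weight=0.1):
    """Reorder (url, relevance) pairs, boosting pages popular in earlier searches.

    usage maps url -> (clicks from previous result lists, total seconds spent).
    """
    def boosted(item):
        url, relevance = item
        clicks, seconds = usage.get(url, (0, 0))
        return relevance * (1 + weight * popularity(clicks, seconds))
    return [url for url, _ in sorted(results, key=boosted, reverse=True)]

results = [("a.example/", 1.0), ("b.example/", 0.9)]
print(rerank(results, {}))                          # ['a.example/', 'b.example/']
print(rerank(results, {"b.example/": (20, 2400)}))  # ['b.example/', 'a.example/']
```

Taking visit length into account, and not just clicks, is meant to stop a page with an attractive title but poor content from rising simply because people clicked it and quickly left.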
Google is a relatively new search engine with a successful strategy
of applying a link-based ranking measure to improve relevancy -
on the assumption that a page with more external links to it will
be a higher quality and more authoritative site. This has further
benefits as Google's core inventory of 200 million pages can be
expanded to 350 million pages if necessary by retrieving the externally
linked pages (see Fig. 2).
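Google's full ranking method is not described in this paper. The sketch below shows the published PageRank idea from Google's founders in its simplest form, under simplifying assumptions: a fixed damping factor of 0.85 and a fixed number of iterations. A page's score is shared among the pages it links to, so a link from a highly ranked page counts for more than a link from an obscure one. This refines the simple count of external links. The three-page link graph in the example is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Link-based ranking by power iteration (PageRank-style sketch).

    links maps each page to the pages it links to. Each page passes a share
    of its score to the pages it links to; pages with no outgoing links
    spread their score evenly over all pages. Scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

links = {
    "a.example/": ["c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": ["a.example/"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c.example/ (linked to by both other pages)
```

Note that a.example/ ends up almost as highly ranked as c.example/ despite having only one inbound link. That link comes from the top-ranked page, which is the effect a plain link count cannot capture.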
Other developments are afoot to improve search engine performance,
for example the Clever project (3), which, like Google,
makes use of hyperlinks as a measure of quality. Clever goes further,
though, in identifying expert sites (high on authority) and
hub sites (with many links to good authorities), and applies an iterative routine
to assign scores to these vectors, and so arrive at a more qualified
evaluation of the importance of a site for a particular subject
(similar to citation analysis), rather than rely on just the total
number of links.
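The Clever project builds on the hubs-and-authorities (HITS) method developed by Jon Kleinberg. The sketch below shows that iteration in its basic form, assuming a small hypothetical link graph and a fixed number of iterations. Authority scores are computed from the hub scores of the pages linking in, and hub scores from the authority scores of the pages linked to. Both are rescaled after each round. Clever's own refinements are not modelled.

```python
import math

def hits(links, iterations=50):
    """Hubs-and-authorities scoring (basic HITS iteration).

    links maps each page to the pages it links to. Returns (hub, authority)
    dictionaries, each scaled to unit length.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # A good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Rescale so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

links = {
    "hub1.example/": ["x.example/", "y.example/"],
    "hub2.example/": ["x.example/", "y.example/"],
    "hub3.example/": ["x.example/"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # x.example/
```

Unlike a raw link count, a page's authority here depends on the quality of the hubs pointing to it, and vice versa. This mutual reinforcement is the citation-analysis-like property referred to above.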
In the future, search engines may well implement elements from
semantic network theory to cure synonymy errors, make more use of
Bayesian probability theory (as the new Kenjin engine does rather
erratically now, www.kenjin.com)
and apply a better understanding of the morphology of the WWW (e.g.
to focus on the principal central CORE, IN and OUT structures
identified by Broder et al. (2)).
Search engines are indeed powerful and multi-faceted tools that
online businesses should exploit to the full - but like any tool,
successful handling requires a thorough knowledge of their operation
and weaknesses, as well as their strong points.
REFERENCES - SEARCH ENGINES 
1. Banga K., Cross S., 2ECOM07 Support Tools/Search Engine Presentation,
M.Sc. course, 12 May 2000.
2. Broder A., Kumar R., Maghoul F., Raghavan P., Rajagopalan S.,
Stata R., Tomkins A., Wiener J., Graph Structure in the Web, 2000,
http://www.almaden.ibm.com/cs/k53/www9.final
3. Chakrabarti S., Dom B., Kumar S.R., Raghavan P., Rajagopalan
S., Tomkins A., Hypersearching the Web, Scientific American, June
1999, http://www.sciam.com/1999/0699issue/0699raghavan.html
4. Charlton E., Web Visibility: Strategies for Standing out from
the Crowd, Financial Times, 10/02/2000, p. 16.
5. Green D., Search Insider: David Green considers the search engines'
claims about web coverage, Information World Review, October 1999,
pp. 31-32.
6. Greenberg I., Garber L., Searching for New Search Technologies,
IEEE Computer magazine, August 1999, pp. 4-11.
7. GVU Survey Results, www.searchenginewatch.com/reports/gvu.html
8. Hardaway, The Customer is Always Right, www.bcentral.com/success/hardaway.html
9. HitsToSales, A list of search engines (approx. 1,000), www.hitstosales.com/searchlinks.html
10. Junnarkar S., Hu J., How Prime is Portal Real Estate?, CNet
News, 20 May 1999, http://news.cnet.com/news/0-1007-202-342577.html
11. Lawrence S., Giles C.L., Accessibility of Information on the
Web, Nature, Vol. 400, July 1999, pp. 107-109.
12. Notess G., Search Engine Showdown, www.searchengineshowdown.com/
13. Rosenfeld L., Morville P., Information Architecture for the
World Wide Web, chapter 6, O'Reilly and Associates, 1998.
14. Sullivan D., Search Engine Watch. Many articles, reviews, technical
information and reports on search engines: www.searchenginewatch.com
15. Tanaka J., The Perfect Search, Newsweek, 27 September 1999.
16. Thompson B., I'm feeling lucky (interview with Sergey Brin,
president and co-founder of Google), Internet Magazine, February
2000, pp. 34-38.
APPENDIX A 
Search Engine Positioning Tactics
- Register with relevant search engines -
this is best done manually, but automated software submission
may also be used (e.g. WebPosition Gold, PositionAgent). Although
engines should spider all sites eventually, this is often a very
hit or miss affair, so formal notification is necessary, and certainly
ensures a much faster entry into engines' indices. Many engines
are charging a fee for accelerated entry of new sites (e.g. Yahoo!),
and LookSmart (www.looksmart.com)
is the first to withdraw free submissions and insist on minimum
payment of $199 for every new commercial registration.
- Keyword Placement - keywords or phrases
that users are expected to enter into search engines when seeking
the products/services offered by the target web site must be
appropriately seeded in the relevant pages of the site. Optimisation
is a black art that requires keywords be used in the right locations
(e.g. Titles, Headers, body copy, URLs, anchor text, etc.) with
the right frequency (e.g. not more than 7 times per page), and
density (relative to the total volume of words on a page).
- Meta Tags - Not all search engines recognise
meta tags, but some of the most important, such as Alta Vista
and Hotbot do. Less than 50% of commercial organisations use meta
tags correctly, so a business that does implement these tags properly
will benefit. Title tags and Keyword meta tags may be used to
assist in the ranking process by some engines, but they must relate
to the page contents. The Description meta tag plays no part in
ranking (and so does not require keyword 'stuffing'), but is
often used in search results lists, so should be made as explicit
and appealing as possible.
- Link Popularity - used by many engines
(especially Google) as an important criterion for relevance. Sites
with very few external links pointing to them may find it very
difficult to achieve a high ranking. Sites should therefore maximise
links to them by swapping links with non-competitive, ideally
complementary businesses, or by joining link exchange schemes.
- Doorway Pages - optimised for search engines
(to obtain good positions for specific keywords with a given ranking
algorithm) and intended to act as a gateway between search engine
and target web page. Searchers might thus click on a link to a
doorway page from a list of search results, then follow a link
from the doorway to the final destination. Forwarding visitors
from a doorway page to the main web site can be automated with the
meta-refresh tag, but this is now viewed by some engines as 'spoofing'
and is therefore discounted.
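Several of the tactics above can be checked mechanically on a page before it is submitted. The sketch below uses only Python's standard `html.parser` module. It reads a page's title and its named meta tags. It also picks up any meta-refresh instruction, the redirect mechanism used by doorway pages. Finally, it measures how often a keyword phrase occurs in the remaining text. The definition of keyword density used here (words belonging to the phrase divided by all words) is one common convention assumed for illustration; individual engines may measure it differently. The example pages are invented.

```python
import re
from html.parser import HTMLParser

class PageAudit(HTMLParser):
    """Collect the title, named meta tags, any meta-refresh and other text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # meta name -> content, e.g. "keywords", "description"
        self.refresh = None   # content of <meta http-equiv="refresh">, if present
        self.text = []        # text fragments outside the <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if attrs.get("name"):
                self.meta[attrs["name"].lower()] = attrs.get("content") or ""
            if (attrs.get("http-equiv") or "").lower() == "refresh":
                self.refresh = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text.append(data)

def keyword_stats(text, keyword):
    """Occurrences of a (possibly multi-word) keyword and its assumed density."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    kw = keyword.lower().split()
    if not words or not kw:
        return 0, 0.0
    n = len(kw)
    count = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw)
    return count, count * n / len(words)

page = """<html><head><title>Cheap Flights to Rome</title>
<meta name="keywords" content="cheap flights, Rome">
<meta name="description" content="Book low-cost flights to Rome online.">
</head><body><h1>Cheap flights</h1><p>Book cheap flights to Rome today.</p></body></html>"""
audit = PageAudit()
audit.feed(page)
audit.close()
print(audit.title, audit.meta, audit.refresh)
print(keyword_stats(" ".join(audit.text), "cheap flights"))  # (2, 0.5)
```

A non-empty `refresh` value flags a page that engines may treat as a doorway. The keyword count can be compared against rules of thumb such as the "not more than 7 times per page" guideline above.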