





Search Engines
| As of Spring 1998, six or seven search
engines dominate the lists in numbers of search requests they process. They have bots (from robots) to automatically,
systematically, and frequently scour the Web. The pages they encounter are stored in a database and indexed. An
article in the April 3, 1998, Science magazine
estimates that the searchable Web has 360 million pages and that none of the top six
covers much more than a third of them. Northern Light is the newcomer, and it charges for
any in-depth searches; many surveys list Webcrawler instead among the top engines. |
To learn
everything you need to know about search engines, you should read all the material at Search Engine Watch. |
|
A recent survey by Relevant Knowledge confirms the general findings of the
Science article and raises these questions.
| The Questions | The Answers |
| which 34% does Hotbot cover? | who knows? |
| will running two different Hotbot searches add up to 68%? | no, considerably less |
| will running two different searches on Excite add up to more than 14%? | yes, depending on the bots' frequency |
| will running the same search on Hotbot and Excite add up to more than 34%? | yes, but not much |
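The arithmetic behind those answers can be sketched with sets. This is a toy model only: the searchable Web is shrunk to 100 pages so that one page equals 1%, and which pages each engine holds, including the size of the overlap, is invented for illustration. Only the 34% and 14% figures come from the questions above.

```python
# Toy model: the searchable Web is pages 0..99, so one page = 1%.
# Which pages each engine holds is invented; only the 34% and 14%
# coverage figures come from the text.
web = set(range(100))
hotbot = set(range(0, 34))     # 34 pages -> 34% coverage
excite = set(range(26, 40))    # 14 pages -> 14% coverage; 8 shared with hotbot (assumed)

def coverage(pages, universe):
    """Percent of the universe that the given set of pages covers."""
    return 100 * len(pages & universe) / len(universe)

combined = hotbot | excite     # a union counts shared pages only once

print(coverage(hotbot, web))    # 34.0
print(coverage(excite, web))    # 14.0
print(coverage(combined, web))  # 40.0, not 34 + 14 = 48
```

The more the two engines overlap, the less a second search adds, which is why the combined figure is "more than 34%, but not much."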
We do have a pretty good idea of
some of what's not covered: pages that do not have links to them from the home page of a
domain name or from any page linked to the home page. They are sitting on a server but are
available only if you type (or bookmark) the whole URL. If you don't quite understand what
I mean by that, rest assured the search engines are great if they give you what you want.
If they don't, they're only the beginning of your search.
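Why unlinked pages go missing can be shown with a toy crawler. This is a minimal sketch, not a description of how any particular engine's bot works: the site map, the page names, and the function `crawl` are all hypothetical.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
site_links = {
    "/index.html": ["/about.html", "/products.html"],
    "/about.html": ["/index.html"],
    "/products.html": ["/widgets.html"],
    "/widgets.html": [],
    "/orphan.html": ["/index.html"],  # sits on the server, but nothing links TO it
}

def crawl(links, start):
    """Follow links breadth-first from start; return every page reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

indexed = crawl(site_links, "/index.html")
missed = sorted(set(site_links) - indexed)
print(missed)  # ['/orphan.html']
```

The orphan page exists and even links out to the home page, but because the crawl only follows links forward from the home page, it never gets there.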
| If you've been doing this Internet stuff
for a while, it probably won't surprise you to learn that common knowledge is often wrong.
For example, Yahoo is not a search engine. It's a
subject directory compiled by humans that covers maybe 1% of the possible sites. You can
browse through it, but you can't search it. You can, however, launch a search while you're
at Yahoo. |

Yahoo is not a search engine
|
|
It probably also won't surprise you to learn that
some "search engines" in fact are meta-engines. Most meta-engines are links to
send your search terms to one or more of the top six. MetaCrawler,
the one I use most, will collate the results, eliminate the duplicates, and present the
rest to you. For many reasons, if only their sense of humor, Dogpile is my favorite when I need to dig a little
deeper. Mother Load has a special search for corporate marketing sites. The whole process
happens faster than you could slide open two drawers of a library's card catalog. |
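The collate-and-deduplicate step might look something like the sketch below. It is an assumption about the general idea, not MetaCrawler's actual method: the result lists, the `normalize` rule, and the interleaving order are all made up for illustration.

```python
# Hypothetical result lists from two engines: (title, URL) pairs, best match first.
engine_a = [("Cancun Travel Guide", "http://example.com/cancun"),
            ("Cheap Flights", "http://example.org/flights")]
engine_b = [("Flights for Less", "http://EXAMPLE.org/flights/"),
            ("Cancun Hotels", "http://example.net/hotels")]

def normalize(url):
    """Crude URL key: lowercase, no trailing slash (an assumed rule)."""
    return url.lower().rstrip("/")

def collate(*result_lists):
    """Interleave the lists by rank and keep the first copy of each URL."""
    merged, seen = [], set()
    longest = max(len(results) for results in result_lists)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                title, url = results[rank]
                key = normalize(url)
                if key not in seen:
                    seen.add(key)
                    merged.append((title, url))
    return merged

for title, url in collate(engine_a, engine_b):
    print(title, url)
```

Four results go in and three come out, because both engines returned the same flights page under slightly different spellings of the URL.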

MetaSites
| The search engine sites basically take the
keywords you submit and compare them to the index of the database of pages their bots have
brought back. The actual engine, the bot, has already done its work and is out there doing
it again for the next update. The database is stored in many large hard drives, so it can
take a couple of moments to collect the results, rank them, and display them, often with
banner ads relevant to the topic. That is, if you search for Cancun, you're likely to see
a banner ad for an airline or Travelocity. How many matches did you get?
If you got too many or too few, you may
want to re-search by adding or omitting keywords. Go to one of the top six engines and try
three searches:
| red | very general; note the number of matches |
| travel | much more specific, but note the number of matches |
| desalinization | specialized, yet note the number of matches |
Conclusion: Information
Overload. These search engine sites' databases are so huge that one-word
searches are not very helpful. Try two or three words at the same time. Note the
differences in search conventions: at one site AND links two
words; at another site, only + will do that job.
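The keyword-to-index comparison, and the way extra words narrow a search, can be sketched with a tiny inverted index. The pages, the page IDs, and the function names are hypothetical, and real engines are far more elaborate; this only illustrates the AND behavior described above.

```python
from collections import defaultdict

# Hypothetical pages the bot has already brought back.
pages = {
    "p1": "red sea travel guide",
    "p2": "travel to cancun by air",
    "p3": "desalinization plants on the red sea",
    "p4": "red wine and travel",
}

def build_index(docs):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search_all(index, query):
    """AND search: return the pages containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches

index = build_index(pages)
print(sorted(search_all(index, "red")))             # ['p1', 'p3', 'p4']
print(sorted(search_all(index, "red travel")))      # ['p1', 'p4']
print(sorted(search_all(index, "red sea travel")))  # ['p1']
```

Each added word shrinks the list of matches, which is the whole point of searching on two or three words at once.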
How are the matches
ranked?
Each engine does it a little differently.
Knowing how yours does it will help you choose your search terms. If exactly what you're
looking for is displayed on the first page, congratulations.
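As one illustration of what a ranking rule could look like, the sketch below scores each page by how often the search words appear in it. This is an assumed rule chosen for simplicity, not the method of any engine named here; the pages and function names are hypothetical.

```python
# Hypothetical pages; scoring by raw keyword count is an assumed rule.
pages = {
    "p1": "forensic science links and forensic labs",
    "p2": "police department home page",
    "p3": "forensic document examiners",
}

def score(text, query):
    """Count how many times the query words appear in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def rank(docs, query):
    """Return (page, score) pairs with a nonzero score, highest first."""
    scored = [(page_id, score(text, query)) for page_id, text in docs.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

print(rank(pages, "forensic"))  # [('p1', 2), ('p3', 1)]
```

Under a rule like this, a page that repeats your term rises to the top, so a rarer, more specific term does more to surface the page you actually want.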
Beyond the search engines
The rest of the time, you're going to have
to develop more strategies. Many people at this point turn to specialized searches. Some
of them, you can pay for. Lexis-Nexis, in Dayton, Ohio, has a million and a half subscribers. Half of
them use the service each month, some of them extensively. What's there? Almost 1.5
billion documents. The visual
tour of their facility is most enlightening. Their database is larger than the web
itself, and it's available over the Internet via the telnet protocol (telnet://) rather
than the hypertext protocol (http://), so it's not on the web. You will not always end up
with nicely formatted and illustrated .htm pages. But you will end up with a lot of
information.
As the Dialog
Corporation's home page says, "Quantity of data, by itself, is of little consequence.
The challenge is to help people isolate data of real value from an exponentially rising
tide of information."
Yes, they're expensive, but they may be
worth the money in time saved. Some of this information is copyright-protected and
available only from these sources.
The longest list of
specialized search engines that I know is Beaucoup,
where you'll find over a thousand in dozens of categories.
Let's say that you're designing a web for the
Kenmore Police. Your audience is Joe Smith, a Kenmore citizen who had a not-so-wonderful
contact with an officer and who gets on the Web to learn more about the department. One of
your objectives might be to give Joe the resources he needs to overcome his TV-induced
mirage about police officers and to learn how crimes are really solved.
To tailor that information for Joe, you would scour the Web
for the best resources, link to them, and then provide some text explaining where the
links go and what you want Joe to get out of them, that is, the reason for his going
there.
You might go to a search engine and use the term
"forensic," a technical term you picked up and which Joe Smith might not know.
The results from the search engine won't end your search. They only begin it. |
| You would soon find your way to Zeno's Forensic Page. You would
find your way there because many other sites link to it and it claims to be the web's best
resource. Then you can continue your search because Zeno has done for you what you are
trying to do for Joe Smith. You may well visit the Forensic Science Society and their Forensic WebLinks Search,
which might get you to the Roanoke County, Virginia, Police Department or the American Society of Questioned Document Experts, which
would be way down any list of search engine responses to the term forensic. But
it might have exactly what you're looking for. How do you know? |
While you're
searching, don't forget to look at the pages to see whether you can use the images or any
part of the pages (design, navigational devices, etc.) as a model for your site. |
|
You must keep asking yourself, What would help Joe Smith
understand how police really solve crimes?  I'm trying to take you beyond the search
engines to what are often called metasites. They have little content
themselves other than links to other sites -- which themselves may be lists of links. As
the Web grows, they are part of its maturation.
For example, I'll bet all of you can find something of
interest at Voice
of the Shuttle. As the work of one person, Alan Liu, VoS is highly selective, it has
several dead links, and it has especially useful annotations. A more impersonal but much
larger metasite is the World Wide Web Virtual Library. Note its
marketing
links page. (Your marketing links page, along with a sentence or two description,
counts as extra credit for MBA 604.)
Some of the links at these metasites may
well show up on a search engine's results page. But these sites themselves wouldn't show
up as the result of that same search. Think backwards: what would you search for
to get Voice of the Shuttle?
Part of your value as a professional researcher
will be your carefully tended, up-to-date list of links to the topics you specialize in.
You'll do for your professional topics what Zeno has done for forensic science students
and what this web is doing for Medaille marketing students. Whether you share that
information via a Web page is up to you. |
Wise MBAs -- whatever their
specialty or industry or job category or responsibility level -- will know how to search
the Web effectively. What are they going to do, ask for the morning off to drive down to
the Erie County Public Library?
Psst.... You at the card
catalog. Your co-workers back at the office are firing up Netscape. |
|
|

Tip
At a large site, look for a search option on the
home page. This course web has a search option, too. It's not a
search engine, the robot part. It's just the word-match part, so it's fast, it's accurate,
and it searches the full text of every page.

Don't Read
This ...
... if you're liable to get upset
about the lack of privacy online.
Have you ever tried to go down to the courthouse
to look at public records such as real estate transactions, court filings, etc.? If so,
you realize two things:
 |
there's an amazing amount of
information there that many people would rather have kept private |
 |
it's not easy to get
to that information |
Yes, it's "public", but there may be a
certain wisdom in having it behind high counters. If you're an officer of the court or a
licensed investigator, you know what to ask for and how to ask for it. If not, you'll find
it considerably harder to access the material. Once you get your hands on it, it's not
user-friendly.
What if every record
from every courthouse in the country were available online with the speed of a search
engine?
Guess what. It is available. KnowX. Not all
of it and not back very far and not up-to-the-minute current. It costs a little bit of
money for each piece of info. You still have to know how to interpret it and what to do
with it. But if I had a big enough budget to troll through those databases and then
organize my results, I could probably find out things that lots of people wouldn't want
others to know.
Where will it stop?
Does KnowX make you feel more secure or less
secure?

Don't try this at home, kids
REVERSE EMAIL
LOOKUP
Enter your email address or a friend's. Chances are good (not totally assured) that
your name or your friend's will come up. Now, click on that name if it is a link.
More information will surface, likely the name, complete mailing address, even the phone
number. And of course, street map programs will pinpoint the house, given the street
address.
Enter that email address in a good search engine and you will likely see what
activity that person has been involved in on the Net (newsgroups and discussion
groups posted to, websites constructed, and so on).
And you wonder how someone got your name just from an email address? I don't
wonder, and neither should you!

FAQ
| Frequently asked questions. For starters, how do you
pronounce FAQ? ef-a-que? Try using that in a
sentence. Try making it plural. Okay, that one's no good.
fax? Immediate confusion with the machine for transmitting
documents.
fak? Most people use this one: Look for the fak. But is the
plural the same, as in deer: one deer, two deer; one fak, two fak? Two faxes? What about
spelling the plural? How would you use the thing as an adjective?
Meanwhile, FAQes (?) are all over the Web. If you use a search engine, you aren't going to find most of them.
However, the USENET newsgroups were the place to be in the
ten years between
| the mid-80's | when the Internet's protocols started getting standardized |
| & |
| the mid-90's | when the Web brought pictures to the Internet and threaded discussions started popping up on every site from A to Z (including this course web) |
As the newsgroups grew in popularity,
newcomers would end up asking the same questions over and over. So the oldtimers started
making lists of frequently asked questions and posting them frequently to the newsgroup.
Newcomers were encouraged to "read the FAQ" before asking a question.
Many of these FAQ are regularly maintained and updated in
text form even though many have migrated to Web sites. However, the ones still in text
form can make great summaries of the common knowledge about many topics. The best
metasites I know are a site about FAQes, MIT's repository, and Ohio State's more
selective one. You can also read the newsgroup news.answers, where most FAQes are
periodically posted.
I find the FAQes as a whole to be strongest on topics related
to computers, to hobbies such as dog breeding, and to well-established academic
disciplines such as linguistics.
The accumulated texts of all the newsgroup postings may be
huge. But they're all in one place and you can search them quickly at DejaNews. These searches are especially good for
turning up experts who may well respond to a polite email enquiry. |

coming: the future of search engines
graphic displays and new methods of organization
must-see site GRAPHICAL THESAURUS highly
recommended
http://www.thinkmap.com



|