Finding Information on the World Wide Web
The Internet, and especially its most visible component, the World Wide
Web, has hundreds of millions of pages available, waiting to present
information on an amazing variety of topics. The bad news is that most of
those hundreds of millions of pages are titled according to the whim of
their authors, and almost all of them sit on servers with cryptic names.
When you need to know about a particular subject, how do you know which
pages to read? If you're like most people, you visit an Internet search
engine.
Internet search engines are special sites on the Web that are designed
to help people find information stored on other sites. There are differences
in the ways various search engines work, but they all perform three basic
tasks:
- They search the Internet -- or select pieces of the Internet -- based on important words.
- They keep an index of the words they find, and where they find them.
- They allow users to look for words or combinations of words found in that index.
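The second and third tasks can be illustrated with a small sketch. This is only a toy model, not how any particular search engine is built: it assumes the pages have already been gathered, and the names `tokenize`, `build_index`, `search`, and the `example.com` pages are all made up for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs on which it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return, in sorted order, the pages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical pages standing in for content a search engine has gathered.
PAGES = {
    "http://example.com/a": "Search engines index the Web",
    "http://example.com/b": "Spiders crawl the Web to find pages",
}
INDEX = build_index(PAGES)
```

Calling `search(INDEX, "spiders web")` looks up each word in the index and keeps only the pages that contain both. Storing the index as a map from word to pages means a query never has to reread the pages themselves.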
Early search engines held an index of a few hundred thousand pages and
documents, and received maybe one or two thousand inquiries each day.
Today, a top search engine will index hundreds of millions of pages, and
respond to tens of millions of queries per day.
How do they get the keywords from the websites to put in their index?
Before a search engine can tell you where a file or document is, that
file or document must be found. To find information
on the hundreds of millions of Web pages that exist, a search engine employs
special software robots, called spiders, to build lists of the words found
on Web sites. When a spider is building its lists, the process is called
Web crawling. In order to build and maintain a useful list of words, a
search engine's spiders have to look at a lot of pages.
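As a rough sketch of what a spider does, the example below follows links from page to page and builds a list of the words on each one. It is a simplified illustration under stated assumptions: the "Web" here is a hypothetical in-memory dictionary called `SITE`, the `fetch` argument stands in for a real network request, and `PageParser` and `crawl` are names invented for this example. A real spider would also need to handle politeness rules, errors, and relative links.

```python
import re
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect the words in the text and the outgoing links of one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> link.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Record every word that appears in the page text.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(fetch, start_url, max_pages=100):
    """Breadth-first crawl; returns {url: [words found on that page]}."""
    seen = {start_url}
    queue = deque([start_url])
    word_lists = {}
    while queue and len(word_lists) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # assumed to return None when a page is missing
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        word_lists[url] = parser.words
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return word_lists


# Hypothetical two-page "Web" used in place of real network requests.
SITE = {
    "http://example.com/": '<p>Home page</p><a href="http://example.com/about">About</a>',
    "http://example.com/about": '<p>About spiders</p><a href="http://example.com/">Home</a>',
}
WORD_LISTS = crawl(SITE.get, "http://example.com/")
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, as the two pages in `SITE` do.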