Forum Moderators: keyplyr & mack

how do spiders work?

noob needs a general idea

4:51 pm on Dec 20, 2002 (gmt 0)

New User

10+ Year Member

joined:Dec 20, 2002
votes: 0

OK, I'm overloaded with info and I'm not getting a clear picture from what I'm reading. What I would like to know, in general terms, is how a spider or bot works, for example Google's.
Excuse the words I use, I'm sure they are wrong, but here goes.
For example, does Googlebot, on its first trip to a new site, look for index.html, start reading it and see where it can go from there? Or does it "take a snapshot" of the root directory, read the file names in the directory and then read each file? How does it know where to go? What does it read first?

The reason I'm asking is that I plan to use PHP/MySQL to create pages. My idea is to have a basic, text-only HTML page sitting in the directory that a bot can read. I will have an index.php that loads these basic pages into a template, stripping out whatever is not needed for display to a viewer. The basic page will also serve as a "text only" version and will contain the keywords and other meta info.

6:19 pm on Dec 20, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 1, 2002
votes: 0

Remote clients cannot get a listing of your directory, unless you allow it.
Besides, many sites do not even have separate files located in directories at all.

The only way for spiders to find your pages is to follow the links. Most spiders get the home page and then move from there.
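
To make "follow the links" concrete, here is a minimal Python sketch of how a spider might pull links out of a home page. This is only an illustration, not how any particular search engine does it. The HTML, the example.com domain and the names LinkCollector and extract_links are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute same-host URLs found in html, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the URL of the page they appear on.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


# Hypothetical home page; example.com is a placeholder domain.
HOME = """
<html><body>
<a href="about.html">About</a>
<a href="/products/index.php?id=3">Product 3</a>
<a href="http://other.example.org/">Elsewhere</a>
</body></html>
"""

print(extract_links("http://www.example.com/", HOME))
```

Note that the spider only ever sees URLs that appear in the HTML it was served. It never looks at the file names on the server, so it makes no difference whether a page is a static file or generated by a script.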


9:36 pm on Dec 20, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 20, 2002
votes: 0

A spider reads in the index file (it could be .html but not necessarily - it's whatever the user will see when they type the domain name). It then simply follows the <a href...> tags to find other pages. Orphan pages are not usually indexed.

After this (and sometimes as a separate step), the search engine runs the pages through a spam penalty/ban filter to get rid of people who are obviously trying to cheat...but that's another story :)
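
The link-following part, and why orphan pages get missed, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the site is a made-up in-memory table of "page -> pages it links to" rather than real HTTP fetches, and the names SITE and crawl are hypothetical. Real spiders also deal with robots.txt, scheduling and much more.

```python
from collections import deque

# Hypothetical site: each key is a page, each value lists the pages it links to.
SITE = {
    "/": ["/about.html", "/products.php"],
    "/about.html": ["/"],
    "/products.php": ["/products.php?id=1", "/about.html"],
    "/products.php?id=1": ["/products.php"],
    "/orphan.html": ["/"],  # exists on the server, but no page links to it
}


def crawl(site, start="/"):
    """Breadth-first walk from the start page, visiting each page once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in site.get(page, []):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return order


found = crawl(SITE)
orphans = sorted(set(SITE) - set(found))
print("crawled:", found)
print("never discovered:", orphans)
```

In this sketch /orphan.html is never reached because nothing links to it, which is the situation described above. The fix is simply to link to every page you want found from some page the spider can already reach.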

