How Does a Robot Know the Domain Name?




Several domain names pointed to different sub-directories.

GreenTea

7:17 pm on May 18, 2005 (gmt 0)

As a robot crawls over web space, can it figure out which files go with which domain name?

For example, I've got different domain names pointed to different sub-directories in the same web space. The root directory has one domain name pointed at it, call it www.firstdomain.com. A completely different and unrelated domain name (www.widgetsdomain.com) is also pointed at a sub-directory (widgets) and a third domain name (www.doodlesdomain.com) is pointed at another sub-directory (doodles). Will a robot know, as it crawls, that everything in the widgets sub-directory has nothing to do with www.firstdomain.com? Is there a way that I can tell it?

It seems to me that when the sites finally do get listed by search engines, it'd be nice to see them under their own unique domain names and not the arbitrary domain name of the root.

Thanks for any help from this newbie!

jdMorgan

11:28 pm on May 18, 2005 (gmt 0)

GreenTea,

Welcome to WebmasterWorld!

Follow those 'subdirectory-pointed' URLs with your browser, or better yet, with a server headers checker [webmasterworld.com]. A search engine robot will see exactly what you see in the browser address bar, or in the headers display of the headers checker. As long as there is no redirect to the 'main' domain, the robot won't see that 'main' domain.
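If you'd rather script the check than use a web-based headers checker, here's a minimal sketch using only Python's standard library. It sends a HEAD request, does not follow redirects, and reports whether the response bounces the visitor (or robot) to a different host. The function names `fetch_headers` and `redirects_elsewhere` are made up for illustration, and the domains are just the example names from the question above.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def fetch_headers(url, timeout=10):
    """HEAD the URL and return (status, headers) WITHOUT following redirects."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # http.client adds the Host header itself, just as a browser or robot would
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # lower-case the header names so lookups don't depend on the server's casing
        return resp.status, {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()

def redirects_elsewhere(url, status, headers):
    """True if the response is a redirect pointing at a different host than the one requested."""
    if status not in REDIRECT_CODES:
        return False
    # Location may be relative, so resolve it against the requested URL first
    target = urljoin(url, headers.get("location", ""))
    return urlsplit(target).hostname != urlsplit(url).hostname
```

Usage would be something like `status, headers = fetch_headers("http://www.widgetsdomain.com/")` followed by `redirects_elsewhere(...)`. If that comes back False, a robot asking for that URL stays on www.widgetsdomain.com.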

All of this is based on HTTP [w3.org]; robots don't have any 'magic' methods that browsers can't use.
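To illustrate why the robot never sees the 'main' domain: with name-based virtual hosting, the client (browser or robot) sends only the host name and path, and the server decides which sub-directory to serve from. The sketch below assumes a hypothetical host-to-directory table and a made-up web root `/home/user/public_html`. It is not any particular server's configuration, just the idea in a few lines.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical table: which sub-directory of the shared web space serves each host name
VHOSTS = {
    "www.firstdomain.com": "",
    "www.widgetsdomain.com": "widgets",
    "www.doodlesdomain.com": "doodles",
}

def build_request(url):
    """Request line and Host header a browser or robot would send for this URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"GET {path} HTTP/1.1", f"Host: {parts.hostname}"

def file_on_disk(host, path, webroot="/home/user/public_html"):
    """Server-side only: map the Host header plus path to a file in the shared space."""
    subdir = VHOSTS[host]
    return posixpath.join(webroot, subdir, path.lstrip("/"))
```

The client only ever produces the `GET`/`Host` pair. The `widgets` directory name stays on the server's side of the conversation and never shows up in the URL the robot records.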

Jim

GreenTea

4:07 am on May 19, 2005 (gmt 0)

Oh, mucho thanks, Jim! That takes a load off my mind.

The header checker is nifty!

Thanks again.

Dave_A

12:09 am on Jun 8, 2005 (gmt 0)

My search engine robot has a few commands built in that let me choose whether it follows every link it finds and carries on indexing them all, even off to other web sites stored on different servers, or stays "Indomain", inside a single domain.
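For anyone curious how an "Indomain" switch like that might work, here's a minimal sketch. It is hypothetical and not Dave_A's actual robot. It resolves each link against the page it was found on and, when `indomain` is on, keeps only links whose host name matches the page's host.

```python
from urllib.parse import urldefrag, urljoin, urlsplit

def links_to_follow(page_url, hrefs, indomain=True):
    """Resolve hrefs against page_url; optionally keep only same-host links."""
    host = urlsplit(page_url).hostname
    kept = []
    for href in hrefs:
        # make the link absolute and drop any #fragment
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if indomain and parts.hostname != host:
            continue  # off-site link: ignored in "Indomain" mode
        if absolute not in kept:
            kept.append(absolute)
    return kept
```

This compares exact host names, so `www.widgetsdomain.com` and `widgetsdomain.com` count as different sites. A real crawler might want to treat them as one.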