Forum Moderators: goodroi

Robots.txt not working

bots not obeying robots.txt

         

cyberjunkie

1:55 am on Oct 24, 2004 (gmt 0)

10+ Year Member



Hello,

I have a very simple robots.txt file that is supposed to be disallowing everything from all robots, but doesn't seem to, as I have been able to find my site indexed in several search engines.

My robots.txt file IS located in my public root directory i.e. [myserver.com...] and contains this:

User-agent: *
Disallow: /

Is there something I'm missing?

TIA.

Sanenet

2:14 am on Oct 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Which SEs are you appearing in? Not all of them respect robots.txt.

cyberjunkie

2:37 am on Oct 25, 2004 (gmt 0)

10+ Year Member



Thanks for the prompt reply Sanenet...

I've only checked the following major SEs: Yahoo, MSN, Lycos and Google. My site appears in all of them.

Because both Google and Yahoo (from my understanding) do obey robots.txt, I'm presuming that the problem lies elsewhere.

Any other suggestions?

jdMorgan

11:05 pm on Oct 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A critical question to ask is, "*How* do the listings appear in Google, etc.?"

That is, are they full listings with a title and a description (or snippet) from your page, or are they listed by URL only or URL with link-text only?

Google, a few others, and most recently Yahoo have been listing any page they find a link to, even if robots.txt does not allow them to fetch the page. To be clear, they take a robots.txt Disallow to mean "Do not fetch this page" rather than "Do not list this page." Other search engines treat a Disallow as saying not to list the page, but the Standard for Robots Exclusion favors the "Do not fetch" interpretation.
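To illustrate the "Do not fetch" reading, here is a minimal sketch using Python's standard urllib.robotparser module. It only shows how a compliant crawler would evaluate the rules quoted in the first post; the helper name is_fetch_allowed, the example.com URL, and the user-agent string are assumptions made for illustration. Note that the check answers "may I fetch this URL?" and says nothing about whether the URL may be listed.

```python
from urllib.robotparser import RobotFileParser

# The rules from the original post: disallow everything for all robots.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_fetch_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: it parses the rules from a string instead of
    downloading them, so it can be tried without a live server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the poster's real site.
    allowed = is_fetch_allowed(BLOCK_ALL, "Googlebot", "http://example.com/page.html")
    print("fetch allowed:", allowed)
```

A crawler that honors this result never downloads the page, which is why it can show only the URL or the link text it found elsewhere.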

The solution -- if this is the case for your site -- is to allow the 'bots to fetch your publicly linked pages, and to mark each one with a robots meta-tag containing "noindex,nofollow". Given that scenario, Google and Yahoo will not list the pages. A problem arises here for non-HTML files, such as .pdf and media-type files, which cannot be marked with a robots meta-tag -- there is no way to keep them out of the SERPs except to make sure there are no spiderable links to them.
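As a rough sketch of the meta-tag approach, the snippet below embeds a sample page carrying the tag and uses Python's standard html.parser module to confirm that the directives are present. The page content, the RobotsMetaParser class, and the robots_directives helper are all made up for illustration; the only part taken from the advice above is the "noindex,nofollow" value. It is a quick self-check for your own pages, not a description of how any search engine parses them.

```python
from html.parser import HTMLParser

# Hypothetical page showing where the robots meta-tag goes (inside <head>).
PAGE = """<html><head>
<title>Private page</title>
<meta name="robots" content="noindex,nofollow">
</head><body>Not for the SERPs.</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated; normalize case and whitespace.
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("noindex present:", "noindex" in found)
```

Remember that the tag only works if the 'bot is allowed to fetch the page, so the page must not also be blocked by a robots.txt Disallow.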

Jim