Forum Moderators: goodroi
I've been studying SEO for over 3 years now, have my site on the first page for most if not all of our key phrases, and know the ins and outs.
However, I have never seen a really big benefit to implementing a robots.txt file on my sites.
Why do you use a robots.txt file?
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /js/
Disallow: /nav/

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /
A partner of mine sent me one that was about 100 lines long and had all kinds of stuff in it.
It all depends on the size of the site and the volume of dynamic pages. We like to handle things at the page level with a noindex, nofollow directive instead of relying on robots.txt. Also, if there were 100 lines in that file, that is surely providing information to prying eyes that maybe they shouldn't have quick access to?
The robots.txt file is fine for "general" good-bot blocking, but I wouldn't rely on it for managing crawler activity at the page level. We're also handling requests through various other routines and redirecting those to their appropriate destinations.
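For reference, the page-level directive mentioned above is normally expressed as a meta tag in the page's head, or as an X-Robots-Tag response header for non-HTML files. A minimal sketch (exact placement in your templates is up to you):

<meta name="robots" content="noindex, nofollow">

X-Robots-Tag: noindex, nofollow

Note the difference from a robots.txt Disallow: a compliant engine will still crawl the page, but will drop it from the index and not follow its links.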
Only if *all* robots in your User-Agent list recognize "Allow," which is NOT part of the Standard for Robot Exclusion, but rather a semi-proprietary "extension" to the protocol.
Be sure to check the "webmaster info" page for each robot to be sure it supports "Allow," and before using any other "extension" which is not universally-supported, such as wild-card paths.
Jim
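To illustrate jdMorgan's point (a hypothetical sketch; the paths and the Googlebot record are placeholders, and you should verify support in each engine's webmaster documentation): for robots that do support "Allow," it is typically used to expose a single path inside an otherwise blocked directory. Parsers that apply the first matching rule need the Allow line listed before the Disallow:

User-agent: Googlebot
Allow: /private/public-page.html
Disallow: /private/

A robot that does not recognize "Allow" will simply ignore that line and block the whole /private/ directory.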
I really can't do anything, I guess... I'm hoping that implementing this shows Google and the other SE crawlers that I'm taking the time to do it, and that I'm taking more ownership of my site.
It may also shave a few seconds off the crawlers' time spent on my site.
My experience is that SEs won't access restricted folders listed in robots.txt by default, but they can be forced to access them by other means, via external links for instance. The same goes for anyone with a browser, really, which is one of the reasons I do not use the robots.txt content to direct traffic.
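Worth underlining here: robots.txt is itself a public file, so anyone, not just robots, can see exactly which folders you've asked crawlers to skip simply by requesting it directly (example.com is a placeholder):

http://www.example.com/robots.txt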
My experience is that SEs won't access restricted folders listed in robots.txt by default, but they can be forced to access them by other means, via external links for instance.
I think you'll want to be careful here and make sure there are no root-level pages in the Disallowed directories. If there are, you will see URI-only listings when you perform site: searches.
Any directories that shouldn't be for public consumption would of course be password protected. You don't need to list those in robots.txt; you're only providing a map if you do. Place a noindex, nofollow directive on the login page and be done with it.
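A minimal sketch of that setup on Apache (the realm name and file paths are hypothetical): HTTP Basic Auth via an .htaccess file in the protected directory, plus the page-level directive on the login page itself:

# .htaccess in the protected directory
AuthType Basic
AuthName "Members Area"
AuthUserFile /home/example/.htpasswd
Require valid-user

<!-- in the <head> of the login page -->
<meta name="robots" content="noindex, nofollow">

With the password in place, there's no need for a matching Disallow line, so the folder name never appears in robots.txt at all.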