Robots.txt

Best one for free access...

         

Harley_m

12:55 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



Looking at the recent posts on the subject, I've decided to put a robots.txt on my site, even though I don't need to restrict access. What I'm thinking is it can't do any harm, can it...? I'm having trouble getting the site deep crawled and freshed, and it's just possible that's what the bot wants to see to be happy...

Anyway, it only won't do any harm if it's written perfectly, so what would it need to include to give free and total access... just:

user-agent: *
disallow:

Or index,follow too...?

Can someone write it correctly for me, as I don't want to fluff it up!

Thanks - and good luck to all in the update...

Harley
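
For anyone wanting to check a file like the one above before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the two-line file from the post and uses a made-up URL on www.example.com; the helper name `allows` is only for illustration. As far as the robots.txt convention goes, an empty `Disallow:` value restricts nothing, while `index,follow` is a value for the robots meta tag in a page's HTML and does not belong in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The file from the post: an empty Disallow value restricts nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# For contrast: a bare slash blocks the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allows(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical page, used only to exercise the rules.
    page = "http://www.example.com/deep/page.html"
    print("allow-all file lets Googlebot in:", allows(ALLOW_ALL, "Googlebot", page))
    print("block-all file lets Googlebot in:", allows(BLOCK_ALL, "Googlebot", page))
```

This only shows how one parser reads the file. Individual crawlers may behave differently, which is what the replies in this thread are about.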

Brett_Tabke

12:58 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>is it cant do any harm can it

It shouldn't, but there were bots in the past that turned around if a robots.txt was present. That's obviously not the case with any of the large SEs, but there is no reason to have a robots.txt unless you want to block parts of the site.

Harley_m

1:03 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



I thought there were rumours of Ink's bot assuming index,nofollow...?

bobmark

3:06 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



I can't imagine why you wouldn't have a robots.txt file. In my case I have various areas that, as an added security precaution, I don't want crawled by anyone, and I experience no problems with any major SEs.
However, you DO have to be careful, especially with widget* disallows.
I copied various portions of the sample WebmasterWorld robots.txt file to supplement my own and noticed that a couple of welcome - or at least not undesirable - bots seemed to interpret a disallow like:
User-agent: Zeus*
Disallow: /
as applying to them (specifically Alexa, for one), as if they picked up only the * and not the preceding text (Googlebot had no problem parsing this correctly). In my case, I do not want the bloodsucking Zeus cult people all over my site, so a robots.txt file is essential; I just had to avoid disallows that incorporated a wildcard character.
(I know this is true as my logs showed a couple of SEs that turned around and left when encountering the above-referenced statement and returned to deep crawl again once it was amended.)
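
The misreading described above can be sketched roughly as follows. This is a guess at the kind of logic that would produce that behaviour, not the actual code of Alexa's or any other crawler. Both matcher functions are hypothetical, the Zeus user-agent string is made up for the example, and treating a trailing `*` as "strip it and match the rest" is an assumption, since `*` is only defined by the original convention when it stands alone.

```python
def careful_match(token, agent):
    """Treat '*' as 'every robot' only when it is the whole User-agent value.

    Assumption: a trailing '*' as in 'Zeus*' is stripped and the remaining
    text is matched as a case-insensitive substring of the robot's name.
    """
    token, agent = token.lower(), agent.lower()
    if token == "*":
        return True
    literal = token.rstrip("*")
    return bool(literal) and literal in agent


def sloppy_match(token, agent):
    """Buggy variant: any '*' anywhere in the value is read as 'every robot'."""
    return "*" in token


if __name__ == "__main__":
    # 'ia_archiver' is Alexa's crawler; the Zeus string is illustrative only.
    for agent in ("Zeus 1.0 Webster", "ia_archiver", "Googlebot/2.1"):
        print(agent,
              "| careful:", careful_match("Zeus*", agent),
              "| sloppy:", sloppy_match("Zeus*", agent))
```

Under these assumptions the careful matcher applies the `Zeus*` record only to the Zeus robot, while the sloppy one applies it to every visitor, which would explain a welcome bot turning around at `Disallow: /`. Spelling out the robot's name without the `*` avoids depending on how a given parser treats the wildcard.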