Forum Moderators: goodroi

Why robots.txt if you want everything indexed?

Does it even matter?

     
5:56 pm on Mar 2, 2003 (gmt 0)

New User

10+ Year Member

joined:Jan 29, 2003
posts:3
votes: 0


Hi everyone,

I've lurked for a while and tried to find the answer to this question. If you want everything on your site indexed, does it even matter whether you have a robots.txt file? I realize that you'll get a 404 error each time a spider requests it, but do the spiders care?

Thanks for helping out a webmaster-wannabe!

Larry

6:09 pm on Mar 2, 2003 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7564
votes: 4


Robots.txt is a good tool for controlling what gets spidered and what doesn't. But it is also widely used for keeping the bad bots out altogether. It is often necessary to ban a bot that abuses your server by hogging bandwidth or fetching things you would rather it didn't.

However, some bots choose to ignore robots.txt altogether, and when this happens it is usually a case of barring the bot from your server totally by using your .htaccess file. This is the internet equivalent of saying "you ain't getting nothing here".

Hope this helps.
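
To illustrate the ban described above, here is a minimal sketch using Python's standard urllib.robotparser to show how a well-behaved crawler would read such a file. The bot names "BadBot" and "ExampleBot" and the www.example.com URL are made up for the example, and real crawlers may parse the file somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" is a made-up name for an abusive crawler.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /
"""

def may_fetch(agent: str, url: str) -> bool:
    """Return True if the rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

# The banned bot is shut out of the whole site; any other bot is unaffected.
bad_bot_allowed = may_fetch("BadBot", "http://www.example.com/index.html")
other_bot_allowed = may_fetch("ExampleBot", "http://www.example.com/index.html")
print(bad_bot_allowed, other_bot_allowed)
```

This only models a bot that chooses to obey the file. A bot that ignores robots.txt has to be blocked at the server instead.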

11:12 am on Mar 3, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 4, 2002
posts:666
votes: 0


You could just put an empty robots.txt file in there.

That way everything will get indexed, with no pesky 404s.
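
As a rough check of the empty-file idea, this sketch feeds an empty robots.txt to Python's standard urllib.robotparser and asks whether a page may be fetched. "ExampleBot" and the URL are placeholders, and this shows how one parser behaves rather than any particular search engine.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse the given robots.txt text and test one agent/URL pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# An empty file contains no rules, so nothing is disallowed.
empty_allows_all = allowed("", "ExampleBot", "http://www.example.com/any/page.html")
print(empty_allows_all)
```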

2:06 pm on Mar 3, 2003 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7575
votes: 0


>> ... it is also widely used for keeping the bad bots out altogether.

Only if the bot requests the robots.txt file, and not all of them do.

The robots.txt file is good for those bots that do request it, and in those cases you can tell them which directories to stay out of, like an image directory or your cgi directory.
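
A small sketch of the directory case, again with Python's standard urllib.robotparser. The /images/ and /cgi-bin/ paths, the bot name, and the host are assumptions chosen for illustration, so substitute your own directory names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every compliant bot out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(path: str) -> bool:
    """Check a path on the placeholder host for a placeholder bot."""
    return parser.can_fetch("ExampleBot", "http://www.example.com" + path)

image_ok = may_crawl("/images/logo.gif")
cgi_ok = may_crawl("/cgi-bin/form.pl")
page_ok = may_crawl("/products/index.html")
print(image_ok, cgi_ok, page_ok)
```

Everything outside the listed directories stays open to crawling, which matches the "index everything else" goal from the original question.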

2:22 pm on Mar 3, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 23, 2002
posts:150
votes: 0


>>You could just put an empty robots.txt file in there.<<

Does that help?
What scenario could you see with no robots.txt at all?
I've read a lot about keeping bots out, but know little about using robots.txt to draw them in deeper.

2:22 pm on Mar 3, 2003 (gmt 0)

New User

10+ Year Member

joined:Jan 29, 2003
posts:3
votes: 0


Thanks everyone for all the info. So I guess the final answer is that the spiders don't penalize you for NOT having a robots.txt, but if you want to keep *most* spiders from accessing certain files, it's best to have one (and get rid of the 404s).

2:46 pm on Mar 3, 2003 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7575
votes: 0


Larry_C (welcome to WebmasterWorld BTW)

>> but if you want to keep *most* spiders from accessing certain files - it's best to have one

If you really want to protect certain files/directories or keep rogue bots out then you'll need to use something a bit stronger like .htaccess (Apache Server). Do a search here and you'll find plenty of reading to get you started.

And yes, I don't believe the spiders will penalize you for not having a robots.txt file. While I'm not completely sure, I can't think of any way having one could improve the spidering of your website, other than telling the spider what not to include.

2:50 pm on Mar 3, 2003 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5642
votes: 41


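If you have a custom 404 page, having a robots.txt will conserve a bit of bandwidth, since each request for a missing robots.txt would otherwise be answered with the full custom error page.

3:49 pm on Mar 3, 2003 (gmt 0)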

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 7, 2003
posts:1230
votes: 0


Hi larry_c, some very nice spiders ;) only crawl your site if they find a robots.txt, so it's good to have one allowing what should be allowed.
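
For an explicit "allow everything" file, the usual form is a wildcard User-agent with an empty Disallow line. The sketch below checks that reading with Python's standard urllib.robotparser. The bot name and URL are placeholders, and whether a given spider actually insists on finding the file is not something this code can confirm.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means "nothing is disallowed" for the matching bots.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Any path on the placeholder host should be fetchable by any compliant bot.
home_ok = parser.can_fetch("ExampleBot", "http://www.example.com/")
deep_ok = parser.can_fetch("ExampleBot", "http://www.example.com/a/b/c.html")
print(home_ok, deep_ok)
```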