
Forum Moderators: goodroi

Why robots.txt if you want everything indexed?

Does it even matter?

   
5:56 pm on Mar 2, 2003 (gmt 0)

10+ Year Member



Hi everyone,

I've lurked for a while and tried to find the answer to this question. If you want everything in your site indexed - does it even matter if you have a robots.txt file? I realize that you'll get a 404 error each time a spider requests it, but do the spiders care?

Thanks for helping out a webmaster-wannabe!

Larry

6:09 pm on Mar 2, 2003 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Robots.txt is a good tool for controlling what gets spidered and what doesn't. But it is also widely used for keeping the bad bots out altogether. It is often necessary to ban a bot that abuses your server by hogging bandwidth or fetching things you would rather it didn't.
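A rough way to see what a per-bot ban looks like is to feed a robots.txt to Python's standard urllib.robotparser. This is only a sketch: "BadBot" and "GoodBot" are made-up names, example.com is a placeholder, and a real crawler may read the file differently from Python's parser (or not read it at all).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one named bot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a bot that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    for bot in ("BadBot", "GoodBot"):
        print(bot, "allowed:", can_fetch(ROBOTS_TXT, bot, page))
```

Keep in mind this only models a bot that chooses to follow the rules.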

However, some bots choose to ignore robots.txt altogether. When this happens, it is usually a case of barring the bot from your server totally by using your .htaccess file. This is the internet equivalent of saying "you ain't getting nothing here".

Hope this helps.

11:12 am on Mar 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could just put an empty robots.txt file in there.

That way everything will get indexed, with no pesky 404s.
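If you want to convince yourself that an empty file means "allow everything", here is a minimal sketch using Python's standard urllib.robotparser. The bot name "ExampleBot" and the example.com URL are made up, and Python's parser is only a stand-in for how a well-behaved spider might read the file.

```python
from urllib.robotparser import RobotFileParser

EMPTY = ""                                  # a zero-byte robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # the explicit "allow all" form

def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a bot that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "http://www.example.com/any/page.html"
    print("empty file:", can_fetch(EMPTY, "ExampleBot", url))
    print("explicit allow-all:", can_fetch(ALLOW_ALL, "ExampleBot", url))
```

Both forms come out the same in this parser; the explicit version just makes the intent obvious to anyone reading the file.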

2:06 pm on Mar 3, 2003 (gmt 0)

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>> ... it is also widely used for keeping the bad bots out altogether.

Only if the bot requests the robots.txt file, and not all of them do.

The robots.txt file is good for those bots that do request it. In those cases you can tell them which directories to stay out of, like an image directory or your cgi directory.
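As a sketch of the directory case, the snippet below checks a robots.txt with two Disallow lines using Python's standard urllib.robotparser. The paths /images/ and /cgi-bin/, the bot name "ExampleBot" and the example.com URLs are assumptions for illustration, so substitute your own directories.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all obedient bots out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
"""

def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a bot that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    base = "http://www.example.com"
    for path in ("/index.html", "/images/logo.gif", "/cgi-bin/form.pl"):
        print(path, "allowed:", can_fetch(ROBOTS_TXT, "ExampleBot", base + path))
```

Everything outside the listed directories stays open to the spider.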

2:22 pm on Mar 3, 2003 (gmt 0)

10+ Year Member



>>You could just put an empty robots.txt file in there.<<

Does that help?
What scenario could you see with no robots.txt at all?
I've read a lot about keeping bots out, but know little about using robots.txt to draw them in deeper.

2:22 pm on Mar 3, 2003 (gmt 0)

10+ Year Member



Thanks everyone for all the info. So I guess the final answer is that the spiders don't penalize you for NOT having a robots.txt, but if you want to keep *most* spiders from accessing certain files, it's best to have one (and get rid of the 404s).

2:46 pm on Mar 3, 2003 (gmt 0)

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Larry_C (welcome to WebmasterWorld BTW)

>> but if you want to keep *most* spiders from accessing certain files - it's best to have one

If you really want to protect certain files/directories or keep rogue bots out then you'll need to use something a bit stronger like .htaccess (Apache Server). Do a search here and you'll find plenty of reading to get you started.
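The difference from robots.txt is that the server itself refuses the request, whether or not the bot plays along. The sketch below shows that idea in Python as a small WSGI wrapper. It is not .htaccess syntax and not how Apache does it; the names block_rogue_bots, site and status_for, and the "badbot" entry, are all hypothetical.

```python
from wsgiref.util import setup_testing_defaults

BLOCKED_AGENTS = ("badbot",)  # hypothetical list of user-agent substrings

def block_rogue_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so blocked user agents get a 403 instead of content."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware

def site(environ, start_response):
    """Stand-in for the real website."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome\n"]

def status_for(user_agent: str) -> str:
    """Send one fake request with the given User-Agent and return the status."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    block_rogue_bots(site)(environ, lambda status, headers: seen.append(status))
    return seen[0]

if __name__ == "__main__":
    for agent in ("BadBot/1.0", "Mozilla/5.0"):
        print(agent, "->", status_for(agent))
```

A bot can still lie about its user agent, so matching on the name is only a first line of defence.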

And yes, I don't believe the spiders will penalize you for not having a robots.txt file. While I'm not completely sure, I can't think of any way having one could improve the spidering of your website, other than telling the spider what not to include.

2:50 pm on Mar 3, 2003 (gmt 0)

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member



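If you have a custom 404 page, having a robots.txt will conserve a bit of bandwidth, since a spider asking for a missing robots.txt would otherwise be sent the whole custom error page each time.

3:49 pm on Mar 3, 2003 (gmt 0)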

WebmasterWorld Senior Member 10+ Year Member



Hi larry_c, some very nice spiders ;) only crawl your site if they find a robots.txt, so it's good to have one allowing what should be allowed.
 
