
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Why robots.txt if you want everything indexed?
Does it even matter?
Larry_C

10+ Year Member



 
Msg#: 33 posted 5:56 pm on Mar 2, 2003 (gmt 0)

Hi everyone,

I've lurked for a while and tried to find the answer to this question. If you want everything in your site indexed - does it even matter if you have a robots.txt file? I realize that you'll get a 404 error each time a spider asks for it, but do the spiders care?

Thanks for helping out a webmaster-wannabe!

Larry

 

mack

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 33 posted 6:09 pm on Mar 2, 2003 (gmt 0)

Robots.txt is a good tool for controlling what gets spidered and what doesn't. But it is also widely used for keeping the bad bots out altogether. It is often necessary to ban a bot that abuses your server by hogging bandwidth or fetching things you would rather it didn't.
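As a rough illustration of that kind of ban, here is a sketch of a robots.txt that shuts out one misbehaving crawler and leaves the site open to everyone else. The bot name "BadBot" and the example.com URLs are invented for the example. The check uses Python's standard urllib.robotparser, which only shows how a rule-abiding crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" is an invented crawler name.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The banned bot is refused; any other crawler falls under the "*" rule.
    print(allowed("BadBot", "http://example.com/index.html"))
    print(allowed("Googlebot", "http://example.com/index.html"))
```

The empty "Disallow:" line under "User-agent: *" means nothing is off limits for the remaining crawlers.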

Some bots choose to ignore robots.txt altogether, though, and when this happens it is usually a case of barring the bot from your server totally by using your htaccess file. This is the internet equivalent of saying "you aint getting nothing here"

Hope this helps.

Krapulator

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 33 posted 11:12 am on Mar 3, 2003 (gmt 0)

You could just put an empty robots.txt file in there.

Thus everything will get indexed and no pesky 404s
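A quick way to see this, sketched with Python's standard urllib.robotparser: an empty file and an explicit allow-all file both let a compliant crawler fetch everything. The crawler name "ExampleBot" and the example.com URL are placeholders, not real values from this thread.

```python
from urllib.robotparser import RobotFileParser

EMPTY = ""                                    # a zero-byte robots.txt
EXPLICIT_ALLOW_ALL = "User-agent: *\nDisallow:\n"  # same effect, spelled out


def can_crawl(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that obeys robots_txt may fetch url.

    "ExampleBot" is a made-up crawler name used only for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/deep/page.html"
    print(can_crawl(EMPTY, page))               # no rules, so nothing is blocked
    print(can_crawl(EXPLICIT_ALLOW_ALL, page))  # empty Disallow blocks nothing
```

Either file answers the request with a 200 instead of a 404, which is the whole point here.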

lorax

WebmasterWorld Administrator lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 33 posted 2:06 pm on Mar 3, 2003 (gmt 0)

>> ... it is also widely used for keeping the bad bots out altogether.

Only if the bot requests the robots.txt file, and not all of them do.

The robots.txt file is good for those bots that do request it, and in those cases you can tell them which directories to stay out of - like an image directory or your cgi directory.
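For instance, a robots.txt along these lines would ask all crawlers to skip an image directory and a cgi directory. The directory names /images/ and /cgi-bin/, the crawler name "ExampleBot", and the example.com host are assumptions for the sketch. Python's standard urllib.robotparser is used to show which paths a compliant bot would leave alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical directory names; substitute whatever your site actually uses.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
"""


def blocked_paths(paths, user_agent="ExampleBot"):
    """Return the subset of paths that a rule-abiding crawler would skip."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return [
        path for path in paths
        if not parser.can_fetch(user_agent, "http://example.com" + path)
    ]


if __name__ == "__main__":
    print(blocked_paths(["/index.html", "/images/logo.gif", "/cgi-bin/form.pl"]))
```

This only affects bots that fetch and honor the file; a bot that never requests robots.txt is not restrained by it.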

OntheEdge

10+ Year Member



 
Msg#: 33 posted 2:22 pm on Mar 3, 2003 (gmt 0)

>>You could just put an empty robots.txt file in there.<<

Does that help?
What scenario could you see with no robots.txt at all?
I've read a lot about keeping bots out, but know little about using robots.txt to draw them in deeper.

Larry_C

10+ Year Member



 
Msg#: 33 posted 2:22 pm on Mar 3, 2003 (gmt 0)

Thanks everyone for all the info. So I guess the final answer is that the spiders don't penalize you for NOT having a robots.txt, but if you want to keep *most* spiders from accessing certain files - it's best to have one (and get rid of the 404s).

lorax

WebmasterWorld Administrator lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 33 posted 2:46 pm on Mar 3, 2003 (gmt 0)

Larry_C (welcome to WebmasterWorld BTW)

>> but if you want to keep *most* spiders from accessing certain files - it's best to have one

If you really want to protect certain files/directories or keep rogue bots out then you'll need to use something a bit stronger like .htaccess (Apache Server). Do a search here and you'll find plenty of reading to get you started.

And yes, I don't believe the spiders will penalize you for not having a robots.txt file. While I'm not completely sure, I can't think of any way having one could improve the spidering of your website other than telling the spider what not to include.

buckworks

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 33 posted 2:50 pm on Mar 3, 2003 (gmt 0)

If you have a custom 404 page, having a robots.txt will conserve a bit of bandwidth, since every request for a missing robots.txt gets the whole custom error page sent back instead.

hakre

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 33 posted 3:49 pm on Mar 3, 2003 (gmt 0)

hi larry_c, some very nice spiders ;) only crawl your site if they find a robots.txt. so it's good to have one allowing what should be allowed.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved