Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
No Robots.txt page
what's the harm?
dauction




msg:1525924
 9:02 pm on Feb 27, 2004 (gmt 0)

Will not having a robots.txt page cause a bot not to spider a site?

And other than an occasional psycho bot, is there any other harm in not running a robots.txt page?

 

Mardi_Gras




msg:1525925
 9:08 pm on Feb 27, 2004 (gmt 0)

>Will not having a robots.txt page cause a bot not to spider a site?

Robots.txt is designed to EXCLUDE robots - not to invite them in.

dauction




msg:1525926
 9:19 pm on Feb 27, 2004 (gmt 0)

Mardi_Gras, that was always my understanding. I have a bot that stops at robots.txt looking for instructions... it gets a 404, leaves, and doesn't pick up any other pages.

I guess I'm just irritated that it leaves all the time... wondering if I can use the robots.txt page as an invite using index/follow instructions

<meta name="robots" content="index,follow">

then create a link to my site map?
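
A note on where that tag lives: the robots meta tag sits in the <head> of each HTML page, while robots.txt is a separate plain-text file of User-agent / Disallow records. As a rough sketch only, this is how a crawler might read the tag out of a page in Python. The class and function names are made up for illustration and are not taken from any real crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without a value.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_meta(html):
    """Return the list of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page using the tag quoted in the post.
    page = '<html><head><meta name="robots" content="index,follow"></head></html>'
    print(robots_meta(page))
```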

Mardi_Gras




msg:1525927
 9:28 pm on Feb 27, 2004 (gmt 0)

Just drop in a simple robots.txt and see what happens - it can't hurt to try :)
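
For anyone who wants to check a file before uploading it, here is a minimal sketch in Python. The two-line robots.txt in it is an assumed example of an "allow everything" file, not taken from anyone's site, and the bot names and example.com URL are placeholders. It uses the standard library's urllib.robotparser to confirm that the file blocks nothing.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt: an empty Disallow means nothing is off limits.
PERMISSIVE_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder bot names and URL, for illustration only.
    for bot in ("Googlebot", "SomeUnknownBot"):
        allowed = is_allowed(PERMISSIVE_ROBOTS_TXT, bot, "http://www.example.com/page.html")
        print(bot, allowed)
```

Changing the second line to "Disallow: /" would flip the result and shut every well-behaved bot out, so that one character is worth double-checking.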

dauction




msg:1525928
 9:35 pm on Feb 27, 2004 (gmt 0)

Can't hurt to try, LOL... actually that's why I was asking, because most of the pages rank fairly well anyway on the search engines I am in, and I didn't want to screw up any of those rankings by "playing around" in the robots.txt.

99% sure it shouldn't cause any problems... it's just that 1% uncertainty that keeps nagging at me.

I'll try it out on one of my other, less important sites.

thanks for your help

dannyboy




msg:1525929
 7:26 am on Feb 29, 2004 (gmt 0)

I'd put one in there just to prevent the error_log growing with 404 errors due to search engines requesting robots.txt

paybacksa




msg:1525930
 4:11 am on Mar 2, 2004 (gmt 0)

I agree... it's so easy and you'll gain from the experience anyway. Not bad to have a task to do that has no risk, no deadline, no cost, eh?

It's easy. Just point your browser to your favorite website and put in the domain name followed by /robots.txt, like this:

[webmasterworld.com...] <enter>

and you'll get their robots.txt file. Edit it and upload it to your root directory, next to the INDEX file. Here's a clip from Brett's - he mentioned last week that he had to exclude unknown bots because they were hitting his site so hard and costing him bandwidth. Not a bad idea IMHO to start with this one...

paybacksa-----

#
# WebmasterWorld.com: robots.txt
# GNU Robots.txt Feel free to use with credit given to WebmasterWorld.
# Please, we do NOT allow nonauthorized robots any longer.
# [searchengineworld.com...]
# Yes, feel free to copy and use the following.

User-agent: msnbot
Disallow: /

User-agent: scooter
Disallow: /

User-agent: naver
Disallow: /

User-agent: dumbot
Disallow: /

User-agent: Hatena Antenna
Disallow: /
-----truncated by paybacksa
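
To see what the clip above does on its own, here is a small Python sketch using the standard library's urllib.robotparser. The helper name blocked_bots and the extra "Googlebot" test name are assumptions for illustration. The clip is truncated, so the full file may treat other bots differently; this only checks the five records quoted here.

```python
from urllib.robotparser import RobotFileParser

# The five records from the truncated clip above (comment lines left out).
CLIP = """\
User-agent: msnbot
Disallow: /

User-agent: scooter
Disallow: /

User-agent: naver
Disallow: /

User-agent: dumbot
Disallow: /

User-agent: Hatena Antenna
Disallow: /
"""


def blocked_bots(robots_txt, bots, url="/"):
    """Return the bots from the list that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    names = ["msnbot", "scooter", "naver", "dumbot", "Hatena Antenna", "Googlebot"]
    print(blocked_bots(CLIP, names))
```

Since this clip has no "User-agent: *" record, a bot that is not named in it is not restricted by these lines alone.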


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved