Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt
Do you use it if you want to allow all robots?
WebSeeker




msg:1528447
 2:14 pm on Jan 2, 2001 (gmt 0)

Do you use Robots.txt if you want to allow all robots from all engines? Or is it only used when you want to disallow something?

 

Macguru




msg:1528448
 3:48 pm on Jan 2, 2001 (gmt 0)

Hi WebSeeker,

I use this file mainly to disallow access to some robots, files or folders. Also to cut down on those 404 errors.
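
To make the two cases in this thread concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The robot name "BadBot", the /private/ folder and example.com are made up for illustration; they are not from anyone's real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one made-up robot ("BadBot") is shut out entirely,
# and every other robot is kept out of the /private/ folder only.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# An "allow everything" file: an empty Disallow value blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://example.com"
    for agent, path in [("SomeBot", "/index.html"),
                        ("SomeBot", "/private/notes.html"),
                        ("BadBot", "/index.html")]:
        print(agent, path, can_fetch(RULES, agent, site + path))
```

So a file is not strictly needed to allow everything, but the short ALLOW_ALL form above says so explicitly and gives robots something to find.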

GWJ




msg:1528449
 3:23 pm on Jan 3, 2001 (gmt 0)

Hi Macguru,

>>Also to cut down on those 404 errors.

I'm sorry, could you go into more detail on this statement? How does it cut them down?

TIA,

Brian

Macguru




msg:1528451
 3:33 pm on Jan 3, 2001 (gmt 0)

Hi GWJ,

Maybe I was unclear on that, sorry; my mother tongue is French. (I try hard to make sense... ;) )

When most robots crawl your site, they look for robots.txt at the root level.
If they don't find it, a 404 error is written to the error log; when they do, they read it and abide by it.

With an error log filled with 404 errors from robots looking for this file, it is harder to concentrate on the "real" 404 errors.
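
As a rough illustration of how much noise those requests add, here is a small Python sketch that separates robots.txt 404s from the rest. It assumes the requests are also recorded in an access log in Common Log Format; the sample lines, the function name split_404s and the addresses are invented for the example, and a real error log would need a different pattern.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line, e.g.
# 10.0.0.1 - - [03/Jan/2001:15:33:00 +0000] "GET /robots.txt HTTP/1.0" 404 209
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b')

# Invented sample data, not taken from a real server.
SAMPLE = [
    '10.0.0.1 - - [03/Jan/2001:15:33:00 +0000] "GET /robots.txt HTTP/1.0" 404 209',
    '10.0.0.2 - - [03/Jan/2001:15:34:10 +0000] "GET /robots.txt HTTP/1.0" 404 209',
    '10.0.0.3 - - [03/Jan/2001:15:35:42 +0000] "GET /old-page.html HTTP/1.0" 404 209',
    '10.0.0.3 - - [03/Jan/2001:15:36:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
]


def split_404s(lines):
    """Count 404s, separating requests for /robots.txt from all other paths."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or match.group("status") != "404":
            continue  # skip unparseable lines and non-404 responses
        kind = "robots.txt" if match.group("path") == "/robots.txt" else "other"
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    print(split_404s(SAMPLE))
```

Once a robots.txt file exists at the root, the "robots.txt" bucket should drop to zero and only the "other" 404s are left to look at.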

GWJ




msg:1528452
 12:43 pm on Jan 4, 2001 (gmt 0)

Gotcha MacGuru. Very good English by the way.

Brian

Brett_Tabke




msg:1528453
 8:30 am on Apr 16, 2001 (gmt 0)

One thing I noticed while running the robots.txt crawl/validator over at SEW was how many people were redirecting to a 404. Often that 404 was "seamless", with no redirect or forward of any kind: you just pull robots.txt and get their 404 page. A search engine then has to figure out whether it is really a robots.txt or an HTML page. That is pretty easy to do (look for <html> or <body> tags), but if it is a frameset page, it can be confusing. One trick I did find was someone using a frameset page that actually worked as both a robots.txt AND the frameset page. This knowledge will be kept under lock and key; it's the last thing we need on the web.
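
The simple check mentioned above (look for <html> or <body> tags) could be sketched in Python roughly like this. It is only a naive illustration, not how any particular search engine or the SEW validator does it; the name looks_like_html is made up, and a check this simple does not settle the ambiguous cases.

```python
import re

# Opening <html> or <body> tag, in any letter case, with or without attributes.
HTML_TAG = re.compile(r"<\s*(html|body)\b", re.IGNORECASE)


def looks_like_html(body: str) -> bool:
    """Return True if a supposed robots.txt body contains <html> or <body> tags."""
    return HTML_TAG.search(body) is not None


if __name__ == "__main__":
    error_page = "<HTML><BODY><h1>Page not found</h1></BODY></HTML>"
    real_file = "User-agent: *\nDisallow: /private/\n"
    print(looks_like_html(error_page))
    print(looks_like_html(real_file))
```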

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved