

Google News Archive Forum

    
robots.txt and Robots META Tag
<META NAME="Robots" CONTENT="index,folow">
experienced




msg:200238
 9:03 am on Feb 18, 2004 (gmt 0)

Hi,

If I use this line in the meta tags of the page
<META NAME="Robots" CONTENT="index,folow">, do I need to place a robots.txt in the root to allow every agent to crawl the site?

 

ciml




msg:200239
 9:31 am on Feb 18, 2004 (gmt 0)

If you use Disallow in your /robots.txt then Googlebot won't fetch the URLs, so it will never see a robots META tag on those pages saying that Google may list them.

If you want to be listed then the simplest way is to have no /robots.txt or robots META tag. If you want to disallow other bots in your /robots.txt then you should allow Googlebot there.
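
A minimal sketch of that second case, using Python's standard urllib.robotparser to see how a rule set that shuts out every bot except Googlebot is read. The rules, the SomeOtherBot name and the example.com URL are illustrative assumptions, and this only shows how Python's parser interprets the file, not how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot may fetch everything, all other bots nothing.
# An empty "Disallow:" line means "nothing is disallowed" for that agent.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str, rules: str = RULES) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/page.html"
    print("Googlebot:", allowed("Googlebot", url))
    print("SomeOtherBot:", allowed("SomeOtherBot", url))
```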

Netizen




msg:200240
 9:38 am on Feb 18, 2004 (gmt 0)

Essentially robots.txt [robotstxt.org] is used to exclude search engine spiders, so if you don't want to exclude any then there is no need for a robots.txt file.

experienced




msg:200241
 10:21 am on Feb 18, 2004 (gmt 0)

Thanks to all,

But if I still use this line in the meta tags, is there any problem I may face in future?

I ask because I have seen a number of sites doing well on Google that use this line without any robots.txt file.

Thanks
Exp...

ciml




msg:200242
 10:37 am on Feb 18, 2004 (gmt 0)

Robots exclusion protocol isn't used to encourage crawling, just to prevent it.

Netizen




msg:200243
 10:49 am on Feb 18, 2004 (gmt 0)

In essence you shouldn't need to put

<META NAME="Robots" CONTENT="index,follow">

on your pages. This is more designed for either allowing indexing but excluding following links or vice versa i.e.

<META NAME="Robots" CONTENT="index,nofollow">

or

<META NAME="Robots" CONTENT="noindex,follow">

Hopefully search engine spiders will index and follow links without you having to tell them to do it!
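
To make the defaults concrete, here is a rough Python sketch of how a crawler might read the robots META tag. It assumes that a missing tag means index and follow, and that unrecognised tokens are simply ignored; robots_policy and RobotsMetaParser are made-up names for illustration, not any search engine's actual code.

```python
from html.parser import HTMLParser

# Tokens this sketch recognises; anything else is reported as unknown.
KNOWN = {"index", "noindex", "follow", "nofollow", "all", "none"}


class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated values of every robots META tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def robots_policy(html: str):
    """Return (index, follow, unknown_tokens) for a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    index, follow = True, True  # assumed defaults when nothing forbids them
    unknown = []
    for d in parser.directives:
        if d == "noindex":
            index = False
        elif d == "nofollow":
            follow = False
        elif d == "none":
            index = follow = False
        elif d not in KNOWN:
            unknown.append(d)  # ignored, but reported for debugging
    return index, follow, unknown


if __name__ == "__main__":
    print(robots_policy('<META NAME="Robots" CONTENT="index,nofollow">'))
    print(robots_policy('<META NAME="Robots" CONTENT="noindex,follow">'))
    print(robots_policy("<html><head></head></html>"))
```

Under these assumptions a page with no robots META tag comes out the same as one with index,follow, which is why the tag is redundant in that case.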

kaled




msg:200244
 10:57 am on Feb 18, 2004 (gmt 0)

"Essentially robots.txt is used to exclude search engine spiders, so if you don't want to exclude any then there is no need for a robots.txt file."

Technically, this may be correct. However, I'm pretty sure that my host was delivering an error page in response to a request for robots.txt. Furthermore, as far as I could tell, that error page was delivered without a 404 HTTP header, so the error page would have been treated as a robots.txt file.

I am inclined to believe that a blank robots.txt file is better than none at all.

Kaled.
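
One way to check for this on your own host is to request /robots.txt and look at the status code and Content-Type. The Python sketch below is only a rough illustration: the category names and the "text/plain means ok" rule are my own assumptions, and check_robots needs network access.

```python
import urllib.error
import urllib.request


def classify_robots_response(status: int, content_type: str) -> str:
    """Roughly classify the response to a /robots.txt request."""
    if status == 404:
        return "missing"      # a proper "no robots.txt here" answer
    if status in (401, 403):
        return "forbidden"    # access denied rather than not found
    if status == 200:
        if content_type.lower().startswith("text/plain"):
            return "ok"
        return "suspect"      # e.g. an HTML error page served with 200
    return "other"


def check_robots(base_url: str) -> str:
    """Fetch base_url + /robots.txt and classify the response."""
    url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            ctype = resp.headers.get("Content-Type", "")
            return classify_robots_response(resp.status, ctype)
    except urllib.error.HTTPError as err:
        ctype = err.headers.get("Content-Type", "") if err.headers else ""
        return classify_robots_response(err.code, ctype)


# Usage (placeholder host, requires network):
#   print(check_robots("http://www.example.com"))
```

A "suspect" result is the situation described above: the server answers 200 with something that is probably not a real robots.txt.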

salmo




msg:200245
 11:05 am on Feb 18, 2004 (gmt 0)

I would second Kaled's position, a blank robots.txt file is better than none, and it does no harm even if we are wrong and it doesn't help.

borisbaloney




msg:200246
 11:13 am on Feb 18, 2004 (gmt 0)

A blank robots.txt file will also cut down on the 404 error stats in your statistics program(s). Better to focus on pages that are actually missing or links actually broken.

ciml




msg:200247
 12:21 pm on Feb 18, 2004 (gmt 0)

I tend to use a blank /robots.txt, but just to keep the error log cleaner.

If Google were to stop crawling sites where /robots.txt is missing, they would miss a lot of Web sites.

There were problems with some servers erroneously returning 403 instead of 404, and Google used to treat a 403 as "don't index". While that made sense, too many people had problems, so a 403 for /robots.txt doesn't prevent crawling any more.

webdude




msg:200248
 1:00 pm on Feb 18, 2004 (gmt 0)

Some time ago, I started using a blank robots.txt but was getting errors when Googlebot crawled. Unfortunately it was some time ago, and I remember some posts here that said it was better to use a robots.txt.

So I created a new one that disallows all search engines from crawling a non-existent directory. Seems to work for me...

User-agent: *
Disallow: /DontLookHere/
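
For anyone who wants to sanity-check a file like this, here is a small Python sketch using the standard urllib.robotparser. It compares the two-line file above with a blank one. The paths tested are made up, and the result only reflects how Python's parser reads the rules, which says nothing about how Googlebot or any other robot handled a blank file.

```python
from urllib.robotparser import RobotFileParser

# The dummy-directory file from this post, and a completely blank file.
DUMMY_RULES = "User-agent: *\nDisallow: /DontLookHere/\n"
BLANK_RULES = ""


def can_fetch(rules: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, rules in (("dummy", DUMMY_RULES), ("blank", BLANK_RULES)):
        for path in ("/index.html", "/DontLookHere/page.html"):
            print(label, path, can_fetch(rules, "Googlebot", path))
```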

SyntheticUpper




msg:200249
 1:18 pm on Feb 18, 2004 (gmt 0)

Whatever you do - check the spelling & syntax i.e. it is follow not folow :)

pageoneresults




msg:200250
 1:50 pm on Feb 18, 2004 (gmt 0)

There was a little tidbit of information here last year where Brett recommended not to use a blank robots.txt file. I'm going to assume that many of us have sub-directories or individual files at the root level that we do not want indexed. It would be good practice to Disallow at least one file in the robots.txt file.

Apparently some robots were treating the blank robots.txt file as Disallowing the entire site. Ack!

borisbaloney




msg:200251
 2:12 pm on Feb 18, 2004 (gmt 0)

Thanks pageoneresults.

Just to confirm: if I have mod_rewrite producing static-looking pages, could I disallow Google from my cgi-bin, since it never sees a page directly from it?

Netizen




msg:200252
 3:05 pm on Feb 18, 2004 (gmt 0)

Absolutely correct.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved