robots.txt and Robots META Tag

<META NAME="Robots" CONTENT="index,folow">

experienced

9:03 am on Feb 18, 2004 (gmt 0)

Hi,

If I use this line in the head of a page:
<META NAME="Robots" CONTENT="index,folow">
do I need to place a robots.txt in the site root to allow every agent to crawl the site?

ciml

9:31 am on Feb 18, 2004 (gmt 0)

If you use Disallow in your /robots.txt then Googlebot won't fetch those URLs, so it never sees the META tags on those pages and cannot tell that you are not forbidding Google from listing them.

If you want to be listed then the simplest way is to have no /robots.txt or robots META tag. If you want to disallow other bots in your /robots.txt then you should allow Googlebot there.
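
A minimal sketch of the "disallow other bots but allow Googlebot" setup described above, checked with Python's standard urllib.robotparser. The host example.com, the agent name OtherBot and the page path are placeholders, and Python's parser is only an approximation of how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot gets an empty Disallow (nothing is
# blocked for it); every other agent is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    print("Googlebot:", allowed("Googlebot", page))
    print("OtherBot:", allowed("OtherBot", page))
```

The empty Disallow line is what grants Googlebot access; the catch-all record only applies to agents that have no record of their own.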

Netizen

9:38 am on Feb 18, 2004 (gmt 0)

Essentially robots.txt [robotstxt.org] is used to exclude search engine spiders, so if you don't want to exclude any then there is no need for a robots.txt file.

experienced

10:21 am on Feb 18, 2004 (gmt 0)

Thanks to all,

But if I still use this line in my meta tags, is there any problem I may face in the future?

Because I have seen a number of sites that do well on Google and use this line without any robots.txt file.

Thanks
Exp...

ciml

10:37 am on Feb 18, 2004 (gmt 0)

Robots exclusion protocol isn't used to encourage crawling, just to prevent it.

Netizen

10:49 am on Feb 18, 2004 (gmt 0)

In essence you shouldn't need to put that tag on your pages. It is designed more for allowing indexing but not the following of links, or vice versa.

Hopefully search engine spiders will index and follow links without you having to tell them to do it!
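
To make the defaults concrete, here is a rough sketch of how the CONTENT value of a robots META tag could be interpreted. The function name parse_robots_meta is hypothetical, and the handling of unknown tokens (ignoring them) is an assumption for illustration; individual crawlers may behave differently.

```python
def parse_robots_meta(content: str) -> dict:
    """Interpret the CONTENT value of a robots META tag.

    Assumptions: tokens are comma-separated and case-insensitive,
    unknown tokens are ignored, and the defaults are index and follow.
    """
    directives = {"index": True, "follow": True}
    for token in content.lower().split(","):
        token = token.strip()
        if token in ("index", "follow"):
            directives[token] = True
        elif token in ("noindex", "nofollow"):
            directives[token[2:]] = False  # strip the leading "no"
        elif token == "all":
            directives = {"index": True, "follow": True}
        elif token == "none":
            directives = {"index": False, "follow": False}
    return directives


if __name__ == "__main__":
    for value in ("index,follow", "index,nofollow", "noindex,follow", "index,folow"):
        print(value, "->", parse_robots_meta(value))
```

Under these assumptions the misspelt value from the original question, index,folow, ends up with the same result as the defaults, which is also what a page with no robots META tag at all would get.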

kaled

10:57 am on Feb 18, 2004 (gmt 0)

Essentially robots.txt is used to exclude search engine spiders, so if you don't want to exclude any then there is no need for a robots.txt file.

Technically, this may be correct. However, I'm pretty sure that my host was delivering an error page in response to requests for robots.txt. Furthermore, as far as I could tell, that error page was delivered without a 404 HTTP status, so it would have been treated as a robots.txt file.

I am inclined to believe that a blank robots.txt file is better than none at all.

Kaled.
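
The situation Kaled describes (an error page served for /robots.txt without a 404 status) can be spotted by looking at the status code and body the server actually returns. The helper below is a hypothetical sketch: the category names and the "looks like HTML" heuristic are assumptions for illustration, not any crawler's documented behaviour.

```python
def classify_robots_response(status: int, content_type: str, body: str) -> str:
    """Roughly classify a server's answer to GET /robots.txt."""
    if status == 404:
        return "missing"      # no file at all
    if status != 200:
        return "error"        # e.g. 403 or 500; handling varies by crawler
    looks_like_html = (
        "html" in content_type.lower()
        or body.lstrip().lower().startswith(("<!doctype", "<html"))
    )
    if looks_like_html:
        return "soft-404"     # an error page served with a 200 status
    return "robots-file"      # plausible robots.txt (possibly blank)


if __name__ == "__main__":
    print(classify_robots_response(200, "text/html", "<html>Not found</html>"))
    print(classify_robots_response(200, "text/plain", ""))
    print(classify_robots_response(404, "text/html", "<html>Not found</html>"))
```

The three inputs can be taken from any HTTP client or from the output of a header-checking tool run against your own site's /robots.txt.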

salmo

11:05 am on Feb 18, 2004 (gmt 0)

I would second Kaled's position: a blank robots.txt file is better than none, and it does no harm even if we are wrong and it doesn't help.

borisbaloney

11:13 am on Feb 18, 2004 (gmt 0)

A blank robots.txt file will also cut down on the 404 error stats in your statistics program(s). Better to focus on pages that are actually missing or links actually broken.

ciml

12:21 pm on Feb 18, 2004 (gmt 0)

I tend to use a blank /robots.txt, but just to keep the error log cleaner.

If Google were to stop crawling sites where /robots.txt is missing they would miss a lot of Web sites.

There were problems with some servers erroneously returning 403 instead of 404, and Google used to treat that as "don't index". While that made sense, too many people had problems, so a 403 for /robots.txt doesn't prevent crawling any more.

webdude

1:00 pm on Feb 18, 2004 (gmt 0)

Some time ago, I started using a blank robots.txt but was getting errors when Googlebot crawled. Unfortunately it was some time ago, and I remember some posts here that said it was better to use a robots.txt.

So I created a new one that disallows all search engines from crawling a non-existent directory. Seems to work for me...

User-agent: *
Disallow: /DontLookHere/
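
As a quick sanity check, the file above can be compared with a blank one using Python's standard urllib.robotparser. This is only a sketch: example.com and the page paths are placeholders, and Python's parser will not necessarily agree with every crawler.

```python
from urllib.robotparser import RobotFileParser

BLANK = ""
DUMMY_DISALLOW = """\
User-agent: *
Disallow: /DontLookHere/
"""


def can_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    real_page = "http://example.com/index.html"            # placeholder
    dummy_page = "http://example.com/DontLookHere/x.html"  # placeholder
    for name, text in (("blank", BLANK), ("dummy disallow", DUMMY_DISALLOW)):
        print(name, can_fetch(text, real_page), can_fetch(text, dummy_page))
```

For every URL outside the non-existent directory the two files give the same answer under this parser, so the dummy Disallow only changes what the file looks like, not what is crawlable.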

SyntheticUpper

1:18 pm on Feb 18, 2004 (gmt 0)

Whatever you do - check the spelling & syntax i.e. it is follow not folow :)

pageoneresults

1:50 pm on Feb 18, 2004 (gmt 0)

There was a little tidbit of information here last year where Brett recommended not to use a blank robots.txt file. I'm going to assume that many of us have sub-directories or individual files at the root level that we do not want indexed. It would be good practice to Disallow at least one file in the robots.txt file.

Apparently some robots were treating the blank robots.txt file as Disallowing the entire site. Ack!

borisbaloney

2:12 pm on Feb 18, 2004 (gmt 0)

Thanks pageoneresults.

Just to confirm - if I have mod_rewrite producing static-looking pages, I could disallow Google from my cgi-bin since it never sees a page directly from it?

Netizen

3:05 pm on Feb 18, 2004 (gmt 0)

Absolutely correct.
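
A small sketch of why this works: robots.txt rules are matched against the URL the crawler requests, not against the script that mod_rewrite hands the request to internally. The paths /widgets/blue.html and /cgi-bin/shop.cgi are made-up examples, and urllib.robotparser stands in for a real crawler here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping Googlebot out of the script directory.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /cgi-bin/
"""


def googlebot_may_fetch(url: str) -> bool:
    """Check a requested URL against ROBOTS_TXT for Googlebot."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    # The static-looking URL that visitors and crawlers actually request.
    print(googlebot_may_fetch("http://example.com/widgets/blue.html"))
    # The underlying script URL, which is never linked directly.
    print(googlebot_may_fetch("http://example.com/cgi-bin/shop.cgi?item=blue"))
```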