robots.txt and Robots META Tag

<META NAME="Robots" CONTENT="index,folow">

   
9:03 am on Feb 18, 2004 (gmt 0)

10+ Year Member



Hi,

If I use this line in the head of the page:
<META NAME="Robots" CONTENT="index,folow">
do I need to place a robots.txt in the root to allow every agent to crawl the site?

9:31 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member



If you use Disallow in your /robots.txt then Googlebot won't fetch the URLs, so it will never see the robots META tags on those pages telling it that listing is allowed.

If you want to be listed then the simplest way is to have no /robots.txt or robots META tag. If you want to disallow other bots in your /robots.txt then you should allow Googlebot there.
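To illustrate the "disallow other bots but allow Googlebot" arrangement, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URL, and the "OtherBot" agent name are made up for the example, and this only shows how Python's parser reads the file; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an empty Disallow for Googlebot (nothing is
# off limits), and a blanket Disallow for every other agent.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/page.html"  # placeholder URL
googlebot_allowed = parser.can_fetch("Googlebot", url)
otherbot_allowed = parser.can_fetch("OtherBot", url)

print("Googlebot may fetch:", googlebot_allowed)
print("OtherBot may fetch:", otherbot_allowed)
```

Note that the Googlebot record has to exist explicitly; without it, Googlebot would fall under the "*" record and be shut out along with everyone else.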

9:38 am on Feb 18, 2004 (gmt 0)

10+ Year Member



Essentially robots.txt [robotstxt.org] is used to exclude search engine spiders, so if you don't want to exclude any then there is no need for a robots.txt file.
10:21 am on Feb 18, 2004 (gmt 0)

10+ Year Member



Thanks to all,

But if I keep using this line in my meta tags, could it cause any problems in the future?

I ask because I have seen a number of sites that do well on Google and use this line without any robots.txt file.

Thanks
Exp...

10:37 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Robots exclusion protocol isn't used to encourage crawling, just to prevent it.
10:49 am on Feb 18, 2004 (gmt 0)

10+ Year Member



In essence you shouldn't need to put

<META NAME="Robots" CONTENT="index,follow">

on your pages. The tag is designed more for allowing indexing while excluding link following, or vice versa, i.e.

<META NAME="Robots" CONTENT="index,nofollow">

or

<META NAME="Robots" CONTENT="noindex,follow">

Hopefully search engine spiders will index and follow links without you having to tell them to do it!
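As a rough sketch of that reasoning, the snippet below reads the robots META tag from a page with Python's standard html.parser and works out the effective index/follow settings. It assumes that a missing tag means "index, follow" and that unrecognised tokens are simply ignored; that matches the thread's advice but is an assumption about crawler behaviour, not a guarantee. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Only the four directives discussed in this thread are recognised here.
KNOWN = {"index", "noindex", "follow", "nofollow"}


class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated tokens of any robots META tag."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # html.parser lowercases attribute names
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.tokens += [t.strip().lower() for t in content.split(",") if t.strip()]


def effective_directives(html):
    """Return (index, follow, ignored_tokens) under the stated assumptions."""
    parser = RobotsMetaParser()
    parser.feed(html)
    known = [t for t in parser.tokens if t in KNOWN]
    ignored = [t for t in parser.tokens if t not in KNOWN]
    # Assumed default: index and follow unless explicitly negated.
    return "noindex" not in known, "nofollow" not in known, ignored


print(effective_directives('<META NAME="Robots" CONTENT="index,folow">'))
print(effective_directives('<META NAME="Robots" CONTENT="noindex,follow">'))
print(effective_directives("<p>No robots tag at all</p>"))
```

Under these assumptions, a page with no tag and a page with "index,follow" come out the same, which is why the tag is redundant for ordinary pages.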

10:57 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Essentially robots.txt is used to exclude search engine spiders, so if you don't want to exclude any then there is no need for a robots.txt file.

Technically, this may be correct. However, I'm pretty sure that my host was delivering an error page in response to a request for robots.txt. Furthermore, as far as I could tell, that error page was delivered without a 404 HTTP header, so the error page would have been treated as a robots.txt file.

I am inclined to believe that a blank robots.txt file is better than none at all.

Kaled.

11:05 am on Feb 18, 2004 (gmt 0)

10+ Year Member



I would second Kaled's position, a blank robots.txt file is better than none, and it does no harm even if we are wrong and it doesn't help.
11:13 am on Feb 18, 2004 (gmt 0)

10+ Year Member



A blank robots.txt file will also cut down on the 404 error stats in your statistics program(s). Better to focus on pages that are actually missing or links actually broken.
12:21 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I tend to use a blank /robots.txt, but just to keep the error log cleaner.

If Google were to stop crawling sites where /robots.txt is missing they would miss a lot of Web sites.

There were problems with some servers erroneously returning 403 instead of 404, and Google used to treat that as "don't index". While that made sense, too many people had problems, so a 403 for /robots.txt doesn't prevent crawling any more.
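For anyone wanting to check what their own server sends for /robots.txt, here is a small sketch that labels a response by the cases raised in this thread: a proper 404, a 403, and an error page served with a 200 status. The function name and the labels are hypothetical, and the "looks like HTML" test is only a crude heuristic; it says nothing about how any particular crawler reacts.

```python
def classify_robots_response(status, content_type, body):
    """Label a /robots.txt response for a human reviewer.

    status, content_type and body are whatever your HTTP client reported;
    the returned labels are illustrative names, not a standard.
    """
    if status == 404:
        return "missing"      # no robots.txt, reported correctly
    if status == 403:
        return "forbidden"    # the misconfiguration described above
    if status == 200:
        start = body.lstrip().lower()
        looks_like_html = (
            "html" in content_type.lower()
            or start.startswith(("<!doctype", "<html"))
        )
        # An HTML page served with a 200 status is probably an error page
        # sent without a 404 header, which a robot may try to parse.
        return "soft-404" if looks_like_html else "ok"
    return "other"


print(classify_robots_response(200, "text/plain", "User-agent: *\nDisallow:\n"))
print(classify_robots_response(200, "text/html", "<html><body>Not found</body></html>"))
print(classify_robots_response(403, "text/html", "Forbidden"))
```

Feeding it the status, Content-Type header, and body from a request to your own /robots.txt gives a quick way to spot the error-page-without-404 situation.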

1:00 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some time ago, I started using a blank robots.txt but was getting errors when Googlebot crawled. Unfortunately it was some time ago, and I remember some posts here that said it was better to use a robots.txt.

So I created a new one that disallows all search engines from crawling a nonexistent directory. Seems to work for me...

User-agent: *
Disallow: /DontLookHere/
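
A quick way to compare the blank file with the placeholder rule above is to run both through Python's standard urllib.robotparser. This is only a sketch of how that one parser reads them; the helper name and the example.com URLs are made up, and other robots may not behave the same way.

```python
from urllib.robotparser import RobotFileParser


def parser_for(text):
    """Build a parser from robots.txt text (hypothetical helper)."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


blank = parser_for("")
placeholder = parser_for("User-agent: *\nDisallow: /DontLookHere/\n")

page = "http://example.com/index.html"              # placeholder URL
dummy = "http://example.com/DontLookHere/x.html"    # inside the dummy directory

# Both files leave ordinary pages crawlable for this parser.
print("blank file, normal page:", blank.can_fetch("Googlebot", page))
print("placeholder, normal page:", placeholder.can_fetch("Googlebot", page))
# Only the nonexistent directory is excluded by the placeholder rule.
print("placeholder, dummy dir:", placeholder.can_fetch("Googlebot", dummy))
```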

1:18 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



Whatever you do - check the spelling & syntax i.e. it is follow not folow :)
1:50 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



There was a little tidbit of information here last year where Brett recommended not to use a blank robots.txt file. I'm going to assume that many of us have sub-directories or individual files at the root level that we do not want indexed. It would be good practice to Disallow at least one file in the robots.txt file.

Apparently some robots were treating the blank robots.txt file as Disallowing the entire site. Ack!

2:12 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



Thanks pageoneresults.

Just to confirm - if I have mod_rewrite producing static-looking pages, I could disallow Google from my cgi-bin since it never sees a page directly from it?

3:05 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



Absolutely correct.
 
