Meta Tags & Robots.txt

Forum Moderators: goodroi

tcustom

1:03 am on May 13, 2004 (gmt 0)

So, if I use a robots.txt file, then do I need to include the:

<meta name="robots" content="whatever">

statement in my html page?

Dreamquick

1:21 am on May 13, 2004 (gmt 0)

Depends what you want the metatag to do.

If it's simply mirroring a "don't crawl this file" command that you are already handling via robots.txt, then you can remove it. However, if you are using the meta tag to issue a more advanced command that's not available from robots.txt (noarchive, nofollow, etc.), then you'll need to keep it, because you won't be able to achieve that functionality otherwise.

- Tony
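
To make the distinction concrete, here is a minimal Python sketch of how a crawler might read the directives from a robots meta tag. The class name RobotsMetaParser, the helper robots_directives, and the sample page are made up for illustration; real search engines do not necessarily parse pages this way.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page, used only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Example</title>
</head><body>Hello</body></html>"""

directives = robots_directives(SAMPLE_PAGE)
print(sorted(directives))
```

Directives such as noarchive and nofollow only exist at this per-page level, which is why robots.txt cannot stand in for them.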

tcustom

1:37 am on May 13, 2004 (gmt 0)

Tony,

I'm actually using most of the robots.txt file from webmasterworld.com. It disallows the bad spiders first, then allows the good ones with a * for the user-agent and a blank disallow, and then disallows certain directories of my site.

In my html pages, all but one have a robots meta tag set to index, follow. One page has noindex, nofollow.

The meta tags have been there a while; I only just added the robots.txt. So with what I have in the .txt file, are the meta tags still necessary?
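
For illustration, a robots.txt laid out roughly along those lines can be checked with Python's standard urllib.robotparser module. The file below is a simplified, hypothetical stand-in (BadBot, GoodBot and /private/ are invented names), not the actual WebmasterWorld file, and the blank disallow line is left out to keep the sample small.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one banned spider, then a catch-all record
# that keeps every other robot out of a single directory.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if the given user-agent may fetch the given path."""
    return parser.can_fetch(agent, path)


for agent, path in [
    ("BadBot", "/index.html"),
    ("GoodBot", "/index.html"),
    ("GoodBot", "/private/page.html"),
]:
    print(agent, path, allowed(agent, path))
```

A check like this only says whether a compliant robot would fetch a URL. It says nothing about noindex, nofollow or noarchive, which live in the page itself.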

jdMorgan

1:57 am on May 13, 2004 (gmt 0)

robots.txt takes precedence over on-page meta robots tags, because if a page is disallowed in robots.txt, a good robot won't fetch the page, and so can't read the meta tags.

Be aware that a page disallowed in robots.txt may still appear in some search results if a search engine finds a link to it. It will appear as a URL-only listing, with no title and no description. This is not true of all engines, but Google, Yahoo, and Ask Jeeves have been observed with this behaviour. They don't fetch the page, they just show the link they found in the results.

Further, I noticed recently that Yahoo is now showing such links using whatever link text they found on the link as the title for their result.

The solution to the problem (if it concerns you) is to 'Allow' that page in robots.txt (that is, leave it out of any Disallow rule), and use the on-page noindex tag to keep it from being listed in search results. This costs you bandwidth, since you have to let the robot read the page.

Anyway, things will be clearer if you remember that the on-page meta robots tags can't be read if the page is disallowed in robots.txt.

Jim
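
The reasoning in the post above can be summarised as a small decision function. This is only a sketch of the behaviour described in this thread, not a specification of how any particular search engine works; the function name listing_outcome and its three parameters are hypothetical.

```python
def listing_outcome(disallowed_in_robots_txt, meta_directives, externally_linked):
    """Rough model of what a well-behaved engine might show for a page.

    disallowed_in_robots_txt: True if robots.txt blocks the page.
    meta_directives: set of lowercase directives from the page's robots meta tag.
    externally_linked: True if the engine has found a link to the page.
    """
    if disallowed_in_robots_txt:
        # The page is never fetched, so its meta tags are never seen.
        # The engine may still list the bare URL if it found a link to it.
        return "url-only listing possible" if externally_linked else "not listed"
    if "noindex" in meta_directives:
        # The page was fetched and the on-page tag was read and honoured.
        return "not listed"
    return "indexed"


# Disallowed page with noindex: the tag is never read.
print(listing_outcome(True, {"noindex"}, True))
# Allowed page with noindex: the tag is read and the page is kept out.
print(listing_outcome(False, {"noindex"}, True))
```

The first call shows why combining a robots.txt disallow with an on-page noindex does not help: the disallow stops the robot before it can see the tag.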