Robots meta tag vs X-Robots-Tag

         

JorgeV

4:21 pm on Oct 28, 2019 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello-

Is it better to use meta tags or HTTP headers for robots directives?
<meta name="robots" content="noindex">

or
X-Robots-Tag: noindex


Both do the same thing, but if both methods exist, there must be a good reason. Is the meta tag safer to use, in the sense of being understood by all robots?

lucy24

7:51 pm on Oct 28, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Personally I only use the X-Robots-Tag in situations where the <meta> is impossible. That means non-page content, especially things like scripts that search engines should generally be allowed to crawl but you certainly don't want them indexed. In theory you could apply X-Robots to images as well, but there I prefer to robot-out entire directories. (So far, not even Google has started yapping about not being allowed to crawl images--and I can’t remember ever seeing a “the site’s robots.txt rules prevent us from showing you this picture even though it might be just what you’re looking for”.)
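As a rough illustration of that split, here is a minimal Python sketch of server-side logic that adds the header only to non-page files, where a <meta> tag cannot be placed. The function name `robots_headers` and the extension list are hypothetical, chosen only for this example; a real site would more likely set the header in its web server configuration.

```python
from posixpath import splitext

# Hypothetical list of non-page file types that may be crawled
# but should not be indexed; adjust to your own site.
NOINDEX_EXTENSIONS = {".js", ".css"}


def robots_headers(url_path):
    """Return extra response headers for the given URL path.

    HTML pages get nothing here, on the assumption that they carry
    their own <meta name="robots"> tag where needed.
    """
    _, ext = splitext(url_path.lower())
    if ext in NOINDEX_EXTENSIONS:
        return {"X-Robots-Tag": "noindex"}
    return {}
```

Calling `robots_headers("/scripts/menu.js")` yields the noindex header, while `robots_headers("/about.html")` yields an empty dict, leaving the page to its own meta tag.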

JorgeV

9:37 pm on Oct 28, 2019 (gmt 0)




Thank you.

And what is the best way to exclude a folder (HTML pages) from being indexed: adding a noindex tag to the pages, or using robots.txt? I think robots.txt is better, because the pages are never requested by Googlebot (and other legitimate bots), so it avoids unnecessary hits and resource usage. On the other hand, maybe search engines "like" to see the content of the pages, even if they are not going to index them after all?

lucy24

1:13 am on Oct 29, 2019 (gmt 0)




Ah, the eternal dilemma:
-- let them crawl and see things you don’t want them to see, relying upon them to honor your noindex meta
-- ask them to stay out, and be bombarded with Search Console messages howling about “indexed but blocked by robots.txt” (to which my stock reply is “So ### what? That’s your problem, not mine” though I realize others may see it differently).

If you want to exclude an entire folder, I would unhesitatingly say: robot it out, and ignore subsequent complaints. Unless you have the world’s most brilliant and inimitable linking text, those pages will not show up in response to any real-life searches. (One of my roboted-out directories contains a handful of pages that are linked from an accessible page. In each case, the linking text is “The rest of the story”. I have experimented, and found that even if I constrain the search to example.com, these uncrawled pages will not be among the results.)
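For anyone who wants to check what a folder-level rule actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/private/` folder, the example.com URLs, and the helper name `is_crawlable` are assumptions made up for this example. Note that a compliant crawler that is kept out this way never fetches the page, so it never sees any noindex meta tag the page might carry.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all compliant bots out of one folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from the string instead of fetching them over HTTP.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_crawlable("https://example.com/private/story.html")` is False and `is_crawlable("https://example.com/index.html")` is True.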

Food for thought: Another of my roboted-out directories contains the site search page, which (in theory at least) carries advertising, because that's the price of a free search from {major search engine}. To date they have never complained about not being allowed to see a page on which they are (in theory at least) placing ads of their choice. Interesting, huh?

JorgeV

9:01 am on Oct 29, 2019 (gmt 0)




Thank you very much, everybody!