Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Noindex tag warning for the privacy page?

         

JS_Harris

2:53 am on Apr 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Should a webmaster be worried if they start getting Search Console warnings, specifically "coverage issue detected" warnings, for their privacy page? The page in question has a noindex tag and combines the contact info with the privacy policy. Does Google no longer recognize it as a policy page?

Has anyone encountered this for the privacy page? With the upcoming Google update it's a little concerning, maybe.
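For reference, the noindex on the page is just the standard robots meta tag in the head (nothing fancier than this):

```html
<!-- In the page's <head>; "follow" was historically the implied default -->
<meta name="robots" content="noindex">
```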

Robert Charlton

6:02 am on Apr 28, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Why did you noindex the page?

I don't know about "worried", but it sounds like Google... which is a search engine... very clearly wants to be able to include that page in the serps if someone searches for it.

I'd remove the tag.

Also, are you running Google Adsense, or Google Analytics? I assume that would make it even more important that the Privacy Policy is visible.

lammert

8:37 am on Apr 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As this page also contains the contact info for the site, hiding it may also affect site rankings if your site falls in the E-A-T or YMYL categories.

not2easy

11:23 am on Apr 28, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Unless there are no links to the page on the site, that seems unusual. Why would a site want to have their privacy page turn up in a search? I have always noindexed such pages. They are easily accessible on the site but would make a poor landing page experience for 101% of visitors.

FranticFish

12:49 pm on Apr 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There is a precedent for this: Google started giving warnings about robots.txt blocking CSS files at about the same time a very good case was put forward that they were rendering content - i.e. that Googlebot was effectively Chrome. Just because a file isn't traditionally search-facing doesn't mean they don't want to assess the content and perhaps have it contribute to ratings (EAT, YMYL etc).

I too have always blocked terms / accessibility, privacy, cookies, GDPR etc pages as I figured they were small print for humans only. I'm going to change that across the board, although now that these pages could possibly be landing pages they'll have to be a bit more 'cute n fluffy' than they were :)

not2easy

1:51 pm on Apr 28, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It's one thing to block robots from visiting a page or resources for a page, and another thing to noindex a page. I don't block bots from site pages or resources but they are noindexed. I thought that's what the console warnings in the OP were about.

saladtosser

2:00 pm on Apr 28, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



I always noindex these pages as most of the content is the same jargon found on many other sites; it's hard to write an engaging and unique privacy policy that adds value to the web :)

FranticFish

5:29 pm on Apr 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



what the console warnings in the OP were about

My point was a broader one that warnings about not being able to index files are not a million miles removed from warnings about being denied access to a file.

It's one thing to block robots from visiting a page or resources for a page, and another thing to noindex a page

Yes, but I do find the ways that they 'overlap' each other interesting these days
- JM said not so long ago that in time noindex pages will see all their outbound links turn to nofollow
- Google's robots.txt guidance states that blocking a file won't necessarily keep it out the index
Of course, there's never been any suggestion that Google crawls through robots.txt blocked pages, so both methods ultimately 'zap' the content on the page and take any links on it out of the equation. But, for that reason, I don't see them as incredibly different in operation. The way nofollow has changed over time has changed noindex quite dramatically.

NickMNS

6:14 pm on Apr 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, but I do find the ways that they 'overlap' each other interesting these days

There is no overlap; these are two distinct directives, and they always have been.

Noindex === Do not save the content of the page in the index
Robots.txt === Do not allow the crawler to crawl the page.

It is that simple.

Everything else described is the impact of these directives.
noindex pages will see all their outbound links turn to nofollow

That is not exactly what was said, but the basic idea is correct. Simply stated, if a page is not indexed then there is no record of the link on that page, thus the link is not counted.

Google's robots.txt guidance states that blocking a file won't necessarily keep it out the index

You never told Google not to index the page, you only said don't crawl it.

Now things become confused when there is change of state.
In the robots.txt example above, if the page has been blocked by robots.txt since it was published, then it shouldn't be in the index. But if it was not blocked and was indexed, and you subsequently block it, you arrive at the situation where it remains in the index. Likewise, if you block a previously indexed page with robots.txt and also add a noindex tag, Google will never see the noindex tag, since you told Google not to go there, so again the page will remain in the index.

Back to the "nofollow" link example: here again, if the page was indexed and you change the tag to noindex, then the links on that page will eventually become "nofollow". The "eventually" results from the fact that it takes time for Google to update its index, but once that happens there is no record of the links on that page and they no longer count towards PageRank.

As for the fact that Google warns you about the noindex tag, I believe that is done as a courtesy. Google is simply asking "Are you sure you don't want me to show this?". If the answer is "no, don't show it", then do nothing. If your answer were "yes, show it", I'm sure you would be glad that Google let you know of a potentially big mistake.

aristotle

6:57 pm on Apr 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Usually these types of pages (privacy policy, etc) have little, if any, content that's relevant to the main content of the site. This is why I always put noindex tags on them.

On my sites the main content is a group of articles about different aspects of an overall subject or theme. In my opinion these pages can help each others' rankings because they naturally contain many of the same keywords. Thus they reinforce each other, especially when there is strong internal linking.

A privacy policy page doesn't "fit in", and therefore if it were indexed, it might actually hurt the rankings of the main pages. So in my opinion, it's best to put a noindex tag on these types of pages.

No5needinput

9:05 pm on Apr 28, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



JM said not so long ago that in time noindex pages will see all their outbound links turn to nofollow


What about xml sitemaps noindexed via .htaccess?

# Send a noindex X-Robots-Tag header for all .rss and .xml files
<FilesMatch "\.(rss|xml)$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>

JS_Harris

10:20 pm on Apr 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The privacy page receiving the warning only has a noindex meta tag, nothing in .htaccess or in robots.txt.
Also, are you running Google Adsense, or Google Analytics? I assume that would make it even more important that the Privacy Policy is visible.

Nope, neither, and I've always noindexed the policy stuff. Google can still access the page just like users can, but at best it's near-duplicate content compared to every other site's, so I prefer it not be a landing page. The words "privacy policy" appear on every page; a site: search of my site will turn them up without problem.

The warning caught me off guard because nothing changed on it and that page has been like that for over 15 years.

No outbound links on the page, just the contact info and internal menu. I'm going to leave it as is, I don't think it's a problem but, well, it surprised me so I figured I'd ask if others are seeing it too.

It happened just after Google warned of an impending user experience core update.

aristotle

12:43 am on Apr 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"coverage issue detected" warnings,

What is the "coverage issue" that was detected?

iamlost

4:45 am on Apr 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think it was a couple three years ago that Google Search Console was updated to include ‘coverage issues’.

I do know that almost immediately the SEO bloggeratti were explaining that the most common reason for the issue was noindexed content. And the solution, of course, was to remove that pesky noindex that ‘could affect the site/page SERP standing’.

Idjits.

Yes, there is a coverage issue: Google wants full unrestricted coverage and the site is denying that. Cry me a river.

I’ve had up to half my sites noindexed for over a decade with no noticeable impact on Google (or other SE) results. Of course, YRMV.

That said, while my ‘privacy statement/policy’ and ‘contact’ (webform only as no B&M presence) pages have always been noindexed, ‘about’ has not for marketing purposes.

JorgeV

10:43 am on May 1, 2021 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

As this page also contains the contact info for the site, hiding it may also affect site rankings if your site falls in the E-A-T or YMYL categories.


Noindex has nothing to do with "hiding". First of all, to know a page has a "noindex" tag, the crawler had to visit it. So the crawler knows the page exists and knows its content; it simply does not include the page in the SERPs. So Google, and other search engines, can perfectly well consider the content of the page when evaluating the authority of the site; this is unrelated to noindexing it.

Robert Charlton

12:15 am on May 11, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This both to JS_Harris and to JorgeV....

My emphasis added...
So the crawler knows the page exists, knows its content, it simply does not include the page into the SERP. So, Google, and other search engines, can perfectly consider the content of the page, to evaluate the authority of the site, this is unrelated to noindexing it.
JorgeV, I'm thinking that you may be confusing noindex with nofollow here. When there's a noindex tag, considering that noindex is supposed to be the most private kind of page wrt search that you can have online, I don't believe that Google just "simply" goes ahead and evaluates the page but doesn't show it.

I don't have an inside line to Larry Page here, but I'm guessing that, with "noindex" anyway, Google doesn't parse the content of the page.

Your characterization may well now be true for "nofollow"... where they've given everybody many months notice that they were going to transgress their "not even for discovery" statement they'd made way back and change the nature of what they can discover... but "noindex" is a bit different.

That said, John Mueller did announce that they've been changing the treatment of "noindex" as well... but more like making a longtime noindexed page just drop out of spidering, except occasionally, the way they treat a 404. See our forum discussion here...

Google Will Eventually Stop Following Links on Noindex Pages
Dec, 2017
https://www.webmasterworld.com/google/4881752.htm [webmasterworld.com]

There's sort of a paradox here, which I discuss a little bit near the end of the above thread. Note that in the OP of the thread, this comment...

John Mueller said, Google will eventually stop following links from a page that has noindex on it.
By "eventually", I should note, the intent in using that word (British usage) is that if the page is noindexed for long enough, Google will stop following its links. This doesn't mean that "sometime in the future, Google is planning" to stop following the links. This does suggest that Google has parsed the page enough to see that there are links... and it's also implicit in the use of the tag that Google has parsed the page enough to see that there is a noindex robots tag in the head section.

My guess... and it is a guess... is that, for all noindexed pages (which weren't blocked by robots.txt), Google looked at the page enough to see the robots meta tag, and then also looked for links in the html, but did not make note of the text content. Again, a guess.

Robots "noindex" used to default to robots "noindex,follow". Now, the treatment is such that, effectively, the default is "noindex,nofollow".

Why do I take the trouble to go into all this? Because of my thought above that, for noindexed pages, "they don't parse the content of the page". Here, I feel that Google does want to evaluate the content of a privacy page enough that they know, without manual inspection, whether it complies with their guidelines, so they do need to parse the content.

That doesn't mean they want to rank the page... that's very unlikely. But they do... I'm assuming... want someone to be able to do a "navigation search", for a query like...

[company-name privacy page]

...and be able to find it in search, and then to look within it if they wanted to. Ditto for a site operator search that might include keywords. They don't want to block those.

With "nofollow", btw, I should emphasize that I very much agree with you. I think Google wants to be able to crawl the whole web, as it might appear in the serps, and to run tests to see whether or not, and how much, "nofollow" is affecting the overall shape of the web.

But, that's not true about noindex, which is what JS_Harris says he's doing with the privacy page.

Again, my thoughts about whether Google wants the privacy page to show up in search are educated conjecture.

JS_Harris

4:58 pm on May 14, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*Update* - Turns out the privacy policy page, which has a noindex meta tag, snuck into the XML sitemap submitted to Search Console. If you include a page that has a noindex directive in a sitemap, Search Console will throw an error.

So which is better?
- noindex the privacy policy and leave it out of your sitemap
- Include it in the sitemap and allow it to be indexed

I don't think this is a particularly important choice; however, I can see a situation where Search Console may not know about the privacy policy and rank the site accordingly.
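For anyone who wants to catch this mismatch before Search Console does, here's a rough sketch of a scripted check: flag any sitemap URL whose page carries a robots noindex meta tag. All URLs and page snippets below are made up; in practice you'd fetch each sitemap URL rather than use inline strings.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical sitemap fragment -- the URLs are illustrative only.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/privacy/</loc></url>
</urlset>"""

# Simulated page heads (in practice, fetch each URL and read the HTML).
PAGES = {
    "https://example.com/": "<head><title>Home</title></head>",
    "https://example.com/privacy/": '<head><meta name="robots" content="noindex"></head>',
}

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def noindexed_sitemap_urls(sitemap_xml, pages):
    """Return sitemap URLs whose HTML carries a robots noindex meta tag."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text for loc in root.findall(".//sm:loc", NS)]
    # Crude check for <meta name="robots" ... noindex ...>; a real crawler
    # would also consider the X-Robots-Tag response header.
    pattern = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', re.I)
    return [u for u in urls if pattern.search(pages.get(u, ""))]

print(noindexed_sitemap_urls(SITEMAP, PAGES))
# → ['https://example.com/privacy/']
```

Any URL this flags should either lose its noindex tag or come out of the sitemap.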

not2easy

5:56 pm on May 14, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you have links throughout the site to your privacy page such as those typical in footer site links, Google has visited your privacy page and is 'aware' of what it says but unless there is a reason to want it indexed I would simply remove it from the sitemap and leave it as noindexed.

Google tells us that the sitemap is intended to list those pages you would like to have indexed so you can see why listing it there might have caused confusion.

Just curious, I haven't made the effort, but if you search for any site's privacy page, how often might you find it indexed?


See mod's note below re specifics which the ToS does not permit discussing publicly on the forum.


[edited by: Robert_Charlton at 10:39 pm (utc) on May 14, 2021]
[edit reason] added mod's note - see below [/edit]

Robert Charlton

10:31 pm on May 14, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Just curious, I haven't made the effort, but if you search for any site's privacy page, how often might you find it indexed?

I just did make a very quick effort, on several major sites, and most of their privacy pages are returned in search. On some sites... though they link to their policies prominently, and the language used is widely indexed... it's clear that the language is boilerplate, used by many sites, yet it does not show up for a quoted search restricted to their domains.

This prompts me to add a note here, mod's hat on.

Mod's note: In general, in this forum, we do not discuss the specifics of other sites, and that includes calling attention to the specifics of how specific sites implement their privacy policies. Members are free to generalize, but not to refer to specific sites.

The one exception I will make is Google's privacy policy, which is here, and which has been crawled and indexed. You might even say that Google essentially has turned it into a marketing document....

Google Privacy Policy - English version
https://policies.google.com/privacy?hl=en-US [policies.google.com]

The document is an interesting model. I cannot, btw, find any specific guidelines on Google for privacy policies in general. There is a Developer's article on Google for privacy policies regarding Google Actions, which is not, though, a general website guideline.