
Canonical followed by NoIndex



7:16 pm on Mar 2, 2013 (gmt 0)

10+ Year Member


We are currently dealing with our duplicate content issues (many of our good pages are currently in the 950 penalty).

We have a situation whereby we want to move our pages into a much more organised structure.

So, basically we have:

domain.com/a.html ---> canonical ---> domain.com/b.html (where b.html has NOINDEX tag on the page)
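For clarity, the setup described above as markup (URLs illustrative):

```html
<!-- In the <head> of domain.com/a.html: the canonical pointing at b.html -->
<link rel="canonical" href="http://domain.com/b.html">

<!-- In the <head> of domain.com/b.html: the noindex in question -->
<meta name="robots" content="noindex">
```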

My question is: will Google follow our canonical to b.html (therefore removing a.html) and then honour the NOINDEX on b.html, removing that page too? We cannot simply NOINDEX a.html for technical reasons.



5:35 am on Mar 3, 2013 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

It sounds like a unique situation to me. I'd assume a will eventually disappear from the index, but it may be an "edge case" that Google's logic doesn't deal with very well.


4:55 pm on Mar 3, 2013 (gmt 0)

Not sure I "get" the "cannot noindex A due to technical reasons" part.

Sounds like you need someone more "technically proficient" internally, because you can:

- noindex A via an X-Robots-Tag header, set in httpd.conf, .htaccess, or a server-side script that sends the header;
- 301 redirect A to B for external requests only, again via httpd.conf, .htaccess, or a server-side scripting language; or
- 301 redirect all requests for A except those carrying a specific user-agent string, which still allows site staff access via a custom user-agent.
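The first option above, sending the noindex as an HTTP header rather than in the page markup, might look like this .htaccess fragment (a sketch assuming Apache with mod_headers enabled; the filename pattern is only an example):

```apache
# Send "X-Robots-Tag: noindex" for a.html without touching the page markup.
# Requires mod_headers; adjust the <Files> pattern to match your real URLs.
<Files "a.html">
    Header set X-Robots-Tag "noindex"
</Files>
```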

I'm almost positive there's a way to get the noindex directive across for those pages, or at least to redirect them for certain requests only, if they need to remain present and accessible for some internal reason.

I work on one dynamic site that saves a static version for public access, where the dynamic version to be saved is only accessible via internal request or a custom user-agent string on a specific subdomain. To make it work, the "bot" that saves the HTML pages either requests them internally via a full file-path request (bypassing the .htaccess that does the redirecting) or requests them externally using the custom user-agent string, depending on the exact assembly process for the pages.

TL;DR version:
If you have enough access to put a <link rel> in the <head>, you have enough access to redirect or noindex one way or another, because in a "worst case" you could 0-second meta refresh page A to page B, and to the best of my knowledge those are still treated almost exactly the same as 301s.
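The "worst case" fallback mentioned above would be a single tag in the <head> of page A (a sketch; the target URL is illustrative):

```html
<!-- 0-second meta refresh from A to B; reportedly treated much like a 301 -->
<meta http-equiv="refresh" content="0;url=http://domain.com/b.html">
```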


6:08 pm on Mar 3, 2013 (gmt 0)

10+ Year Member

Hi, let me re-word it: we can noindex a.html, but it would mean including pages we want to keep (this solution is being applied across tens of thousands of pages). I simply want to find out whether our "canonical to a noindexed page" will remove both versions... Cheers


6:32 pm on Mar 3, 2013 (gmt 0)

That's something you'll have to test, because canonical really says "B is the preferred version". But if you were Google, and conservative, and the canonical pointed to a noindex version of the page, would you be more likely to think the webmaster is "shooting themselves in the foot" and show A, or to trust the canonical?

Personally, I think in that situation I'd err on the side of caution and show A.

The sites I work on are tens of thousands of pages too, so I definitely understand the challenges of doing it another way and redirecting or noindexing specific pages, but I also understand there is almost always another way.

You could even do something based on page length: "compile the page" with concatenation ($page .= "") and a robots noindex in place, check the length of the compiled page prior to output, and if it's over N characters str_replace() the robots meta tag with "" (or even str_replace() the "noindex" with "index"). That would be a surer way to "pull" short pages from the index while leaving longer pages in.

Trust me, I know it might be a PITA to figure out a different way, but there's almost always another way (actually always, in my experience). And I wouldn't trust the canonical to solve the issue permanently: even if you test it and it works today, the way Google handles it could change tomorrow, and you'd be back in the same situation until you find another way. So I'd start with "another way" personally. Of course, I am crazy about only wanting to "fix" things once and be done with it.


7:15 pm on Mar 3, 2013 (gmt 0)

Thinking about it further: if it's a "page text length thing", I would probably get fairly granular about the length of the actual text on the page and do something like:

$page .= "template stuff"; // robots noindex meta tag lives in the template
$page_text .= "actual page text stuff";
$page .= $page_text; // where the text goes
$page .= "more template stuff";

// Only the visible text counts toward the length check
if (strlen(strip_tags($page_text)) > N) {
    // Long enough to keep in the index: swap the noindex for index
    $page = str_replace('noindex', 'index', $page);
}

echo $page;


7:35 pm on Mar 3, 2013 (gmt 0)

10+ Year Member

Thanks OI, I think we'll suck it and see (we are using canonical because 301s caused major leakage when using relative URLs). Google indexes us very quickly, so I'm sure we'll soon see what happens when we check the site: operator results later this week, as we are culling nearly 85% of the website in an attempt to get out of this 950 hell :)
