I've pored over the old threads on this and did not find an answer to my question, so I'm a bit confused. I just reviewed Google's SEO Starter Guide, and it left me even more confused. Anyway, here goes:
Let's assume that I have 2 files in a particular folder of my website: let's call them Index1.htm and Index2.htm.
Index1.htm is the root page for a website, and its "robots" meta tag is defined as "index, follow". This file IS found in my website sitemap.
Index2.htm is an old copy of Index1.htm that has been hanging around, and its "robots" meta tag is defined as "noindex, nofollow". This file is NOT found in my website sitemap.
Now, suppose that somehow Google spiders Index2.htm. The page should not be indexed at all (by virtue of its local "noindex" tag), and Google should not follow any of its subordinate links (by virtue of its local "nofollow" tag).
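To make the two tags concrete, here is a minimal Python sketch of how a crawler might read a page's "robots" meta tag and decide whether to index the page and follow its links. It is only an illustration of the directive semantics described above, not Google's actual logic; the names `RobotsMetaParser`, `robots_policy`, and the sample HTML strings are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow) for a page's HTML (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    # "none" is shorthand for "noindex, nofollow"; absence of a tag means allow.
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


# Assumed stand-ins for the two files in the question.
INDEX1 = '<html><head><meta name="robots" content="index, follow"></head></html>'
INDEX2 = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print(robots_policy(INDEX1))
print(robots_policy(INDEX2))
```

Under these assumptions, Index1.htm is both indexable and followable, while Index2.htm is neither. Note that each decision is made per page: the tag on Index2.htm says nothing about Index1.htm.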
Assuming that Index1.htm and Index2.htm have identical subordinate page links, could the spidering of Index2.htm result in any of the following:
1) Removal of any link juice to subordinate pages that Index1.htm would provide?
2) Removal of any subordinate pages from Google's index?
Obviously, I can remove the extra page, Index2.htm, from the folder. My questions above assume that the file has remained in the folder.
Thank you in advance. I hope the above is clear.
You can expedite this process by using the URL Removal Request tool. I love the URL Removal Request - I can have URLs out of the SERPs in 2-4 hours, and any associated problems thus fixed in 0-5 days.
Could the spidering of Index2.htm result in any of the following:
1) Removal of any link juice to subordinate pages that Index1.htm would provide?
2) Removal of any subordinate pages from Google's index?
Given Google's less than perfect interpretation of dupe content and what to do with it, it is not possible to state with certainty that you wouldn't have problems IF the page were spidered.
But as others have said, whilst Google doesn't always seem to obey robots.txt, I've never seen it ignore the noindex meta tag, so it's a moot point.
(2) The removal of index2.htm by using "noindex,nofollow" will not automatically cause any subordinate pages to be removed from the index. However, if index2.htm were providing the only link to a subordinate page, that subordinate page would likely fall out of the index eventually. Given your scenario, this seems an extremely unlikely event.
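The reasoning in (2) can be sketched as a small link-graph check in Python: drop every link that originates on a "nofollow" page, then see which subordinate pages are left with no followed inbound link. This is a simplified model under stated assumptions, not a description of how Google actually works; it ignores sitemaps, external links, and any other way a page can be discovered. The function names and the page names a.htm, b.htm, and c.htm are hypothetical.

```python
def followed_inlinks(links, nofollow_pages):
    """Map each linked-to page to the set of pages whose links to it are followed.

    links: dict of source page -> list of pages it links to.
    nofollow_pages: set of source pages carrying a "nofollow" robots meta tag.
    """
    inlinks = {}
    for source, targets in links.items():
        for target in targets:
            inlinks.setdefault(target, set())
            # Links on a nofollow page are not counted.
            if source not in nofollow_pages:
                inlinks[target].add(source)
    return inlinks


def orphaned_pages(links, nofollow_pages):
    """Return the pages that have no followed inbound link at all."""
    inlinks = followed_inlinks(links, nofollow_pages)
    return sorted(page for page, sources in inlinks.items() if not sources)


# The scenario from the question: both index pages link to the same pages.
identical = {
    "Index1.htm": ["a.htm", "b.htm"],
    "Index2.htm": ["a.htm", "b.htm"],
}

# A hypothetical variation: Index2.htm also holds the only link to c.htm.
only_link = {
    "Index1.htm": ["a.htm", "b.htm"],
    "Index2.htm": ["a.htm", "b.htm", "c.htm"],
}

print(orphaned_pages(identical, {"Index2.htm"}))
print(orphaned_pages(only_link, {"Index2.htm"}))
```

In the first case every subordinate page still has a followed link from Index1.htm, so nothing is orphaned. Only in the second, contrived case does a page (c.htm) lose its sole followed link.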