| 4:55 pm on Nov 10, 2011 (gmt 0)|
We've had tons of pages noindexed for 6-7 months. Many of them still appear in Google's index.
The only thing you may want to check is to make sure that the noindexed pages are accessible. If the crawler can't access the page in the first place, it won't know that the noindex tag has been added.
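To check both conditions at once, here is a rough sketch (the function name, the regex, and the simplistic HTML matching are mine, not any Google tool): it only reports that a noindex will be seen if the URL is crawlable under the robots.txt rules *and* the page actually carries the tag.

```python
# Sketch: a noindex directive only works if the crawler can fetch the page,
# so verify crawlability and the presence of the tag together.
import re
from urllib.robotparser import RobotFileParser

def noindex_will_be_seen(robots_txt: str, url: str, page_html: str) -> bool:
    """True if Googlebot may crawl `url` AND the page carries meta noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch("Googlebot", url)
    # Crude check for <meta name="robots" content="...noindex...">
    has_noindex = bool(re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
        page_html, re.IGNORECASE))
    return crawlable and has_noindex
```

If this returns False because of the robots.txt side, the crawler will never see the tag, which is the failure mode described above.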
| 5:02 pm on Nov 10, 2011 (gmt 0)|
Yea, what's your robots.txt look like?
| 5:30 pm on Nov 10, 2011 (gmt 0)|
@ackk and @netmeg, thanks for the feedback. I don't have a robots.txt on the site... do you think we should add one?
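For what it's worth, a robots.txt isn't strictly required, and the key interaction here is that it must *not* Disallow the noindexed pages, or the crawler can never fetch them to see the tag. A minimal permissive robots.txt looks like this (an empty Disallow allows everything):

```
User-agent: *
Disallow:
```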
| 7:58 pm on Nov 10, 2011 (gmt 0)|
See Google employee JohnMu's reply in this thread:
Creating a legitimate, no follow, 2nd mirror site with no penalty to our main site
| 8:33 pm on Nov 10, 2011 (gmt 0)|
There was a conversation about this ages ago, to do with Baidu I think, and one of the things that came out of it was that noindex does not do what we think it does.
Most search engines will remove the pages from their index if you noindex them, so we have come to believe that's what it does. But technically, it isn't.
It just tells them that they can't crawl it. Anything already in their index can stay there, and if they can get the info through a third party, then that's okay too.
If a third party links to your page, then they can grab the URL and title, or whatever, from that, and there's nothing you can do about it.
| 10:03 pm on Nov 10, 2011 (gmt 0)|
@potential Geek, thanks for the link.
@londrum, thanks for the clarification on the key noindex information.
In my case I can't 301 the pages, because we use them for testing... I think I will look into rel="canonical" - that should do the trick. I actually didn't realize you could do a rel="canonical" pointing to a completely different domain. Good to know!
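For reference, a cross-domain canonical is just a link element in the head of the test-site copy pointing at the live page (the domains below are only illustrative):

```html
<!-- in the <head> of the page on the test domain -->
<link rel="canonical" href="https://www.example.com/original-page/">
```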
| 10:59 pm on Nov 10, 2011 (gmt 0)|
I see you use uppercase in your code. This is something I was wondering about: is robots noindex case-sensitive? I'm starting to suspect it is, but I don't have any real proof yet.
| 10:42 am on Nov 11, 2011 (gmt 0)|
I'm assuming that there are too many URLs for you to manually remove them using WMT?
| 2:09 pm on Nov 11, 2011 (gmt 0)|
Check the cache dates; it's not impossible that they were cached before you noindexed the pages. And if the site is not heavily linked to, it's possible the crawler hasn't been back to recrawl, and therefore hasn't de-indexed the pages.
| 2:20 pm on Nov 11, 2011 (gmt 0)|
+1 on what londrum says, plus my $0.02 :)
A couple of months ago, we tried noindexing and blocking pages via robots.txt. Unfortunately, Google was still listing these pages in their SERPs but without a snippet below the title.
The only way to get them out of the index was via Webmaster Tools, and even now there are errors about these pages being blocked by robots.txt *sigh*
| 2:26 pm on Nov 11, 2011 (gmt 0)|
It makes you wonder why people are told to noindex low-quality pages as a way to beat Panda. Surely it shouldn't have any effect, if Google can keep the pages in the index?
| 3:38 pm on Nov 11, 2011 (gmt 0)|
Personally, I have never had a situation (over hundreds of thousands of URLs in aggregate) where a NOINDEXed URL showed up in the index... UNLESS I or someone else had made a mistake and blocked crawling with robots.txt. If you do that, then G can't even get in to *see* the NOINDEX.
I'm not saying it can't happen, but I've never seen it happen without some logical explanation for it.
| 4:04 pm on Nov 11, 2011 (gmt 0)|
|A couple of months ago, we tried noindexing and blocking pages via robots.txt. Unfortunately, Google was still listing these pages in their SERPs but without a snippet below the title. |
What you describe with the URI only listings is the default robots.txt behavior. The META (or X-Robots-Tag) NoIndex is at the document level. If you've Disallowed the bot from accessing the documents that contain the NoIndex directive, it will never see it, that's why your pages are still showing in the index with a URI only listing.
Remove the robots.txt directives and let the document level NoIndex do its thing. It works just as it says on the tin. I've been using it for years and I've never, ever, seen any of those documents appear in the index - ever.
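To make the conflict concrete, a sketch with a made-up path: while the robots.txt rule below is in place, the crawler never fetches the page, so the document-level directive is invisible to it.

```
# robots.txt -- while this is present, Googlebot cannot fetch the page
# and will never see its noindex. Remove the Disallow line to fix it.
User-agent: *
Disallow: /old-stuff/

# /old-stuff/page.html -- document-level directive, only seen once crawled:
# <meta name="robots" content="noindex">

# Equivalent HTTP header (X-Robots-Tag) for non-HTML files:
# X-Robots-Tag: noindex
```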
| 5:48 pm on Nov 11, 2011 (gmt 0)|
I managed to find that thread from ages ago.
The reply from skrenta is the interesting one.
| 11:32 pm on Nov 14, 2011 (gmt 0)|
|If you've Disallowed the bot from accessing the documents that contain the NoIndex directive, it will never see it, that's why your pages are still showing in the index with a URI only listing. |
Sorry for being unclear. I first tried NOINDEX, and then tried the robots.txt route. Neither worked like I thought it should. The WMT URL/directory removal worked (past tense): after 3-4 weeks, the results with just URLs, or titles without snippets, are showing back up in the results. Note that the pages are over 10 years old and have several decent backlinks.
Now, even after noindex, robots.txt directory blocking, and a directory removal via Webmaster Tools, I have given up and decided to send a 404 Not Found header for all requests. Now I have a truckload of complaints in Webmaster Tools about the 404s mixed with robots.txt blockage. *sigh*
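If the server happens to be Apache (an assumption on my part; the path is hypothetical), the blanket 404 can be done with mod_alias; a 410 Gone variant is also worth considering, since 410 signals permanence more strongly. Note the robots.txt Disallow has to come off for this too, or the crawler never sees the 404s:

```
# .htaccess -- answer every request under the retired directory with 404
RedirectMatch 404 ^/old-section/

# ...or mark it as permanently gone (410) instead:
# Redirect gone /old-section
```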
| 1:46 am on Nov 15, 2011 (gmt 0)|
The meta robots noindex should fix the problem, but it does take quite a while (sometimes more than 6 months).