
Google SEO News and Discussion Forum

    
NoIndex not NoIndexing. SO FRUSTRATING!
Lenny2




msg:4385588
 4:42 pm on Nov 10, 2011 (gmt 0)

Ladies and Gents,

I have an application on a server... it's the main website. Then I have a mirror application on another server... it's our testing ground. Back in the heyday it didn't matter, we kept both active... We had the setup on two servers so that we could bounce users from one server (in Texas) to the other server (in Washington) when traffic got heavy. At the time the application would slow down when there was a lot of traffic. Anyway, it never really worked... round-robining, as I guess it's called, is extremely complicated, and good people couldn't or didn't feel like figuring it out, so it was less expensive for us to just fix the application so that it didn't slow down with lots of traffic.

Anyway, we kept the other server and the mirror site... as a testing ground. We noindexed all the pages. My question is: when I search for bck01.site-name.com, even though we noindexed/nofollowed everything a year ago... I'm still finding some results.

As we were SLAMMED down by the Panda, I'm wondering if perhaps Google is ignoring the noindex and treating this bck01.site-name.com as duplicate content and penalizing us for it.

Here is the code: <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

Anybody have any thoughts on why Google would ignore the noindex and index the pages anyway? Does anybody have any insight as to whether or not Google would treat the mirror site as duplicate content? Are there better solutions for using a testing server/site for big site changes?

 

ackk




msg:4385593
 4:55 pm on Nov 10, 2011 (gmt 0)

We've had tons of pages noindexed for 6-7 months. Many of them still appear in Google's index.

The only thing you may want to check is to make sure that the noindexed pages are accessible. If the crawler can't access the page in the first place, it won't know that the noindex tag has been added.

netmeg




msg:4385596
 5:02 pm on Nov 10, 2011 (gmt 0)

Yeah, what does your robots.txt look like?

Lenny2




msg:4385613
 5:30 pm on Nov 10, 2011 (gmt 0)

@ackk and @netmeg; thanks for the feedback. I don't have a robots.txt on site... do you think we should add one?

potentialgeek




msg:4385658
 7:58 pm on Nov 10, 2011 (gmt 0)

See Google employee JohnMu's reply in this thread:

Creating a legitimate, no follow , 2nd mirror site with no penalty to our main site
[google.com...]

londrum




msg:4385668
 8:33 pm on Nov 10, 2011 (gmt 0)

there was a conversation about this ages ago to do with baidu, i think, and one of the things that came out of it was that noindex does not do what we think it does.

most search engines will remove the pages from their index if you noindex them, so we have come to believe that's what it does. but technically, it isn't.

it just tells them that they can't crawl it. anything already in their index can stay there. and if they can get the info through a third-party, then that's okay too.

if a third party links to your page, then they can grab the URL and title, or whatever, from that, and there's nothing you can do about it.

Lenny2




msg:4385686
 10:03 pm on Nov 10, 2011 (gmt 0)

@potentialgeek thanks for the link.
@londrum thanks for the clarification on the key noindex information.

In my case I can't 301 the pages, because we use them for testing... I think I will look into rel="canonical" - that should do the trick. I actually didn't realize you could do a rel="canonical" on a completely different domain. Good to know!
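
For anyone wanting to try the cross-domain rel="canonical" idea, here is a minimal Python sketch that builds the tag for a page on the test host. The main hostname `www.site-name.com` and the helper name `canonical_link_tag` are assumptions made up for illustration; the thread only names bck01.site-name.com.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed hostname of the live site; the thread only names bck01.site-name.com.
MAIN_HOST = "www.site-name.com"


def canonical_link_tag(test_url: str, main_host: str = MAIN_HOST) -> str:
    """Return a <link rel="canonical"> tag pointing at the same path on the main host."""
    parts = urlsplit(test_url)
    # Keep scheme, path and query; swap the host and drop any fragment.
    canonical = urlunsplit((parts.scheme, main_host, parts.path, parts.query, ""))
    return '<link rel="canonical" href="%s">' % escape(canonical, quote=True)


print(canonical_link_tag("http://bck01.site-name.com/widgets/blue.html?page=2"))
```

The tag would go in the head of each page on the test host, so the test pages stay reachable for testing while pointing search engines at the main site's copy.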

zerillos




msg:4385698
 10:59 pm on Nov 10, 2011 (gmt 0)

I see you use uppercase in your code. This is something I was wondering about. Is robots noindex case sensitive? I'm starting to suspect it is, but I don't have any real proof yet.
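
As far as I know, Google's documentation describes the robots meta tag's name and content values as case-insensitive, so `NOINDEX` and `noindex` should be treated alike. Below is a hedged Python sketch of how a parser might normalize the tag. The names `RobotsMetaParser` and `robots_directives` are made up for illustration, and this is not a claim about how any real crawler is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags, ignoring case."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # Attribute values keep their original case, so normalize them here.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html_text):
    """Return the set of lowercased robots directives found in the markup."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"></head></html>'
print(sorted(robots_directives(page)))
```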

BenFox




msg:4385849
 10:42 am on Nov 11, 2011 (gmt 0)

I'm assuming that there are too many URLs for you to manually remove them using WMT?

scooterdude




msg:4385895
 2:09 pm on Nov 11, 2011 (gmt 0)

Check the cache dates. It's not impossible that they were cached before you noindexed the pages, and if the site is not heavily linked to, it's possible the crawler hasn't been back to recrawl and therefore deindex the pages.

Andem




msg:4385899
 2:20 pm on Nov 11, 2011 (gmt 0)

+1 on what londrum says and plus my $0.02 :)

A couple of months ago, we tried noindexing and blocking pages via robots.txt. Unfortunately, Google was still listing these pages in their SERPs but without a snippet below the title.

The only way to get them out of the index was via webmaster tools and even now, there are errors about these pages being blocked by robots.txt *sigh*

londrum




msg:4385903
 2:26 pm on Nov 11, 2011 (gmt 0)

it makes you wonder why people are told to noindex low quality pages, as a way to beat panda. surely it shouldn't have any effect, if google can keep the pages in the index?

netmeg




msg:4385926
 3:38 pm on Nov 11, 2011 (gmt 0)

Personally, I have never had a situation (over hundreds of thousands of URLs in aggregate) where a NOINDEXed URL showed up in the index... UNLESS I or someone else had made a mistake and blocked crawling with robots.txt. If you do that, then G can't even get in to *see* the NOINDEX.

I'm not saying it can't happen, but I've never seen it happen without some logical explanation for it.
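
The robots.txt interaction described here can be checked offline with Python's standard `urllib.robotparser`. This is a rough sketch, assuming a hypothetical robots.txt and hypothetical URLs on the test host. It only reports whether a crawler would be allowed to fetch a page, and therefore get a chance to read its noindex tag.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for the test host (not from the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_can_see_noindex(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """True if robots.txt lets `agent` fetch `url`, so an on-page noindex can be read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "http://bck01.site-name.com/private/report.html",
    "http://bck01.site-name.com/public/report.html",
):
    if crawler_can_see_noindex(ROBOTS_TXT, url):
        status = "noindex readable"
    else:
        status = "blocked, noindex never seen"
    print(url, "->", status)
```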

pageoneresults




msg:4385941
 4:04 pm on Nov 11, 2011 (gmt 0)

A couple of months ago, we tried noindexing and blocking pages via robots.txt. Unfortunately, Google was still listing these pages in their SERPs but without a snippet below the title.


What you describe with the URI only listings is the default robots.txt behavior. The META (or X-Robots-Tag) NoIndex is at the document level. If you've Disallowed the bot from accessing the documents that contain the NoIndex directive, it will never see it, that's why your pages are still showing in the index with a URI only listing.

Remove the robots.txt directives and let the document level NoIndex do its thing. It works just as it says on the tin. I've been using it for years and I've never, ever, seen any of those documents appear in the index - ever.
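
The X-Robots-Tag mentioned above is sent as an HTTP response header rather than in the page markup. Here is a minimal WSGI middleware sketch in Python, assuming the test host is bck01.site-name.com and that the application is served through WSGI. Both are assumptions, since the thread doesn't say what the stack is, and `NoIndexMiddleware` and `demo_app` are illustrative names.

```python
TEST_HOSTS = {"bck01.site-name.com"}  # test hostname mentioned in the thread


class NoIndexMiddleware:
    """Add `X-Robots-Tag: noindex, nofollow` to responses served from test hosts."""

    def __init__(self, app, hosts=TEST_HOSTS):
        self.app = app
        self.hosts = {h.lower() for h in hosts}

    def __call__(self, environ, start_response):
        # Strip any port number and normalize case before comparing hosts.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()

        def patched_start_response(status, headers, exc_info=None):
            if host in self.hosts:
                # Drop any existing value so the header is not duplicated.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex, nofollow"))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def demo_app(environ, start_response):
    # Stand-in application used only for this example.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = NoIndexMiddleware(demo_app)
```

Doing it at the server level like this means the main site and the test site can share identical page templates, with only the test host's responses carrying the directive.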

londrum




msg:4385974
 5:48 pm on Nov 11, 2011 (gmt 0)

i managed to find that thread from ages ago
[webmasterworld.com ]

the reply from skrenta is the interesting one

Andem




msg:4386938
 11:32 pm on Nov 14, 2011 (gmt 0)

If you've Disallowed the bot from accessing the documents that contain the NoIndex directive, it will never see it, that's why your pages are still showing in the index with a URI only listing.


Sorry for being unclear. I first tried NOINDEX, and then tried the robots.txt route. Neither worked like I thought they should. The WMT removal url/directory worked (past tense). After 3-4 weeks, the results with just URLs or titles without snippets are showing back up in the results. Note that the pages are over 10 years old and have several decent backlinks.

Now, even after noindex, robots.txt blocking directories and a directory removal via webmaster tools, I have given up and decided to send a 404 Not Found header to all requests. Now I have a truckload of complaints about the 404s mixed with robots.txt blockage in webmaster tools. *sigh*

g1smd




msg:4386986
 1:46 am on Nov 15, 2011 (gmt 0)

The meta robots noindex should fix the problem, but it does take quite a while (sometimes more than 6 months).
